Chapter 13: ufunc Set Operations

1. First important truth: NumPy set operations are often called ufuncs, but strictly they are ordinary array functions with special behavior

NumPy provides six main set-operation functions:

| Function | What it computes | Returns | Symmetric? | Multi-array? |
|---|---|---|---|---|
| np.intersect1d | intersection (common elements) | sorted unique array | yes | no |
| np.union1d | union (all unique elements) | sorted unique array | yes | no |
| np.setdiff1d | set difference (A − B) | sorted unique array | no | no |
| np.setxor1d | symmetric difference (A Δ B) | sorted unique array | yes | no |
| np.isin (older name: np.in1d) | membership test (element in set) | boolean array | no | yes (N-D input, mask keeps its shape) |
| np.unique | unique elements (often used with sets) | sorted unique array | n/a (single input) | yes (via axis) |
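
One caveat on terminology: strictly speaking, the functions in this table are regular NumPy functions, not instances of np.ufunc. The check below is a minimal sketch of how to confirm that yourself; the variable names are only illustrative.

```python
import numpy as np

# np.add is a genuine universal function (ufunc).
add_is_ufunc = isinstance(np.add, np.ufunc)

# The set routines are ordinary functions, not np.ufunc instances.
set_funcs = [np.intersect1d, np.union1d, np.setdiff1d,
             np.setxor1d, np.isin, np.unique]
set_funcs_are_ufuncs = [isinstance(f, np.ufunc) for f in set_funcs]

print(add_is_ufunc)          # True
print(set_funcs_are_ufuncs)  # [False, False, False, False, False, False]
```

In practice this means ufunc-only features such as out=, where= or .reduce are not available on them, even though they still process whole arrays at once.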

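Key characteristics of NumPy set functions:

  • They return sorted unique values (except isin, which returns a boolean mask in the order and shape of its first argument)
  • Duplicates in the inputs are collapsed automatically in the result
  • intersect1d, union1d, setdiff1d and setxor1d flatten higher-dimensional inputs to 1D; isin keeps the input shape, and np.unique accepts an axis argument
  • They are fast because they are built on NumPy's compiled sorting and comparison routines
  • They do not preserve the original order; if order matters, filter with an isin mask or recover first-occurrence positions with np.unique(…, return_index=True)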
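
The characteristics above can be seen in a small experiment. This is only a sketch with made-up arrays a and b: it shows a 2D input being flattened, sorted and de-duplicated by np.union1d, while np.isin keeps the input shape.

```python
import numpy as np

a = np.array([[3, 1, 3],
              [2, 1, 9]])      # 2D input with duplicates
b = np.array([9, 7, 7])

union = np.union1d(a, b)
print(union)        # [1 2 3 7 9]  (flattened, sorted, duplicates removed)
print(union.ndim)   # 1

mask = np.isin(a, b)
print(mask.shape)   # (2, 3)  (isin keeps the shape of its first argument)
print(mask)
# [[False False False]
#  [False False  True]]
```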

2. Basic usage examples – one by one
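
The original listing for this section is not available here, so the following is a minimal sketch of each function on two small illustrative arrays (a and b are invented for the example).

```python
import numpy as np

a = np.array([5, 1, 3, 3, 7])
b = np.array([3, 7, 7, 9, 11])

common = np.intersect1d(a, b)     # elements present in both
either = np.union1d(a, b)         # elements present in a or b
only_a = np.setdiff1d(a, b)       # in a but not in b
only_b = np.setdiff1d(b, a)       # argument order matters for setdiff1d
exclusive = np.setxor1d(a, b)     # in exactly one of the two
distinct_a = np.unique(a)         # a without duplicates

print(common)      # [3 7]
print(either)      # [ 1  3  5  7  9 11]
print(only_a)      # [1 5]
print(only_b)      # [ 9 11]
print(exclusive)   # [ 1  5  9 11]
print(distinct_a)  # [1 3 5 7]
```

Note that every result comes back sorted and without duplicates, regardless of how the inputs were ordered.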

3. The most important function: np.isin / np.in1d (membership test)

This is the function you will use most often when doing set-like filtering.
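
As a rough illustration of membership filtering, the sketch below uses hypothetical user_ids and vip_ids arrays. The mask returned by np.isin lines up with the first argument, so indexing with it keeps the original order and any duplicates.

```python
import numpy as np

user_ids = np.array([104, 101, 107, 101, 110, 103])   # hypothetical data
vip_ids = np.array([101, 103, 999])

is_vip = np.isin(user_ids, vip_ids)
print(is_vip)               # [False  True False  True False  True]
print(user_ids[is_vip])     # [101 101 103]  (order and duplicates kept)

# invert=True gives the complementary mask without a separate ~ step.
not_vip = np.isin(user_ids, vip_ids, invert=True)
print(user_ids[not_vip])    # [104 107 110]
```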

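isin vs in1d: both compute the same membership test, but np.isin (added in NumPy 1.13) keeps the shape of its first argument, while np.in1d always returns a flat 1D mask. np.in1d is deprecated as of NumPy 2.0, so prefer np.isin in new code.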
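
The difference shows up on a 2D input. The sketch below assumes np.in1d may be missing or emit a deprecation warning in newer NumPy releases, so the legacy call is guarded; grid and wanted are illustrative names.

```python
import warnings
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
wanted = [2, 5, 6]

mask_isin = np.isin(grid, wanted)      # same shape as grid: (2, 3)

# np.in1d flattens its input. Guard the call because it is deprecated
# in NumPy 2.x and may not exist in later versions.
legacy_in1d = getattr(np, "in1d", None)
if legacy_in1d is not None:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        mask_in1d = legacy_in1d(grid, wanted)
else:
    mask_in1d = np.isin(grid, wanted).ravel()   # equivalent flat mask

print(mask_isin.shape)   # (2, 3)
print(mask_in1d.shape)   # (6,)
print(np.array_equal(mask_isin.ravel(), mask_in1d))   # True
```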

4. Realistic examples – patterns you will use every day

Pattern 1 – Keep only rows where a column value is in a set
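
A minimal sketch of this pattern, assuming a hypothetical numeric table whose columns are (order_id, country_code, amount): build the mask from one column, then use it to select whole rows.

```python
import numpy as np

# Hypothetical table: columns are (order_id, country_code, amount).
orders = np.array([
    [1, 10, 250],
    [2, 20, 120],
    [3, 30, 560],
    [4, 10,  75],
    [5, 40, 310],
])
allowed_countries = [10, 30]

keep = np.isin(orders[:, 1], allowed_countries)   # one boolean per row
filtered = orders[keep]
print(filtered)
# [[  1  10 250]
#  [  3  30 560]
#  [  4  10  75]]
```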

Pattern 2 – Remove duplicates across multiple columns
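
One way to do this is np.unique with axis=0, which treats each row as a single item. The sketch below uses invented (user_id, product_id) pairs and also shows how return_index can restore first-seen order.

```python
import numpy as np

# Hypothetical (user_id, product_id) pairs with repeated rows.
pairs = np.array([
    [3, 7],
    [1, 2],
    [3, 7],
    [1, 5],
    [1, 2],
])

unique_rows = np.unique(pairs, axis=0)   # rows come back sorted
print(unique_rows)
# [[1 2]
#  [1 5]
#  [3 7]]

# Keep first occurrences in their original order instead of sorted order.
_, first_idx = np.unique(pairs, axis=0, return_index=True)
in_original_order = pairs[np.sort(first_idx)]
print(in_original_order)
# [[3 7]
#  [1 2]
#  [1 5]]
```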

Pattern 3 – Find elements present in one array but not another
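
Two common ways to express this, sketched on made-up all_users and churned arrays: np.setdiff1d when a sorted, de-duplicated answer is fine, and an inverted isin mask when order and duplicates must survive.

```python
import numpy as np

all_users = np.array([40, 10, 30, 10, 20])   # hypothetical data
churned = np.array([30, 50])

# Sorted and unique: the set-style answer.
remaining_set = np.setdiff1d(all_users, churned)
print(remaining_set)    # [10 20 40]

# Original order and duplicates preserved: the mask-style answer.
still_active = all_users[~np.isin(all_users, churned)]
print(still_active)     # [40 10 10 20]
```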

Pattern 4 – Symmetric difference – users in A or B but not both
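
A short sketch with illustrative group_a and group_b arrays: np.setxor1d gives the exclusive members directly, and two setdiff1d calls tell you which side each one came from.

```python
import numpy as np

group_a = np.array([1, 2, 3, 4, 4])   # hypothetical user ids
group_b = np.array([3, 4, 5, 6])

exclusive = np.setxor1d(group_a, group_b)
print(exclusive)    # [1 2 5 6]

# Split the result by origin.
only_a = np.setdiff1d(group_a, group_b)   # [1 2]
only_b = np.setdiff1d(group_b, group_a)   # [5 6]

# The symmetric difference is the union of the two one-sided differences.
print(np.array_equal(exclusive, np.union1d(only_a, only_b)))   # True
```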

5. Summary – NumPy Set Operation Functions Quick Reference

| Function | What it returns | Sorted? | Unique? | Axis-aware? |
|---|---|---|---|---|
| np.intersect1d | common elements | yes | yes | no |
| np.union1d | all unique elements | yes | yes | no |
| np.setdiff1d(A, B) | A − B | yes | yes | no |
| np.setxor1d | A Δ B (exclusive or) | yes | yes | no |
| np.isin (older name: np.in1d) | boolean mask: is element in set? | n/a (keeps input order) | n/a | no axis argument (mask keeps input shape) |
| np.unique | unique elements (sorted) | yes | yes | yes (axis) |

Final teacher advice (very important)

Golden rule #1: Use np.isin (np.in1d in older code) whenever you want to filter elements that belong to a set. This is the most common set operation you will write.

Golden rule #2: All set functions except isin return sorted unique values. If you need the original order preserved, filter with an isin mask, or use np.unique(…, return_index=True) and sort the returned indices.

Golden rule #3: intersect1d, union1d, setdiff1d and setxor1d flatten their inputs to 1D. For 2D rows, use np.unique(…, axis=0) or convert rows to tuples.

Golden rule #4: When you only need to filter one array by membership in another, np.isin gives you the mask directly. This is simpler, and often faster on large inputs, than np.intersect1d followed by indexing, but measure on your own data.
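
Rules #2 and #3 are easier to remember with a concrete case. The sketch below is one possible way to apply them, using invented arrays: first-seen-order de-duplication via return_index, and a row-wise intersection done by converting rows to tuples.

```python
import numpy as np

# Rule #2: unique values in first-seen order.
values = np.array([30, 10, 30, 20, 10])
_, first_idx = np.unique(values, return_index=True)
ordered_unique = values[np.sort(first_idx)]
print(ordered_unique)    # [30 10 20]

# Rule #3: rows common to two 2D arrays, via Python sets of tuples.
rows_a = np.array([[1, 2], [3, 4], [5, 6]])
rows_b = np.array([[5, 6], [1, 2], [7, 8]])
common = set(map(tuple, rows_a.tolist())) & set(map(tuple, rows_b.tolist()))
common_rows = np.array(sorted(common))
print(common_rows)
# [[1 2]
#  [5 6]]
```

The tuple approach goes through Python objects, so it suits small or medium inputs; for large arrays a NumPy-only approach would be worth benchmarking.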

