Chapter 13: ufunc Set Operations
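1. First important truth: NumPy set operations are not ufuncs, and they behave differently
NumPy provides six main set-operation functions. They are ordinary functions rather than true ufuncs. Most of them are built on sorting, they do not broadcast their two arguments against each other, and they have no ufunc methods such as .reduce: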
| Function | What it computes | Returns | Symmetric? | N-D input |
|---|---|---|---|---|
| np.intersect1d | intersection (common elements) | sorted unique 1-D array | yes | flattened |
| np.union1d | union (all unique elements) | sorted unique 1-D array | yes | flattened |
| np.setdiff1d | set difference (A − B) | sorted unique 1-D array | no | flattened |
| np.setxor1d | symmetric difference (A Δ B) | sorted unique 1-D array | yes | flattened |
| np.isin (older: np.in1d) | membership test (is each element in the set?) | boolean array | n/a | np.isin keeps the shape of its first argument; np.in1d flattens |
| np.unique | unique elements (often used with sets) | sorted unique array | n/a | flattened unless axis= is given |
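Each of these functions compares exactly two arrays. If you have more than two, one option is to fold them pairwise with functools.reduce. The sketch below does that for intersection and union. The three sample arrays are illustrative only.

```python
from functools import reduce

import numpy as np

# Illustrative input: three small arrays with overlapping values
groups = [np.array([1, 3, 4, 3]), np.array([3, 1, 2, 1]), np.array([6, 3, 4, 2])]

common = reduce(np.intersect1d, groups)   # values present in every array
everything = reduce(np.union1d, groups)   # values present in at least one array

print(common)      # [3]
print(everything)  # [1 2 3 4 6]
```

Intersection and union are associative, so the folding order does not change the result.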
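Key characteristics of NumPy set functions:
- All of them except np.isin return sorted unique values
- They drop duplicates automatically
- They flatten N-D input to 1-D. np.isin keeps the shape of its first argument, and np.unique can work along an axis
- They are vectorized and fast, but they are mostly sort-based (roughly O(n log n)) Python functions, not C-level ufunc loops
- They do not preserve the original order. The output is sorted, so if order matters you must recover it yourself, for example with an np.isin mask or np.unique(..., return_index=True)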
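You can check these properties yourself. The first two checks test whether a function is an np.ufunc object. The rest shows N-D input being flattened and the result coming back sorted. The 2×2 array m is illustrative input.

```python
import numpy as np

# np.add is a true ufunc; the set helpers are plain functions
print(isinstance(np.add, np.ufunc))          # True
print(isinstance(np.intersect1d, np.ufunc))  # False

# N-D input is flattened, duplicates are dropped, and the output is sorted 1-D
m = np.array([[3, 1],
              [2, 3]])
u = np.union1d(m, [5, 1])
print(u, u.ndim)  # [1 2 3 5] 1
```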
2. Basic usage examples – one by one
```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])
b = np.array([2, 7, 1, 8, 2, 8, 1, 8])

print("a =", a)
print("b =", b)

print("\nIntersection (common elements):")
print(np.intersect1d(a, b))  # [1 2]

print("\nUnion (all unique elements):")
print(np.union1d(a, b))      # [1 2 3 4 5 6 7 8 9]

print("\nA − B (elements in a but not in b):")
print(np.setdiff1d(a, b))    # [3 4 5 6 9]

print("\nSymmetric difference (A Δ B):")
print(np.setxor1d(a, b))     # [3 4 5 6 7 8 9]
```
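The four results are linked by ordinary set identities. The sketch below repeats a and b so it runs on its own, then checks two of those identities. It also shows that set difference is not symmetric: A − B and B − A give different results.

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])
b = np.array([2, 7, 1, 8, 2, 8, 1, 8])

only_a = np.setdiff1d(a, b)  # A − B -> [3 4 5 6 9]
only_b = np.setdiff1d(b, a)  # B − A -> [7 8]
xor = np.setxor1d(a, b)      # A Δ B -> [3 4 5 6 7 8 9]

# A Δ B = (A − B) ∪ (B − A)
print(np.array_equal(xor, np.union1d(only_a, only_b)))  # True

# A ∪ B = (A Δ B) ∪ (A ∩ B)
print(np.array_equal(np.union1d(a, b),
                     np.union1d(xor, np.intersect1d(a, b))))  # True
```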
3. The most important function: np.isin / np.in1d (membership test)
This is the function you will use most often when doing set-like filtering.
```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
allowed = np.array([20, 50, 80, 110])

print("Which values are in the allowed set?")
print(np.isin(values, allowed))
# [False  True False False  True False False  True False False]

# Very common pattern: keep only allowed values
filtered = values[np.isin(values, allowed)]
print("Filtered:", filtered)  # [20 50 80]
```
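Sometimes you need the opposite filter: the values outside the allowed set. np.isin accepts invert=True for this, and it produces the same mask as negating the result with ~. Here is a minimal sketch that reuses the arrays above:

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
allowed = np.array([20, 50, 80, 110])

# invert=True marks the values that are NOT in allowed
rejected = values[np.isin(values, allowed, invert=True)]
print(rejected)  # [ 10  30  40  60  70  90 100]

# Same mask, written with ~
print(np.array_equal(np.isin(values, allowed, invert=True),
                     ~np.isin(values, allowed)))  # True
```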
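isin vs in1d: for 1-D input they give the same mask. np.isin (added in NumPy 1.13) keeps the shape of its first argument, while np.in1d always returns a flattened 1-D mask. np.in1d is deprecated as of NumPy 2.0, so use np.isin in new code.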
```python
# Same mask as np.isin for 1-D input; np.in1d is deprecated since NumPy 2.0 and may warn
print(np.in1d(values, allowed))
```
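The sketch below shows the shape difference with an illustrative 2-D array, grid. It calls only np.isin. Flattening the input first reproduces the kind of 1-D mask that np.in1d returns.

```python
import numpy as np

grid = np.array([[10, 20, 30],
                 [40, 50, 60]])
allowed = np.array([20, 50, 80, 110])

mask = np.isin(grid, allowed)
print(mask.shape)       # (2, 3): same shape as grid
print(mask)

# Flattening first gives a 1-D mask, like np.in1d would
flat_mask = np.isin(grid.ravel(), allowed)
print(flat_mask.shape)  # (6,)

# Boolean indexing with the 2-D mask returns the matches as a 1-D array
print(grid[mask])       # [20 50]
```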
4. Realistic examples – patterns you will use every day
Pattern 1 – Keep only rows where a column value is in a set
```python
import numpy as np

# dtype=object lets one array hold ints, strings and floats,
# at the cost of slower, Python-level element operations
data = np.array([
    [101, 'A', 23.5],
    [102, 'B', 19.8],
    [103, 'A', 25.1],
    [104, 'C', 22.0],
    [105, 'B', 21.7],
    [106, 'A', 24.3],
], dtype=object)

valid_categories = np.array(['A', 'B'])

mask = np.isin(data[:, 1], valid_categories)  # test the category column
filtered_data = data[mask]                     # keep only the matching rows

print("Original shape:", data.shape)           # (6, 3)
print("Filtered shape:", filtered_data.shape)  # (5, 3)
print("Filtered data:\n", filtered_data)
```
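Storing everything as dtype=object works, but every cell becomes a generic Python object. One alternative keeps each column in its own native dtype while the np.isin filter stays the same. This sketch assumes a structured array suits your data. The field names id, cat and score are hypothetical.

```python
import numpy as np

# Hypothetical field names: each column keeps its own dtype
records = np.array(
    [(101, 'A', 23.5), (102, 'B', 19.8), (103, 'A', 25.1),
     (104, 'C', 22.0), (105, 'B', 21.7), (106, 'A', 24.3)],
    dtype=[('id', 'i4'), ('cat', 'U1'), ('score', 'f8')],
)

mask = np.isin(records['cat'], ['A', 'B'])  # same membership test as before
kept = records[mask]

print(kept['id'])            # [101 102 103 105 106]
print(kept['score'].dtype)   # float64, so numeric operations stay fast
```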
Pattern 2 – Remove duplicates across multiple columns
```python
import numpy as np

# Mixing ints and strings makes NumPy store every cell as a string (e.g. '1')
events = np.array([
    [1, 'login', '2023-01-01'],
    [2, 'click', '2023-01-02'],
    [1, 'login', '2023-01-01'],   # duplicate
    [3, 'logout', '2023-01-03'],
    [2, 'click', '2023-01-02'],   # duplicate
])

# Unique rows (very common need); the result comes back sorted
unique_events = np.unique(events, axis=0)

print("Original:\n", events)
print("\nUnique:\n", unique_events)
```
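np.unique returns the rows in sorted order. To see how often each row occurred, or to keep the rows in their original order, you can use return_counts and return_index. This sketch uses a small hypothetical integer array of (user_id, event_code) rows, placed out of sorted order so the difference is visible.

```python
import numpy as np

# Hypothetical (user_id, event_code) rows, deliberately not sorted
events = np.array([[3, 30],
                   [1, 10],
                   [3, 30],
                   [2, 20],
                   [1, 10]])

uniq, first_idx, counts = np.unique(events, axis=0,
                                    return_index=True, return_counts=True)
print(uniq.tolist())  # [[1, 10], [2, 20], [3, 30]] (sorted)
print(first_idx)      # [1 3 0]: first row where each unique row appears
print(counts)         # [2 1 2]: how many times each unique row occurs

# Sorting the first-occurrence indices restores the original order
in_order = events[np.sort(first_idx)]
print(in_order.tolist())  # [[3, 30], [1, 10], [2, 20]]
```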
Pattern 3 – Find elements present in one array but not another
```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible

all_users = np.arange(1000, 1100)    # 100 user IDs
active_users = rng.choice(all_users, size=60, replace=False)

inactive_users = np.setdiff1d(all_users, active_users)
print("Inactive users count:", len(inactive_users))  # 40
```
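np.setdiff1d sorts its result by default. If both inputs are already unique, assume_unique=True skips the deduplication step. NumPy's documentation notes that the result is then sorted only if the input was, so the order of the first array is kept. Only use this when the inputs really are unique. The work queue below is a hypothetical example.

```python
import numpy as np

# Hypothetical work queue: IDs are unique and listed in arrival order
queue = np.array([1050, 1003, 1099, 1020])
done = np.array([1003])

sorted_left = np.setdiff1d(queue, done)                          # default: sorted
in_queue_order = np.setdiff1d(queue, done, assume_unique=True)   # keeps queue order

print(sorted_left)     # [1020 1050 1099]
print(in_queue_order)  # [1050 1099 1020]
```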
Pattern 4 – Symmetric difference – users in A or B but not both
```python
import numpy as np

team_A = np.array([101, 103, 105, 107, 109])
team_B = np.array([102, 104, 106, 107, 108])

exclusive = np.setxor1d(team_A, team_B)
print("Exclusive members (A XOR B):", exclusive)  # [101 102 103 104 105 106 108 109]
```
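The symmetric difference does not tell you which team each exclusive member came from. Every member of A Δ B belongs to exactly one of the two teams, so a single np.isin test against team_A is enough to label them. A minimal sketch:

```python
import numpy as np

team_A = np.array([101, 103, 105, 107, 109])
team_B = np.array([102, 104, 106, 107, 108])

exclusive = np.setxor1d(team_A, team_B)

# In team_A -> 'A', otherwise it must be in team_B -> 'B'
labels = np.where(np.isin(exclusive, team_A), 'A', 'B')

for member, team in zip(exclusive, labels):
    print(member, team)
```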
5. Summary – NumPy Set Operation Functions Quick Reference
| Function | What it returns | Sorted? | Unique? | Axis support |
|---|---|---|---|---|
| np.intersect1d | common elements | yes | yes | no (flattens input) |
| np.union1d | all unique elements | yes | yes | no (flattens input) |
| np.setdiff1d(A, B) | A − B | yes | yes | no (flattens input) |
| np.setxor1d | A Δ B (exclusive or) | yes | yes | no (flattens input) |
| np.isin (older: np.in1d) | boolean mask: is each element in the set? | n/a | n/a | no axis argument, but np.isin keeps the first argument's shape |
| np.unique | unique elements (sorted) | yes | yes | yes (axis=) |
Final teacher advice (very important)
Golden rule #1 Use np.isin whenever you want to filter elements that belong to a set. This is the most common set operation you will write. np.in1d does the same for 1-D input but has been deprecated since NumPy 2.0.
Golden rule #2 All set functions except np.isin return sorted unique values. If you need the original order, use an np.isin mask or np.unique(…, return_index=True).
Golden rule #3 The set functions flatten their input to 1-D. To compare whole rows of a 2D array, use np.unique(…, axis=0) or convert the rows to tuples.
Golden rule #4 For filtering, prefer np.isin over np.intersect1d. np.isin gives you the mask directly. np.intersect1d only returns the common values (plus their first positions with return_indices=True), so you need a second step to index the original array.
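Here are rules #2 and #4 side by side on two small illustrative arrays. np.intersect1d sorts its result. An np.isin mask keeps the first array's order. return_indices gives the first positions of the common values, which can then restore that order.

```python
import numpy as np

a = np.array([9, 2, 7, 2, 4])
b = np.array([4, 9, 5])

print(np.intersect1d(a, b))  # [4 9]: sorted
print(a[np.isin(a, b)])      # [9 4]: a's order (duplicates in a would be kept too)

common, ia, ib = np.intersect1d(a, b, return_indices=True)
print(ia, ib)                  # [4 0] [0 1]: first positions in a and in b
print(common[np.argsort(ia)])  # [9 4]: unique common values in a's order
```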
Would you like to continue with any of these next?
- Finding unique rows in 2D arrays (axis=0 tricks)
- Set operations with structured arrays / object dtype
- Realistic mini-project: filter users, remove duplicates, find exclusive members
- Performance comparison: isin vs setdiff1d vs in1d
- Common bugs when mixing sorted vs unsorted expectations
Just tell me what you want to focus on next! 😊
