{"id":2558,"date":"2026-02-02T10:53:33","date_gmt":"2026-02-02T10:53:33","guid":{"rendered":"https:\/\/demo.materiamedica.net\/demo6\/?p=2558"},"modified":"2026-02-02T10:53:33","modified_gmt":"2026-02-02T10:53:33","slug":"chapter-13-ufunc-set-operations","status":"publish","type":"post","link":"https:\/\/demo.materiamedica.net\/demo6\/chapter-13-ufunc-set-operations\/","title":{"rendered":"Chapter 13: ufunc Set Operations"},"content":{"rendered":"<h3 dir=\"auto\">1. First important truth: NumPy set operations are not true ufuncs, but ordinary array functions with special behavior<\/h3>\n<p dir=\"auto\">NumPy provides <strong>six main set-operation functions<\/strong>. Tutorials often group them with ufuncs, although none of them is an actual np.ufunc instance:<\/p>\n<div>\n<div dir=\"auto\">\n<table dir=\"auto\">\n<thead>\n<tr>\n<th data-col-size=\"md\">Function<\/th>\n<th data-col-size=\"xl\">What it computes<\/th>\n<th data-col-size=\"lg\">Returns<\/th>\n<th data-col-size=\"sm\">Symmetric?<\/th>\n<th data-col-size=\"xs\">Multi-array?<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td data-col-size=\"md\">np.intersect1d<\/td>\n<td data-col-size=\"xl\">intersection (common elements)<\/td>\n<td data-col-size=\"lg\">sorted unique array<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"md\">np.union1d<\/td>\n<td data-col-size=\"xl\">union (all unique elements)<\/td>\n<td data-col-size=\"lg\">sorted unique array<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"md\">np.setdiff1d<\/td>\n<td data-col-size=\"xl\">set difference (A \u2212 B)<\/td>\n<td data-col-size=\"lg\">sorted unique array<\/td>\n<td data-col-size=\"sm\">no<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"md\">np.setxor1d<\/td>\n<td data-col-size=\"xl\">symmetric difference (A \u0394 B)<\/td>\n<td data-col-size=\"lg\">sorted unique array<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"md\">np.in1d \/ np.isin<\/td>\n<td 
data-col-size=\"xl\">membership test (element in set)<\/td>\n<td data-col-size=\"lg\">boolean array<\/td>\n<td data-col-size=\"sm\">\u2014<\/td>\n<td data-col-size=\"xs\">yes<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"md\">np.unique<\/td>\n<td data-col-size=\"xl\">unique elements (often used with sets)<\/td>\n<td data-col-size=\"lg\">sorted unique array<\/td>\n<td data-col-size=\"sm\">\u2014<\/td>\n<td data-col-size=\"xs\">\u2014<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div><\/div>\n<\/div>\n<\/div>\n<p dir=\"auto\"><strong>Key characteristics<\/strong> of NumPy set functions:<\/p>\n<ul dir=\"auto\">\n<li>They <strong>always return sorted unique values<\/strong> (except isin, which returns a boolean mask)<\/li>\n<li>They <strong>ignore duplicates<\/strong> automatically<\/li>\n<li>They <strong>treat inputs as 1D<\/strong> (higher-dimensional inputs are flattened); np.isin instead returns a mask with the shape of its first argument<\/li>\n<li>They are <strong>fast<\/strong>: they are built on NumPy's compiled sorting and comparison routines<\/li>\n<li>They <strong>do not preserve order<\/strong>: if the original order matters, you must restore it yourself, for example by filtering with a boolean mask from np.isin<\/li>\n<\/ul>\n<h3 dir=\"auto\">2. Basic usage examples \u2013 one by one<\/h3>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>import numpy as np\r\n\r\na = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])\r\nb = np.array([2, 7, 1, 8, 2, 8, 1, 8])\r\n\r\nprint(\"a =\", a)\r\nprint(\"b =\", b)\r\n\r\nprint(\"\\nIntersection (common elements):\")\r\nprint(np.intersect1d(a, b))           # [1 2]\r\n\r\nprint(\"\\nUnion (all unique elements):\")\r\nprint(np.union1d(a, b))               # [1 2 3 4 5 6 7 8 9]\r\n\r\nprint(\"\\nA \u2212 B (elements in a but not in b):\")\r\nprint(np.setdiff1d(a, b))             # [3 4 5 6 9]\r\n\r\nprint(\"\\nSymmetric difference (A \u0394 B):\")\r\nprint(np.setxor1d(a, b))              # [3 4 5 6 7 8 9]<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 dir=\"auto\">3. 
The most important function: np.isin \/ np.in1d (membership test)<\/h3>\n<p dir=\"auto\">This is the function you will use <strong>most often<\/strong> when doing set-like filtering.<\/p>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])\r\n\r\nallowed = np.array([20, 50, 80, 110])\r\n\r\nprint(\"Which values are in allowed set?\")\r\nprint(np.isin(values, allowed))\r\n# [False  True False False  True False False  True False False]\r\n\r\n# Very common pattern: keep only allowed values\r\nfiltered = values[np.isin(values, allowed)]\r\nprint(\"Filtered:\", filtered)          # [20 50 80]<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p dir=\"auto\"><strong>isin vs in1d<\/strong>: both return the same mask for 1D input. np.isin (added in NumPy 1.13) is preferred and keeps the shape of its first argument; np.in1d always returns a flat 1D mask and is deprecated as of NumPy 2.0.<\/p>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>print(np.in1d(values, allowed))     # same boolean mask (deprecated as of NumPy 2.0)<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 dir=\"auto\">4. 
Realistic examples \u2013 patterns you will use every day<\/h3>\n<p dir=\"auto\"><strong>Pattern 1 \u2013 Keep only rows where a column value is in a set<\/strong><\/p>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>data = np.array([\r\n    [101, 'A', 23.5],\r\n    [102, 'B', 19.8],\r\n    [103, 'A', 25.1],\r\n    [104, 'C', 22.0],\r\n    [105, 'B', 21.7],\r\n    [106, 'A', 24.3]\r\n], dtype=object)\r\n\r\nvalid_categories = np.array(['A', 'B'])\r\n\r\nmask = np.isin(data[:, 1], valid_categories)\r\nfiltered_data = data[mask]\r\n\r\nprint(\"Original shape:\", data.shape)\r\nprint(\"Filtered shape:\", filtered_data.shape)\r\nprint(\"Filtered data:\\n\", filtered_data)<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p dir=\"auto\"><strong>Pattern 2 \u2013 Remove duplicates across multiple columns<\/strong><\/p>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>events = np.array([\r\n    [1, 'login',  '2023-01-01'],\r\n    [2, 'click',  '2023-01-02'],\r\n    [1, 'login',  '2023-01-01'],   # duplicate\r\n    [3, 'logout', '2023-01-03'],\r\n    [2, 'click',  '2023-01-02']    # duplicate\r\n])\r\n\r\n# Unique rows (very common need)\r\nunique_events = np.unique(events, axis=0)\r\n\r\nprint(\"Original:\\n\", events)\r\nprint(\"\\nUnique:\\n\", unique_events)<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p dir=\"auto\"><strong>Pattern 3 \u2013 Find elements present in one array but not another<\/strong><\/p>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>all_users = np.arange(1000, 1100)\r\nactive_users = np.random.choice(all_users, size=60, replace=False)\r\n\r\ninactive_users = np.setdiff1d(all_users, active_users)\r\n\r\nprint(\"Inactive users count:\", len(inactive_users))<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p dir=\"auto\"><strong>Pattern 4 
\u2013 Symmetric difference \u2013 users in A or B but not both<\/strong><\/p>\n<div dir=\"auto\">\n<div data-testid=\"code-block\">\n<div>\n<div>Python<\/div>\n<div>\n<pre tabindex=\"0\"><code>team_A = np.array([101, 103, 105, 107, 109])\r\nteam_B = np.array([102, 104, 106, 107, 108])\r\n\r\nexclusive = np.setxor1d(team_A, team_B)\r\n\r\nprint(\"Exclusive members (A XOR B):\", exclusive)<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 dir=\"auto\">5. Summary \u2013 NumPy Set Operation ufuncs Quick Reference<\/h3>\n<div>\n<div dir=\"auto\">\n<table dir=\"auto\">\n<thead>\n<tr>\n<th data-col-size=\"lg\">Function<\/th>\n<th data-col-size=\"xl\">What it returns<\/th>\n<th data-col-size=\"sm\">Sorted?<\/th>\n<th data-col-size=\"sm\">Unique?<\/th>\n<th data-col-size=\"xs\">Axis-aware?<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td data-col-size=\"lg\">np.intersect1d<\/td>\n<td data-col-size=\"xl\">common elements<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"lg\">np.union1d<\/td>\n<td data-col-size=\"xl\">all unique elements<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"lg\">np.setdiff1d(A, B)<\/td>\n<td data-col-size=\"xl\">A \u2212 B<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"lg\">np.setxor1d<\/td>\n<td data-col-size=\"xl\">A \u0394 B (exclusive or)<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">no<\/td>\n<\/tr>\n<tr>\n<td data-col-size=\"lg\">np.isin \/ np.in1d<\/td>\n<td data-col-size=\"xl\">boolean mask \u2014 is element in set?<\/td>\n<td data-col-size=\"sm\">\u2014<\/td>\n<td data-col-size=\"sm\">\u2014<\/td>\n<td data-col-size=\"xs\">yes<\/td>\n<\/tr>\n<tr>\n<td 
data-col-size=\"lg\">np.unique<\/td>\n<td data-col-size=\"xl\">unique elements (sorted)<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"sm\">yes<\/td>\n<td data-col-size=\"xs\">yes (axis)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div><\/div>\n<\/div>\n<\/div>\n<h3 dir=\"auto\">Final teacher advice (very important)<\/h3>\n<p dir=\"auto\"><strong>Golden rule #1<\/strong> <strong>Use np.isin (or np.in1d on older NumPy versions) whenever you want to filter elements that belong to a set<\/strong>. This is the most common set operation you will write.<\/p>\n<p dir=\"auto\"><strong>Golden rule #2<\/strong> <strong>All set functions (except isin) return sorted unique values<\/strong>. If you need the original order preserved, use masking + np.unique(&#8230;, return_index=True).<\/p>\n<p dir=\"auto\"><strong>Golden rule #3<\/strong> <strong>Set operations in NumPy treat their inputs as 1D arrays<\/strong>. For 2D rows you must use np.unique(&#8230;, axis=0) or convert rows to tuples.<\/p>\n<p dir=\"auto\"><strong>Golden rule #4<\/strong> <strong>When comparing very large sets<\/strong>, np.isin is usually faster than np.intersect1d + indexing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. First important truth: NumPy set operations are ufuncs \u2014 but with special behavior NumPy provides six main set-operation ufuncs: ufunc What it computes Returns Symmetric? Multi-array? 
np.intersect1d intersection (common elements) sorted unique array&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[75],"tags":[],"class_list":["post-2558","post","type-post","status-publish","format-standard","hentry","category-numpy"],"_links":{"self":[{"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/posts\/2558","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/comments?post=2558"}],"version-history":[{"count":1,"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/posts\/2558\/revisions"}],"predecessor-version":[{"id":2559,"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/posts\/2558\/revisions\/2559"}],"wp:attachment":[{"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/media?parent=2558"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/categories?post=2558"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/demo.materiamedica.net\/demo6\/wp-json\/wp\/v2\/tags?post=2558"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}