Chapter 16: NumPy Sorting Arrays
This chapter covers sorting NumPy arrays as if I'm your patient teacher sitting next to you: every important detail step by step, realistic examples, trade-offs, common mistakes, and the patterns you will actually use in real data work.
Let’s open a notebook together and learn this properly.
```python
import numpy as np
```
1. The three main ways to sort in NumPy
NumPy gives you three tools with very different philosophies:
| Method | What it returns | Modifies original? | Most common use case |
|---|---|---|---|
| np.sort() | new sorted array (copy) | No | When you want to keep original |
| array.sort() | None — sorts in place | Yes | When memory is tight or you don’t need original |
| np.argsort() | indices that would sort the array | No | Ranking, top-k, indirect sorting |
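A minimal side-by-side sketch of the three tools on one small array (the array a is just illustrative data), showing what each call returns and whether the original changes:

```python
import numpy as np

a = np.array([3, 1, 2])

# np.sort: returns a sorted copy, the original is untouched
b = np.sort(a)
print(b, a)          # [1 2 3] [3 1 2]

# np.argsort: returns the indices that would sort a
order = np.argsort(a)
print(order)         # [1 2 0]

# a.sort(): sorts a itself and returns None
result = a.sort()
print(result, a)     # None [1 2 3]
```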
2. np.sort() — the safe & most commonly used method
```python
scores = np.array([78, 92, 65, 88, 71, 95, 82, 59, 67, 91])

sorted_scores = np.sort(scores)
print(sorted_scores)
# [59 65 67 71 78 82 88 91 92 95]

print(scores)
# [78 92 65 88 71 95 82 59 67 91]  ← original unchanged!
```
2D array: important behavior
By default, np.sort() sorts along the last axis (axis=-1). For a 2D array, that means each row is sorted independently.
```python
mat = np.array([
    [45, 12, 78, 33],
    [19, 88,  5, 62],
    [91, 27, 54,  3],
    [ 8, 66, 41, 75]
])

print(np.sort(mat))
# Each row is sorted independently
# [[12 33 45 78]
#  [ 5 19 62 88]
#  [ 3 27 54 91]
#  [ 8 41 66 75]]
```
Control the axis
```python
# Sort each column (vertical sort)
print(np.sort(mat, axis=0))
# [[ 8 12  5  3]
#  [19 27 41 33]
#  [45 66 54 62]
#  [91 88 78 75]]

# Sort the whole array as if flattened
print(np.sort(mat, axis=None))
# [ 3  5  8 12 19 27 33 41 45 54 62 66 75 78 88 91]
```
3. In-place sorting: array.sort()
Modifies the array itself — returns None
```python
data = np.random.randint(0, 100, 12)
print("Before:", data)

data.sort()            # ← in-place, returns None
print("After:", data)
```
Very important warning — people often forget it returns None:
```python
wrong = data.sort()    # ← wrong! data is sorted in place, but the call returns None
print(wrong)           # None

correct = data.copy()
correct.sort()         # now correct is sorted
```
When to use array.sort()
- You are sure you don’t need the original anymore
- You want to save memory (no copy created)
- You are sorting very large arrays
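To see the "no copy created" point concretely, here is a small sketch using np.shares_memory (a real NumPy function that reports whether two arrays overlap in memory). It also shows a consequence worth knowing: calling .sort() on a view, such as one row of a matrix, sorts that part of the parent array too. The arrays are illustrative data.

```python
import numpy as np

data = np.array([5, 3, 9, 1])

# np.sort returns a new array: no shared memory with the original
copy_sorted = np.sort(data)
print(np.shares_memory(data, copy_sorted))   # False

# .sort() reorders data's own buffer
data.sort()
print(data)                                  # [1 3 5 9]

# Sorting a view in place changes the parent array
mat = np.array([[3, 1, 2],
                [9, 7, 8]])
mat[0].sort()        # mat[0] is a view of the first row
print(mat)
# [[1 2 3]
#  [9 7 8]]
```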
4. The most powerful & most used tool: np.argsort()
Returns indices that would sort the array — not the values themselves.
```python
values = np.array([45, 92, 18, 76, 33, 89, 61, 12, 55])

idx = np.argsort(values)
print(idx)
# [7 2 4 0 8 6 3 5 1]

print(values[idx])     # sorted order
# [12 18 33 45 55 61 76 89 92]
```
Top 5 highest values (very common pattern)
```python
top5_indices = np.argsort(values)[-5:]      # last 5 indices = highest values (ascending)
print("Top 5 values:", values[top5_indices])    # [55 61 76 89 92]
print("Their positions:", top5_indices)         # [8 6 3 5 1]

# Highest first:
top5_desc = np.argsort(values)[-5:][::-1]
print(values[top5_desc])                        # [92 89 76 61 55]
```
2D argsort — per row / per column
```python
mat = np.random.randint(0, 100, (5, 6))    # 5 students (rows) × 6 subjects (columns)

# For each subject (column): row index of the student with the highest mark
best_per_subject = np.argmax(mat, axis=0)  # or np.argsort(mat, axis=0)[-1] (may differ on ties)

# Sort each row and get the order
row_order = np.argsort(mat, axis=1)
print(row_order)
# shape (5, 6): each row is a permutation of 0..5, i.e. the column indices
# that would put that row in ascending order
```
5. Sorting with kind= parameter — when performance or stability matters
NumPy offers different sorting algorithms:
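| kind | Speed | Stable? | Use case |
|---|---|---|---|
| 'quicksort' | fastest (typically) | no | default, good for most cases |
| 'mergesort' | medium | **yes** | when you need a stable sort |
| 'heapsort' | slower | no | rarely used |
| 'stable' | medium | **yes** | the clearest way to ask for a stable sort (same as 'mergesort') |

There is no 'timsort' option. 'stable' and 'mergesort' both request a stable sort, and NumPy picks the actual stable algorithm (timsort or radix sort) internally based on the data type. 'quicksort' is really an introsort, which switches to heapsort when it makes too little progress, so it stays O(n log n) in the worst case.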
Stable sort example — very important when you have ties
```python
names  = np.array(['Anna', 'Bob', 'Clara', 'David', 'Emma'])
scores = np.array([85, 92, 85, 78, 92])

# We want to sort by score, but keep the original order when scores are equal
idx = np.argsort(scores, kind='mergesort')   # stable
print(names[idx])
# ['David' 'Anna' 'Clara' 'Bob' 'Emma']  ← Anna before Clara (original order)
```
With 'quicksort' (not stable), the order of equal elements is not guaranteed. It isn't random, but NumPy makes no promise to keep the original order.
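For a descending order that still keeps ties in their original order, reversing a stable result does not work, because [::-1] also reverses the ties. One common approach for numeric keys is a stable sort on the negated scores. A sketch with the same names/scores data (negation is only safe for signed integer or float dtypes, not unsigned integers):

```python
import numpy as np

names  = np.array(['Anna', 'Bob', 'Clara', 'David', 'Emma'])
scores = np.array([85, 92, 85, 78, 92])

# Reversing a stable ascending sort flips the tie order too
reversed_idx = np.argsort(scores, kind='stable')[::-1]
print(names[reversed_idx])   # ['Emma' 'Bob' 'Clara' 'Anna' 'David']

# Stable sort on -scores: highest first, ties keep original order
desc_idx = np.argsort(-scores, kind='stable')
print(names[desc_idx])       # ['Bob' 'Emma' 'Anna' 'Clara' 'David']
```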
6. Realistic patterns you will write again and again
Pattern 1: Get top-k scores with their original indices
```python
sales = np.random.randint(1000, 100000, 500)

top10_idx = np.argsort(sales)[-10:][::-1]   # indices of the 10 largest, highest first
top10_sales = sales[top10_idx]
top10_positions = top10_idx                 # original positions in sales
```
Pattern 2: Rank items
```python
times = np.array([12.4, 11.9, 13.1, 11.8, 12.7])

ranks = np.argsort(np.argsort(times)) + 1   # 1-based rank
print(ranks)
# [3 2 5 1 4]  ← fastest gets rank 1
```
Pattern 3: Sort rows of a matrix by one column
```python
data = np.random.randint(0, 100, (100, 5))

# sort the rows by the 3rd column (index 2)
sorted_data = data[np.argsort(data[:, 2])]
```
Pattern 4: Sort strings / categories
```python
products = np.array(['mouse', 'keyboard', 'monitor', 'usb', 'laptop'])

sorted_idx = np.argsort(products)
print(products[sorted_idx])
# ['keyboard' 'laptop' 'monitor' 'mouse' 'usb']
```
Summary – Quick Decision Table
| You want to… | Best choice |
|---|---|
| Sort values, keep original unchanged | np.sort(arr) |
| Sort in place (save memory) | arr.sort() |
| Get sorted values and their original positions | np.argsort() |
| Find position of max / min | np.argmax() / np.argmin() |
| Need stable sorting (important for ties) | kind='stable' (or 'mergesort') |
| Sort each row / each column independently | np.sort(…, axis=1) / np.sort(…, axis=0) |
| Get top 10 / bottom 10 with indices | np.argsort(…)[-10:][::-1] / np.argsort(…)[:10] |
Common Mistakes to Avoid
```python
arr = np.array([3, 1, 2])

# Mistake 1: Forgetting that arr.sort() returns None
wrong = arr.sort()   # wrong is None (arr itself is now sorted)

# Mistake 2: Thinking argsort sorts strings case-insensitively
# → it does NOT: uppercase letters come before lowercase ones

# Mistake 3: Sorting very large arrays without thinking about memory
# → np.sort() allocates a full sorted copy, np.argsort() allocates an index
#   array (np.intp, 8 bytes per element on 64-bit platforms), while
#   arr.sort() sorts in place without allocating a sorted copy
```
Would you like to go deeper into any of these topics next?
- Stable vs unstable sorting with real examples
- Sorting structured arrays / record arrays
- Sorting along multiple keys (like SQL ORDER BY col1, col2)
- Performance: sort vs argsort vs pandas sort
- Mini-exercise: rank students, find top products, clean sorted time series
Just tell me what you want to focus on now! 😊
