Chapter 3: Random Permutations
What is a permutation? (quick honest explanation)
A permutation is simply a rearrangement of the elements of a sequence.
Examples:
- Original: [A, B, C, D]
- One permutation: [B, D, A, C]
- Another: [D, A, C, B]
- etc.
There are n! possible permutations of n distinct items.
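For small n you can check the n! count directly. The sketch below enumerates every ordering of the four-letter example with Python's standard `itertools` module and compares the total against `math.factorial`. It is only a sanity check: the count grows so fast that 13 items already have more than six billion orderings, far too many to list.

```python
import itertools
import math

items = ["A", "B", "C", "D"]

# Enumerate every possible ordering of the four items
all_orders = list(itertools.permutations(items))

print(len(all_orders))        # 24
print(math.factorial(4))      # 24 = 4 * 3 * 2 * 1
print(all_orders[:3])         # first few orderings, emitted in lexicographic order
```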
NumPy gives us two main ways to create random permutations:
| Method | What it does | Modifies original? | Returns |
|---|---|---|---|
| np.random.permutation() | Returns a new randomly shuffled copy | No | new array |
| np.random.shuffle() | Shuffles in place (modifies original) | Yes | None |
1. np.random.permutation() — most commonly used
Creates a new shuffled copy — original stays unchanged.
```python
import numpy as np

# Simple 1D example
numbers = np.arange(10)                  # [0 1 2 3 4 5 6 7 8 9]
shuffled = np.random.permutation(numbers)

print("Original:", numbers)              # unchanged
print("Shuffled:", shuffled)             # new array in random order
```
Important observation: every run gives a different order, unless you set a seed first.
```python
import numpy as np

numbers = np.arange(10)
np.random.seed(42)                       # fix the global random state
print(np.random.permutation(numbers))    # same order every time the script runs
```
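To see what seeding buys you, this sketch resets the seed to the same value twice and checks that both calls produce the identical order. It uses the same legacy global-state functions as this chapter; 42 is just an arbitrary seed.

```python
import numpy as np

numbers = np.arange(10)

np.random.seed(42)
first = np.random.permutation(numbers)

np.random.seed(42)                       # reset to exactly the same state
second = np.random.permutation(numbers)

print(np.array_equal(first, second))     # True: same seed, same order
```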
Very common pattern — permuting indices
```python
import numpy as np

# 1000 samples: we want to visit them in a random order
indices = np.arange(1000)
random_order = np.random.permutation(indices)

# Now use the same order to shuffle the data
X = np.random.randn(1000, 20)            # features
y = np.random.randint(0, 2, 1000)        # labels

X_shuffled = X[random_order]
y_shuffled = y[random_order]             # same order keeps each (X, y) pair together
```
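scikit-learn's train_test_split relies on the same idea: in the default (non-stratified) shuffled case, it draws a random permutation of the sample indices and slices it into train and test parts.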
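A minimal hand-rolled version of that idea is sketched below, assuming a 20% test fraction and the toy `X`, `y` arrays shown. The `split_train_test` helper is hypothetical (it is not a NumPy or scikit-learn function), and it uses a local `np.random.RandomState` so the global random state is left alone.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.2, seed=None):
    """Hypothetical helper: shuffle the indices once, then slice into test/train."""
    rs = np.random.RandomState(seed)             # local generator, global state untouched
    perm = rs.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.random.randn(1000, 20)
y = np.random.randint(0, 2, 1000)

X_train, X_test, y_train, y_test = split_train_test(X, y, seed=0)
print(X_train.shape, X_test.shape)               # (800, 20) (200, 20)
```

Because one permutation drives both slices, no sample can end up in both the train and the test part.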
2. np.random.shuffle() — shuffles in place
Modifies the array directly — does not return anything.
```python
import numpy as np

deck = np.arange(1, 53)                  # cards 1 to 52
print("Before:", deck[:10])

np.random.shuffle(deck)                  # deck itself is reordered
print("After :", deck[:10])
```
Very common mistake students make:
```python
import numpy as np

deck = np.arange(1, 53)

# WRONG: shuffle returns None!
wrong = np.random.shuffle(deck)          # deck is shuffled, but wrong is None
print(wrong)                             # None
```
Correct usage:
```python
np.random.shuffle(deck)                  # modifies deck directly
```
3. Quick comparison table (very useful to remember)
| Property | np.random.permutation() | np.random.shuffle() |
|---|---|---|
| Returns | new shuffled array | None |
| Modifies original? | No | Yes |
| Can take integer N? | Yes — creates 0..N-1 shuffled | No |
| Memory usage | creates copy | no extra memory |
| Most common use case | creating shuffled indices, new copy | shuffling existing dataset in place |
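The "Returns" and "Modifies original?" rows are easy to verify directly. This small check, using an arbitrary five-element array, shows that `permutation()` leaves its input alone and hands back a new array, while `shuffle()` reorders its input and returns `None`:

```python
import numpy as np

original = np.arange(5)

result = np.random.permutation(original)
print(original)                          # [0 1 2 3 4]: untouched
print(result is original)                # False: a new array

ret = np.random.shuffle(original)
print(ret)                               # None
print(np.sort(original))                 # [0 1 2 3 4]: same values, possibly a new order
```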
4. Special & very useful feature of permutation()
You can pass an integer instead of an array!
```python
import numpy as np

# Random permutation of 0, 1, 2, ..., 999
idx = np.random.permutation(1000)

# Equivalent to:
idx = np.arange(1000)
np.random.shuffle(idx)                   # but this shuffles idx in place
```
This is the usual shortcut when you only need shuffled indices: you skip writing np.arange yourself (NumPy builds that range internally and shuffles it).
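One way to convince yourself that `permutation(n)` really is a shuffled 0..n-1, with every index appearing exactly once, is to sort it back and count occurrences:

```python
import numpy as np

n = 1000
idx = np.random.permutation(n)

print(idx.shape)                                   # (1000,)
print(np.array_equal(np.sort(idx), np.arange(n)))  # True: each of 0..999 appears
print(np.bincount(idx, minlength=n).max())         # 1: no index is repeated
```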
5. Realistic patterns you will use again and again
Pattern 1 – Shuffle dataset before training
```python
import numpy as np

# Full dataset
X = np.random.randn(15000, 35)
y = np.random.randint(0, 3, 15000)

# Shuffle features and labels with ONE shared permutation
perm = np.random.permutation(len(X))
X = X[perm]
y = y[perm]
```
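Using a single `perm` for both arrays is what keeps every row of `X` paired with its own label. To check this, the sketch builds a toy dataset where the label can be recomputed from the features (as an illustrative assumption, the label is simply whether feature 0 is positive) and confirms the relation still holds after shuffling:

```python
import numpy as np

X = np.random.randn(15000, 35)
y = (X[:, 0] > 0).astype(int)            # toy label derived from feature 0

perm = np.random.permutation(len(X))
X_shuf, y_shuf = X[perm], y[perm]

# Pairs survive the shuffle: each label still matches its own row
print(np.array_equal(y_shuf, (X_shuf[:, 0] > 0).astype(int)))   # True

# Bug to avoid: a second, independent permutation breaks the pairing
y_bad = y[np.random.permutation(len(y))]
print(np.array_equal(y_bad, (X_shuf[:, 0] > 0).astype(int)))    # almost certainly False
```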
Pattern 2 – Create k-fold cross-validation indices
```python
import numpy as np

n = 12000
indices = np.random.permutation(n)
fold_size = n // 5                       # 2400; n divides evenly by 5 here

for i in range(5):
    val_start = i * fold_size
    val_end = (i + 1) * fold_size
    val_idx = indices[val_start:val_end]
    train_idx = np.concatenate([indices[:val_start], indices[val_end:]])
    print(f"Fold {i+1}: train={len(train_idx)}, val={len(val_idx)}")
```
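The slicing above relies on n being divisible by 5. With n = 12003, for example, `fold_size` is still 2400 and the last three shuffled indices would never land in a validation fold. A variant that avoids this uses `np.array_split`, which splits an array into nearly equal parts even when the length does not divide evenly (n = 12003 and k = 5 are illustrative values):

```python
import numpy as np

n = 12003                                # deliberately not divisible by 5
k = 5
indices = np.random.permutation(n)
folds = np.array_split(indices, k)       # fold sizes differ by at most 1

for i, val_idx in enumerate(folds):
    # Training set = every fold except the current one
    train_idx = np.concatenate(folds[:i] + folds[i + 1:])
    print(f"Fold {i+1}: train={len(train_idx)}, val={len(val_idx)}")
```

Here the first three folds get 2401 samples and the last two get 2400, so every sample is used for validation exactly once.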
Pattern 3 – Random sampling without replacement
```python
import numpy as np

all_customers = np.arange(50000)
selected = np.random.permutation(all_customers)[:500]   # first 500 after shuffling: 500 distinct customers
```
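`np.random.choice` with `replace=False` draws the same kind of sample (distinct items in random order) in a single call. The sketch below shows both forms side by side and checks that neither contains duplicates:

```python
import numpy as np

all_customers = np.arange(50000)

via_permutation = np.random.permutation(all_customers)[:500]
via_choice = np.random.choice(all_customers, size=500, replace=False)

# Sampling without replacement: no customer is selected twice
print(len(np.unique(via_permutation)))   # 500
print(len(np.unique(via_choice)))        # 500
```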
Pattern 4 – Shuffle rows of a matrix
```python
import numpy as np

matrix = np.random.randint(0, 100, size=(1000, 6))
np.random.shuffle(matrix)                # rows are shuffled in place

# or, returning a new array instead:
matrix = matrix[np.random.permutation(len(matrix))]
```
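Both functions only reorder along the first axis, so each row moves as a unit and its contents stay intact. If you need to shuffle columns instead (a small extension beyond the row case above), one option is to index the second axis with a permutation of the column indices. A 4 x 3 example:

```python
import numpy as np

matrix = np.arange(12).reshape(4, 3)     # rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]

shuffled_rows = matrix[np.random.permutation(matrix.shape[0])]
# Each row is still one of the original rows, just in a new position
print(sorted(map(tuple, shuffled_rows.tolist())) == sorted(map(tuple, matrix.tolist())))  # True

# Shuffling columns: permute the column indices and index axis 1
shuffled_cols = matrix[:, np.random.permutation(matrix.shape[1])]
print(shuffled_cols)
```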
Summary – Quick Decision Guide
| You want to… | Best choice |
|---|---|
| Get a new shuffled version (keep original) | np.random.permutation(arr) |
| Shuffle existing array in place (save memory) | np.random.shuffle(arr) |
| Create random order of 0..n-1 | np.random.permutation(n) |
| Shuffle rows of a 2D array | either — shuffle() or permutation + indexing |
| Need shuffled indices for splitting | np.random.permutation(len(data)) |
Final teacher advice
Always think about whether you need the original order preserved:
- Need original later → use permutation()
- Don’t need original + want to save memory → use shuffle()
- Working with indices → permutation(n) is usually cleanest
Always set a seed when you want reproducible shuffling:
```python
import numpy as np

np.random.seed(42)
perm = np.random.permutation(1000)       # identical on every run
```
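`np.random.seed` sets NumPy's global random state, which every other `np.random` call in the program shares. NumPy's documentation recommends a dedicated `Generator` instance for new code; a roughly equivalent sketch with a local, seeded generator (seed 42 again arbitrary) looks like this:

```python
import numpy as np

rng = np.random.default_rng(42)          # local, seeded generator

perm = rng.permutation(1000)             # like np.random.permutation(1000)

data = np.arange(10)
rng.shuffle(data)                        # like np.random.shuffle(data): in place

# Same seed, same sequence of results
perm_again = np.random.default_rng(42).permutation(1000)
print(np.array_equal(perm, perm_again))  # True
```

Note that a `Generator` produces different numbers than the legacy functions for the same seed, so switching APIs changes the exact orders you get.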
