Chapter 1: Random Numbers in NumPy
This chapter is written as if I’m your patient teacher sitting next to you: showing examples on the screen, explaining every important detail, giving realistic use cases, warning about common mistakes, and helping you build good habits from the beginning.
Let’s imagine we’re working together in a Jupyter notebook. Ready? 😊
```python
import numpy as np
```
1. Why NumPy random instead of Python’s random?
Many beginners start with Python’s built-in random module.
But in serious numerical/scientific/ML work you should almost always use numpy.random.
Main reasons:
- Much faster when generating thousands/millions of numbers
- Returns NumPy arrays directly (ready for math, slicing, broadcasting)
- Much larger variety of distributions (normal, binomial, poisson, beta, gamma, multivariate…)
- Proper multi-dimensional support out of the box
- Consistent seeding behavior across different functions
- Better integration with the rest of the NumPy ecosystem
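To see the "returns NumPy arrays directly" point in practice, here is a small comparison sketch. It only checks types and shapes. If you want to see the speed difference, time both lines yourself (for example with `%timeit` in Jupyter), because exact timings depend on your machine.

```python
import random

import numpy as np

n = 1_000_000

# Built-in random: one float per call, collected into a Python list
py_values = [random.random() for _ in range(n)]

# numpy.random: a single vectorized call returns a ready-to-use ndarray
np_values = np.random.rand(n)

print(type(py_values).__name__, type(np_values).__name__)  # list ndarray
print(np_values.shape)          # (1000000,)

# Array math works directly on the result, no Python loop needed
print((np_values * 2).mean())   # close to 1.0
```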
Golden rule #1 (write it somewhere):
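If you’re doing anything numerical/scientific/ML/data analysis → use numpy.random, not the built-in random module. The one exception is security: for passwords, tokens, or anything an attacker must not be able to guess, use Python’s secrets module, because numpy.random is not cryptographically secure.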
2. The MOST important habit: Always control the seed
```python
np.random.seed(42)  # ← this single line changes everything
```
Why is this so important?
- Makes experiments reproducible
- Makes debugging much easier
- Allows others to get exactly the same results
- Extremely helpful when teaching, writing tutorials, comparing models
Quick demonstration:
```python
# Without seed → different every time you run
print(np.random.rand(5))

# With seed → always exactly the same numbers
np.random.seed(42)
print(np.random.rand(5))
# → [0.37454012 0.95071431 0.73199394 0.59865848 0.15601864]
```
Teacher tip: While learning or experimenting → always put a fixed seed at the top of your notebook/script. When you go to production or want results that differ on every run → remove or comment out the seed line.
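A detail that confuses many beginners: the seed does not make every call return the same value. It fixes the whole sequence of draws that follows it. This short sketch shows both halves of that idea.

```python
import numpy as np

# Seed, then draw two numbers one after another
np.random.seed(42)
first_run = [np.random.rand(), np.random.rand()]

# Re-seed with the same value and repeat exactly the same calls
np.random.seed(42)
second_run = [np.random.rand(), np.random.rand()]

print(first_run == second_run)       # True: same seed + same calls = same sequence
print(first_run[0] == first_run[1])  # False: consecutive draws still differ
```

Because it is one sequence, adding or removing any random call earlier in your notebook shifts every number that comes after it. If your "reproducible" results suddenly change, check whether the order of random calls changed.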
3. The most frequently used random functions (you’ll use these 90% of the time)
A. np.random.rand() — Uniform random numbers [0, 1)
```python
# Single number
print(np.random.rand())

# 1D vector
print(np.random.rand(10))

# 2D matrix (very common!)
print(np.random.rand(4, 6))

# 4D: batch of images (common in deep learning)
print(np.random.rand(32, 64, 64, 3).shape)  # (32, 64, 64, 3)
```
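One common stumbling block with `rand()` is how it takes its shape, and one of its everyday uses (from the comparison table) is a dropout-style mask. The sketch below shows both; the 0.8 keep probability is just an illustrative value.

```python
import numpy as np

np.random.seed(0)

# rand() takes each dimension as a separate integer argument...
a = np.random.rand(4, 6)

# ...while np.random.random() (same [0, 1) distribution) takes a shape tuple.
# Passing a tuple to rand, e.g. np.random.rand((4, 6)), raises a TypeError.
b = np.random.random(size=(4, 6))

print(a.shape, b.shape)             # (4, 6) (4, 6)
print(a.min() >= 0, a.max() < 1)    # True True

# Dropout-style mask: True means "keep"; roughly 80% of entries are kept
keep_prob = 0.8
mask = np.random.rand(1000, 30) < keep_prob
print(mask.dtype, mask.mean())      # bool, and a value close to 0.8
```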
B. np.random.randn() — Standard normal (Gaussian) distribution
Mean = 0, standard deviation = 1
```python
print(np.random.randn(12))
# typical output: mixture of positive & negative, most values between -3 and +3

# Very common shape in neural networks
weights_layer1 = np.random.randn(784, 128)  # 784 input → 128 neurons
```
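You can check the "mean 0, standard deviation 1" claim yourself, and the same idea turns `randn()` into any normal distribution. The weight-scaling lines at the end are an extra: He initialization (std = sqrt(2 / fan_in)) is one common choice for ReLU networks, shown here as an option rather than the rule, because raw `randn` weights are usually too large in practice.

```python
import numpy as np

np.random.seed(42)
z = np.random.randn(100_000)

# Sample mean and standard deviation land close to 0 and 1
print(z.mean(), z.std())

# Theory: about 99.7% of standard normal values lie within [-3, +3]
print(np.mean(np.abs(z) <= 3))

# Any Normal(mu, sigma) is a shifted and scaled standard normal
mu, sigma = 170, 10
heights = mu + sigma * np.random.randn(10_000)
print(heights.mean(), heights.std())  # roughly 170 and 10

# He-style scaling of the layer weights (std = sqrt(2 / fan_in))
fan_in = 784
weights_layer1 = np.random.randn(fan_in, 128) * np.sqrt(2 / fan_in)
print(weights_layer1.std())  # close to sqrt(2 / 784) ≈ 0.0505
```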
Quick comparison table students should remember:
| Function | Range / Distribution | Typical shape example | Most common use case |
|---|---|---|---|
| rand() | [0, 1) uniform | rand(1000, 30) | Features, dropout masks, probabilities |
| randn() | ~ Normal(0, 1) | randn(512, 512, 3) | Weights init, noise, synthetic data |
| randint() | integers [low, high) | randint(1, 7, size=100) | Dice, labels, indices, pixel values |
C. np.random.randint(low, high, size=…) — Random integers
```python
# Single dice roll (1–6); high is exclusive, so pass 7
print(np.random.randint(1, 7))

# 1000 coin flips (0 or 1)
coins = np.random.randint(0, 2, size=1000)

# Realistic: exam scores from 35 to 100 (high=101 because high is exclusive)
scores = np.random.randint(35, 101, size=(200, 5))  # 200 students × 5 subjects
```
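The exclusive upper bound is the most common `randint` mistake, so it is worth checking once. This sketch rolls a die many times and counts the faces with `np.bincount`.

```python
import numpy as np

np.random.seed(0)
rolls = np.random.randint(1, 7, size=10_000)

# high is exclusive: 6 appears, 7 never does
print(rolls.min(), rolls.max())  # 1 6

# Count each face; index 0 of bincount is unused because faces start at 1
counts = np.bincount(rolls, minlength=7)[1:]
print(counts)
print(counts / rolls.size)  # each share is close to 1/6 ≈ 0.167
```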
D. np.random.uniform(low, high, size=…) — Uniform in custom range
```python
# Temperatures between 18–32 °C
temps = np.random.uniform(18, 32, size=365)

# Prices between 9.99 and 99.99
prices = np.random.uniform(9.99, 99.99, size=500)
```
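Two small extensions of the examples above, as a sketch: prices usually need whole cents, and `low`/`high` may also be arrays that broadcast against `size`, giving each column its own range. The humidity column is a hypothetical illustration.

```python
import numpy as np

np.random.seed(0)

# Round sampled prices to whole cents
prices = np.round(np.random.uniform(9.99, 99.99, size=500), 2)
print(prices[:5])

# Per-column ranges: column 0 = temperature in [18, 32),
# column 1 = hypothetical humidity in [30, 90)
weather = np.random.uniform(low=[18, 30], high=[32, 90], size=(365, 2))
print(weather.shape)  # (365, 2)
print(weather.min(axis=0), weather.max(axis=0))
```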
E. np.random.normal(loc, scale, size=…) — Custom normal distribution
```python
# Realistic IQ scores (mean 100, standard deviation 15)
iq = np.random.normal(loc=100, scale=15, size=10000)

# Measurement with sensor noise (standard deviation of 2 units)
true_distance = 150.0
measured = true_distance + np.random.normal(0, 2, size=200)
```
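Since `scale` is a standard deviation, not a hard "±" limit, about 68% of values fall within `loc ± scale`. Averaging the noisy measurements also recovers the true value, with an error that shrinks like 1/sqrt(n). A quick sketch checking both:

```python
import numpy as np

np.random.seed(42)
iq = np.random.normal(loc=100, scale=15, size=10_000)

# About 68% of values lie within one standard deviation of the mean
within_one_sd = np.mean((iq >= 85) & (iq <= 115))
print(within_one_sd)  # close to 0.68

true_distance = 150.0
measured = true_distance + np.random.normal(0, 2, size=200)

# The average of 200 noisy readings lands close to the true distance
print(measured.mean())  # close to 150

# Standard error of that average: about 2 / sqrt(200) ≈ 0.14
std_err = measured.std(ddof=1) / np.sqrt(measured.size)
print(std_err)
```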
4. Very common realistic use cases (you will write these many times)
Use case 1 – Synthetic training data
```python
np.random.seed(42)
n = 5000
X = np.random.randn(n, 8) * 3 + 50  # centered around 50
noise = np.random.normal(0, 4, n)
y = (2.5 * X[:, 0] - 1.8 * X[:, 2] + 0.9 * X[:, 5]) + noise
```
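The point of synthetic data is that you know the true answer. As a sanity check, a plain least-squares fit with `np.linalg.lstsq` should recover the coefficients you built in (2.5, -1.8, 0.9, and about 0 for the unused columns). The fitting step is an illustration added here, not part of the original recipe.

```python
import numpy as np

# Regenerate the synthetic data exactly as above
np.random.seed(42)
n = 5000
X = np.random.randn(n, 8) * 3 + 50
noise = np.random.normal(0, 4, n)
y = 2.5 * X[:, 0] - 1.8 * X[:, 2] + 0.9 * X[:, 5] + noise

# Prepend a column of ones so the model can also fit an intercept
A = np.column_stack([np.ones(n), X])

# Ordinary least squares: coef[0] is the intercept, coef[1:] are the 8 slopes
coef, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)

print(np.round(coef[1:], 2))
# Expect values near [2.5, 0, -1.8, 0, 0, 0.9, 0, 0]; exact digits depend on the noise
```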
Use case 2 – Adding realistic noise to images
```python
img = np.full((300, 400, 3), 180, dtype=np.uint8)  # light gray
noise = np.random.normal(0, 30, img.shape)
noisy = np.clip(img + noise, 0, 255).astype(np.uint8)
```
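Why does this recipe work without overflow? Adding a `float64` noise array to a `uint8` image produces a `float64` result, so values can go below 0 or above 255 and are then clipped before converting back. Plain `uint8` arithmetic, by contrast, wraps around silently, as this sketch shows.

```python
import numpy as np

np.random.seed(42)
img = np.full((300, 400, 3), 180, dtype=np.uint8)
noise = np.random.normal(0, 30, img.shape)

# uint8 + float64 gives float64, so nothing wraps before clipping
noisy = np.clip(img + noise, 0, 255).astype(np.uint8)
print((img + noise).dtype, noisy.dtype)  # float64 uint8

# Pitfall: uint8 + uint8 wraps around instead of saturating at 255
a = np.array([250], dtype=np.uint8)
b = np.array([10], dtype=np.uint8)
print(a + b)  # [4], because 260 wraps to 260 - 256
```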
Use case 3 – Random train/val/test split (manual)
```python
data = np.random.randn(12000, 25)
np.random.shuffle(data)  # ← very important step! (shuffles the rows in place)
train = data[:9000]
val = data[9000:10500]
test = data[10500:]
```
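In real projects features and labels usually live in separate arrays. Shuffling each one with its own `shuffle()` call would break the row-to-label pairing, so a safer pattern is one shared index permutation. In this sketch `y` holds hypothetical random labels, just to show the pairing.

```python
import numpy as np

np.random.seed(42)
X = np.random.randn(12000, 25)           # features
y = np.random.randint(0, 2, size=12000)  # hypothetical labels, one per row

# One shared permutation keeps each row of X paired with its label in y
idx = np.random.permutation(len(X))
train_idx, val_idx, test_idx = idx[:9000], idx[9000:10500], idx[10500:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_val.shape, X_test.shape)  # (9000, 25) (1500, 25) (1500, 25)
```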
Use case 4 – Random sampling / bootstrapping
```python
population = np.random.normal(100, 20, 100000)

# Random sample of 1000, drawn with replacement (the resampling step behind bootstrapping)
sample = np.random.choice(population, size=1000, replace=True)
```
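A full bootstrap repeats that resampling step many times on the data you actually observed, each time drawing a sample of the same size with replacement, and looks at how a statistic varies. Here is a minimal sketch of a 95% percentile confidence interval for the mean; the sample size of 500 and the 2000 repetitions are illustrative choices.

```python
import numpy as np

np.random.seed(42)

# Pretend this is the only data you actually observed
observed = np.random.normal(100, 20, size=500)

# Resample the observed data (same size, with replacement) many times
n_boot = 2000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = np.random.choice(observed, size=observed.size, replace=True)
    boot_means[i] = resample.mean()

# Percentile method: the middle 95% of the bootstrap means
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {observed.mean():.2f}, 95% CI ≈ [{low:.2f}, {high:.2f}]")
```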
Summary – Your Quick Reference Table
| Function | What you get | Typical call | Reproducible with np.random.seed()? |
|---|---|---|---|
| rand() | Uniform [0, 1) | rand(1000, 20) | Yes |
| randn() | Standard normal (μ=0, σ=1) | randn(512, 512, 3) | Yes |
| randint(low, high) | Integers [low, high) | randint(1, 101, 10000) | Yes |
| uniform(a, b) | Uniform [a, b) | uniform(0, 100, 500) | Yes |
| normal(μ, σ) | Normal with mean μ and standard deviation σ | normal(170, 10, 10000) | Yes |
| choice() | Random sample from a given array | choice(names, 200) | Yes |
| shuffle() | Shuffles an array in place (returns None) | shuffle(dataset) | Yes (seed before shuffling) |
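The last two rows of the table hide a classic bug: `shuffle()` changes the array in place and returns `None`. This sketch contrasts it with `permutation()`, which returns a shuffled copy, and adds `choice(..., replace=False)` for sampling without repeats. The names are hypothetical placeholders.

```python
import numpy as np

np.random.seed(42)

# shuffle() reorders the array in place and returns None
data = np.arange(10)
result = np.random.shuffle(data)
print(result)  # None, so never write data = np.random.shuffle(data)
print(data)    # the same ten numbers in a new order

# permutation() returns a shuffled copy and leaves its input untouched
original = np.arange(10)
shuffled = np.random.permutation(original)
print(original)  # [0 1 2 3 4 5 6 7 8 9]
print(shuffled)

# choice() with replace=False draws without repeats
names = np.array(["Ana", "Ben", "Chen", "Dev", "Eli"])  # hypothetical names
picked = np.random.choice(names, size=3, replace=False)
print(picked)
```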
Final teacher advice (very important)
Always start your random-related work with:
```python
import numpy as np
np.random.seed(42)  # or 123, 0, 777: any fixed number
```
This tiny habit will save you many hours of confusion later.
Where do you want to go next?
- More exotic distributions (poisson, binomial, beta, gamma…)
- Randomness in machine learning (dropout, weight init, data augmentation)
- Common bugs & misunderstandings with random numbers
- Mini-project: generate synthetic dataset + add noise + visualize
- Reproducibility tricks across multiple files/notebooks
Just tell me what feels most interesting or useful for you right now! 🚀
