Chapter 2: NumPy HOME
NumPy – The Honest “Home” Explanation (What it really is for beginners)
NumPy is the foundation of numerical & scientific computing in Python.
Almost every serious data/ML/science library (pandas, matplotlib, scikit-learn, tensorflow, pytorch, opencv, scipy, statsmodels, etc.) is built on top of NumPy.
Core idea you must understand from day one:
- Python lists → general-purpose, flexible, mixed types
- NumPy arrays → extremely fast, same-type elements, mathematical operations on whole arrays at once
Most important mindset shift:
Stop thinking element-by-element. Start thinking whole-array-at-once.
This single change makes code 10–100× faster and much more readable.
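The speedup claim is easy to verify yourself. A minimal timing sketch (the array size and repeat count below are arbitrary choices; exact numbers depend on your machine):

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Element-by-element: a Python-level loop over every item
loop_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)

# Whole-array-at-once: one vectorized multiply in compiled code
vec_time = timeit.timeit(lambda: np_arr * 2, number=20)

print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

On most machines the vectorized version wins by one to two orders of magnitude.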
```python
import numpy as np
```
(Everyone uses np — don’t fight it 😄)
1. Creating NumPy Arrays – The most common ways
```python
# 1. From an existing Python list / tuple
scores = np.array([78, 92, 65, 88, 71])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# 2. All zeros / ones (very common for initialization)
zeros_2d = np.zeros((4, 5))   # shape = (rows, columns)
ones_3d = np.ones((2, 3, 4))

# 3. Fill with any value
price_table = np.full((3, 10), 99.99)
identity = np.eye(5)          # identity matrix — very useful in linear algebra

# 4. Sequence creation (you will use these daily)
a = np.arange(0, 20, 2)       # 0, 2, 4, ..., 18
b = np.linspace(0, 1, 11)     # 0.0, 0.1, 0.2, ..., 1.0 ← perfect for plotting
c = np.arange(100)            # 0 to 99

# 5. Random numbers – extremely important
np.random.seed(42)            # ← makes random results the same every time (debugging!)
uniform = np.random.rand(5, 3)    # uniform distribution [0, 1)
normal = np.random.randn(1000)    # standard normal (Gaussian), mean=0, std=1
dice = np.random.randint(1, 7, size=20)              # 1 to 6 inclusive
noisy_image = np.random.normal(128, 25, (256, 256))  # realistic noise example
```
A quick student mistake to avoid. Don't do this:

```python
np.arange(0, 10, 0.5)  # stops at 9.5 — arange excludes the endpoint, and float steps can add rounding surprises
```
Better:

```python
np.arange(0, 10.1, 0.5)  # nudge the stop past 10 — or use linspace
```
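linspace sidesteps the problem because it takes a point count instead of a step, so the endpoint is included by construction. A quick check (21 points is just 0 to 10 in steps of 0.5):

```python
import numpy as np

# linspace(start, stop, num) always includes both endpoints
a = np.linspace(0, 10, 21)

print(a[0], a[-1])   # 0.0 10.0 — the endpoint is really there
print(len(a))        # 21
```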
2. The 4 properties you must look at every time
```python
data = np.random.randint(0, 100, size=(3, 4, 5))

print(data.shape)  # (3, 4, 5) ← MOST IMPORTANT
print(data.ndim)   # 3
print(data.size)   # 60
print(data.dtype)  # int64 (or float64, uint8, bool, int32, ...)
```
Real-world shape examples you will see very often:
```python
# Machine learning
X_train.shape → (number_of_samples, number_of_features)  # e.g. (8000, 784)
images.shape  → (batch_size, height, width, channels)    # e.g. (32, 224, 224, 3)

# Time series
prices.shape  → (timesteps,) or (timesteps, features)
```
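These shapes can be mocked up with placeholder arrays to make them concrete. The sizes below are the hypothetical examples from the comments above, not real data:

```python
import numpy as np

X_train = np.zeros((8000, 784))       # 8000 samples, 784 features each
images = np.zeros((32, 224, 224, 3))  # batch of 32 RGB images, 224×224 pixels
prices = np.zeros((365,))             # one year of daily prices

print(X_train.shape)  # (8000, 784)
print(images.ndim)    # 4
print(prices.shape)   # (365,)
```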
3. The #1 thing that confuses everyone: Views vs Copies
NumPy tries very hard NOT to copy data → this makes it fast, but dangerous if you forget.
```python
a = np.array([10, 20, 30, 40, 50])

b = a          # ← THIS IS NOT A COPY! b is another name for the same array
b[0] = 999
print(a)       # [999  20  30  40  50] ← surprise!

# Safe ways to copy
c = a.copy()   # explicit copy
d = np.copy(a)
e = a[:]       # careful: for NumPy arrays this is a VIEW (unlike Python lists, where [:] copies)
```
Quick test to remember:
```python
x = np.arange(10)

y = x[::2]            # every second element → VIEW
y[0] = 999
print(x)              # first element changed → yes, a view!

z = x[[0, 2, 4, 6]]   # fancy indexing → COPY
z[0] = 777
print(x)              # x did NOT change
```
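Rather than mutating and checking by hand, NumPy can answer the view-vs-copy question directly with `np.shares_memory`. The same experiment as a sketch:

```python
import numpy as np

x = np.arange(10)
view = x[::2]        # basic slicing → view
copy = x[[0, 2, 4]]  # fancy indexing → copy

# shares_memory tells you whether two arrays overlap in memory
print(np.shares_memory(x, view))  # True  → it's a view
print(np.shares_memory(x, copy))  # False → it's a copy
```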
4. Vectorization – The reason people fall in love with NumPy
Bad & slow (classic Python):
```python
temperatures = [23.4, 25.1, 19.8, 28.7, 22.0]
feels_like = []
for t in temperatures:
    feels_like.append(t * 1.8 + 32 - 10)
```
Beautiful & fast NumPy:
```python
temps = np.array([23.4, 25.1, 19.8, 28.7, 22.0])
feels_like = temps * 1.8 + 32 - 10
# array([64.12, 67.18, 57.64, 73.66, 61.6 ])
```
All these are vectorized (no loops!):
```python
x = np.array([1, 2, 3, 4, 5])

x + 10
x * 5
x ** 2
x % 3
np.sqrt(x)
np.exp(x)
np.log1p(x)              # log(1 + x) — safer for small values
np.sin(x * np.pi / 180)  # if x is in degrees
x > 3                    # → boolean array [False False False  True  True]
```
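Those boolean arrays are not just for show: they plug straight into reductions and filtering. A small sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
mask = x > 3            # [False False False  True  True]

# True counts as 1, so .sum() counts matches
print(mask.sum())       # 2 → how many elements are > 3
print(x[mask])          # [4 5] → keep only the matches
print(np.any(x > 4))    # True  → is at least one element > 4?
print(np.all(x > 0))    # True  → are all elements > 0?
```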
5. Broadcasting – The feature students call “magic”
Broadcasting rules (compare shapes from the right, dimension by dimension):
- The two dimensions are equal, or
- One of them is 1 (or missing) → NumPy stretches it automatically
Examples that work beautifully:
```python
# Very common patterns
(1000, 784) + (784,)      → adds a bias vector to each of the 1000 rows
(32, 224, 224, 3) + (3,)  → adds a value to each color channel
(500, 1) + (1, 800)       → outer sum → creates a 500×800 matrix
300 + np.random.randn(100, 50)  → adds the scalar to the whole array
```
Examples that fail (students always forget):
```python
np.ones((4, 5)) + np.ones((3, 6))  # ValueError: neither dimension matches or is 1
```
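The usual fix for a shape mismatch is to insert a length-1 axis with `np.newaxis` (or `reshape`) so the broadcasting rules can apply. A minimal sketch with small made-up shapes:

```python
import numpy as np

a = np.arange(4)   # shape (4,)
b = np.arange(3)   # shape (3,)
# a + b would fail: (4,) vs (3,) — neither matches nor is 1

# Insert a length-1 axis: (4, 1) + (3,) broadcasts to (4, 3)
outer = a[:, np.newaxis] + b

print(outer.shape)  # (4, 3) — the classic "outer sum" pattern
print(outer[1, 2])  # 3 → a[1] + b[2] = 1 + 2
```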
6. Indexing & Slicing – Real patterns you will use every day
```python
arr = np.arange(36).reshape(6, 6)

# Basic slicing
arr[0]          # first row
arr[:, -1]      # last column
arr[1:4, 2:5]   # submatrix: rows 1-3, columns 2-4

# Boolean indexing (very powerful!)
mask = arr > 20
high_values = arr[mask]

# Replace values conditionally (very common)
arr[arr < 10] = 0

# Fancy indexing with lists/arrays
rows = [0, 3, 5]
cols = [1, 4, 2]
selected = arr[rows, cols]   # picks elements (0,1), (3,4), (5,2)

# Combined pattern – very frequent in data cleaning
data = np.random.normal(100, 15, (1000, 20))
outliers = np.any(np.abs(data) > 160, axis=1)  # rows containing any extreme value
clean_data = data[~outliers]                   # ~ = NOT
```
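One more pattern worth knowing from the start is `np.where`, the vectorized if/else. The temperature data below is made up for illustration:

```python
import numpy as np

temps = np.array([18.5, 25.0, 31.2, 22.8, 35.1])

# np.where(condition, value_if_true, value_if_false) — element-wise if/else
labels = np.where(temps > 30, "hot", "ok")
print(labels)  # ['ok' 'ok' 'hot' 'ok' 'hot']

# With a single argument, np.where returns the indices of True entries
idx = np.where(temps > 30)[0]
print(idx)     # [2 4]
```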
7. Reshaping, Transposing, Flattening – Daily operations
```python
a = np.arange(24)
a.reshape(4, 6)
a.reshape(3, -1)   # -1 means "calculate this dimension automatically"
a.reshape(-1, 8)

# Very common in deep learning
flat_images = np.random.rand(5000, 784)      # 5000 flattened 28×28 images
images = flat_images.reshape(-1, 28, 28, 1)  # → (5000, 28, 28, 1)

# Transpose
matrix.T
np.transpose(matrix, axes=(0, 2, 1))   # for 3D: swap the last two axes

# Flatten
a.ravel()     # view when possible → faster
a.flatten()   # always a copy → sometimes safer
```
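The ravel-vs-flatten difference is easy to confirm with `np.shares_memory`. A quick check on a small contiguous array:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

r = m.ravel()    # view (for a contiguous array) → writing to r changes m
f = m.flatten()  # always a fresh copy → m stays safe

print(np.shares_memory(m, r))  # True
print(np.shares_memory(m, f))  # False
```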
8. Most useful statistics & reductions
```python
sales = np.random.randint(50, 500, (12, 30))   # 12 months × 30 days

sales.sum()          # total sales
sales.sum(axis=0)    # total for each day-of-month (summed over months)
sales.mean(axis=1)   # average per month
sales.std(axis=1)    # volatility per month
sales.min(), sales.max()
sales.argmin(axis=0) # which month had the lowest sales on each day?

np.median(sales)
np.percentile(sales, [25, 75])
np.quantile(sales, 0.95)   # 95th percentile
```
Very common ML normalization pattern
```python
features = np.random.randn(10000, 30)
features_norm = (features - features.mean(axis=0)) / features.std(axis=0)
```
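A sanity check worth running once: after this z-score normalization, every column should have mean ≈ 0 and std ≈ 1. A sketch using the newer `default_rng` API (the seed 0 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(10000, 30))

# Same pattern as above: subtract per-column mean, divide by per-column std
features_norm = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(features_norm.mean(axis=0), 0))  # True
print(np.allclose(features_norm.std(axis=0), 1))   # True
```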
Summary – Your NumPy “Cheat Sheet” to keep forever
| Operation | Most common & useful way |
|---|---|
| Create array | np.array(), zeros(), ones(), arange(), linspace(), random.rand/randn/randint |
| Shape info | .shape .ndim .size .dtype |
| Safe copy | .copy() or np.copy() |
| Element-wise math | + - * / ** sqrt sin exp log abs |
| Matrix multiplication | a @ b or np.dot(a,b) |
| Transpose | a.T |
| Reshape | reshape(…, -1) |
| Stack arrays | np.vstack(), np.hstack(), np.concatenate(…, axis=…) |
| Boolean filtering | arr[arr > 5], arr[~mask] |
| Conditional replace | arr[arr < 0] = 0, np.where() |
| Statistics | sum(axis=), mean(axis=), std, min, max, percentile |
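The stacking row in the table deserves one concrete example, since the axis behavior trips people up. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.vstack([a, b]).shape)               # (4, 2) — stacked vertically (more rows)
print(np.hstack([a, b]).shape)               # (2, 4) — stacked horizontally (more columns)
print(np.concatenate([a, b], axis=0).shape)  # (4, 2) — same as vstack here
```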
Where to go next from here:
- Linear algebra basics (dot product, matrix inverse, eigenvalues…)
- Advanced indexing & memory-efficient views
- Common performance traps & how to avoid them
- NumPy + matplotlib — first plots together
- Real data cleaning examples
- NumPy patterns used in machine learning
