1. What is NumPy really? (The honest explanation)
NumPy = Numerical Python
It’s the foundation library for almost everything serious in data science, machine learning, scientific computing, image processing, finance, etc.
Main superpowers:
- Extremely fast arrays (called ndarray)
- Vectorized operations (no slow Python loops)
- Broadcasting (magic shape matching)
- Linear algebra, statistics, random numbers, Fourier transforms…
- Memory efficient compared to Python lists
Rule of thumb you should burn into your brain:
If you’re doing numerical computation in Python and you’re using loops → you’re probably doing it wrong.
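A rough way to see the memory point from the list above. This is only a sketch: it assumes CPython (where `sys.getsizeof` reports per-object sizes), and the variable names are made up for illustration.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# List: a container of pointers plus one separate int object per element (approximate).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# Array: one contiguous buffer, exactly 8 bytes per int64 element.
array_bytes = arr.nbytes

print(f"list: ~{list_bytes} bytes, ndarray data: {array_bytes} bytes")
```

Exact list numbers depend on the Python build, so treat the ratio as a ballpark.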
2. First things first: Importing NumPy

    import numpy as np
Almost everyone uses np as the alias. Just accept it 😄
3. Creating NumPy Arrays (The Most Important Skill)

    # 1. From Python list
    a = np.array([1, 2, 3, 4])                      # 1D array
    b = np.array([[1, 2], [3, 4]])                  # 2D array (matrix)

    # 2. Special creation functions (very common)
    zeros = np.zeros((3, 4))                        # 3 rows × 4 columns filled with 0.0
    ones = np.ones((2, 5))                          # filled with 1.0
    empty = np.empty((2, 3))                        # garbage values (fast but dangerous)
    full = np.full((3, 3), 7)                       # fill with any number
    eye = np.eye(4)                                 # identity matrix 4×4

    # 3. Range-like arrays
    arange = np.arange(0, 10, 2)                    # 0, 2, 4, 6, 8
    linspace = np.linspace(0, 1, 11)                # 11 equally spaced points [0, 0.1, ..., 1.0]

    # 4. Random numbers (super useful)
    rand = np.random.rand(3, 2)                     # uniform [0, 1)
    randn = np.random.randn(4, 3)                   # standard normal (Gaussian)
    randint = np.random.randint(1, 100, size=(5,))  # random integers
Quick tip: np.random.seed(42) — makes random results reproducible (very important when debugging)
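Here is a small sketch of what the seed actually buys you. The second half uses `np.random.default_rng`, NumPy's newer Generator API, as an alternative to the global seed. It is not something the tip above requires, just a common option.

```python
import numpy as np

# Legacy global seeding, as in the tip above.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)        # same seed → same numbers

# Generator API: the seed lives in a local object instead of global state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.integers(1, 100, size=5)
draw_b = rng_b.integers(1, 100, size=5)

print(np.array_equal(first, second))    # True
print(np.array_equal(draw_a, draw_b))   # True
```

A local generator is handy when several parts of a program need their own independent, reproducible streams.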
4. Understanding .shape, .ndim, .size, .dtype

    x = np.random.randint(0, 10, size=(3, 4, 5))
    print(x.shape)   # (3, 4, 5)  ← most important!
    print(x.ndim)    # 3
    print(x.size)    # 3*4*5 = 60
    print(x.dtype)   # int64 (or int32, float64, etc.)
Memorize this order: (depth, rows, columns) or (z, y, x)
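To see the (depth, rows, columns) order in action, here is a minimal sketch that peels off one axis at a time. The array contents are arbitrary.

```python
import numpy as np

x = np.arange(3 * 4 * 5).reshape(3, 4, 5)   # (depth, rows, columns)

layer = x[0]         # fix the first axis → one 2D layer, shape (4, 5)
row = x[0, 1]        # fix depth and row → one row, shape (5,)
value = x[2, 3, 4]   # fix all three → a single element (the last one)

print(layer.shape)   # (4, 5)
print(row.shape)     # (5,)
print(value)         # 59
```

Each index you supply removes the leftmost remaining axis.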
5. Super Important: Assignment and Slicing Do NOT Copy (slices are views, unlike Python list slices!)

    a = np.array([1, 2, 3, 4])
    b = a              # NOT a copy! b is just another name for the same array
    b[0] = 999
    print(a)           # [999   2   3   4]  ← oh no!

    # Correct ways to copy:
    c = a.copy()       # independent copy
    d = np.copy(a)     # same
    # Careful: a list slice copies, but a NumPy slice does not
    e = a[:]           # a VIEW that shares memory with a, not a copy
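If you are ever unsure whether two arrays share data, `np.shares_memory` will tell you. The sketch below contrasts a NumPy slice (a view) with an explicit copy, and with a plain Python list slice, which does copy.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
view = a[1:3]            # basic slice → view on a's memory
copy = a[1:3].copy()     # explicit, independent copy

view[0] = -1             # writes through to a
copy[1] = 50             # does not touch a

print(a.tolist())                    # [1, -1, 3, 4]
print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, copy))     # False

# Python lists behave differently: slicing a list already makes a copy.
lst = [1, 2, 3, 4]
lst_slice = lst[1:3]
lst_slice[0] = -1
print(lst)                           # [1, 2, 3, 4]
```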
6. Vectorization — The Reason NumPy is Fast
Bad (slow Python style):

    lst = [1, 2, 3, 4, 5]
    result = []
    for x in lst:
        result.append(x**2 + 3*x - 7)
Good (NumPy way, often 10–100× faster on large arrays):

    arr = np.array([1, 2, 3, 4, 5])
    result = arr**2 + 3*arr - 7
    # array([-3,  3, 11, 21, 33])
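You can measure the difference yourself. This is a rough benchmark sketch: the array size and function names are arbitrary choices, and the actual speedup depends on your machine and on how big the array is.

```python
import timeit
import numpy as np

n = 200_000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)   # int64 so x**2 cannot overflow here

def with_loop():
    return [x**2 + 3*x - 7 for x in lst]

def vectorized():
    return arr**2 + 3*arr - 7

# Both approaches produce the same numbers.
same = np.array_equal(np.array(with_loop()), vectorized())

# Timings vary by machine; no particular ratio is promised.
t_loop = timeit.timeit(with_loop, number=5)
t_vec = timeit.timeit(vectorized, number=5)
print(same, f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

On tiny arrays like the 5-element example, the overhead of creating the array can outweigh the gain.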
All these work element-wise:

    +  -  *  /  **  %  //
    >  >=  <  <=  ==  !=
    np.sin(), np.cos(), np.exp(), np.log(), np.sqrt(), np.abs()
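A quick sketch of a few of these on small arrays, so you can see that the result always has one entry per input element:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 2.0])

floor_div = a // b      # floor division per element
remainder = a % b       # remainder per element
bigger = a > b          # comparisons return a boolean array
roots = np.sqrt(a)      # math functions apply per element

print(floor_div)   # [0. 2. 4.]
print(remainder)   # [1. 0. 1.]
print(bigger)      # [False  True  True]
print(roots)       # [1. 2. 3.]
```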
7. Broadcasting — The Magic You’ll Love & Hate
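Rules (very simple actually): line the two shapes up from the right (a missing leading dimension counts as 1). Each pair of dimensions must then satisfy one of these:
- They are equal, or
- One of them is 1 → it gets stretched to match the other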
Examples that work:

    (5, 3)    + (3,)      → (5, 3)
    (7, 1, 4) + (1, 6, 1) → (7, 6, 4)
    (1, 8)    + 10        → (1, 8)
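The rules are simple enough to write out by hand. `broadcast_shape` below is a hypothetical helper written for this tutorial, not a NumPy function. It only exists to show the right-to-left comparison, and the last line checks it against what NumPy really does.

```python
from itertools import zip_longest
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules by hand."""
    result = []
    # Walk both shapes from the right; a missing dimension counts as 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or da == 1 or db == 1:
            result.append(db if da == 1 else da)   # the size-1 side stretches
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((5, 3), (3,)))           # (5, 3)
print(broadcast_shape((7, 1, 4), (1, 6, 1)))   # (7, 6, 4)
print(broadcast_shape((1, 8), ()))             # (1, 8), a scalar has shape ()

# NumPy agrees when it adds real arrays of those shapes.
numpy_shape = (np.ones((7, 1, 4)) + np.ones((1, 6, 1))).shape
print(numpy_shape)                             # (7, 6, 4)
```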
Example that fails:

    np.ones((3, 4)) + np.ones((2, 5))   # ValueError
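Here is what that failure looks like in practice, next to a shape that does line up. This is a minimal sketch: the column vector is just one example of a compatible partner.

```python
import numpy as np

a = np.ones((3, 4))
b = np.ones((2, 5))

try:
    a + b                 # 4 vs 5 and 3 vs 2: neither pair matches, neither is 1
    failed = False
except ValueError as exc:
    failed = True
    print("broadcast error:", exc)

# A (3, 1) column stretches across the 4 columns of a.
col = np.arange(3).reshape(3, 1)
total = a + col
print(total.shape)        # (3, 4)
```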
8. Indexing & Slicing (very powerful)

    a = np.arange(24).reshape(4, 6)

    # Basic
    a[0, :]        # first row
    a[:, -1]       # last column
    a[1:3, 2:5]    # submatrix

    # Boolean indexing (super useful!)
    mask = a % 2 == 0
    even_numbers = a[mask]

    # Fancy indexing
    rows = [0, 2, 3]
    cols = [1, 4, 5]
    selected = a[rows, cols]   # picks a[0,1], a[2,4], a[3,5]
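It helps to look at what each kind of indexing actually returns. The sketch below reuses the same 4×6 array. The `np.ix_` line is an extra, shown because paired fancy indexing and "all these rows × all these columns" are easy to confuse.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

sub = a[1:3, 2:5]
evens = a[a % 2 == 0]
selected = a[[0, 2, 3], [1, 4, 5]]

print(sub.shape)       # (2, 3)
print(evens.shape)     # (12,)  boolean indexing flattens to 1D
print(selected)        # [ 1 16 23]

# Boolean masks also work on the left side of an assignment.
clipped = a.copy()
clipped[clipped > 20] = 20     # cap every value above 20

# np.ix_ selects the full rows × cols grid instead of paired elements.
block = a[np.ix_([0, 2, 3], [1, 4, 5])]
print(block.shape)     # (3, 3)
```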
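9. Reshaping, Transpose, Flatten

    a = np.arange(24)
    m = a.reshape(4, 6)      # most common
    a.reshape(-1, 8)         # -1 means "figure it out" → shape (3, 8)
    m.T                      # transpose (very fast, just a view); on a 1D array .T changes nothing
    m.ravel()                # flatten (a view when possible)
    m.flatten()              # flatten (always a copy)

    a1 = np.zeros((2, 3))
    a2 = np.ones((2, 3))
    np.concatenate([a1, a2], axis=0)   # stack vertically → shape (4, 3)
    np.vstack([a1, a2])                # same result → (4, 3)
    np.hstack([a1, a2])                # side by side → (2, 6)
    np.dstack([a1, a2])                # along a third axis → (2, 3, 2)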
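Whether you get a view or a copy matters as soon as you modify the result. A small sketch that checks this with `np.shares_memory`:

```python
import numpy as np

a = np.arange(24)
m = a.reshape(4, 6)

print(np.shares_memory(a, m))             # True: reshape returned a view here
print(np.shares_memory(m, m.T))           # True: transpose is a view
print(np.shares_memory(m, m.ravel()))     # True: contiguous data, so a view
print(np.shares_memory(m, m.T.ravel()))   # False: transposed data must be copied
print(np.shares_memory(m, m.flatten()))   # False: flatten always copies

m[0, 0] = 100
print(a[0])                               # 100, because m is a view of a
```

If you need a result that is safe to modify, call `.copy()` (or use `flatten()`) rather than relying on what `ravel()` happens to return.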
10. Important Aggregation Functions

    x = np.random.randint(0, 100, (5, 6))

    x.sum()              # total sum
    x.sum(axis=0)        # sum of each column
    x.sum(axis=1)        # sum of each row
    x.mean(), x.std(), x.var()
    x.min(), x.max(), x.argmin(), x.argmax()
    np.median(x), np.percentile(x, 90)
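The `axis` argument trips up almost everyone at first: it names the axis that gets collapsed. A tiny array you can check by hand makes it concrete.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = x.sum(axis=0)                 # collapse rows → one value per column
row_sums = x.sum(axis=1)                 # collapse columns → one value per row
row_sums_2d = x.sum(axis=1, keepdims=True)   # keep the collapsed axis as size 1

print(col_sums)            # [5 7 9]
print(row_sums)            # [ 6 15]
print(row_sums_2d.shape)   # (2, 1)

# argmax indexes the flattened array; unravel_index converts it back to (row, col).
flat_idx = x.argmax()
row_idx, col_idx = np.unravel_index(flat_idx, x.shape)
print(flat_idx)            # 5
```

`keepdims=True` is useful when the result needs to broadcast back against the original array, for example `x / x.sum(axis=1, keepdims=True)`.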
11. Mini Real-Life Examples
Normalize data (very common in ML)

    X = np.random.randn(1000, 50)
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
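A quick sanity check of that one-liner, as a sketch. The data here is deliberately shifted and scaled so the effect is visible. The `safe_std` guard at the end is an extra (an assumption about how you might handle constant columns), not part of the recipe above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 50))

mean = X.mean(axis=0)        # shape (50,): one mean per feature
std = X.std(axis=0)          # shape (50,): one std per feature
X_norm = (X - mean) / std    # (1000, 50) with (50,) broadcasts across rows

print(np.allclose(X_norm.mean(axis=0), 0.0))   # True
print(np.allclose(X_norm.std(axis=0), 1.0))    # True

# Assumption: a constant column has std 0; swapping in 1.0 avoids dividing by zero.
safe_std = np.where(std == 0, 1.0, std)
X_safe = (X - mean) / safe_std
```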
Image as array (RGB example)

    # height × width × 3
    image = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
    gray = image.mean(axis=2)    # simple grayscale
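One detail worth knowing about the grayscale line: `mean` returns floats, not `uint8`. The sketch below uses a tiny stand-in image (the 4×6 size is arbitrary) to show the dtype change and why averaging via `mean` is safer than adding `uint8` channels directly.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)   # tiny stand-in image

gray = image.mean(axis=2)
print(gray.shape, gray.dtype)               # (4, 6) float64

gray_u8 = gray.round().astype(np.uint8)     # back to 8-bit for saving or display
print(gray_u8.dtype)                        # uint8

# uint8 arithmetic wraps around at 256, so summing channels in uint8 is risky.
wrapped = np.array([200], dtype=np.uint8) + np.array([100], dtype=np.uint8)
print(wrapped)                              # [44], not 300
```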
Distance between all pairs of points

    points = np.random.rand(500, 2)               # 500 points in 2D
    diff = points[:, np.newaxis, :] - points      # broadcasting magic
    distances = np.sqrt((diff**2).sum(axis=2))    # 500 × 500 distance matrix
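To convince yourself the broadcasting trick is right, here is the same computation on 5 points, with the intermediate shapes spelled out and one entry cross-checked against the plain distance formula. This is a sketch; the pair (1, 3) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((5, 2))

diff = points[:, np.newaxis, :] - points      # (5, 1, 2) - (5, 2) → (5, 5, 2)
distances = np.sqrt((diff**2).sum(axis=2))    # (5, 5)

# Cross-check one entry against the direct formula.
i, j = 1, 3
expected = np.sqrt(((points[i] - points[j]) ** 2).sum())

print(diff.shape, distances.shape)              # (5, 5, 2) (5, 5)
print(np.isclose(distances[i, j], expected))    # True
print(np.allclose(distances, distances.T))      # True: symmetric
print(np.allclose(np.diag(distances), 0.0))     # True: zero distance to itself
```

Keep in mind that the intermediate `diff` array grows with the square of the number of points, so this approach suits a few thousand points rather than millions.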
Quick Reference Table (Keep this somewhere)
| Operation | Syntax |
|---|---|
| Create array | np.array(), np.zeros(), np.linspace() |
| Element-wise math | + - * / ** sin exp log … |
| Matrix multiply | np.dot(a,b) or a @ b (Python 3.5+) |
| Transpose | a.T |
| Reshape | a.reshape(3,4,-1) |
| Flatten | a.ravel() or a.flatten() |
| Concatenate | np.concatenate(…, axis=0) |
| Stack | np.vstack(), np.hstack() |
| Boolean indexing | a[a > 5] |
| Where (if-else) | np.where(condition, x, y) |
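Two rows of the table are easy to mix up or forget, so here is a short sketch: `*` versus `@`, and `np.where` as a vectorized if-else. The small matrices are made up so you can verify the results by hand.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

elementwise = a * b     # multiplies matching entries: [[0, 2], [3, 0]]
matmul = a @ b          # true matrix product:         [[2, 1], [4, 3]]

# np.where(condition, x, y): take x where the condition holds, else y.
kept = np.where(a > 2, a, 0)    # [[0, 0], [3, 4]]

print(elementwise)
print(matmul)
print(kept)
```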
Would you like to go deeper into any of these topics?
- Linear algebra (np.linalg)
- Advanced indexing & views vs copies
- Masked arrays
- Performance tricks
- Common mistakes people make
- NumPy + Pandas + Matplotlib mini project
Just tell me where you want to zoom in! 🚀
