Chapter 3: NumPy Introduction
NumPy – The Very First Honest Explanation
NumPy = Numerical Python
It is one of the most important libraries in the Python data, science, and machine-learning world.
Almost every serious numerical library in Python either builds on NumPy or is designed to exchange data with NumPy arrays:
- pandas (data frames/tables)
- matplotlib, seaborn, plotly (plotting)
- scikit-learn (machine learning)
- tensorflow, pytorch, jax (deep learning; they use their own tensor types, which mirror NumPy arrays and convert to and from them)
- scipy (scientific computing)
- opencv (image & video processing)
- statsmodels, pingouin (statistics)
- financial libraries, physics simulations, bioinformatics…
If you want to do anything serious with numbers in Python, you must learn NumPy first.
The most important mindset change you need right now:
- Python lists → good for general things (shopping lists, names, mixed types) → slow when doing math on thousands or millions of numbers
- NumPy arrays → made for numbers (all elements share one type) → very fast mathematical operations → think in whole arrays instead of item by item
This single difference often makes numeric code 10–100× faster on large arrays (the exact gain depends on the operation and the machine), and usually much shorter and cleaner.
Let’s Start – First Code You Should Type
    import numpy as np
Almost every person in data science / ML / scientific computing uses np as the short name. Just accept it — it’s the universal convention.
    # Let's create our very first NumPy array
    scores = np.array([78, 92, 65, 84, 71, 88])
    print(scores)
    # output: [78 92 65 84 71 88]
Compare with normal Python list:
    python_list = [78, 92, 65, 84, 71, 88]
    print(python_list)
    # output: [78, 92, 65, 84, 71, 88]
They look almost the same… but they are very different inside.
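To make "different inside" a bit more concrete, here is a small sketch. The names mixed_list and mixed_numbers are invented for this illustration; everything else reuses the values from above. It shows that an array keeps one type for all its elements, and that the same operator can mean something different for a list and for an array.

```python
import numpy as np

python_list = [78, 92, 65, 84, 71, 88]
scores = np.array(python_list)

print(type(python_list))   # <class 'list'>
print(type(scores))        # <class 'numpy.ndarray'>

# A list can hold anything; every element is a separate Python object.
mixed_list = [78, "ninety-two", 65.5]

# An array has one dtype for all elements, so NumPy converts on creation.
mixed_numbers = np.array([78, 92.5, 65])
print(mixed_numbers)        # [78.  92.5 65. ]
print(mixed_numbers.dtype)  # float64

# The same operator means different things:
print(python_list * 2)  # the list repeated twice (12 items)
print(scores * 2)       # each score doubled: [156 184 130 168 142 176]
```

The last two lines are worth trying yourself: multiplying a list repeats it, multiplying an array does math on every element.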
Why NumPy Feels So Different – First Magic Example
Let’s say we want to give everyone +5 bonus points.
Normal Python way (slow and ugly when list is big):
    new_scores = []
    for score in python_list:
        new_scores.append(score + 5)
    print(new_scores)
NumPy way (clean & fast):
    bonus = scores + 5
    print(bonus)
    # [83 97 70 89 76 93]
No Python loop! NumPy applied the addition to every element automatically, and the loop that does the work runs in fast compiled code.
This is called vectorization — and it is the #1 reason people love NumPy.
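If you want to see the speed difference on your own machine, a rough timing sketch could look like this. The array size (one million) and the helper names add_with_loop and add_with_numpy are choices made for this illustration, and the numbers you get will depend on your hardware and NumPy version.

```python
import timeit

import numpy as np

n = 1_000_000
python_list = list(range(n))
numpy_array = np.arange(n)

def add_with_loop():
    # Item by item: the Python interpreter handles every element.
    return [x + 5 for x in python_list]

def add_with_numpy():
    # Whole array: one expression, the loop runs in compiled code.
    return numpy_array + 5

# Run each version 5 times and report the total time.
loop_seconds = timeit.timeit(add_with_loop, number=5)
numpy_seconds = timeit.timeit(add_with_numpy, number=5)

print(f"list loop: {loop_seconds:.3f} s")
print(f"NumPy:     {numpy_seconds:.3f} s")
print(f"speed-up:  {loop_seconds / numpy_seconds:.0f}x (varies by machine)")
```

Both functions produce the same numbers; only the way the work gets done differs.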
Most Common First Arrays You Will Create
    # 1. From a list (most common way at beginning)
    a = np.array([1.5, 2.7, 3.2, 4.0])

    # 2. All zeros (very useful to initialize)
    zeros = np.zeros(10)           # 1D
    zeros2d = np.zeros((4, 6))     # 4 rows, 6 columns

    # 3. All ones
    ones = np.ones((3, 5))

    # 4. Fill with same number
    prices = np.full((2, 8), 99.99)

    # 5. Sequence of numbers (very common!)
    ar = np.arange(0, 20, 2)       # 0,2,4,...,18
    lin = np.linspace(0, 1, 11)    # 0.0, 0.1, ..., 1.0

    # 6. Random numbers – you will use these A LOT
    np.random.seed(42)                       # makes random results repeatable
    uniform = np.random.rand(5)              # 5 random numbers between 0 and 1
    normal = np.random.randn(1000)           # 1000 numbers from normal distribution
    dice = np.random.randint(1, 7, size=20)  # 20 dice rolls
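Two details in that list trip people up, so here is a short sketch you can run to check them: np.arange leaves out the stop value while np.linspace includes it. The second half shows the newer Generator interface (np.random.default_rng), which current NumPy documentation recommends for new code; it is offered here as an alternative, not as a replacement for what this chapter uses.

```python
import numpy as np

# arange(start, stop, step): stop is excluded.
ar = np.arange(0, 20, 2)
print(ar)        # [ 0  2  4  6  8 10 12 14 16 18]
print(len(ar))   # 10

# linspace(start, stop, num): stop is included, num is the number of points.
lin = np.linspace(0, 1, 11)
print(len(lin))          # 11
print(lin[0], lin[-1])   # 0.0 1.0

# Generator-based alternative to np.random.seed / rand / randn / randint.
rng = np.random.default_rng(42)          # seeded generator object
uniform = rng.random(5)                  # 5 floats in [0, 1)
normal = rng.standard_normal(1000)       # 1000 draws from a standard normal
dice = rng.integers(1, 7, size=20)       # 20 integers from 1 to 6
print(dice.min() >= 1, dice.max() <= 6)  # True True
```

Note that randint and rng.integers both exclude the upper bound, which is why a six-sided die needs 7 as the second argument.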
The 4 Most Important Properties – Check These Every Time
    data = np.random.randint(0, 100, size=(3, 4))
    print(data)
    # example output:
    # [[45 68 37 91]
    #  [12 73  8 29]
    #  [64 19 83 52]]

    print(data.shape)   # (3, 4)  ← MOST IMPORTANT LINE
    print(data.ndim)    # 2       ← how many dimensions
    print(data.size)    # 12      ← total numbers
    print(data.dtype)   # int64   ← what type are the numbers (int64 on most platforms)
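The four properties are connected, and a quick sketch makes the relationships visible. The dtype is set explicitly here so the byte counts are the same on every platform; itemsize, nbytes, and astype are standard array attributes and methods that the chapter has not introduced yet.

```python
import numpy as np

data = np.zeros((3, 4), dtype=np.int64)  # dtype chosen explicitly for this sketch

rows, cols = data.shape
print(len(data.shape) == data.ndim)   # True: ndim is the length of shape
print(rows * cols == data.size)       # True: size is the product of shape
print(data.itemsize)                  # 8 bytes per int64 element
print(data.nbytes)                    # 96 = 12 elements * 8 bytes

# dtype can be converted with astype, which returns a new array.
as_float = data.astype(np.float32)
print(as_float.dtype)                 # float32
print(data.dtype)                     # int64 (original unchanged)
```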
Real-world examples you will see very soon:
    # Machine learning dataset
    X.shape → (number_of_samples, number_of_features)   # e.g. (10000, 784)

    # Images (very common)
    image.shape → (height, width, 3)          # RGB image
    batch_images.shape → (32, 224, 224, 3)    # 32 images of 224×224 pixels
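To get a feel for those shapes before you meet real data, you can build placeholder arrays and unpack their shapes. The arrays below are filled with zeros purely for illustration, and the uint8 dtype for images is an assumption (8-bit pixel values are common, but not universal).

```python
import numpy as np

# Placeholder data with the shapes mentioned above (all zeros, illustration only).
X = np.zeros((10000, 784))                        # 10000 samples, 784 features
image = np.zeros((224, 224, 3), dtype=np.uint8)   # height, width, RGB channels
batch_images = np.zeros((32, 224, 224, 3), dtype=np.uint8)

n_samples, n_features = X.shape
print(n_samples, n_features)      # 10000 784

height, width, channels = image.shape
print(height, width, channels)    # 224 224 3

print(batch_images.shape[0])      # 32 images in the batch
print(batch_images[0].shape)      # (224, 224, 3): one image from the batch
```

Unpacking a shape into named variables like this is a handy habit: it documents what each axis means right where you use it.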
First Useful Math You Can Do Immediately
    temps = np.array([23.4, 25.1, 19.8, 28.7, 22.0])

    print(temps * 1.8 + 32)   # convert °C → °F
    # [74.12 77.18 67.64 83.66 71.6 ]

    print(temps ** 2)         # square each value
    print(np.sqrt(temps))     # square root
    print(np.round(temps))    # round to nearest integer
    print(temps > 24)         # boolean array: True/False
All these operations happen element by element — no loops needed.
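Element-by-element math also works between two arrays of the same length, not just between an array and a single number. In this sketch night_temps is a made-up second array, invented only to have something to subtract.

```python
import numpy as np

temps = np.array([23.4, 25.1, 19.8, 28.7, 22.0])
# Hypothetical second array of the same length, invented for this sketch.
night_temps = np.array([15.0, 17.5, 12.3, 19.9, 14.2])

# Arithmetic between two arrays pairs up elements by position.
day_night_gap = temps - night_temps
print(day_night_gap)

# A comparison gives a boolean array; True counts as 1 in a sum.
warm_days = temps > 24
print(warm_days)         # [False  True False  True False]
print(warm_days.sum())   # 2
```

Summing a boolean array is a quick way to answer "how many values meet this condition?" without writing a loop.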
Very Common Beginner Mistake #1 – Copying
    a = np.array([10, 20, 30, 40])
    b = a            # ← DANGER! This is NOT a real copy

    b[0] = 999
    print(a)         # [999  20  30  40]  ← a also changed!
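Correct ways to copy:
    c = a.copy()     # safest and clearest
    d = np.copy(a)   # same result, function form
Not a copy:
    e = a[:]         # a view: e shares a's data, so changing e also changes a
Slicing copies a Python list, but a NumPy slice is only a new window onto the same data.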
Rule to remember:
    b = a          → same array (just two names for the same data)
    b = a.copy()   → new independent array
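When you are unsure whether two names point at the same data, you can ask NumPy directly. This sketch uses np.shares_memory, a standard NumPy function, and compares an alias, a slice, and a real copy; the variable names are chosen for this illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40])

alias = a               # same object, second name
view = a[:]             # new array object, but it shares a's data
independent = a.copy()  # new array object with its own data

print(alias is a)                        # True
print(view is a)                         # False
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

view[0] = 999        # writes through to a
independent[1] = -1  # does not touch a
print(a)             # [999  20  30  40]

# Contrast: slicing a Python list does make a new list.
items = [10, 20, 30, 40]
items_copy = items[:]
items_copy[0] = 999
print(items)         # [10, 20, 30, 40]
```

The slice is the surprising case: it behaves like a copy for lists but like a shared window for arrays.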
Quick Summary – Your First NumPy Survival Kit
What you should be able to do after this introduction:
- Import NumPy
- Create 1D and 2D arrays
- Use np.zeros, np.ones, np.arange, np.linspace
- Create random numbers (rand, randn, randint)
- Check .shape, .ndim, .size, .dtype
- Do math on whole arrays (+, -, *, /, **, sqrt, round…)
- Understand why we avoid loops
- Know how to safely copy arrays
Where should we go next?
Pick one (or tell me what you feel you need most):
- More about creating arrays (many more realistic examples)
- Indexing & slicing in detail (very important)
- Broadcasting (the magic that makes shapes work together)
- Boolean masks and filtering data
- First statistics (mean, std, min, max, percentiles…)
- Reshaping arrays (very common in machine learning)
- Common beginner mistakes and how to avoid them
- Small realistic mini-project (e.g. grade calculator, simple data cleaning)
Just name a topic or describe what feels most useful right now, and we continue from exactly where you are.
You’re doing great — let’s keep going! 😊
