Chapter 9: NumPy Array Copy vs View
This chapter walks through copies and views in NumPy the way a patient teacher would: worked examples, why the distinction confuses almost everyone, what actually happens in memory, and how to avoid the most common painful bugs.
This topic is very important: misunderstanding copy vs view is one of the most common reasons beginners get surprising or wrong results in NumPy.
```python
import numpy as np
```
The Core Idea — One Sentence Version
When you create a new array from an existing one, NumPy tries very hard not to copy the data — because copying is slow and uses extra memory.
So most operations give you a view (a different way to look at the same data in memory) instead of a real copy.
View = same data, different window / name
Copy = completely new, independent data
If you change something through a view, the original array also changes — this is what surprises most people.
1. The Classic Trap — Everyone Falls Into This
```python
a = np.array([10, 20, 30, 40, 50, 60])
print("Original a:", a)

b = a[2:5]          # ← slicing → this is a VIEW
print("b (slice): ", b)

b[0] = 999
print("b after change:", b)
print("Original a after change:", a)   # ← surprise!
```
Output:
```
Original a: [10 20 30 40 50 60]
b (slice):  [30 40 50]
b after change: [999  40  50]
Original a after change: [ 10  20 999  40  50  60]
```
Why did a change? Because b is not a copy — it is just a different way of looking at the same memory block.
b starts at position 2 of a, takes 3 elements, but the data is not duplicated.
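You can ask NumPy directly whether `b` holds any data of its own. The sketch below reuses the names from the example above and relies on `np.shares_memory`, the `.base` attribute, and the buffer address exposed through `__array_interface__`; the helper variables (`shares`, `base_is_a`, `offset_items`) are only illustrative.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60])
b = a[2:5]                       # basic slice of a

# Do the two arrays overlap in memory?
shares = np.shares_memory(a, b)

# A view remembers the array that owns its data
base_is_a = b.base is a

# How far into a's buffer does b start? (measured in elements, not bytes)
addr_a = a.__array_interface__["data"][0]
addr_b = b.__array_interface__["data"][0]
offset_items = (addr_b - addr_a) // a.itemsize

print(shares, base_is_a, offset_items)   # True True 2
```

The offset of 2 elements is exactly the "starts at position 2" described above: `b` is just a start address and a length laid over `a`'s buffer.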
2. Operations That Usually Return a VIEW
These almost always give you a view (same data in memory):
| Operation | Example | Returns |
|---|---|---|
| Basic slicing | a[3:8], a[:10], a[::2] | View |
| Reverse slicing | a[::-1] | View |
| Reshape (when possible) | a.reshape(4, -1) | View (most cases) |
| Transpose | a.T, matrix.T | View |
| Swapaxes, rollaxis | np.swapaxes(a, 0, 1) | View |
| Simple indexing (single element) | a[5] | scalar (not array) |
Very important rule for beginners:
If you used only numbers and : in the indexing → almost always view
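The "when possible" next to reshape deserves one concrete case. In this small sketch (the array `m` and the names `r1`, `r2` are made up for illustration), reshaping a contiguous array reuses the buffer, while flattening a transposed array cannot, so NumPy quietly copies.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

r1 = m.reshape(3, 2)     # contiguous data: reshape can reuse the same buffer
r2 = m.T.reshape(-1)     # transposed data: flattening needs a new buffer

r1_is_view = np.shares_memory(m, r1)
r2_is_view = np.shares_memory(m, r2)

print(r1_is_view)   # True  (view)
print(r2_is_view)   # False (copy)
```

So reshape is a "usually view" operation, not an "always view" one; check with `np.shares_memory` when it matters.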
3. Operations That Usually Return a COPY
These usually create a new, independent array:
| Operation | Example | Returns |
|---|---|---|
| Boolean indexing | a[a > 0], a[mask] | Copy |
| Fancy indexing (list/array) | a[[1,4,7]], a[idx] | Copy |
| Advanced indexing (multiple lists) | a[[0,2], [1,3]] | Copy |
| .copy() method | a.copy() | Copy |
| np.copy() function | np.copy(a) | Copy |
| np.array(a) | np.array(a) | Copy |
| Most np. functions that create new data | np.concatenate, np.vstack, etc. | Copy |
Quick memory rule:
If you used lists, arrays, or boolean masks inside the brackets → usually copy
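One caveat to the memory rule: it describes what you get back when you read with a mask or index list. When the same expression sits on the left of an assignment, NumPy writes straight into the original array. A minimal sketch, with illustrative names:

```python
import numpy as np

a = np.array([-2, 5, -1, 8])

picked = a[a < 0]        # reading with a mask: picked is a copy
picked[:] = 0            # changes only the copy
a_after_copy_edit = a.copy()

a[a < 0] = 0             # assigning through a mask: writes into a itself

print(a_after_copy_edit)   # [-2  5 -1  8]
print(a)                   # [0 5 0 8]
```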
4. How to Force a Real Copy (Safest Ways)
```python
original = np.arange(12).reshape(3, 4)

# The three safest, clearest ways:
safe1 = original.copy()                    # most readable
safe2 = np.copy(original)
safe3 = original.astype(original.dtype)    # also creates a copy

# Slicing + copy (very common pattern)
# original has only 4 columns, so the column slice is 2:4
safe_slice = original[1:3, 2:4].copy()
```
Rule to live by:
Whenever you plan to modify the new array and you don’t want the original to change → always add .copy()
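If you want to convince yourself that these really are independent arrays, a quick check along the following lines works. It is only a sketch: the dictionary of copies and the `snapshot` array are illustrative names, not part of any NumPy API.

```python
import numpy as np

original = np.arange(12).reshape(3, 4)
snapshot = original.copy()           # reference to compare against later

copies = {
    "copy method": original.copy(),
    "np.copy": np.copy(original),
    "astype": original.astype(original.dtype),
    "slice + copy": original[1:3, 2:4].copy(),
}

shared = {}
for name, arr in copies.items():
    arr[...] = -1                    # overwrite every element of the copy
    shared[name] = np.shares_memory(original, arr)

print(shared)                                # every value is False
print(np.array_equal(original, snapshot))    # True: original untouched
```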
5. Realistic Examples — When This Bites People
Example 1 – Preprocessing subset of columns
```python
X = np.random.randn(1000, 20)

# Fancy indexing: the result is a COPY, so this is safe
important = X[:, [0, 3, 7, 12, 19]]
important -= important.mean(axis=0)    # OK, original X unchanged

# Wrong way (very common mistake): plain slicing gives a VIEW
subset = X[:, 0:5]
subset -= subset.mean(axis=0)          # ← modifies original X columns 0 to 4!
```
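The trouble in the slicing case comes from the in-place operator `-=`, not from the slice alone. The sketch below contrasts it with an ordinary subtraction, which builds a new array. The seeded generator and the names `centered` and `X_before` are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
X_before = X.copy()                        # kept only for comparison

subset = X[:, 0:5]                         # view into X

centered = subset - subset.mean(axis=0)    # new array, X untouched
unchanged = np.array_equal(X, X_before)

subset -= subset.mean(axis=0)              # in place, writes into X
changed = not np.array_equal(X, X_before)

print(unchanged, changed)   # True True
```

If you only need the centered values, the first form avoids the problem without an explicit `.copy()`.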
Example 2 – Image cropping (classic)
```python
img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)

crop = img[100:400, 150:450, :]    # ← VIEW
crop[:] = 0                        # ← you just painted part of original image black!
```
Correct:
```python
crop_safe = img[100:400, 150:450, :].copy()
crop_safe[:] = 0                   # original img unchanged
```
6. Quick Test You Can Use to Check
```python
def is_view(a, b):
    # True when a and b share memory, i.e. one is a view of the other
    # (comparing a.base with b.base fails here, because a owns its data
    # and a.base is None)
    return np.shares_memory(a, b)

a = np.arange(20)
b = a[5:15]
c = a[[5,6,7,8,9]]

print(is_view(a, b))   # True  ← view
print(is_view(a, c))   # False ← copy
```
But honestly, most people just remember:
If I didn’t write .copy(), and I used only :, assume it’s a view.
Summary Table – Copy vs View Cheat Sheet
| Situation / Operation | Usually Copy or View? |
|---|---|
| b = a | Neither: same object (just another name) |
| b = a[3:10] | View |
| b = a[::2] | View |
| b = a.T | View |
| b = a.reshape(…) | View (most cases) |
| b = a[a > 0] | Copy |
| b = a[[1,4,7]] | Copy |
| b = a[np.ix_([1,3],[2,5])] | Copy |
| b = a.copy() | Copy |
| You want to modify b without changing a | Always use .copy() |
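The cheat sheet can be reproduced on your own machine rather than memorized. The following sketch runs several of the operations from the table on a small array and labels each result with `np.shares_memory`; the `results` and `kinds` dictionaries are just illustrative scaffolding.

```python
import numpy as np

a = np.arange(20)

results = {
    "a[3:10]": a[3:10],
    "a[::2]": a[::2],
    "a.T": a.T,
    "a.reshape(4, 5)": a.reshape(4, 5),
    "a[a > 0]": a[a > 0],
    "a[[1, 4, 7]]": a[[1, 4, 7]],
    "a.copy()": a.copy(),
}

# Label each result by whether it overlaps a's memory
kinds = {
    label: "view" if np.shares_memory(a, result) else "copy"
    for label, result in results.items()
}

for label, kind in kinds.items():
    print(f"{label:<18} {kind}")
```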
Final Advice from Teacher to Student
Golden rules to live by:
- If you are going to modify the new array → always write .copy()
- If you are only reading or doing calculations that create new arrays → view is fine (and faster)
- When in doubt → .copy()
- Boolean indexing and fancy indexing are usually safe (they copy), slicing is usually dangerous (it views)
