Chapter 14: NumPy Array Splitting
NumPy array splitting, explained as if I'm your patient teacher sitting next to you: examples line by line, little mental pictures, the logic behind each function, warnings about common traps, and realistic patterns you will actually use.
Let’s go slowly and thoroughly.
```python
import numpy as np
```
What does “splitting” an array mean?
Splitting = dividing one array into multiple smaller arrays.
It is the opposite of joining/concatenating.
NumPy offers several ways to split arrays, and each method has its own personality and best use-case.
Main splitting functions:
| Function | What it does | Most common use case |
|---|---|---|
| np.split() | General-purpose split: you give exact cut indices or a number of equal pieces | When you know exactly where to cut |
| np.array_split() | Like split(), but forgiving: allows pieces of unequal size | Most flexible; the one you'll reach for most |
| np.vsplit() | Vertical split (splits rows); shortcut for axis=0 | Splitting matrices by rows |
| np.hsplit() | Horizontal split (splits columns); shortcut for axis=1 (axis=0 for 1-D arrays) | Splitting matrices by columns |
| np.dsplit() | Depth split: always splits along axis=2 (needs 3 or more dimensions) | Splitting image channels, depth volumes |
| np.split(…, axis=…) | General version: choose any axis (np.array_split accepts axis too) | Most powerful |
1. The most important functions: split() vs array_split()
np.split() — strict & exact
You must tell it exactly where to cut — or exactly how many equal pieces you want.
```python
arr = np.arange(12)
print(arr)
# [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Split into 3 equal parts
parts = np.split(arr, 3)
print(parts)
# [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]
```
All three pieces have exactly 4 elements (12 / 3 = 4). That is the contract of np.split() with a number of sections: the length must divide evenly, otherwise it raises an error.
```python
np.split(arr, 5)
# ValueError: array split does not result in an equal division
```
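If you're not sure the length divides evenly, a modulo check tells you before np.split() complains. A minimal sketch; falling back to np.array_split() is just one reasonable choice, not something NumPy does for you:

```python
import numpy as np

arr = np.arange(12)
n = 5

# np.split(arr, n) only works when len(arr) is a multiple of n.
if len(arr) % n == 0:
    parts = np.split(arr, n)
else:
    # 12 % 5 == 2, so we end up here and accept uneven pieces instead.
    parts = np.array_split(arr, n)

print([len(p) for p in parts])  # [3, 3, 2, 2, 2]
```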
Using indices instead of number of sections (very useful)
```python
# Cut at positions 3 and 7
splits = np.split(arr, [3, 7])
print(splits)
# [array([0, 1, 2]), array([3, 4, 5, 6]), array([ 7,  8,  9, 10, 11])]
```
→ The numbers [3,7] mean split before index 3 and before index 7.
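A quick way to convince yourself: compare the pieces with ordinary slices. This sketch only checks that the three pieces match arr[:3], arr[3:7] and arr[7:]:

```python
import numpy as np

arr = np.arange(12)
a, b, c = np.split(arr, [3, 7])

# Each cut index is the end of one slice and the start of the next.
print(np.array_equal(a, arr[:3]))   # True
print(np.array_equal(b, arr[3:7]))  # True
print(np.array_equal(c, arr[7:]))   # True
```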
np.array_split() — more forgiving (you will use this most often)
It automatically handles cases where the array cannot be divided equally.
```python
arr = np.arange(11)  # 0 to 10 → 11 elements
parts = np.array_split(arr, 3)
print(parts)
# [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10])]
```
→ The first two parts get 4 elements and the last one gets 3: no error, just slightly uneven pieces. Very practical!
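The general rule, as documented for np.array_split(): an array of length l split into n sections yields l % n pieces of size l // n + 1 and the rest of size l // n. The sketch below checks that rule on a few lengths; expected_sizes is a hypothetical helper written only for this comparison:

```python
import numpy as np

def expected_sizes(length, n):
    """Hypothetical helper: piece sizes predicted by the l % n rule."""
    base, extra = divmod(length, n)
    return [base + 1] * extra + [base] * (n - extra)

for length, n in [(11, 3), (10, 4), (12, 5), (7, 7)]:
    actual = [len(p) for p in np.array_split(np.arange(length), n)]
    print(length, n, actual, actual == expected_sizes(length, n))
```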
2. 2D splitting – most common real use
```python
matrix = np.arange(24).reshape(6, 4)
print(matrix)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]
```
Vertical split (split by rows) → vsplit() or split(axis=0)
```python
# Split into 3 parts along rows
top, mid, bottom = np.vsplit(matrix, 3)
print(top)
# [[0 1 2 3]
#  [4 5 6 7]]
```
Or using split / array_split:
```python
parts = np.array_split(matrix, 3, axis=0)
print(len(parts))       # 3
print(parts[0].shape)   # (2, 4)
```
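A trap worth knowing at this point: the pieces are views into the original array, not copies, because NumPy builds them by slicing. This sketch shows a write through a piece changing matrix as well, and .copy() breaking the link:

```python
import numpy as np

matrix = np.arange(24).reshape(6, 4)
parts = np.array_split(matrix, 3, axis=0)

print(np.shares_memory(parts[0], matrix))  # True: parts[0] is a view

parts[0][0, 0] = 999      # write through the view...
print(matrix[0, 0])       # 999: ...and the original changed too

# Copy the pieces if you want to modify them independently.
safe = [p.copy() for p in np.array_split(matrix, 3, axis=0)]
safe[0][0, 0] = -1
print(matrix[0, 0])       # still 999: copies don't touch the original
```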
Horizontal split (split by columns) → hsplit() or axis=1
```python
left, right = np.hsplit(matrix, 2)
print(left)
# [[ 0  1]
#  [ 4  5]
#  [ 8  9]
#  [12 13]
#  [16 17]
#  [20 21]]
print(right)
# [[ 2  3]
#  [ 6  7]
#  [10 11]
#  [14 15]
#  [18 19]
#  [22 23]]
```
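hsplit() also accepts cut positions, which helps when column groups have different widths. A sketch that treats the first column as an ID, the middle two as features and the last as a label; those meanings are made up for the example:

```python
import numpy as np

matrix = np.arange(24).reshape(6, 4)

# Cut before column 1 and before column 3 -> widths 1, 2, 1.
ids, features, label = np.hsplit(matrix, [1, 3])

print(ids.shape, features.shape, label.shape)  # (6, 1) (6, 2) (6, 1)
```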
3. Splitting 3D (and 4D) arrays – dsplit() or an explicit axis
Very common when working with images (channels) or volumes.
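```python
# Imagine 2 images of 4×4 pixels with 3 color channels
images = np.arange(96).reshape(2, 4, 4, 3)  # (batch, height, width, channels)

# dsplit() always cuts axis=2, which is *width* here (size 4), so
# np.dsplit(images, 3) would raise ValueError. Cut the channel axis instead:
R, G, B = np.split(images, 3, axis=-1)
print(R.shape)          # (2, 4, 4, 1)
print(R[0, :, :, 0])    # first image – Red channel

# For a single (height, width, channels) image, axis=2 IS the channel axis:
r, g, b = np.dsplit(images[0], 3)
print(r.shape)          # (4, 4, 1)
```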
4. Realistic patterns you will actually use
Pattern 1: Train / validation / test split (very common)
```python
data = np.random.randn(10000, 30)  # 10,000 samples × 30 features
train, val, test = np.array_split(data, [8000, 9000], axis=0)
print(train.shape)  # (8000, 30)
print(val.shape)    # (1000, 30)
print(test.shape)   # (1000, 30)
```
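One caveat: array_split() cuts rows in their current order, so if the data is sorted (by date, by class, …) the three sets won't look alike. Shuffling first is the usual fix. A minimal sketch using NumPy's Generator API; the seed 42 is arbitrary, and rng.standard_normal stands in for np.random.randn only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)          # arbitrary seed for reproducibility
data = rng.standard_normal((10000, 30))  # 10,000 samples x 30 features

# Shuffle the row order first, then cut at the same positions as before.
shuffled = data[rng.permutation(len(data))]
train, val, test = np.split(shuffled, [8000, 9000], axis=0)

print(train.shape, val.shape, test.shape)  # (8000, 30) (1000, 30) (1000, 30)
```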
Pattern 2: Splitting time series into chunks
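```python
ts = np.arange(1200)              # 1200 time points
chunks = np.array_split(ts, 10)   # 10 chunks
print([len(c) for c in chunks])
# [120, 120, 120, 120, 120, 120, 120, 120, 120, 120]  (1200 divides evenly)

ts2 = np.arange(1205)             # not a multiple of 10
print([len(c) for c in np.array_split(ts2, 10)])
# [121, 121, 121, 121, 121, 120, 120, 120, 120, 120]
```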
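Sometimes you want chunks of a fixed size rather than a fixed number of chunks. Passing computed cut positions to np.split() handles that, and the last chunk simply comes out shorter. A sketch; chunk_size = 100 and the 1250-point series are illustrative values:

```python
import numpy as np

ts = np.arange(1250)  # 1250 time points (not a multiple of 100)
chunk_size = 100      # illustrative window length

# Cut positions 100, 200, ..., 1200
cuts = np.arange(chunk_size, len(ts), chunk_size)
chunks = np.split(ts, cuts)

print(len(chunks))      # 13
print(len(chunks[-1]))  # 50
```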
Pattern 3: Separating features / targets
```python
dataset = np.loadtxt('data.csv', delimiter=',', skiprows=1)
X = dataset[:, :-1]   # all columns except the last (features)
y = dataset[:, -1]    # only the last column (target)
```
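The same separation can be written with np.split() and a negative cut index. One difference to notice: the slice dataset[:, -1] gives a 1-D y, while the split keeps the target as a 2-D column. A sketch with a small made-up array standing in for data.csv:

```python
import numpy as np

# Small stand-in for the CSV contents: 3 samples, 3 features + 1 target
dataset = np.arange(12, dtype=float).reshape(3, 4)

X, y_col = np.split(dataset, [-1], axis=1)  # cut before the last column

print(X.shape, y_col.shape)   # (3, 3) (3, 1)
print(dataset[:, -1].shape)   # (3,)
y = y_col.ravel()             # flatten if a 1-D target is needed
```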
Pattern 4: Splitting image into patches / tiles
```python
img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
tiles = np.array_split(img, 4, axis=0)  # split into 4 horizontal strips
print([tile.shape for tile in tiles])
# [(128, 512, 3), (128, 512, 3), (128, 512, 3), (128, 512, 3)]
```
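For real tiles (a grid of patches) rather than strips, split twice: rows first, then each strip along the columns. A sketch assuming a 4 × 4 grid, which is an arbitrary choice:

```python
import numpy as np

img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
grid = 4  # 4 x 4 tiles, chosen for illustration

# Rows first (axis=0), then each horizontal strip by columns (axis=1).
tiles = [tile
         for strip in np.array_split(img, grid, axis=0)
         for tile in np.array_split(strip, grid, axis=1)]

print(len(tiles))      # 16
print(tiles[0].shape)  # (128, 128, 3)
```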
Summary – Quick Decision Table
| You want to… | Best function(s) |
|---|---|
| Split into exactly equal parts | np.split(…, sections) |
| Split into roughly equal parts (forgiving) | np.array_split(…) ← most common |
| Split rows / vertically | np.vsplit() or split(…, axis=0) |
| Split columns / horizontally | np.hsplit() or split(…, axis=1) |
| Split depth / channels (3D) | np.dsplit() or split(…, axis=2) |
| Split along any axis | np.split() / np.array_split(…, axis=…) |
| Split using exact cut positions | np.split(…, [idx1, idx2, …]) |
Common Mistakes to Avoid
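```python
import numpy as np

matrix = np.arange(24).reshape(6, 4)

# Mistake 1: Using split() when the size is not divisible
# np.split(np.arange(10), 3)      # ValueError (10 is not a multiple of 3)
np.array_split(np.arange(10), 3)  # fine: pieces of 4, 3, 3

# Mistake 2: Forgetting axis in 2D/3D
np.split(matrix, 2)               # splits along axis=0 (rows) by default
np.split(matrix, 2, axis=1)       # what you wanted if you meant columns

# Mistake 3: Expecting a single array back
parts = np.vsplit(matrix, 2)
print(type(parts))                # <class 'list'>  (a list of arrays, not one ndarray)
top, bottom = parts               # unpack the pieces
```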
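One surprise from the forgiving side: np.array_split() happily returns more pieces than there are elements, filling the rest with empty arrays instead of raising. A quick sketch:

```python
import numpy as np

parts = np.array_split(np.arange(3), 5)

print([len(p) for p in parts])  # [1, 1, 1, 0, 0]

# Filter out empty pieces if downstream code can't handle them.
non_empty = [p for p in parts if p.size > 0]
print(len(non_empty))           # 3
```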
