Chapter 13: NumPy Joining Arrays
This chapter covers joining NumPy arrays. I’ll write it as if I’m your patient teacher sitting next to you: showing examples on the screen, explaining exactly what’s happening and when to use each method, pointing out common beginner mistakes, and walking through realistic patterns you will actually use.
Let’s go step by step.
```python
import numpy as np
```
What does “joining arrays” really mean?
Joining = combining multiple arrays into a single larger array.
NumPy provides several functions to do this, and each one has a slightly different purpose:
| Function | What it does | Axis it joins along |
|---|---|---|
| np.concatenate() | General-purpose joining along any existing axis | axis=0 by default; any axis via axis= |
| np.vstack() | Vertical stack (like stacking rows) | axis=0 (1D inputs are first turned into rows) |
| np.hstack() | Horizontal stack (like putting columns side by side) | axis=1 (axis=0 for 1D inputs) |
| np.dstack() | Depth stack (1D and 2D inputs gain a 3rd dimension) | axis=2 |
| np.column_stack() | Special case: turns 1D arrays into columns | axis=1 (1D inputs are first turned into columns) |
| np.row_stack() | Alias for vstack (deprecated in recent NumPy releases; prefer vstack) | axis=0 |
| np.append() | Convenient but slow when called repeatedly | flattens both inputs unless you pass axis= |
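One entry in the table is worth checking by hand: what `np.append()` does when you leave out `axis`. Here is a small sketch (the names `A`, `B`, `flat`, and `rows` are just illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

flat = np.append(A, B)            # no axis given: both inputs are flattened first
rows = np.append(A, B, axis=0)    # with axis=0 it behaves like concatenate

print(flat.shape)   # (8,)
print(rows.shape)   # (4, 2)
```

So if you reach for `np.append()` on 2D data, pass `axis` explicitly, or the 2D structure is silently lost.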
1. The most important function: np.concatenate()
This is the general-purpose and most flexible method.
```python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Join along axis=0 (the default); for 1D arrays this places them end to end
c = np.concatenate([a, b])
print(c)    # [1 2 3 4 5 6]

# Same as:
c = np.concatenate((a, b), axis=0)
```
2D example – most common real use
```python
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Stack vertically (add rows) → axis=0
vert = np.concatenate([A, B], axis=0)
print(vert)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Stack horizontally (add columns) → axis=1
horiz = np.concatenate([A, B], axis=1)
print(horiz)
# [[1 2 5 6]
#  [3 4 7 8]]
```
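The 2D example joins two arrays of identical shape, but neither of those is required. As a quick sketch (the arrays here are made up for illustration), you can pass any number of arrays, and they may differ in size along the axis you are joining:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])               # shape (2, 2)
B = np.array([[5, 6]])                       # shape (1, 2)
C = np.array([[7, 8], [9, 10], [11, 12]])    # shape (3, 2)

# Row counts differ (2, 1, 3), but every array has 2 columns, so axis=0 works
stacked = np.concatenate([A, B, C], axis=0)
print(stacked.shape)    # (6, 2)
```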
Very important rule:
All arrays must have the same shape in all dimensions except the one you are joining along.
Wrong example:
```python
x = np.array([[1, 2, 3]])
y = np.array([[4, 5]])    # ← different number of columns

np.concatenate([x, y], axis=0)
# ValueError: all the input array dimensions except for the concatenation axis must match exactly
```
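If you want to check the rule before joining, for example to give a friendlier error message, you can compare the shapes yourself. This is only a sketch: `can_concatenate` is a hypothetical helper I'm writing here, not a NumPy function, and it assumes you pass at least one array.

```python
import numpy as np

def can_concatenate(arrays, axis=0):
    """Hypothetical helper: True if np.concatenate(arrays, axis) would accept the shapes."""
    shapes = [np.shape(a) for a in arrays]
    ndim = len(shapes[0])
    # All inputs need the same number of dimensions (at least 1), and axis must exist
    if ndim == 0 or any(len(s) != ndim for s in shapes):
        return False
    if not -ndim <= axis < ndim:
        return False
    axis %= ndim  # normalise negative axes
    # Compare every dimension except the join axis
    rest = [s[:axis] + s[axis + 1:] for s in shapes]
    return all(r == rest[0] for r in rest)

x = np.array([[1, 2, 3]])   # shape (1, 3)
y = np.array([[4, 5]])      # shape (1, 2)

print(can_concatenate([x, y], axis=0))   # False: column counts differ
print(can_concatenate([x, y], axis=1))   # True: both have exactly 1 row
print(np.concatenate([x, y], axis=1))    # [[1 2 3 4 5]]
```

The same two arrays that fail along axis=0 join fine along axis=1, because only the dimensions you are not joining along have to match.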
2. Convenience functions: vstack, hstack, dstack
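These are thin wrappers around concatenate with easier-to-remember names. The only extra step is that they first raise low-dimensional inputs to the dimension they need: vstack treats 1D arrays as rows, and dstack turns 1D and 2D arrays into 3D.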
```python
# vstack = vertical stack = concatenate along axis=0
print(np.vstack([A, B]))    # same as concatenate(..., axis=0)

# hstack = horizontal stack = concatenate along axis=1 (for 2D)
print(np.hstack([A, B]))    # same as concatenate(..., axis=1)

# For 1D arrays → hstack behaves differently
p = np.array([10, 20, 30])
q = np.array([40, 50, 60])

print(np.hstack([p, q]))    # [10 20 30 40 50 60]   ← just like concatenate
print(np.vstack([p, q]))
# [[10 20 30]
#  [40 50 60]]              ← turns 1D into rows
```
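To see what "turns 1D into rows" means in terms of concatenate, here is a small sketch that reproduces both results by hand. I'm using `np.atleast_2d` to do the 1D-to-row conversion; `manual` is just an illustrative name.

```python
import numpy as np

p = np.array([10, 20, 30])
q = np.array([40, 50, 60])

# vstack on 1D inputs: make each one a (1, 3) row, then join along axis=0
manual = np.concatenate([np.atleast_2d(p), np.atleast_2d(q)], axis=0)
print(np.array_equal(manual, np.vstack([p, q])))    # True

# hstack on 1D inputs: there is only one axis, so it joins along axis=0
print(np.array_equal(np.hstack([p, q]), np.concatenate([p, q], axis=0)))    # True
```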
dstack — depth / third dimension (less common but useful)
```python
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

z = np.dstack([x, y])
print(z.shape)    # (2, 2, 2)

print(z)
# [[[1 5]
#   [2 6]]
#
#  [[3 7]
#   [4 8]]]
```
Think of it as: adding a new color channel or adding a new feature layer.
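If the printed output is hard to read, it may help to see that for 2D inputs dstack gives the same result as `np.stack` with `axis=2`, or as concatenate after giving each array a trailing axis of length 1. A minimal sketch (the names `via_stack` and `via_concat` are mine):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

z = np.dstack([x, y])

# np.stack creates the new axis for you
via_stack = np.stack([x, y], axis=2)

# concatenate needs the axis to exist already, so add it by hand first
via_concat = np.concatenate([x[:, :, np.newaxis], y[:, :, np.newaxis]], axis=2)

print(np.array_equal(z, via_stack))     # True
print(np.array_equal(z, via_concat))    # True
print(np.array_equal(z[:, :, 0], x))    # True: layer 0 is the original x
```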
3. Very useful special case: np.column_stack()
This is extremely common when you have several 1D arrays and want to make them columns.
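```python
time = np.array([0, 1, 2, 3])
temp = np.array([22.5, 23.1, 24.0, 23.8])
pressure = np.array([1013, 1012, 1010, 1009])

data = np.column_stack((time, temp, pressure))
print(data.shape)    # (4, 3)
print(data)
# One row per time step, one column per input array.
# The integer inputs are converted to floats, and NumPy may
# print a value range this wide in scientific notation:
#   time   temp   pressure
#   0      22.5   1013
#   1      23.1   1012
#   2      24.0   1010
#   3      23.8   1009
```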
→ This is very similar to np.c_[…] (another short syntax)
```python
data2 = np.c_[time, temp, pressure]    # exactly the same result
```
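For 1D inputs there are a couple of other spellings that give the same array, which is useful to know when you read other people's code. A short sketch with two of the columns from the example:

```python
import numpy as np

time = np.array([0, 1, 2, 3])
temp = np.array([22.5, 23.1, 24.0, 23.8])

a = np.column_stack((time, temp))
b = np.vstack((time, temp)).T          # stack as rows, then transpose
c = np.stack((time, temp), axis=1)     # create a new axis 1 directly

print(a.shape)                                          # (4, 2)
print(np.array_equal(a, b) and np.array_equal(a, c))    # True
```

I would still reach for column_stack in this situation, since the name says what you mean.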
4. Realistic patterns you will use very often
Pattern 1: Building a dataset row by row (simulation / logging)
```python
# Start empty
results = np.empty((0, 4))    # 0 rows, 4 columns

for i in range(100):
    new_row = np.array([i, i**2, np.sin(i), np.random.randn()])
    results = np.vstack([results, new_row])    # ← common but slow!
```
Better & faster way (recommended):
```python
# Collect in list first (much faster)
rows = []

for i in range(100):
    rows.append([i, i**2, np.sin(i), np.random.randn()])

results = np.array(rows)    # convert once at the end
# or
results = np.vstack(rows)
```
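If you already know how many rows you will end up with, another option is to allocate the full array once and fill it in place, so no joining happens at all. A minimal sketch, assuming the row count is known up front (`n_rows` and `rng` are names I'm introducing here):

```python
import numpy as np

rng = np.random.default_rng(0)    # seeded generator, only so the run is repeatable
n_rows = 100

results = np.empty((n_rows, 4))   # allocate once; contents are uninitialised
for i in range(n_rows):
    results[i] = [i, i**2, np.sin(i), rng.standard_normal()]    # fill row i in place

print(results.shape)    # (100, 4)
```

The slow version copies the whole array on every iteration; here each iteration only writes one row.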
Pattern 2: Combining features / channels
```python
red = np.random.randint(0, 256, (100, 100))
green = np.random.randint(0, 256, (100, 100))
blue = np.random.randint(0, 256, (100, 100))

rgb = np.dstack([red, green, blue])
print(rgb.shape)    # (100, 100, 3) ← perfect image shape
```
Pattern 3: Merging train/test or old/new data
```python
X_old = np.random.randn(800, 20)
X_new = np.random.randn(200, 20)

X_all = np.concatenate([X_old, X_new], axis=0)
print(X_all.shape)    # (1000, 20)
```
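One thing to watch when merging data from different sources is the dtype. If the inputs do not share a dtype, the joined array gets a common type that can hold all of them. A small sketch with made-up data (the arrays are much smaller than the ones above, just to keep it readable):

```python
import numpy as np

X_old = np.arange(6, dtype=np.int32).reshape(3, 2)    # integer data
X_new = np.array([[0.5, 1.5]], dtype=np.float64)      # float data

X_all = np.concatenate([X_old, X_new], axis=0)

print(X_all.shape)                     # (4, 2)
print(X_all.dtype)                     # float64
print(np.result_type(X_old, X_new))    # float64: predicts the dtype before joining
```

If you need a specific dtype afterwards, converting explicitly with `X_all.astype(...)` makes that choice visible in the code.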
Summary – Quick Decision Table
| You want to… | Best function(s) |
|---|---|
| Add rows / stack vertically | vstack() or concatenate(…, axis=0) |
| Add columns / stack horizontally | hstack() or concatenate(…, axis=1) |
| Turn 1D arrays into columns | column_stack() or np.c_[…] |
| Add a new depth / channel dimension | dstack() or np.stack(…, axis=2) |
| General case / joining along any existing axis | np.concatenate() |
| You are building an array incrementally | collect in list → np.array(list) or vstack at end |
Common Mistakes to Avoid
```python
# Mistake 1: different shapes in non-join dimension
a = np.ones((3, 4))
b = np.ones((3, 5))
np.vstack([a, b])       # ValueError: 4 columns vs 5 (np.hstack([a, b]) would work)

# Mistake 2: forgetting list/tuple
np.concatenate(a, b)    # TypeError – must be sequence of arrays

# Mistake 3: using append in a loop (very slow)
arr = np.array([])
for i in range(10000):
    arr = np.append(arr, i)    # ← extremely inefficient – avoid!
```
Would you like to go deeper into any of these topics next?
- Performance comparison: concatenate vs vstack vs list collecting
- Joining arrays with different dtypes (what happens?)
- Joining >2 arrays or joining along axis=2, axis=3…
- Real mini-project: combining measurement files / images / time series
- Difference between concatenate and append in detail
Just tell me what you want to focus on now! 😊
