Chapter 6: NumPy Summations
1. Why do we need special summation functions in NumPy?
In plain Python we usually write:
```python
numbers = [1, 2, 3, 4, 5]
total = sum(numbers)  # 15
```
This works fine for small lists, but on NumPy arrays (especially large or multi-dimensional ones) the built-in sum() has several drawbacks:
- It is much slower than NumPy’s built-in methods, because it loops over the array one element at a time in Python
- It has no axis parameter: on a 2D array it simply adds the rows together, so you cannot choose between row, column, or layer sums
- It offers no control over the shape of the result (for example keepdims)
- It has no NaN-skipping counterpart such as np.nansum(), which scientific code often needs
NumPy gives you several very fast, axis-aware, flexible summation tools.
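A minimal sketch of the axis limitation listed above, using a small hypothetical array (the name `grid` is illustrative only): the built-in `sum()` iterates over the first axis, so on a 2D array it returns per-column sums rather than a grand total.

```python
import numpy as np

# Hypothetical 2x3 array, used only for illustration
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

builtin_result = sum(grid)   # adds the two rows together element-wise
numpy_total = np.sum(grid)   # sums every element of the array

print(builtin_result)        # [5 7 9]
print(numpy_total)           # 21
```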
2. The three most important summation functions
| Function | What it does | Most common use case | Returns |
|---|---|---|---|
| np.sum() | General-purpose sum, axis control | Almost everything | scalar or array |
| arr.sum() | Same as np.sum(arr), method version | Very common when you already have the array | scalar or array |
| np.nansum() | Sum, ignoring NaN values | Real-world data with missing values | scalar or array |
3. Basic usage – 1D arrays
```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print("np.sum(a) =", np.sum(a))      # 55
print("a.sum() =", a.sum())          # 55
print("sum(a) (Python)=", sum(a))    # 55, but slower because it loops element by element in Python
```
Quick performance comparison (you should try this yourself)
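```python
# %timeit is an IPython/Jupyter magic command, so run this in a notebook or IPython shell
large = np.arange(1_000_000)

%timeit np.sum(large)   # typically one to two orders of magnitude faster than sum()
%timeit large.sum()
%timeit sum(large)      # very slow for large arrays
```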
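Outside IPython or Jupyter, a similar comparison can be sketched with the standard-library `timeit` module. The array size and repeat count below are arbitrary assumptions chosen to keep the run short, and the measured ratio will vary by machine and NumPy version.

```python
import timeit
import numpy as np

# Smaller array than above so the slow built-in sum() finishes quickly
large = np.arange(100_000, dtype=np.int64)

# Time 20 calls of each approach
numpy_time = timeit.timeit(lambda: np.sum(large), number=20)
builtin_time = timeit.timeit(lambda: sum(large), number=20)

print(f"np.sum: {numpy_time:.4f} s for 20 runs")
print(f"sum():  {builtin_time:.4f} s for 20 runs")
print(f"ratio:  {builtin_time / numpy_time:.0f}x")
```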
4. The most important feature: axis parameter
This is where NumPy summation becomes really powerful.
```python
matrix = np.array([
    [ 1,  2,  3,  4],
    [ 5,  6,  7,  8],
    [ 9, 10, 11, 12],
    [13, 14, 15, 16]
])

print("matrix =\n", matrix)

print("\nSum everything:", matrix.sum())                  # 136

print("\nSum along axis=0 (columns):", matrix.sum(axis=0))
# [28 32 36 40]  ← sum of each column

print("\nSum along axis=1 (rows):", matrix.sum(axis=1))
# [10 26 42 58]  ← sum of each row

print("\nSum along axis=None (same as no axis):", matrix.sum(axis=None))  # 136
```
Visual memory aid:
```
axis=0 → vertical sums   (collapse rows, keep columns)
axis=1 → horizontal sums (collapse columns, keep rows)
```
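The same rule extends beyond two dimensions: the axis you name is the one that disappears from the shape. A small sketch, assuming a hypothetical 3D array `cube` of shape (2, 3, 4):

```python
import numpy as np

# Hypothetical 3D array: 2 layers, 3 rows, 4 columns
cube = np.arange(24).reshape(2, 3, 4)

print(cube.sum(axis=0).shape)        # (3, 4)  layers collapsed
print(cube.sum(axis=1).shape)        # (2, 4)  rows collapsed
print(cube.sum(axis=2).shape)        # (2, 3)  columns collapsed
print(cube.sum(axis=(0, 1)).shape)   # (4,)    a tuple collapses several axes at once
```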
5. Very common real patterns you will write again and again
Pattern 1 – Row-wise and column-wise sums
```python
sales = np.random.randint(100, 1000, size=(12, 7))  # 12 months × 7 stores

monthly_total = sales.sum(axis=1)  # total per month
store_total = sales.sum(axis=0)    # total per store

print("Monthly totals:", monthly_total)
print("Store totals:  ", store_total)
```
Pattern 2 – Mean, sum, and normalization along axis
```python
X = np.random.randn(10000, 30)  # 10k samples, 30 features

# Standardize each feature (very common in ML)
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Or just sum along features
row_sums = X.sum(axis=1)  # total per sample
```
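One way to check the standardization step is to confirm that every feature ends up with mean close to 0 and standard deviation close to 1. The sketch below assumes smaller, seeded data (the seed, shape, and distribution parameters are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))  # hypothetical data: 1000 samples, 3 features

# Same per-feature standardization as above
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0))  # each entry is close to 0
print(X_norm.std(axis=0))   # each entry is close to 1
```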
Pattern 3 – Handling missing values with nansum
```python
data = np.array([
    [1.2, np.nan, 3.4],
    [5.6, 7.8, np.nan],
    [9.1, 2.3, 4.5]
])

print("Normal sum (gives nan):")
print(data.sum(axis=0))

print("\nSafe sum (ignores nan):")
print(np.nansum(data, axis=0))  # [15.9 10.1  7.9]
```
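One caveat worth seeing in code: `np.nansum` treats NaN as zero, so a column made up entirely of NaN sums to 0 rather than NaN. A small sketch with hypothetical `readings` data, which also counts the valid entries per column so that an "empty" total can be told apart from a real zero:

```python
import numpy as np

# Hypothetical data: the second column has no valid values at all
readings = np.array([
    [1.0, np.nan],
    [2.0, np.nan],
])

col_sums = np.nansum(readings, axis=0)             # all-NaN column sums to 0.0
valid_counts = (~np.isnan(readings)).sum(axis=0)   # number of non-NaN values per column

print(col_sums)       # [3. 0.]
print(valid_counts)   # [2 0]
```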
Pattern 4 – Cumulative sums (very useful in time series)
```python
import matplotlib.pyplot as plt

daily_sales = np.random.randint(50, 500, 30)
cumulative = np.cumsum(daily_sales)  # running total, same length as daily_sales

plt.plot(cumulative, marker='o', ms=4, lw=1.5)
plt.title("Cumulative sales over 30 days")
plt.xlabel("Day")
plt.ylabel("Total sales so far")
plt.show()
```
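To make the running total concrete without a plot, here is a short sketch with fixed, made-up numbers. It also shows that `np.diff` with `prepend=0` undoes the cumulative sum, and that the last running value equals the plain total.

```python
import numpy as np

# Made-up daily values, fixed so the output is easy to verify by hand
daily = np.array([3, 1, 4, 1, 5])

running = np.cumsum(daily)                 # running total after each day
recovered = np.diff(running, prepend=0)    # differences give back the daily values

print(running)      # [ 3  4  8  9 14]
print(recovered)    # [3 1 4 1 5]
print(running[-1] == daily.sum())  # True
```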
Pattern 5 – Weighted sum (dot product style)
```python
weights = np.array([0.2, 0.3, 0.5])
values = np.random.rand(3)

weighted_avg = np.sum(weights * values)
# or np.dot(weights, values)

print("Weighted average:", weighted_avg.round(4))
```
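With fixed values instead of random ones, the equivalence is easy to check by hand. The sketch below compares three ways of computing the same number; `np.average` divides by the sum of the weights, which matches here only because the weights already sum to 1.

```python
import numpy as np

weights = np.array([0.2, 0.3, 0.5])
values = np.array([10.0, 20.0, 30.0])   # fixed illustrative values

via_sum = np.sum(weights * values)                  # element-wise product, then sum
via_dot = np.dot(weights, values)                   # dot product of the two vectors
via_average = np.average(values, weights=weights)   # divides by weights.sum(), which is 1 here

# 0.2*10 + 0.3*20 + 0.5*30 = 23
print(via_sum, via_dot, via_average)
```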
6. Summary – NumPy Summation Functions Quick Reference
| Function | Most common usage pattern | Returns |
|---|---|---|
| np.sum(arr) / arr.sum() | total of all elements | scalar |
| arr.sum(axis=0) | sum down columns | array with one fewer dimension |
| arr.sum(axis=1) | sum across columns (row totals) | array with one fewer dimension |
| np.nansum | sum while ignoring NaN | same as np.sum |
| np.cumsum | cumulative / running sum | same shape as input when axis is given, flattened 1D array otherwise |
| np.sum(…, keepdims=True) | keep reduced dimensions (useful for broadcasting later) | same number of dimensions, reduced axes have size 1 |
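The keepdims row is easiest to appreciate in a broadcasting example. A minimal sketch, assuming a hypothetical `counts` array whose rows should be converted to shares of their row total:

```python
import numpy as np

# Hypothetical counts: 3 rows, 2 categories
counts = np.array([[1, 3],
                   [2, 2],
                   [0, 5]])

row_totals = counts.sum(axis=1, keepdims=True)  # shape (3, 1) instead of (3,)
shares = counts / row_totals                    # broadcasts row-wise; each row sums to 1

print(row_totals.shape)  # (3, 1)
print(shares)
```

Without keepdims=True the totals would have shape (3,), which does not broadcast against a (3, 2) array, so the division would raise an error.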
Final teacher advice (very important)
Golden rule #1: Always prefer np.sum() or arr.sum() over Python’s sum() when working with NumPy arrays.
Golden rule #2: Think about axis= whenever you have 2D or higher arrays; most real problems want per-row or per-column totals rather than the grand total of everything.
Golden rule #3: Reach for np.nansum when your real data might contain missing values (NaN) and skipping them is what you actually want.
Golden rule #4: When you see code like this:
```python
total = 0
for row in matrix:
    total += sum(row)
```
→ rewrite it immediately as:
```python
total = matrix.sum()

# or, if per-column totals are what you actually need:
column_totals = matrix.sum(axis=0)
```
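As a quick sanity check, the loop and the vectorized call give the same total. The sketch below rebuilds the 4×4 matrix from section 4 with `np.arange` (a shortcut assumed here for brevity) and compares both results:

```python
import numpy as np

# Same values as the 4x4 matrix used earlier: 1..16 in row-major order
matrix = np.arange(1, 17).reshape(4, 4)

# Loop version: Python-level iteration over rows
loop_total = 0
for row in matrix:
    loop_total += sum(row)

# Vectorized version: one call
vector_total = matrix.sum()

print(loop_total, vector_total)  # 136 136
```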
