Chapter 8: NumPy Differences
1. What do we mean by “differences” in NumPy?
“Differences” means how values change from one element to the next.
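In mathematics, this is the first difference, the discrete analogue of a derivative:
Δx[i] = x[i] − x[i−1]
In array terms, np.diff returns out[i] = arr[i+1] - arr[i], so out[0] is the change from the first element to the second.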
NumPy gives you a very fast, vectorized way to compute these differences — no loops needed.
The main function is:
np.diff(arr, n=1, axis=-1)
- n = how many times to difference (1 = first difference, 2 = second difference…)
- axis = the axis along which differences are taken (default -1 = last axis)
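To make the definition concrete, here is a minimal sketch that checks np.diff against the equivalent slicing expression. The array x is a made-up example, not data from this chapter.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
x = np.array([3, 7, 12, 10, 15])

# np.diff subtracts each element from the one that follows it
d = np.diff(x)

# The same result written with explicit slicing
manual = x[1:] - x[:-1]

print(d)                          # [ 4  5 -2  5]
print(np.array_equal(d, manual))  # True
```

The slicing form is worth remembering because it shows exactly which pairs of elements are being subtracted.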
2. Basic usage – first differences (n=1)
```python
import numpy as np

# Simple time series: daily temperatures
temps = np.array([22.5, 23.1, 24.0, 23.8, 25.2, 26.1, 25.7, 24.9])
print("Original temperatures:", temps)

daily_change = np.diff(temps)
print("Daily changes        :", daily_change.round(2))
# [ 0.6  0.9 -0.2  1.4  0.9 -0.4 -0.8]

# Length is one shorter!
print("Original length:", len(temps))
print("Diff length    :", len(daily_change))
```
Key observation:
np.diff() returns an array one element shorter than the input because it needs a previous value to subtract.
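If you need the result to line up with the original array, one option is the prepend argument of np.diff (available in NumPy 1.16 and later), which pads the input before differencing. The sketch below uses the same temperatures. Treating the first day's change as 0 is an assumption made here for illustration, not the only sensible choice.

```python
import numpy as np

temps = np.array([22.5, 23.1, 24.0, 23.8, 25.2, 26.1, 25.7, 24.9])

# Prepend the first value so the first difference is temps[0] - temps[0] = 0
same_length = np.diff(temps, prepend=temps[0])

print(len(temps), len(same_length))  # 8 8
print(same_length.round(2))
```

Whether a leading 0 (or a NaN, or dropping the first row of your data instead) is the right choice depends on what you do with the result afterwards.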
3. Higher-order differences (n > 1)
```python
# Reuses temps and daily_change from the previous example
print("First differences :", daily_change.round(2))

print("Second differences:", np.diff(temps, n=2).round(2))
# [ 0.3 -1.1  1.6 -0.5 -1.3 -0.4]

print("Third differences :", np.diff(temps, n=3).round(2))
# [-1.4  2.7 -2.1 -0.8  0.9]
```
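Real meaning (assuming the samples are evenly spaced, one per day):
- First difference → daily change (like velocity)
- Second difference → change in the daily change (like acceleration)
- Third difference → change in the acceleration (like jerk)
Higher-order differences show up in physics (motion from position samples), finance (how fast returns are changing), and signal processing (edge and curvature detection).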
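A second difference is simply np.diff applied twice, which works out to x[i+2] - 2*x[i+1] + x[i]. The short check below confirms both identities on the temperature data. The names twice and stencil are introduced here only for the comparison.

```python
import numpy as np

temps = np.array([22.5, 23.1, 24.0, 23.8, 25.2, 26.1, 25.7, 24.9])

second = np.diff(temps, n=2)

# Applying the first difference twice gives the same result
twice = np.diff(np.diff(temps))

# Explicit three-point formula: x[i+2] - 2*x[i+1] + x[i]
stencil = temps[2:] - 2 * temps[1:-1] + temps[:-2]

print(np.allclose(second, twice))    # True
print(np.allclose(second, stencil))  # True
```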
4. Differences along different axes (2D and higher)
```python
# 2D array: sales over 4 weeks for 5 stores
sales = np.array([
    [120, 135, 140, 155],   # store 1
    [200, 190, 210, 220],   # store 2
    [80, 85, 90, 95],       # store 3
    [300, 320, 310, 340],   # store 4
    [150, 160, 155, 170]    # store 5
])

print("Sales data (stores × weeks):\n", sales)

# Week-to-week change for each store (default axis=-1 = along weeks)
weekly_change = np.diff(sales, axis=1)
print("\nWeekly change per store:\n", weekly_change)

# Store-to-store change in week 1 (same values as np.diff(sales, axis=0)[:, 0])
store_change_week1 = np.diff(sales[:, 0])
print("\nStore-to-store change in week 1:", store_change_week1)
```
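A quick way to keep the axes straight is to look at the output shapes: only the axis you difference along shrinks. The sketch below repeats the sales array so it runs on its own. The names by_week and by_store are introduced here.

```python
import numpy as np

sales = np.array([
    [120, 135, 140, 155],
    [200, 190, 210, 220],
    [ 80,  85,  90,  95],
    [300, 320, 310, 340],
    [150, 160, 155, 170],
])

by_week = np.diff(sales, axis=1)    # differences along each row
by_store = np.diff(sales, axis=0)   # differences down each column

print(sales.shape)     # (5, 4)
print(by_week.shape)   # (5, 3)  one fewer week
print(by_store.shape)  # (4, 4)  one fewer store
```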
5. Very common realistic patterns you will write
Pattern 1 – Daily returns in finance
```python
prices = np.array([100, 102, 101, 105, 103, 107, 110])

# Simple returns
simple_returns = np.diff(prices) / prices[:-1]

# Log returns (often preferred in finance because they add up over time)
log_returns = np.diff(np.log(prices))

print("Prices      :", prices)
print("Simple ret  :", simple_returns.round(4))
print("Log returns :", log_returns.round(4))
```
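One reason log returns are popular is that they add up over time, while simple returns have to be compounded by multiplication. The sketch below checks both facts on the same prices. The name total_log_return is introduced here.

```python
import numpy as np

prices = np.array([100, 102, 101, 105, 103, 107, 110])

log_returns = np.diff(np.log(prices))

# Summing log returns gives the log of the total growth factor
total_log_return = log_returns.sum()
print(np.isclose(total_log_return, np.log(prices[-1] / prices[0])))  # True

# Simple returns compound by multiplication instead
simple_returns = np.diff(prices) / prices[:-1]
print(np.isclose(np.prod(1 + simple_returns), prices[-1] / prices[0]))  # True
```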
Pattern 2 – Detecting changes / edges in signals
```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate a step signal with noise
signal = np.concatenate([np.ones(50)*5, np.ones(50)*10]) + np.random.normal(0, 0.5, 100)

change = np.diff(signal)

plt.plot(signal, label="Signal", lw=2)
# change[i] = signal[i+1] - signal[i], so plot it against indices 1..99
plt.plot(np.arange(1, len(signal)), change, label="First difference", lw=2, alpha=0.8)
plt.axvline(50, color='red', ls='--', label="True change point")
plt.legend()
plt.title("Using differences to detect change points")
plt.show()
```
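To turn the plot into a number, one simple option is to take the index of the largest absolute jump. This is a rough heuristic, not a general change-point method: it assumes a single step that is large compared with the noise. The fixed seed and the lower noise level (0.2) are choices made here so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Step from 5 to 10 at index 50, with mild noise (std 0.2 is an assumption)
signal = np.concatenate([np.full(50, 5.0), np.full(50, 10.0)])
signal = signal + rng.normal(0, 0.2, 100)

change = np.diff(signal)

# change[i] = signal[i+1] - signal[i], so add 1 to index into signal
estimated = int(np.argmax(np.abs(change))) + 1
print("Estimated change point:", estimated)
```

With noisier data, a single difference is often not enough, and smoothing the signal first is a common next step.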
Pattern 3 – Acceleration from position data
```python
# Position of an object over time (simulated)
time = np.linspace(0, 10, 100)
position = 0.5 * time**2 + 2 * time + 1   # quadratic motion

velocity = np.diff(position) / np.diff(time)
acceleration = np.diff(velocity) / np.diff(time[:-1])

plt.plot(time, position, label="Position", lw=2)
plt.plot(time[:-1], velocity, label="Velocity", lw=2)
plt.plot(time[:-2], acceleration, label="Acceleration", lw=2)
plt.legend()
plt.title("Position → Velocity → Acceleration using np.diff")
plt.show()
```
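A difference between two samples is best read as an estimate at the midpoint of the interval rather than at its left edge. The sketch below illustrates this with the same quadratic motion, whose exact velocity is t + 2 and exact acceleration is 1. The names t and t_mid are introduced here.

```python
import numpy as np

t = np.linspace(0, 10, 100)
position = 0.5 * t**2 + 2 * t + 1   # exact velocity is t + 2, acceleration is 1

# Midpoints of consecutive time samples
t_mid = (t[1:] + t[:-1]) / 2

velocity = np.diff(position) / np.diff(t)
acceleration = np.diff(velocity) / np.diff(t_mid)

print(np.allclose(velocity, t_mid + 2))   # True
print(np.allclose(acceleration, 1.0))     # True
```

For a quadratic the midpoint match is exact up to floating-point rounding. For other curves it is only an approximation, though usually a better one than pairing the difference with the left-hand time value.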
Pattern 4 – Percentage change
```python
revenue = np.array([1.2e6, 1.35e6, 1.28e6, 1.5e6, 1.7e6])

pct_change = np.diff(revenue) / revenue[:-1] * 100

print("Revenue  :", revenue)
print("% change :", pct_change.round(2))
```
6. Summary – NumPy Differences Quick Reference
| Function / Usage | What it computes | Length of output |
|---|---|---|
| np.diff(arr) | first differences along last axis | len(arr) − 1 |
| np.diff(arr, n=2) | second differences | len(arr) − 2 |
| np.diff(arr, axis=0) | differences down columns | shape with axis reduced by 1 |
| np.diff(arr, axis=1) | differences across rows | shape with axis reduced by 1 |
| np.diff(prices) / prices[:-1] | simple returns | len(prices) − 1 |
| np.diff(np.log(prices)) | log returns (often preferred in finance) | len(prices) − 1 |
Final teacher advice (very important)
Golden rule #1 Never write a loop to compute consecutive differences — use np.diff().
Golden rule #2 Remember: np.diff() makes the array shorter by n elements — be careful when aligning with original data.
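Golden rule #3 For financial returns, log returns are often the more convenient choice:
```python
log_returns = np.diff(np.log(prices))
```
They are additive over time (the log return over a whole period is the sum of the per-step log returns) and are usually easier to work with statistically.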
Golden rule #4 When you want percentage change, do:
```python
pct_change = np.diff(arr) / arr[:-1] * 100
```
