Chapter 5: NumPy Logs
NumPy Logs tutorial — written as if I’m sitting next to you, explaining slowly and patiently, showing many small realistic examples, comparing different log functions, warning about common traps, and giving you patterns you will actually use in real data analysis, machine learning, statistics, signal processing, and scientific computing.
Let’s pretend we’re looking at the same notebook.
```python
import numpy as np
import matplotlib.pyplot as plt
```
1. The four most important logarithm functions in NumPy
NumPy provides four different log functions — each one has its own purpose and safety characteristics.
| Function | Full name | Computes | Best used when | Domain / Notes |
|---|---|---|---|---|
| np.log | Natural logarithm | ln(x) = logₑ(x) | Most mathematical & statistical work | x > 0 |
| np.log1p | log(1 + x) | ln(1 + x) | Very small positive x | Accurate for x ≈ 0 |
| np.log10 | Base-10 logarithm | log₁₀(x) | When you need decadic (common) log | x > 0 |
| np.log2 | Base-2 logarithm | log₂(x) | Bits, information theory, computer science | x > 0 |
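For any other base, NumPy has no dedicated top-level function (np.emath.logn exists, but it is meant for complex-valued work); the usual sketch is the change-of-base identity log_b(x) = ln(x) / ln(b):

```python
import numpy as np

x = np.array([8.0, 81.0, 125.0])

# Change of base: log_b(x) = ln(x) / ln(b)
print((np.log(x) / np.log(3.0)).round(5))   # log base 3 → log_3(81) = 4
print((np.log(x) / np.log(5.0)).round(5))   # log base 5 → log_5(125) = 3
```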
2. Why do we have log1p? (very important to understand)
For very small x > 0, np.log(1 + x) suffers from catastrophic cancellation in floating-point arithmetic.
```python
x = 1e-16
print("np.log(1 + x) =", np.log(1 + x))    # → 0.0 (1 + x rounds to exactly 1.0 — precision lost!)
print("np.log1p(x)   =", np.log1p(x))      # → 1e-16, the correct small value
print("Exact value (math) ≈", x - x**2/2 + x**3/3)  # ≈ x when x is very small
```
Rule #1 (write this down):
Whenever you need log(1 + x) and x is small (especially < 1e-6), always use np.log1p(x) instead of np.log(1 + x)
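np.log1p has a partner, np.expm1, which computes exp(x) - 1 with the mirror-image accuracy fix; the two are exact inverses, so round trips through them are safe. A quick sketch:

```python
import numpy as np

x = 1e-17

# exp(x) - 1 has the mirror-image problem: exp(x) rounds to exactly 1.0
print(np.exp(x) - 1)            # → 0.0 — the small value is lost
print(np.expm1(x))              # → 1e-17, correct

# log1p and expm1 undo each other, so the round trip recovers x
print(np.expm1(np.log1p(x)))
```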
3. Basic usage examples – all four logs
```python
x_positive = np.array([0.001, 0.1, 1.0, 2.71828, 10, 100, 1000])
print("x =", x_positive)

print("\nNatural log (ln) =", np.log(x_positive).round(5))
print("log(1+x)         =", np.log1p(x_positive).round(5))
print("Base-10 log      =", np.log10(x_positive).round(5))
print("Base-2 log       =", np.log2(x_positive).round(5))
```
Output highlights:
```
Natural log (ln) = [-6.90776 -2.30259  0.       1.       2.30259  4.60517  6.90776]
log(1+x)         = [ 0.001    0.09531  0.69315  1.31326  2.3979   4.61512  6.90875]
Base-10 log      = [-3.      -1.       0.       0.43429  1.       2.       3.     ]
Base-2 log       = [-9.96578 -3.32193  0.       1.44269  3.32193  6.64386  9.96578]
```
4. What happens when x ≤ 0? (very common trap)
```python
bad_values = np.array([0.0, -0.1, -1.0, np.nan, np.inf])
print("np.log(bad_values)  =", np.log(bad_values))
print("np.log1p(bad_values)=", np.log1p(bad_values))
```
Output:
```
np.log(bad_values)  = [       -inf         nan         nan         nan         inf]
np.log1p(bad_values)= [ 0.         -0.10536052        -inf         nan         inf]
```
Important behavior:
- log(0) → -inf (negative infinity)
- log(negative) → nan (not a number)
- log(nan) → nan
- log(inf) → inf
- log1p is shifted: its domain is x > -1, so log1p(-0.1) = ln(0.9) ≈ -0.10536 is perfectly valid, log1p(-1) → -inf, and log1p(x < -1) → nan
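These out-of-domain inputs also emit RuntimeWarnings ("divide by zero" for 0, "invalid value" for negatives). When you have decided that -inf / nan are acceptable results, a sketch of silencing the warnings locally with np.errstate:

```python
import numpy as np

x = np.array([0.0, -1.0, 2.0])

# Suppress the divide/invalid warnings only inside this block —
# the -inf and nan results are still produced as usual
with np.errstate(divide='ignore', invalid='ignore'):
    result = np.log(x)

print(result)   # [-inf, nan, 0.69314718...]
```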
Safe pattern (you will use this often):
```python
x = np.random.uniform(-2, 10, 1000)

# Protect against invalid values
safe_log = np.log(np.maximum(x, 1e-10))
# or:
safe_log = np.where(x > 0, np.log(x), np.nan)
```
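One caveat with the np.where version: np.log(x) is still evaluated on every element first, so the invalid ones still trigger a RuntimeWarning. A warning-free sketch using np.log's ufunc `where=` and `out=` arguments, which skip the bad elements entirely:

```python
import numpy as np

x = np.array([-2.0, 0.0, 1.0, 10.0])

# Elements where the condition is False are never passed to log;
# their slots keep the fill value from `out`
out = np.full_like(x, np.nan)
np.log(x, out=out, where=x > 0)

print(out)   # [nan, nan, 0., 2.30258509...]
```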
5. Realistic patterns you will actually write many times
Pattern 1 – Log-transform skewed data (very common in ML & statistics)
```python
# Income / price / file size / view counts – typical right-skewed data
values = np.random.pareto(a=1.5, size=10000) + 1   # shifted Pareto

# Before log
plt.hist(values, bins=80, density=True, alpha=0.7, color='coral')
plt.title("Original – very skewed")
plt.show()

# After log
log_values = np.log1p(values)   # safe for small values too
plt.hist(log_values, bins=60, density=True, alpha=0.7, color='teal')
plt.title("After np.log1p – much more symmetric")
plt.show()
```
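If you'd rather check the effect numerically than eyeball histograms, one rough sketch (the exact numbers depend on the random draw): for right-skewed data the mean sits well above the median, and the log transform pulls the two back together.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.pareto(1.5, size=10000) + 1

# mean/median ratio as a crude skew indicator
before = np.mean(values) / np.median(values)
after = np.mean(np.log1p(values)) / np.median(np.log1p(values))

print("mean/median before:", round(before, 3))   # well above 1 → right-skewed
print("mean/median after: ", round(after, 3))    # much closer to 1
```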
Pattern 2 – Log-scale plotting (very common)
```python
x = np.logspace(-3, 4, 200)   # from 0.001 to 10000
y = 1 / (1 + np.exp(-x))      # sigmoid

plt.plot(x, y, lw=2.5)
plt.xscale('log')             # ← log scale on x-axis
plt.title("Sigmoid on log x-scale")
plt.xlabel("Input (log scale)")
plt.ylabel("Output")
plt.grid(True, which="both", ls="--", alpha=0.4)
plt.show()
```
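A closely related trick: on log-log axes a power law y = c·xᵏ becomes a straight line with slope k, because ln(y) = ln(c) + k·ln(x). A sketch that recovers the exponent with np.polyfit (no plotting needed):

```python
import numpy as np

x = np.logspace(0, 3, 100)    # 1 .. 1000
y = 5.0 * x**-2.0             # power law with exponent k = -2

# Fit a straight line in log-log space
k, ln_c = np.polyfit(np.log(x), np.log(y), 1)
print("exponent k ≈", round(k, 5))             # ≈ -2.0
print("prefactor c ≈", round(np.exp(ln_c), 5)) # ≈ 5.0
```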
Pattern 3 – Convert log-returns to prices
```python
log_returns = np.random.normal(0, 0.01, 252)   # daily log-returns
prices = 100 * np.exp(np.cumsum(log_returns))

plt.plot(prices)
plt.title("Simulated stock price path from log-returns")
plt.ylabel("Price")
plt.xlabel("Trading days")
plt.show()
```
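Going the other way is just as common: recover log-returns from an existing price series with np.diff of the log prices. A small sketch:

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.5, 102.0])

# r_t = ln(P_t / P_{t-1}) — i.e. the diff of the log prices
log_returns = np.diff(np.log(prices))
print(log_returns.round(5))

# Sanity check: cumulating the log-returns rebuilds the price path
rebuilt = prices[0] * np.exp(np.cumsum(log_returns))
print(rebuilt.round(5))   # matches prices[1:]
```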
Pattern 4 – Log-loss / cross-entropy (machine learning)
```python
y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0.1, 0.3, 0.7, 0.9, 0.2, 0.85])

# Binary cross-entropy (log-loss); the 1e-10 guards against log(0)
loss = -np.mean(y_true * np.log(y_pred + 1e-10)
                + (1 - y_true) * np.log(1 - y_pred + 1e-10))
print("Log-loss:", loss.round(4))
```
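The +1e-10 guard works; another common sketch uses np.clip to keep every prediction strictly inside (0, 1), which leaves well-behaved predictions untouched instead of shifting all of them:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0.0, 0.3, 0.7, 1.0, 0.2, 0.85])   # note the exact 0.0 and 1.0

# Clip predictions strictly inside (0, 1) so both log terms stay finite
eps = 1e-12
p = np.clip(y_pred, eps, 1 - eps)

loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print("Log-loss:", loss.round(4))   # finite even with the extreme predictions
```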
Summary – NumPy Log Functions Quick Reference
| Function | Computes | Best used when | Danger zone |
|---|---|---|---|
| np.log | ln(x) | General math / stats | x ≤ 0 → nan / -inf |
| np.log1p | ln(1 + x) | Very small x > 0 (most important!) | x < -1 → nan |
| np.log10 | log₁₀(x) | Decadic / engineering / pH / decibels | x ≤ 0 → nan / -inf |
| np.log2 | log₂(x) | Computer science / bits / information | x ≤ 0 → nan / -inf |
Final teacher advice (very important)
Golden rule #1 Whenever you write np.log(1 + x) or log(1 + small_value) — replace it with np.log1p(x). This is one of the most common numerical accuracy mistakes people make.
Golden rule #2 Protect against invalid input:
```python
safe_log = np.log(np.maximum(x, 1e-10))
# or:
safe_log = np.where(x > 0, np.log(x), np.nan)
```
Golden rule #3 Use log-scale plots (plt.xscale('log'), plt.yscale('log')) whenever you have heavy-tailed or exponential-looking data.
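For heavy-tailed data the histogram itself often needs logarithmic bins too; np.logspace gives bin edges with a constant ratio between neighbours instead of a constant step. A sketch:

```python
import numpy as np

# 20 edges from 1 to 10**4, equally spaced in log space
edges = np.logspace(0, 4, num=20)

# Consecutive edges differ by a constant *ratio*, not a constant step
ratios = edges[1:] / edges[:-1]
print(np.allclose(ratios, ratios[0]))   # True — multiplicative spacing

# Use them directly: counts, _ = np.histogram(data, bins=edges)
```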
Would you like to go deeper into any of these areas?
- How to safely handle log of zero / negative values in real datasets
- Log-transforms in machine learning (when & why)
- Logarithmic binning for histograms
- Realistic mini-project: clean & visualize skewed real-world data (prices, counts…)
- Difference between log1p vs expm1 (the pair)
Just tell me what you want to focus on next! 😊
