Chapter 14: Pareto Distribution
1. What is the Pareto distribution really? (honest intuition)
The Pareto distribution is a power-law distribution — it describes phenomena where:
- Most values are small / low / common
- But there is a long, heavy right tail of very large / extreme / rare values
In plain language:
A small number of items are responsible for most of the value / impact / size.
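This is the idea behind the famous 80/20 rule (Pareto principle). A Pareto distribution reproduces that split exactly for one particular tail heaviness (shape α ≈ 1.16, defined in the next section):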
- 20% of customers generate 80% of revenue
- 20% of bugs cause 80% of crashes
- 20% of websites get 80% of traffic
- etc.
Key intuition (say this sentence out loud):
Pareto = “a few things are extremely large / important, most things are small / unimportant, and the relationship follows a power law”.
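To make the 80/20 statement concrete: for a Type I Pareto distribution with shape α > 1, the largest fraction p of items holds a share p^(1 − 1/α) of the total. The sketch below uses that standard closed form to find which α gives an exact 80/20 split. The function names `top_share` and `alpha_for_split` are made up for this illustration.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the largest fraction p of items (requires alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the total is infinite for alpha <= 1, so shares are undefined")
    return p ** (1 - 1 / alpha)


def alpha_for_split(p, share):
    """Shape alpha for which the top fraction p holds exactly `share` of the total."""
    return 1 / (1 - math.log(share) / math.log(p))


alpha_80_20 = alpha_for_split(0.20, 0.80)
print(f"alpha for an exact 80/20 split: {alpha_80_20:.3f}")
for alpha in (1.16, 1.5, 2.0, 3.0):
    print(f"alpha = {alpha}: top 20% hold {top_share(0.20, alpha):.1%}")
```

Larger α means a lighter tail and a less lopsided split: for α = 2 the top 20% hold about 45% of the total.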
2. Two common parameterizations (you will see both)
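Type I Pareto (the classical form, implemented by scipy.stats.pareto)
The second form you will meet is Type II (Lomax): the same shape, shifted so that it starts at 0. numpy.random.pareto draws from that shifted form, so add 1 and multiply by xₘ to get Type I values. Everything below uses Type I, which is defined by: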
- xₘ (xm) = minimum possible value (scale / location parameter) → everything is ≥ xm
- α (alpha) = shape parameter (tail index)
PDF (probability density function):
f(x) = α × xₘ^α / x^(α+1) for x ≥ xₘ
f(x) = 0 otherwise
CDF (cumulative):
F(x) = 1 − (xₘ / x)^α for x ≥ xₘ
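Because the CDF has a closed form, it can be inverted by hand: setting u = F(x) and solving gives x = xₘ × (1 − u)^(−1/α). The sketch below uses that to draw Type I samples with only the standard library and compares the empirical CDF with the formula. The helper name `pareto_sample` is an assumption for this illustration, not a library function.

```python
import random


def pareto_sample(alpha, xm, rng):
    """Draw one Type I Pareto value by inverting F(x) = 1 - (xm / x)**alpha."""
    u = rng.random()                      # uniform on [0, 1)
    return xm * (1.0 - u) ** (-1.0 / alpha)


rng = random.Random(42)
alpha, xm = 3.0, 1.0
samples = [pareto_sample(alpha, xm, rng) for _ in range(100_000)]

# Compare the empirical CDF with the closed form at a few points
for x in (1.5, 2.0, 4.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    theoretical = 1 - (xm / x) ** alpha
    print(f"x = {x}: empirical F = {empirical:.4f}, formula F = {theoretical:.4f}")
```

Every sample is at least xₘ, because 1 − u lies in (0, 1] and the negative exponent maps it to a factor of 1 or more.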
Mean (only exists when α > 1):
E[X] = α × xₘ / (α − 1)
Variance (only exists when α > 2):
Var(X) = α × xₘ² / ((α − 1)² (α − 2))
Rule of thumb:
- α ≤ 1 → mean is infinite
- 1 < α ≤ 2 → mean finite, but variance infinite
- α > 2 → both mean and variance finite
Smaller α → heavier tail (more extreme values)
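The rule of thumb can be written as a small helper. This is a sketch that uses only the two formulas above; `pareto_mean_var` is a name chosen for this example, and it returns `math.inf` for a moment that does not exist.

```python
import math


def pareto_mean_var(alpha, xm):
    """Mean and variance of a Type I Pareto; math.inf where the moment diverges."""
    mean = alpha * xm / (alpha - 1) if alpha > 1 else math.inf
    var = alpha * xm**2 / ((alpha - 1) ** 2 * (alpha - 2)) if alpha > 2 else math.inf
    return mean, var


for alpha in (0.8, 1.5, 3.0):
    mean, var = pareto_mean_var(alpha, xm=1.0)
    print(f"alpha = {alpha}: mean = {mean}, variance = {var}")
```

The three values of α cover the three regimes: both moments infinite, finite mean with infinite variance, and both finite.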
3. Generating Pareto random numbers in NumPy / SciPy
    from scipy import stats

    # Pareto Type I with xm = 1 (minimum value), alpha = 3 (moderate tail)
    pareto_data = stats.pareto.rvs(b=3, scale=1, size=80000)

    print("First 10 values:", pareto_data[:10].round(2))
    print("Minimum value:", pareto_data.min().round(2))   # should be ≈ 1
    print("Sample mean:", pareto_data.mean().round(2))
    print("Theoretical mean:", 3 * 1 / (3 - 1))           # = 1.5
Different tail heaviness (very important to feel)
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy import stats

    # Heavy tail vs light tail
    heavy = stats.pareto.rvs(b=1.3, scale=1, size=50000)      # α=1.3 → very heavy
    moderate = stats.pareto.rvs(b=3.0, scale=1, size=50000)
    light = stats.pareto.rvs(b=5.0, scale=1, size=50000)

    fig, ax = plt.subplots(figsize=(11, 6))

    # log_scale=True estimates the density of log10(value) and sets a log x-axis;
    # a KDE on the raw values would be dominated by the few extreme observations
    sns.kdeplot(heavy, log_scale=True, ax=ax, label="α = 1.3 (very heavy tail)", linewidth=2.4, color="coral")
    sns.kdeplot(moderate, log_scale=True, ax=ax, label="α = 3.0", linewidth=2.4, color="teal")
    sns.kdeplot(light, log_scale=True, ax=ax, label="α = 5.0 (light tail)", linewidth=2.4, color="purple")

    plt.title("Pareto density – effect of shape parameter α", fontsize=14, pad=15)
    plt.xlabel("Value (log scale)", fontsize=12)
    plt.ylabel("Density", fontsize=12)
    plt.xlim(0.8, 1000)
    plt.legend(title="Shape parameter α", fontsize=11)
    plt.show()
Log-log plot (the signature view of power laws)
    import matplotlib.pyplot as plt
    import numpy as np

    # Log-log plot of the survival function (1 - CDF) with xm = 1: should be a straight line
    x = np.linspace(1, 1000, 1000)

    for alpha in [1.5, 2.5, 4.0]:
        survival = (1 / x) ** alpha          # P(X > x) = (xm / x)^α with xm = 1
        plt.loglog(x, survival, lw=2.5, label=f"α = {alpha}")

    plt.title("Log-log survival plot – straight line = power law", fontsize=14)
    plt.xlabel("Value x (log)", fontsize=12)
    plt.ylabel("P(X > x) (log)", fontsize=12)
    plt.legend(title="Shape parameter α")
    plt.grid(True, which="both", ls="--", alpha=0.4)
    plt.show()
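Why the line is straight: taking logarithms of the survival function turns the power law into a linear relation. A short derivation, using only the CDF from section 2:

```latex
\begin{aligned}
P(X > x) &= 1 - F(x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m \\
\log P(X > x) &= -\alpha \log x + \alpha \log x_m
\end{aligned}
```

On log-log axes the survival curve is therefore a line with slope −α and intercept α log xₘ, so a steeper line means a lighter tail.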
4. Real-world situations where Pareto appears naturally
| Domain / Phenomenon | Typical tail index α (approximate) | What is heavy-tailed |
|---|---|---|
| City population sizes | α ≈ 1.0–1.2 | Few megacities, many small towns |
| Company sizes / revenues | α ≈ 1.0–1.5 | Few giant corporations |
| Individual wealth / income | α ≈ 1.5–2.0 | Small number of billionaires |
| File sizes on internet servers | α ≈ 1.0–1.5 | Few very large files |
| Number of citations / popularity | α ≈ 2.0–3.0 | Few highly cited papers |
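| Earthquake energy release | Gutenberg-Richter b ≈ 1 on the magnitude scale (roughly α ≈ 2/3 for released energy, since magnitude is logarithmic) | Many small tremors, few great earthquakes |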
| Insurance claims / losses | α ≈ 1.0–1.5 | Many small claims, few catastrophic |
| Web page views / traffic | α ≈ 1.2–1.8 | Few extremely popular sites |
5. Realistic code patterns you will actually write
Pattern 1 – Simulating wealth / income distribution
    import numpy as np
    from scipy import stats

    # Realistic individual net worth (in thousands of dollars)
    # Minimum wealth = 10k, α ≈ 1.5 (heavy tail)
    wealth = stats.pareto.rvs(b=1.5, scale=10, size=100000)

    print("Median wealth:", np.median(wealth).round(1))   # ≈ 15.9 (theory: 10 × 2^(1/1.5))
    print("Mean wealth:", wealth.mean().round(1))          # theory: 30, pulled up by the tail

    # Share of total wealth held by the richest 1% of the sample
    top_1_percent = wealth[wealth >= np.percentile(wealth, 99)]
    print("Top 1% own what % of total wealth:",
          (top_1_percent.sum() / wealth.sum() * 100).round(1), "%")
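The simulated numbers can be sanity-checked against closed forms. Inverting the CDF gives the quantile function Q(p) = xₘ × (1 − p)^(−1/α), and the Gini coefficient of a Type I Pareto with α > 1 is 1 / (2α − 1). The sketch below evaluates both for the wealth parameters; the function names are chosen for this illustration.

```python
def pareto_quantile(p, alpha, xm):
    """Value below which a fraction p of the population falls."""
    return xm * (1 - p) ** (-1 / alpha)


def pareto_gini(alpha):
    """Gini coefficient of a Type I Pareto distribution (requires alpha > 1)."""
    if alpha <= 1:
        raise ValueError("Gini coefficient needs a finite mean, i.e. alpha > 1")
    return 1 / (2 * alpha - 1)


alpha, xm = 1.5, 10  # same parameters as the wealth simulation (thousands of dollars)
print(f"Median:          {pareto_quantile(0.50, alpha, xm):.1f}")
print(f"99th percentile: {pareto_quantile(0.99, alpha, xm):.1f}")
print(f"Gini:            {pareto_gini(alpha):.2f}")
```

The median (about 15.9) sits well below the theoretical mean of 30, which is the pull of the heavy tail.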
Pattern 2 – Simulating file sizes on a server
    import numpy as np
    from scipy import stats

    # File sizes in MB, minimum 0.1 MB, α ≈ 1.2
    file_sizes = stats.pareto.rvs(b=1.2, scale=0.1, size=50000)

    # Theoretical mean is 0.6 MB, but with α < 2 the sample mean varies a lot between runs
    print("Average file size:", file_sizes.mean().round(2), "MB")
    print("90th percentile:", np.percentile(file_sizes, 90).round(2), "MB")
    print("Max file size in sample:", file_sizes.max().round(2), "MB")
Pattern 3 – Checking if data follows power-law tail (rough visual check)
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import stats

    # Assume we have some data (here simulated)
    data = stats.pareto.rvs(b=1.7, scale=5, size=20000)

    # Rank-size plot on log-log scale: rank / n is the empirical survival
    # function, so a Pareto tail gives a straight line with slope ≈ −1/α
    sorted_data = np.sort(data)[::-1]
    ranks = np.arange(1, len(sorted_data) + 1)

    plt.loglog(ranks, sorted_data, '.', alpha=0.6, ms=3)
    plt.title("Log-log rank plot – straight line suggests Pareto tail")
    plt.xlabel("Rank (log)", fontsize=12)
    plt.ylabel("Value (log)", fontsize=12)
    plt.grid(True, which="both", ls="--", alpha=0.4)
    plt.show()
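A numerical complement to the visual check: if the minimum xₘ is known, the maximum-likelihood estimate of the tail index is α̂ = n / Σ ln(xᵢ / xₘ). The sketch below works under that assumption (real data usually needs a tail threshold chosen first). It uses the standard library's `random.paretovariate`, which samples Type I with xₘ = 1; the helper name `estimate_alpha` is made up for this example.

```python
import math
import random


def estimate_alpha(data, xm):
    """Maximum-likelihood estimate of the Pareto tail index for a known minimum xm."""
    log_ratios = [math.log(x / xm) for x in data]
    return len(data) / sum(log_ratios)


rng = random.Random(0)
true_alpha, xm = 1.7, 5.0
# paretovariate samples with minimum 1, so scale by xm to get minimum 5
data = [xm * rng.paretovariate(true_alpha) for _ in range(20_000)]

alpha_hat = estimate_alpha(data, xm)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.3f}")
```

With 20,000 points the standard error of this estimate is roughly α / √n ≈ 0.012, so the estimate should land close to 1.7.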
Summary – Pareto Distribution Quick Reference
| Property | Value / Formula |
|---|---|
| Shape | Heavy right tail (power-law decay) |
| Defined by | scale xm (minimum value), shape α (tail index) |
| Support | x ≥ xm |
| Mean (exists only if α > 1) | α × xm / (α − 1) |
| Variance (exists only if α > 2) | α × xm² / ((α−1)² (α−2)) |
| Mode | xm (most probable value = minimum) |
| SciPy | stats.pareto.rvs(b=α, scale=xm, size=…) |
| NumPy | (rng.pareto(α, size=…) + 1) × xm, because NumPy samples the shifted Type II (Lomax) form |
| Most common use cases | wealth, city sizes, file sizes, claim sizes, popularity, natural extremes |
Final teacher messages
- Whenever you see “a few very large values dominate everything” → think Pareto / power-law.
- Pareto tails decay polynomially, so they are much heavier than exponential tails: far out in the tail, extreme events are orders of magnitude more likely.
- α ≤ 1 → infinite mean — very important in finance / insurance / risk modeling.
- Log-log plot being straight is the fingerprint of a power-law tail.
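To put a number on "much heavier than exponential", the sketch below compares tail probabilities of a Pareto (α = 3, xₘ = 1) with an exponential distribution of the same mean, 1.5. The parameter choices are illustrative assumptions; only the two survival formulas matter.

```python
import math

alpha, xm = 3.0, 1.0
mean = alpha * xm / (alpha - 1)            # 1.5, shared by both distributions


def pareto_tail(x):
    """P(X > x) for a Type I Pareto."""
    return (xm / x) ** alpha if x >= xm else 1.0


def exponential_tail(x):
    """P(X > x) for an exponential distribution with the same mean."""
    return math.exp(-x / mean)


for x in (5, 10, 20, 50):
    ratio = pareto_tail(x) / exponential_tail(x)
    print(f"x = {x:>2}: Pareto {pareto_tail(x):.2e}, "
          f"exponential {exponential_tail(x):.2e}, ratio {ratio:.1e}")
```

At moderate values (x = 5 or 10 here) the exponential tail is actually the larger of the two; past the crossover the Pareto tail wins by a rapidly growing factor, which is what "heavy tail" means in practice.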
Would you like to continue with any of these next?
- How to estimate α from real data (Hill estimator, log-log regression)
- Pareto vs log-normal (two main heavy-tailed distributions)
- Realistic mini-project: simulate wealth distribution + calculate inequality
- Pareto in insurance / reinsurance (large claims modeling)
- Difference between Pareto Type I, II, IV
Just tell me what you want to explore next! 😊
