Chapter 14: Pareto Distribution

1. What is the Pareto distribution really? (honest intuition)

The Pareto distribution is a power-law distribution — it describes phenomena where:

  • Most values are small / low / common
  • But there is a long, heavy right tail of very large / extreme / rare values

In plain language:

A small number of items are responsible for most of the value / impact / size.

This is the famous 80/20 rule (Pareto principle) in its mathematical form:

  • 20% of customers generate 80% of revenue
  • 20% of bugs cause 80% of crashes
  • 20% of websites get 80% of traffic
  • etc.
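
To put a number on the 80/20 claim: for a Pareto distribution with shape parameter α > 1 (α is defined in section 2), the share of the total held by the top fraction p of items is p^(1 − 1/α). The sketch below uses only the standard library. The function name top_share is made up for this illustration.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p (needs alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the total is infinite when alpha <= 1")
    return p ** (1.0 - 1.0 / alpha)

# Shape parameter that gives an exact 80/20 split:
# solve 0.2 ** (1 - 1/alpha) = 0.8  ->  alpha = log(5) / log(4)
alpha_8020 = math.log(5) / math.log(4)

print("alpha for an exact 80/20 split:", round(alpha_8020, 3))
print("share held by the top 20%:", round(top_share(0.20, alpha_8020), 3))
print("share held by the top 1%: ", round(top_share(0.01, alpha_8020), 3))
```

Try a larger α (say 3.0) and watch the top 20% share shrink: the tail gets lighter and the "few items dominate" effect weakens.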

Key intuition (say this sentence out loud):

Pareto = “a few things are extremely large / important, most things are small / unimportant, and the relationship follows a power law”.

2. Two common parameterizations (you will see both)

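Type I Pareto (the classical form, most common in statistics; this is what scipy.stats.pareto implements)

  • xₘ (xm) = minimum possible value (scale parameter) → everything is ≥ xm
  • α (alpha) = shape parameter (tail index)

Type II Pareto (Lomax; this is what NumPy's pareto sampler draws)

  • Same α, but shifted so the values start at 0: if Y is a unit-scale Type II draw, then xₘ × (1 + Y) is Type I with minimum xₘ

All formulas below are for Type I.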

PDF (probability density function):

f(x) = α × xₘ^α / x^(α+1)   for x ≥ xₘ
f(x) = 0                     otherwise

CDF (cumulative):

F(x) = 1 − (xₘ / x)^α for x ≥ xₘ
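
Because the CDF has a closed form, it can be inverted by hand: setting u = F(x) and solving gives x = xₘ / (1 − u)^(1/α). The sketch below uses that to draw samples from uniform random numbers and then compares the empirical CDF with the formula. The helper names sample_pareto and pareto_cdf and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def sample_pareto(alpha, xm, size, rng):
    """Draw Type I Pareto samples by inverting F(x) = 1 - (xm / x)**alpha."""
    u = rng.random(size)                      # uniform on [0, 1)
    return xm / (1.0 - u) ** (1.0 / alpha)    # 1 - u is in (0, 1], so no division by zero

def pareto_cdf(x, alpha, xm):
    """Type I CDF: 0 below xm, 1 - (xm / x)**alpha at or above it."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= xm, 1.0 - (xm / np.maximum(x, xm)) ** alpha, 0.0)

rng = np.random.default_rng(0)
alpha, xm = 2.5, 10.0
samples = sample_pareto(alpha, xm, 200_000, rng)

x0 = 25.0
empirical = np.mean(samples <= x0)            # fraction of samples at or below x0
theoretical = float(pareto_cdf(x0, alpha, xm))
print(f"P(X <= {x0}): empirical={empirical:.4f}  formula={theoretical:.4f}")
```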

Mean (only exists when α > 1):

E[X] = α × xₘ / (α − 1)

Variance (only exists when α > 2):

Var(X) = α × xₘ² / ((α − 1)² (α − 2))

Rule of thumb:

  • α ≤ 1 → mean is infinite
  • 1 < α ≤ 2 → mean finite, but variance infinite
  • α > 2 → both mean and variance finite
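
The mean and variance formulas together with the rule of thumb fit into two small functions. This is a minimal sketch; the names pareto_mean and pareto_variance are made up here, and returning infinity for the "does not exist" cases is a convention chosen for this example.

```python
import math

def pareto_mean(alpha, xm):
    """E[X] = alpha * xm / (alpha - 1); infinite when alpha <= 1."""
    return alpha * xm / (alpha - 1) if alpha > 1 else math.inf

def pareto_variance(alpha, xm):
    """Var(X) = alpha * xm^2 / ((alpha - 1)^2 (alpha - 2)); not finite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return alpha * xm ** 2 / ((alpha - 1) ** 2 * (alpha - 2))

# One alpha from each of the three regimes, with xm = 2
for alpha in (0.8, 1.5, 3.0):
    print(f"alpha={alpha}: mean={pareto_mean(alpha, 2.0)}  "
          f"variance={pareto_variance(alpha, 2.0)}")
```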

Smaller α → heavier tail (more extreme values)

3. Generating Pareto random numbers in NumPy / SciPy

Python
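
A minimal sketch of drawing Type I Pareto samples with both libraries. One thing to watch: NumPy's pareto sampler returns Lomax (Type II) values starting at 0, so you add 1 and multiply by xm yourself, while scipy.stats.pareto takes the shape as b and the minimum as scale. The parameter values (alpha = 3, xm = 2) and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

alpha, xm = 3.0, 2.0
rng = np.random.default_rng(42)

# NumPy: Generator.pareto draws Lomax (Pareto II) samples that start at 0,
# so shift by 1 and scale by xm to get the classical Type I distribution.
numpy_samples = (rng.pareto(alpha, size=100_000) + 1.0) * xm

# SciPy: stats.pareto is already Type I; b is the shape, scale is xm.
scipy_samples = stats.pareto.rvs(b=alpha, scale=xm, size=100_000, random_state=rng)

theory_mean = alpha * xm / (alpha - 1)
print(f"min:  numpy={numpy_samples.min():.3f}  scipy={scipy_samples.min():.3f}  (xm={xm})")
print(f"mean: numpy={numpy_samples.mean():.3f}  scipy={scipy_samples.mean():.3f}  "
      f"(theory={theory_mean:.3f})")
```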

Different tail heaviness (very important to feel)

Python
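
To feel how α controls the tail, the sketch below draws the same number of samples for several α values and prints the median, the 99th percentile, and the largest value. The α values, xm = 1, and the sample size are assumptions picked for illustration; the theoretical 99th percentile comes from solving F(x) = 0.99.

```python
import numpy as np

rng = np.random.default_rng(1)
xm, n = 1.0, 100_000

p99_by_alpha = {}
for alpha in (1.2, 2.0, 3.0, 5.0):
    x = (rng.pareto(alpha, size=n) + 1.0) * xm        # Type I samples
    p99_by_alpha[alpha] = np.percentile(x, 99)
    theory_p99 = xm * 100.0 ** (1.0 / alpha)          # solves 1 - (xm/x)**alpha = 0.99
    print(f"alpha={alpha}: median={np.median(x):.2f}  "
          f"p99={p99_by_alpha[alpha]:.2f} (theory {theory_p99:.2f})  "
          f"max={x.max():.1f}")
```

The medians barely move, but the 99th percentile and the maximum grow quickly as α gets smaller. That gap between "typical" and "extreme" is what a heavy tail means in practice.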

Log-log plot (the signature view of power laws)

Python
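
For a Pareto distribution, P(X ≥ x) = (xm / x)^α, so log P(X ≥ x) is a straight line in log x with slope −α. The sketch below builds the empirical version of that curve and fits its slope. The helper names empirical_ccdf and plot_loglog_ccdf are made up for this example; the plotting helper assumes matplotlib is installed and is defined but not called, so the numeric part runs without it.

```python
import numpy as np

def empirical_ccdf(samples):
    """Return the sorted values and the empirical P(X >= x) at each of them."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    ccdf = (n - np.arange(n)) / n        # n/n, (n-1)/n, ..., 1/n (never 0, safe for logs)
    return x, ccdf

def plot_loglog_ccdf(x, ccdf, alpha, xm):
    """Draw the empirical CCDF and the theoretical Pareto line on log-log axes."""
    import matplotlib.pyplot as plt      # imported here so the rest runs without it
    plt.loglog(x, ccdf, ".", markersize=2, label="empirical")
    plt.loglog(x, (xm / x) ** alpha, "-", label="(xm / x)^alpha")
    plt.xlabel("x")
    plt.ylabel("P(X >= x)")
    plt.legend()
    plt.show()

rng = np.random.default_rng(7)
alpha, xm = 2.0, 1.0
samples = (rng.pareto(alpha, size=50_000) + 1.0) * xm
x, ccdf = empirical_ccdf(samples)

# Least-squares slope of log(ccdf) against log(x); should be close to -alpha
slope, intercept = np.polyfit(np.log(x), np.log(ccdf), 1)
print(f"fitted slope = {slope:.3f}  (expected about {-alpha})")
```

Call plot_loglog_ccdf(x, ccdf, alpha, xm) to see the picture itself.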

4. Real-world situations where Pareto appears naturally

Domain / Phenomenon               | Typical Pareto parameters     | What is heavy-tailed
----------------------------------|-------------------------------|------------------------------------
City population sizes             | α ≈ 1.0–1.2                   | Few megacities, many small towns
Company sizes / revenues          | α ≈ 1.0–1.5                   | Few giant corporations
Individual wealth / income        | α ≈ 1.5–2.0                   | Small number of billionaires
File sizes on internet servers    | α ≈ 1.0–1.5                   | Few very large files
Number of citations / popularity  | α ≈ 2.0–3.0                   | Few highly cited papers
Earthquake magnitudes             | α ≈ 1.0 (Gutenberg-Richter)   | Few very large earthquakes
Insurance claims / losses         | α ≈ 1.0–1.5                   | Many small claims, few catastrophic
Web page views / traffic          | α ≈ 1.2–1.8                   | Few extremely popular sites

5. Realistic code patterns you will actually write

Pattern 1 – Simulating wealth / income distribution

Python
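
A minimal sketch of this pattern, assuming α = 1.5 (the low end of the wealth range in section 4) and a hypothetical minimum wealth of 10,000 in arbitrary currency units. It reports how much of the total the richest groups hold and a Gini coefficient computed from the sorted sample. The helper top_share is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(2024)
alpha = 1.5              # assumed tail index
min_wealth = 10_000.0    # hypothetical xm, arbitrary currency units
n_people = 100_000

wealth = (rng.pareto(alpha, size=n_people) + 1.0) * min_wealth
wealth_sorted = np.sort(wealth)
total = wealth_sorted.sum()

def top_share(sorted_values, fraction):
    """Share of the total held by the richest `fraction` of the population."""
    k = int(round(fraction * sorted_values.size))
    return sorted_values[-k:].sum() / sorted_values.sum()

# Gini coefficient from the sorted sample (0 = everyone equal, near 1 = one person owns all)
ranks = np.arange(1, n_people + 1)
gini = 2.0 * np.sum(ranks * wealth_sorted) / (n_people * total) - (n_people + 1) / n_people

share_top20 = top_share(wealth_sorted, 0.20)
share_top1 = top_share(wealth_sorted, 0.01)
print(f"median wealth: {np.median(wealth):,.0f}   mean wealth: {wealth.mean():,.0f}")
print(f"top 20% hold {share_top20:.1%}, top 1% hold {share_top1:.1%}")
print(f"Gini coefficient: {gini:.3f}")
```

With α this small the variance is infinite, so the shares and the Gini coefficient shift noticeably from one seed to the next. That instability is a property of the model, not a bug in the code.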

Pattern 2 – Simulating file sizes on a server

Python
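
A minimal sketch of this pattern, assuming α = 1.2 (inside the file-size range in section 4) and a hypothetical smallest file of 1 KiB. It shows the typical gap between median and mean file size, and how much storage the largest 1% of files take up.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 1.2               # assumed tail index
min_size_bytes = 1024.0   # hypothetical xm: the smallest file is 1 KiB
n_files = 200_000

sizes = (rng.pareto(alpha, size=n_files) + 1.0) * min_size_bytes

median_size = np.median(sizes)
mean_size = sizes.mean()
p99_size = np.percentile(sizes, 99)

# Share of total storage occupied by the largest 1% of files
k = n_files // 100
storage_share_top1 = np.sort(sizes)[-k:].sum() / sizes.sum()

print(f"median: {median_size / 1024:.2f} KiB   mean: {mean_size / 1024:.2f} KiB")
print(f"99th percentile: {p99_size / 1024:.1f} KiB   largest: {sizes.max() / 1024 ** 2:.1f} MiB")
print(f"largest 1% of files use {storage_share_top1:.1%} of the storage")
```

The mean sits far above the median, which is why capacity planning based on "average file size" goes wrong for heavy-tailed data.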

Pattern 3 – Checking if data follows power-law tail (rough visual check)

Python
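
A rough numeric companion to the visual check: fit a straight line to log P(X ≥ x) against log x over the largest values only, and compare heavy-tailed data with a light-tailed sample. This is a sketch, not a formal test. The function name tail_slope, the 10% tail fraction, and the sample parameters are all assumptions for illustration. Other heavy-tailed distributions such as the log-normal can also look fairly straight over a limited range, so treat the result as a hint.

```python
import numpy as np

def tail_slope(data, tail_fraction=0.10):
    """Least-squares slope of log P(X >= x) vs log x over the largest values."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    ccdf = (n - np.arange(n)) / n             # empirical P(X >= x), never 0
    k = max(int(tail_fraction * n), 2)        # number of tail points to fit
    slope, _ = np.polyfit(np.log(x[-k:]), np.log(ccdf[-k:]), 1)
    return slope

rng = np.random.default_rng(11)
pareto_data = (rng.pareto(1.5, size=50_000) + 1.0) * 3.0   # alpha = 1.5, xm = 3
expon_data = rng.exponential(scale=3.0, size=50_000)        # light-tailed comparison

pareto_slope = tail_slope(pareto_data)
expon_slope = tail_slope(expon_data)
print(f"Pareto sample:      tail slope = {pareto_slope:.2f} (about -alpha expected)")
print(f"exponential sample: tail slope = {expon_slope:.2f} (steeper, and it keeps bending down)")
```

For real data, try several tail fractions. A power-law tail gives roughly the same slope each time, while a light tail gives a slope that keeps getting steeper as you move further out.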

Summary – Pareto Distribution Quick Reference

Property                         | Value / Formula
---------------------------------|-----------------------------------------------------------
Shape                            | Heavy right tail (power-law decay)
Defined by                       | scale xm (minimum value), shape α (tail index)
Support                          | x ≥ xm
Mean (exists only if α > 1)      | α × xm / (α − 1)
Variance (exists only if α > 2)  | α × xm² / ((α−1)² (α−2))
Mode                             | xm (most probable value = minimum)
SciPy                            | stats.pareto.rvs(b=α, scale=xm, size=…)
NumPy                            | (rng.pareto(α, size=…) + 1) × xm, because NumPy draws the Lomax form
Most common use cases            | wealth, city sizes, file sizes, claim sizes, popularity, natural extremes

Final teacher messages

  1. Whenever you see “a few very large values dominate everything” → think Pareto / power-law.
  2. Pareto tails are much heavier than exponential — extreme events are far more common.
  3. α ≤ 1 → infinite mean — very important in finance / insurance / risk modeling.
  4. On log-log axes, P(X > x) for a Pareto is a straight line with slope −α. That straight line is the fingerprint of a power-law tail.
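
To make message 2 concrete, the sketch below compares tail probabilities for a Pareto and an exponential distribution that share the same mean. It uses only the closed-form survival functions; the names pareto_sf and expon_sf and the choice α = 2.5, xm = 1 are assumptions for illustration.

```python
import math

def pareto_sf(x, alpha, xm):
    """P(X > x) for a Type I Pareto."""
    return 1.0 if x < xm else (xm / x) ** alpha

def expon_sf(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean) if x > 0 else 1.0

alpha, xm = 2.5, 1.0
mean = alpha * xm / (alpha - 1)      # both distributions are given this mean

for multiple in (5, 20, 100):
    x = multiple * mean
    print(f"P(X > {multiple} x mean): Pareto={pareto_sf(x, alpha, xm):.3e}  "
          f"exponential={expon_sf(x, mean):.3e}")
```

The exponential probability collapses toward zero almost immediately, while the Pareto probability only shrinks by a fixed factor each time x is multiplied by a constant.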

Would you like to continue with any of these next?

  • How to estimate α from real data (Hill estimator, log-log regression)
  • Pareto vs log-normal (two main heavy-tailed distributions)
  • Realistic mini-project: simulate wealth distribution + calculate inequality
  • Pareto in insurance / reinsurance (large claims modeling)
  • Difference between Pareto Type I, II, IV

Just tell me what you want to explore next! 😊
