Chapter 6: Binomial Distribution
1. What is the Binomial distribution really?
The binomial distribution answers a very simple but extremely common question:
“If I repeat the same yes/no (success/failure) experiment n times independently, and each trial has the same probability p of success, what is the probability of getting exactly k successes?”
That’s it.
Key ingredients (you must remember these four):
- n = number of independent trials / attempts / repetitions
- p = probability of “success” on each trial (0 ≤ p ≤ 1)
- k = number of successes we are interested in (k = 0, 1, 2, …, n)
- Each trial must be independent and have exactly two possible outcomes (success / failure)
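The probability behind this question is the binomial PMF: P(exactly k successes) = C(n, k) × p^k × (1−p)^(n−k), where C(n, k) counts the number of ways to place the k successes among the n trials. A minimal sketch using only the standard library (the function name `binomial_pmf` is just for illustration):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success prob p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 10 heads in 20 fair coin flips
print(binomial_pmf(10, 20, 0.5))  # → ≈ 0.1762
```

Note that the probabilities over all possible k (0 through n) sum to 1, as they must for a valid distribution.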
2. Classic everyday examples
| Example | n (trials) | p (success probability) | What k means |
|---|---|---|---|
| Flipping a fair coin 20 times | 20 | 0.5 | Number of heads |
| 1,000 website visitors, each either clicking “buy now” or not | 1000 | 0.024 | Number of purchases |
| Testing 50 light bulbs | 50 | 0.03 | Number of defective bulbs |
| Sending 200 emails in a campaign | 200 | 0.18 | Number of people who open the email |
| Shooting 10 free throws | 10 | 0.72 | Number of successful shots |
3. Generating binomial data in NumPy
```python
import numpy as np

# 10,000 simulations of flipping a fair coin 20 times
# → how many heads each time?
heads = np.random.binomial(n=20, p=0.5, size=10000)

print("First 15 experiments:", heads[:15])
print("Average number of heads:", heads.mean().round(3))
print("Theoretical expected value:", 20 * 0.5)  # n × p
```
4. Visualizing binomial distributions (very important)
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 2, figsize=(14, 10), sharey=False)

# Different combinations of n and p
params = [
    (10, 0.5, "Coin flips – n=10, p=0.5", "skyblue"),
    (40, 0.5, "n=40, p=0.5", "teal"),
    (30, 0.1, "Rare events – n=30, p=0.1", "coral"),
    (30, 0.9, "Very likely events – n=30, p=0.9", "orchid"),
]

for ax, (n, p, title, color) in zip(axes.flat, params):
    data = np.random.binomial(n=n, p=p, size=40000)
    sns.histplot(data, bins=np.arange(-0.5, n + 1.5, 1), stat="probability",
                 discrete=True, color=color, alpha=0.8, ax=ax)
    ax.set_title(title, fontsize=13, pad=12)
    ax.set_xlabel("Number of successes (k)", fontsize=11)
    ax.set_ylabel("Probability", fontsize=11)
    ax.set_xticks(range(0, n + 1, max(1, n // 10)))

plt.tight_layout()
plt.show()
```
What you should notice:
- When p = 0.5 → symmetric bell shape
- When p is small (e.g. 0.1) → skewed right (most values near 0)
- When p is large (e.g. 0.9) → skewed left
- As n increases → shape becomes more symmetric and bell-like (→ approaches normal!)
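A good sanity check is to compare the simulated frequencies against the exact probabilities C(n, k) × p^k × (1−p)^(n−k). This sketch uses only NumPy and the standard library; with 100,000 simulations the two columns agree to about two decimal places:

```python
import numpy as np
from math import comb

n, p = 10, 0.5
sims = np.random.binomial(n, p, size=100_000)

for k in range(n + 1):
    simulated = np.mean(sims == k)               # fraction of runs with exactly k successes
    exact = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"k={k:2d}  simulated={simulated:.4f}  exact={exact:.4f}")
```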
5. Expected value & variance – very important formulas
Expected number of successes (mean): E[k] = n × p
Variance: Var(k) = n × p × (1-p)
Standard deviation: σ = √(n × p × (1-p))
```python
import numpy as np

n, p = 200, 0.04
print(f"Expected conversions: {n * p:.1f}")
print(f"Standard deviation: {np.sqrt(n * p * (1 - p)):.2f}")
```
→ Most of the time you will see roughly 8 ± 3 conversions (mean ± 1 sd)
6. Realistic examples you will actually use
Example 1 – A/B test simulation
```python
import numpy as np
import matplotlib.pyplot as plt

# Control group: 12,000 visitors, 3.2% conversion rate
control = np.random.binomial(n=12000, p=0.032, size=10000)

# Variant B: 12,000 visitors, suppose true rate is 3.8%
variant = np.random.binomial(n=12000, p=0.038, size=10000)

diff = variant - control

plt.hist(diff, bins=60, color="purple", alpha=0.7)
plt.axvline(0, color='red', linestyle='--', lw=2)
plt.title("Difference in conversions (Variant B – Control)\n10,000 simulated A/B tests")
plt.xlabel("Extra conversions from Variant B")
plt.show()
```
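One number you usually want from a simulation like this is how often Variant B actually comes out ahead. Rerunning the same (assumed) rates with a seeded generator for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the result is reproducible
control = rng.binomial(n=12000, p=0.032, size=10000)
variant = rng.binomial(n=12000, p=0.038, size=10000)

diff = variant - control
print("P(Variant B beats Control):", np.mean(diff > 0).round(3))
```

With a true lift from 3.2% to 3.8% over 12,000 visitors each, Variant B wins in almost every simulated test.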
Example 2 – Quality control
```python
import numpy as np

# Batch of 500 products, historical defect rate 1.8%
defects = np.random.binomial(500, 0.018, size=2000)
print("Probability of ≥ 15 defects:", np.mean(defects >= 15).round(4))
```
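The simulation above estimates the tail from only 2,000 draws, so the answer wobbles from run to run. For a fixed threshold you can also compute the exact tail probability from the PMF (standard library only): P(≥ 15) = 1 − P(≤ 14).

```python
from math import comb

n, p = 500, 0.018
p_tail = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(15))
print(f"Exact P(≥ 15 defects): {p_tail:.4f}")
```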
Example 3 – Email campaign planning
```python
import numpy as np

sent = 25000
open_rate = 0.22
opens = np.random.binomial(sent, open_rate, 5000)

print(f"95% of campaigns will get between "
      f"{np.percentile(opens, 2.5):.0f} and {np.percentile(opens, 97.5):.0f} opens")
```
Summary – Binomial Distribution Cheat Sheet
| Property | Value / Formula |
|---|---|
| Number of trials | n (fixed) |
| Success probability | p (same for every trial) |
| Possible outcomes | k = 0, 1, 2, …, n |
| Expected value (mean) | n × p |
| Variance | n × p × (1-p) |
| Standard deviation | √(n × p × (1-p)) |
| NumPy function | np.random.binomial(n, p, size=…) |
| Shape when p ≈ 0.5 | Symmetric (bell-like for large n) |
| Shape when p << 0.5 | Right-skewed |
| Shape when p >> 0.5 | Left-skewed |
| Approximation for large n | Normal distribution (Central Limit Theorem) |
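The last row of the cheat sheet is easy to verify: for large n, simulated binomial draws are nearly indistinguishable from a normal distribution with the same mean n×p and standard deviation √(n×p×(1−p)). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.5

binom_draws = rng.binomial(n, p, size=100_000)
normal_draws = rng.normal(loc=n * p, scale=np.sqrt(n * p * (1 - p)), size=100_000)

# Both should report mean ≈ 500 and sd ≈ 15.81
print("Binomial mean/sd:", binom_draws.mean().round(2), binom_draws.std().round(2))
print("Normal   mean/sd:", normal_draws.mean().round(2), normal_draws.std().round(2))
```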
Final teacher messages
- Whenever you have “number of successes in fixed number of yes/no trials” → think binomial.
- When n is large and p is not too close to 0 or 1 → binomial looks very much like normal → you can often use normal approximation.
- Binomial + Poisson connection — when n is very large and p is very small (n×p = λ fixed) → binomial ≈ Poisson.
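The Poisson connection in the last point is easy to check numerically. With n = 10,000 and p = 0.0003 (so λ = n×p = 3), the two PMFs agree to several decimal places:

```python
from math import comb, exp, factorial

n, p = 10_000, 0.0003
lam = n * p  # λ = 3

for k in range(6):
    binom_pmf = comb(n, k) * p**k * (1 - p) ** (n - k)
    poisson_pmf = exp(-lam) * lam**k / factorial(k)
    print(f"k={k}  binomial={binom_pmf:.5f}  poisson={poisson_pmf:.5f}")
```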
Would you like to continue with any of these next?
- How binomial becomes Poisson (rare events limit)
- How binomial becomes normal (large n)
- Binomial confidence intervals (real A/B testing)
- Comparing binomial simulations vs theoretical probabilities
- Realistic mini-project: simulate A/B test + power analysis
Just tell me what feels most interesting or useful right now! 😊
