Chapter 10: Multinomial Distribution

1. What is the Multinomial distribution really?

The multinomial distribution is the generalization of the binomial distribution to more than two categories.

  • Binomial = 2 outcomes (success/failure, heads/tails, yes/no)
  • Multinomial = k outcomes (k ≥ 2 categories)

You have:

  • n independent trials / experiments / draws
  • k possible categories / outcomes / classes
  • p₁, p₂, …, pₖ probabilities for each category (they sum to 1)
  • Each trial must produce exactly one of the k categories

The multinomial tells you the probability of getting a particular combination of counts across all categories.

2. Classic real-world examples (these appear very often)

  • Rolling a 6-sided die 100 times: n = 100, k = 6 faces {1, 2, 3, 4, 5, 6} with pᵢ = 1/6 each; count how many times each face appeared
  • Customers choosing product categories: n = 5000 purchases, categories {electronics, clothing, books, …} with different shares; count purchases in each category
  • Words in a document: n = 800 words, k = vocabulary size (e.g. 5000) with word probabilities; word counts (bag-of-words)
  • Image classification predictions: n = 10000 images, classes {cat, dog, bird, car, …} with predicted probabilities; predicted class counts
  • A/B/C test results: n = 30000 visitors, variants {A, B, C} with conversion rates; number of conversions per variant
  • Election votes: n = millions of votes, parties {A, B, C, …} with vote shares; votes per party

3. The core idea with a small example

Imagine you have a biased 3-sided die with probabilities:

  • Face A: 0.5
  • Face B: 0.3
  • Face C: 0.2

You roll it n = 10 times.

Possible outcomes are all combinations where the counts add up to 10, e.g.:

  • (A=6, B=3, C=1)
  • (A=4, B=4, C=2)
  • (A=10, B=0, C=0)
  • etc.

The multinomial gives the probability of each possible count vector.
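Concretely, the probability of a count vector (c₁, …, cₖ) is n! / (c₁! × … × cₖ!) × p₁^c₁ × … × pₖ^cₖ. As a quick sketch (assuming SciPy is installed), this can be evaluated directly:

```python
from scipy.stats import multinomial

# The biased die above: n = 10 rolls, probabilities (0.5, 0.3, 0.2)
rv = multinomial(10, [0.5, 0.3, 0.2])

# Probability of the exact outcome (A=6, B=3, C=1):
# 10!/(6!·3!·1!) × 0.5^6 × 0.3^3 × 0.2^1 = 0.070875
print(rv.pmf([6, 3, 1]))
```

So even the single most likely-looking outcome carries only about 7% of the probability mass; it is spread over all count vectors that sum to 10.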

4. Generating multinomial data in NumPy

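A minimal sketch of a single multinomial draw with NumPy; the seed is arbitrary and the probabilities reuse the biased die from section 3:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

# One experiment: n = 10 rolls of the biased 3-sided die
p = [0.5, 0.3, 0.2]
counts = rng.multinomial(10, p)

print(counts)        # a count vector such as [5 3 2]
print(counts.sum())  # always 10: the counts must sum to n
```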

Multiple independent experiments at once (very common)

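A sketch of drawing many experiments in one call via the `size` argument (the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p = [0.5, 0.3, 0.2]

# 1000 independent experiments, each consisting of n = 10 trials
samples = rng.multinomial(10, p, size=1000)

print(samples.shape)         # (1000, 3): one row of counts per experiment
print(samples.sum(axis=1))   # every row sums to 10
print(samples.mean(axis=0))  # close to n * p = [5.0, 3.0, 2.0]
```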

5. Visualizing multinomial counts (very important)

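One way to visualize the counts is a histogram per category. The sketch below assumes matplotlib is available and writes the figure to a file; seed, n, and probabilities are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # file-only backend so the script runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, p = 100, [0.5, 0.3, 0.2]
samples = rng.multinomial(n, p, size=5000)  # shape (5000, 3)

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for i, (ax, p_i) in enumerate(zip(axes, p)):
    ax.hist(samples[:, i], bins=30)
    ax.axvline(n * p_i, color="red")      # expected count n * p_i
    ax.set_title(f"category {i}: p = {p_i}")
fig.savefig("multinomial_counts.png")
```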

What you should observe:

  • Higher pᵢ → that category's count distribution is centered farther to the right
  • Lower pᵢ → the distribution is squeezed near zero and more right-skewed
  • Each category's count is marginally Binomial(n, pᵢ), so its variance is n × pᵢ × (1 − pᵢ), which is largest when pᵢ = 0.5

6. Realistic code patterns you will actually use

Pattern 1 – Simulating A/B/C test results

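One possible simulation: visitors are split across variants with a multinomial draw, then conversions are drawn per variant with a binomial. The traffic split and conversion rates below are made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 30,000 visitors split evenly across three variants
n_visitors = 30_000
split = [1/3, 1/3, 1/3]           # traffic allocation
conv_rates = [0.10, 0.12, 0.11]   # assumed true conversion rates

visitors = rng.multinomial(n_visitors, split)      # visitors per variant
conversions = rng.binomial(visitors, conv_rates)   # conversions per variant

for name, v, c in zip("ABC", visitors, conversions):
    print(f"Variant {name}: {v} visitors, {c} conversions ({c / v:.3f})")
```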

Pattern 2 – Simulating bag-of-words / topic proportions

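A toy bag-of-words sketch; the five-word vocabulary and its probabilities are invented for illustration (a real vocabulary would have thousands of entries):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical tiny vocabulary with word probabilities (a unigram model)
vocab = ["data", "model", "learn", "python", "math"]
word_probs = [0.35, 0.25, 0.20, 0.12, 0.08]

# An 800-word document as a bag-of-words count vector
doc_counts = rng.multinomial(800, word_probs)
for word, count in zip(vocab, doc_counts):
    print(f"{word:>7}: {count}")
```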

Pattern 3 – Simulating class distribution in imbalanced classification

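A sketch of simulating the class counts of an imbalanced dataset; the 90/8/2 prior is an assumed example:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed imbalanced class prior: 90% negative, 8% positive, 2% rare
class_names = ["negative", "positive", "rare"]
priors = [0.90, 0.08, 0.02]

# Draw the class counts of a 10,000-example dataset
counts = rng.multinomial(10_000, priors)
for name, c, p in zip(class_names, counts, priors):
    print(f"{name:>9}: {c} examples (expected {10_000 * p:.0f})")
```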

Summary – Multinomial Distribution Quick Reference

  • Number of trials: n (fixed)
  • Number of categories: k (k ≥ 2)
  • Probabilities: p₁ + p₂ + … + pₖ = 1
  • Counts vector: (c₁, c₂, …, cₖ) with c₁ + … + cₖ = n
  • Expected count for category i: n × pᵢ
  • Variance for category i: n × pᵢ × (1 − pᵢ)
  • Covariance between categories i and j: −n × pᵢ × pⱼ (negative!)
  • NumPy function: np.random.multinomial(n, pvals, size=…)
  • When n is large and the pᵢ are not extreme: approximately multivariate normal

Final teacher messages

  1. Whenever you are counting “how many times each category appeared after n trials” → think multinomial.
  2. Multinomial is the multivariate version of binomial.
  3. When you only care about one category vs everything else → you can collapse to binomial.
  4. Multinomial + Dirichlet is the foundation of topic modeling (LDA).
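Point 3 above can be checked numerically: the marginal count of any single category in a multinomial is Binomial(n, pᵢ). A quick sketch (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)
p = [0.5, 0.3, 0.2]

# Category A vs. "everything else": compare the marginal multinomial
# count of A with a direct Binomial(10, 0.5) sample
samples = rng.multinomial(10, p, size=100_000)
a_counts = samples[:, 0]
binom = rng.binomial(10, p[0], size=100_000)

print(a_counts.mean(), binom.mean())  # both close to n * p = 5.0
print(a_counts.var(), binom.var())    # both close to n * p * (1 - p) = 2.5
```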

Would you like to continue with any of these next?

  • Multinomial vs multinomial logistic regression
  • Dirichlet-multinomial (topic modeling intuition)
  • How multinomial becomes multivariate normal (large n)
  • Realistic mini-project: simulate customer segments or A/B/C test
  • Comparing multinomial to binomial & Poisson

Just tell me what you want to explore next! 😊
