Chapter 12: Chi Square Distribution

1. What is the Chi-Square distribution really?

The Chi-Square distribution (written χ²) is the distribution of the sum of squares of k independent standard normal random variables.

In simple words:

If you take k independent random numbers from a standard normal distribution (mean=0, sd=1), square each of them, and add them all together → the result follows a Chi-Square distribution with k degrees of freedom.

Mathematical definition:

Let Z₁, Z₂, …, Zₖ ~ N(0,1) and independent Then: Q = Z₁² + Z₂² + … + Zₖ² ~ χ²(k)

Key properties (write these down):

  • Only defined for x ≥ 0 (because squares are non-negative)
  • Degrees of freedom (df or k) is the only parameter
  • Mean = k
  • Variance = 2k
  • Shape: always right-skewed, but becomes more symmetric as k increases
  • When k ≥ 30 → looks quite similar to a normal distribution (Central Limit Theorem)

2. Visual intuition – how the shape changes with degrees of freedom

Python

What you should observe:

  • df = 1 → very strong right skew, peaks at 0
  • df = 2 → still skewed, but flatter
  • df = 5 → peak moves right, skew decreases
  • df = 10 → starting to look bell-like
  • df = 20+ → almost symmetric, looks similar to normal

3. Generating Chi-Square random numbers in NumPy / SciPy

Python

4. Where does Chi-Square appear in real life? (very important)

Most common situations you will actually meet

  1. Variance testing Sample variance of normal data → (n-1)S²/σ² ~ χ²(n-1)
  2. Goodness-of-fit test Comparing observed vs expected frequencies (classic χ² test)
  3. Independence test Contingency tables (χ² test of independence)
  4. Confidence interval for variance Used in quality control, process capability
  5. F-distribution (very important connection) F = (χ₁² / df₁) / (χ₂² / df₂) → used in ANOVA, regression
  6. Multiple linear regression Residual sum of squares / σ² ~ χ²(n-p)

5. Realistic examples & code you will actually write

Example 1 – Testing variance of measurements

Python

Example 2 – Chi-Square goodness-of-fit (classic dice test)

Python

6. Summary – Chi-Square Distribution Quick Reference

Property Value / Formula
Shape Right-skewed (skew decreases as df increases)
Defined by degrees of freedom k (or df)
Mean k
Variance 2k
Standard deviation √(2k)
Support x ≥ 0
PDF complicated (involves gamma function)
Most common use cases variance testing, goodness-of-fit, independence tests, F-distribution, regression diagnostics

Final teacher messages

  1. Whenever you see “sum of squares of normal variables” or “scaled variance” → think Chi-Square.
  2. Chi-Square is the building block for many other important distributions:
    • F-distribution
    • Chi-Square goodness-of-fit / independence tests
    • Confidence intervals for variance
  3. As df increases → Chi-Square becomes more symmetric → normal approximation becomes good (mean=k, variance=2k)

Would you like to go deeper into any of these next?

  • How to perform a real Chi-Square goodness-of-fit test step-by-step
  • Chi-Square vs F-distribution (very important connection)
  • Confidence interval for population variance using Chi-Square
  • Realistic mini-project: test whether dice rolls are fair
  • Difference between Chi-Square and non-central Chi-Square

Just tell me what feels most useful or interesting for you right now! 😊

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *