
Common Distributions - Main Theorem

The Central Limit Theorem is perhaps the most important result in probability theory, explaining why the normal distribution appears ubiquitously in nature and justifying many statistical procedures.

Central Limit Theorem (CLT)

Theorem

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$. Define:

$$S_n = \sum_{i=1}^n X_i, \quad \bar{X}_n = \frac{S_n}{n}$$

Then as $n \to \infty$, the standardized sum converges in distribution to the standard normal:

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)$$

Equivalently, for any $x$:

$$P(Z_n \leq x) \to \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}\, dt$$

This remarkable result states that sums (or averages) of IID random variables become approximately normal, regardless of the original distribution!
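A quick simulation makes the convergence visible. The sketch below (using NumPy; the choice of a fair six-sided die and 50,000 trials is illustrative) standardizes sums of $n = 100$ die rolls and checks that they have roughly mean 0 and variance 1, as the CLT predicts:

```python
import numpy as np

rng = np.random.default_rng(0)

n, trials = 100, 50_000
mu, sigma2 = 3.5, 35 / 12          # mean and variance of one fair die roll

# Each row is one experiment of n rolls; standardize the row sums.
rolls = rng.integers(1, 7, size=(trials, n))
z = (rolls.sum(axis=1) - n * mu) / np.sqrt(n * sigma2)

# If the CLT approximation is good, z behaves like N(0, 1).
print(round(z.mean(), 3), round(z.var(), 3))
```

A histogram of `z` would trace out the standard normal bell curve, even though a single die roll is uniform, not normal.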

Example

Roll a fair die $n = 100$ times. Each roll has mean $\mu = 3.5$ and variance $\sigma^2 = 35/12 \approx 2.917$.

The total sum $S_{100}$ has:

$$E[S_{100}] = 100(3.5) = 350, \quad \text{Var}(S_{100}) = 100(35/12) \approx 291.7$$

By the CLT, $S_{100}$ is approximately $\mathcal{N}(350, 291.7)$.

The probability that the total is between 330 and 370:

$$P(330 < S_{100} < 370) \approx \Phi\left(\frac{370-350}{\sqrt{291.7}}\right) - \Phi\left(\frac{330-350}{\sqrt{291.7}}\right) \approx \Phi(1.17) - \Phi(-1.17) = 2\Phi(1.17) - 1 \approx 0.758$$
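The arithmetic above can be checked with a few lines of Python. The sketch below builds the standard normal CDF $\Phi$ from the error function (`math.erf` is in the standard library) and evaluates the same expression:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF, written in terms of the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    return 0.5 * (1 + erf(x / sqrt(2)))

n, mu, sigma2 = 100, 3.5, 35 / 12
mean, sd = n * mu, sqrt(n * sigma2)        # 350 and about 17.08

p = phi((370 - mean) / sd) - phi((330 - mean) / sd)
print(round(p, 3))                          # ≈ 0.758
```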

Conditions and Extensions

Lindeberg-Lévy CLT: The basic form requires IID with finite variance.

Lyapunov CLT: Allows non-identical distributions under certain growth conditions on moments.

Berry-Esseen Theorem: Quantifies the rate of convergence:

$$\sup_x |P(Z_n \leq x) - \Phi(x)| \leq \frac{C\rho}{\sigma^3\sqrt{n}}$$

where $\rho = E[|X - \mu|^3]$ and $C \approx 0.4748$. This bounds the approximation error uniformly in $x$.

Example

For Bernoulli($p$): $\mu = p$, $\sigma^2 = p(1-p)$, and $\rho = E[|X - p|^3] = p(1-p)\left[p^2 + (1-p)^2\right]$, since $|X - p|^3$ equals $(1-p)^3$ with probability $p$ and $p^3$ with probability $1-p$.

For $p = 0.5$ and $n = 30$: $\sigma^3 = 0.125$ and $\rho = 0.125$, so

$$\text{Error} \leq \frac{0.4748 \times 0.125}{0.125\sqrt{30}} = \frac{0.4748}{\sqrt{30}} \approx 0.087$$

The normal approximation to the CDF is therefore accurate to within about 0.087 at every point.
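The bound is easy to package as a small helper. The sketch below (the function name is ours; it uses the Bernoulli moments derived above) reproduces the $p = 0.5$, $n = 30$ calculation:

```python
from math import sqrt

def berry_esseen_bound(p, n, C=0.4748):
    # Berry-Esseen bound for the mean of n IID Bernoulli(p) variables.
    sigma = sqrt(p * (1 - p))
    rho = p * (1 - p) * (p**2 + (1 - p)**2)   # E|X - p|^3
    return C * rho / (sigma**3 * sqrt(n))

print(round(berry_esseen_bound(0.5, 30), 3))   # ≈ 0.087
```

Because the bound shrinks like $1/\sqrt{n}$, quadrupling $n$ only halves the guaranteed error.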

Applications

Statistical Inference: The CLT justifies using normal-based confidence intervals and hypothesis tests for sample means, even when the population distribution is non-normal.

Quality Control: If individual measurements have mean $\mu$ and variance $\sigma^2$, the average of $n$ measurements is approximately:

$$\bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$

The standard error σ/n\sigma/\sqrt{n} decreases with sample size.

Example

A factory produces bolts with mean length 10 cm and standard deviation 0.5 cm (unknown distribution). For a sample of $n = 25$ bolts:

$$\bar{X} \approx \mathcal{N}\left(10, \frac{(0.5)^2}{25}\right) = \mathcal{N}(10, 0.01)$$

Standard error: $0.5/\sqrt{25} = 0.1$ cm.

A 95% confidence interval for the population mean:

$$\bar{x} \pm 1.96(0.1) = \bar{x} \pm 0.196$$

If the sample mean is 10.15 cm, we are 95% confident the true mean lies in $[9.954, 10.346]$.
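The interval endpoints follow directly from the numbers in the example; a minimal sketch of the computation:

```python
from math import sqrt

sigma, n, xbar = 0.5, 25, 10.15
se = sigma / sqrt(n)                        # standard error: 0.1 cm

# 95% CI: sample mean plus or minus 1.96 standard errors.
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
print(round(lo, 3), round(hi, 3))           # 9.954 10.346
```

Note that $\sigma$ is assumed known here; with an estimated standard deviation one would use a $t$-based interval instead.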

Remark

The CLT explains why the normal distribution dominates statistics: many measurable quantities are sums or averages of independent effects. Heights, test scores, measurement errors—all tend toward normality due to the CLT. This theorem is the theoretical foundation for the ubiquity of the Gaussian distribution in nature and statistics.