
Common Distributions - Examples and Constructions

Real-world applications demonstrate how to select appropriate distributions based on the underlying data-generating process.

Modeling Count Data

Quality Control: A factory inspects batches of 100 items. The number of defective items follows $\text{Binomial}(100, p)$, where $p$ is the defect rate.

If $p = 0.02$: $P(\text{at most 3 defects}) = \sum_{k=0}^{3} \binom{100}{k} (0.02)^k (0.98)^{100-k} \approx 0.859$

For rare defects ($p$ small), the $\text{Poisson}(np = 2)$ approximation gives: $P(X \leq 3) \approx \sum_{k=0}^{3} \frac{e^{-2} \cdot 2^k}{k!} \approx 0.857$
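
The comparison between the exact binomial sum and its Poisson approximation can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binomial_cdf` and `poisson_cdf` are illustrative and not part of the original text.

```python
from math import comb, exp, factorial

def binomial_cdf(k_max, n, p):
    """P(X <= k_max) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for X ~ Poisson(lam), summed term by term."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(k_max + 1))

n, p = 100, 0.02                 # batch size and defect rate from the example
exact = binomial_cdf(3, n, p)    # exact binomial probability of at most 3 defects
approx = poisson_cdf(3, n * p)   # Poisson approximation with lam = n * p = 2

print(f"exact binomial: {exact:.4f}")
print(f"Poisson approx: {approx:.4f}")
print(f"absolute error: {abs(exact - approx):.4f}")
```

The two values should agree to about two decimal places here, which is the practical sense in which the Poisson approximation works for large $n$ and small $p$.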

Example

Radioactive Decay: Particles arrive at a detector following a Poisson process with rate $\lambda = 5$ per minute. The number of arrivals in one minute follows $\text{Poisson}(5)$.

Probability of exactly 7 arrivals: $P(X = 7) = \frac{e^{-5} \cdot 5^7}{7!} \approx 0.1044$

Time until first arrival follows $\text{Exponential}(5)$ with mean $1/5 = 0.2$ minutes = 12 seconds.
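
As a rough numerical check of the detector example, the sketch below evaluates the Poisson probability and the mean waiting time, and also verifies the link between the two distributions: no arrivals in $[0, t]$ is the same event as the first arrival coming after $t$. The function names are illustrative assumptions.

```python
from math import exp, factorial

RATE_PER_MIN = 5.0  # arrival rate lambda from the example

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def exponential_survival(t, rate):
    """P(T > t) for T ~ Exponential(rate)."""
    return exp(-rate * t)

p_seven = poisson_pmf(7, RATE_PER_MIN)   # exactly 7 arrivals in one minute
mean_wait_seconds = 60.0 / RATE_PER_MIN  # mean of Exponential(5), in seconds

# Consistency check: "no arrivals in [0, t]" equals "first arrival after t".
t = 0.2  # minutes
p_no_arrival = poisson_pmf(0, RATE_PER_MIN * t)
p_wait_longer = exponential_survival(t, RATE_PER_MIN)

print(f"P(X = 7) = {p_seven:.4f}")
print(f"mean wait = {mean_wait_seconds:.0f} s")
print(f"P(no arrival in {t} min) = {p_no_arrival:.4f}, P(T > {t}) = {p_wait_longer:.4f}")
```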

Modeling Continuous Measurements

Measurement Errors: Scientific measurements are often approximately normal because they aggregate many small independent errors (central limit theorem, CLT).

Heights in a population: $X \sim \mathcal{N}(170, 100)$ in cm (mean 170 cm, variance 100 cm², so standard deviation 10 cm).

Probability someone is taller than 185 cm: $P(X > 185) = 1 - \Phi\left(\frac{185 - 170}{10}\right) = 1 - \Phi(1.5) \approx 0.0668$

About 6.7% are taller than 185cm.
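
The standardization step can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a small sketch of the same calculation; the variable names are illustrative.

```python
from statistics import NormalDist

# NormalDist takes the standard deviation (10), not the variance (100).
heights = NormalDist(mu=170, sigma=10)

z = (185 - heights.mean) / heights.stdev   # standardized threshold
p_taller = 1 - heights.cdf(185)            # tail probability on the original scale
p_taller_std = 1 - NormalDist().cdf(z)     # same tail via the standard normal

print(f"z = {z}")
print(f"P(X > 185) = {p_taller:.4f}")
```

Both routes give the same tail probability, which is the point of converting to a $z$-score before consulting $\Phi$.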

Reliability and Survival Analysis

Component Lifetimes: Electronic components often have exponentially distributed lifetimes when failures occur at a constant rate.

For a component with mean lifetime 5 years ($\text{Exponential}(1/5)$): $P(\text{lasts} > 10 \text{ years}) = e^{-10/5} = e^{-2} \approx 0.135$

System Reliability: For a system with $n$ independent components, each with lifetime $\text{Exponential}(\lambda)$:

  • Series (all must work): $T_{\min} = \min\{T_1, \ldots, T_n\} \sim \text{Exponential}(n\lambda)$
  • Parallel (at least one must work): $T_{\max} = \max\{T_1, \ldots, T_n\}$ has a more complex distribution, with $P(T_{\max} \leq t) = (1 - e^{-\lambda t})^n$
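
The series result can be sanity-checked by simulation. The sketch below draws component lifetimes with `random.expovariate` and compares the empirical behaviour of the minimum with $\text{Exponential}(n\lambda)$. The values $n = 3$ and $\lambda = 0.5$, the trial count, and the seed are hypothetical choices made only for illustration.

```python
import random
from math import exp

def simulate_series_lifetimes(n_components, rate, n_trials, seed=0):
    """Simulate lifetimes of a series system: the minimum of n exponential lifetimes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        min(rng.expovariate(rate) for _ in range(n_components))
        for _ in range(n_trials)
    ]

n, lam = 3, 0.5  # hypothetical component count and failure rate
samples = simulate_series_lifetimes(n, lam, n_trials=100_000)

empirical_mean = sum(samples) / len(samples)
theoretical_mean = 1 / (n * lam)           # mean of Exponential(n * lam)

t = 1.0
empirical_survival = sum(s > t for s in samples) / len(samples)
theoretical_survival = exp(-n * lam * t)   # P(T_min > t) = e^{-n * lam * t}

print(f"mean: simulated {empirical_mean:.3f}, theory {theoretical_mean:.3f}")
print(f"P(T_min > {t}): simulated {empirical_survival:.3f}, theory {theoretical_survival:.3f}")
```

The simulated and theoretical values should agree up to Monte Carlo noise, which shrinks as the number of trials grows.
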

Example

Two redundant systems, each with an independent $\text{Exponential}(1)$ lifetime in years: $P(T_{\max} > 2) = P(T_1 > 2 \text{ or } T_2 > 2) = 1 - P(T_1 \leq 2)\,P(T_2 \leq 2) = 1 - (1 - e^{-2})^2 \approx 0.252$

Redundancy improves reliability: a single system lasts beyond 2 years with probability only $e^{-2} \approx 0.135$.
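
The effect of redundancy can be tabulated with the complement rule used in the example. This is a minimal sketch assuming independent, identically distributed exponential lifetimes; `parallel_survival` is an illustrative helper name.

```python
from math import exp

def parallel_survival(t, rate, n_components):
    """P(T_max > t) when n independent Exponential(rate) components run in parallel.

    The system fails by time t only if every component has failed by time t.
    """
    p_one_fails = 1 - exp(-rate * t)
    return 1 - p_one_fails ** n_components

single = parallel_survival(2, 1.0, 1)   # one system, no redundancy
double = parallel_survival(2, 1.0, 2)   # the two-system example
triple = parallel_survival(2, 1.0, 3)   # one more redundant unit

for label, value in [("1 unit", single), ("2 units", double), ("3 units", triple)]:
    print(f"{label}: P(system lasts > 2 years) = {value:.4f}")
```

Each added unit raises the survival probability, but by a smaller amount than the one before it.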

Financial Applications

Stock Returns: Daily log-returns are often modeled as normal. For a stock with expected annual return 8% and annual volatility 20%:

Daily return $\sim \mathcal{N}(0.08/252, (0.20)^2/252)$ (252 trading days/year)
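
The scaling from annual to daily parameters relies on treating daily log-returns as independent and identically distributed, so that means and variances add over the 252 trading days. A short sketch of that arithmetic, under that assumption:

```python
from math import sqrt

TRADING_DAYS = 252
annual_mean, annual_vol = 0.08, 0.20   # figures from the example

# Means and variances of i.i.d. daily log-returns add up over the year,
# so both are divided by the number of trading days.
daily_mean = annual_mean / TRADING_DAYS
daily_var = annual_vol**2 / TRADING_DAYS
daily_vol = sqrt(daily_var)            # volatility scales with the square root of time

print(f"daily mean       = {daily_mean:.6f}")
print(f"daily volatility = {daily_vol:.4%}")
```

Note that volatility shrinks by a factor of $\sqrt{252}$ rather than 252, so daily noise is large relative to the daily mean.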

Value at Risk (VaR): For a portfolio with value $V \sim \mathcal{N}(\$1\text{M}, (\$100\text{K})^2)$:

95% VaR threshold (the value the portfolio falls below 5% of the time): $\text{VaR}_{0.05} = \mu - 1.645\sigma = \$1\text{M} - 1.645 \times \$100\text{K} \approx \$835{,}500$

With 95% confidence, the loss relative to the mean value is at most about $165K.
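
The 1.645 multiplier is the 5% quantile of the standard normal distribution (in absolute value). The sketch below recovers it with `statistics.NormalDist.inv_cdf` and recomputes the threshold; it assumes the normal model for portfolio value stated above, and the variable names are illustrative.

```python
from statistics import NormalDist

portfolio = NormalDist(mu=1_000_000, sigma=100_000)  # value in dollars

z_05 = NormalDist().inv_cdf(0.05)           # 5% quantile of N(0, 1), close to -1.645
value_threshold = portfolio.inv_cdf(0.05)   # value undercut 5% of the time
shortfall = portfolio.mean - value_threshold  # loss relative to the mean value

print(f"z(0.05)         = {z_05:.4f}")
print(f"value threshold = ${value_threshold:,.0f}")
print(f"shortfall       = ${shortfall:,.0f}")
```

The result differs from the hand calculation by a few dollars because the hand calculation rounds the quantile to 1.645.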

Queueing Theory

Bank Teller: Customers arrive at rate $\lambda = 10$/hour (Poisson process). Service time per customer is $\text{Exponential}(12/\text{hour})$ (mean 5 minutes).

Number of arrivals in one hour: $\text{Poisson}(10)$

Time until first arrival: $\text{Exponential}(10)$ with mean $1/10$ hour = 6 minutes

For stability, the arrival rate must be less than the service rate: $\lambda < \mu$ (here $10 < 12$ ✓)
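
The teller figures reduce to a few rate conversions. The sketch below restates them in code; the ratio $\lambda/\mu$, commonly called the utilization, is an extra quantity introduced here for illustration and does not appear in the original example.

```python
def mean_minutes(rate_per_hour):
    """Mean of an Exponential(rate_per_hour) time, converted to minutes."""
    return 60.0 / rate_per_hour

def is_stable(arrival_rate, service_rate):
    """Stability condition from the text: lambda < mu."""
    return arrival_rate < service_rate

arrival_rate, service_rate = 10.0, 12.0    # customers per hour, from the example

mean_interarrival = mean_minutes(arrival_rate)   # mean gap between arrivals
mean_service = mean_minutes(service_rate)        # mean service time
utilization = arrival_rate / service_rate        # fraction of time the teller is busy (assumed extra)

print(f"mean time between arrivals: {mean_interarrival:.0f} min")
print(f"mean service time:          {mean_service:.0f} min")
print(f"stable: {is_stable(arrival_rate, service_rate)}, utilization: {utilization:.2f}")
```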

Remark

Distribution selection is an art informed by:

  1. Nature of the variable (discrete vs. continuous, bounded vs. unbounded)
  2. Physical process (arrivals → Poisson, waiting times → Exponential, sums → Normal)
  3. Empirical data (histogram, Q-Q plots, goodness-of-fit tests)
  4. Mathematical tractability (approximate distributions are sometimes chosen for convenience)