
Probability Spaces and Events - Main Theorem

Bayes' theorem is the cornerstone of statistical inference, providing a systematic method for updating beliefs in light of new evidence. It connects forward probabilities (from causes to effects) with inverse probabilities (from effects to causes).

Bayes' Theorem

Theorem

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\{B_1, B_2, \ldots, B_n\}$ be a partition of $\Omega$ with $P(B_i) > 0$ for all $i$. For any event $A$ with $P(A) > 0$:

$$P(B_j \mid A) = \frac{P(A \mid B_j)\, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)}$$

Proof Outline: Starting from the definition of conditional probability:

$$P(B_j \mid A) = \frac{P(A \cap B_j)}{P(A)} = \frac{P(A \mid B_j)\, P(B_j)}{P(A)}$$

Applying the law of total probability to the denominator:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$$

Substitution yields the result. □

Interpretation and Terminology

In Bayesian terminology:

  • $P(B_j)$ is the prior probability of hypothesis $B_j$ (before observing data)
  • $P(A \mid B_j)$ is the likelihood of observing data $A$ given hypothesis $B_j$
  • $P(B_j \mid A)$ is the posterior probability of hypothesis $B_j$ (after observing data)
  • $\sum_i P(A \mid B_i)\, P(B_i)$ is the marginal likelihood or evidence

The theorem states: Posterior is proportional to likelihood times prior.
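The discrete update over a finite partition can be sketched in a few lines of Python (the function and variable names here are illustrative, not from any particular library):

```python
def posterior(priors, likelihoods):
    """Return P(B_j | A) for each hypothesis B_j, given the prior
    probabilities P(B_j) and the likelihoods P(A | B_j)."""
    # Numerators of Bayes' theorem: P(A | B_j) * P(B_j)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator (evidence): sum_i P(A | B_i) * P(B_i)
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Two equally likely hypotheses; the data are four times as likely under the first.
print(posterior([0.5, 0.5], [0.8, 0.2]))  # -> [0.8, 0.2]
```

Note that the evidence term only rescales the numerators, which is why the posterior is often quoted up to proportionality.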

Example

Medical Diagnosis: A rare disease affects 0.1% of the population. A diagnostic test has 99% sensitivity (detects disease when present) and 95% specificity (correct when disease absent).

Let $D$ = "has disease" and $T$ = "tests positive". Given:

  • Prior: $P(D) = 0.001$, $P(D^c) = 0.999$
  • Likelihood: $P(T \mid D) = 0.99$, $P(T \mid D^c) = 0.05$

By Bayes' theorem:

$$P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T \mid D)\, P(D) + P(T \mid D^c)\, P(D^c)} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} = \frac{0.00099}{0.05094} \approx 0.0194 \approx 2\%$$

Despite the test's high accuracy, a positive result implies only about a 2% probability of disease, because the low base rate (the prior) dominates the calculation.
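As a sanity check, the arithmetic in the example can be reproduced directly (variable names are illustrative):

```python
p_d = 0.001              # prior P(D): disease prevalence
p_t_given_d = 0.99       # sensitivity P(T | D)
p_t_given_not_d = 0.05   # false-positive rate P(T | D^c) = 1 - specificity

# Evidence: P(T) = P(T|D) P(D) + P(T|D^c) P(D^c)
evidence = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Posterior: P(D | T)
p_d_given_t = p_t_given_d * p_d / evidence
print(round(p_d_given_t, 4))  # -> 0.0194
```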

Continuous Version

For continuous random variables with densities, Bayes' theorem takes the form:

$$f_{\Theta \mid X}(\theta \mid x) = \frac{f_{X \mid \Theta}(x \mid \theta)\, f_{\Theta}(\theta)}{f_X(x)} = \frac{f_{X \mid \Theta}(x \mid \theta)\, f_{\Theta}(\theta)}{\int f_{X \mid \Theta}(x \mid t)\, f_{\Theta}(t)\, dt}$$

This form is fundamental in Bayesian statistics, where we update our beliefs about parameters $\theta$ based on observed data $x$.
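One simple way to make the continuous form concrete is grid approximation: evaluate the prior and likelihood on a grid of $\theta$ values and normalize numerically in place of the integral. The sketch below assumes a uniform prior on $[0,1]$ and a binomial likelihood for 7 heads in 10 coin tosses (all numbers are illustrative):

```python
from math import comb

n, k = 10, 7                      # observed data: k heads in n tosses
dx = 0.001
thetas = [i * dx for i in range(1001)]            # grid over [0, 1]
prior = [1.0] * len(thetas)                       # uniform prior density f_Theta
like = [comb(n, k) * t**k * (1 - t)**(n - k) for t in thetas]

# Unnormalized posterior: likelihood * prior at each grid point
unnorm = [l * p for l, p in zip(like, prior)]

# Evidence f_X(x) approximated by a Riemann sum for the integral
evidence = sum(unnorm) * dx

# Normalized posterior density on the grid
post = [u / evidence for u in unnorm]

# Posterior mean; the exact conjugate Beta(8, 4) posterior has mean 8/12
mean = sum(t * p for t, p in zip(thetas, post)) * dx
print(round(mean, 3))
```

With a uniform prior this matches the conjugate Beta$(k+1,\, n-k+1)$ result, so the grid estimate of the posterior mean should land near $8/12 \approx 0.667$.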

Remark

Bayes' theorem is not controversial—it follows directly from the definition of conditional probability. The philosophical debate in statistics centers on whether we should assign probability distributions to parameters (Bayesian approach) or treat them as fixed unknowns (frequentist approach).