
Point Estimation

Point estimation concerns the problem of using sample data to produce a single best guess for an unknown population parameter, along with criteria for evaluating the quality of estimators.


Estimators and Their Properties

Definition

Let $X_1, \ldots, X_n$ be a random sample from a distribution $F_\theta$ parameterized by $\theta \in \Theta$. A point estimator of $\theta$ is a statistic $\hat{\theta} = T(X_1, \ldots, X_n)$, i.e., a function of the observed data that does not depend on $\theta$.

Definition

Key properties of an estimator $\hat{\theta}$ for the parameter $\theta$:

  1. Bias: $\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. The estimator is unbiased if $E[\hat{\theta}] = \theta$ for all $\theta$.
  2. Variance: $\operatorname{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$
  3. Mean squared error: $\operatorname{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \operatorname{Var}(\hat{\theta}) + [\operatorname{Bias}(\hat{\theta})]^2$
  4. Consistency: $\hat{\theta}_n \xrightarrow{P} \theta$ as $n \to \infty$ for all $\theta$
  5. Efficiency: the variance of $\hat{\theta}$ attains the Cramér-Rao lower bound among unbiased estimators
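The decomposition in property 3 can be verified numerically. The sketch below (a minimal illustration using only Python's standard library; the choice of $\bar{X}$ as the estimator and the normal population are illustrative assumptions) estimates bias, variance, and MSE by Monte Carlo and checks that the identity $\operatorname{MSE} = \operatorname{Var} + \operatorname{Bias}^2$ holds for the empirical quantities:

```python
import random

random.seed(0)

def mse_components(estimator, sample_draw, theta, reps=20_000):
    """Monte Carlo estimates of bias, variance, and MSE of an estimator."""
    estimates = [estimator(sample_draw()) for _ in range(reps)]
    mean_est = sum(estimates) / reps
    bias = mean_est - theta
    var = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    return bias, var, mse

# Illustrative setup: sample mean of n = 10 draws from N(mu = 2, sigma = 1)
n, mu = 10, 2.0
draw = lambda: [random.gauss(mu, 1.0) for _ in range(n)]
xbar = lambda xs: sum(xs) / len(xs)

bias, var, mse = mse_components(xbar, draw, mu)
# bias ≈ 0 (Xbar is unbiased), var ≈ sigma^2/n = 0.1, and the
# empirical identity mse = var + bias^2 holds up to floating-point error
print(bias, var, mse)
```

Note that for the empirical versions computed from the same simulated estimates, the decomposition is an exact algebraic identity, not just an approximation.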

Common Estimators

Example: Sample mean and variance

For i.i.d. $X_1, \ldots, X_n$ with mean $\mu$ and variance $\sigma^2$:

  • $\bar{X} = \frac{1}{n}\sum X_i$ is unbiased for $\mu$ with $\operatorname{Var}(\bar{X}) = \sigma^2/n$
  • $S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ is unbiased for $\sigma^2$
  • The biased version $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ has $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$ and lower MSE than $S^2$ for normal populations
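Both claims about the variance estimators can be checked by simulation. The sketch below (stdlib-only; the sample size $n = 10$ and standard normal population are illustrative assumptions) shows $S^2$ averaging to $\sigma^2 = 1$ while the biased estimator $\hat{\sigma}^2$ attains a smaller MSE:

```python
import random

random.seed(1)

n, sigma2, reps = 10, 1.0, 100_000
sum_s2 = 0.0
sse_unbiased = 0.0   # accumulates (S^2 - sigma^2)^2
sse_biased = 0.0     # accumulates (sigma_hat^2 - sigma^2)^2

for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2 = ss / (n - 1)       # unbiased estimator S^2
    sig2_hat = ss / n       # biased estimator sigma_hat^2
    sum_s2 += s2
    sse_unbiased += (s2 - sigma2) ** 2
    sse_biased += (sig2_hat - sigma2) ** 2

mean_s2 = sum_s2 / reps              # ≈ 1.0: S^2 is unbiased
mse_unbiased = sse_unbiased / reps   # ≈ 2/(n-1) ≈ 0.222 for normal data
mse_biased = sse_biased / reps       # ≈ (2n-1)/n^2 = 0.19: smaller MSE
print(mean_s2, mse_unbiased, mse_biased)
```

The comments give the known closed-form values for a normal population, which the simulation reproduces up to Monte Carlo error.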

The Bias-Variance Tradeoff

Remark: MSE decomposition

The MSE decomposition $\operatorname{MSE} = \operatorname{Variance} + \operatorname{Bias}^2$ reveals a fundamental tradeoff: reducing bias may increase variance and vice versa. A biased estimator can have lower MSE than an unbiased one if the variance reduction more than compensates for the squared bias. This tradeoff is central to modern statistical methods, including regularization, shrinkage estimators, and machine learning.
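A shrinkage estimator makes this tradeoff concrete. The sketch below (stdlib-only; the estimator family $\hat{\theta}_c = c\bar{X}$ and the particular $n$, $\mu$, $\sigma$ are illustrative assumptions) uses the analytic MSE $\operatorname{MSE}(c) = c^2\sigma^2/n + (c-1)^2\mu^2$: minimizing over $c$ gives $c^* = \mu^2/(\mu^2 + \sigma^2/n)$, a deliberately biased estimator that beats the unbiased choice $c = 1$:

```python
import random

random.seed(2)

# Shrinkage estimator theta_hat_c = c * Xbar for the mean mu.
# Analytically, MSE(c) = c^2 * sigma^2/n + (c - 1)^2 * mu^2, so some
# c < 1 wins when mu^2 is small relative to the variance sigma^2/n.
n, mu, sigma, reps = 5, 0.5, 2.0, 50_000

def mc_mse(c):
    """Monte Carlo MSE of c * Xbar as an estimator of mu."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        total += (c * xbar - mu) ** 2
    return total / reps

mse_unbiased = mc_mse(1.0)                  # pure variance: sigma^2/n = 0.8
c_star = mu**2 / (mu**2 + sigma**2 / n)     # MSE-optimal shrinkage factor
mse_shrunk = mc_mse(c_star)                 # accepts bias, cuts variance
print(mse_unbiased, mse_shrunk)
```

Here shrinking toward zero roughly halves the MSE despite introducing bias, which is exactly the mechanism behind ridge-style regularization.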