
Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is the most widely used method of point estimation, producing estimators with excellent asymptotic properties by choosing the parameter value that makes the observed data most probable.


Definition

Let $X_1, \ldots, X_n$ be i.i.d. with density (or mass function) $f(x; \theta)$ for $\theta \in \Theta$. The likelihood function is

$$L(\theta) = L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta).$$

The log-likelihood is $\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i; \theta)$. The maximum likelihood estimator (MLE) is

$$\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} L(\theta) = \arg\max_{\theta \in \Theta} \ell(\theta).$$

Under regularity conditions, the MLE satisfies the score equation $\frac{\partial \ell}{\partial \theta}(\hat{\theta}) = 0$.
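The definition above can be used directly, with no calculus: evaluate the log-likelihood on a grid of parameter values and take the maximizer. A minimal sketch for a Bernoulli parameter $p$ (the data values and grid resolution below are made-up illustrative choices):

```python
import numpy as np

# Made-up i.i.d. Bernoulli(p) sample for illustration.
x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(p, x):
    # ell(p) = sum_i log f(x_i; p), with f(x; p) = p^x (1 - p)^(1 - x)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximize ell over a grid of candidate values of p in (0, 1).
grid = np.linspace(0.001, 0.999, 999)
ells = np.array([log_likelihood(p, x) for p in grid])
p_hat = grid[np.argmax(ells)]

# The grid maximizer recovers the closed-form Bernoulli MLE, the sample mean.
print(p_hat, x.mean())
```

Grid search scales poorly beyond one or two parameters; in practice one solves the score equation analytically (as in the examples below) or uses a numerical optimizer on $-\ell(\theta)$.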


Examples

Example: MLE for normal parameters

For $X_i \sim N(\mu, \sigma^2)$:

$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2$$

Setting $\frac{\partial \ell}{\partial \mu} = 0$ and $\frac{\partial \ell}{\partial \sigma^2} = 0$ gives:

$$\hat{\mu}_{MLE} = \bar{x}, \quad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum(x_i - \bar{x})^2$$

Note $\hat{\sigma}^2_{MLE}$ is biased (it divides by $n$, not $n-1$), but it is consistent.
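A quick numerical check of these closed forms, including the exact relationship between the biased MLE and the unbiased $n-1$ variance estimator (the true parameters, seed, and sample size below are arbitrary illustrative choices):

```python
import numpy as np

# Simulate i.i.d. N(mu, sigma^2) data with made-up parameters.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)
n = len(x)

mu_hat = x.mean()                        # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)  # MLE of sigma^2: divides by n
s2 = x.var(ddof=1)                       # unbiased estimator: divides by n - 1

# The MLE equals the unbiased estimator scaled by (n - 1)/n, so the bias
# vanishes as n grows -- consistent with the remark above.
print(mu_hat, sigma2_hat, s2 * (n - 1) / n)
```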

Example: MLE for exponential distribution

For $X_i \sim \text{Exp}(\lambda)$ with $f(x;\lambda) = \lambda e^{-\lambda x}$:

$$\ell(\lambda) = n\log\lambda - \lambda \sum x_i$$

Setting $\ell'(\lambda) = n/\lambda - \sum x_i = 0$ gives $\hat{\lambda}_{MLE} = 1/\bar{x}$.
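As a sanity check, the estimate $\hat{\lambda} = 1/\bar{x}$ should make the score $n/\lambda - \sum x_i$ vanish, and should land near the true rate for a large sample. A sketch (the rate $\lambda = 1.5$ and sample size are made-up illustrative values; note NumPy parameterizes the exponential by the scale $1/\lambda$):

```python
import numpy as np

# Simulate i.i.d. Exp(lambda) data with an illustrative rate of 1.5.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1 / 1.5, size=50_000)  # NumPy uses scale = 1/rate

lam_hat = 1 / x.mean()  # MLE of the rate

# Plugging lam_hat into the score n/lambda - sum(x_i) gives (numerically) zero.
score_at_mle = len(x) / lam_hat - x.sum()
print(lam_hat, score_at_mle)
```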


Remark: Invariance property

The MLE enjoys an invariance property: if θ^\hat{\theta} is the MLE of θ\theta, then g(θ^)g(\hat{\theta}) is the MLE of g(θ)g(\theta) for any function gg. This is useful because it means, for instance, the MLE of the standard deviation is the square root of the MLE of the variance, without needing to re-derive the estimator.
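The standard-deviation case can be checked directly: maximizing the normal log-likelihood over $\sigma$ (with $\mu$ fixed at its MLE) lands on $\sqrt{\hat{\sigma}^2_{MLE}}$, with no separate derivation. A sketch with made-up simulation settings and a grid search standing in for the one-dimensional maximization:

```python
import numpy as np

# Simulate normal data with an illustrative true sigma of 3.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=3.0, size=2_000)
n = len(x)

sigma2_mle = np.mean((x - x.mean()) ** 2)  # MLE of the variance
sigma_mle = np.sqrt(sigma2_mle)            # MLE of sigma, by invariance

def ell(sig):
    # Normal log-likelihood in sigma, with mu set to its MLE x-bar
    # (the constant -(n/2) log(2 pi) is dropped; it doesn't affect the argmax).
    return -n * np.log(sig) - np.sum((x - x.mean()) ** 2) / (2 * sig ** 2)

# Maximize ell over a grid of sigma values; step size 0.001.
grid = np.linspace(2.0, 4.0, 2001)
sig_grid = grid[np.argmax([ell(s) for s in grid])]

# The direct maximizer agrees with sqrt(sigma2_mle) to grid resolution.
print(sigma_mle, sig_grid)
```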