
Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is the most widely used method of point estimation, producing estimators with excellent asymptotic properties by choosing the parameter value that makes the observed data most probable.


Definition

Let $X_1, \ldots, X_n$ be i.i.d. with density (or mass function) $f(x; \theta)$ for $\theta \in \Theta$. The likelihood function is

$$L(\theta) = L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta).$$

The log-likelihood is $\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i; \theta)$. The maximum likelihood estimator (MLE) is

$$\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} L(\theta) = \arg\max_{\theta \in \Theta} \ell(\theta).$$

Under regularity conditions, the MLE satisfies the score equation $\frac{\partial \ell}{\partial \theta}(\hat{\theta}) = 0$.
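The definition above can be used directly, with no calculus: evaluate the log-likelihood on a grid of parameter values and take the maximizer. A minimal sketch for a Bernoulli parameter $p$ (the data values and grid resolution below are made-up illustrative choices):

```python
import numpy as np

# Made-up i.i.d. Bernoulli(p) sample for illustration.
x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(p, x):
    # ell(p) = sum_i log f(x_i; p), with f(x; p) = p^x (1 - p)^(1 - x)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximize ell over a grid of candidate values of p in (0, 1).
grid = np.linspace(0.001, 0.999, 999)
ells = np.array([log_likelihood(p, x) for p in grid])
p_hat = grid[np.argmax(ells)]

# The grid maximizer recovers the closed-form Bernoulli MLE, the sample mean.
print(p_hat, x.mean())
```

Grid search scales poorly beyond one or two parameters; in practice one solves the score equation analytically (as in the examples below) or uses a numerical optimizer on $-\ell(\theta)$.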


Examples

Example: MLE for normal parameters

For $X_i \sim N(\mu, \sigma^2)$:

$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2$$

Setting $\frac{\partial \ell}{\partial \mu} = 0$ and $\frac{\partial \ell}{\partial \sigma^2} = 0$ gives:

$$\hat{\mu}_{MLE} = \bar{x}, \quad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum(x_i - \bar{x})^2$$

Note $\hat{\sigma}^2_{MLE}$ is biased (it divides by $n$, not $n-1$), but it is consistent.
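A quick numerical check of these closed forms, including the exact relationship between the biased MLE and the unbiased $n-1$ variance estimator (the true parameters, seed, and sample size below are arbitrary illustrative choices):

```python
import numpy as np

# Simulate i.i.d. N(mu, sigma^2) data with made-up parameters.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)
n = len(x)

mu_hat = x.mean()                        # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)  # MLE of sigma^2: divides by n
s2 = x.var(ddof=1)                       # unbiased estimator: divides by n - 1

# The MLE equals the unbiased estimator scaled by (n - 1)/n, so the bias
# vanishes as n grows -- consistent with the remark above.
print(mu_hat, sigma2_hat, s2 * (n - 1) / n)
```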

Example: MLE for exponential distribution

For $X_i \sim \text{Exp}(\lambda)$ with $f(x;\lambda) = \lambda e^{-\lambda x}$:

$$\ell(\lambda) = n\log\lambda - \lambda \sum x_i$$

Setting $\ell'(\lambda) = n/\lambda - \sum x_i = 0$ gives $\hat{\lambda}_{MLE} = 1/\bar{x}$.
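As a sanity check, the estimate $\hat{\lambda} = 1/\bar{x}$ should make the score $n/\lambda - \sum x_i$ vanish, and should land near the true rate for a large sample. A sketch (the rate $\lambda = 1.5$ and sample size are made-up illustrative values; note NumPy parameterizes the exponential by the scale $1/\lambda$):

```python
import numpy as np

# Simulate i.i.d. Exp(lambda) data with an illustrative rate of 1.5.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1 / 1.5, size=50_000)  # NumPy uses scale = 1/rate

lam_hat = 1 / x.mean()  # MLE of the rate

# Plugging lam_hat into the score n/lambda - sum(x_i) gives (numerically) zero.
score_at_mle = len(x) / lam_hat - x.sum()
print(lam_hat, score_at_mle)
```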


Remark: Invariance property

The MLE enjoys an invariance property: if θ^\hat{\theta} is the MLE of θ\theta, then g(θ^)g(\hat{\theta}) is the MLE of g(θ)g(\theta) for any function gg. This is useful because it means, for instance, the MLE of the standard deviation is the square root of the MLE of the variance, without needing to re-derive the estimator.
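The standard-deviation case can be checked directly: maximizing the normal log-likelihood over $\sigma$ (with $\mu$ fixed at its MLE) lands on $\sqrt{\hat{\sigma}^2_{MLE}}$, with no separate derivation. A sketch with made-up simulation settings and a grid search standing in for the one-dimensional maximization:

```python
import numpy as np

# Simulate normal data with an illustrative true sigma of 3.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=3.0, size=2_000)
n = len(x)

sigma2_mle = np.mean((x - x.mean()) ** 2)  # MLE of the variance
sigma_mle = np.sqrt(sigma2_mle)            # MLE of sigma, by invariance

def ell(sig):
    # Normal log-likelihood in sigma, with mu set to its MLE x-bar
    # (the constant -(n/2) log(2 pi) is dropped; it doesn't affect the argmax).
    return -n * np.log(sig) - np.sum((x - x.mean()) ** 2) / (2 * sig ** 2)

# Maximize ell over a grid of sigma values; step size 0.001.
grid = np.linspace(2.0, 4.0, 2001)
sig_grid = grid[np.argmax([ell(s) for s in grid])]

# The direct maximizer agrees with sqrt(sigma2_mle) to grid resolution.
print(sigma_mle, sig_grid)
```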