
Joint and Conditional Distributions - Applications

Multivariate techniques enable sophisticated analysis of dependent random variables in statistics and machine learning.

Multivariate Normal Distribution

Definition

The random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$ has a multivariate normal distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with PDF:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

where $\boldsymbol{\mu}$ is the mean vector and $\boldsymbol{\Sigma}$ is the covariance matrix.
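As a quick check of the formula, here is a minimal NumPy sketch (the helper name `mvn_pdf` is ours) that evaluates the density directly from the definition:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, computed directly from the formula."""
    n = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    quad = diff @ np.linalg.solve(sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad)

# Sanity check: at x = mu with identity covariance (n = 2),
# the exponential is 1 and the density reduces to 1/(2*pi)
print(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```

Using `np.linalg.solve` instead of explicitly inverting $\boldsymbol{\Sigma}$ is the standard numerically stable way to compute the quadratic form.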

Properties:

  • Linear combinations are normal: if $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, then $\mathbf{Y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)$
  • Marginals are normal
  • Conditionals are normal
  • Uncorrelated components are independent (this equivalence is special to the multivariate normal; it fails for general distributions)
Example

Portfolio Analysis: Three stocks have returns $\mathbf{R} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ where:

$$\boldsymbol{\mu} = \begin{pmatrix}0.08\\0.12\\0.10\end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix}0.04 & 0.01 & 0.02\\0.01 & 0.09 & 0.015\\0.02 & 0.015 & 0.06\end{pmatrix}$$

A portfolio with weights $\mathbf{w} = (0.5, 0.3, 0.2)^T$ has return:

$$R_p = \mathbf{w}^T\mathbf{R} \sim \mathcal{N}(\mathbf{w}^T\boldsymbol{\mu}, \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w})$$
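The portfolio mean and variance above reduce to two matrix products, sketched here in NumPy (variable names are ours):

```python
import numpy as np

# Stock return parameters from the example above
mu = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.04,  0.01,  0.02],
                  [0.01,  0.09,  0.015],
                  [0.02,  0.015, 0.06]])
w = np.array([0.5, 0.3, 0.2])

mean_p = w @ mu        # expected portfolio return  w^T mu  = 0.096
var_p = w @ Sigma @ w  # portfolio variance         w^T Sigma w = 0.0293
print(mean_p, var_p, np.sqrt(var_p))
```

So the portfolio earns 9.6% in expectation with a standard deviation of about 17.1%, lower than any single stock's volatility thanks to diversification.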

Conditional Expectation

Theorem
  1. Tower Property: $E[E[X|Y]] = E[X]$
  2. Taking Out What Is Known: $E[g(Y)X|Y] = g(Y)E[X|Y]$
  3. Independence: if $X \perp Y$, then $E[X|Y] = E[X]$
  4. Linearity: $E[aX + bZ|Y] = aE[X|Y] + bE[Z|Y]$
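The tower property is easy to check by simulation. The toy model below is an illustrative assumption: $X = 2Y + \varepsilon$ with $\varepsilon$ independent of $Y$, so $E[X|Y] = 2Y$ in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy model: X = 2Y + noise, with noise independent of Y
Y = rng.standard_normal(n)
X = 2 * Y + rng.standard_normal(n)

cond_exp = 2 * Y  # E[X|Y], known in closed form for this model

# Tower property: the average of E[X|Y] matches the average of X
print(X.mean(), cond_exp.mean())  # both near E[X] = 0
```

Up to Monte Carlo error, averaging the conditional expectation over $Y$ reproduces the unconditional mean of $X$.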
Example

Prediction: Minimize $E[(X - g(Y))^2]$ over all functions $g$.

Solution: $g^*(Y) = E[X|Y]$ (the conditional expectation is the optimal predictor in mean squared error)

For the bivariate normal: $E[X|Y=y] = \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)$

This is linear regression!
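This can be verified empirically: simulating a bivariate normal (the parameter values below are illustrative) and fitting a least-squares line of $X$ on $Y$ recovers the theoretical slope $\rho\,\sigma_X/\sigma_Y$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative bivariate normal parameters
mu_x, mu_y = 1.0, -2.0
sd_x, sd_y, rho = 2.0, 3.0, 0.6
cov = np.array([[sd_x**2,           rho * sd_x * sd_y],
                [rho * sd_x * sd_y, sd_y**2          ]])
X, Y = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000).T

# Theoretical regression slope from E[X|Y=y] = mu_x + rho*(sd_x/sd_y)*(y - mu_y)
slope_theory = rho * sd_x / sd_y  # = 0.4 here

# Least-squares fit of X on Y should recover the same slope
slope_ls = np.polyfit(Y, X, 1)[0]
print(slope_theory, slope_ls)
```

The fitted slope agrees with the conditional-expectation formula up to sampling error, which is exactly the sense in which the bivariate normal makes the best predictor linear.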

Principal Component Analysis

PCA finds orthogonal directions of maximum variance in multivariate data.

Given a covariance matrix $\boldsymbol{\Sigma}$, find eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ and eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$.

Principal components: $Z_i = \mathbf{v}_i^T \mathbf{X}$ with $\text{Var}(Z_i) = \lambda_i$.

First component captures direction of greatest variability.
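A minimal NumPy sketch of this recipe on simulated data (the covariance used to generate the data is illustrative): estimate the covariance, eigendecompose it, and check that the component variances equal the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated 2-D data with an anisotropic (illustrative) covariance
data = rng.multivariate_normal([0, 0], [[3.0, 1.0], [1.0, 1.0]], size=100_000)

Sigma_hat = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # sort so lambda_1 >= lambda_2 >= ...
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = data @ eigvecs  # principal components Z_i = v_i^T X

print(eigvals)            # lambda_i
print(np.var(Z, axis=0))  # matches the eigenvalues (up to ddof convention)
```

`np.linalg.eigh` is the right choice here because $\boldsymbol{\Sigma}$ is symmetric: it is faster and more stable than the general `eig`, and it guarantees real eigenvalues and orthonormal eigenvectors.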

Remark

Multivariate analysis is central to modern data science. From finance (portfolio optimization) to machine learning (dimensionality reduction), understanding joint distributions and their properties enables sophisticated modeling of complex, high-dimensional phenomena.