
Expectation and Variance - Key Properties

Advanced properties of expectation and variance provide powerful tools for probability calculations and reveal deep connections between different concepts.

Conditional Expectation

Definition

The conditional expectation of $X$ given event $B$ (with $P(B) > 0$) is:

$$E[X|B] = \sum_x x \cdot P(X=x|B) \quad \text{(discrete)}$$

$$E[X|B] = \int_{-\infty}^{\infty} x \cdot f_{X|B}(x)\,dx \quad \text{(continuous)}$$

Law of Total Expectation: If $\{B_i\}$ is a partition of $\Omega$, then

$$E[X] = \sum_i E[X|B_i]\,P(B_i)$$

Example

A fair die is rolled. If the outcome is even, you win the value shown. If odd, you win 0. Expected winnings:

Let $B_1$ = "even" and $B_2$ = "odd". Then:

$$E[X|B_1] = \frac{2+4+6}{3} = 4, \qquad E[X|B_2] = 0$$

$$E[X] = 4 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = 2$$
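The law of total expectation can be checked exactly for this die game. The sketch below enumerates the outcomes and combines the two conditional means with their event probabilities (using `Fraction` to keep the arithmetic exact):

```python
from fractions import Fraction

# Fair die: winnings equal the face value on even outcomes, 0 on odd ones.
outcomes = [1, 2, 3, 4, 5, 6]
winnings = {w: (w if w % 2 == 0 else 0) for w in outcomes}

# Conditional expectations over each block of the partition {even, odd}.
evens = [w for w in outcomes if w % 2 == 0]
odds = [w for w in outcomes if w % 2 == 1]
E_given_even = Fraction(sum(winnings[w] for w in evens), len(evens))  # (2+4+6)/3 = 4
E_given_odd = Fraction(sum(winnings[w] for w in odds), len(odds))     # 0

# Law of total expectation: weight each conditional mean by its event's probability.
E_X = E_given_even * Fraction(1, 2) + E_given_odd * Fraction(1, 2)
print(E_X)  # 2
```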

Covariance and Correlation

Definition

The covariance between random variables $X$ and $Y$ is:

$$\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - E[X]E[Y]$$

The correlation coefficient is:

$$\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X = \sqrt{\text{Var}(X)}$ and $\sigma_Y = \sqrt{\text{Var}(Y)}$.

Properties:

  • If $X$ and $Y$ are independent, then $\text{Cov}(X,Y) = 0$ (but not conversely!)
  • $-1 \leq \rho(X,Y) \leq 1$
  • $|\rho| = 1$ if and only if $Y = aX + b$ for some constants $a \neq 0$ and $b$

Example

For any random variables $X$ and $Y$:

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X,Y)$$

If $X$ and $Y$ are independent, the covariance term vanishes and $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$.

For dependent variables, the covariance term can be positive (positive association) or negative (negative association).
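These identities are easy to verify numerically. The sketch below implements covariance, variance, and correlation from the definitions (population convention, dividing by $n$) and checks the variance-of-a-sum formula on a small set of hypothetical, positively associated values:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Cov(X,Y) = E[(X - mu_X)(Y - mu_Y)], population convention (divide by n).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    return cov(xs, xs)  # Var(X) = Cov(X, X)

def corr(xs, ys):
    return cov(xs, ys) / math.sqrt(var(xs) * var(ys))

# Hypothetical toy data with a positive association.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 2.5, 3.5, 5.0]

# Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y) holds for any pair of variables.
s = [a + b for a, b in zip(x, y)]
lhs = var(s)
rhs = var(x) + var(y) + 2 * cov(x, y)
print(corr(x, y), abs(lhs - rhs) < 1e-9)
```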

Markov's and Chebyshev's Inequalities

Theorem (Markov's Inequality)

For any non-negative random variable $X$ and any $a > 0$:

$$P(X \geq a) \leq \frac{E[X]}{a}$$
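As a quick sanity check of Markov's inequality, the sketch below compares the exact tail probability of a fair die with the bound $E[X]/a$ at $a = 5$:

```python
from fractions import Fraction

# Fair die: E[X] = 7/2. Check P(X >= a) <= E[X]/a at a = 5.
outcomes = range(1, 7)
E_X = Fraction(sum(outcomes), 6)                          # 7/2
a = 5
p_tail = Fraction(sum(1 for x in outcomes if x >= a), 6)  # 2/6 = 1/3
bound = E_X / a                                           # 7/10
print(p_tail, "<=", bound, ":", p_tail <= bound)
```

Note how loose the bound can be: the true tail is $1/3$, while Markov only guarantees it is at most $7/10$.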

Theorem (Chebyshev's Inequality)

For any random variable $X$ with mean $\mu$, finite variance, and any $k > 0$:

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$

Equivalently, for any $k > 0$:

$$P(|X - \mu| \geq k) \leq \frac{\text{Var}(X)}{k^2}$$

Example

Chebyshev with $k = 2$: at least $1 - \frac{1}{4} = 75\%$ of the distribution lies within 2 standard deviations of the mean.

With $k = 3$: at least $1 - \frac{1}{9} \approx 88.9\%$ lies within 3 standard deviations.

These bounds are universal—they hold for any distribution with finite variance.
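The universality comes at the price of looseness. The sketch below computes the exact two-sided tail $P(|X - \mu| \geq k\sigma)$ for a fair die at $k = 2$ and compares it with the Chebyshev bound $1/k^2$ (working with squared distances to stay in exact rational arithmetic):

```python
from fractions import Fraction

# Fair die: mu = 7/2, Var = 35/12. Compare the exact tail with Chebyshev at k = 2.
outcomes = range(1, 7)
mu = Fraction(7, 2)
variance = sum((Fraction(x) - mu) ** 2 for x in outcomes) / 6  # 35/12

k = 2
threshold_sq = k * k * variance  # compare (x - mu)^2 to k^2 * sigma^2, avoiding sqrt
p_tail = Fraction(sum(1 for x in outcomes if (Fraction(x) - mu) ** 2 >= threshold_sq), 6)
print(p_tail, "<=", Fraction(1, k * k))  # exact tail is 0, bound is 1/4
```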

Moment Generating Functions Revisited

The MGF uniquely determines a distribution and has useful properties:

Uniqueness: If $M_X(t) = M_Y(t)$ for all $t$ in an interval around 0, then $X$ and $Y$ have the same distribution.

Sum of Independent Variables: If $X$ and $Y$ are independent:

$$M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]\,E[e^{tY}] = M_X(t) \cdot M_Y(t)$$

Example

If $X_1, \ldots, X_n$ are independent $\mathcal{N}(\mu, \sigma^2)$, then $\sum X_i \sim \mathcal{N}(n\mu, n\sigma^2)$:

$$M_{\sum X_i}(t) = \prod_{i=1}^n e^{\mu t + \sigma^2 t^2/2} = e^{n\mu t + n\sigma^2 t^2/2}$$
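The MGF prediction can be checked by simulation. The sketch below (with hypothetical parameters $\mu = 1$, $\sigma = 2$, $n = 5$) draws many sums of independent normals and compares the sample mean and variance of the sum against $n\mu$ and $n\sigma^2$:

```python
import random
import statistics

# Simulate sums of n independent N(mu, sigma^2) draws and compare the
# sample mean and variance of the sum with n*mu and n*sigma^2.
random.seed(0)
mu, sigma, n, trials = 1.0, 2.0, 5, 200_000

sums = [sum(random.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]
m = statistics.fmean(sums)
v = statistics.pvariance(sums)
print(round(m, 2), round(v, 2))  # close to n*mu = 5.0 and n*sigma^2 = 20.0
```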

Remark

These properties form the foundation for statistical inference. Chebyshev's inequality guarantees concentration around the mean, while the MGF simplifies calculations involving sums of independent variables—essential for the Central Limit Theorem.