
Expectation and Variance - Key Properties

Advanced properties of expectation and variance provide powerful tools for probability calculations and reveal deep connections between different concepts.

Conditional Expectation

Definition

The conditional expectation of $X$ given event $B$ (with $P(B) > 0$) is:

$$E[X|B] = \sum_x x \cdot P(X=x|B) \quad \text{(discrete)}$$

$$E[X|B] = \int_{-\infty}^{\infty} x \cdot f_{X|B}(x)\,dx \quad \text{(continuous)}$$

Law of Total Expectation: If $\{B_i\}$ is a partition of $\Omega$, then

$$E[X] = \sum_i E[X|B_i]\,P(B_i)$$

Example

A fair die is rolled. If the outcome is even, you win the value shown. If odd, you win 0. Expected winnings:

Let $B_1$ = "even" and $B_2$ = "odd". Then:

$$E[X|B_1] = \frac{2+4+6}{3} = 4, \qquad E[X|B_2] = 0$$

$$E[X] = 4 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = 2$$
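The law of total expectation can be checked exactly for this die game. The sketch below enumerates the outcomes and combines the two conditional means with their event probabilities (using `Fraction` to keep the arithmetic exact):

```python
from fractions import Fraction

# Fair die: winnings equal the face value on even outcomes, 0 on odd ones.
outcomes = [1, 2, 3, 4, 5, 6]
winnings = {w: (w if w % 2 == 0 else 0) for w in outcomes}

# Conditional expectations over each block of the partition {even, odd}.
evens = [w for w in outcomes if w % 2 == 0]
odds = [w for w in outcomes if w % 2 == 1]
E_given_even = Fraction(sum(winnings[w] for w in evens), len(evens))  # (2+4+6)/3 = 4
E_given_odd = Fraction(sum(winnings[w] for w in odds), len(odds))     # 0

# Law of total expectation: weight each conditional mean by its event's probability.
E_X = E_given_even * Fraction(1, 2) + E_given_odd * Fraction(1, 2)
print(E_X)  # 2
```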

Covariance and Correlation

Definition

The covariance between random variables $X$ and $Y$ is:

$$\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - E[X]E[Y]$$

The correlation coefficient is:

$$\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X = \sqrt{\text{Var}(X)}$ and $\sigma_Y = \sqrt{\text{Var}(Y)}$.

Properties:

  • If $X$ and $Y$ are independent, then $\text{Cov}(X,Y) = 0$ (but not conversely!)
  • $-1 \leq \rho(X,Y) \leq 1$
  • $|\rho| = 1$ if and only if $Y = aX + b$ for some constants $a \neq 0$ and $b$

Example

For any random variables $X$ and $Y$:

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X,Y)$$

If $X$ and $Y$ are independent, the covariance term vanishes and $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$.

For dependent variables, the covariance term can be positive (positive association) or negative (negative association).
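These identities are easy to verify numerically. The sketch below implements covariance, variance, and correlation from the definitions (population convention, dividing by $n$) and checks the variance-of-a-sum formula on a small set of hypothetical, positively associated values:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Cov(X,Y) = E[(X - mu_X)(Y - mu_Y)], population convention (divide by n).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    return cov(xs, xs)  # Var(X) = Cov(X, X)

def corr(xs, ys):
    return cov(xs, ys) / math.sqrt(var(xs) * var(ys))

# Hypothetical toy data with a positive association.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 2.5, 3.5, 5.0]

# Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y) holds for any pair of variables.
s = [a + b for a, b in zip(x, y)]
lhs = var(s)
rhs = var(x) + var(y) + 2 * cov(x, y)
print(corr(x, y), abs(lhs - rhs) < 1e-9)
```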

Markov's and Chebyshev's Inequalities

Theorem (Markov's Inequality)

For any non-negative random variable $X$ and any $a > 0$:

$$P(X \geq a) \leq \frac{E[X]}{a}$$
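As a quick sanity check of Markov's inequality, the sketch below compares the exact tail probability of a fair die with the bound $E[X]/a$ at $a = 5$:

```python
from fractions import Fraction

# Fair die: E[X] = 7/2. Check P(X >= a) <= E[X]/a at a = 5.
outcomes = range(1, 7)
E_X = Fraction(sum(outcomes), 6)                          # 7/2
a = 5
p_tail = Fraction(sum(1 for x in outcomes if x >= a), 6)  # 2/6 = 1/3
bound = E_X / a                                           # 7/10
print(p_tail, "<=", bound, ":", p_tail <= bound)
```

Note how loose the bound can be: the true tail is $1/3$, while Markov only guarantees it is at most $7/10$.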

Theorem (Chebyshev's Inequality)

For any random variable $X$ with mean $\mu$, finite variance, and any $k > 0$:

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$

Equivalently, for any $k > 0$:

$$P(|X - \mu| \geq k) \leq \frac{\text{Var}(X)}{k^2}$$

Example

Chebyshev with $k = 2$: at least $1 - \frac{1}{4} = 75\%$ of the distribution lies within 2 standard deviations of the mean.

With $k = 3$: at least $1 - \frac{1}{9} \approx 88.9\%$ lies within 3 standard deviations.

These bounds are universal—they hold for any distribution with finite variance.
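The universality comes at the price of looseness. The sketch below computes the exact two-sided tail $P(|X - \mu| \geq k\sigma)$ for a fair die at $k = 2$ and compares it with the Chebyshev bound $1/k^2$ (working with squared distances to stay in exact rational arithmetic):

```python
from fractions import Fraction

# Fair die: mu = 7/2, Var = 35/12. Compare the exact tail with Chebyshev at k = 2.
outcomes = range(1, 7)
mu = Fraction(7, 2)
variance = sum((Fraction(x) - mu) ** 2 for x in outcomes) / 6  # 35/12

k = 2
threshold_sq = k * k * variance  # compare (x - mu)^2 to k^2 * sigma^2, avoiding sqrt
p_tail = Fraction(sum(1 for x in outcomes if (Fraction(x) - mu) ** 2 >= threshold_sq), 6)
print(p_tail, "<=", Fraction(1, k * k))  # exact tail is 0, bound is 1/4
```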

Moment Generating Functions Revisited

The MGF uniquely determines a distribution and has useful properties:

Uniqueness: If $M_X(t) = M_Y(t)$ for all $t$ in an interval around 0, then $X$ and $Y$ have the same distribution.

Sum of Independent Variables: If $X$ and $Y$ are independent:

$$M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]\,E[e^{tY}] = M_X(t) \cdot M_Y(t)$$

Example

If $X_1, \ldots, X_n$ are independent $\mathcal{N}(\mu, \sigma^2)$, then $\sum X_i \sim \mathcal{N}(n\mu, n\sigma^2)$:

$$M_{\sum X_i}(t) = \prod_{i=1}^n e^{\mu t + \sigma^2 t^2/2} = e^{n\mu t + n\sigma^2 t^2/2}$$
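The MGF prediction can be checked by simulation. The sketch below (with hypothetical parameters $\mu = 1$, $\sigma = 2$, $n = 5$) draws many sums of independent normals and compares the sample mean and variance of the sum against $n\mu$ and $n\sigma^2$:

```python
import random
import statistics

# Simulate sums of n independent N(mu, sigma^2) draws and compare the
# sample mean and variance of the sum with n*mu and n*sigma^2.
random.seed(0)
mu, sigma, n, trials = 1.0, 2.0, 5, 200_000

sums = [sum(random.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]
m = statistics.fmean(sums)
v = statistics.pvariance(sums)
print(round(m, 2), round(v, 2))  # close to n*mu = 5.0 and n*sigma^2 = 20.0
```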

Remark

These properties form the foundation for statistical inference. Chebyshev's inequality guarantees concentration around the mean, while the MGF simplifies calculations involving sums of independent variables—essential for the Central Limit Theorem.