Lecture Note: §4 Expectation

Published: November 24, 2025

Last Update: 2025-11-24

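The probability distribution is the most complete description of the probabilistic behavior of a random variable.
The numerical characteristics of a random variable, by contrast, are constants determined by its distribution; each one describes a single aspect of the random variable (or of its distribution).
In this section, we describe several such summary quantities that can be derived from a probability distribution (given by either a p.d.f. or a p.m.f.).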

Expectation for a Discrete Distribution
- \(E(X) = \sum_{\text{All } x}xf(x)\)
- Examples: Bernoulli, Binomial, Geometric, Poisson Distributions
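The sum above can be evaluated directly for any finite p.m.f. The following is a minimal sketch in Python using exact fractions; the helper name `expectation` and the two example distributions (a fair die and a Binomial with \(n = 4, p = 1/3\)) are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from math import comb

def expectation(pmf):
    """E(X) = sum of x * f(x) over the support; pmf maps x -> f(x)."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: f(x) = 1/6 for x = 1, ..., 6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
die_mean = expectation(die)

# Binomial(n, p) p.m.f.: f(k) = C(n, k) p^k (1 - p)^(n - k).
n, p = 4, Fraction(1, 3)
binom = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
binom_mean = expectation(binom)

print(die_mean, binom_mean)
```

For the Binomial case the result agrees with the closed form \(E(X) = np\).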
Expectation for a Continuous Distribution
- \(E(X) = \int_{-\infty}^{\infty} xf(x)dx\)
- Examples: Exponential, Normal, Cauchy Distributions
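As a worked example of the integral above, take the Exponential distribution in its rate parameterization, \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) (the parameterization is an assumption here, chosen to agree with \(E(X) = 1/\lambda\)). Integration by parts gives

\[
E(X) = \int_{0}^{\infty} x \lambda e^{-\lambda x}dx = \left[-x e^{-\lambda x}\right]_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda x}dx = 0 + \frac{1}{\lambda}.
\]

The standard Cauchy distribution, with \(f(x) = 1/[\pi(1 + x^{2})]\), shows that the integral need not exist:

\[
\int_{0}^{\infty} \frac{x}{\pi(1 + x^{2})}dx = \lim_{b \to \infty} \frac{1}{2\pi}\ln(1 + b^{2}) = \infty,
\]

so \(\int_{-\infty}^{\infty} |x| f(x)dx\) diverges and \(E(X)\) is undefined.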
Theorem: Law of the Unconscious Statistician
- Continuous case (p.d.f. \(f\)): \(E[r(X)] = \int_{-\infty}^{\infty} r(x) f(x) dx\)
- Discrete case (p.m.f. \(f\)): \(E[r(X)] = \sum_{\text{All } x} r(x) f(x)\)
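The point of the theorem is that \(E[r(X)]\) can be computed without first deriving the distribution of \(Y = r(X)\). A small sketch comparing the two routes, assuming a fair die for \(X\) and the illustrative choice \(r(x) = (x - 3)^{2}\):

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}

def r(x):
    # Illustrative transformation; it is not one-to-one on the support.
    return (x - 3) ** 2

# Route 1 (the theorem): weight r(x) by the p.m.f. of X.
lotus = sum(r(x) * p for x, p in pmf_x.items())

# Route 2: build the p.m.f. of Y = r(X) first, then take its mean.
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[r(x)] += p
direct = sum(y * p for y, p in pmf_y.items())

print(lotus, direct)
```

Both routes give the same value; the first skips the bookkeeping needed to collapse the values of \(x\) that share the same \(r(x)\).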
Properties of Expectations
- Linear Function \(Y = aX +b\): \(E(Y) = aE(X)+b\)
- Linear Combination: \(E(a_1X_1 + \cdots + a_nX_n + b) = a_1E(X_1) + \cdots + a_nE(X_n) + b\)
- Independent Random Variables: \(E\left(\prod_{i=1}^{n} X_i\right) = \prod_{i=1}^{n} E(X_i)\)
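Linearity holds for any random variables, while the product rule relies on independence. The sketch below checks both on two fair dice and then shows the product rule failing for the fully dependent case \(Y = X\); the coefficients \(2, 3, 1\) are arbitrary illustrative values.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)
mean = sum(x * p for x in faces)  # E(X) for one fair die

# Independent dice: the joint p.m.f. factorizes as p * p.
e_linear = sum((2 * x + 3 * y + 1) * p * p for x, y in product(faces, faces))
e_product = sum(x * y * p * p for x, y in product(faces, faces))

# Dependent case Y = X: E(XY) becomes E(X^2).
e_square = sum(x * x * p for x in faces)

print(e_linear == 2 * mean + 3 * mean + 1)  # linearity
print(e_product == mean * mean)             # product rule under independence
print(e_square == mean * mean)              # fails without independence
```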
Median
- Let \(X\) be a random variable. Every number \(m\) with the following property is called a median of the distribution of \(X\): \(\Pr(X \leq m) \geq 1/2\) and \(\Pr(X \geq m) \geq 1/2\)
- For a continuous distribution, this reduces to \(\Pr(X \leq m) = 1/2\)
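The definition allows more than one median. A short sketch that tests the two inequalities for a fair die, where every \(m\) in \([3, 4]\) qualifies; `is_median` is a hypothetical helper name.

```python
from fractions import Fraction

def is_median(pmf, m):
    """Check Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2 for a discrete p.m.f."""
    left = sum(p for x, p in pmf.items() if x <= m)
    right = sum(p for x, p in pmf.items() if x >= m)
    half = Fraction(1, 2)
    return left >= half and right >= half

die = {x: Fraction(1, 6) for x in range(1, 7)}

for m in (2, 3, 3.5, 4, 5):
    print(m, is_median(die, m))
```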
Conditional Expectation
- The conditional expectation (or conditional mean) of \(Y\) given \(X = x\) is denoted by \(E(Y \vert x)\) and is defined to be the expectation of the conditional distribution of \(Y\) given \(X=x\).
- Continuous case: \(E(Y\vert x) = \int_{-\infty}^{\infty} y g_2(y \vert x)dy\)
- Discrete case: \(E(Y\vert x) = \sum_{\text{All }y} y g_2(y\vert x)\)
- \(E(E(Y \vert X)) = E(Y)\)
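The identity \(E(E(Y \vert X)) = E(Y)\) can be verified on a small joint p.m.f. The table below is hypothetical, chosen only so that the arithmetic is easy to follow; `marginal_x` and `cond_exp_y` are illustrative helper names.

```python
from fractions import Fraction as F

# Hypothetical joint p.m.f. of (X, Y) on {0, 1} x {0, 1}.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def marginal_x(joint):
    """Marginal p.m.f. of X: sum the joint p.m.f. over y."""
    out = {}
    for (x, _), p in joint.items():
        out[x] = out.get(x, 0) + p
    return out

def cond_exp_y(joint, x0):
    """E(Y | X = x0): mean of the conditional p.m.f. g2(y | x0)."""
    fx = marginal_x(joint)[x0]
    return sum(y * p / fx for (x, y), p in joint.items() if x == x0)

f1 = marginal_x(joint)
tower = sum(cond_exp_y(joint, x) * px for x, px in f1.items())
e_y = sum(y * p for (_, y), p in joint.items())

print(cond_exp_y(joint, 0), cond_exp_y(joint, 1), tower, e_y)
```

Averaging the conditional means with the marginal weights of \(X\) recovers the unconditional mean of \(Y\).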

The variance is a measure of the “spread” of a distribution, often denoted by \(\sigma^{2}\).

Variance \(\text{Var}(X) = E[(X - \mu)^{2}]\), where \(\mu = E(X)\).
- Standard Deviation \(\sigma = \sqrt{\text{Var}(X)}\)
- \(\text{Var}(X) = E(X^{2}) - [E(X)]^{2}\)
- Examples: Bernoulli, Binomial, Geometric, Poisson, Exponential, Normal Distributions
Properties of the Variance
- \(\text{Var}(aX + b) = a^{2} \text{Var}(X)\)
- If \(X_1, \ldots, X_n\) are independent random variables with finite means, then \(\text{Var}(a_1 X_1 +\cdots + a_n X_n)=a_1^{2}\text{Var}(X_1)+\cdots+a_n^{2}\text{Var}(X_n)\).
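The definition, the shortcut formula, and the scaling rule can all be checked on one example. A minimal sketch for a fair die, with \(a = 3\) and \(b = 10\) as arbitrary illustrative constants:

```python
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var_definition(pmf):
    """Var(X) = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def var_shortcut(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Distribution of aX + b: the same probabilities on shifted and scaled values.
a, b = 3, 10
scaled = {a * x + b: p for x, p in die.items()}

print(var_definition(die), var_shortcut(die), var_definition(scaled))
```

The shift \(b\) moves the mean but leaves the spread unchanged, which is why it drops out of \(\text{Var}(aX + b)\).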
Interquartile Range (IQR)
- IQR \(= F^{-1}(0.75) - F^{-1}(0.25)\), where \(F^{-1}\) is the quantile function of \(X\)
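When the c.d.f. can be inverted in closed form, the IQR follows directly. A sketch for the Exponential distribution, assuming the rate parameterization \(F(x) = 1 - e^{-\lambda x}\), so that \(F^{-1}(p) = -\ln(1 - p)/\lambda\); the function names are hypothetical.

```python
import math

def exp_quantile(p, lam):
    """Quantile function of the Exponential distribution with rate lam."""
    return -math.log(1.0 - p) / lam

def exp_iqr(lam):
    """IQR = F^{-1}(0.75) - F^{-1}(0.25)."""
    return exp_quantile(0.75, lam) - exp_quantile(0.25, lam)

lam = 2.0
print(exp_iqr(lam), math.log(3) / lam)
```

Under this assumption the difference simplifies to \(\ln(3)/\lambda\), which the second printed value reproduces.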

Bernoulli Distribution with parameter \(p\): \( E(X) = p, \text{Var}(X) = p(1-p) \)
Binomial Distribution \( X \sim B(n,p) \): \( E(X) = np, \text{Var}(X) = np(1-p) \)
Poisson Distribution \( X \sim P(\lambda) \): \( E(X) = \lambda, \text{Var}(X) = \lambda \)
Uniform Distribution \( X \sim U(a,b) \): \( E(X) = (a + b)/2, \text{Var}(X) = (b - a)^{2}/12 \)
Exponential Distribution with rate \(\lambda\): \( E(X) = 1/\lambda, \text{Var}(X) = 1/\lambda^{2} \)
Normal Distribution \( X \sim N(\mu, \sigma^{2}) \): \( E(X) = \mu, \text{Var}(X) = \sigma^{2} \)
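Entries in this list can be sanity-checked numerically. The sketch below does so for the Poisson distribution by truncating the infinite sums at \(k = 59\); the truncation point and \(\lambda = 2.5\) are illustrative assumptions, and the neglected tail is negligible for this \(\lambda\).

```python
import math

def poisson_pmf(k, lam):
    """f(k) = exp(-lam) * lam^k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.5
ks = range(60)  # truncated support (assumption: the tail mass is negligible)

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
second_moment = sum(k * k * poisson_pmf(k, lam) for k in ks)
variance = second_moment - mean**2

print(total, mean, variance)
```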

Moments
- The \(k\)th moment of \(X\): \(E(X^{k})\)
- The \(k\)th central moment of \(X\): \(E[(X-E(X))^{k}]\)
Moment Generating Function \(\Psi(t) = E(e^{tX})\)
- If \(\Psi(t)\) is finite in an open interval around \(t = 0\), then \(E(X^{n}) = \Psi^{(n)}(0)\), for \(n = 1,2,\ldots\)
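As a worked example, take the Exponential distribution with rate \(\lambda\), assuming the p.d.f. \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\). Its moment generating function is

\[
\Psi(t) = \int_{0}^{\infty} e^{tx} \lambda e^{-\lambda x}dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda .
\]

Differentiating twice gives

\[
\Psi'(t) = \frac{\lambda}{(\lambda - t)^{2}}, \qquad \Psi''(t) = \frac{2\lambda}{(\lambda - t)^{3}},
\]

so \(E(X) = \Psi'(0) = 1/\lambda\), \(E(X^{2}) = \Psi''(0) = 2/\lambda^{2}\), and \(\text{Var}(X) = 2/\lambda^{2} - 1/\lambda^{2} = 1/\lambda^{2}\).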

Covariance
- \(\text{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]\)
- \(\text{Cov}(X,Y) = E(XY) - E(X)E(Y)\)
- \([\text{Cov}(X,Y)]^{2} \leq \sigma_X^{2}\sigma_Y^{2}\)
Correlation
- \(\rho(X,Y) = \dfrac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}\)
- The correlation coefficient is best read as a measure of how close the relationship between \(X\) and \(Y\) is to linear, not as a general measure of dependence.
Properties
- \(\text{Var}(aX + bY +c) = a^{2}\text{Var}(X) + b^{2}\text{Var}(Y) + 2ab\text{Cov}(X,Y)\)
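The covariance formulas, the bound on \([\text{Cov}(X,Y)]^{2}\), and the variance of a linear combination can be checked together. The sketch below uses a hypothetical joint p.m.f. and arbitrary constants \(a = 2, b = -3, c = 7\); `E` is an illustrative helper that averages any function of \((X, Y)\).

```python
import math
from fractions import Fraction as F

# Hypothetical joint p.m.f. of (X, Y), chosen only for illustration.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def E(g):
    """Expectation of g(X, Y) under the joint p.m.f."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)

# Covariance by the definition and by the shortcut formula.
cov_def = E(lambda x, y: (x - mu_x) * (y - mu_y))
cov_short = E(lambda x, y: x * y) - mu_x * mu_y

# Correlation: covariance rescaled by the two standard deviations.
rho = float(cov_def) / math.sqrt(var_x * var_y)

# Var(aX + bY + c), computed directly and via the formula.
a, b, c = 2, -3, 7
mu_z = E(lambda x, y: a * x + b * y + c)
var_direct = E(lambda x, y: (a * x + b * y + c - mu_z) ** 2)
var_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_def

print(cov_def, cov_short, rho, var_direct, var_formula)
```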
Uncorrelated does not imply independent
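A standard counterexample, sketched here under the assumption that \(X\) is uniform on \(\{-1, 0, 1\}\) and \(Y = X^{2}\): the covariance is zero even though \(Y\) is a deterministic function of \(X\).

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is completely determined by X.
pmf_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}

e_x = sum(x * p for x, p in pmf_x.items())
e_y = sum(x**2 * p for x, p in pmf_x.items())
e_xy = sum(x * x**2 * p for x, p in pmf_x.items())
cov = e_xy - e_x * e_y

# Independence would require Pr(X=0, Y=0) = Pr(X=0) * Pr(Y=0).
p_joint = pmf_x[0]               # Y = 0 exactly when X = 0
p_product = pmf_x[0] * pmf_x[0]  # Pr(X=0) * Pr(Y=0), where Pr(Y=0) = Pr(X=0)

print(cov, p_joint, p_product)
```

The covariance vanishes because the dependence is symmetric around zero, not linear.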
Correlation does not imply causation
- Spurious correlation examples.
- Simpson’s paradox
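Simpson's paradox can be reproduced with a handful of counts. The numbers below are hypothetical, constructed only to exhibit the reversal: option A has the higher success rate within each group, yet option B has the higher rate once the groups are pooled, because the two options are applied to the groups in very different proportions.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts per option and group.
counts = {
    "A": {"group 1": (8, 10), "group 2": (20, 90)},
    "B": {"group 1": (63, 90), "group 2": (2, 10)},
}

def rate(successes, trials):
    return Fraction(successes, trials)

def pooled_rate(groups):
    """Success rate after merging all groups for one option."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return Fraction(total_successes, total_trials)

a_wins_each_group = all(
    rate(*counts["A"][g]) > rate(*counts["B"][g]) for g in counts["A"]
)
b_wins_pooled = pooled_rate(counts["B"]) > pooled_rate(counts["A"])

print(a_wins_each_group, b_wins_pooled)
```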