# Knowledge Card: Expectation, Variance, and Covariance

## Basic Information

| Attribute | Content |
|---|---|
| Topic | Expectation, variance, and covariance |
| Mastery level | ★★★★★ |
| Priority | P0 |
| Estimated time | 4 hours |
| Interview frequency | ★★★★★ |
## Core Principles

### Expectation

Discrete: E[X] = Σ x_i · P(X = x_i)
Continuous: E[X] = ∫ x f(x) dx
Property (linearity): E[aX + bY] = aE[X] + bE[Y]

### Variance

Var[X] = E[(X − μ)²] = E[X²] − (E[X])²

Properties:
- Var[aX] = a² Var[X]
- Standard deviation: σ = √Var[X]

### Covariance
Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)] = E[XY] - E[X]E[Y]
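The identity Cov(X, Y) = E[XY] − E[X]E[Y] is easy to check numerically; here is a quick sketch on correlated toy data (the variables and the 0.5 coefficient are made up for illustration):

```python
import numpy as np

# Check Cov(X, Y) = E[XY] - E[X]E[Y] on correlated samples.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # y correlated with x, Cov(x, y) ≈ 0.5

lhs = np.mean((x - x.mean()) * (y - y.mean()))  # definition
rhs = np.mean(x * y) - x.mean() * y.mean()      # identity
print(lhs, rhs)  # both ≈ 0.5
```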
Covariance matrix:

Σ = [[Var(X₁),    Cov(X₁,X₂), ..., Cov(X₁,Xₙ)],
     [Cov(X₂,X₁), Var(X₂),    ..., Cov(X₂,Xₙ)],
     ...
     [Cov(Xₙ,X₁), Cov(Xₙ,X₂), ..., Var(Xₙ)]]

## Applications in Deep Learning
### 1. Statistics in BatchNorm

```python
import torch
import torch.nn as nn

# BatchNorm computes mean and variance over the batch dimension:
# μ = E_batch[x], σ² = Var_batch[x]
class BatchNorm2d_Scratch(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.eps = eps
        self.momentum = momentum
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))

    def forward(self, x):
        # x: (B, C, H, W)
        if self.training:
            # Compute statistics over the (B, H, W) dimensions
            mean = x.mean(dim=(0, 2, 3))                # per-channel mean
            var = x.var(dim=(0, 2, 3), unbiased=False)  # per-channel variance
            # Update running statistics (exponential moving average)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean = self.running_mean
            var = self.running_var
        # Normalize, then apply the learned affine transform
        x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var.view(1, -1, 1, 1) + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)
```

### 2. Variance Analysis in Initialization
```python
"""
Xavier/Glorot initialization: keep the variance unchanged through both the
forward and backward pass.

Forward requirement:  Var(output) = Var(input)
  → n_in * Var(w) = 1
  → Var(w) = 1 / n_in
Backward requirement: Var(grad_input) = Var(grad_output)
  → n_out * Var(w) = 1
  → Var(w) = 1 / n_out
Compromise:      Var(w) = 2 / (n_in + n_out)  (Xavier)
ReLU correction: Var(w) = 2 / n_in            (He/Kaiming)
"""
import math
import torch

def xavier_init(weight, gain=1.0):
    """Xavier initialization."""
    fan_in, fan_out = weight.size(1), weight.size(0)
    std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    with torch.no_grad():
        weight.normal_(0, std)

def kaiming_init(weight, mode='fan_in', nonlinearity='relu'):
    """He/Kaiming initialization."""
    fan = weight.size(1) if mode == 'fan_in' else weight.size(0)
    gain = math.sqrt(2.0)  # gain for ReLU
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        weight.normal_(0, std)
```

### 3. The Covariance Matrix in PCA
```python
import numpy as np

def pca_via_covariance(X, n_components):
    """PCA via eigendecomposition of the covariance matrix."""
    # Center the data
    X_centered = X - X.mean(axis=0)
    # Covariance matrix
    cov_matrix = X_centered.T @ X_centered / (len(X) - 1)
    # Eigendecomposition (eigh is for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    # Sort by eigenvalue in descending order
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Keep the top n_components principal components
    components = eigenvectors[:, :n_components]
    X_reduced = X_centered @ components
    return X_reduced, components, eigenvalues[:n_components]
```

## High-Frequency Interview Questions
### Q1: Why does the unbiased variance divide by n − 1 instead of n?

Answer:

Biased estimator:   s²_b = (1/n) · Σ(x_i − x̄)²
Unbiased estimator: s²_u = (1/(n−1)) · Σ(x_i − x̄)²

E[s²_b] = (n−1)/n · σ² ≠ σ²  (biased)
E[s²_u] = σ²                 (unbiased)

Reason: substituting the sample mean x̄ for the true mean μ costs one degree of freedom.

In deep learning:
- BatchNorm uses the biased estimator during training (unbiased=False), since the batch is large enough
- Normalization at inference time uses the global (running) statistics

### Q2: What is the covariance matrix used for?
Answer:
- PCA: the eigenvectors of the covariance matrix are the principal component directions
- Feature analysis: a correlation close to 1 means two features are strongly related and possibly redundant
- Whitening: use the covariance matrix (its inverse square root) to decorrelate features
- Mahalanobis distance: d(x, y) = √((x − y)ᵀ Σ⁻¹ (x − y))
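As a concrete illustration of the Mahalanobis distance formula above, a minimal sketch (the `mahalanobis` helper and the toy Σ are made up for illustration):

```python
import numpy as np

# Hypothetical helper: Mahalanobis distance between x and y under covariance Sigma.
def mahalanobis(x, y, Sigma):
    diff = x - y
    return np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)

Sigma = np.array([[2.0, 0.0],
                  [0.0, 0.5]])
x = np.array([1.0, 1.0])
y = np.array([0.0, 0.0])
# Unlike Euclidean distance, each dimension is scaled by its variance:
# d² = 1/2 + 1/0.5 = 2.5
print(mahalanobis(x, y, Sigma))  # ≈ 1.5811
```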
### Q3: What does the central limit theorem mean for deep learning?

Answer:
- The sum of many independent random variables approaches a Gaussian distribution
- Explains why Gaussian initialization is reasonable
- Explains why sample means can be used to estimate parameters
- In BatchNorm, feature statistics are more stable with larger batches
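The first point is easy to demonstrate empirically; a small sketch summing uniform variables (the sample sizes are arbitrary):

```python
import numpy as np

# CLT sketch: standardized sums of i.i.d. Uniform(0, 1) draws approach N(0, 1).
rng = np.random.default_rng(0)
n, k = 100_000, 50
sums = rng.uniform(size=(n, k)).sum(axis=1)
# A Uniform(0, 1) sum of k terms has mean k/2 and variance k/12
z = (sums - k * 0.5) / np.sqrt(k / 12)

# If z is ≈ N(0, 1), about 68% of the mass lies within one standard deviation
print(np.mean(np.abs(z) < 1))  # ≈ 0.68
```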
## Exercises

```python
import numpy as np
import torch

# 1. Verify the unbiased variance estimator
np.random.seed(42)
population = np.random.randn(10000)  # the "true" distribution
true_var = population.var()

# Draw repeated samples and compute biased/unbiased estimates
n_trials = 1000
biased_ests = []
unbiased_ests = []
for _ in range(n_trials):
    sample = np.random.choice(population, size=10)
    biased_ests.append(sample.var())          # divides by n (default)
    unbiased_ests.append(sample.var(ddof=1))  # divides by n - 1
print(f"True variance: {true_var:.4f}")
print(f"Mean of biased estimates: {np.mean(biased_ests):.4f}")      # too small
print(f"Mean of unbiased estimates: {np.mean(unbiased_ests):.4f}")  # close to the truth

# 2. Compute a covariance matrix
X = np.random.randn(100, 5)
cov = np.cov(X, rowvar=False)  # (5, 5)
print(f"Covariance matrix shape: {cov.shape}")
print(f"Trace (total variance): {np.trace(cov):.4f}")

# 3. Verify BatchNorm's variance computation
x = torch.randn(16, 3, 32, 32)
bn_mean = x.mean(dim=(0, 2, 3))
bn_var_biased = x.var(dim=(0, 2, 3), unbiased=False)
bn_var_unbiased = x.var(dim=(0, 2, 3), unbiased=True)
print(f"Biased variance: {bn_var_biased}")
print(f"Unbiased variance: {bn_var_unbiased}")
# The biased variance is slightly smaller than the unbiased one
```
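As a further exercise tying back to section 2, a sketch checking that He initialization keeps activation magnitudes stable through stacked ReLU layers (layer sizes and depth are arbitrary):

```python
import math
import torch

# With He init (std = sqrt(2 / n_in)), the mean squared activation
# E[x²] stays roughly constant across ReLU layers instead of
# vanishing or exploding.
torch.manual_seed(0)
n_in, n_out, batch = 512, 512, 4096
x = torch.randn(batch, n_in)  # input with unit second moment

for layer in range(5):
    w = torch.randn(n_out, n_in) * math.sqrt(2.0 / n_in)  # He/Kaiming
    x = torch.relu(x @ w.T)
    print(f"layer {layer}: E[x²] = {(x ** 2).mean().item():.3f}")
# Each printed value stays close to 1
```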