Knowledge Card: Expectation, Variance, and Covariance

Basic Information

- Topic: Expectation, Variance, and Covariance
- Mastery level: ★★★★★
- Learning priority: P0
- Estimated time: 4 hours
- Interview frequency: ★★★★★

Core Principles

Expectation

Discrete:   E[X] = Σ x_i * P(X=x_i)
Continuous: E[X] = ∫ x f(x) dx

Property: E[aX + bY] = aE[X] + bE[Y] (linearity); this holds even when X and Y are dependent.
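
A quick numerical check of linearity (a minimal sketch; the correlated x, y and the coefficients a, b are arbitrary choices for illustration):

python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)  # deliberately dependent on x
a, b = 2.0, -3.0

# The sample mean is itself linear, mirroring linearity of expectation;
# both sides also agree with the theoretical value a*0 + b*0 = 0
print(np.mean(a * x + b * y))           # ≈ 0
print(a * np.mean(x) + b * np.mean(y))  # same value up to rounding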

Variance

Var[X] = E[(X - μ)²] = E[X²] - (E[X])²

Properties:
- Var[aX] = a² Var[X]
- Var[X + Y] = Var[X] + Var[Y] + 2 Cov(X, Y)
- Standard deviation: σ = √Var[X]
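
A minimal numerical check of these identities (the distributions and the constant a are arbitrary choices):

python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)
y = 0.7 * x + rng.normal(size=1_000_000)  # correlated with x
a = 3.0

print(np.var(a * x), a**2 * np.var(x))  # Var[aX] = a² Var[X]
print(np.var(x + y),                    # Var[X+Y] = Var[X] + Var[Y] + 2Cov(X,Y)
      np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1])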

Covariance

Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)] = E[XY] - E[X]E[Y]

Covariance matrix (symmetric, with per-feature variances on the diagonal):
Σ = [[Var(X₁), Cov(X₁,X₂), ..., Cov(X₁,Xₙ)],
     [Cov(X₂,X₁), Var(X₂), ..., Cov(X₂,Xₙ)],
     ...
     [Cov(Xₙ,X₁), Cov(Xₙ,X₂), ..., Var(Xₙ)]]
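
A quick check that the explicit formula matches np.cov (a sketch; the synthetic data and the injected correlation are arbitrary):

python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10_000, 3))
X[:, 1] += 0.8 * X[:, 0]  # make features 0 and 1 correlated

Xc = X - X.mean(axis=0)           # center each column
sigma = Xc.T @ Xc / (len(X) - 1)  # unbiased covariance matrix
print(np.allclose(sigma, np.cov(X, rowvar=False)))  # True
print(np.allclose(sigma, sigma.T))                  # symmetric, as expected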

Applications in Deep Learning

1. Statistics in BatchNorm

python
import torch
import torch.nn as nn

# BatchNorm computes the mean and variance over the batch dimension:
# μ = E_batch[x], σ² = Var_batch[x]

class BatchNorm2d_Scratch(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.eps = eps
        self.momentum = momentum
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))

    def forward(self, x):
        # x: (B, C, H, W)
        if self.training:
            # Compute per-channel statistics over the (B, H, W) dimensions
            mean = x.mean(dim=(0, 2, 3))  # per-channel mean
            var = x.var(dim=(0, 2, 3), unbiased=False)  # per-channel (biased) variance

            # Update running statistics via exponential moving average
            # (note: nn.BatchNorm2d updates running_var with the unbiased variance)
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean = self.running_mean
            var = self.running_var

        # Normalize, then scale and shift
        x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var.view(1, -1, 1, 1) + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)
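
A quick sanity check against PyTorch's built-in layer (a sketch; in training mode both normalize with the batch's biased statistics, so the outputs should match):

python
torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

bn_scratch = BatchNorm2d_Scratch(3).train()
bn_torch = nn.BatchNorm2d(3).train()

print(torch.allclose(bn_scratch(x), bn_torch(x), atol=1e-5))  # True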

2. Variance Analysis in Initialization

python
"""
Xavier/Glorot初始化:保持前向和反向传播的方差不变

前向方差要求:Var(output) = Var(input)
→ n_in * Var(w) = 1
→ Var(w) = 1 / n_in

反向方差要求:Var(grad_output) = Var(grad_input)
→ n_out * Var(w) = 1
→ Var(w) = 1 / n_out

折中:Var(w) = 2 / (n_in + n_out)  (Xavier)
ReLU修正:Var(w) = 2 / n_in        (He/Kaiming)
"""

import math
import torch

def xavier_init(weight, gain=1.0):
    """Xavier初始化"""
    fan_in, fan_out = weight.size(1), weight.size(0)
    std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    with torch.no_grad():
        weight.normal_(0, std)

def kaiming_init(weight, mode='fan_in', nonlinearity='relu'):
    """He/Kaiming初始化"""
    fan = weight.size(1) if mode == 'fan_in' else weight.size(0)
    gain = math.sqrt(2.0)  # gain for ReLU
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        weight.normal_(0, std)
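
An empirical check of the variance argument (a sketch; the width of 512 and depth of 10 are arbitrary): with Kaiming init, the second moment E[x²] of the activations stays roughly constant through a stack of Linear+ReLU layers.

python
torch.manual_seed(0)
x = torch.randn(1024, 512)

for i in range(10):
    w = torch.empty(512, 512)
    kaiming_init(w)              # Var(w) = 2 / fan_in
    x = torch.relu(x @ w.t())
    print(f"layer {i}: E[x²] = {x.pow(2).mean():.3f}")  # stays near 1.0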

3. Covariance Matrices in PCA

python
import numpy as np

def pca_via_covariance(X, n_components):
    """通过协方差矩阵实现PCA"""
    # 中心化
    X_centered = X - X.mean(axis=0)

    # Covariance matrix of the centered data
    cov_matrix = X_centered.T @ X_centered / (len(X) - 1)

    # Eigendecomposition (eigh assumes a symmetric matrix)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort by eigenvalue in descending order
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Project onto the top n principal components
    components = eigenvectors[:, :n_components]
    X_reduced = X_centered @ components

    return X_reduced, components, eigenvalues[:n_components]
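
A usage example with a quick correctness check (a sketch; the synthetic data and injected correlation are arbitrary):

python
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]  # make two features strongly correlated

X_reduced, components, eigvals = pca_via_covariance(X, n_components=2)
print(X_reduced.shape)  # (500, 2)
print(np.allclose(components.T @ components, np.eye(2)))  # orthonormal directions
# Covariance of the projected data is diagonal, with the top eigenvalues on it
print(np.allclose(np.cov(X_reduced, rowvar=False), np.diag(eigvals)))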

Common Interview Questions

Q1: Why does the unbiased variance estimator divide by n-1 instead of n?

Biased estimator:   s²_b = (1/n) * Σ(x_i - x̄)²
Unbiased estimator: s²_u = (1/(n-1)) * Σ(x_i - x̄)²

E[s²_b] = (n-1)/n * σ² ≠ σ²  (biased)
E[s²_u] = σ²                  (unbiased)

Reason: substituting the sample mean x̄ for the true mean μ costs one degree of freedom.
Concretely, Σ(x_i - x̄)² = Σ(x_i - μ)² - n(x̄ - μ)², so
E[Σ(x_i - x̄)²] = nσ² - n·(σ²/n) = (n-1)σ², and dividing by n-1 makes the estimator unbiased.

In deep learning:
- During training, BatchNorm normalizes with the biased estimate (unbiased=False); the bias is negligible when the batch is reasonably large (PyTorch's nn.BatchNorm2d nonetheless tracks running_var with the unbiased estimate)
- At inference, normalization uses the accumulated running statistics instead of per-batch statistics

Q2: What is the covariance matrix used for?

  1. PCA: the eigenvectors of the covariance matrix give the principal component directions
  2. Feature analysis: a correlation coefficient close to ±1 indicates strongly correlated, likely redundant features
  3. Whitening: decorrelate features using the inverse covariance matrix (see the whitening sketch below)
  4. Mahalanobis distance: d(x,y) = √((x-y)^T Σ^{-1} (x-y))
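
A minimal ZCA-whitening sketch for item 3 (the data, seed, and feature coupling are arbitrary; Σ^(-1/2) is built via eigendecomposition):

python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 3))
X[:, 2] += X[:, 0]  # introduce correlation

Xc = X - X.mean(axis=0)
sigma = np.cov(Xc, rowvar=False)

# ZCA whitening: W = Σ^(-1/2), from the eigendecomposition of Σ
eigvals, eigvecs = np.linalg.eigh(sigma)
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
X_white = Xc @ W

# Whitened features are decorrelated with unit variance
print(np.round(np.cov(X_white, rowvar=False), 2))  # ≈ identity matrix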

Q3: Why does the central limit theorem matter in deep learning?

  • The standardized sum of many independent random variables tends to a Gaussian distribution
  • Explains why Gaussian initialization is reasonable: each pre-activation is a weighted sum of many inputs
  • Together with the law of large numbers, explains why sample means are reliable estimates of parameters
  • In BatchNorm, per-feature statistics are more stable with larger batches (see the simulation below)
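
A small simulation of the convergence (the uniform distribution and the values of n are arbitrary choices):

python
import numpy as np

rng = np.random.default_rng(5)

# Standardized sums of n uniform variables approach a Gaussian as n grows
for n in (1, 2, 30):
    s = rng.uniform(-1, 1, size=(100_000, n)).sum(axis=1)
    z = (s - s.mean()) / s.std()
    # For a standard normal, about 68.3% of the mass lies within one std
    print(f"n={n:2d}: P(|Z| < 1) = {(np.abs(z) < 1).mean():.3f}")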

Exercises

python
import numpy as np
import torch

# 1. Verify the unbiased variance estimator
np.random.seed(42)
population = np.random.randn(10000)  # stand-in for the true distribution
true_var = population.var()

# Draw repeated small samples and compute biased/unbiased estimates
n_trials = 1000
biased_ests = []
unbiased_ests = []
for _ in range(n_trials):
    sample = np.random.choice(population, size=10)
    biased_ests.append(sample.var())          # divides by n (default ddof=0)
    unbiased_ests.append(sample.var(ddof=1))  # divides by n-1

print(f"真实方差: {true_var:.4f}")
print(f"有偏估计均值: {np.mean(biased_ests):.4f}")  # 偏小
print(f"无偏估计均值: {np.mean(unbiased_ests):.4f}")  # 接近真实

# 2. Compute a covariance matrix
X = np.random.randn(100, 5)
cov = np.cov(X, rowvar=False)  # (5, 5)
print(f"协方差矩阵形状: {cov.shape}")
print(f"迹(总方差): {np.trace(cov):.4f}")

# 3. Verify BatchNorm's variance computation
x = torch.randn(16, 3, 32, 32)
bn_mean = x.mean(dim=(0, 2, 3))
bn_var_biased = x.var(dim=(0, 2, 3), unbiased=False)
bn_var_unbiased = x.var(dim=(0, 2, 3), unbiased=True)
print(f"有偏方差: {bn_var_biased}")
print(f"无偏方差: {bn_var_unbiased}")
# 有偏方差略小于无偏方差

Related Topics