
Knowledge Card: Automatic Differentiation

Basic Info

Attribute            Content
Topic                Automatic Differentiation
Mastery level        ★★★★☆
Learning priority    P0
Estimated time       6 hours
Interview frequency  ★★★☆☆

Core Idea

Automatic differentiation is the core of every deep learning framework. It is neither symbolic nor numerical differentiation: it computes gradients exactly and efficiently by applying the chain rule over a computation graph.

Three ways to compute gradients:
1. Numerical differentiation: f'(x) ≈ (f(x+h) - f(x)) / h
   - truncation error, and slow (one extra evaluation per input dimension)
2. Symbolic differentiation: manipulate the formula algebraically (by hand or a computer algebra system)
   - exact, but suffers from expression swell
3. Automatic differentiation: apply the chain rule over the computation graph
   - exact and efficient ✓ (contrasted with the numerical estimate in the sketch below)
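
To make the truncation-error point concrete, here is a minimal sketch comparing the forward-difference estimate against the exact derivative of f(x) = x² + 2x + 1 (the same test function used later in this card):

python
def f(x):
    return x**2 + 2*x + 1

def numerical_grad(f, x, h=1e-5):
    # Forward difference: truncation error is O(h)
    return (f(x + h) - f(x)) / h

x = 3.0
exact = 2*x + 2  # hand-derived: f'(x) = 2x + 2, so f'(3) = 8
for h in (1e-1, 1e-3, 1e-5):
    approx = numerical_grad(f, x, h)
    print(f"h={h:.0e}  approx={approx:.6f}  error={abs(approx - exact):.2e}")
# The error shrinks with h until floating-point rounding takes over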

Forward Mode vs Reverse Mode

Forward Mode: from input toward output
Carries the tangents ∂v_i/∂x forward through the graph, evaluating
∂y/∂x = ∂y/∂v_n · (∂v_n/∂v_{n-1} · ( ... · (∂v_1/∂x)))
Best when: input dimension << output dimension

Reverse Mode (what DL uses): from output back toward input
Carries the adjoints ∂y/∂v_i backward through the graph, evaluating
∂y/∂x = (((∂y/∂v_n) · ∂v_n/∂v_{n-1}) · ... ) · ∂v_1/∂x
Best when: output dimension << input dimension (e.g., a scalar loss w.r.t. millions of parameters)
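
Forward mode falls out naturally from dual numbers: each value carries its tangent (its derivative w.r.t. one seeded input) alongside it, so a single forward pass produces one column of the Jacobian. A minimal sketch (the Dual class below is illustrative, not from any library):

python
class Dual:
    """Forward-mode AD: carry (value, derivative) through the computation."""
    def __init__(self, val, dot=0.0):
        self.val = val   # primal value v
        self.dot = dot   # tangent ∂v/∂x

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __radd__ = __add__
    __rmul__ = __mul__

# Seed the input with tangent 1.0, then read the derivative off the output
x = Dual(3.0, 1.0)
y = x * x + 2 * x + 1
print(y.val, y.dot)  # 16.0 8.0

Note that one pass yields the derivative w.r.t. the single seeded input; differentiating a scalar loss w.r.t. a million parameters this way would take a million passes, which is why training uses reverse mode instead.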

A Micro Autograd From Scratch

python
import numpy as np

class Tensor:
    """支持自动微分的张量"""
    def __init__(self, data, children=(), op='', requires_grad=False):
        self.data = np.array(data, dtype=np.float32)
        self.grad = np.zeros_like(self.data) if requires_grad else None
        self._backward = lambda: None
        self.children = children
        self.op = op
        self.requires_grad = requires_grad

    def backward(self):
        """Topological sort + backward pass"""
        # Topological sort: children end up before their parents in topo
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for c in v.children:
                    build_topo(c)
                topo.append(v)
        build_topo(self)

        # Backward pass, in reverse topological order
        self.grad = np.ones_like(self.data)  # ∂L/∂L = 1
        for v in reversed(topo):
            v._backward()

    def __add__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        # Set requires_grad at construction so out.grad is allocated
        out = Tensor(self.data + other.data, children=(self, other), op='+',
                     requires_grad=self.requires_grad or other.requires_grad)

        def _backward():
            # d(a+b)/da = d(a+b)/db = 1: pass the upstream gradient through
            if self.requires_grad: self.grad += out.grad
            if other.requires_grad: other.grad += out.grad
        out._backward = _backward
        return out

    def __matmul__(self, other):
        """Matrix multiplication"""
        other = other if isinstance(other, Tensor) else Tensor(other)
        out = Tensor(self.data @ other.data, children=(self, other), op='@',
                     requires_grad=self.requires_grad or other.requires_grad)

        def _backward():
            # For C = A @ B: dL/dA = dL/dC @ Bᵀ, dL/dB = Aᵀ @ dL/dC
            if self.requires_grad: self.grad += out.grad @ other.data.T
            if other.requires_grad: other.grad += self.data.T @ out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        out = Tensor(self.data * other.data, children=(self, other), op='*',
                     requires_grad=self.requires_grad or other.requires_grad)

        def _backward():
            # Product rule: d(ab)/da = b, d(ab)/db = a
            if self.requires_grad: self.grad += out.grad * other.data
            if other.requires_grad: other.grad += out.grad * self.data
        out._backward = _backward
        return out

    def relu(self):
        out = Tensor(np.maximum(0, self.data), children=(self,), op='ReLU',
                     requires_grad=self.requires_grad)

        def _backward():
            if self.requires_grad:
                # Gradient flows only where the input was positive
                self.grad += out.grad * (self.data > 0)
        out._backward = _backward
        return out

    def sum(self):
        out = Tensor(self.data.sum(), children=(self,), op='sum',
                     requires_grad=self.requires_grad)

        def _backward():
            if self.requires_grad:
                # Sum broadcasts the upstream gradient to every element
                self.grad += np.ones_like(self.data) * out.grad
        out._backward = _backward
        return out

# Test: derivative of x² + 2x + 1 at x = 3
x = Tensor(3.0, requires_grad=True)
y = x * x + Tensor(2.0) * x + Tensor(1.0)
y.backward()
print(f"y = x² + 2x + 1, x=3")
print(f"y = {y.data}")
print(f"dy/dx = {x.grad}")  # 2*3+2 = 8

PyTorch's Autograd Mechanism

python
import torch

# requires_grad marks tensors whose gradients should be computed
x = torch.tensor([3.0], requires_grad=True)

# The computation graph is built automatically
y = x ** 2 + 2 * x + 1
# y.grad_fn = <AddBackward0>

y.backward()  # trigger backpropagation
print(x.grad)  # tensor([8.])

# Disable gradient tracking (inference/evaluation)
with torch.no_grad():
    y = x ** 2 + 2 * x + 1
    # no graph is built, which saves memory

# Detach from the computation graph
z = y.detach()  # z has the same values as y but is not tracked
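
One behavior worth knowing alongside this: .grad buffers accumulate across backward() calls rather than being overwritten, which is why training loops zero them each step. A quick demonstration:

python
import torch

x = torch.tensor([3.0], requires_grad=True)
(x ** 2).backward()
print(x.grad)   # tensor([6.])
(x ** 2).backward()
print(x.grad)   # tensor([12.]) -- accumulated, not replaced
x.grad.zero_()  # reset before the next step (optimizers do this via zero_grad())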

Frequently Asked Interview Questions

Q1: Automatic vs. numerical vs. symbolic differentiation?

Aspect         Numerical            Symbolic            Automatic
Accuracy       truncation error     exact               exact
Speed          slow (O(n) evals)    expression swell    fast
Applicability  any function         closed-form exprs   differentiable programs
Use in DL      debugging/checking   impractical         the core approach
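
The "debugging/checking" cell is how numerical differentiation earns its keep in practice: gradient checking. A minimal sketch validating PyTorch's gradient with a central difference (PyTorch also ships torch.autograd.gradcheck for this purpose):

python
import torch

x = torch.tensor([3.0], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward()

# Central difference: O(h²) truncation error, good enough to catch bugs
h = 1e-4
with torch.no_grad():
    num = (((x + h) ** 2 + 2 * (x + h) + 1)
           - ((x - h) ** 2 + 2 * (x - h) + 1)) / (2 * h)
print(x.grad, num)  # both ≈ tensor([8.])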

Q2: Why does DL use Reverse Mode?

A: In DL the output is usually a scalar loss, while the inputs are an enormous number of parameters (millions to billions).

  • Forward Mode: needs N passes (N = number of parameters)
  • Reverse Mode: a single backward pass yields the gradients of all parameters
  • This is why backpropagation only needs a single pass! (see the sketch below)
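
The "one pass, all gradients" claim in code: a single backward() on a scalar loss fills in .grad for every parameter tensor at once (a minimal sketch):

python
import torch

w1 = torch.randn(5, requires_grad=True)
w2 = torch.randn(5, requires_grad=True)
loss = (w1 * w2).sum()  # scalar loss depending on 10 parameters

loss.backward()  # one reverse pass
print(w1.grad.shape, w2.grad.shape)  # both populated: torch.Size([5]) each
# Forward mode would need one pass per parameter (10 here)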

Q3: What is PyTorch's detach() for?

  1. Blocking gradient flow: e.g., in GAN training, detach the generator's output while the generator is held fixed (see the sketch after this list)
  2. Saving memory: tensors that don't need gradients are cut off from the computation graph
  3. Reading values: when you need a tensor's value without it participating in gradient computation
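
A minimal sketch of use 1, with stand-in generator/discriminator modules G and D (plain Linear layers here, purely illustrative):

python
import torch

G = torch.nn.Linear(8, 4)  # stand-in generator: noise -> fake sample
D = torch.nn.Linear(4, 1)  # stand-in discriminator: sample -> score

z = torch.randn(2, 8)
fake = G(z)

# Discriminator step: detach so no gradients flow back into G
d_loss = D(fake.detach()).mean()
d_loss.backward()  # populates D's grads; G's parameters stay untouched

# Generator step: keep the graph so G receives gradients through D
g_loss = -D(fake).mean()
g_loss.backward()  # now G's parameters get gradients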

Related Topics