Why Differentiation Matters
Training neural networks requires computing gradients of a loss function with respect to the model parameters. Automatic differentiation (autodiff) computes these gradients efficiently and exactly (up to floating-point precision), unlike finite-difference approximations.
Forward Mode vs Reverse Mode
- Forward mode: Propagates derivatives alongside the function evaluation, one input direction at a time. Its cost scales with the number of inputs, so it is efficient when there are few inputs.
- Reverse mode: Propagates derivatives backwards from the output through the recorded computation. Its cost scales with the number of outputs, so it is efficient when there are few outputs (like a scalar loss).
Deep learning uses reverse mode (backpropagation).
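Forward mode is often explained with dual numbers: each value carries its derivative, and every operation updates both together. Below is a minimal sketch in plain Python; the `Dual` class and its small operator set are illustrative, not a real library's API.

```python
class Dual:
    """Carries a value and its derivative through each operation (forward mode)."""
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


# Seed dx/dx = 1, then evaluate f(x) = x**2 + 3x + 1 at x = 2.
x = Dual(2.0, 1.0)
y = x * x + 3 * x + 1
print(y.value, y.deriv)  # 11.0 7.0
```

One evaluation yields the derivative with respect to a single seeded input; with many parameters, this per-input cost is exactly why deep learning prefers reverse mode.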
Computational Graphs
Autodiff works by building a computational graph of operations:
```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x + 1
y.backward()
print(x.grad)  # dy/dx = 2x + 3 = 7.0 at x = 2
```
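To see what `backward()` is doing under the hood, here is a toy reverse-mode sketch that computes the same gradient. The `Var` class and its parent-edge bookkeeping are illustrative only, not PyTorch's actual internals.

```python
class Var:
    """A node in a tiny computational graph supporting reverse-mode autodiff."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent edge stores (node, local_derivative) for this operation.
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    __rmul__ = __mul__

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then push it to each parent
        # weighted by that edge's local derivative (the chain rule).
        self.grad += upstream
        for node, local in self.parents:
            node.backward(upstream * local)


x = Var(2.0)
y = x * x + 3 * x + 1
y.backward()
print(x.grad)  # 7.0
```

A single backward pass fills in the gradient for every input node at once, which is what makes reverse mode cheap for a scalar loss over millions of parameters.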
Chain Rule
The chain rule is the mathematical foundation: for a composition y = f(g(x)), dy/dx = f'(g(x)) * g'(x).
Reverse mode autodiff applies the chain rule systematically through the computational graph, multiplying each operation's local derivative by the gradient flowing back from the output.
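A concrete worked instance of the chain rule, checked numerically: for y = sin(x**2), the outer derivative (cos) is evaluated at the inner value and multiplied by the inner derivative (2x). The test point `x = 1.5` and step `h` are arbitrary choices for illustration.

```python
import math

def f(x):
    return math.sin(x ** 2)

def f_prime(x):
    # Chain rule: d/dx sin(x**2) = cos(x**2) * 2x
    return math.cos(x ** 2) * 2 * x

x, h = 1.5, 1e-6
# Central finite difference as an independent check on the analytic derivative.
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(abs(f_prime(x) - numeric) < 1e-6)  # True
```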