From Perceptron to MLP
A single perceptron computes a linear function followed by an activation: y = φ(w·x + b), where w is the weight vector, b the bias, and φ the activation function.
An MLP stacks multiple layers of perceptrons to learn non-linear functions.
Architecture
An MLP consists of:
- Input layer: receives the feature vector
- Hidden layers: learn intermediate representations
- Output layer: produces the final prediction
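The three-part structure above amounts to a forward pass: each layer applies a linear map, and hidden layers follow it with an activation. A minimal NumPy sketch (the layer sizes 4 → 8 → 3 are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes for illustration: 4 input features, 8 hidden units, 3 outputs
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)  # hidden layer parameters
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)  # output layer parameters

def relu(x):
    return np.maximum(0.0, x)

def forward(x):
    h = relu(W1 @ x + b1)  # hidden representation
    return W2 @ h + b2     # raw output scores

x = rng.standard_normal(4)  # input feature vector
print(forward(x).shape)     # (3,)
```

In a real implementation the weights would be learned by backpropagation rather than drawn at random.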
Activation Functions
Common activations:
- ReLU: f(x) = max(0, x) — the most common choice; mitigates vanishing gradients for positive inputs
- Sigmoid: σ(x) = 1 / (1 + e^(-x)) — squashes outputs to (0, 1)
- Tanh: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) — squashes outputs to (-1, 1)
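The three formulas above translate directly into code; a small sketch using only the standard library:

```python
import math

def relu(x):
    # max(0, x): passes positives through, clamps negatives to zero
    return max(0.0, x)

def sigmoid(x):
    # 1 / (1 + e^(-x)): maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x): maps any real number into (-1, 1)
    return math.tanh(x)

print(relu(-2.0), sigmoid(0.0), tanh(0.0))  # 0.0 0.5 0.0
```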
PyTorch Implementation
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),  # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(256, 128),  # second hidden layer
    nn.ReLU(),
    nn.Linear(128, 10),   # output layer: 10 class scores
)
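The model can be exercised with a dummy batch to confirm the shapes line up; the batch size of 32 here is arbitrary (784 matches a flattened 28×28 image, 10 matches the output classes):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(32, 784)  # dummy batch of 32 flattened 28x28 inputs
logits = model(x)
print(logits.shape)       # torch.Size([32, 10])
```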
Universal Approximation
The universal approximation theorem states that an MLP with a single hidden layer and sufficiently many neurons can approximate any continuous function on a compact domain to arbitrary accuracy. The theorem says nothing about how many neurons are needed or how to find the weights; in practice, deeper networks with fewer neurons per layer tend to train more easily and reach the same accuracy with far fewer parameters.
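As a tiny concrete illustration of single-hidden-layer expressivity: since |x| = ReLU(x) + ReLU(-x), a network with just two hidden ReLU neurons represents the absolute-value function exactly. This is a hand-constructed example (weights set by hand, not trained):

```python
import numpy as np

# Hidden layer: two neurons with weights +1 and -1, zero bias
W1 = np.array([[1.0], [-1.0]])
b1 = np.zeros(2)
# Output layer: sum the two hidden activations
W2 = np.array([[1.0, 1.0]])
b2 = np.zeros(1)

def mlp_abs(x):
    h = np.maximum(0.0, W1 @ np.array([x]) + b1)  # ReLU hidden layer
    return float(W2 @ h + b2)                     # linear output layer

print(mlp_abs(-3.5))  # 3.5
```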