Multilayer Perceptrons


From Perceptron to MLP

A single perceptron computes a linear function followed by an activation:

y = \sigma(\mathbf{w}^T \mathbf{x} + b)

An MLP (multilayer perceptron) stacks such units in layers, with non-linear activations between them, which lets it learn non-linear functions.
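The perceptron formula above can be sketched directly, assuming a sigmoid activation and toy weights chosen for illustration:

```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: linear function w^T x + b followed by a sigmoid."""
    z = w @ x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigma(z)

# Toy inputs and weights (illustrative values, not from the text)
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1

y = perceptron(x, w, b)  # sigmoid(0.5*1 - 0.25*2 + 0.1) = sigmoid(0.1) ≈ 0.525
```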

Architecture

An MLP consists of:

  1. Input layer: receives the feature vector
  2. Hidden layers: learn intermediate representations
  3. Output layer: produces the final prediction
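The three parts can be sketched as a manual forward pass, with toy layer sizes chosen for illustration (4 input features, 3 hidden units, 2 outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input -> hidden -> output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # input layer -> hidden layer
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)  # hidden layer -> output layer

def mlp_forward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden representation (ReLU)
    return W2 @ h + b2                # output layer: raw prediction scores

y = mlp_forward(rng.normal(size=4))
print(y.shape)  # (2,)
```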

Activation Functions

Common activations:

  • ReLU: f(x) = \max(0, x) — the most common choice; its gradient does not vanish for positive inputs
  • Sigmoid: f(x) = \frac{1}{1+e^{-x}} — squashes to (0, 1)
  • Tanh: f(x) = \tanh(x) — squashes to (-1, 1)
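A minimal sketch of the three activations, implemented with NumPy:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)       # clamps negatives to zero

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # output in (0, 1)

def tanh(x):
    return np.tanh(x)                # output in (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
# relu(x)    -> [0., 0., 2.]
# sigmoid(0) -> 0.5
# tanh(0)    -> 0.0
```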

PyTorch Implementation

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
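A quick sanity check of this architecture (the model is redefined here so the snippet stands alone), using a hypothetical batch of 32 flattened 28x28 images:

```python
import torch
import torch.nn as nn

# Same architecture as above: 784 -> 256 -> 128 -> 10
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

batch = torch.randn(32, 784)  # hypothetical batch of flattened inputs
logits = model(batch)         # one forward pass
print(logits.shape)           # torch.Size([32, 10])
```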

Universal Approximation

An MLP with a single hidden layer and sufficiently many neurons can approximate any continuous function on a compact domain to arbitrary accuracy (the universal approximation theorem). In practice, deeper networks with fewer neurons per layer tend to train and generalize better.
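A small experiment illustrating the idea: a single-hidden-layer network (a hypothetical width of 64 tanh units) fit to sin(x) with a short Adam training loop. The mean-squared error should fall well below its starting value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One hidden layer of 64 tanh units (illustrative width, not from the text)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)
y = torch.sin(x)

initial = nn.functional.mse_loss(net(x), y).item()
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
final = loss.item()
# final should be far smaller than initial: the single hidden layer
# has enough units to approximate sin on this interval closely
```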
