Multilayer Perceptrons


From Perceptron to MLP

A single perceptron computes a linear function followed by an activation:

y = \sigma(\mathbf{w}^T \mathbf{x} + b)

An MLP (multilayer perceptron) stacks such units in layers, with non-linear activations between them, which lets it learn non-linear functions.
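The perceptron formula above can be sketched directly, assuming a sigmoid activation and toy weights chosen for illustration:

```python
import numpy as np

def perceptron(x, w, b):
    """Single perceptron: linear function w^T x + b followed by a sigmoid."""
    z = w @ x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigma(z)

# Toy inputs and weights (illustrative values, not from the text)
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1

y = perceptron(x, w, b)  # sigmoid(0.5*1 - 0.25*2 + 0.1) = sigmoid(0.1) ≈ 0.525
```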

Architecture

An MLP consists of:

  1. Input layer: receives the feature vector
  2. Hidden layers: learn intermediate representations
  3. Output layer: produces the final prediction
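The three parts can be sketched as a manual forward pass, with toy layer sizes chosen for illustration (4 input features, 3 hidden units, 2 outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input -> hidden -> output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)  # input layer -> hidden layer
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)  # hidden layer -> output layer

def mlp_forward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden representation (ReLU)
    return W2 @ h + b2                # output layer: raw prediction scores

y = mlp_forward(rng.normal(size=4))
print(y.shape)  # (2,)
```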

Activation Functions

Common activations:

  • ReLU: f(x) = \max(0, x) — the most common choice; its gradient does not vanish for positive inputs
  • Sigmoid: f(x) = \frac{1}{1+e^{-x}} — squashes to (0, 1)
  • Tanh: f(x) = \tanh(x) — squashes to (-1, 1)
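A minimal sketch of the three activations, implemented with NumPy:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)       # clamps negatives to zero

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # output in (0, 1)

def tanh(x):
    return np.tanh(x)                # output in (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
# relu(x)    -> [0., 0., 2.]
# sigmoid(0) -> 0.5
# tanh(0)    -> 0.0
```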

PyTorch Implementation

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
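A quick sanity check of this architecture (the model is redefined here so the snippet stands alone), using a hypothetical batch of 32 flattened 28x28 images:

```python
import torch
import torch.nn as nn

# Same architecture as above: 784 -> 256 -> 128 -> 10
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

batch = torch.randn(32, 784)  # hypothetical batch of flattened inputs
logits = model(batch)         # one forward pass
print(logits.shape)           # torch.Size([32, 10])
```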

Universal Approximation

An MLP with a single hidden layer and sufficiently many neurons can approximate any continuous function on a compact domain to arbitrary accuracy (the universal approximation theorem). In practice, deeper networks with fewer neurons per layer tend to train and generalize better.
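A small experiment illustrating the idea: a single-hidden-layer network (a hypothetical width of 64 tanh units) fit to sin(x) with a short Adam training loop. The mean-squared error should fall well below its starting value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One hidden layer of 64 tanh units (illustrative width, not from the text)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)
y = torch.sin(x)

initial = nn.functional.mse_loss(net(x), y).item()
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
final = loss.item()
# final should be far smaller than initial: the single hidden layer
# has enough units to approximate sin on this interval closely
```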
