Motivation
Fully connected layers don’t exploit spatial structure: every output unit is connected to every input pixel. Convolutional neural networks (CNNs) instead use local connectivity and weight sharing, which sharply reduces parameter count and lets the same feature detector apply anywhere in the image.
Convolution Operation
A 2D convolution slides a small kernel over the input, computing a weighted sum of the pixels under the kernel at each spatial position.
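As a sketch, for an input I and a k x k kernel K, the output at position (i, j) can be written as:

```latex
S(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} I(i+m,\, j+n)\, K(m, n)
```

Strictly speaking this is cross-correlation (true convolution flips the kernel), but it is the operation most deep-learning frameworks implement, and the distinction is irrelevant when the kernel weights are learned.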
Key Components
Convolutional Layer
Applies learnable filters to extract features (edges, textures, shapes).
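The sliding-window computation behind a single filter can be sketched in pure Python. This is an illustrative single-channel version with no padding or stride; the function name and test data are made up for the example.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch under the kernel.
            s = sum(image[i + m][j + n] * kernel[m][n]
                    for m in range(kh) for n in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 2x2 kernel over a 4x4 image yields a 3x3 output.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 1, 1, 1]]
kernel = [[1, 0],
          [0, -1]]
```

A real layer does this for many filters and channels at once, with the kernel entries as learnable weights.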
Pooling Layer
Reduces spatial dimensions, cutting computation and giving limited translation invariance. Max pooling takes the maximum value in each window.
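Max pooling over non-overlapping windows can be sketched the same way; this is a minimal pure-Python illustration (the function name and feature map are made up for the example).

```python
def max_pool2d(x, size=2):
    """Non-overlapping max pooling on a nested-list feature map."""
    out = []
    for i in range(0, len(x) - size + 1, size):
        row = []
        for j in range(0, len(x[0]) - size + 1, size):
            # Keep only the largest activation in each size x size window.
            row.append(max(x[i + m][j + n]
                           for m in range(size) for n in range(size)))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 6, 1, 5],
        [7, 2, 9, 8],
        [0, 1, 3, 4]]
```

With a 2x2 window, each halving of width and height discards three quarters of the activations while keeping the strongest response in each region.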
Batch Normalization
Normalizes each channel's activations to zero mean and unit variance over the batch, then applies a learnable scale and shift. This stabilizes and often accelerates training.
Classic Architectures
- LeNet (1998): pioneered CNNs for digit recognition
- AlexNet (2012): deeper, used ReLU and dropout
- VGG (2014): uniform 3x3 convolutions, very deep
- ResNet (2015): skip connections, enabled 100+ layer networks
PyTorch Example
import torch.nn as nn

# Assumes a 28x28 single-channel input (e.g. MNIST).
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 28x28 -> 28x28, 32 channels
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 14x14 -> 14x14, 64 channels
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),                                 # 64 * 7 * 7 = 3136 features
    nn.Linear(64 * 7 * 7, 10),                    # 10 class logits
)
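The 64 * 7 * 7 input size of the final Linear layer follows from the standard output-size formula, assuming a 28x28 input as above (helper name is made up for the example):

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a conv layer: (n + 2p - k) // s + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28              # assumed 28x28 input
size = conv_out(size)  # 3x3 conv, padding=1 keeps 28
size = size // 2       # 2x2 max pool -> 14
size = conv_out(size)  # second conv keeps 14
size = size // 2       # second pool -> 7
flat = 64 * size * size  # features entering the final Linear layer
```

Changing the input resolution changes this number, which is why the Linear layer's input size must be recomputed for other image sizes.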