This section covers transformers, attention mechanisms, pretraining, and fine-tuning.
The core innovation behind transformers is scaled dot-product attention: each query vector is compared against every key vector, the resulting scores are scaled by the square root of the key dimension and normalized with a softmax, and the output is the corresponding weighted sum of the value vectors.
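As a minimal sketch of that computation (single-head attention over 2-D query, key, and value matrices; the function and variable names here are illustrative, not taken from any particular library):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

# Toy example: 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query position
```

The division by the square root of d_k keeps the dot products from growing with dimension, which would otherwise push the softmax into a regime with vanishingly small gradients.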