Machine Learning Guide
MLG 033 Transformers
- Autor: Vários
- Narrador: Vários
- Editora: Podcast
- Duração: 0:43:23
- Mais informações
Informações:
Sinopse
Links: Notes and resources at ocdevel.com/mlg/33 3Blue1Brown videos: https://3blue1brown.com/ Try a walking desk stay healthy & sharp while you learn & code Try Descript audio/video editing with AI power-tools Background & Motivation RNN Limitations: Sequential processing prevents full parallelization—even with attention tweaks—making them inefficient on modern hardware. Breakthrough: “Attention Is All You Need” replaced recurrence with self-attention, unlocking massive parallelism and scalability. Core Architecture Layer Stack: Consists of alternating self-attention and feed-forward (MLP) layers, each wrapped in residual connections and layer normalization. Positional Encodings: Since self-attention is permutation invariant, add sinusoidal or learned positional embeddings to inject sequence order. Self-Attention Mechanism Q, K, V Explained: Query (Q): The representation of the token seeking contextual info. Key (K): The representation of tokens being compared against. Value (V): The inf