Machine Learning Guide

MLG 033 Transformers

Author: Various
Narrator: Various
Publisher: Podcast
Duration: 0:43:23

Synopsis

Links:
  Notes and resources at ocdevel.com/mlg/33
  3Blue1Brown videos: https://3blue1brown.com/
  Try a walking desk: stay healthy & sharp while you learn & code
  Try Descript: audio/video editing with AI power-tools

Background & Motivation
  RNN Limitations: Sequential processing prevents RNNs from being fully parallelized, even with attention tweaks, making them inefficient on modern hardware.
  Breakthrough: “Attention Is All You Need” replaced recurrence with self-attention, unlocking massive parallelism and scalability.

Core Architecture
  Layer Stack: Consists of alternating self-attention and feed-forward (MLP) layers, each wrapped in residual connections and layer normalization.
  Positional Encodings: Since self-attention on its own has no notion of token order (it is permutation-equivariant), add sinusoidal or learned positional embeddings to inject sequence order.

Self-Attention Mechanism
  Q, K, V Explained:
    Query (Q): The representation of the token seeking contextual info.
    Key (K): The representation of tokens being compared against.
    Value (V): The inf
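The positional-encoding and Q/K/V ideas in the synopsis can be made concrete with a small sketch. This is not code from the episode. It is a minimal, dependency-free illustration, assuming the sinusoidal encoding and scaled dot-product attention described in "Attention Is All You Need". The helper names (`sinusoidal_positions`, `self_attention`, `matmul`, `softmax`), the toy token vectors, and the identity projection matrices are all hypothetical choices made for readability. It uses a single head and leaves out masking, residual connections, and layer normalization.

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even dimension index
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no masking)."""
    q = matmul(x, w_q)  # queries: what each token is looking for
    k = matmul(x, w_k)  # keys: what each token is compared against
    v = matmul(x, w_v)  # values: what each token contributes to the output
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # Q K^T, one row per query token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy input: 3 tokens with 4-dimensional embeddings (made-up numbers).
tokens = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0],
          [1.0, 1.0, 0.0, 0.0]]
pe = sinusoidal_positions(seq_len=3, d_model=4)
x = [[t + p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(tokens, pe)]

# Identity projections stand in for learned weight matrices (an assumption).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` is a probability distribution over the input tokens, and each row of `out` is the corresponding weighted mix of value vectors. Every row is computed independently of the others, which is where the parallelism mentioned in the synopsis comes from.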
