Implementation of a Vanilla Transformer from scratch ("Attention Is All You Need")
-
Embeddings: represent text as numerical vectors in a lower-dimensional space; the token-to-vector mapping is a learnable parameter.
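A minimal sketch of a learnable embedding layer, assuming PyTorch as the framework; the class name and constructor parameters here are illustrative, and the sqrt(d_model) scaling follows Section 3.4 of the paper.

```python
import math
import torch.nn as nn

class InputEmbeddings(nn.Module):
    """Maps token ids to dense vectors; the weights are learned during training."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)  # learnable lookup table

    def forward(self, x):
        # Scale embeddings by sqrt(d_model), as in "Attention Is All You Need" (Sec. 3.4)
        return self.embedding(x) * math.sqrt(self.d_model)
```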
-
Positional Embedding: conveys positional information to the model. The sine function fills the even dimensions of the encoding and the cosine function fills the odd dimensions: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The encoding is fixed: a non-learnable parameter.
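A minimal sketch of the fixed sinusoidal encoding, again assuming PyTorch; the class name and max_len default are illustrative. register_buffer keeps the table on the module without making it trainable, matching the non-learnable property above.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Fixed (non-learnable) sinusoidal positional encoding."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        # 1 / 10000^(2i / d_model) for each even dimension index 2i
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # sine at even indices
        pe[:, 1::2] = torch.cos(position * div_term)  # cosine at odd indices
        self.register_buffer("pe", pe.unsqueeze(0))   # stored, but never trained

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position
        return x + self.pe[:, : x.size(1)]
```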
-
Attention Mechanism: each output is a weighted sum of the values V, where the weights come from the similarity of queries Q and keys K: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The Q, K, V projections are learnable parameters.
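A minimal sketch of scaled dot-product attention as defined in the paper; the function name and the mask convention (0 marks masked positions) are assumptions.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to keep softmax gradients stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block masked positions
    weights = torch.softmax(scores, dim=-1)  # attention weights sum to 1 per query
    return weights @ v                       # weighted sum of values
```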
-
Resources:
- Umar Jamil: Transformers from scratch
- Analytics Vidhya: Transformers from scratch
- Stanford Online: Transformers United Series
- Andrej Karpathy: Neural Networks: Zero to Hero (Let's build GPT)