A few super-helpful resources to better understand Attention and Transformers, in recommended order:
- Luis Serrano: A Friendly Introduction to RNNs
- Jay Alammar: Seq2Seq RNN model with Attention
- Jay Alammar: The Illustrated Transformer
- Harvard NLP: The Annotated Transformer
- @bentrevett's Seq2Seq explainer notebooks
- Annotated GPT-2
- Andrew Peng: Translation with transformer
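
All of these resources build up to the same core operation. For reference, here is a minimal sketch of scaled dot-product attention in NumPy; the function name, shapes, and example inputs are purely illustrative, not taken from any of the resources above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output: (n_q, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep the softmax stable.
    scores = Q @ K.T / np.sqrt(d_k)                              # (n_q, n_k)
    # Softmax over the key dimension gives attention weights that sum to 1 per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V                                           # (n_q, d_v)

# Example: 3 queries attending over 5 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```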