Training a Transformer on a Free GPU for English-to-German Translation

This is an implementation of "Attention Is All You Need", the research paper that introduced the Transformer architecture, the core algorithm that all LLMs follow, and that has been cited more than 100k times. The Transformer-based encoder-decoder model is implemented and trained on Colab.

This project trains a Transformer model on a Colab GPU to translate English sentences into German. The model is trained on a dataset of 15,000 English-German sentence pairs. Dataset link - Download Dataset
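The repository's training code is not reproduced here, but a minimal sketch of a Transformer encoder-decoder for translation, assuming PyTorch's built-in `nn.Transformer` module and illustrative vocabulary sizes and hyperparameters (not necessarily the ones used in this repo), could look like this:

```python
import torch
import torch.nn as nn

class TranslationTransformer(nn.Module):
    """Minimal encoder-decoder Transformer for English-to-German translation.
    Vocabulary sizes and hyperparameters are illustrative assumptions, not
    the values used in this repository."""

    def __init__(self, src_vocab=10000, tgt_vocab=10000, d_model=512,
                 nhead=8, num_layers=6, dim_ff=2048, max_len=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # Add positional embeddings to token embeddings.
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_embed(src) + self.pos_embed(pos_s)
        tgt_x = self.tgt_embed(tgt) + self.pos_embed(pos_t)
        # Causal mask so each target position attends only to earlier ones.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        h = self.transformer(src_x, tgt_x, tgt_mask=mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab) logits
```

Training then follows the usual teacher-forcing loop: feed the target sequence shifted right as decoder input and compute cross-entropy against the target sequence shifted left.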

*(Demo video: training the model on a Colab GPU)*

TODO

  • The model is overfitting; reduce the test error, not just the training error.
  • Add an inference script to translate English sentences into German (a greedy-decoding sketch follows this list).
  • Experiment with pretrained tokenizers such as SentencePiece and tiktoken.
  • Train the model on parallel GPUs.
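For the inference item above, a minimal greedy-decoding sketch, assuming the model class from the earlier snippet and hypothetical `bos_id`/`eos_id` special-token ids (this repo's actual tokenizer and API may differ), could look like:

```python
import torch

@torch.no_grad()
def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
    """Greedily decode a German translation one token at a time.
    `src_ids` is a (1, src_len) tensor of English token ids; the token
    ids and model interface here are assumptions for illustration."""
    model.eval()
    tgt = torch.tensor([[bos_id]], device=src_ids.device)
    for _ in range(max_len):
        logits = model(src_ids, tgt)                       # (1, tgt_len, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)   # most likely next token
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == eos_id:                       # stop at end-of-sentence
            break
    return tgt.squeeze(0).tolist()
```

Greedy decoding is the simplest option; beam search usually gives better translations at the cost of extra compute.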

For a better understanding, you can refer to the blog: Blog Link - Intro blog