Training a Transformer on a Free GPU for English-to-German Translation

This is an implementation of "Attention Is All You Need", the research paper that introduced the Transformer architecture, the core algorithm that all LLMs follow, and that has been cited more than 100k times. The Transformer-based encoder-decoder model is implemented and trained on Colab.

This project trains a Transformer model on a Colab GPU to translate English sentences into German. The model is trained on a dataset of 15,000 English-German sentence pairs. Dataset link - Download Dataset
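The repository's training code is not reproduced here, but a minimal sketch of a Transformer encoder-decoder for translation, assuming PyTorch's built-in `nn.Transformer` module and illustrative vocabulary sizes and hyperparameters (not necessarily the ones used in this repo), could look like this:

```python
import torch
import torch.nn as nn

class TranslationTransformer(nn.Module):
    """Minimal encoder-decoder Transformer for English-to-German translation.
    Vocabulary sizes and hyperparameters are illustrative assumptions, not
    the values used in this repository."""

    def __init__(self, src_vocab=10000, tgt_vocab=10000, d_model=512,
                 nhead=8, num_layers=6, dim_ff=2048, max_len=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # Add positional embeddings to token embeddings.
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_embed(src) + self.pos_embed(pos_s)
        tgt_x = self.tgt_embed(tgt) + self.pos_embed(pos_t)
        # Causal mask so each target position attends only to earlier ones.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        h = self.transformer(src_x, tgt_x, tgt_mask=mask)
        return self.out(h)  # (batch, tgt_len, tgt_vocab) logits
```

Training then follows the usual teacher-forcing loop: feed the target sequence shifted right as decoder input and compute cross-entropy against the target sequence shifted left.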

*(Demo video: training the model on a Colab GPU)*

TODO

  • The model is overfitting; reduce the test error, not just the training error.
  • Add an inference script to translate English sentences into German (a greedy-decoding sketch follows this list).
  • Experiment with pretrained tokenizers such as SentencePiece and tiktoken.
  • Train the model on parallel GPUs.
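For the inference item above, a minimal greedy-decoding sketch, assuming the model class from the earlier snippet and hypothetical `bos_id`/`eos_id` special-token ids (this repo's actual tokenizer and API may differ), could look like:

```python
import torch

@torch.no_grad()
def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
    """Greedily decode a German translation one token at a time.
    `src_ids` is a (1, src_len) tensor of English token ids; the token
    ids and model interface here are assumptions for illustration."""
    model.eval()
    tgt = torch.tensor([[bos_id]], device=src_ids.device)
    for _ in range(max_len):
        logits = model(src_ids, tgt)                       # (1, tgt_len, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)   # most likely next token
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == eos_id:                       # stop at end-of-sentence
            break
    return tgt.squeeze(0).tolist()
```

Greedy decoding is the simplest option; beam search usually gives better translations at the cost of extra compute.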

For a better understanding, you can refer to the blog: Blog Link - Intro blog