Implementation of the Transformer architecture introduced in the paper "Attention Is All You Need", built from scratch in PyTorch.
- Encompasses all key components of the Transformer model, including multi-head self-attention, positional encodings, feed-forward layers, and layer normalization (see the sketch after this list).
- Scalable and modular design, allowing for extensions or experimentation with Transformer variants.
- The model was also trained on a machine translation task (English-to-Italian translation).
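
As a minimal sketch of two of the components listed above, here is what multi-head self-attention and sinusoidal positional encoding look like in PyTorch. The class names and hyperparameters (`MultiHeadSelfAttention`, `PositionalEncoding`, `d_model`, `num_heads`) are illustrative and may not match the names used in this repository:

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Scaled dot-product attention computed in parallel across several heads."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Linear projections for queries, keys, values, and the final output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        batch, seq_len, d_model = x.shape

        # Project, then reshape to (batch, heads, seq_len, d_head).
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_head)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        # Merge the heads back into a single d_model-sized representation.
        out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.w_o(out)

class PositionalEncoding(nn.Module):
    """Fixed sinusoidal positional encodings added to the token embeddings."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encodings for the first seq_len positions.
        return x + self.pe[:, : x.size(1)]

# Quick shape check with the paper's base configuration.
x = PositionalEncoding(d_model=512)(torch.randn(2, 10, 512))
print(MultiHeadSelfAttention(d_model=512, num_heads=8)(x).shape)  # torch.Size([2, 10, 512])
```

In a full encoder layer, the attention output would be wrapped with a residual connection and layer normalization, followed by the position-wise feed-forward sublayer, as described in the paper.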