My first step into the transformer legacy =)
This version is assembled from other transformer implementations found in various sources.
Note that it has no masking in the decoder, so you should be aware of that (see the sketch below for what the usual causal mask looks like).
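
For reference, here is a minimal sketch of the causal mask a standard transformer decoder applies so that each position can only attend to earlier positions. This assumes a PyTorch-style implementation; it is not code from this repo, just an illustration of what is missing.

```python
import torch

def causal_mask(size: int) -> torch.Tensor:
    # Lower-triangular boolean matrix: position i may attend only to
    # positions <= i. True = allowed, False = masked-out future position.
    return torch.tril(torch.ones(size, size, dtype=torch.bool))

# Example for a 4-token sequence:
print(causal_mask(4))
# tensor([[ True, False, False, False],
#         [ True,  True, False, False],
#         [ True,  True,  True, False],
#         [ True,  True,  True,  True]])
```

Without this mask, the decoder can "see" future tokens during training, so the model will not learn proper autoregressive generation.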
This implementation includes multi-head self-attention and positional encoding,
which give the network an understanding of the structure of the sequence.
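
As an illustration of the positional-encoding part, here is a minimal sketch of the standard sinusoidal encoding from "Attention Is All You Need". Again, this assumes a PyTorch-style setup and an even `d_model`; it is a sketch, not necessarily how this repo computes it.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    # Assumes d_model is even.
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# The encoding is added to the token embeddings, e.g.:
# x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because self-attention is permutation-invariant on its own, adding these position-dependent values to the embeddings is what lets the network distinguish token order.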