Maybe the simplest RetNet implementation, merging the parallel, recurrent, and chunkwise code paths into one. This repo is just for learning and backup. I haven't done any testing on a CUDA device yet, so the code can only run on CPU for now.
The interesting thing is: the MSR (multi-scale retention) block implementation is under 100 lines of code. :)
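For intuition, here is a minimal sketch (not this repo's code) of why the parallel and recurrent forms can live in one block: both compute the same decayed attention, so one can be checked against the other. The xPos rotation, per-head decays, and group norm of a full MSR block are omitted for brevity:

```python
import torch

def parallel_retention(q, k, v, gamma):
    """Parallel form: O = (Q K^T * D) V, with causal decay mask D."""
    T = q.size(1)
    idx = torch.arange(T, dtype=torch.float)
    # D[n, m] = gamma^(n-m) for n >= m, else 0 (causal decay)
    causal = (idx[:, None] >= idx[None, :]).float()
    D = (gamma ** (idx[:, None] - idx[None, :])) * causal
    return ((q @ k.transpose(-1, -2)) * D) @ v

def recurrent_retention(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n."""
    B, T, d = q.shape
    S = torch.zeros(B, d, v.size(-1))
    outs = []
    for t in range(T):
        # Outer-product state update, then read out with the query
        S = gamma * S + k[:, t, :, None] * v[:, t, None, :]
        outs.append((q[:, t, None, :] @ S).squeeze(1))
    return torch.stack(outs, dim=1)

# The two forms agree up to floating-point error:
q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
assert torch.allclose(parallel_retention(q, k, v, 0.9),
                      recurrent_retention(q, k, v, 0.9), atol=1e-4)
```

The chunkwise form interleaves the two: recurrent across chunks, parallel within each chunk.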
Please ignore all the aka code here. It's a simple proxy to torch:
aka.nn --> torch.nn
aka.numpy --> torch + torch.nn.functional
aka.repo --> transformers
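For reference, such a proxy can be little more than re-exports; a hypothetical sketch (the real aka package may differ):

```python
# Hypothetical sketch of the proxy modules; the real aka package may differ.

# aka/nn.py        -- aka.nn --> torch.nn
from torch.nn import *

# aka/numpy.py     -- aka.numpy --> torch + torch.nn.functional
from torch import *
from torch.nn.functional import *
```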
Requirements:
python
torch
torchvision
transformers
Download the data files from https://hf-mirror.com/isek-ai/SDPrompt-RetNet-300M
into the folder data/SDPrompt-RetNet-300M.
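If you prefer scripting the download, one option (assuming huggingface_hub is installed; HF_ENDPOINT is the documented way to point it at the hf-mirror endpoint) is:

```python
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before the import below
from huggingface_hub import snapshot_download

snapshot_download(repo_id="isek-ai/SDPrompt-RetNet-300M",
                  local_dir="data/SDPrompt-RetNet-300M")
```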
Then run:
python Retention.py