Put flash attention 2 into ProGen2 #6

Merged
merged 14 commits into main from fa_progen2 on Dec 6, 2024
Merge branch 'main' into fa_progen2
JinyuanSun authored Dec 5, 2024

commit d9b2b662430cd94ed38b66fd595d015477ac7c85
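
For context on what the PR enables, below is a minimal sketch of the generic Hugging Face route to FlashAttention-2 on a causal LM. This is an illustration under stated assumptions, not the code from this PR: the checkpoint id is hypothetical, and the actual integration may instead wire FlashAttention into a dedicated ProGen2 module rather than use this flag.

```python
# Hedged sketch: the generic transformers switch for FlashAttention-2.
# Applying it to ProGen2 is an assumption; this PR's integration may differ.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "example/progen2-small",            # hypothetical checkpoint id
    torch_dtype=torch.float16,          # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",
    trust_remote_code=True,             # ProGen2 is a custom architecture
).to("cuda")
```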
README.md (4 changes: 3 additions & 1 deletion)

@@ -113,7 +113,9 @@ It's recommended to use the flash attention for training. Because in the forward
 
 # Benchmarking
 
-Below is the comparison of peak memory usage and inference time of FAESM with the official ESM2 and shows that FAESM can save memory usage by up to 60% and inference time by up to 70% (length 1000). The benchmarking is done on ESM-650M with batch size 8, and a single A100 with 80GB of memory.
+
+### FAESM vs. Official ESM2
+Below is the comparison of peak memory usage and inference time of FAESM with the official ESM2. We show that FAESM can save memory usage by up to 60% and inference time by up to 70% (length 1000). The benchmarking is done on ESM-650M with batch size 8, and a single A100 with 80GB of memory.
 
 ![benchmark](assets/figs/benchmark.png)
 
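
As a rough guide to reproducing those numbers, here is a hedged sketch of the benchmark loop for the official ESM2 baseline: peak memory via torch.cuda.max_memory_allocated and timing via time.perf_counter, on ESM-650M with batch size 8 and length 1000 as in the README. The FAESM side is assumed to use the same loop with FAESM's own model class, whose exact import path should be checked against the package.

```python
# Hedged sketch of the README's benchmark setup for the official ESM2
# baseline. The FAESM comparison is an assumption: swap in FAESM's model
# class (its actual import path should be verified against the package).
import time

import torch
from transformers import AutoTokenizer, EsmForMaskedLM

def benchmark(model, inputs, n_runs=10):
    """Return (peak GPU memory in GiB, mean seconds per forward pass)."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_runs):
            model(**inputs)
    torch.cuda.synchronize()
    mean_time = (time.perf_counter() - start) / n_runs
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    return peak_gib, mean_time

name = "facebook/esm2_t33_650M_UR50D"  # ESM-650M, as in the README
tokenizer = AutoTokenizer.from_pretrained(name)
batch = ["A" * 1000] * 8  # batch size 8, sequence length 1000
inputs = tokenizer(batch, return_tensors="pt").to("cuda")

official = EsmForMaskedLM.from_pretrained(name).half().to("cuda").eval()
print("official ESM2 (peak GiB, s/forward):", benchmark(official, inputs))
```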
