This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

Feasibility of fine-tuning ESM2_15B #286

Answered by tomsercu
wwwhhhccc asked this question in Q&A


This really depends on how many datapoints you're fine-tuning on (more precisely, num_seqs x seq_length, i.e. how many tokens).
Note that our pretraining batch size is 2M tokens/amino acids for a single batch.
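As a rough illustration (the numbers below are hypothetical, not from this thread), you can sanity-check how your fine-tuning set compares to that pretraining batch size:

```python
# Back-of-the-envelope token count; num_seqs and avg_seq_length are made-up
# placeholders -- substitute the statistics of your own dataset.
num_seqs = 100_000            # hypothetical number of fine-tuning sequences
avg_seq_length = 300          # hypothetical average length in amino acids
total_tokens = num_seqs * avg_seq_length

pretrain_batch_tokens = 2_000_000   # the 2M tokens/batch quoted above
print(f"{total_tokens:,} tokens ~= {total_tokens / pretrain_batch_tokens:.0f} pretraining-sized batches")
# 30,000,000 tokens ~= 15 pretraining-sized batches
```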
You'll be able to do a decent amount of training with 8 A100s.
If you have the 80 GB variant, you will also be able to fine-tune even the 15B model with simple data parallelism. Otherwise you may need model-parallel training (FSDP), which requires more technical knowledge and probably a framework like fairseq or PyTorch Lightning.
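For the FSDP route, here is a minimal sketch of how one might wrap the 15B checkpoint with PyTorch's built-in FullyShardedDataParallel. This is an assumption-laden illustration rather than a recipe from this repo: the `esm.pretrained.esm2_t48_15B_UR50D` loader, the size-based wrap policy, and the hyperparameters are placeholders to adapt to your setup.

```python
# Minimal FSDP sketch, assuming PyTorch >= 1.13 and fair-esm, launched with
# `torchrun --nproc_per_node=8 finetune.py` on a node with 8 A100s.
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
import esm

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank builds the 15B model on CPU first; FSDP then shards it across GPUs.
model, alphabet = esm.pretrained.esm2_t48_15B_UR50D()

# Shard parameters, gradients, and optimizer state across the 8 ranks;
# the size-based policy splits large submodules into separate FSDP units
# so a full replica never has to sit on a single GPU.
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=int(1e8)
    ),
    device_id=local_rank,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# ... standard training loop: forward, loss, backward, optimizer.step() ...
```

If the 80 GB cards and plain data parallelism are enough for your setup, as the answer suggests, you can skip FSDP and wrap the model in `torch.nn.parallel.DistributedDataParallel` instead.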

Answer selected by wwwhhhccc