-
Hi! Thanks for sharing these wonderful pretrained models! I'm trying to use ESM2 for a downstream protein classification task. I'm thinking about fine-tuning ESM2_15B or ESM2_3B but am slightly concerned about how feasible it is. Based on your experience, what hardware do I need if I want to fine-tune ESM2_15B or ESM2_3B? Will 8 A100s or 8 A5000s cut it? Thanks!
-
This really depends on how much data you're fine-tuning on (more precisely, num_seqs x seq_length, i.e. how many tokens).
For reference, a single batch during our pretraining is 2M tokens (amino acids).
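If it helps, a rough way to gauge your dataset size in tokens for that comparison (the sequences below are just placeholders):

```python
# Rough sketch: estimate dataset size in tokens (num_seqs x seq_length),
# to compare against the ~2M-token pretraining batches mentioned above.
sequences = [
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",  # hypothetical protein sequences
    "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQ",
]
total_tokens = sum(len(seq) for seq in sequences)
print(f"{len(sequences)} sequences, ~{total_tokens:,} tokens")
```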
You'll be able to do a decent amount of training with 8 A100s.
Also, if you have the 80GB cards you'll be able to fine-tune even the 15B model with simple data parallel; otherwise you may need to work with model parallelism (e.g. FSDP), which requires some more technical knowledge and probably a framework like fairseq or PyTorch Lightning.
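For concreteness, here's a minimal sketch of what simple data-parallel fine-tuning could look like with the fair-esm API. The classification head, learning rate, and toy data are placeholders, not an official recipe:

```python
# Minimal sketch (not an official recipe): fine-tune ESM2 with a small
# classification head using plain nn.DataParallel. Assumes the fair-esm
# package is installed and your GPUs have enough memory for the model you pick.
import torch
import torch.nn as nn
import esm

model, alphabet = esm.pretrained.esm2_t36_3B_UR50D()  # or esm2_t48_15B_UR50D()
batch_converter = alphabet.get_batch_converter()
repr_layer = model.num_layers  # final layer (36 for the 3B model)

num_classes = 2  # hypothetical binary classification task
head = nn.Linear(model.embed_dim, num_classes)

device = torch.device("cuda")
model, head = model.to(device), head.to(device)
# simple data parallel; swap for FSDP / DistributedDataParallel at larger scale
model = nn.DataParallel(model)

optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(head.parameters()), lr=1e-5
)
criterion = nn.CrossEntropyLoss()

# toy batch of (label, sequence) pairs; replace with your own dataset
data = [
    ("seq1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("seq2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQ"),
]
labels = torch.tensor([0, 1], device=device)

_, _, tokens = batch_converter(data)
tokens = tokens.to(device)

optimizer.zero_grad()
out = model(tokens, repr_layers=[repr_layer])
reps = out["representations"][repr_layer]            # (batch, seq_len, embed_dim)
mask = (tokens != alphabet.padding_idx).unsqueeze(-1)
pooled = (reps * mask).sum(1) / mask.sum(1)           # mean over non-pad tokens
loss = criterion(head(pooled), labels)
loss.backward()
optimizer.step()
```

In practice you'd wrap this in a proper training loop with a DataLoader, and for the 15B model on smaller cards you'd shard the model with FSDP instead of DataParallel as mentioned above.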