This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

Feasibility of fine-tuning ESM2_15B #286

Answered by tomsercu
wwwhhhccc asked this question in Q&A


This really depends on how many datapoints you're fine-tuning on (more precisely, num_seqs x seq_length, i.e. how many tokens).
Note that our pretraining batch size is 2M tokens/amino acids for a single batch.
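As a rough illustration (the numbers below are hypothetical, not from this thread), you can sanity-check how your fine-tuning set compares to that pretraining batch size:

```python
# Back-of-the-envelope token count; num_seqs and avg_seq_length are made-up
# placeholders -- substitute the statistics of your own dataset.
num_seqs = 100_000            # hypothetical number of fine-tuning sequences
avg_seq_length = 300          # hypothetical average length in amino acids
total_tokens = num_seqs * avg_seq_length

pretrain_batch_tokens = 2_000_000   # the 2M tokens/batch quoted above
print(f"{total_tokens:,} tokens ~= {total_tokens / pretrain_batch_tokens:.0f} pretraining-sized batches")
# 30,000,000 tokens ~= 15 pretraining-sized batches
```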
You'll be able to do a decent amount of training with 8 A100s.
If you have the 80 GB variant, you will also be able to fine-tune even the 15B model with simple data parallelism. Otherwise you may need model-parallel training (FSDP), which requires more technical knowledge and probably a framework like fairseq or PyTorch Lightning.
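For the FSDP route, here is a minimal sketch of how one might wrap the 15B checkpoint with PyTorch's built-in FullyShardedDataParallel. This is an assumption-laden illustration rather than a recipe from this repo: the `esm.pretrained.esm2_t48_15B_UR50D` loader, the size-based wrap policy, and the hyperparameters are placeholders to adapt to your setup.

```python
# Minimal FSDP sketch, assuming PyTorch >= 1.13 and fair-esm, launched with
# `torchrun --nproc_per_node=8 finetune.py` on a node with 8 A100s.
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
import esm

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank builds the 15B model on CPU first; FSDP then shards it across GPUs.
model, alphabet = esm.pretrained.esm2_t48_15B_UR50D()

# Shard parameters, gradients, and optimizer state across the 8 ranks;
# the size-based policy splits large submodules into separate FSDP units
# so a full replica never has to sit on a single GPU.
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=int(1e8)
    ),
    device_id=local_rank,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# ... standard training loop: forward, loss, backward, optimizer.step() ...
```

If the 80 GB cards and plain data parallelism are enough for your setup, as the answer suggests, you can skip FSDP and wrap the model in `torch.nn.parallel.DistributedDataParallel` instead.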

Answer selected by wwwhhhccc