This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

Using ESM2-650M to obtain protein embedding, why does it take longer when the batch_size is larger? #685

Answered by ebetica
ylzdmm asked this question in Q&A

Could it be that your file contains sequences of widely varying lengths, so each batch is padded to its longest sequence and the number of padding tokens is high? Try sorting the sequences by length first.
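
To make the padding effect concrete, here is a minimal sketch in plain Python. It assumes every batch is padded to the length of its longest sequence. The helper names (`padding_tokens`, `make_batches`, `total_padding`) and the toy sequences are made up for illustration and are not part of the esm package.

```python
from typing import List


def padding_tokens(batch: List[str]) -> int:
    """Padding positions needed to pad every sequence to the longest one in the batch."""
    longest = max(len(s) for s in batch)
    return sum(longest - len(s) for s in batch)


def make_batches(seqs: List[str], batch_size: int, sort_by_length: bool) -> List[List[int]]:
    """Return batches of indices into seqs, optionally grouping similar lengths together."""
    order = list(range(len(seqs)))
    if sort_by_length:
        order.sort(key=lambda i: len(seqs[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def total_padding(seqs: List[str], batches: List[List[int]]) -> int:
    """Total padding positions over all batches."""
    return sum(padding_tokens([seqs[i] for i in batch]) for batch in batches)


# Toy data (hypothetical): short and long sequences interleaved, as in a ragged FASTA file.
seqs = ["M" * n for n in (10, 500, 12, 480, 11, 510, 9, 490)]

unsorted_batches = make_batches(seqs, batch_size=4, sort_by_length=False)
sorted_batches = make_batches(seqs, batch_size=4, sort_by_length=True)

unsorted_pad = total_padding(seqs, unsorted_batches)
sorted_pad = total_padding(seqs, sorted_batches)

print("padding without sorting:", unsorted_pad)
print("padding with sorting:   ", sorted_pad)
```

Because the batches hold indices rather than the sequences themselves, the embeddings can be written back in the original file order after inference.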

Answer selected by ylzdmm