extract and save embeddings for multiple sequence alignment based on msa transformer #92

maticmarin · 2021-06-23T14:58:00Z


Could you please suggest how to extract and save embeddings for an MSA with the MSA Transformer, in a manner similar to extract.py? That is, I would like to batch-extract a single embedding for each FASTA file containing an MSA, rather than one embedding per sequence within a single MSA file.

Answered by tomsercu


tomsercu · 2021-06-23T15:07:13Z


The internal state of the MSA Transformer has shape M x L x d (MSA size x sequence length x embedding dimension).
Typically you want the MSA to produce sequence-level features that summarize all of the MSA information. Taking the final layer's embedding of the first (typically the query) sequence gives good results. You could also try (weighted) averaging over the whole MSA, but we didn't see much difference.
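
As a rough illustration of the two options above, here is a minimal NumPy sketch. It assumes the final-layer representation for one MSA has already been obtained from the model as an array of shape (M, L, d), with any special start token column already removed. The function name `pool_msa_embedding` and its `weights` and `query_only` parameters are hypothetical names chosen for this example and are not part of the library.

```python
import numpy as np


def pool_msa_embedding(reps, weights=None, query_only=True):
    """Reduce an (M, L, d) MSA representation to a single (L, d) embedding.

    reps       : array of shape (M, L, d), assumed to be the final-layer
                 output for one MSA (hypothetical input, obtained elsewhere)
    weights    : optional length-M sequence weights, used only when
                 query_only is False (hypothetical parameter)
    query_only : if True, return the row of the first (query) sequence
    """
    reps = np.asarray(reps, dtype=np.float64)
    if reps.ndim != 3:
        raise ValueError("expected an array of shape (M, L, d)")

    if query_only:
        # Option 1: keep only the first sequence of the MSA.
        return reps[0]

    if weights is None:
        # Option 2a: plain average over all M sequences.
        return reps.mean(axis=0)

    weights = np.asarray(weights, dtype=np.float64)
    if weights.shape != (reps.shape[0],):
        raise ValueError("weights must have one entry per MSA sequence")
    # Option 2b: weighted average; normalise so the weights sum to 1.
    weights = weights / weights.sum()
    return np.tensordot(weights, reps, axes=(0, 0))


# Toy stand-in for a model output: M=4 sequences, L=5 positions, d=3 features.
toy_reps = np.random.default_rng(0).normal(size=(4, 5, 3))
query_embedding = pool_msa_embedding(toy_reps)
mean_embedding = pool_msa_embedding(toy_reps, query_only=False)
print(query_embedding.shape, mean_embedding.shape)
```

Either result is one array per MSA, so it could then be written to disk as one file per input FASTA, for example with `np.save`.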

1 reply

maticmarin Jun 23, 2021
Author

I see, thanks a lot, that helped.
