Predict which residues are the most important #113

Mark-a-Lis · 2021-08-10T20:19:04Z

Hi All!
First off, thanks so much for publishing your wonderful language model!
I'm looking for a way to predict which residues in a particular protein are the most central to its current form. What I'd like to do is create a probability distribution of the residues' likelihood of mutation that I could sample from. I would appreciate input on how to go about this. I'm relatively new to transformers and DL and I really do appreciate the help.

Answered by tomsercu

tomsercu · 2021-08-12T10:46:10Z

The straightforward way to do this is to mask the token under consideration (there is some existing discussion on how to do this, and a repo update to make it easier is upcoming too).
Then, with result = model(masked_sequences), you'll find result['logits']; per sequence it has size L x K (seqlen x alphabet_size). To turn it into a probability distribution, apply F.softmax over the last dimension. You can then compute, for example, the per-position entropy: -(p * p.log()).sum(-1)
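
For readers new to these steps, here is a minimal, dependency-free sketch of the softmax and entropy computation on made-up logits. It is not taken from the ESM repository: the helper names and the toy numbers are hypothetical, and turning the entropies into sampling weights is only one possible heuristic for the "distribution to sample from" asked about in the question. Low entropy means the model is confident about which residue belongs at a position; high entropy suggests the position tolerates substitutions more readily.

```python
import math
import random

def softmax(logits):
    """Turn one position's K logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: -sum(p * log p), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def per_position_entropy(logits_lxk):
    """Entropy for each row of an L x K table of logits."""
    return [entropy(softmax(row)) for row in logits_lxk]

def position_weights(entropies):
    """Assumed heuristic: normalize entropies into a distribution over positions."""
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]

# Toy logits (made-up numbers): 3 positions, alphabet of 4 tokens.
toy_logits = [
    [10.0, 0.0, 0.0, 0.0],  # one token dominates: low entropy
    [1.0, 1.0, 1.0, 1.0],   # uniform: maximum entropy, log(4)
    [2.0, 1.0, 0.0, -1.0],  # in between
]

H = per_position_entropy(toy_logits)
weights = position_weights(H)

# Sample positions to mutate, favoring high-entropy (less constrained) ones.
sampled_positions = random.choices(range(len(weights)), weights=weights, k=5)
```

With PyTorch tensors the same quantities come from F.softmax(logits, dim=-1) followed by -(p * p.log()).sum(-1), as described in the answer above.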

3 replies

tomsercu Aug 23, 2021

FYI: we added an easier way to provide masked sequences in commit 3bb552af; it is demonstrated in https://github.com/facebookresearch/esm#quick-start-
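
As a rough illustration of preparing such inputs, the sketch below builds one masked copy of a sequence per position. It assumes the mask is written as the literal string "<mask>" inside the sequence and that inputs are given as (label, sequence) pairs, as in the linked quick-start; the helper name mask_each_position and the label scheme are hypothetical, so check them against the current README before relying on this.

```python
def mask_each_position(name, sequence, mask_token="<mask>"):
    """Return one (label, masked_sequence) pair per residue position.

    The "<mask>" token string is an assumption based on the quick-start;
    the labels are arbitrary and only used to tell the copies apart.
    """
    return [
        (f"{name}_mask{i}", sequence[:i] + mask_token + sequence[i + 1:])
        for i in range(len(sequence))
    ]

# Toy example with a made-up 4-residue sequence.
masked = mask_each_position("toy", "MKTV")
```

Each pair could then go through the model's batch converter, reading the logits at the masked position of each copy.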

Mark-a-Lis Aug 25, 2021
Author

Thank you so much! I really appreciate the help!

bj600800 Dec 14, 2021

Thanks for your discussion. I'm trying to do supervised learning with a downstream model. Is the solution provided above still valid for the hot-spot sites output?
