
Understanding the tutorial output (ESM-1b unsupervised self-attention map contact predictions) #72

Answered by tomsercu
remomomo asked this question in Q&A

Hi, thanks for your interest.
The contacts correspond to the output of the logistic regression model described in *Transformer protein language models are unsupervised structure learners* (Rao et al., 2020).
See also the unsupervised contact prediction section of the README, which links to an example notebook: https://github.com/facebookresearch/esm#unsupervised-contact-prediction
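
In case it's useful, here is a condensed sketch based on the example in that README section (the exact API may have changed since, so treat it as illustrative):

```python
import torch
import esm

# Load ESM-1b and its alphabet (weights are downloaded on first use).
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()  # disable dropout for deterministic output

data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG")]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

with torch.no_grad():
    results = model(batch_tokens, return_contacts=True)

# (batch, L, L) contact probabilities: the logistic regression applied
# to the symmetrized, APC-corrected self-attention maps.
contacts = results["contacts"][0]
```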
The paper mentions that the logistic regression weights were fit using a minimum sequence separation of 6, which explains why local-range contacts (|i - j| < 6) are absent from the predictions.
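
If you want to apply the same minimum-separation criterion yourself downstream (e.g. before computing precision metrics), a minimal sketch follows; `mask_min_separation` is a hypothetical helper of mine, not part of the ESM package:

```python
import torch

def mask_min_separation(contacts: torch.Tensor, min_sep: int = 6) -> torch.Tensor:
    """Zero out contact probabilities for residue pairs closer than
    `min_sep` positions along the chain, i.e. pairs with |i - j| < min_sep."""
    L = contacts.shape[-1]
    idx = torch.arange(L, device=contacts.device)
    sep = (idx[None, :] - idx[:, None]).abs()  # (L, L) separation matrix
    return contacts * (sep >= min_sep)
```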
The LM is trained with a maximum sequence length of 1024, but yes, you can split longer proteins into shorter pieces. There are some previous questions about this if you search through the discussions/GitHub issues.
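
One simple way to split is a sliding window. A sketch, assuming ESM-1b reserves two of the 1024 positions for its BOS/EOS tokens (hence 1022 residues per window); the overlap size and how you stitch per-window contact maps back together are choices left to you:

```python
def split_sequence(seq: str, window: int = 1022, overlap: int = 256):
    """Yield (offset, subsequence) windows of at most `window` residues,
    overlapping by `overlap` so contacts near a boundary are covered by
    at least one window."""
    step = window - overlap
    start = 0
    while True:
        yield start, seq[start:start + window]
        if start + window >= len(seq):
            break
        start += step
```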
Hope this helps!

Answer selected by remomomo