-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add use_custom_templates option #408
Conversation
Hi @rostro36 , thank you very much for this contribution. This is a really nice option to have for running with templates, and we would like to work with you to add this to OpenFold. We're preparing some testcases based on previous work that we have done with templates, especially on the single sequence model. Hopefully these testcases will be helpful for adjusting the code changes to work for all inference modes. I did run some quick integration tests with your PR. It looks like monomer inference mode using pre-computed alignments without templates has a small issue? Here's a sample command:
And the corresponding error:
I haven't had a chance to look into this deeply, but maybe have another look at the logic in get_custom_templates for parsing multiple mmcif files? |
Hey @jnwei ,
On my machine, the code now works for one or two custom templates with and without the use_custom_templates argument. Let me know if/how I can help you with the further |
README.md
Outdated
@@ -174,7 +174,10 @@ where `data` is the same directory as in the previous step. If `jackhmmer`, | |||
`/usr/bin`, their `binary_path` command-line arguments can be dropped. | |||
If you've already computed alignments for the query, you have the option to | |||
skip the expensive alignment computation here with | |||
`--use_precomputed_alignments`. | |||
`--use_precomputed_alignments`. If you wish to use a specific template as input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this information to a different subheading of it's own? Maybe called "Custom Templates"?
Do mention that it'll be the same template(s) that will be used if multiple separate sequences are passed for inference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments. The others are fulfilled without any question.
I am not sure if I understood you correctly about this one here though.
Now, I changed the README, so custom templates stands a bit more on it's own, but still in the monomer inference section.
Should I make a new section on the same level as monomer inference, multimer inference or soloseq?
Hi @rostro36 thanks for the fixes. I finally had a chance to take a closer look at this PR and do some more testing. I had a couple more thoughts/requests:
Let me know what you think, and if you'd be interested in contributing further on either of these directions. Thanks! |
Hey @jnwei , @sachinkadyan7 I should have a look to get some models running again soon and there is another OpenFold contribution I want to look into, but if it is nothing too big, then I can try to help with 1. the sequence alignments. |
Leaving my notes here regarding the sequence alignments in case they are helpful later.
You can find the template matches for a query sequence by adding a breakpoint after the template features dict are generated, and looking at the values for the key
|
I added support for sequence alignments. 6mfw, 5wmm and 6mfy that you suggested did not have a good sequence similarity to 1lci and Kalign raised an error. When supplying 3iep_A, 4g37_A and 5gz2_A (found with FoldSeek) with high sequence similarity, but not 100, everything worked and also did get high pLDDT scores in single sequence mode. Removing the templates still worked, but gave a bad pLDDT in single sequence. These checks look good to me so far. Let me know if there is something else to test. |
Dear @jnwei, @sachinkadyan7 The manuscript that utilizes this code has been submitted. As there were no roadblocks I can find on this pull request, I would like to push for feedback again to have a cleaner and wrapped up project on my side. Thank you |
Congratulations on the manuscript! I would be happy to accept the PR at this time. The only remaining suggestion would be regarding the documentation of your feature. Since your original submission, we changed the way that we handle documentation. If you would like to include a description of your feature in our new documentation, I would invite you to add a section to the Inference.md |
Thank for the congratulations @jnwei , the work of all of you on the new release of OpenFold deserves even more praise. Well done! I updated the README and Inference.md. The text is taken more or less from my README version. Inference.md only has the one addendum for the new option and some whitespaces artifacts, but otherwise the same text. I hope this is not a hindrance. |
LGTM, thanks for your contribution! |
Introduction & motiviation
Dear OpenFold team,
Thank you very much for creating such a useful tool and give the community the opportunity to use, ablate and extend the implementation!
In my research, I look at the impact of templates and try to find new protocols centered around them. For some aspects of this, I found that localcolabfold was too restrictive and I would like to use some features, which OpenFold can offer. That is why I implemented the
use_custom_templates
option, which should be similar to the--custom-template-path
argument of localcolabfold. I think this should be enough to have an implementation of AF2Rank (see tests below), touted by some their "favorite in the post-AF2 realm".Changes
Basically, I fitted the template pipeline from thread_sequence to use it with run_pretrained_model.py
use_custom_templates
optionTests
With the same pre-computed one row MSA, I saw that the inclusion of the template increased the pLDDT from 40ish to 80ish. I have not tested multimers or ESM single sequence mode, but it should not interfere with those. Don't know how well it plays along though.
I did a deeper dive and I think I found that using the very simple 1:1 mapping without alignment allows to use the template without any sequence information, except for the size of the sequence. As an empty
use_precomputed_alignments
also does not pass any information, this setup can be used for AF2Rank. The only other necessary condition is to correctly form the template to all GLY and only backbone + C-Beta. I can donate a script for the creation of this template in about one month after a deadline if you want.The only test I did is with the randomly chosen, but short 3NIR against the AF2Rank implementation in the ColabFold notebook.
Do you think the differences is within margin of error or should I have another look?
https://gist.github.com/rostro36/42113291b644388730b17d2aac1186be
End
Please let me know what tests and adjustments I can fulfill to add this feature or if I should merge it to another branch first.
Thank you for work!