Using GPN-MSA on nonhuman vertebrates? #40
-
Hello everyone! I'm very interested in using GPN-MSA VEP for organisms other than human. I got the example working, but when I tried to give it data from a different species (converted to human-reference coordinates), it fails an assertion wherever the reference allele in my species differs from the human reference allele. Is there a way to use the pretrained model for non-human vertebrates, or will I need to retrain the model on an MSA that uses my species of choice as the reference? Thank you!
Replies: 1 comment
-
Hi, thank you for your interest! We don't recommend using the existing model outside of human, as it is specifically tuned to the human genomic and evolutionary context. Our best recommendation today would be to train a new model on the new species. The main hurdle is obtaining the MAF alignment file. Some are available at the UCSC Genome Browser downloads page. Additionally, the MAF alignment can be extracted from the HAL alignments in Zoonomia for any of its species, although this can be computationally expensive.

We are currently training on mouse as well, and as we apply it to more species we might be able to share more polished and general code for processing the alignments. In the future we can also imagine a more flexible model that re-stitches the alignment blocks for any query species. This is mainly a bioinformatics challenge. Let us know if we can help in any way.
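For anyone trying the HAL route, here is a rough sketch of how the extraction step might be scripted. It assumes the `hal2maf` tool from the HAL toolkit is installed and on your `PATH`. The file names, the genome name `Mus_musculus`, and the helper functions are placeholders invented for this example, not part of GPN-MSA. The flags shown (`--refGenome`, `--refSequence`, `--noAncestors`) are the ones I understand `hal2maf` to accept, so please check `hal2maf --help` for your installed version.

```python
import shlex
import subprocess
from typing import List, Optional


def build_hal2maf_command(
    hal_path: str,
    maf_path: str,
    ref_genome: str,
    ref_sequence: Optional[str] = None,
    no_ancestors: bool = True,
) -> List[str]:
    """Build a hal2maf command line (hypothetical helper, not part of GPN-MSA).

    ref_genome is the species whose coordinates the MAF will use as reference.
    ref_sequence optionally restricts the export to one chromosome or contig,
    which keeps each job smaller since the extraction can be expensive.
    """
    cmd = ["hal2maf", hal_path, maf_path, "--refGenome", ref_genome]
    if ref_sequence is not None:
        cmd += ["--refSequence", ref_sequence]
    if no_ancestors:
        # Skip reconstructed ancestral genomes and keep only extant species.
        cmd.append("--noAncestors")
    return cmd


def run_hal2maf(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError if hal2maf fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths and names: replace with your own alignment and species.
    command = build_hal2maf_command(
        "alignment.hal", "mouse_chr1.maf", "Mus_musculus", ref_sequence="chr1"
    )
    print(shlex.join(command))  # inspect the command before running it
    # run_hal2maf(command)  # uncomment once the paths are real
```

Exporting one reference sequence at a time is only a suggestion for keeping the runs manageable. The resulting MAF files would still need the same kind of processing that was applied to the human alignment before training.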