-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add miRBase groundings to GroundingMapper #918
Conversation
I did my best to follow the same procedure as when handling the UniProt grounding. Also switched some staticmethods over to classmethods to make the references a bit more elegant
Great, thanks! One question is whether we really want to standardize the names of microRNAs to their gene name, e.g., in the test you added, Also, given that we can now reliably standardize the names and groundings of microRNAs, we can take this into account in the Agent.get_grounding method. Finally, the fact that micro-RNAs will now all have HGNC grounding (which they were never given before from any of our sources) may cause unexpected behaviors in some downstream applications where HGNC grounding is often assumed to be associated with UP (protein) grounding. So we'll need to review and test some of these to see if anything needs to be done. |
I think I may have misunderstood the directionality of the test - I agree that using the miRBase name would be preferred over the HGNC gene name. However, I think it's still meaningful to get the HGNC grounding, even though this might cause a bit of pain while checking what INDRA code makes assumptions about the existence of an HGNC gene identifier leading to the assumption that it's coding for a protein. The world is much weirder than that :) I'll take a look into |
Now this has higher priority than HGNC or UP in order to avoid possible conflicts with when the agent has both MIRBASE and HGNC identifiers in the db_refs
The name standardization issue is just that we check for HGNC first, and never reach
if there is an HGNC grounding that was added during db_refs standardization. |
Great, thanks for the changes! So then the only remaining issue is for me to think a little bit about downstream effects. |
Also fixed a typo in the miRNA name
@bgyori and just discussed this some more. Some conclusions:
|
I think 1-4 are all reasonable. For 5 I think it would be interesting to have hierarchy information about miRNAs - reading from the literature likely suffers from the same problem that proteins did before you introduced FamPlex. |
Yes, I agree about 5 in principle, though it's not an immediate priority. In the meantime, I did #926, which I will merge soon. It might be a bit inconvenient to rebase this branch once that is merged due to the refactoring but otherwise it will fit in well in the new structure too. |
Oh, per https://github.com/sorgerlab/indra/blob/master/CONTRIBUTING.md#git-forkpr-workflow we never merge master into branches, rather, rebase. I know it's not convenient in this case - I'm happy to sort out the git issues. |
@bgyori sorry about that. I appreciate the help - I'll check out that link so I don't make the same mistake next time. Good news is everything is upgraded and the tests are passing locally. Just waiting for travis now. |
This feature is now implemented as part of the more general db_refs and name standardization process via the ontology. Note that in case there is a one-to-one human gene correspondence for the given microRNA, its name will be standardized to HGNC, otherwise MIRBASE. |
References #255
I did my best to follow the same procedure as when handling the UniProt grounding.
Also switched some staticmethods over to classmethods to make the references a bit more elegant