Tool: AMIGO API for enrichment of gene for an organism #16

Open
nathandunn opened this issue Jun 30, 2017 · 14 comments

Comments

@nathandunn
Contributor

Want to emulate the behavior of this tool:

[Screenshot, 2017-06-30 11:58 am; image not preserved]

I think it actually calls this tool (which would also be fine), but I don't see a web service for this (though maybe a plain POST is fine).

You can look at Amigo as well, but I think that is just wrapping panther:
http://wiki.geneontology.org/index.php/AmiGO_2_Web_Services#API_Documentation

@kltm is that true (the AmiGO enrichment is just calling PANTHER), or are you calling the GOlr backend? Are you doing a POST to pantherdb for this, or is there a hidden web service that you are calling?

@hexylena
Member

This could be a data source, but as a tool this would be completely unreproducible. Is there any way we can reproduce the infrastructure in some offline manner?

@nathandunn
Contributor Author

I was just thinking of using it as a datasource.

That being said, it would be possible to run it locally, but I think the footprint / effort may be immense, and it would need to ingest and update data from a number of sources to be appropriately functional.

@kltm would know more.

@hexylena
Member

Yes, but that would make it reproducible. Term enrichment is usually done as a step in a workflow, I assume: it isn't the source of data, it's a processing step. Non-reproducible processing steps are not good, so if there's a way to work around this by having local databases that are queried, then this becomes an attractive proposition for a Galaxy tool.

@nathandunn
Contributor Author

@erasche That is an excellent point. I think the way you get around that is to provide data versions during the query to amigo, which I don't think it supports. (is this right @cjmungall / @kltm ?)

@hexylena
Member

hexylena commented Jun 30, 2017

@nathandunn yep, that would be one solution (but would not work for completely network isolated galaxy instances, hence always the push for "is there a DB dump we can download, we can investigate modifying tooling to search that")

@nathandunn
Contributor Author

This is a good idea. If we had a self-contained version that would be great. @cjmungall / @kltm would it be feasible for @erasche to dockerize amigo or would this be a herculean undertaking?

@kltm

kltm commented Jun 30, 2017

@cmungall, not @cjmungall

There is no real API API for the enrichment at PANTHER anymore. At one point we had worked out TERP (a Term EnRichment Protocol), but that fell by the wayside with the practical considerations of working with PANTHER's TE quickly (e.g. https://github.com/geneontology/amigo/search?utf8=%E2%9C%93&q=TERP&type=Issues).

@cmungall

cmungall commented Jul 1, 2017

You can write a Python script to do this easily using ontobio.

not well documented yet as you can see!
http://ontobio.readthedocs.io/en/latest/analyses.html#enrichment

Here's an example:
http://nbviewer.jupyter.org/github/biolink/ontobio/blob/master/notebooks/Phenotype_Enrichment.ipynb

(change category from 'phenotype' to 'function' for GO enrichment)

(this notebook does a whole lot more that you don't need, or if you did would be better split into separate tools)

Re: TERP this capability should be exposed via the web API (biolink) soon but for galaxy direct python API is probably fine

caveat: for querying GO you need to know in advance what kind of IDs to use. MOD IDs for MODs, UniProtKB IDs for everything else. I'll add an example for mapping. Would you want to do the ID mapping via a separate galaxy tool, or just bundle into the enrichment capability?
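As a rough illustration of the ID caveat above, here is a minimal sketch of a pre-flight check that splits a gene list into IDs carrying the expected prefix and IDs that would need mapping first. The taxon-to-prefix table is a small, incomplete illustration, the function names are hypothetical, and the gene IDs are made-up placeholders; none of this is ontobio API.

```python
# Illustrative, incomplete table: model organism databases (MODs) that
# supply their own gene IDs. Anything not listed falls back to UniProtKB.
MOD_PREFIX_BY_TAXON = {
    "NCBITaxon:10090": "MGI",   # mouse
    "NCBITaxon:7955": "ZFIN",   # zebrafish
    "NCBITaxon:7227": "FB",     # fruit fly
    "NCBITaxon:6239": "WB",     # C. elegans
}
DEFAULT_PREFIX = "UniProtKB"


def expected_prefix(taxon):
    """Return the ID prefix assumed for this taxon (hypothetical helper)."""
    return MOD_PREFIX_BY_TAXON.get(taxon, DEFAULT_PREFIX)


def split_by_prefix(gene_ids, taxon):
    """Split IDs into (usable as-is, need mapping) for the given taxon."""
    prefix = expected_prefix(taxon) + ":"
    usable = [g for g in gene_ids if g.startswith(prefix)]
    need_mapping = [g for g in gene_ids if not g.startswith(prefix)]
    return usable, need_mapping


# Placeholder IDs, not real accessions.
usable, need_mapping = split_by_prefix(
    ["ZFIN:example-1", "UniProtKB:example-2"], "NCBITaxon:7955"
)
```

A check like this could live in either place: bundled into the enrichment tool as input validation, or as the front half of a separate mapping tool.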

@cmungall

cmungall commented Jul 1, 2017

When writing the tool, make sure to include a background gene set; this is very important.
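To show why the background matters, here is a minimal sketch of a one-sided over-representation test for a single term, using a plain hypergeometric tail. This is a generic textbook calculation, not necessarily what ontobio or PANTHER compute, and the gene IDs and set sizes are invented for illustration.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes from `population`,
    of which `successes` are annotated to the term."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enrichment_p(sample, background, annotated):
    """Over-representation p-value for one term.

    All arguments are sets of gene IDs; genes outside the background
    are ignored so that every count refers to the same universe.
    """
    background = set(background)
    sample = set(sample) & background
    annotated = set(annotated) & background
    k = len(sample & annotated)
    return hypergeom_tail(k, len(background), len(annotated), len(sample))


# Made-up example: the same sample and term, two different backgrounds.
term_genes = {"g0", "g1", "g2", "g3", "g4"}
sample = {"g0", "g1", "g2", "g9"}
background_wide = {f"g{i}" for i in range(100)}    # e.g. whole genome
background_narrow = {f"g{i}" for i in range(10)}   # e.g. genes on the assay

p_wide = enrichment_p(sample, background_wide, term_genes)      # about 0.00024
p_narrow = enrichment_p(sample, background_narrow, term_genes)  # about 0.26
```

The same three hits look highly significant against the wide background and unremarkable against the narrow one, which is why the tool should take the background as an explicit input rather than silently defaulting to the whole genome.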

@cmungall

cmungall commented Jul 1, 2017

Reproducibility: use a versioned PURL for the ontology and a specific version of the annotation files. Instructions in the PR about to come
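Until those instructions land, here is a hedged sketch of what pinning could look like: build a dated PURL and record a checksum of whatever was downloaded. The URL pattern is an assumption based on the usual OBO release layout (`.../obo/go/releases/YYYY-MM-DD/<file>`), the release date is a placeholder, and the helper names are hypothetical. No download is performed here; a stand-in payload is hashed instead.

```python
import hashlib


def versioned_go_purl(release_date, filename="go.obo"):
    """Build a dated ontology PURL.

    Assumes the OBO-style layout .../obo/go/releases/<date>/<file>;
    check that the release actually exists before relying on it.
    """
    return f"http://purl.obolibrary.org/obo/go/releases/{release_date}/{filename}"


def provenance_record(url, payload):
    """Describe exactly which bytes were used, for later verification."""
    return {
        "url": url,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
    }


# Placeholder date and stand-in file contents (not a real download).
url = versioned_go_purl("2017-07-01")
record = provenance_record(url, b"format-version: 1.2\n")
```

Storing a record like this next to the results also gives a way to confirm the source versions afterwards, independent of whether the remote service reports them.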

@cmungall

cmungall commented Jul 3, 2017

OK, slightly improved docs on enrichment here: http://ontobio.readthedocs.io/en/latest/analyses.html#enrichment

Note that one option is simply to wrap the existing cli script

@nathandunn
Contributor Author

@cmungall I think that wrapping the existing script is a much better idea, especially if you plan to publish on PyPI or the like.

Is there a way to query available versions and more importantly set URL version from within the API? Also, is there a way to confirm the source versions when getting results?

Maybe we can wait until the doc is fully developed.

@cmungall

cmungall commented Jul 6, 2017

Should this be the responsibility of the enrichment tool, or should there be a separate fetcher tool (or two: one to get a versioned ontology, the other to get versioned annotations)? The latter seems more modular: you can then plug in different analytic tools without each one worrying about versioning. But I don't know Galaxy best practice these days.

@hexylena
Member

some brief notes

  • existing cli = great!
  • For a very "galaxy" way of doing things, I think this would be separate tools.
    • One "data manager" which fetches the reference PURL (and maybe lets the user specify a specific version)
    • And separate querying tools.
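The split described in these notes could be sketched roughly as two stages that communicate only through a manifest file. This is a toy stand-in, not Galaxy's data manager interface: the function names, the manifest fields, and the version strings are all hypothetical, and no data is actually fetched or analysed.

```python
import json
import tempfile
from pathlib import Path


def fetch_stage(out_dir, ontology_version, annotation_version):
    """Stand-in for the fetching step: records which versions were pinned."""
    manifest = {
        "ontology_version": ontology_version,
        "annotation_version": annotation_version,
    }
    path = Path(out_dir) / "manifest.json"
    path.write_text(json.dumps(manifest, sort_keys=True))
    return path


def query_stage(manifest_path, gene_ids):
    """Stand-in for a querying tool: echoes inputs with versions attached.

    A real tool would run the enrichment here; the point is only that
    it reads versions from the manifest instead of choosing them itself.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    return {"inputs": sorted(gene_ids), "sources": manifest, "results": []}


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = fetch_stage(tmp, "2017-07-01", "2017-07-01")
    report = query_stage(manifest_path, ["gene:b", "gene:a"])
```

Because the query stage never decides versions on its own, different analysis tools can be swapped in behind the same manifest without each one handling versioning.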

But these are implementation details. Maybe the next time IUC has a hackathon I'll have some time to work on this.
