Added to RTD for enrichment analyses, driven by galaxy-genome-annotat…
cmungall committed Jul 1, 2017
1 parent 23385a6 commit 694dde6
Showing 1 changed file with 56 additions and 0 deletions.
56 changes: 56 additions & 0 deletions docs/analyses.rst
@@ -13,11 +13,67 @@ Enrichment

See the `Notebook example <http://nbviewer.jupyter.org/github/biolink/ontobio/blob/master/notebooks/Phenotype_Enrichment.ipynb>`_

OntoBio allows for generalized gene set enrichment: given a set of
annotations that map genes to descriptor terms, an input set of
genes, and a background set, find which terms are enriched in the
input set compared to the background.
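
As a rough illustration of what "enriched" means here, the sketch below scores each term with a one-sided hypergeometric tail probability. This is an assumption made for illustration, not a description of OntoBio's internals: the function names, the ``annotations`` dict layout, and the toy IDs are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail probability P(X >= k).

    k: sample genes annotated to the term
    n: sample size
    K: background genes annotated to the term
    N: background size (the sample is assumed to be a subset of it)
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def enriched_terms(annotations, sample, background, threshold=0.05):
    """Return (term, p) pairs with p below threshold, smallest p first.

    annotations is a hypothetical dict mapping gene ID -> set of term IDs.
    """
    sample = set(sample)
    # Make sure the sample is contained in the background.
    background = set(background) | sample
    # Only terms seen in the sample can be enriched in it.
    terms = set()
    for gene in sample:
        terms |= annotations.get(gene, set())
    results = []
    for term in terms:
        K = sum(1 for g in background if term in annotations.get(g, ()))
        k = sum(1 for g in sample if term in annotations.get(g, ()))
        p = enrichment_p_value(k, len(sample), K, len(background))
        if p < threshold:
            results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


# Toy data: T:1 is specific to the sample, T:2 annotates every gene.
genes = [f"g{i}" for i in range(1, 11)]
annotations = {g: {"T:2"} for g in genes}
for g in ("g1", "g2", "g3"):
    annotations[g] = {"T:1", "T:2"}

hits = enriched_terms(annotations, sample=["g1", "g2", "g3"], background=genes)
```

In the toy data, ``T:2`` annotates every background gene, so seeing it in the sample is unsurprising, while ``T:1`` is concentrated in the sample and receives a small p-value.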

With OntoBio, enrichment tests work for any annotation corpus, not
just gene-oriented ones; disease-phenotype annotations are one
example. However, take care with the underlying assumptions of the
test when working with non-gene sets.

Before running an enrichment analysis, you first need to fetch both
an `Ontology` object and an `AssociationSet` object. These can come
from a mix of local files and remote services or databases. See
:ref:`inputs` for details.

Assume that we are using a remote ontology and local GAF:

.. code-block:: python

    from ontobio import OntologyFactory
    from ontobio import AssociationSetFactory

    ofactory = OntologyFactory()
    afactory = AssociationSetFactory()

    # 'go' is the handle of the remote ontology
    ont = ofactory.create('go')
    # 'my.gaf' is a local GAF file
    aset = afactory.create_from_gaf('my.gaf', ontology=ont)

Assume also that we have a set of sample gene IDs and a set of
background gene IDs. The test is then:

.. code-block:: python

    enr = aset.enrichment_test(subjects=gene_ids, background=background_gene_ids,
                               threshold=0.00005, labels=True)

This returns a list of dicts (**TODO** - decide if we want to make
this an object and follow a standard class model).

**NOTE** the input gene IDs *must* be the same ones used in the
AssociationSet. If you load from a GAF, these are the IDs formed by
combining col1 and col2, separated by a ":", e.g.
UniProtKB:P123456

What if you have different IDs? Or what if you just have a list of
gene symbols? In this case you will need to *map* these names or IDs,
the subject of the next section.
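
To make the GAF ID convention from the note above concrete, here is a minimal sketch that builds the col1:col2 identifiers from tab-separated GAF lines in plain Python. It does not use OntoBio; the helper name ``gaf_subject_ids`` and the sample rows are hypothetical, and it assumes that header and comment lines start with ``!``, as is usual for GAF files.

```python
def gaf_subject_ids(lines):
    """Collect 'col1:col2' subject IDs from an iterable of GAF lines."""
    ids = set()
    for line in lines:
        # Skip blank lines and header/comment lines, which start with '!'.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        # col1 is the database prefix, col2 the object ID within it.
        ids.add(f"{cols[0]}:{cols[1]}")
    return ids


# Invented sample rows, truncated to the first few columns.
sample_lines = [
    "!gaf-version: 2.1\n",
    "UniProtKB\tP123456\tABC1\t\tGO:0005634\n",
    "UniProtKB\tQ99999\tXYZ2\t\tGO:0005737\n",
    "UniProtKB\tP123456\tABC1\t\tGO:0005737\n",
]

known_ids = gaf_subject_ids(sample_lines)

# IDs in an input list that the annotations would not recognise.
my_gene_ids = ["UniProtKB:P123456", "ABC1"]
unmatched = [g for g in my_gene_ids if g not in known_ids]
```

A check like this before running the test shows which inputs (here the bare symbol ``ABC1``) would need to be mapped first.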

Further reading:

For API docs, see `enrichment_test in AssociationSet model <http://ontobio.readthedocs.io/en/latest/api.html#assocation-object-model>`_

Identifier Mapping
------------------

**TODO**

Semantic Similarity
-------------------

**TODO**

To follow progress, see `this PR <https://github.com/biolink/ontobio/pull/49>`_

Slimming
--------
