Added to RTD for enrichment analyses, driven by galaxy-genome-annotation/docker-galaxy-genome-annotation#16
biolink · Jul 1, 2017 · 694dde6
1 parent 23385a6
commit 694dde6
Showing 1 changed file with 56 additions and 0 deletions.
diff --git a/docs/analyses.rst b/docs/analyses.rst
@@ -13,11 +13,67 @@ Enrichment
 
 See the `Notebook example <http://nbviewer.jupyter.org/github/biolink/ontobio/blob/master/notebooks/Phenotype_Enrichment.ipynb>`_
 
+OntoBio allows for generalized gene set enrichment: given a set of
+annotations that map genes to descriptor terms, and an input set of
+genes, and a background set, find what terms are enriched in the input
+set compared to the background.
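
To make the idea concrete, here is a minimal, self-contained sketch of an over-representation test in plain Python. It is not OntoBio's implementation: it assumes a one-sided hypergeometric test with no multiple-testing correction, and the names `hypergeom_tail`, `enrich`, and the toy `annotations` dict are hypothetical, introduced only for illustration.

```python
from math import comb  # requires Python 3.8+


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(annotations, sample, background):
    """Return {term: p-value} for every term annotated to a sample gene.

    annotations maps each gene ID to the set of terms annotated to it.
    """
    background = set(background)
    sample = set(sample) & background  # ignore genes outside the background
    terms = set()
    for gene in sample:
        terms |= annotations.get(gene, set())
    pvals = {}
    for term in terms:
        K = sum(1 for g in background if term in annotations.get(g, ()))
        k = sum(1 for g in sample if term in annotations.get(g, ()))
        pvals[term] = hypergeom_tail(k, K, len(sample), len(background))
    return pvals


# Toy data (hypothetical): 10 background genes, T2 on all of them, T1 on three.
background = [f"g{i}" for i in range(10)]
annotations = {g: {"T2"} for g in background}
for g in ("g0", "g1", "g2"):
    annotations[g].add("T1")

pvals = enrich(annotations, sample=["g0", "g1", "g2"], background=background)
```

A low p-value means the term is annotated to more sample genes than chance alone would suggest given the background. In practice the raw p-values would also need correcting for the number of terms tested.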
+
+With OntoBio, enrichment tests work for any annotation corpus, not
+just gene-oriented ones; for example, a disease-phenotype corpus.
+However, take care that the test's underlying assumptions still hold
+for non-gene sets.
+
+Before running an enrichment analysis, you first need to fetch both an
+`Ontology` object and an `AssociationSet` object. These can come from
+a mix of local files and remote services or databases. See
+:ref:`inputs` for details.
+
+Assume that we are using a remote ontology and a local GAF:
+
+.. code-block:: python
+
+    from ontobio import OntologyFactory
+    from ontobio import AssociationSetFactory
+
+    ofactory = OntologyFactory()
+    afactory = AssociationSetFactory()
+    # 'go' is the remote ontology; 'my.gaf' is the local GAF file
+    ont = ofactory.create('go')
+    aset = afactory.create_from_gaf('my.gaf', ontology=ont)
+
+Assume also that we have a set of sample gene IDs (`gene_ids`) and a
+set of background gene IDs (`background_gene_ids`). The test is then:
+
+.. code-block:: python
+
+    enr = aset.enrichment_test(subjects=gene_ids,
+                               background=background_gene_ids,
+                               threshold=0.00005,
+                               labels=True)
+
+This returns a list of dicts (**TODO**: decide whether to make this an
+object and follow a standard class model).
+
+**NOTE** the input gene IDs *must* be the same ones used in the
+AssociationSet. If you load from a GAF, these are the IDs formed by
+combining col1 and col2, separated by a ":", e.g. UniProtKB:P123456.
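
As a quick sanity check before running the test, the IDs can be rebuilt straight from the GAF. The sketch below is a hand-rolled helper, not part of OntoBio; the function name `gaf_subject_ids` and the sample lines are hypothetical. It assumes a tab-separated GAF in which comment lines start with "!".

```python
def gaf_subject_ids(lines):
    """Collect 'col1:col2' IDs from an iterable of GAF lines."""
    ids = set()
    for line in lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # malformed row, nothing to combine
        ids.add(f"{cols[0]}:{cols[1]}")
    return ids


# Hypothetical sample input: one header line and two annotation rows.
sample_lines = [
    "!gaf-version: 2.1\n",
    "UniProtKB\tP12345\tGENE1\t\tGO:0005634\n",
    "UniProtKB\tQ67890\tGENE2\t\tGO:0005737\n",
]
known_ids = gaf_subject_ids(sample_lines)
```

With a real file, passing `open('my.gaf')` in place of `sample_lines` and computing `set(gene_ids) - known_ids` would list any input IDs that the AssociationSet cannot match.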
+
+What if you have different IDs? Or what if you just have a list of
+gene symbols? In this case you will need to *map* these names or IDs,
+which is the subject of the next section.
+
+Further reading:
+
+For API docs, see `enrichment_test in AssociationSet model <http://ontobio.readthedocs.io/en/latest/api.html#assocation-object-model>`_
+
+Identifier Mapping
+------------------
+
+**TODO**
+
 Semantic Similarity
 -------------------
 
 **TODO**
 
+To follow progress, see `this PR <https://github.com/biolink/ontobio/pull/49>`_
+
 Slimming
 --------