DOC added a tutorial

nservant · Jan 10, 2017 · aa2a5db
1 parent 1a4fb75
commit aa2a5db
Showing 3 changed files with 48 additions and 11 deletions.
diff --git a/doc/_static/css/jhepc.css b/doc/_static/css/jhepc.css
@@ -179,15 +179,12 @@ h2 {
 }
 
 .section img {
-  display: block;
-  margin: auto;
   -webkit-border-radius: 10px; /* Saf3-4, iOS 1-3.2, Android <1.6 */
   -moz-border-radius: 10px; /* FF1-3.6 */
   border-radius: 10px; /* Opera 10.5, IE9, Saf5, Chrome, FF4, iOS 4, Android 2.1+ */
   border: 2px solid #fff;
   max-width: 75%;
   max-height: 60%;
-  margin-bottom: 40px;
 }
 
 .highlight {

diff --git a/doc/tutorial/index.rst b/doc/tutorial/index.rst
@@ -86,7 +86,47 @@ also match the lengths vector::
 
 The ``counts`` matrix is here of size 350 by 350.
 
-You've successfully loaded your first Hi-C data! Let's plot it.
-
-
-
+You've successfully loaded your first Hi-C data!
+The corresponding contact map is shown below.
+
+.. image:: /auto_examples/datasets/images/sphx_glr_plot_yeast_sample_001.png
+    :target: ../../auto_examples/datasets/plot_yeast_sample.html
+    :align: center
+    :scale: 50
+
+
+Normalizing a data set
+=======================
+
+Now that we have some data loaded, let's proceed to normalizing it. There are
+two normalization algorithms implemented in ``iced``: ICE and SCN. ICE is the
+most widely used normalization technique on Hi-C data, so this is the one we
+will showcase.
+
+ICE is based on a matrix balancing algorithm. The underlying assumption is
+that the biases affecting the contact map can be decomposed as a product of
+regional biases: :math:`C_{ij} = \beta_i \beta_j N_{ij}`, where
+:math:`C_{ij}` is the raw contact count between loci :math:`i` and :math:`j`,
+:math:`N_{ij}` the normalized contact count, and :math:`\beta` the bias
+vector.
+
+Normalizing the data takes only two lines::
+
+  >>> from iced import normalization
+  >>> normed = normalization.ICE_normalization(counts)
+
+However, estimating the bias vector can be problematic in low-coverage
+regions. In fact, if the matrix is too sparse, the algorithm may not
+converge at all! To avoid this, Imakaev et al. recommend filtering out a
+certain percentage of the rows and columns that interact the least. This has
+to be performed before applying the normalization algorithm::
+
+  >>> from iced import filter
+  >>> counts = filter.filter_low_counts(counts, percentage=0.04)
+  >>> normed = normalization.ICE_normalization(counts)
+
+
+.. image:: /auto_examples/normalization/images/sphx_glr_plot_ice_normalization_001.png
+   :target: ../../auto_examples/normalization/plot_ice_normalization.html
+   :align: center
+   :scale: 75
diff --git a/examples/normalization/plot_ice_normalization.py b/examples/normalization/plot_ice_normalization.py
@@ -26,18 +26,18 @@
 
 fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
 
-axes[0].imshow(counts, cmap="Blues", norm=colors.SymLogNorm(1),
+axes[0].imshow(counts, cmap="RdBu_r", norm=colors.SymLogNorm(1),
                origin="bottom",
                extent=(0, len(counts), 0, len(counts)))
 
 [axes[0].axhline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
 [axes[0].axvline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
-axes[0].set_title("Raw contact counts")
+axes[0].set_title("Raw contact counts", fontweight="bold")
 
-m = axes[1].imshow(normed, cmap="Blues", norm=colors.SymLogNorm(1),
+m = axes[1].imshow(normed, cmap="RdBu_r", norm=colors.SymLogNorm(1),
                    origin="bottom",
                    extent=(0, len(counts), 0, len(counts)))
 [axes[1].axhline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
 [axes[1].axvline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
 cb = fig.colorbar(m)
-axes[1].set_title("Normalized contact counts")
+axes[1].set_title("Normalized contact counts", fontweight="bold")
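
The tutorial text in this commit describes ICE as matrix balancing under the model C_ij = beta_i * beta_j * N_ij. The sketch below illustrates that idea with plain NumPy: each pass rescales rows and columns by their relative sums and accumulates the scaling factors into a bias vector. It is not the `iced` implementation; the function name `balance_matrix` and the parameters `max_iter` and `tol` are hypothetical, chosen only for this illustration, and the input is assumed to be a dense symmetric matrix.

```python
import numpy as np


def balance_matrix(counts, max_iter=100, tol=1e-6):
    """Illustrative iterative balancing of a symmetric contact matrix.

    Hypothetical helper, not part of iced. Returns (normed, bias) such that
    counts[i, j] == bias[i] * bias[j] * normed[i, j].
    """
    normed = np.array(counts, dtype=float)
    bias = np.ones(normed.shape[0])
    for _ in range(max_iter):
        row_sums = normed.sum(axis=1)
        # Rows with no contacts keep a scale of 1 to avoid dividing by zero.
        nonzero = row_sums > 0
        scale = np.ones_like(row_sums)
        scale[nonzero] = row_sums[nonzero] / row_sums[nonzero].mean()
        # Divide entry (i, j) by scale[i] * scale[j] and track the total bias.
        normed /= scale[:, np.newaxis] * scale[np.newaxis, :]
        bias *= scale
        if np.abs(scale - 1).max() < tol:
            break
    return normed, bias


# Toy symmetric matrix standing in for real Hi-C counts.
toy_counts = np.array([[4.0, 4.0, 0.5],
                       [4.0, 12.0, 2.0],
                       [0.5, 2.0, 1.25]])
toy_normed, toy_bias = balance_matrix(toy_counts)
```

After balancing, every non-empty row of `toy_normed` has approximately the same sum. On sparse matrices a loop like this can fail to settle within `max_iter` passes, which is the situation the filtering step in the tutorial is meant to avoid.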