DOC added a tutorial
NelleV committed Jan 10, 2017
1 parent 1a4fb75 commit aa2a5db
Showing 3 changed files with 48 additions and 11 deletions.
3 changes: 0 additions & 3 deletions doc/_static/css/jhepc.css
@@ -179,15 +179,12 @@ h2 {
}

.section img {
display: block;
margin: auto;
-webkit-border-radius: 10px; /* Saf3-4, iOS 1-3.2, Android <1.6 */
-moz-border-radius: 10px; /* FF1-3.6 */
border-radius: 10px; /* Opera 10.5, IE9, Saf5, Chrome, FF4, iOS 4, Android 2.1+ */
border: 2px solid #fff;
max-width: 75%;
max-height: 60%;
margin-bottom: 40px;
}

.highlight {
48 changes: 44 additions & 4 deletions doc/tutorial/index.rst
@@ -86,7 +86,47 @@ also match the lengths vector::

Here, the ``counts`` matrix is of size 350 by 350.

You've successfully loaded your first Hi-C data! Let's plot it.



You've successfully loaded your first Hi-C data!
The resulting plot is shown below.

.. image:: /auto_examples/datasets/images/sphx_glr_plot_yeast_sample_001.png
:target: ../../auto_examples/datasets/plot_yeast_sample.html
:align: center
:scale: 50


Normalizing a data set
=======================

Now that we have some data loaded, let's proceed to normalizing it. There are
two normalization algorithms implemented in ``iced``: ICE and SCN. ICE is the
most widely used normalization technique for Hi-C data, so this is the one we
will showcase.

ICE is based on a matrix balancing algorithm. The underlying assumption is
that the biases affecting the contact map can be decomposed as a product
of regional biases: :math:`C_{ij} = \beta_i \beta_j N_{ij}`, where
:math:`C_{ij}` is the raw contact count between loci :math:`i` and :math:`j`,
:math:`N_{ij}` the normalized contact count, and :math:`\beta` the bias
vector.
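
To make the multiplicative model concrete, here is a minimal NumPy sketch of
iterative matrix balancing. It is an illustration under stated assumptions, not
the code used by ``iced``: the name ``balance_sketch`` and its parameters
``max_iter`` and ``eps`` are made up for this example, and the input is assumed
to be a dense, symmetric, non-negative matrix.

```python
import numpy as np


def balance_sketch(counts, max_iter=200, eps=1e-6):
    """Toy iterative balancing (hypothetical helper, not iced's implementation).

    Returns (normed, bias) such that counts[i, j] is approximately
    bias[i] * bias[j] * normed[i, j] and all non-empty rows of ``normed``
    have (nearly) the same sum.
    """
    normed = np.array(counts, dtype=float)  # np.array copies its input
    bias = np.ones(normed.shape[0])
    for _ in range(max_iter):
        sums = normed.sum(axis=0)
        nonzero = sums > 0
        if not nonzero.any():
            break  # nothing to balance in an all-zero matrix
        # Relative coverage of each locus; empty loci keep a factor of 1.
        scale = np.ones_like(sums)
        scale[nonzero] = sums[nonzero] / sums[nonzero].mean()
        # Divide entry (i, j) by scale[i] * scale[j] and accumulate the bias.
        normed /= scale[:, np.newaxis] * scale[np.newaxis, :]
        bias *= scale
        if np.abs(scale - 1).max() < eps:
            break
    return normed, bias


# Small symmetric, strictly positive toy matrix (made-up data).
rng = np.random.RandomState(0)
half = rng.rand(5, 5) + 0.1
counts = half + half.T

normed, bias = balance_sketch(counts)
row_sums = normed.sum(axis=1)
```

After the loop, ``row_sums`` should be nearly constant, and multiplying
``normed`` by the outer product of ``bias`` with itself recovers the raw
``counts``, which is the decomposition stated above.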

Normalizing the data is as simple as follows ::

>>> from iced import normalization
>>> normed = normalization.ICE_normalization(counts)

However, estimating the bias vector can be problematic in low-coverage
regions. In fact, if the matrix is too sparse, the algorithm may not
converge at all! To avoid this, Imakaev et al. recommend filtering out a
certain percentage of the rows and columns that interact the least. This must
be done before applying the normalization algorithm::

>>> from iced import filter
>>> counts = filter.filter_low_counts(counts, percentage=0.04)
>>> normed = normalization.ICE_normalization(counts)
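
As a rough illustration of what this kind of filtering does, the sketch below
zeroes out the rows and columns with the smallest total counts. It is written
under assumptions and does not reproduce ``filter.filter_low_counts``: the
helper name ``filter_low_counts_sketch`` is hypothetical, and the real function
may rank loci or mark the filtered entries differently.

```python
import numpy as np


def filter_low_counts_sketch(counts, percentage=0.04):
    """Zero out the weakest rows/columns (hypothetical helper, not iced's code)."""
    filtered = np.array(counts, dtype=float)  # work on a copy
    # Total interactions per locus, counting both its row and its column.
    totals = filtered.sum(axis=0) + filtered.sum(axis=1)
    n_remove = int(percentage * len(totals))
    # Indices of the loci that interact the least.
    weakest = np.argsort(totals)[:n_remove]
    filtered[weakest, :] = 0
    filtered[:, weakest] = 0
    return filtered


# Made-up 10 x 10 matrix whose row totals increase with the index.
toy = np.outer(np.arange(1, 11), np.arange(1, 11))
filtered = filter_low_counts_sketch(toy, percentage=0.2)
```

With ``percentage=0.2`` on this toy matrix, the two loci with the lowest totals
are emptied and the remaining entries are left unchanged.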


.. image:: /auto_examples/normalization/images/sphx_glr_plot_ice_normalization_001.png
:target: ../../auto_examples/normalization/plot_ice_normalization.html
:align: center
:scale: 75
8 changes: 4 additions & 4 deletions examples/normalization/plot_ice_normalization.py
@@ -26,18 +26,18 @@

fig, axes = plt.subplots(ncols=2, figsize=(12, 4))

-axes[0].imshow(counts, cmap="Blues", norm=colors.SymLogNorm(1),
+axes[0].imshow(counts, cmap="RdBu_r", norm=colors.SymLogNorm(1),
                origin="bottom",
                extent=(0, len(counts), 0, len(counts)))
 
 [axes[0].axhline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
 [axes[0].axvline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
-axes[0].set_title("Raw contact counts")
+axes[0].set_title("Raw contact counts", fontweight="bold")
 
-m = axes[1].imshow(normed, cmap="Blues", norm=colors.SymLogNorm(1),
+m = axes[1].imshow(normed, cmap="RdBu_r", norm=colors.SymLogNorm(1),
                    origin="bottom",
                    extent=(0, len(counts), 0, len(counts)))
 [axes[1].axhline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
 [axes[1].axvline(i, linewidth=1, color="#000000") for i in lengths.cumsum()]
 cb = fig.colorbar(m)
-axes[1].set_title("Normalized contact counts")
+axes[1].set_title("Normalized contact counts", fontweight="bold")
