A difference of just 300 reads between IgG control files causes a ~150,000 (1.5 lakh) peak difference #65

RanjanaJambu9 · 2021-06-29T16:41:36Z

Hi Mike,

I was trying to validate the SEACR CUT&Tag pipeline from https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-bjk2kkye using the same input datasets. The results match when neither the treatment nor the control reads are quality-trimmed. We then tested trimming with different parameters: the reads were trimmed with FastQ, and we set the M parameter to different values to check how trimming affects peak calling.
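To make the effect of M concrete, here is a minimal Python sketch of sliding-window quality trimming from the 3' end. It assumes M acts as a minimum mean Phred quality over a 4-base window; the window size, the stopping rule, and the example qualities are illustrative assumptions, not the trimming tool's actual implementation.

```python
def phred33(qual_line):
    """Convert a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in qual_line]


def kept_length(quals, min_mean, window=4):
    """Return how many bases survive 3'-end sliding-window trimming.

    Hypothetical rule (an assumption, not the trimming tool's source): drop the
    last base while the mean quality of the trailing `window` bases (or of all
    remaining bases, if fewer are left) is below `min_mean`.
    """
    end = len(quals)
    while end > 0:
        tail = quals[max(0, end - window):end]
        if sum(tail) / len(tail) >= min_mean:
            break  # trailing window is good enough; stop trimming
        end -= 1  # drop one more base from the 3' end
    return end


# A made-up read: 20 high-quality bases followed by a decaying 3' tail.
quals = [38] * 20 + [25, 22, 18, 12, 8]
for m in (10, 20, 30):
    print(f"M{m}: keep {kept_length(quals, m)} of {len(quals)} bases")
```

Under this rule a higher M removes more of a low-quality tail (25, 23 and 22 bases kept for M10, M20 and M30), so the trimmed datasets differ only slightly in read length and number of reads.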

| Peak file | NoTrim_NoProperAlign_NoDedupIgG | NoTrim_NoProperAlign_DedupIgGonly | M10_noproperalign_dedupIgGonly | M20_noproperalign_dedupIgGonly | M30_noproperalign_dedupIgGonly |
|---|---|---|---|---|---|
| Hs_K27m3_rep1.stringent.bed | 185672 | 148521 | 154916 | 152717 | 94353 |
| Hs_K27m3_rep2.stringent.bed | 105895 | 73051 | 75760 | 73384 | 243649 |
| Hs_K4m3_rep1.stringent.bed | 5013 | 8965 | 8956 | 8934 | 8285 |
| Hs_K4m3_rep2.stringent.bed | 8249 | 8210 | 8207 | 8149 | 7076 |

The read counts for M20 and M30 differ by barely 0.1%, yet the number of peaks changes drastically between these datasets.
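To put numbers on that jump, the short Python snippet below computes the percentage change in peak count from M20 to M30 for each sample, using only the counts in the table above.

```python
# Peak counts copied from the M20 and M30 columns of the table above.
peaks = {
    "Hs_K27m3_rep1": (152717, 94353),
    "Hs_K27m3_rep2": (73384, 243649),
    "Hs_K4m3_rep1": (8934, 8285),
    "Hs_K4m3_rep2": (8149, 7076),
}


def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


for sample, (m20, m30) in peaks.items():
    print(f"{sample}: {m20} -> {m30} peaks ({pct_change(m20, m30):+.1f}%)")
```

For H3K27me3 the change is large and goes in opposite directions for the two replicates (about -38% for rep1 and about +232% for rep2), whereas H3K4me3 moves by roughly 7 to 13%, all against a read difference of about 0.1%.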

Any idea why so many spurious peaks are called?

Ranjana

mpmeers · 2021-07-07T17:39:17Z

Hi Ranjana,

Apologies for the delay in responding. Do you have a sense of whether whole H3K27me3 domains are being lost as peaks, or whether the domains are still represented but intermediate peaks within them are "dropping out" or being fused together? When input depth changes, especially when the files are internally normalized in "norm" mode, the threshold may shift so that intermediate peaks within a coherent domain are filtered out. Alternatively, peaks may become longer and then be joined automatically, which is default SEACR behavior when peaks are close enough together. Either effect can cause a large difference in peak number without a large difference in the actual base coverage of all peaks.
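Both mechanisms can be illustrated with a toy model in Python: call peaks as runs of genomic bins whose signal reaches a threshold, then optionally join runs separated by a small gap. This is a simplified sketch under stated assumptions (fixed-size bins, one global threshold, and a hypothetical `max_gap` merge rule); it is not SEACR's actual thresholding or merging logic.

```python
def call_blocks(coverage, threshold):
    """Return half-open (start, end) bin ranges where coverage >= threshold."""
    blocks, start = [], None
    for i, value in enumerate(coverage):
        if value >= threshold and start is None:
            start = i  # a block opens
        elif value < threshold and start is not None:
            blocks.append((start, i))  # block closes at the first bin below threshold
            start = None
    if start is not None:
        blocks.append((start, len(coverage)))
    return blocks


def merge_close(blocks, max_gap):
    """Join consecutive blocks separated by at most `max_gap` bins (hypothetical rule)."""
    merged = []
    for start, end in blocks:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)  # extend the previous block
        else:
            merged.append((start, end))
    return merged


def summarize(blocks):
    """Return (number of peaks, number of bins covered by peaks)."""
    return len(blocks), sum(end - start for start, end in blocks)


# One made-up H3K27me3-like domain: strong flanks, a weaker middle, two 1-bin dips.
domain = [6] * 10 + [1] + [3] * 10 + [1] + [6] * 10

print(summarize(call_blocks(domain, threshold=2)))                  # (3, 30)
print(summarize(call_blocks(domain, threshold=4)))                  # (2, 20): middle drops out
print(summarize(merge_close(call_blocks(domain, threshold=2), 1)))  # (1, 32): all fused
```

The same domain is reported as 3, 2 or 1 peak(s). Fusing the three runs turns 3 peaks into 1 while covered bins only go from 30 to 32, which is the kind of large peak-count change with little change in base coverage described above.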

Related to this, I recently pushed a development version of SEACR v1.4 that implements a new scaling approach for "norm" mode, which I have found to be more robust to variations in read depth, so you might give that a try.

Mike

RanjanaJambu9 · 2021-07-07T19:05:33Z

Hi Mike,

Thanks for the prompt response.

I forgot to mention that the peaks were called with the "non" and "stringent" options, using IgG as the control.
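For reference, a "non"/"stringent" call with an IgG control corresponds to a SEACR invocation like the one built below. This Python sketch assumes the positional argument order of the SEACR v1.3 shell script and uses a hypothetical script path and hypothetical file names; check the usage line of the SEACR version you have installed.

```python
import shlex

VALID_NORMALIZATION = {"norm", "non"}
VALID_MODE = {"relaxed", "stringent"}


def seacr_command(target_bg, control, normalization="non", mode="stringent",
                  prefix="output", script="SEACR_1.3.sh"):
    """Build the argument list for one SEACR run.

    Assumed positional order (SEACR v1.3): target bedGraph, control bedGraph
    (or numeric threshold), norm|non, relaxed|stringent, output prefix.
    The script path is a hypothetical default.
    """
    if normalization not in VALID_NORMALIZATION:
        raise ValueError(f"normalization must be one of {sorted(VALID_NORMALIZATION)}")
    if mode not in VALID_MODE:
        raise ValueError(f"mode must be one of {sorted(VALID_MODE)}")
    return ["bash", script, target_bg, control, normalization, mode, prefix]


# Hypothetical file names; peaks are expected in "<prefix>.<mode>.bed".
cmd = seacr_command("Hs_K27m3_rep1.bedgraph", "Hs_IgG_rep1.bedgraph",
                    prefix="Hs_K27m3_rep1")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Validating the mode strings up front avoids silently running with a misspelled option when looping over many conditions.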

Ranjana
