Just 300 reads difference between IgG control files causing 1.5 lakh peak difference #65

Open
RanjanaJambu9 opened this issue Jun 29, 2021 · 2 comments


@RanjanaJambu9

RanjanaJambu9 commented Jun 29, 2021

Hi Mike,

I was trying to validate the SEACR CUT&Tag pipeline from https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-bjk2kkye. The same input datasets have been used. The results match when the treatment and control are not trimmed (for low quality). We tested trimming with different parameters. FastQ is the tool used to trim the reads. We set the M parameter to different values to check how trimming would affect peak calling.

| Output file | NoTrim_NoProperAlign_NoDedupIgG | NoTrim_NoProperAlign_DedupIgGonly | M10_noproperalign_dedupIgGonly | M20_noproperalign_dedupIgGonly | M30_noproperalign_dedupIgGonly |
|---|---|---|---|---|---|
| Hs_K27m3_rep1.stringent.bed | 185672 | 148521 | 154916 | 152717 | 94353 |
| Hs_K27m3_rep2.stringent.bed | 105895 | 73051 | 75760 | 73384 | 243649 |
| Hs_K4m3_rep1.stringent.bed | 5013 | 8965 | 8956 | 8934 | 8285 |
| Hs_K4m3_rep2.stringent.bed | 8249 | 8210 | 8207 | 8149 | 7076 |

The difference in reads between M20 and M30 is hardly 0.1% but there is a huge jump in the number of peaks between these datasets.
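
To put numbers on that jump, here is a small sketch that computes the relative change in peak count between the M20 and M30 columns of the table above. The counts are copied from the table; the names `peak_counts` and `percent_change` are only illustrative and are not part of SEACR or the tutorial.

```python
# Peak counts copied from the M20 and M30 columns of the table above.
peak_counts = {
    "Hs_K27m3_rep1": (152717, 94353),
    "Hs_K27m3_rep2": (73384, 243649),
    "Hs_K4m3_rep1": (8934, 8285),
    "Hs_K4m3_rep2": (8149, 7076),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


for sample, (m20, m30) in peak_counts.items():
    change = percent_change(m20, m30)
    print(f"{sample}: {m20} -> {m30} peaks ({change:+.1f}%)")
```

The two H3K27me3 replicates move in opposite directions, while the H3K4me3 replicates change far less.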

Any idea why so many spurious peaks are called?

Ranjana

@mpmeers
Collaborator

mpmeers commented Jul 7, 2021

Hi Ranjana,

Apologies for the delay in responding. Do you have a sense of whether whole H3K27me3 domains are being lost as peaks, or whether the domains are still represented but with intermediate peaks "dropping out" or being fused together? When input depth changes, especially when the files are internally normalized in "norm" mode, the threshold may shift. That can filter out intermediate peaks within a coherent domain. Alternatively, peaks may become longer and then be joined automatically, which is default SEACR behavior when peaks are close enough together. Either effect would cause a large difference in peak number without a large difference in the actual base coverage of all peaks.
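
One way to check this is to compare peak number against total base coverage for each output file. The following is a minimal sketch, assuming only that the first three columns of each BED line are chromosome, start, and end. The helpers `read_bed` and `summarize` and the toy intervals are hypothetical, not SEACR code or real data.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED-like lines into {chrom: [(start, end), ...]}."""
    peaks = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        chrom, start, end = line.split()[:3]
        peaks[chrom].append((int(start), int(end)))
    return peaks


def summarize(peaks):
    """Return (number of peaks, bases covered by the union of all peaks)."""
    n_peaks = 0
    covered = 0
    for intervals in peaks.values():
        n_peaks += len(intervals)
        cur_start = cur_end = None
        for start, end in sorted(intervals):
            if cur_end is None or start > cur_end:
                # Close the previous merged block and start a new one.
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                # Overlapping or touching interval: extend the block.
                cur_end = max(cur_end, end)
        if cur_end is not None:
            covered += cur_end - cur_start
    return n_peaks, covered


# Toy data: the same domain called as three fragments versus one fused peak.
fragmented = ["chr1\t100\t200", "chr1\t250\t400", "chr1\t450\t600"]
fused = ["chr1\t100\t600"]

print(summarize(read_bed(fragmented)))  # (3, 400)
print(summarize(read_bed(fused)))       # (1, 500)
```

To run it on real output, pass an open file handle for each `.stringent.bed` file to `read_bed`. If the covered bases stay similar while the peak count changes a lot, fragmentation or fusion within domains is the more likely explanation than whole domains being lost.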

Related to this, I just recently pushed a development version for SEACR v1.4 that implements a new scaling approach for "norm" that I have found to be more robust to variations in read depth, so you might give that a try.

Mike

@RanjanaJambu9
Author

Hi Mike,

Thanks for the prompt response.

I forgot to mention that the peaks were called with the "non" and "stringent" options, with IgG as the control.

Ranjana
