I was trying to validate the SEACR CUT&Tag pipeline from https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-bjk2kkye, using the same input datasets. The results match when the treatment and control reads are not trimmed for low quality. We then tested trimming with different parameters, using FastQ as the trimming tool and setting its M parameter to different values to check how trimming affects peak calling.
Peak counts per SEACR stringent output file, by processing condition:

| SEACR output | NoTrim_NoProperAlign_NoDedupIgG | NoTrim_NoProperAlign_DedupIgGonly | M10_noproperalign_dedupIgGonly | M20_noproperalign_dedupIgGonly | M30_noproperalign_dedupIgGonly |
|---|---|---|---|---|---|
| Hs_K27m3_rep1.stringent.bed | 185672 | 148521 | 154916 | 152717 | 94353 |
| Hs_K27m3_rep2.stringent.bed | 105895 | 73051 | 75760 | 73384 | 243649 |
| Hs_K4m3_rep1.stringent.bed | 5013 | 8965 | 8956 | 8934 | 8285 |
| Hs_K4m3_rep2.stringent.bed | 8249 | 8210 | 8207 | 8149 | 7076 |
The difference in read counts between M20 and M30 is barely 0.1%, yet there is a huge jump in the number of peaks between these datasets.
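To put rough numbers on that jump, here is a quick arithmetic check on the H3K27me3 counts reported above (a throwaway sketch, not part of the pipeline):

```python
# Peak counts from the table above: M20 vs M30, H3K27me3 replicates.
m20 = {"rep1": 152717, "rep2": 73384}
m30 = {"rep1": 94353, "rep2": 243649}

for rep in ("rep1", "rep2"):
    pct_change = (m30[rep] - m20[rep]) / m20[rep] * 100
    print(f"{rep}: {pct_change:+.1f}% change in peak count")
```

For these counts, rep1 drops by roughly 38% while rep2 jumps by roughly 232%, despite the near-identical read depth.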
Any idea why so many spurious peaks are called?
Ranjana
Apologies for the delay in responding. Do you have a sense of whether whole H3K27me3 domains are being lost as peaks, or whether peaks representing those domains remain, with intermediate peaks "dropping out" or being fused together? When input depth changes, especially when the files are being internally normalized in "norm" mode, the threshold may shift so that intermediate peaks within a coherent domain are filtered out. Alternatively, peaks may become longer and then be automatically joined, which is default SEACR behavior when they are close enough together. Either effect would cause a large difference in peak number without a large difference in the actual base coverage of all peaks.
Related to this, I recently pushed a development version of SEACR (v1.4) that implements a new scaling approach for "norm" mode which I have found to be more robust to variations in read depth, so you might give that a try.