Improbably high peak call counts with low IgG read counts -- any way around this? #92

SolKatzman · 2023-01-11T20:43:55Z

SEACR v1.3
SEACR v1.4-beta.2

In some CUT&RUN experiments that I am analyzing, there were, unfortunately, relatively few IgG control reads mapped compared to target reads. So when I use bedgraph files derived from merged target replicates (to improve the accuracy of the called peaks), the net result seems to be a substantial overabundance of called peaks.

Here are the details. I ran the 3 replicates (sampA, sampB, sampC) separately and also merged them into one (3samp). The 3 IgG replicates were merged into one control that was used for all SEACR runs. I ran with "norm" in both "relaxed" and "stringent" modes for each of the two SEACR versions noted above. I tried v1.4 based on the suggestion in Issue #76.
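For fragment-coverage bedGraphs, merging replicates amounts to a pointwise sum of the per-replicate tracks, which is also why the merged target signal grows roughly 3-fold while the single merged IgG control does not change. Below is a minimal sketch of that sum, assuming integer coverage values and half-open `[start, end)` intervals; `sum_bedgraphs` is a hypothetical helper, not part of SEACR, and `bedtools unionbedg` is an existing tool that lines up multiple bedGraphs column by column for the same purpose.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One bedGraph record: (chrom, start, end, value), half-open [start, end).
# Assumption: values are integer fragment coverage counts.
Record = Tuple[str, int, int, int]


def sum_bedgraphs(tracks: List[List[Record]]) -> List[Record]:
    """Pointwise sum of several bedGraph tracks via a breakpoint sweep."""
    # Per chromosome, add +value at each interval start and -value at each end.
    deltas: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for track in tracks:
        for chrom, start, end, value in track:
            deltas[chrom][start] += value
            deltas[chrom][end] -= value

    merged: List[Record] = []
    for chrom in sorted(deltas):  # lexicographic order, like `sort -k1,1`
        level, prev = 0, None
        for pos in sorted(deltas[chrom]):
            # Emit the segment [prev, pos) if it carries nonzero coverage.
            if prev is not None and level != 0:
                last = merged[-1] if merged else None
                if last and last[0] == chrom and last[2] == prev and last[3] == level:
                    # Same value and contiguous: extend the previous record.
                    merged[-1] = (chrom, last[1], pos, level)
                else:
                    merged.append((chrom, prev, pos, level))
            level += deltas[chrom][pos]
            prev = pos
    return merged


if __name__ == "__main__":
    rep1 = [("chr1", 0, 10, 1)]
    rep2 = [("chr1", 5, 15, 2)]
    print(sum_bedgraphs([rep1, rep2]))
```

Zero-coverage segments are omitted from the output, matching the usual convention for coverage bedGraphs.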

The experimental samples were H3K4me1 from mouse (mapped to mm10). The controls were IgG. The indicated read counts were taken from the bam files before generating the bedgraph files. The numPeaks, median, and average lengths were derived from the SEACR output files. The empirical FDR values were taken from the SEACR run logs. (Note my Issue #91 -- more information in the log might be helpful to understand these results.)

| seacr_run | Sample | ctrlReads | exptReads | numPeaks | medLen | avgLen | empFDR |
|---|---|---|---|---|---|---|---|
| v1.3_relaxed | sampA | 2.56M | 3.45M | 81422 | 172 | 187 | 0.054 |
| v1.3_relaxed | sampB | 2.56M | 3.91M | 65394 | 195 | 214 | 0.058 |
| v1.3_relaxed | sampC | 2.56M | 3.70M | 83225 | 169 | 181 | 0.055 |
| v1.3_relaxed | 3samp | 2.56M | 11.06M | 502204 | 185 | 234 | 0.011 |
| v1.4_relaxed | sampA | 2.56M | 3.45M | 70063 | 179 | 193 | 0.058 |
| v1.4_relaxed | sampB | 2.56M | 3.91M | 100790 | 175 | 193 | 0.065 |
| v1.4_relaxed | sampC | 2.56M | 3.70M | 82307 | 169 | 181 | 0.096 |
| v1.4_relaxed | 3samp | 2.56M | 11.06M | 421670 | 197 | 249 | 0.014 |
| v1.3_stringent | sampA | 2.56M | 3.45M | 32275 | 213 | 228 | 0.024 |
| v1.3_stringent | sampB | 2.56M | 3.91M | 26703 | 241 | 260 | 0.029 |
| v1.3_stringent | sampC | 2.56M | 3.70M | 28739 | 211 | 223 | 0.028 |
| v1.3_stringent | 3samp | 2.56M | 11.06M | 213091 | 254 | 319 | 0.003 |
| v1.4_stringent | sampA | 2.56M | 3.45M | 27407 | 220 | 236 | 0.028 |
| v1.4_stringent | sampB | 2.56M | 3.91M | 33200 | 229 | 248 | 0.022 |
| v1.4_stringent | sampC | 2.56M | 3.70M | 20847 | 225 | 237 | 0.038 |
| v1.4_stringent | 3samp | 2.56M | 11.06M | 182323 | 269 | 337 | 0.003 |
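To put numbers on "improbably high", the short script below does plain arithmetic on the values in the table: the target/IgG read ratio for each sample, and how far the merged 3samp run exceeds the mean of the three single replicates, first in reads and then in peaks. The dictionary and function names are illustrative only.

```python
from __future__ import annotations

# Values transcribed from the table above (reads in millions).
CTRL_READS_M = 2.56  # merged IgG control, shared by every run
EXPT_READS_M = {"sampA": 3.45, "sampB": 3.91, "sampC": 3.70, "3samp": 11.06}
NUM_PEAKS = {
    "v1.3_relaxed":   {"sampA": 81422, "sampB": 65394,  "sampC": 83225, "3samp": 502204},
    "v1.4_relaxed":   {"sampA": 70063, "sampB": 100790, "sampC": 82307, "3samp": 421670},
    "v1.3_stringent": {"sampA": 32275, "sampB": 26703,  "sampC": 28739, "3samp": 213091},
    "v1.4_stringent": {"sampA": 27407, "sampB": 33200,  "sampC": 20847, "3samp": 182323},
}
REPLICATES = ("sampA", "sampB", "sampC")


def fold_vs_replicates(values: dict[str, float]) -> float:
    """Ratio of the merged (3samp) value to the mean of the single replicates."""
    mean_rep = sum(values[r] for r in REPLICATES) / len(REPLICATES)
    return values["3samp"] / mean_rep


if __name__ == "__main__":
    for sample, reads in EXPT_READS_M.items():
        print(f"{sample}: target/IgG read ratio = {reads / CTRL_READS_M:.2f}")
    print(f"3samp read fold vs replicate mean: {fold_vs_replicates(EXPT_READS_M):.2f}")
    for run, counts in NUM_PEAKS.items():
        print(f"{run}: 3samp peak fold vs replicate mean = {fold_vs_replicates(counts):.2f}")
```

Merging triples the target reads (the target/IgG ratio goes from about 1.35 to 1.53 for single replicates up to 4.32 for 3samp) while the shared IgG control stays at 2.56M reads, yet the peak count rises about 5.0-fold (v1.4 relaxed) to 7.3-fold (v1.3 stringent).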

It is mildly interesting that v1.4 calls fewer peaks than v1.3 for sampA and sampC but more for sampB, apparently because it merges fewer peaks (judging by the median length, which is more consistent across the 3 replicates in v1.4).
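The v1.3 vs v1.4 comparison can be checked per replicate with a few lines of arithmetic on the table values. Here "consistency" is measured crudely as the max minus min of medLen across the three replicates; that choice of measure is an assumption for illustration, and the names are hypothetical.

```python
from __future__ import annotations

REPLICATES = ("sampA", "sampB", "sampC")
# (numPeaks, medLen) per single replicate, transcribed from the table above.
RUNS = {
    ("v1.3", "relaxed"):   {"sampA": (81422, 172), "sampB": (65394, 195),  "sampC": (83225, 169)},
    ("v1.4", "relaxed"):   {"sampA": (70063, 179), "sampB": (100790, 175), "sampC": (82307, 169)},
    ("v1.3", "stringent"): {"sampA": (32275, 213), "sampB": (26703, 241),  "sampC": (28739, 211)},
    ("v1.4", "stringent"): {"sampA": (27407, 220), "sampB": (33200, 229),  "sampC": (20847, 225)},
}


def peak_delta(mode: str) -> dict[str, int]:
    """numPeaks(v1.4) - numPeaks(v1.3) for each replicate in one mode."""
    old, new = RUNS[("v1.3", mode)], RUNS[("v1.4", mode)]
    return {s: new[s][0] - old[s][0] for s in REPLICATES}


def median_length_range(version: str, mode: str) -> int:
    """Max minus min of medLen across replicates: a crude consistency measure."""
    lengths = [RUNS[(version, mode)][s][1] for s in REPLICATES]
    return max(lengths) - min(lengths)


if __name__ == "__main__":
    for mode in ("relaxed", "stringent"):
        print(mode, "peak change v1.3 -> v1.4:", peak_delta(mode))
        for version in ("v1.3", "v1.4"):
            print(f"  {version} medLen range: {median_length_range(version, mode)} bp")
```

The stringent runs show the same pattern as the relaxed ones: fewer peaks for sampA and sampC, more for sampB, and the medLen range across replicates shrinks from 30 bp (v1.3) to 9 bp (v1.4), compared with 26 bp to 10 bp in relaxed mode.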

Although v1.4 has reduced the count of peaks called for 3samp, the net output is still improbably high. Note that the median length of the 3samp peaks is (only) 10% to 20% higher than the sampA,B,C peaks. Average length is more like 50% higher. I am not sure how to interpret the lower empirical FDR numbers for the 3samp runs compared to sampA,B,C.
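The size of the length differences varies by run and by replicate, so it may help to compute them directly. The sketch below reports, for each run, the smallest and largest percentage by which the 3samp median and average lengths exceed the single-replicate values; the names are illustrative only.

```python
from __future__ import annotations

REPLICATES = ("sampA", "sampB", "sampC")
# (medLen, avgLen) in bp, transcribed from the table above.
LENGTHS = {
    "v1.3_relaxed":   {"sampA": (172, 187), "sampB": (195, 214), "sampC": (169, 181), "3samp": (185, 234)},
    "v1.4_relaxed":   {"sampA": (179, 193), "sampB": (175, 193), "sampC": (169, 181), "3samp": (197, 249)},
    "v1.3_stringent": {"sampA": (213, 228), "sampB": (241, 260), "sampC": (211, 223), "3samp": (254, 319)},
    "v1.4_stringent": {"sampA": (220, 236), "sampB": (229, 248), "sampC": (225, 237), "3samp": (269, 337)},
}


def pct_increase_range(run: str, column: int) -> tuple[float, float]:
    """Min and max % by which 3samp exceeds each replicate (column 0 = medLen, 1 = avgLen)."""
    merged = LENGTHS[run]["3samp"][column]
    pcts = [100.0 * (merged / LENGTHS[run][s][column] - 1.0) for s in REPLICATES]
    return min(pcts), max(pcts)


if __name__ == "__main__":
    for run in LENGTHS:
        lo_m, hi_m = pct_increase_range(run, 0)
        lo_a, hi_a = pct_increase_range(run, 1)
        # The "+" format flag prints the sign, so a negative change shows as "-".
        print(f"{run}: medLen {lo_m:+.1f}% to {hi_m:+.1f}%, avgLen {lo_a:+.1f}% to {hi_a:+.1f}%")
```

For example, in the v1.4 stringent run the 3samp median length (269 bp) is 17.5% to 22.3% above the single replicates, while in the v1.3 relaxed run it is actually below sampB's (185 vs 195 bp).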

Note that I am presenting this (perhaps extreme) case to see whether you have any other suggestions for dealing with relatively low control read counts. I have other experiments in which even a single target replicate has roughly twice the read count of the merged control replicates.

I truly would appreciate any insight that you might have.

Sol Katzman
UC Santa Cruz Genomics Institute
