Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEACR missing lots of obvious peaks #87

Open
Lillianwu0314 opened this issue Jun 27, 2022 · 4 comments
Open

SEACR missing lots of obvious peaks #87

Lillianwu0314 opened this issue Jun 27, 2022 · 4 comments

Comments

@Lillianwu0314
Copy link

Hi all,

  • I ran CUT&RUN experiments with 200.000 cells per experiment and H3K27me3 (CST #9733 antibody) being my positive control and IgG being my negative control.

  • These were all sequenced at 150x150bp with 10-20 million reads each.

  • I aligned the reads to the new CHM13-T2T genome with Bowtie2, checked the bam duplication rate and they were all <50% so I did not remove duplicates.

  • I generated bedgraphs from the bam files with bedtools genomecov (I did not normalise with my spike-in as it's not required for my set of experiments).

  • I then called peaks using SEACR with IgG as the background and in 'norm' mode.

  • Looking at the peaks called, it seems like SEACR fails to call peaks in many sites where there is obvious enrichment in H3K27me3.

IGV H3K27me3 HOXC cluster
Here's an IGV screenshot of the HOXC gene cluster.
Blue: H3K27me3 (top) and IgG (bottom) bedgraphs
Green: Peaks called with MACS2 using different p and q-value cut-offs (I used my bam files as input and -f BAMPE)
Red: Peaks called with SEACR on relaxed (top) and stringent (bottom) mode

What could possibly be causing this? I'm double checking all my scripts in case I've made a silly mistake.

Thanks for taking the time to read this.
Lillian

@Lillianwu0314 Lillianwu0314 changed the title SEACR missing lots of obvious SEACR missing lots of obvious peaks Jun 27, 2022
@Lillianwu0314
Copy link
Author

Hi all,

Adding to this, I also did the same analysis with reads aligned to the hg38 assembly. The peaks called with SEACR make a lot more sense. Here's a screenshot of the HOXC gene cluster with peaks called by SEACR relaxed (top) and stringent (bottom) mode. IGV H3K27me3 HOXC cluster hg38

Thanks,
Lillian

@mpmeers
Copy link
Collaborator

mpmeers commented Jun 27, 2022

Hello,

It's possible that the T2T assembly is going to cause there to be some differences in the global background estimation, since large pileups at difficult-to-map repeated regions might be differently represented now that there are presumably many more available reference sites at which ambiguous reads could map. My understanding is that the bowtie2 default (in contrast with -k or -a mode) is to assign the read semi-randomly to a single map location when there are multiple possible map locations with the same quality score (described in more detail here). Are you using bowtie2 here, and if so is it doing the default map position reporting? Does removing duplicates make a difference? This is something I will probably need to take a closer look at since it could be a generalized behavior with the T2T builds, which are unlike anything we've used before.

Mike

@Lillianwu0314
Copy link
Author

Hi Mike,

Thanks for the quick reply and explanation. I did try removing duplicates from the IgG (duplication rate ~50%) but not H3K27me3 (duplication 13%) and then running SEACR with that. It doesn't seem to improve peak calling unfortunately.

image

I did use Bowtie2 with the default setting. I'll have a look into -k and -a and see if that's appropriate for what I'm looking into. I've also considered using --non-deterministic so that identical reads don't get aligned to the same places if there are multiple equally good alignments.

Thanks,
Lillian

@clabanillas
Copy link

Hello,

I've been having the same issue for a while now. I also run on T2T aligned reads. Many peaks are missed and peak calling is not uniform between replicates:
image

Has there been an update on this?
Thanks a lot,
Claudia

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants