Possible bug in peak calling_influenced by sequencing depth #76

Open
zuzerdhoondia opened this issue Jan 6, 2022 · 2 comments

@zuzerdhoondia

Hi,

I tried peak calling for a transcription factor with two IgG control files (both test and control were spiked with E. coli DNA and normalized with a scaling factor, as described in the protocols.io data processing steps). The two control files were sequenced to different depths (30 million versus 24 million reads). The test file was sequenced to a depth of 20 million reads.

bash SEACR_1.3.sh target.bedgraph IgG.bedgraph non stringent output

Results were as follows:

  1. Test with control 1 (30 million): approximately 2,000 peaks.
  2. Test with control 2 (24 million): approximately 10,000 peaks.

With MACS2, both combinations gave a large number of peaks. SEACR, however, gave inconsistent results. I browsed the peaks from all combinations in IGV, and it seems certain regions were skipped by SEACR (maybe due to some cutoff in SEACR).

Does SEACR take into account any differences in sequencing depth between test and control, similar to MACS2?
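
For readers following the depth question, here is a minimal Python sketch of one simple way to put a control track on the same depth scale as a target track: multiply the control bedgraph signal by the ratio of read counts. This is only an illustration of linear depth scaling, not a description of what SEACR or MACS2 does internally. The function names `depth_scale_factor` and `scale_bedgraph_lines` are hypothetical, and the read counts in the usage note are the approximate depths quoted in the post.

```python
def depth_scale_factor(target_reads: int, control_reads: int) -> float:
    """Return the factor that rescales control signal to the target's depth."""
    if target_reads <= 0 or control_reads <= 0:
        raise ValueError("read counts must be positive")
    return target_reads / control_reads


def scale_bedgraph_lines(lines, factor):
    """Scale the signal column of 4-column bedgraph lines by `factor`.

    Returns a list of (chrom, start, end, scaled_value) tuples.
    Blank lines and track/browser/comment lines are skipped.
    """
    scaled = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("track", "browser", "#")):
            continue
        fields = stripped.split()
        if len(fields) < 4:
            raise ValueError(f"expected 4 bedgraph columns, got: {line!r}")
        chrom, start, end, value = fields[:4]
        scaled.append((chrom, int(start), int(end), float(value) * factor))
    return scaled
```

With the depths from the post, `depth_scale_factor(20_000_000, 30_000_000)` and `depth_scale_factor(20_000_000, 24_000_000)` give different factors for the two controls, which is one reason the same target can look quite different against each control if the scaling step goes wrong.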

@mpmeers
Collaborator

mpmeers commented Jan 10, 2022

Hi,

SEACR does take read depth into account, and this is handled internally via the "norm" option. However, I have seen this issue in the past, and I think it arises from edge cases where a "point" threshold is defined from a control file that gets normalized improperly. In response, I changed the code in SEACR v1.4 (https://github.com/FredHutch/SEACR/tree/SEACRv1.4_dev) to test a range of thresholds around the point threshold defined by the specific control file, to try to smooth out some of the inconsistencies in peak calling that are specific to control read depth. I'd recommend giving v1.4 a shot to see if that improves some of the divergence in peaks called for the different control files. Let me know if that helps or if there's anything else I can help with.

Mike
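
To make the idea of "testing a range of thresholds around the point threshold" more concrete, here is a rough Python sketch. It is not the SEACR v1.4 code, whose details are not shown in this thread. It only illustrates how the number of regions passing a cutoff can be checked across a small window of thresholds instead of at a single value. The names `peaks_passing` and `sweep_thresholds`, and the `spread` and `steps` parameters, are assumptions made up for this example.

```python
def peaks_passing(block_signals, threshold):
    """Count candidate regions whose total signal exceeds `threshold`."""
    return sum(1 for signal in block_signals if signal > threshold)


def sweep_thresholds(block_signals, point_threshold, spread=0.2, steps=5):
    """Evaluate peak counts at evenly spaced thresholds around a point threshold.

    The window runs from point_threshold * (1 - spread) to
    point_threshold * (1 + spread). Returns a list of
    (threshold, peak_count) pairs, lowest threshold first.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    low = point_threshold * (1 - spread)
    high = point_threshold * (1 + spread)
    step = (high - low) / (steps - 1)
    thresholds = [low + i * step for i in range(steps)]
    return [(t, peaks_passing(block_signals, t)) for t in thresholds]
```

If the peak count changes sharply between neighbouring thresholds in the sweep, the call set is sensitive to exactly where the control-derived threshold lands, which is the kind of instability described above.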

@icanwinwyz

Hi mpmeers,
The SEACRv1.4_dev link doesn't work. Do you still have a version fixing the issue mentioned by zuzerdhoondia? Thanks.
