Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing MACS2 output from earlier pipeline version #291

Open
clarabakker opened this issue Jan 25, 2023 · 2 comments
Open

Changing MACS2 output from earlier pipeline version #291

clarabakker opened this issue Jan 25, 2023 · 2 comments

Comments

@clarabakker
Copy link

I am currently updating an adaptation of this pipeline from a version forked from v1.1.2 to v2.1.6. Very minor changes were made in both adaptations for the sake of metadata/display, without modifying the analyses performed. Given that, however, my question is a bit more general.

The phenomenon I am observing is that the optimal and conservative peaks generated by MACS2 are often identical for pipeline version 2.1.6. I read in the documentation that the optimal peaks are selected as the set with the most peaks, while the conservative set is selected as the set of true replicate consistent peaks, and can see that these are the same set. However, I was not expecting that the true replicate peaks would be the largest set.

To give a concrete example, running the old and new pipeline on the same set (from fastq files), the choice of conservative peaks changes from the ppr to rep1-rep2 set from v1.1.1 to v2.1.6. The full QC metrics are available here (v1.1.1 and v2.1.6) and the dataset is available at https://data.4dnucleome.org/experiment-set-replicates/4DNESSNWXHXK/. The control experiment was similarly processed starting from fastq files with old and new pipeline versions, respectively.

All this is to ask if there is some change (perhaps in the method of generating/number of pseudoreplicates) that might explain this difference in output, or if it is more likely there is a flaw in the new implementation. I know v1.1.1 is quite old at this point, which I apologize for. I would be grateful for any insight you might have and happy to provide more information should it be helpful.

  • Docker image: encodedcc/chip-seq-pipeline:v2.1.6 (with slight modification to qc metrics)
  • Pipeline version: 2.1.6
@leepc12
Copy link
Contributor

leepc12 commented Jan 25, 2023

Looking at these two QC reports. I need to see QC metrics for alignment steps too.
Also, please post your input JSON file for those two runs. Also, input JSON for the alignment process too if possible.
We changed the way how Strand cross-correlation measures (xcor step) estimates fragment length.
I think this might be the major reason for the difference.

@clarabakker
Copy link
Author

Thank you so much--sorry for the delay! Here are the two QCs for alignment for this set using v1.1.1 (1 and 2) and the two using v2.1.6 (3 and 4).

The input JSONs for the two runs in the original question: v1.1.1 post-alignment JSON and v2.1.6 post-alignment JSON.

The input JSONs for v1.1.1 alignment are here: 1 and 2. However, the v2.1.6 alignment input JSON is a bit difficult to retrieve at the moment. I can update with it if it would be helpful--I just don't want to take too much of your time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants