Q: gappedPeak instead of narrow peak on HMMRATAC output #670

Open
jffpviana opened this issue Oct 29, 2024 · 3 comments

Comments

@jffpviana

Use case
I'm using the hmmratac implementation within MACS3 but it does not output narrowPeak.

Describe the problem
I'm using the hmmratac implementation within MACS3 but it does not output narrowPeak. Instead it outputs a gappedPeak file. The description of the columns for the original HMMRATAC gappedPeak output does not match the MACS3 output.

Describe the solution you tried
I'm using the gappedPeak column descriptors provided by ENCODE (https://genome.ucsc.edu/FAQ/FAQformat.html#format14), but it's not clear if this is correct. The "score" column is all 0. Would it be possible to either output narrowPeak using macs3 hmmratac or to provide a better description for the columns on the gappedPeak output? Could the documentation please be changed? It currently says the output is narrowPeak.

@taoliu
Contributor

taoliu commented Oct 29, 2024

Hi @jffpviana, when we reimplemented the HMMRATAC algorithm in MACS3, we decided to keep the output simple. Since we want to focus on the 'open' regions of interest, we currently output those 'open states' that are surrounded by 'nucleosome states' in the narrowPeak format. The gappedPeak format is good for describing a nested structure such as the intron-exon structure of a gene, and previously we put the 'open' and 'nucleosome' states together and used this format for output. We feel that it may create unnecessary difficulties in downstream analyses if people only want to detect the open sites.

If you are interested in the nucleosome states, you can use the --save-states option to save all open and nucleosome states.

@jffpviana
Author

Hi @taoliu, thank you very much for the clarification and quick response. Also thank you for the great software and HMMRATAC implementation.
Can I ask about the two sets of coordinates given in the gappedPeak? In my case I seem to have the peak coordinates as columns 1 to 3 and then some other shifted coordinates. Could you help me understand the difference?
Also, is the score column 5 like it says on the UCSC webpage? In my case all scores are 0.
I'm very sorry if all these things are obvious, but I'm fairly new to ATAC-seq and there's still little out there using the HMMRATAC implementation in MACS.

Mostly I'm struggling with regions of the genome that contain a run of nearby peaks, which are all being clumped together as one peak. But when I look in IGV they appear to be distinct peaks, one after the other.
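
For reference on the two sets of coordinates: in the UCSC gappedPeak layout linked earlier in this thread, columns 2 and 3 are chromStart/chromEnd and columns 7 and 8 are thickStart/thickEnd, with score in column 5. The sketch below simply maps one line onto those 15 UCSC field names. It assumes the file follows the UCSC layout, which this thread does not confirm for the older MACS3 gappedPeak output; the helper name and the example line are made up for illustration.

```python
# Field names as listed in the UCSC gappedPeak format description (15 columns).
GAPPED_PEAK_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
    "blockStarts", "signalValue", "pValue", "qValue",
]


def parse_gapped_peak_line(line):
    """Map one tab-delimited gappedPeak line onto the UCSC field names.

    Hypothetical helper: it assumes the UCSC column order, which may not
    match what a given tool actually writes.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GAPPED_PEAK_FIELDS):
        raise ValueError(f"expected 15 columns, got {len(values)}")
    record = dict(zip(GAPPED_PEAK_FIELDS, values))
    for key in ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"):
        record[key] = int(record[key])
    # blockSizes/blockStarts are comma-separated lists;
    # blockStarts are offsets relative to chromStart.
    record["blockSizes"] = [int(x) for x in record["blockSizes"].split(",") if x]
    record["blockStarts"] = [int(x) for x in record["blockStarts"].split(",") if x]
    return record


# Made-up example line, not real hmmratac output.
example = (
    "chr1\t100\t500\tpeak_1\t0\t.\t200\t400\t255,0,0\t3\t"
    "50,200,50\t0,100,350\t5.0\t-1\t-1"
)
rec = parse_gapped_peak_line(example)
print(rec["chromStart"], rec["chromEnd"], rec["thickStart"], rec["thickEnd"])
```

Comparing chromStart/chromEnd against thickStart/thickEnd for a few lines is a quick way to see which pair is the outer region and which is the shifted inner one.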

@taoliu
Contributor

taoliu commented Nov 4, 2024

@jffpviana Let's first make it clear that we are discussing the macs3 hmmratac (v3.0.2) output. If you plan to use the macs3 version of hmmratac, please use the new v3.0.2 version, since 1) it requires less memory; 2) it provides a nicer Poisson emission model for the HMM; 3) the narrowPeak output format is cleaner than the previous gappedPeak. The gappedPeak output may clump nearby peaks together, which I don't like. The description of the output can be found here. In this case, the fifth column is described as:

5. peak score. The score is the maximum foldchange (signal/average signal) 
  within the peak. By default, the signal is the total pileup of all types of fragments 
  from short to tri-nuc size fragments.

I need to make this description more accurate, since the actual value there is 10*foldchange instead of just the foldchange. So if you want the actual 'score' or foldchange of a peak, take the 5th column and divide it by 10 for now... I need to find a better place to put the foldchange value, such as the 7th column.
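
As a minimal sketch of that workaround: the snippet below reads narrowPeak-style lines and divides the 5th column by 10 to recover the foldchange, as described above for v3.0.2. The function names and the example line are hypothetical, and the factor of 10 may change in later releases, so check the documentation for the version you run.

```python
def fold_change_from_score(score_column):
    """Recover the foldchange from column 5, which holds 10*foldchange
    according to the comment above (macs3 hmmratac v3.0.2)."""
    return float(score_column) / 10.0


def read_narrowpeak_fold_changes(lines):
    """Return (chrom, start, end, foldchange) for each narrowPeak line.

    Hypothetical helper; assumes tab-delimited lines with the score
    in the 5th column.
    """
    results = []
    for line in lines:
        # Skip blank lines and optional header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.rstrip("\n").split("\t")
        results.append(
            (cols[0], int(cols[1]), int(cols[2]), fold_change_from_score(cols[4]))
        )
    return results


# Made-up example line, not real hmmratac output.
example_lines = ["chr1\t1000\t1500\tpeak_1\t57\t.\t0\t-1\t-1\t250\n"]
peaks = read_narrowpeak_fold_changes(example_lines)
print(peaks)
```

For a real file, pass an open file handle instead of the list, e.g. `with open(path) as fh: read_narrowpeak_fold_changes(fh)`, where `path` is your own narrowPeak output.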
