Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How does modkit extract handle indels on reads? #249

Open
Fu-Yilei opened this issue Aug 21, 2024 · 4 comments
Open

How does modkit extract handle indels on reads? #249

Fu-Yilei opened this issue Aug 21, 2024 · 4 comments
Labels
enhancement New feature or request question Looking for clarification on inputs and/or outputs

Comments

@Fu-Yilei
Copy link

Hey I recently am running modkit on/around some structural variant calls. I was interested in using modkit extract to extract the methylation status within each reads on and around SVs. However, when I was trying to run modkit extract with given BED file generated from my SV calls, I found the modkit extract seems only output the location within my provided reference genome.
My command is:
modkit extract -t 20 -q 100000 --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fa --read-calls-path ./modkit_extract/sv_related_read_calls.csv --cpg --include-bed sv_related.bed --allow-non-primary --log-filepath log.log ont.bam sv_related.csv
In here, sv_related.bed is several hundreds - thousands bps around the SV breakpoint.
Does removing the --reference option helps rescuing the locations on reads that are not aligned to the reference genome? Have you guys have any practices with profiling the methylation along with SVs?

@ArtRand
Copy link
Contributor

ArtRand commented Aug 23, 2024

Hello @Fu-Yilei,

I apologize for the delayed response. When you use the --include-bed option modkit extract will only output mapped modified base positions. I believe there is a log line indicating this behavior. If you want to capture unmapped positions you can't use --include-bed, unfortunately. What you could do is grab reads that overlap with samtools first, then pipe it through extract. You may also want to be sure to use the -Y flag when aligning so that supplementary alignments can be used by extract (see the docs). Adding the reference shouldn't make a difference. Unmapped positions will have reference position -1. Leveraging SVs is something I'm interested in making easier. Maybe the modVCF format will help, open to hearing ideas.

@ArtRand ArtRand added enhancement New feature or request question Looking for clarification on inputs and/or outputs labels Aug 26, 2024
@Fu-Yilei
Copy link
Author

Thank you Arthur!
Sorry for the late reply, I was moving over the weekend..
While I think modVCF is great for reporting methylation signals with the capability of handling haplotype etc., the real-level analysis is always needed at least for SV-related analysis with methylation. Here I am not talking about stuff like mQTL, but more about reporting methylation changes around a SV. In this case, it is inevitable to look into SV-supporting-reads from the SV callers, which makes the modVCF and current bedmethyl file both cannot handle this task.
However, I found visualization sometimes can provide a more intuitive angle for checking methylation changes around SVs. I think Marcus talked about modkit localize over the 1000g meeting last week, which I found quite interesting. Have you guys checked that function around a SV breakpoint?

@ArtRand
Copy link
Contributor

ArtRand commented Aug 26, 2024

Hello @Fu-Yilei,

Using localize is an interesting idea. The prototype that I have uses the bedMethyl and is a pretty simple "bedtools-like" operation of aggregating methylation frequencies around genomic features. Sounds like that wouldn't work for SVs though since what you care about are reads reporting methylation that isn't in the bedMethyl (i.e. reads partially aligned to the SV). What you could do is essentially another "pileup" where you anchor at the SV breakpoint then use the read-coordinates to look aggregate. What do you think? I'm not an SV expert. If I get you a prototype, would you be willing to test it?

@Fu-Yilei
Copy link
Author

Sure I'd love to do that! I have several insertion breakpoint to test in a mixed cell line sample.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Looking for clarification on inputs and/or outputs
Projects
None yet
Development

No branches or pull requests

2 participants