Number of reads processed by modkit #241

Open
ZhihanNUS opened this issue Jul 30, 2024 · 4 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments

@ZhihanNUS

Hi, this is my first time using modkit to extract modified base signals, and I have run into some problems.

I used Dorado v0.7.1 for SUP basecalling with modification calling (m6A, pseU).
I used minimap2 to align the unaligned_mod_bam file.
The number of reads aligned to 'Mus_musculus.GRCm39.dna.primary_assembly.fa' is 19,446,790.
However, when I use modkit to extract m6A, it says "Processed ~7505886 reads" and reports zero skipped reads.

Why did modkit filter out the other reads?
Here is the command I ran:
modkit pileup input.bam output.bed --ref Mus_musculus.GRCm39.dna.primary_assembly.fa --motif A 0 -t 36

I also noticed this message: "Threshold of 0.6464844 for base A is low. Consider increasing the filter-percentile or specifying a higher threshold."
Is this related to my problem?

Thank you very much!
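A minimal sketch of how a percentile-based pass threshold can come out as low as 0.6464844, assuming (as the modkit documentation describes for `--filter-percentile`, default 0.1) that the threshold is the given percentile of sampled per-call confidences. The helper names, the nearest-rank estimator, and the sample ML values are hypothetical illustrations, not modkit's implementation. The ML-to-probability mapping uses the bin midpoint from the SAM tags spec (value N covers N/256 to (N+1)/256); note that (165 + 0.5)/256 = 0.646484375, which rounds to the reported 0.6464844. A low threshold simply means that roughly 10% of the sampled A calls had confidence at or below that value.

```python
import math


def ml_to_prob(ml: int) -> float:
    """Map an 8-bit ML value (0-255) to the midpoint of its probability bin.

    The SAM tags spec defines ML value N as covering [N/256, (N+1)/256).
    """
    if not 0 <= ml <= 255:
        raise ValueError("ML values are 8-bit")
    return (ml + 0.5) / 256.0


def call_confidence(ml: int) -> float:
    """Confidence of a single-modification call: the probability of the
    more likely state (modified vs. canonical)."""
    p_mod = ml_to_prob(ml)
    return max(p_mod, 1.0 - p_mod)


def percentile_threshold(confidences, filter_percentile=0.1):
    """Nearest-rank percentile of per-call confidences (hypothetical helper).

    Roughly `filter_percentile` of the calls fall at or below the returned
    value. modkit's own sampling and percentile method may differ.
    """
    if not confidences:
        raise ValueError("need at least one call")
    if not 0.0 < filter_percentile < 1.0:
        raise ValueError("filter_percentile must be in (0, 1)")
    ordered = sorted(confidences)
    rank = math.ceil(filter_percentile * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up ML values for sampled A calls, for illustration only.
    sampled_ml = [250, 3, 240, 128, 200, 10, 160, 95, 230, 255]
    conf = [call_confidence(v) for v in sampled_ml]
    print(percentile_threshold(conf, 0.1))
```

Raising `--filter-percentile` or passing an explicit threshold, as the message suggests, changes which calls pass; it does not change how many reads are read from the BAM.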

ArtRand added the question label (Looking for clarification on inputs and/or outputs) on Aug 1, 2024
@ArtRand
Contributor

ArtRand commented Aug 1, 2024

Hello @ZhihanNUS,

The pileup command only uses primary alignments to generate the bedMethyl table. If there are errors with records in the modBAM file, you can inspect the reasons with the --log-filepath $log_file option. The threshold is not related to the number of reads that are used, skipped, or fail with errors. Let me check what threshold values I get on similar (mouse) data and get back to you.
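Since pileup only uses primary alignments, the number to compare against is the count of mapped records that are neither secondary (FLAG 0x100) nor supplementary (FLAG 0x800). A minimal sketch, assuming SAM text on stdin (for example from `samtools view -h input.bam`); the script and function names are hypothetical. It reports both the total mapped record count and the primary mapped count, since a "reads aligned" figure that includes secondary and supplementary records can be larger than what pileup uses.

```python
import sys

# SAM FLAG bits from the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
NOT_PRIMARY_MAPPED = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY  # 0x904


def is_primary_mapped(flag: int) -> bool:
    """True for a mapped record that is neither secondary nor supplementary."""
    return (flag & NOT_PRIMARY_MAPPED) == 0


def count_records(sam_lines):
    """Return (all mapped records, primary mapped records) from SAM text lines."""
    mapped = primary = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd mandatory field
        if flag & FLAG_UNMAPPED:
            continue
        mapped += 1
        if is_primary_mapped(flag):
            primary += 1
    return mapped, primary


if __name__ == "__main__":
    mapped, primary = count_records(sys.stdin)
    print(f"mapped records: {mapped}\tprimary mapped: {primary}")
```

Usage: `samtools view -h input.bam | python count_primary.py`. For the primary count alone, `samtools view -c -F 0x904 input.bam` applies the same flag mask without Python.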

@ZhihanNUS
Author

Hi @ArtRand,

I have checked my aligned BAM file, and the number of primary alignments is 19M. I tried realigning the BAM file and rerunning modkit, but I still see the mismatch. Since there are no skipped reads, does that mean modkit processed all reads in the BAM file, or are there other rules that filter out some reads?
This mismatch between the number of reads in the aligned BAM file and the number processed by modkit happens in all of my samples. Do you have any other suggestions for finding the cause?

Thank you very much!

@ArtRand
Contributor

ArtRand commented Aug 27, 2024

Hello @ZhihanNUS,

> Since there are no skipped reads, does it mean modkit processed all reads in the BAM file, or are there other rules that filter some reads?

Can you check the output written to --log-filepath? It will tell you if a read isn't used (usually because it doesn't carry any modified base information).

The read counter that modkit pileup prints on the command line isn't exact (which is why it says "Processed approx X reads"). The information in the log file is more accurate. If you don't see any errors in the log file, all of the primary alignments are being used. If you share the log file, I'd be happy to review it.
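As an independent cross-check on the "no modified base information" case, a minimal sketch that counts primary mapped records lacking MM/ML tags (or carrying an empty ML array). It assumes SAM text on stdin and that modification calls are stored in MM/ML (Dorado writes these; the older Mm/Ml spellings are also accepted). The function names are hypothetical, and a read with tags present may still be excluded for other reasons, so the modkit log remains the authoritative source. A large "without MM/ML" count would suggest the tags were lost somewhere, for example during re-alignment.

```python
import sys

# unmapped (0x4) | secondary (0x100) | supplementary (0x800)
FLAG_NOT_PRIMARY_MAPPED = 0x4 | 0x100 | 0x800


def parse_tags(optional_fields):
    """Map TAG -> VALUE for SAM optional fields of the form TAG:TYPE:VALUE."""
    tags = {}
    for field in optional_fields:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return tags


def has_mod_info(tags):
    """True if the record has an MM tag and a non-empty ML array."""
    mm = tags.get("MM", tags.get("Mm"))
    ml = tags.get("ML", tags.get("Ml"))
    if mm is None or ml is None:
        return False
    # A B-array value looks like "C,250,3,..."; the first item is the subtype.
    return len(ml.split(",")) > 1


def summarize(sam_lines):
    """Return (primary mapped records, primary records lacking mod info)."""
    primary = missing = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & FLAG_NOT_PRIMARY_MAPPED:
            continue
        primary += 1
        if not has_mod_info(parse_tags(fields[11:])):  # optional fields start at column 12
            missing += 1
    return primary, missing


if __name__ == "__main__":
    total, missing = summarize(sys.stdin)
    print(f"primary: {total}\twithout MM/ML: {missing}")
```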

@ZhihanNUS
Author

m6Alog.txt
Hello @ArtRand,
Here is the log file. Also, do you have a recommended threshold value for mouse data?
