Document of QV file #148

Open
ZexuanZhao opened this issue Dec 2, 2024 · 1 comment

Comments

@ZexuanZhao

Hi!

I'm using Merqury v1.3 to evaluate the QV score of each HiFi read, so that I can identify contaminated reads against an Illumina dataset that is not contaminated. Here's a glance at the data I got from the .qv file.

[Screenshot: excerpt of the .qv output]

The first 5 columns are named according to this doc, which describes the shared column as the number of k-mers found in both the assembly and the read set. The last column is the read length, calculated with samtools faidx.

What drew my attention is that sometimes uniq + shared > size + 20, which should not happen, since a sequence of length L contains only L - k + 1 k-mers. But I also noticed that shared = size - 20, which makes me wonder whether shared is actually the total number of k-mers rather than the number of shared k-mers.
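To make the counting argument concrete: a sequence of length L has L - k + 1 overlapping k-mer windows, so shared = size - 20 is exactly what a total k-mer count would look like with k = 21. The value k = 21 is an assumption inferred from the offset of 20, and the helper names and the 30 bp read below are hypothetical, for illustration only.

```python
def kmer_positions(seq_len: int, k: int = 21) -> int:
    """Number of k-mer windows in a sequence of length seq_len (0 if shorter than k)."""
    return max(seq_len - k + 1, 0)


def kmers_of(seq: str, k: int = 21) -> list[str]:
    # Slide a window of width k along the sequence, one base at a time
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fits_documented_reading(uniq: int, shared: int, seq_len: int, k: int = 21) -> bool:
    # If "uniq" and "shared" were disjoint subsets of the sequence's k-mers,
    # their sum could not exceed the number of k-mer windows
    return uniq + shared <= kmer_positions(seq_len, k)


read = "ACGT" * 7 + "AC"          # hypothetical 30 bp read
print(kmer_positions(len(read)))  # 10
print(len(kmers_of(read)))        # 10
print(fits_documented_reading(uniq=3, shared=10, seq_len=30))  # False
```

Any row where the check fails cannot be explained by the documented meaning of the shared column.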

To confirm this, I recalculated the QV score using the formula from your paper, but assuming that the shared column is the total number of k-mers, so that the number of truly shared k-mers is the third column minus the second column.
[Screenshot: QV recalculation]

This confirmed my suspicion: my QV values are the same as the ones generated by Merqury.
[Screenshot: recalculated QV compared with Merqury's QV]
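The recalculation can be sketched as follows, using the QV definition from the Merqury paper: with K_asm the k-mers found only in the assembly (the uniq column) and K_total the total k-mers (the third column, under the reading proposed here), the per-base accuracy is P = (1 - K_asm / K_total)^(1/k), the error rate is E = 1 - P, and QV = -10 log10(E). The function name, k = 21, and the example counts are assumptions for illustration, not values taken from the screenshots.

```python
import math


def merqury_qv(asm_only: int, total: int, k: int = 21) -> tuple[float, float]:
    """Return (error_rate, QV) from the uniq (asm-only) and total k-mer counts.

    Assumes the third .qv column is the total k-mer count, so the number of
    truly shared k-mers is total - asm_only.
    """
    if total <= 0:
        raise ValueError("total k-mer count must be positive")
    shared = total - asm_only
    # A k-mer is correct only if all k of its bases are correct,
    # so the per-base accuracy is the k-th root of the shared fraction
    p_base = (shared / total) ** (1.0 / k)
    error = 1.0 - p_base
    if error == 0.0:
        return 0.0, math.inf  # no asm-only k-mers: QV is unbounded
    return error, -10.0 * math.log10(error)


# Hypothetical read: 10,000 bp -> 9,980 k-mers (k = 21), 5 of them asm-only
err, qv = merqury_qv(asm_only=5, total=9_980)
print(f"error={err:.3e} QV={qv:.2f}")
```

If the shared column were really the shared count, the denominator would instead be uniq + shared, and the QV would not match Merqury's output.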

Could you check whether the documentation's description of the third column of the .qv file is correct?

Best,
Zexuan

@arangrhie
Contributor

Hi @ZexuanZhao!

Thanks for pointing this out. I corrected the wiki description.
The shared column is indeed the total number of k-mers in the assembly, which in your case is the read sequence.

Sorry for the confusion!

Best,
Arang
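With the corrected description, a per-sequence .qv row can be read as below. This sketch assumes a tab-separated row whose first five fields are the sequence name, the k-mers found only in the assembly, the total k-mers, the QV, and the error rate; the class, field names, and sample row are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QvRow:
    name: str
    asm_only: int  # column 2: k-mers found only in the sequence
    total: int     # column 3: total k-mers in the sequence (not "shared")
    qv: float      # column 4
    error: float   # column 5

    @property
    def shared(self) -> int:
        # k-mers found in both the sequence and the read set
        return self.total - self.asm_only


def parse_qv_line(line: str) -> QvRow:
    # Assumes tab-separated fields; only the first five are used
    name, asm_only, total, qv, error = line.rstrip("\n").split("\t")[:5]
    return QvRow(name, int(asm_only), int(total), float(qv), float(error))


row = parse_qv_line("read_001\t5\t9980\t46.22\t2.386e-05\n")
print(row.shared)  # 9975
```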
