Hi!
I'm using Merqury v1.3 to compute a QV score for each HiFi read, in order to identify contaminant reads by comparing against an Illumina dataset that is not contaminated. Here's a glance at the data I got from the .qv file.
The first 5 columns are named according to this doc, where the shared column, according to the doc, is k-mers found in both assembly and the read set. The last column is the size of the reads calculated using samtools faidx.
What drew my attention is that sometimes uniq + shared > size + 20, which should not happen given the limited number of k-mers a sequence can contain. I also noticed that shared = size - 20, which makes me wonder whether shared is actually the total number of k-mers rather than the number of shared k-mers.
To confirm that, I recalculated the QV score using the formula from your paper, assuming that the shared column is the total number of k-mers and that the number of shared k-mers is the third column minus the second column.
This confirmed my suspicion: my QV calculation matches the one generated by Merqury.
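The recalculation described above can be sketched roughly as follows. This is a minimal illustration, not Merqury's own code: it assumes k = 21 (inferred only from the observation that shared = size - 20; the k actually used is not stated in the post), and the function names `kmer_count` and `qv_from_counts` are hypothetical. It uses the QV formula from the Merqury paper, with the third column treated as the total k-mer count.

```python
import math


def kmer_count(seq_len: int, k: int = 21) -> int:
    """Number of k-mers in a sequence of length seq_len (k = 21 is an assumption)."""
    return max(seq_len - k + 1, 0)


def qv_from_counts(uniq: int, total: int, k: int = 21) -> float:
    """QV from the second (uniq) and third (total) columns of the .qv file.

    uniq  : k-mers found only in the sequence, not in the read set
    total : all k-mers in the sequence (the column the doc called "shared")
    """
    if total <= 0:
        raise ValueError("total k-mer count must be positive")
    if not 0 <= uniq <= total:
        raise ValueError("uniq must be between 0 and total")
    if uniq == 0:
        # No erroneous k-mers observed, so the error rate is estimated as 0.
        return math.inf
    shared = total - uniq                    # k-mers truly shared with the read set
    base_accuracy = (shared / total) ** (1 / k)
    error_rate = 1 - base_accuracy
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    size = 1000                              # hypothetical read length
    total = kmer_count(size)                 # size - 20 when k = 21
    print(total, qv_from_counts(10, total))
```

Under this reading, uniq can never exceed the third column, which is consistent with the third column being a total rather than a count of shared k-mers.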
I'm wondering if you can check the documentation and see if the description of the third column of the QV file is correct?
Best,
Zexuan
Thanks for pointing this out - I corrected the wiki description.
The shared column is indeed the total number of k-mers in the assembly, which in your case is the read sequence.