I am trying to compare the performance of different mappers on long reads and minigraph has unexpectedly low accuracy. I'm mapping 1 million simulated HiFi and R10 reads to the HPRC v1.0 chm13 minigraph graph with minigraph. Minigraph is about 1% less accurate than graphaligner on the same graph, minimap2 on chm13, as well as other mappers on the minigraph-cactus graph. I expected minigraph to perform at least as well as minimap2. Do you see anything wrong about the way I processed the graph or ran minigraph?
In order to get the output gaf to work with the vg tools, I edited the gfa: sed 's/chr([0-9]*|X|Y|M)/CHM13#0#chr\1/g' to change the reference names and sed 's/\ts([0-9]*)\t/\t\1\t/g' to take the s out of the segment names.
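For anyone reproducing the two edits above, here is a minimal sketch of what they do to a small made-up rGFA fragment. It assumes extended regular expressions via `sed -E`, since the unescaped `( )` and `|` in the quoted expressions need them. It also builds the tab character with `printf` rather than relying on `\t` support in sed. The three sample lines are hypothetical and are not taken from the HPRC graph.

```shell
# Tiny made-up rGFA fragment (hypothetical data, not from the HPRC graph).
TAB=$(printf '\t')
gfa=$(printf 'S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0\nL\ts1\t+\ts2\t+\t0M\nS\ts2\tGG\tSN:Z:chrX\tSO:i:4\tSR:i:0\n')

# -E enables extended regex so that ( ) and | act as grouping and alternation.
# First pass: prefix reference names, e.g. chr1 becomes CHM13#0#chr1.
# Second pass: drop the leading "s" from tab-delimited segment names.
renamed=$(printf '%s\n' "$gfa" \
  | sed -E 's/chr([0-9]*|X|Y|M)/CHM13#0#chr\1/g' \
  | sed -E "s/${TAB}s([0-9]*)${TAB}/${TAB}\1${TAB}/g")

printf '%s\n' "$renamed"
```

On this sample, segment names `s1` and `s2` become `1` and `2` on both the `S` and `L` lines, and the `SN:Z:` tags gain the `CHM13#0#` prefix.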
I then ran minigraph with: minigraph --vc -N 0 -cx lr -t {threads} {input.gfa} {input.fastq} >{output.gaf}
This may be an unrelated issue, but I also noticed that minigraph produces multiple primary alignments that sometimes overlap in the read or the graph. I attached an example of such a read. S1_19235.gaf.txt As far as I understand it, these cannot be chimeric alignments because some of them overlap in the read or the graph. Should some of them be considered secondary alignments?
Thanks!
Xian
If you don't want secondary alignment, use option --secondary=no. Applying -N0 will reduce mapping accuracy. When minimap2 sees -N0, it will ignore the option and throw a warning because this is a common mistake. Minigraph doesn't have this mechanism.
How is accuracy evaluated? Around a complex VNTR, minigraph can often align most bases correctly but may choose a few wrong nodes in the middle of the alignment. If you require an exact path match, graphaligner can be more accurate. The minigraph paper mentions this limitation. The latest minigraph alleviates this problem, but the latest graphaligner might be better.
Both bwa-mem and minimap2 may also output multiple primary alignments with overlaps, as there is often local homology around a breakpoint. This is the expected behavior.
I tried running it again with --secondary=no instead of -N0 but the accuracy is still low.
I'm using vg annotate and vg gamcompare for evaluating accuracy. Reads are annotated with reference positions everywhere they overlap the reference paths in the graph. If any of the annotations on the read match any of the truth annotations on the simulated read, regardless of where they occur on the read, then it is counted as correct.
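The "any annotation matches any truth annotation" rule described above can be sketched as follows. This is only an illustration and not how vg annotate or vg gamcompare are implemented. The two-column layout (read name, then comma-separated `path:position` tokens) is a hypothetical format, and exact string equality stands in for whatever position-matching rule the real tools apply.

```shell
# Hypothetical inputs: read name, then comma-separated path:position annotations.
tmp=$(mktemp -d)
printf 'r1\tchr1:100,chr1:200\nr2\tchr2:50\n' > "$tmp/truth.tsv"
printf 'r1\tchr1:200,chr5:10\nr2\tchr3:50\n' > "$tmp/mapped.tsv"

verdicts=$(awk -F'\t' '
  NR == FNR { truth[$1] = $2; next }   # first file: remember truth annotations
  {
    n = split($2, m, ",")              # annotations on the mapped read
    k = split(truth[$1], t, ",")       # annotations on the simulated read
    ok = 0
    for (i = 1; i <= n; i++)           # any-to-any comparison, order ignored
      for (j = 1; j <= k; j++)
        if (m[i] == t[j]) ok = 1
    print $1 "\t" (ok ? "correct" : "wrong")
  }' "$tmp/truth.tsv" "$tmp/mapped.tsv")

printf '%s\n' "$verdicts"
rm -r "$tmp"
```

Here `r1` counts as correct because one of its annotations (`chr1:200`) appears in the truth set, even though the other does not. `r2` counts as wrong because none match.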
This is inconsistent with my old evaluation on GRCh38. Perhaps most of the wrong mappings come from chm13 centromeres. Minigraph doesn't try hard to align centromeric reads because, on real data, a large fraction of centromeric reads can't be aligned between samples anyway.