Low accuracy on simulated reads and overlapping primary alignments #115

Open
xchang1 opened this issue Oct 14, 2024 · 3 comments
Labels
question Further information is requested

@xchang1
xchang1 commented Oct 14, 2024

Hello,

I am trying to compare the performance of different mappers on long reads, and minigraph has unexpectedly low accuracy. I'm mapping 1 million simulated HiFi and R10 reads to the HPRC v1.0 chm13 minigraph graph with minigraph. Minigraph is about 1% less accurate than graphaligner on the same graph, minimap2 on chm13, and other mappers on the minigraph-cactus graph. I expected minigraph to perform at least as well as minimap2. Do you see anything wrong with the way I processed the graph or ran minigraph?

In order to get the output GAF to work with the vg tools, I edited the GFA:
sed -E 's/chr([0-9]*|X|Y|M)/CHM13#0#chr\1/g' to change the reference names, and
sed -E 's/\ts([0-9]*)\t/\t\1\t/g' to remove the s prefix from the segment names.
I then ran minigraph with:
minigraph --vc -N 0 -cx lr -t {threads} {input.gfa} {input.fastq} > {output.gaf}
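For anyone reproducing this preprocessing without depending on a particular sed regex dialect, the two substitutions above can be sketched in Python. This is an illustrative helper, not part of minigraph or vg; `rename_gfa_line` and `rename_gfa` are hypothetical names, and the sketch assumes (as the sed commands imply) that the rGFA stores contig names as `chrN` and uses `s`-prefixed numeric segment IDs.

```python
import re

# Hypothetical helper mirroring the two sed edits: rename chm13 contigs to
# CHM13#0#chrN and strip the "s" prefix from numeric segment IDs.
CHR_RE = re.compile(r"chr([0-9]+|X|Y|M)")
# The lookahead leaves the trailing tab unconsumed, so two adjacent
# tab-delimited segment fields can both match (the sed version consumes it).
SEG_RE = re.compile(r"\ts([0-9]+)(?=\t)")


def rename_gfa_line(line: str) -> str:
    """Apply both substitutions to one GFA line."""
    line = CHR_RE.sub(r"CHM13#0#chr\1", line)
    return SEG_RE.sub(r"\t\1", line)


def rename_gfa(in_path: str, out_path: str) -> None:
    """Stream a GFA file through rename_gfa_line (paths are placeholders)."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fout.write(rename_gfa_line(line))
```

For example, an S line `S  s12  ACGT  ...  SN:Z:chr1` becomes `S  12  ACGT  ...  SN:Z:CHM13#0#chr1`, and an L line `L  s1  +  s2  -  0M` becomes `L  1  +  2  -  0M`.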

This may be an unrelated issue, but I also noticed that minigraph produces multiple primary alignments that sometimes overlap in the read or the graph. I attached an example of such a read: S1_19235.gaf.txt. As far as I understand, these cannot be chimeric alignments, because some of them overlap in the read or the graph. Should some of them be considered secondary alignments?
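To list which records of a read overlap, as in the attached example, a small GAF scan is enough. This is a minimal sketch with hypothetical function names. It relies on the standard GAF columns (query name, 0-based half-open query start/end, path) and assumes `--vc` output, so path entries are plain node IDs like `>12>13`. Skipping records tagged `tp:A:S` is an assumption borrowed from minimap2's PAF convention, not something confirmed for minigraph.

```python
import re
from collections import defaultdict
from itertools import combinations

NODE_RE = re.compile(r"[<>]([^<>]+)")  # node IDs from a path like ">12>13<14"


def parse_gaf(lines):
    """Yield (qname, qstart, qend, nodes) for each GAF record.

    GAF columns used (1-based): 1 query name, 3-4 query start/end
    (0-based, half-open), 6 path. Records tagged tp:A:S are skipped,
    assuming that tag marks a secondary alignment as in PAF.
    """
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if "tp:A:S" in f[12:]:  # optional SAM-style tags start at column 13
            continue
        yield f[0], int(f[2]), int(f[3]), NODE_RE.findall(f[5])


def overlapping_alignments(lines, min_overlap=1):
    """Report pairs of alignments of one read that overlap on the read or graph."""
    by_read = defaultdict(list)
    for qname, qs, qe, nodes in parse_gaf(lines):
        by_read[qname].append((qs, qe, nodes))
    hits = []
    for qname, alns in by_read.items():
        for (s1, e1, n1), (s2, e2, n2) in combinations(alns, 2):
            overlap = min(e1, e2) - max(s1, s2)  # read bases covered by both
            shared = sorted(set(n1) & set(n2))  # graph nodes visited by both
            if overlap >= min_overlap or shared:
                hits.append((qname, (s1, e1), (s2, e2), overlap, shared))
    return hits
```

Usage: `overlapping_alignments(open("S1_19235.gaf.txt"))` returns one tuple per overlapping pair; a negative overlap with a non-empty node list means the two alignments are disjoint on the read but touch the same graph nodes.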

Thanks!
Xian

@lh3
Owner

lh3 commented Oct 14, 2024

If you don't want secondary alignments, use the option --secondary=no. Applying -N0 will reduce mapping accuracy. When minimap2 sees -N0, it ignores the option and prints a warning, because this is a common mistake. Minigraph doesn't have this mechanism.

How is accuracy evaluated? Around a complex VNTR, minigraph can often align most bases correctly but may choose a few wrong nodes in the middle of the alignment. If you require an exact path match, graphaligner can be more accurate. The minigraph paper mentions this limitation. The latest minigraph alleviates this problem, but the latest graphaligner might be better.
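The gap between "most bases aligned correctly" and "exact path match" can be made concrete with a toy comparison. This is a hypothetical sketch, not how minigraph, graphaligner or vg score alignments: `path_agreement` compares a predicted node path with the true one and also reports the fraction of true-path bases that lie on nodes the prediction visits, ignoring node orientation and partial coverage of the end nodes.

```python
def path_agreement(truth, pred, node_len):
    """Compare a predicted node path with the true one.

    Returns (exact_match, fraction of truth-path bases lying on nodes that
    the predicted path also visits). Orientation and partial coverage of
    the first/last node are ignored, so the fraction is only a rough
    stand-in for per-base accuracy.
    """
    exact = list(truth) == list(pred)
    visited = set(pred)
    total = sum(node_len[n] for n in truth)
    shared = sum(node_len[n] for n in truth if n in visited)
    return exact, (shared / total if total else 0.0)


# Toy VNTR-like case: one short allele node in the middle is wrong.
lengths = {"1": 1000, "2": 30, "2b": 30, "3": 1000, "4": 1000}
print(path_agreement(["1", "2", "3", "4"], ["1", "2b", "3", "4"], lengths))
```

Here the path is not an exact match, yet about 99% of the true-path bases sit on correctly chosen nodes, which is the situation described for complex VNTRs.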

Both bwa-mem and minimap2 may also output multiple primary alignments that overlap, as there is often local homology around a breakpoint. This is the expected behavior.

lh3 added the question (Further information is requested) label on Oct 14, 2024
@xchang1
Author

xchang1 commented Oct 14, 2024

Thanks for the quick response!

I tried running it again with --secondary=no instead of -N0 but the accuracy is still low.

I'm using vg annotate and vg gamcompare to evaluate accuracy. Reads are annotated with reference positions wherever they overlap the reference paths in the graph. If any annotation on the read matches any truth annotation on the simulated read, regardless of where it occurs on the read, the read is counted as correct.
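The "any annotation matches any truth annotation" rule described above can be sketched as follows. This only illustrates the stated criterion, not vg's implementation: the function names are hypothetical, annotations are modeled as (reference path, position) pairs, and `max_dist` is an assumed tolerance parameter, since the comment does not say how close two positions must be to "match".

```python
from typing import Dict, List, Tuple

Annotation = Tuple[str, int]  # (reference path name, position on that path)


def read_is_correct(pred: List[Annotation], truth: List[Annotation],
                    max_dist: int = 0) -> bool:
    """True if any predicted annotation is within max_dist of any truth one."""
    return any(pc == tc and abs(pp - tp) <= max_dist
               for pc, pp in pred for tc, tp in truth)


def accuracy(preds: Dict[str, List[Annotation]],
             truths: Dict[str, List[Annotation]], max_dist: int = 0) -> float:
    """Fraction of simulated reads counted correct; unmapped reads count as wrong."""
    if not truths:
        return 0.0
    correct = sum(read_is_correct(preds.get(r, []), t, max_dist)
                  for r, t in truths.items())
    return correct / len(truths)
```

Because a single matching annotation anywhere on the read is enough, this criterion is lenient toward alignments that pick a few wrong nodes in the middle; it mainly penalizes reads placed at the wrong locus entirely.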

@lh3
Owner

lh3 commented Oct 14, 2024

This is inconsistent with my old evaluation on GRCh38. Perhaps most wrong mappings come from chm13 centromeres. Minigraph doesn't try hard to align centromeric reads because, on real data, a large fraction of centromeric reads can't be aligned between samples anyway.
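One way to test the centromere hypothesis is to stratify the errors by whether each simulated read's true position falls inside a centromere annotation. A minimal sketch, assuming a BED-like list of (contig, start, end) centromere intervals (which annotation file to use is left open) and per-read truth positions with a correct/incorrect flag from the evaluation above; all function names are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(bed_intervals):
    """Index BED-style (contig, start, end) intervals, 0-based half-open.

    Assumes intervals on the same contig do not overlap each other.
    """
    by_contig = defaultdict(list)
    for contig, start, end in bed_intervals:
        by_contig[contig].append((start, end))
    index = {}
    for contig, ivs in by_contig.items():
        ivs.sort()
        index[contig] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def in_regions(index, contig, pos):
    """True if pos falls inside an indexed interval on contig."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting <= pos
    return i >= 0 and pos < ends[i]


def stratify(reads, index):
    """Count (total, wrong) reads inside vs outside the indexed regions.

    reads: iterable of (true_contig, true_pos, is_correct).
    """
    counts = {"centromere": [0, 0], "other": [0, 0]}
    for contig, pos, ok in reads:
        key = "centromere" if in_regions(index, contig, pos) else "other"
        counts[key][0] += 1
        counts[key][1] += 0 if ok else 1
    return {k: tuple(v) for k, v in counts.items()}
```

If most of the roughly 1% gap falls in the "centromere" bucket, that would support the explanation above; a per-bucket comparison across mappers would show it directly.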
