Speed up alignment 2-5x(?) by using mm2-plus #33

samuell · 2024-12-02T13:17:16Z

Enhancement suggestion

It seems it might be possible to speed up the most resource-intensive part of EMU (the alignment step performed by minimap2) by switching to mm2-plus, a new, improved drop-in replacement for minimap2: https://github.com/at-cg/mm2-plus

From what I can see in their graphs, they report speedups of around 2-5x depending on the dataset. However, that figure includes spreading the workload across multiple CPU cores, which might not make as big a difference in EMU, since (if I understand correctly) EMU can already use multiple cores by running several minimap2 jobs in parallel.

There is a preprint describing the tool here: https://www.biorxiv.org/content/10.1101/2024.11.25.625328v1

Motivation

EMU's resource requirements are currently somewhat demanding, which may hinder fast turnaround depending on the number of samples and the available compute power.

In our rough tests we have seen resource requirements in the ballpark of:

0.3-0.4 CPU core-seconds per read (average read length ~1400 bp)
Around 0.5 core-hours per chunk file of 4000 reads output by the instrument
Full samples easily take ~20 core-hours (e.g. 40 such chunks of 4000 reads each). Wall-clock time can of course be cut ~10x by scaling out to 10 cores, but the total compute is still substantial.
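The figures above can be cross-checked with a little arithmetic. The bash sketch below uses only the numbers quoted in this comment (0.3-0.4 core-seconds per read, 4000 reads per chunk, 40 chunks per sample) and assumes ideal linear scaling across cores, which real runs will not quite reach; the 10-core figure is just an example.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope EMU cost estimate from the rough figures above.
# Assumes ideal linear scaling across cores (an optimistic simplification).
set -euo pipefail

# core_hours READS SECONDS_PER_READ -> total core-hours, 2 decimals
core_hours() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n * s / 3600 }'
}

# wall_hours CORE_HOURS CORES -> wall-clock hours if work splits evenly
wall_hours() {
  awk -v h="$1" -v c="$2" 'BEGIN { printf "%.2f\n", h / c }'
}

reads_per_chunk=4000
chunks_per_sample=40
cores=10

for spr in 0.3 0.4; do
  chunk=$(core_hours "$reads_per_chunk" "$spr")
  sample=$(core_hours $((reads_per_chunk * chunks_per_sample)) "$spr")
  echo "${spr} s/read: ${chunk} core-h/chunk, ${sample} core-h/sample," \
       "~$(wall_hours "$sample" "$cores") h wall-clock on ${cores} cores"
done
```

At 0.3-0.4 core-seconds per read this gives roughly 0.33-0.44 core-hours per 4000-read chunk and 13-18 core-hours per 40-chunk sample, in line with the rounded "around 0.5" and "~20" figures above.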

kdc10 · 2024-12-03T11:41:55Z

Thanks for letting us know! We will look into it!

jodjo86 · 2024-12-04T13:59:53Z

Did you use the --mm2-forward-only argument? It forces minimap2 to consider the forward transcript strand only. While not as promising as mm2-fast, it can speed up EMU. According to the minimap2 documentation, this option is suitable for Iso-Seq, direct RNA-seq and traditional full-length cDNAs.

source: https://github.com/lh3/minimap2?tab=readme-ov-file#map-long-mrnacdna-reads
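A simple way to check whether --mm2-forward-only pays off on a given dataset is to time the same EMU run with and without it. The bash sketch below assumes `emu` is on the PATH and that the reads are in forward orientation; the input file name, output directory names and thread count are placeholders, and the `--type`, `--threads` and `--output-dir` option names are our understanding of the `emu abundance` CLI (check `emu abundance --help`). By default it only prints the commands; set RUN=1 to execute them.

```bash
#!/usr/bin/env bash
# Sketch: compare EMU run time with and without --mm2-forward-only.
# FASTQ, THREADS and the output directory names are placeholders.
# Dry run by default; set RUN=1 to actually execute emu under `time`.
set -euo pipefail

FASTQ=${FASTQ:-reads.fastq}
THREADS=${THREADS:-8}

# build_emu_args OUTDIR [EXTRA_FLAGS...] -> fills the global array EMU_ARGS
build_emu_args() {
  local outdir=$1
  shift
  EMU_ARGS=(emu abundance --type map-ont --threads "$THREADS"
            --output-dir "$outdir" "$@" "$FASTQ")
}

# run_config OUTDIR [EXTRA_FLAGS...] -> print the command, optionally time it
run_config() {
  build_emu_args "$@"
  echo "+ ${EMU_ARGS[*]}"
  if [ "${RUN:-0}" = 1 ]; then
    time "${EMU_ARGS[@]}"
  fi
}

run_config results_default
run_config results_forward_only --mm2-forward-only
```

Comparing the two `real` times, and checking that the two runs' outputs agree, shows whether the flag is worth using for a given dataset.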

samuell · 2024-12-05T14:06:46Z

Thank you for the suggestion, @jodjo86! We'll have a look at that!

I should also say that, after looking more closely at mm2-plus, I realize the speedup might not be as large as I first thought, since a big part of it seems to come from using multiple CPU cores, which EMU already does by running multiple minimap2 jobs in parallel. Any remaining speedup would then probably come from the SIMD optimizations, I guess.

I'm still interested in trying it out, but haven't gotten to it yet.

Will report back if and when I do!

jodjo86 · 2024-12-06T01:56:19Z

As far as I understand, EMU does not run multiple minimap2 jobs in parallel. EMU's --threads argument is passed directly to minimap2. The minimap2 documentation says: "Minimap2 uses at most three threads when indexing target sequences, and uses up to INT+1 threads when mapping" (where INT is the value given to -t). The indexing step with EMU is very fast (<2 s), so it is advantageous to give EMU as many cores as possible.

A word of caution: minimap2 is a robust and heavily documented tool, so I would wait to see how mm2-plus matures before adopting it.

I'm just an EMU user but I hope this helps.

source: https://lh3.github.io/minimap2/minimap2.html
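Since EMU hands --threads straight to minimap2, sizing it is simple arithmetic: per the documentation quoted above, minimap2 may run up to INT+1 threads when mapping with -t INT. The bash sketch below picks INT = cores - 1 so the total stays within the machine; reserving that core is our assumption (passing all cores is also reasonable, since the extra thread mostly handles I/O). `nproc`/`getconf` are used only to count cores, and `reads.fastq` is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: choose EMU's --threads so minimap2's INT+1 mapping threads fit the
# available cores. Reserving one core for the extra thread is an assumption.
set -euo pipefail

# mm2_threads_for CORES -> value for -t/--threads (never below 1)
mm2_threads_for() {
  local cores=$1
  if [ "$cores" -gt 1 ]; then
    echo $((cores - 1))
  else
    echo 1
  fi
}

cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
threads=$(mm2_threads_for "$cores")
echo "cores=${cores}: emu abundance --threads ${threads} reads.fastq" \
     "(minimap2 mapping threads <= $((threads + 1)))"
```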

jodjo86 · 2024-12-06T15:44:45Z

I ran a small benchmark of EMU's minimap2 step on a FASTQ file of 61k reads (ONT 16S amplicons) against EMU_db.

Example command:

minimap2 -ax map-ont -t $THREADS -N 50 -p .9 -u f -K 500000000 EMU_db/species_taxid.fasta $FASTQ -o $SAM
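To see how this minimap2 step scales with thread count, the same command can be timed in a loop. The bash sketch below reuses the flags and database path from the example command; the FASTQ name, output directory and list of thread counts are placeholders. Bash's `time` keyword with TIMEFORMAT='%R' reports elapsed wall-clock seconds, and the commands are only printed unless RUN=1 is set.

```bash
#!/usr/bin/env bash
# Sketch: time the minimap2 step of EMU at several thread counts.
# DB path and flags mirror the example command; FASTQ, OUTDIR and
# THREAD_COUNTS are placeholders. Dry run unless RUN=1.
set -euo pipefail

DB=${DB:-EMU_db/species_taxid.fasta}
FASTQ=${FASTQ:-reads.fastq}
OUTDIR=${OUTDIR:-bench}

# build_mm2_args THREADS -> fills the global array MM2_ARGS
build_mm2_args() {
  MM2_ARGS=(minimap2 -ax map-ont -t "$1" -N 50 -p .9 -u f -K 500000000
            "$DB" "$FASTQ" -o "${OUTDIR}/t$1.sam")
}

TIMEFORMAT='%R'   # make bash's `time` print only elapsed seconds
if [ "${RUN:-0}" = 1 ]; then mkdir -p "$OUTDIR"; fi

for t in ${THREAD_COUNTS:-1 2 4 8 16}; do
  build_mm2_args "$t"
  if [ "${RUN:-0}" = 1 ]; then
    # minimap2's own log goes to a file; only the `time` report is captured
    secs=$( { time "${MM2_ARGS[@]}" 2>>"${OUTDIR}/minimap2.log"; } 2>&1 )
    echo "threads=${t} wall_seconds=${secs}"
  else
    echo "+ ${MM2_ARGS[*]}"
  fi
done
```

For example, saved as a hypothetical `bench.sh`: `RUN=1 THREAD_COUNTS="4 8 16" FASTQ=sample.fastq bash bench.sh`.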

samuell changed the title from "Use faster version of minimap2 to speed up pipeline" to "Speed up alignment 2-5x(?) by using mm2-plus" on Dec 2, 2024

kdc10 added the "enhancement" (New feature or request) label on Dec 3, 2024

samuell mentioned this issue on Dec 12, 2024 in genomic-medicine-sweden/gms_16S#36, "Optimize resource usage" (open)

jodjo86 mentioned this issue on Jan 31, 2025 in #53, "Problem running EMU with some samples" (open)
