GroupReadsByUmi long runtime #944

ghost · 2023-10-19T12:50:46Z

Running this command with v2.1.0:

    fgbio -Xmx32g --async-io GroupReadsByUmi  \
        -Djava.io.tmpdir=tmp  \
        --input=input.bam \
        --output=output.bam  \
        --strategy=Adjacency  \
        --edits=1  \
        --min-map-q=10  \
        --family-size-histogram metrics.txt

Why did the first 2M reads take ~8 hours to group?

I have several samples all around 100M reads. Some process quickly as expected and others hang as this one does. I have no idea why this is happening.


nh13 · 2023-10-19T15:55:04Z

It may be that you have extremely high coverage per template and/or genomic coordinate. Can you check whether you have provided enough memory by looking at the memory usage of the process?
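One way to check for this kind of pile-up is to tally reads and distinct UMIs per mapped position. The following is only a rough sketch, not part of fgbio: it assumes SAM text as input (for example from `samtools view input.bam`), assumes the UMI is stored in the `RX` tag, and uses the leftmost mapped position as a stand-in for the coordinates the tool actually groups on. The function names `tally_positions`, `top_hotspots` and `main` are hypothetical.

```python
import sys
from collections import Counter, defaultdict


def tally_positions(sam_lines, umi_tag="RX"):
    """Count reads and collect distinct UMIs per (reference, leftmost position).

    Assumption: the UMI lives in a string tag (RX by default). The leftmost
    POS is only an approximation of the position used for grouping.
    """
    reads = Counter()
    umis = defaultdict(set)
    prefix = umi_tag + ":Z:"
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if int(fields[1]) & 0x4:  # skip unmapped reads
            continue
        key = (fields[2], int(fields[3]))
        reads[key] += 1
        for opt in fields[11:]:
            if opt.startswith(prefix):
                umis[key].add(opt[len(prefix):])
                break
    return reads, umis


def top_hotspots(reads, umis, n=10):
    """Return (reference, position, read count, distinct UMI count) for the n busiest positions."""
    return [
        (ref, pos, count, len(umis[(ref, pos)]))
        for (ref, pos), count in reads.most_common(n)
    ]


def main(stream=sys.stdin):
    """Print the busiest positions as tab-separated lines."""
    reads, umis = tally_positions(stream)
    for ref, pos, count, n_umis in top_hotspots(reads, umis):
        print(f"{ref}\t{pos}\t{count}\t{n_umis}")
```

Calling `main()` with the output of `samtools view` piped to standard input prints the ten busiest positions. A few positions holding a large share of the reads, or very many distinct UMIs, would be consistent with the high-coverage explanation.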

ghost · 2023-10-19T16:03:28Z

I have tried using a large amount of memory (up to 100GB). Would adding multithreading to this step be an option for future development, similar to what is available in the CallMolecularConsensusReads step?

nh13 · 2023-10-19T16:09:43Z

It definitely looks like you have high coverage in that region, which makes it tough. Not knowing your UMI length(s), you may have very high per-molecule coverage.
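To illustrate why many distinct UMIs at one position can be costly when matching with `--edits=1`, here is a deliberately naive sketch that compares every pair of distinct UMIs by Hamming distance. It is an assumption-laden illustration, not fgbio's implementation, and the function names `hamming` and `neighbor_pairs` are hypothetical. The point is only that an all-pairs comparison grows quadratically with the number of distinct UMIs.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching characters between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def neighbor_pairs(umis, max_edits=1):
    """Naively compare all pairs of distinct UMIs.

    Returns the pairs within max_edits mismatches and the number of
    comparisons performed, which is n * (n - 1) / 2 for n distinct UMIs.
    """
    distinct = sorted(set(umis))
    comparisons = 0
    pairs = []
    for a, b in combinations(distinct, 2):
        comparisons += 1
        if hamming(a, b) <= max_edits:
            pairs.append((a, b))
    return pairs, comparisons
```

Under this naive scheme, 1,000 distinct UMIs at a single position need 499,500 comparisons and 10,000 need about 50 million, so a handful of very deep positions can dominate the runtime.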

It's not too much code, so I think both porting this to Rust (as we have for other tools) and incorporating advances made since we originally wrote the tool could dramatically speed things up and perhaps reduce memory usage. We would be glad for folks to sponsor that work.

ghost · 2023-10-19T16:42:01Z

I understand. Thank you for the reply.

nh13 added the question label Oct 19, 2023

nh13 closed this as completed Feb 21, 2025
