I'm running a standard Cas9 search on a ~1 gigabase chicken genome, and the step "Find genomic features closest to the guide" is taking longer than the guide-finding step itself. I ran this on a 384 GB, 48-core node; SLURM's sacct reports a MaxRSS of 199 GB for the job. I will look into ways to process this that improve speed and memory use. Most of GuideMaker's optimizations so far were aimed at microbial genomes, but most users are applying it to eukaryotes, so the long-term plan is to improve the experience for eukaryotic users.
I've looked into this issue more while working on some other updates. It takes 2.5 hours to identify all the guides, then 30 hours to find the genes they target in the chicken genome. The performance bottleneck appears to be the bedtools calls made through pybedtools. Possible alternatives include BEDOPS and bedtk; the nearest-feature lookup could also be handled directly in Pandas, or with an interval tree data structure such as this Python interval tree or this C++ interval tree (see the sketch below). The guidemaker.core.Annotation class will need to be rewritten to support this.
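For reference, a minimal sketch of the Pandas/NumPy route mentioned above, assuming guides and features are available as DataFrames with `chrom`, `start`, and `end` columns. The column names and the `nearest_feature` helper are hypothetical and not part of the current guidemaker.core.Annotation API; this is just to illustrate a sort-plus-binary-search replacement for `bedtools closest`.

```python
import numpy as np
import pandas as pd

def nearest_feature(guides: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """For each guide, report the closest feature on the same chromosome.

    Both frames are expected to have 'chrom', 'start', and 'end' columns.
    Distances are computed between interval midpoints, a rough stand-in
    for the bedtools closest behavior.
    """
    results = []
    for chrom, g in guides.groupby("chrom"):
        f = features[features["chrom"] == chrom]
        if f.empty:
            continue
        # Sort feature midpoints once per chromosome, then binary-search
        # each guide midpoint against them.
        f = (f.assign(mid=(f["start"] + f["end"]) // 2)
               .sort_values("mid")
               .reset_index(drop=True))
        f_mid = f["mid"].to_numpy()
        g_mid = ((g["start"] + g["end"]) // 2).to_numpy()
        idx = np.searchsorted(f_mid, g_mid)
        # Compare the feature just below and just above the insertion point.
        lo = np.clip(idx - 1, 0, len(f) - 1)
        hi = np.clip(idx, 0, len(f) - 1)
        pick = np.where(np.abs(f_mid[hi] - g_mid) < np.abs(f_mid[lo] - g_mid), hi, lo)
        out = g.reset_index(drop=True).copy()
        out["nearest_feature_start"] = f["start"].to_numpy()[pick]
        out["nearest_feature_end"] = f["end"].to_numpy()[pick]
        out["distance"] = np.abs(f_mid[pick] - g_mid)
        results.append(out)
    return pd.concat(results, ignore_index=True)
```

Sorting plus searchsorted is O(n log n) per chromosome and stays in memory as NumPy arrays, so it avoids spawning bedtools subprocesses and writing intermediate BED files, which may be where much of the current time and memory is going.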