I'm running a standard Cas9 search on a ~1 gigabase chicken genome, and the step "Find genomic features closest to the guide" is taking longer than the guide-finding step itself. I ran this on a 384 GB, 48-core node; SLURM's sacct reports a MaxRSS of 199 GB for the job. I will look into ways to process this that improve speed and memory use. Most of GuideMaker's optimizations so far were aimed at microbial genomes, but most users are applying it to eukaryotes, so the long-term plan is to improve the experience for eukaryotic users.
I've looked into this issue more while working on some other updates. It takes 2.5 hours to identify all the guides, then 30 hours to find the genes they target in the chicken genome. The performance bottleneck appears to be the bedtools calls made through pybedtools. Possible alternatives include BEDOPS and bedtk; the nearest-feature lookup could also be handled directly in Pandas, or with an interval tree data structure such as this Python interval tree or this C++ interval tree (see the sketch below). The guidemaker.core.Annotation class will need to be rewritten to support this.
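For reference, a minimal sketch of the Pandas/NumPy route mentioned above, assuming guides and features are available as DataFrames with `chrom`, `start`, and `end` columns. The column names and the `nearest_feature` helper are hypothetical and not part of the current guidemaker.core.Annotation API; this is just to illustrate a sort-plus-binary-search replacement for `bedtools closest`.

```python
import numpy as np
import pandas as pd

def nearest_feature(guides: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """For each guide, report the closest feature on the same chromosome.

    Both frames are expected to have 'chrom', 'start', and 'end' columns.
    Distances are computed between interval midpoints, a rough stand-in
    for the bedtools closest behavior.
    """
    results = []
    for chrom, g in guides.groupby("chrom"):
        f = features[features["chrom"] == chrom]
        if f.empty:
            continue
        # Sort feature midpoints once per chromosome, then binary-search
        # each guide midpoint against them.
        f = (f.assign(mid=(f["start"] + f["end"]) // 2)
               .sort_values("mid")
               .reset_index(drop=True))
        f_mid = f["mid"].to_numpy()
        g_mid = ((g["start"] + g["end"]) // 2).to_numpy()
        idx = np.searchsorted(f_mid, g_mid)
        # Compare the feature just below and just above the insertion point.
        lo = np.clip(idx - 1, 0, len(f) - 1)
        hi = np.clip(idx, 0, len(f) - 1)
        pick = np.where(np.abs(f_mid[hi] - g_mid) < np.abs(f_mid[lo] - g_mid), hi, lo)
        out = g.reset_index(drop=True).copy()
        out["nearest_feature_start"] = f["start"].to_numpy()[pick]
        out["nearest_feature_end"] = f["end"].to_numpy()[pick]
        out["distance"] = np.abs(f_mid[pick] - g_mid)
        results.append(out)
    return pd.concat(results, ignore_index=True)
```

Sorting plus searchsorted is O(n log n) per chromosome and stays in memory as NumPy arrays, so it avoids spawning bedtools subprocesses and writing intermediate BED files, which may be where much of the current time and memory is going.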