Releases: USDA-ARS-GBRU/GuideMaker
v0.4.2
Fixed a bug where calculating Doench efficiency scores raised an error if there was an 'N' in the first three nucleotides past the PAM in the flanking genomic sequence. GuideMaker now removes those guides from consideration and issues a warning when the flag --doench_efficiency_score is used and Ns are present.
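The filtering rule can be sketched roughly as follows. This is an illustration only, not GuideMaker's actual code: the function name `drop_guides_with_n` and the `(guide, flank_after_pam)` tuple layout are assumptions made for the example.

```python
import warnings


def drop_guides_with_n(candidates, window=3):
    """Drop candidates with an 'N' in the first `window` bases past the PAM.

    `candidates` is a list of (guide, flank_after_pam) tuples. This layout is
    hypothetical and chosen only for illustration.
    """
    kept = []
    dropped = 0
    for guide, flank in candidates:
        if "N" in flank[:window].upper():
            dropped += 1
        else:
            kept.append((guide, flank))
    if dropped:
        # Report the removal as a warning instead of raising an error.
        warnings.warn(
            f"{dropped} guide(s) removed: 'N' found within {window} nt past the PAM"
        )
    return kept
```

A candidate whose flank starts with `ANG` would be dropped with a warning, while one whose flank starts with `ACG` would be kept even if an N appears further downstream.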
v0.4.1
- Changed how GuideMaker handles DNA sequences that are soft-masked with lowercase letters. GuideMaker now unmasks all sequences, then finds guide candidates and filters them by distance against the entire sequence. Previously, guides were identified only in regions where the PAM was entirely uppercase, and edit distance was case-sensitive, so an uppercase guide and its lowercase equivalent were not considered the same. This change also fixes a key value error in computing the Doench efficiency scores when mixed-case sequences were used.
- Dockerfiles now have more version information
- GitHub build actions were improved
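To see why the soft-masking change in v0.4.1 matters for distance calculations, consider this minimal sketch. The helper names `unmask` and `hamming` are illustrative and are not taken from the GuideMaker code base.

```python
def unmask(sequence: str) -> str:
    """Return the sequence with soft-masking (lowercase letters) removed."""
    return sequence.upper()


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


masked = "acgtACGTTGG"
plain = "ACGTACGTTGG"

# A case-sensitive comparison counts the four masked bases as mismatches.
print(hamming(masked, plain))  # 4
# After unmasking, the two sequences are identical.
print(hamming(unmask(masked), unmask(plain)))  # 0
```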
v0.4.0
API changes
- Changed the CLI flag --filter_by_locus to --filter_by_attribute and added the flag --attribute_key so that guides can be filtered on attribute keys other than "locus_tag".
- Added the CLI flag --raw_output_only. This option returns only the guides that meet the LSR and distance criteria, without parsing the genome annotation files.
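The idea behind pairing --filter_by_attribute with --attribute_key can be sketched in plain Python. The record layout and the function `filter_by_attribute` below are hypothetical, and the `locus_tag` default is an assumption; the sketch only shows selection on an arbitrary annotation key instead of a hard-coded one.

```python
def filter_by_attribute(records, values, attribute_key="locus_tag"):
    """Keep records whose `attribute_key` value is one of `values`.

    `records` is a list of dicts, one per guide (a made-up layout).
    """
    wanted = set(values)
    return [rec for rec in records if rec.get(attribute_key) in wanted]


records = [
    {"guide_id": "g1", "locus_tag": "ABC_0001", "gene": "recA"},
    {"guide_id": "g2", "locus_tag": "ABC_0002", "gene": "gyrB"},
]

# Filter on the assumed default key.
by_locus = filter_by_attribute(records, ["ABC_0002"])
# Filter on a different attribute key.
by_gene = filter_by_attribute(records, ["recA"], attribute_key="gene")
```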
Other changes
1. Updated almost all major dependencies
2. Updated caching for Streamlit 1.26
3. Updated GC and position methods for Biopython 1.81
4. Replaced append methods with concat methods for Pandas 2.1.1
5. Output data is now gzipped
6. Updated Dockerfile to use micromamba base image
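Because output data is now gzipped, downstream scripts need a gzip-aware reader. Below is a minimal sketch using only the Python standard library. The file name `targets.csv.gz` and the column names are placeholders, not GuideMaker's documented output layout.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Write a tiny placeholder file so the example is self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "targets.csv.gz")
with gzip.open(path, mode="wt", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["guide_id", "sequence"])
    writer.writerow(["g1", "ACGTACGTACGTACGTACGT"])

rows = read_gzipped_csv(path)
print(rows[0]["sequence"])
```

If pandas is available, `pandas.read_csv` infers gzip compression from a `.gz` extension by default, so no explicit decompression step should be needed there.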
v0.3.6
v0.3.5
Fixes to deal with inconsistent GTF file formatting. GTF is a problematic file format; see the gtfutils documentation for more information about the many ways a GTF file can be formatted.
- Added a check for #gtf-version or #gff-version file headers
- Better handling of blank attributes in GTF files
- Un-parsable attributes in GTFs now result in a warning rather than an error.
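The three behaviors listed above can be illustrated with a small standalone sketch. It is not the parser GuideMaker uses: the function names and regular expressions are assumptions. As a simplification, it splits attributes on every semicolon, so quoted values that contain a semicolon are not handled.

```python
import re
import warnings

HEADER_PATTERN = re.compile(r"^#+\s*(gtf|gff)-version", re.IGNORECASE)
ATTRIBUTE_PATTERN = re.compile(r'^(\S+)\s+"?([^"]*)"?$')


def has_version_header(lines):
    """Return True if a leading comment line declares a GTF/GFF version."""
    for line in lines:
        if not line.startswith("#"):
            break  # stop at the first non-comment line
        if HEADER_PATTERN.match(line):
            return True
    return False


def parse_gtf_attributes(field):
    """Parse a GTF attribute column into a dict, warning on unparsable parts."""
    attributes = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate blank attributes such as ';;' or a trailing ';'
        match = ATTRIBUTE_PATTERN.match(part)
        if match is None:
            # Warn and move on instead of raising an error.
            warnings.warn(f"Skipping unparsable GTF attribute: {part!r}")
            continue
        attributes[match.group(1)] = match.group(2)
    return attributes
```

With this sketch, `parse_gtf_attributes('gene_id "g1"; ; broken')` returns only the `gene_id` entry and emits one warning for `broken`.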
v0.3.4
- Updated the grid search parameters
v0.3.3
- Checks for restriction sites no longer remove targets from indexing, which improves off-target checks at the genome level
v0.3.2
- Added FASTA/GFF/GTF options to the web app
v0.3.1
- Added multiprocessing support for Doench scoring of guides
- Refactored doench_featurization.py to calculate nucleotide features more efficiently
- Modified cfd_score_calculator.py to only open the scoring matrix from the file once for increased performance
- Simplified logging for target scoring modules
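The "open the scoring matrix only once" idea can be sketched with `functools.lru_cache`. This shows the general technique, not the actual change made in cfd_score_calculator.py. The JSON file format, the function names, and the matrix values are all placeholders.

```python
import json
import tempfile
from functools import lru_cache

LOAD_COUNT = 0  # counts real file reads, for demonstration only


@lru_cache(maxsize=None)
def load_matrix(path):
    """Read the scoring matrix from disk; repeated calls reuse the cached result."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    with open(path) as handle:
        return json.load(handle)


def score(key, path):
    """Look up a placeholder score without re-reading the file."""
    return load_matrix(path)[key]


# Placeholder matrix written to a temporary file; these are not real CFD weights.
tmp = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump({"AA": 1.0, "CT": 0.5}, tmp)
tmp.close()

scores = [score("AA", tmp.name) for _ in range(1000)]
print(LOAD_COUNT)  # 1: the file was read a single time
```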
v0.3.0
Added functions/features to:
- Select Levenshtein or Hamming edit distance. The default is Hamming.
- Subset output by locus_tag
- Predict efficiency scores for NGG PAMs based on Doench et al. 2016.
- Predict Cutting Frequency Determination (CFD) scores
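For readers unfamiliar with the two edit distances, here is a short textbook sketch of the difference. These are reference implementations written for illustration, not the routines GuideMaker calls internally. Hamming distance counts substitutions only and requires equal lengths, while Levenshtein distance also allows insertions and deletions.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


a = "ACGTACGT"
b = "CGTACGTA"  # `a` shifted left by one base

print(hamming(a, b))      # 8: every position differs
print(levenshtein(a, b))  # 2: delete the leading A, append a trailing A
```

The shifted pair shows why the choice matters: a one-base shift looks maximally different under Hamming distance but very similar under Levenshtein distance.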