Releases: sourmash-bio/sourmash_plugin_directsketch

v0.5.0

29 Jan 00:56
5870f10

This release includes some major functionality changes:

  • Download via the NCBI REST API for gbsketch. The input file no longer uses ftp_path.
  • Add the --n-simultaneous-downloads parameter, and allow up to 10 simultaneous downloads if using an API key with gbsketch
  • Allow merged & ranged sketching in urlsketch
  • Enable building skipmer signatures (skipm1n3, skipm2n3; sourmash experimental addition)
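For readers unfamiliar with skipmers, the idea can be sketched as follows: an (m, n) skipmer keeps the first m bases of every cycle of n bases and skips the rest. This is a minimal illustration, assuming that skipm1n3 corresponds to m=1, n=3 and skipm2n3 to m=2, n=3, and that k counts only the kept bases. The `skipmers` function is hypothetical and is not the sourmash implementation.

```python
def skipmers(seq: str, k: int, m: int, n: int) -> list[str]:
    """Return every (m, n) skipmer with k kept bases found in seq.

    Illustrative only: within each cycle of n bases, the first m are kept
    and the remaining n - m are skipped.
    """
    if k <= 0 or not 0 < m <= n:
        raise ValueError("require k > 0 and 0 < m <= n")

    # Number of sequence positions needed to collect k kept bases.
    full_cycles, remainder = divmod(k, m)
    if remainder:
        span = full_cycles * n + remainder
    else:
        span = (full_cycles - 1) * n + m

    out = []
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        # Keep a base only if it falls in the first m slots of its cycle.
        out.append("".join(b for i, b in enumerate(window) if i % n < m))
    return out


one_of_three = skipmers("ACGTACGTA", k=3, m=1, n=3)
two_of_three = skipmers("ACGTAC", k=4, m=2, n=3)
```

With m equal to n nothing is skipped, so the function falls back to ordinary k-mers.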

And some nice UX updates:

  • Use the input CSV name as the default base filename for --fail and --checksum-fail
  • Ignore extra columns in the gbsketch input CSV
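Ignoring extra columns can be pictured roughly as below. This is only a sketch of the behaviour, not the plugin's code: the required column names (`accession`, `name`) are an assumption, the helper `read_required_columns` is hypothetical, and the accession is a placeholder.

```python
import csv
import io

# Assumed required columns, for illustration only.
REQUIRED = ("accession", "name")


def read_required_columns(handle, required=REQUIRED):
    """Read a CSV, keep only the required columns, and drop everything else."""
    reader = csv.DictReader(handle)
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return [{c: row[c] for c in required} for row in reader]


# Placeholder input with one extra column ("notes") that is ignored.
example = io.StringIO(
    "accession,name,notes\n"
    "GCA_000000000.1,example genome,this column is ignored\n"
)
rows = read_required_columns(example)
```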

It also fixes a bug where directsketch zips did not properly record n_hashes and therefore were not summarized correctly by sourmash sig summarize.

This release includes the first content contributions from @ctb 🎉.

What's Changed

Functionality updates

  • MRG: modify n simultaneous downloads; update buildutils by @bluegenes in #154
  • MRG: add skipmer sketching by @bluegenes in #159
  • MRG: fix manifest n_hashes + test by @bluegenes in #171
  • MRG: Enable merged sigs, sequence range selection in urlsketch by @bluegenes in #161
  • MRG: batched zip reporting - notify after finishing batch to be clearer by @bluegenes in #179
  • MRG: download via NCBI REST API by @bluegenes in #181
  • MRG: doc rerunning failures by @bluegenes in #184
  • MRG: set n-simultaneous-downloads to 9 if api key provided by @ctb in #194
  • MRG: provide default failed filenames based on CSV by @ctb in #195
  • MRG: ignore extra columns in gbsketch input CSV by @ctb in #188

Developer updates

dependabot

Full Changelog: v0.4.1...v0.5.0

v0.4.1

22 Oct 01:21
fafdb7a

What's Changed

This release fixes a bug where using a zipfile without an explicit path would yield an error (#118). The remaining changes are internal, including adding parameter string validation and improving the sketching utilities for potential use in other plugins.

dependabot

Full Changelog: v0.4.0...v0.4.1

v0.4.0

04 Oct 18:58
b1afbcd

This release introduces two new parameters:

  • --checksum-failures - an output file that logs any failures in downloading or parsing the checksum file, as well as any md5sum mismatches. Required for gbsketch.
  • --batch-size - enables writing smaller, batched zipfiles. This is recommended for large database generation, as batches allow restart after unexpected failure. It should also address some issues arising from extremely large zips.

Under the hood, this release also introduces a standardized sketch-building framework that may be useful outside of this plugin.

What's Changed

Dependabot

Full Changelog: v0.3.2...v0.4.0

v0.3.2

14 Jun 21:09
81242ac

What's Changed

Dependabot

New Contributors

  • @ctb made their first contribution in #52

Full Changelog: v0.3.1...v0.3.2

v0.3.1

21 May 07:10
ef97067
  • fixes URL formatting bug in failure output
  • adds new urlsketch command
  • changes failure output format for both gbsketch and urlsketch. The new header is: accession,name,moltype,md5sum,download_filename,url, which matches the urlsketch input format.
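Because the failure output shares its header with the urlsketch input, a failure file can be checked before it is reused as input. The snippet below is a small sketch of such a check; the header is taken from the note above, while the helper name `has_failure_header` is hypothetical.

```python
import csv
import io

# Header as stated in the release note above.
FAILURE_HEADER = ["accession", "name", "moltype", "md5sum", "download_filename", "url"]


def has_failure_header(handle) -> bool:
    """Return True if the first CSV row matches the expected failure header."""
    header = next(csv.reader(handle), None)
    return header == FAILURE_HEADER


new_style = io.StringIO("accession,name,moltype,md5sum,download_filename,url\n")
other = io.StringIO("accession,name,ftp_path\n")
new_style_ok = has_failure_header(new_style)
other_ok = has_failure_header(other)
```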

What's Changed

Dependabot and version updates

Full Changelog: v0.3.0...v0.3.1

v0.3.0

13 May 23:09

This release fixes a bug where the wrong version could be downloaded (#27).

The input format has changed slightly! Required columns are now: accession,name,ftp_path. The ftp_path column must be present, but its values can be empty.

  • If ftp_path is provided, it is used as the path for finding files associated with the accession. Otherwise, gbsketch will build the ftp_path from the accession.
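As an illustration of this input format, the following sketch writes a CSV with the three required columns, leaving ftp_path empty for one row. The accessions, names, and path are placeholders, not real records.

```python
import csv
import io

FIELDS = ["accession", "name", "ftp_path"]

# Placeholder rows: the first leaves ftp_path empty, so the path would be
# built from the accession; the second supplies a (made-up) path.
rows = [
    {"accession": "GCA_000000000.1", "name": "placeholder genome A", "ftp_path": ""},
    {
        "accession": "GCA_000000001.1",
        "name": "placeholder genome B",
        "ftp_path": "https://example.org/placeholder/path",
    },
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The header line is always written in full, so the ftp_path column is present even when every value in it is empty.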

What's Changed

  • optionally use ftp_path input for gbsketch by @bluegenes in #29
  • prevent unnecessary downloads by also setting genomes-only/proteomes-only via params if not keeping fastas by @bluegenes in #30
  • do not require signature output file if not sketching by @bluegenes in #31

Full Changelog: v0.2.3...v0.3.0

v0.2.3

10 May 03:40
872133b

What's Changed

Full Changelog: v0.2.2...v0.2.3

v0.2.2

09 May 17:05
e1fa2fa

Bugfix Release

  • fix a bug where md5sum file error caused directsketch to hang

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.2.2

v0.2.1

08 May 22:01
dc9e256

What's Changed

  • changed progress reporting back from every 5% to every 1%; adjusted to reflect start times better
  • remove interval delay by @bluegenes in #16

Full Changelog: v0.2.0...v0.2.1

v0.2.0

08 May 18:29

Major changes:

  • #8 - actually use tokio threading, fully asynchronous file downloading + writing
  • #9 - download md5sums and check them prior to sketching
  • #14 - make sure we return an error if the md5sum can't be downloaded (rather than just continuing)
  • #15 - safer tokio thread/runtime setting while still allowing pytest to run multiple iterations at once
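The md5sum check in #9 and #14 follows a familiar pattern: hash the downloaded bytes, compare against the expected checksum, and fail loudly instead of continuing. The plugin itself is written in Rust; the Python below is only a sketch of the pattern, with hypothetical function names.

```python
import hashlib


def md5_matches(data: bytes, expected_md5: str) -> bool:
    """Return True if the md5 hex digest of data equals expected_md5."""
    return hashlib.md5(data).hexdigest() == expected_md5.strip().lower()


def verify_or_raise(data: bytes, expected_md5: str) -> bytes:
    """Return data unchanged if its checksum matches, otherwise raise."""
    if not md5_matches(data, expected_md5):
        raise ValueError("md5sum mismatch: refusing to sketch this download")
    return data


payload = b"hello"
expected = hashlib.md5(payload).hexdigest()
verified = verify_or_raise(payload, expected)
```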

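Benchmarking shows this structure is much faster: roughly 36x for 9 fungal accessions (6min to 10s) and about 40x for 49 (58min to 1min 26s), with similar peak RAM.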

software/version  command   acc  details  time      max RAM
v0.1.0            gbsketch  9    fungal   6min      156 MB
main (v0.2.0)     gbsketch  9    fungal   10s       156 MB
v0.1.0            gbsketch  49   fungal   58min     1.5 GB
main (v0.2.0)     gbsketch  49   fungal   1min 26s  1.6 GB
main (v0.2.0)     gbsketch  243  fungal   4min      1.16 GB

What's Changed

Full Changelog: v0.1.0...v0.2.0