Releases: sourmash-bio/sourmash_plugin_directsketch
v0.5.0
This release includes some major functionality changes:
- Download via NCBI REST API for `gbsketch`. Input file no longer uses `ftp_path`.
- add `--n-simultaneous-downloads` parameter, and allow up to 10 if using an API KEY with `gbsketch`
- Allow merged & ranged sketching in `urlsketch`
- Enable building skipmer signatures (`skipm1n3`, `skipm2n3`; sourmash experimental addition)
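As a rough illustration of what a cap on simultaneous downloads means, the sketch below limits concurrency with a semaphore. It is written in Python with `asyncio` purely for illustration; the plugin itself is not confirmed to work this way, and `fetch_all`, `fetch_one`, and the `ACC…` identifiers are hypothetical names with a sleep standing in for a real request.

```python
import asyncio


async def fetch_all(accessions, n_simultaneous=3):
    """Simulate downloads, never running more than n_simultaneous at once."""
    semaphore = asyncio.Semaphore(n_simultaneous)
    active = 0   # downloads currently in flight
    peak = 0     # highest number of downloads seen in flight
    results = []

    async def fetch_one(acc):
        nonlocal active, peak
        async with semaphore:          # wait here if the limit is reached
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a real HTTP request
            active -= 1
            results.append(acc)

    await asyncio.gather(*(fetch_one(acc) for acc in accessions))
    return results, peak


# Hypothetical accession placeholders, not real identifiers.
accessions = [f"ACC{i}" for i in range(10)]
results, peak = asyncio.run(fetch_all(accessions, n_simultaneous=3))
print(f"downloaded {len(results)} items, at most {peak} at a time")
```

Raising the limit only helps while the remote service accepts the extra requests, which is presumably why the higher setting is tied to having an API key.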
And some nice UX updates:
- use input csv as default base filename for `--fail` and `--checksum-fail`
- ignore extra columns in `gbsketch` input CSV
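To make the "ignore extra columns" behavior concrete, here is a minimal Python sketch of a reader that keeps only the columns it needs and drops the rest. It assumes `accession` and `name` are the required columns (based on the earlier input format minus `ftp_path`); `read_accessions`, the `lineage` column, and the row values are hypothetical placeholders, not the plugin's code or real data.

```python
import csv
import io

# Assumed required columns; anything else in the file is ignored.
REQUIRED = ("accession", "name")


def read_accessions(handle):
    """Return one dict per row containing only the required columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    return [{col: row[col] for col in REQUIRED} for row in reader]


# Placeholder input with one extra column that should be dropped.
example = io.StringIO(
    "accession,name,lineage\n"
    "GCA_000000000.1,example genome,placeholder lineage\n"
)
rows = read_accessions(example)
print(rows)
```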
It also fixes a bug where `directsketch` zips did not properly record `n_hashes` and thus did not get properly summarized via `sourmash sig summarize`.
This release includes first content contributions from @ctb 🎉 .
What's Changed
Functionality updates
- MRG: modify n simultaneous downloads; update buildutils by @bluegenes in #154
- MRG: add skipmer sketching by @bluegenes in #159
- MRG: fix manifest n_hashes + test by @bluegenes in #171
- MRG: Enable merged sigs, sequence range selection in `urlsketch` by @bluegenes in #161
- MRG: batched zip reporting - notify after finishing batch to be clearer by @bluegenes in #179
- MRG: download via NCBI REST API by @bluegenes in #181
- MRG: doc rerunning failures by @bluegenes in #184
- MRG: set n-simultaneous-downloads to 9 if api key provided by @ctb in #194
- MRG: provide default failed filenames based on CSV by @ctb in #195
- MRG: ignore extra columns in gbsketch input CSV by @ctb in #188
Developer updates
- MRG: remove `BuildParams`, filter via manifest / `Select` approaches by @bluegenes in #127
- try fixing ci by @bluegenes in #157
- upd sourmash core to 0.17.2 by @bluegenes in #156
- bump version; add ctb to authors by @bluegenes in #199
dependabot
- Bump tokio from 1.40.0 to 1.41.0 by @dependabot in #130
- Bump pyo3 from 0.22.5 to 0.23.3 by @dependabot in #151
- Bump tokio from 1.41.0 to 1.42.0 by @dependabot in #150
- Bump reqwest from 0.12.8 to 0.12.9 by @dependabot in #136
- Bump regex from 1.11.0 to 1.11.1 by @dependabot in #133
- Bump tokio-util from 0.7.12 to 0.7.13 by @dependabot in #149
- Bump anyhow from 1.0.90 to 1.0.94 by @dependabot in #152
- Bump serde_json from 1.0.132 to 1.0.134 by @dependabot in #162
- Update pytest-cov requirement from <6.0,>=2.12 to >=2.12,<7.0 by @dependabot in #135
- Bump anyhow from 1.0.94 to 1.0.95 by @dependabot in #163
- Bump reqwest from 0.12.9 to 0.12.12 by @dependabot in #169
- Bump serde from 1.0.216 to 1.0.217 by @dependabot in #167
- Bump tokio from 1.42.0 to 1.43.0 by @dependabot in #176
- Bump pyo3 from 0.23.3 to 0.23.4 by @dependabot in #178
- Bump serde_json from 1.0.134 to 1.0.135 by @dependabot in #177
- Bump serde_json from 1.0.135 to 1.0.137 by @dependabot in #190
- Bump getset from 0.1.3 to 0.1.4 by @dependabot in #197
- Bump openssl from 0.10.68 to 0.10.69 by @dependabot in #196
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
This release includes a bugfix where using a zipfile without an explicit path would yield an error (#118). The remaining changes are internal, including adding parameter string validation and improving the sketching utilities for potential use in other plugins.
- MRG: refactor sketching utilities by @bluegenes in #112
- MRG: validate param strings by @bluegenes in #114
- MRG: update sourmash core to 0.16.0 by @bluegenes in #115
- MRG: fix bug in zip paths if output provided in current dir by @bluegenes in #121
- bump to 0.4.1 by @bluegenes in #128
dependabot
- Bump reqwest from 0.12.7 to 0.12.8 by @dependabot in #110
- Bump futures from 0.3.30 to 0.3.31 by @dependabot in #111
- Bump pyo3 from 0.22.3 to 0.22.5 by @dependabot in #122
- Bump anyhow from 1.0.89 to 1.0.90 by @dependabot in #126
- Bump serde_json from 1.0.128 to 1.0.132 by @dependabot in #124
- Bump openssl from 0.10.66 to 0.10.68 by @dependabot in #125
Full Changelog: v0.4.0...v0.4.1
v0.4.0
This release introduces two new parameters:
- `--checksum-failures` - an output file to log any failures with the checksum file download and parsing or any md5sum mismatches. Required for `gbsketch`.
- `--batch-size` - enables writing smaller, batched zipfiles. This is recommended for large database generation, as batches allow restart after unexpected failure. It also should address some issues arising from extremely large zips.
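The restart benefit of batching can be sketched in a few lines of Python. This is only an illustration of the idea: how the plugin names batch files or detects finished batches is not stated here, so `make_batches`, `remaining_batches`, and the `completed` set are hypothetical.

```python
def make_batches(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def remaining_batches(batches, completed):
    """Return (batch_number, batch) pairs that still need to be written.

    `completed` is a hypothetical set of 1-based batch numbers that were
    already written successfully before an interruption.
    """
    return [
        (number, batch)
        for number, batch in enumerate(batches, start=1)
        if number not in completed
    ]


accessions = [f"ACC{i}" for i in range(7)]      # placeholder identifiers
batches = make_batches(accessions, batch_size=3)
todo = remaining_batches(batches, completed={1})
print(todo)
```

With a single large zip, a failure near the end means starting over; with batches, only the unfinished batches need to be redone.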
Under the hood, this release also introduces a standardized sketching building framework that may be useful outside of this plugin.
What's Changed
- MRG: report checksum file download failures by @bluegenes in #92
- MRG: add generic support for signature building by @bluegenes in #101
- MRG: improve restart by optionally writing batched zipfiles by @bluegenes in #102
- MRG: fix ci by moving install from `mambaforge` --> `miniforge` by @bluegenes in #106
- bump to v0.4.0 by @bluegenes in #109
Dependabot
- `sourmash-core`:
  - Bump sourmash from 0.14.0 to 0.14.1 by @dependabot in #62
  - Bump sourmash from 0.14.1 to 0.15.0 by @dependabot in #75
  - Bump sourmash from 0.15.0 to 0.15.1 by @dependabot in #87
  - Bump sourmash from 0.15.1 to 0.15.2 by @dependabot in #103
- `simple-error`:
  - Bump simple-error from 0.3.0 to 0.3.1 by @dependabot in #59
- `reqwest`:
  - Bump reqwest from 0.12.4 to 0.12.5 by @dependabot in #60
  - Bump reqwest from 0.12.5 to 0.12.7 by @dependabot in #88
- `lazy_static`:
  - Bump lazy_static from 1.4.0 to 1.5.0 by @dependabot in #61
- `pyo3`:
  - Bump pyo3 from 0.21.2 to 0.22.0 by @dependabot in #64
  - Bump pyo3 from 0.22.0 to 0.22.1 by @dependabot in #66
  - Bump pyo3 from 0.22.1 to 0.22.2 by @dependabot in #73
  - Bump pyo3 from 0.22.2 to 0.22.3 by @dependabot in #99
- `serde_json`:
  - Bump serde_json from 1.0.117 to 1.0.119 by @dependabot in #63
  - Bump serde_json from 1.0.119 to 1.0.120 by @dependabot in #67
- `serde`:
  - Bump serde from 1.0.203 to 1.0.204 by @dependabot in #65
- `tokio`:
  - Bump tokio from 1.38.0 to 1.38.1 by @dependabot in #74
  - Bump tokio from 1.38.1 to 1.40.0 by @dependabot in #91
- `pytest`:
  - Update pytest requirement from <8.3.0,>=6.2.4 to >=6.2.4,<8.4.0 by @dependabot in #71
- `openssl`:
  - Bump openssl from 0.10.64 to 0.10.66 by @dependabot in #72
- `regex`:
  - Bump regex from 1.10.5 to 1.10.6 by @dependabot in #80
  - Bump regex from 1.10.6 to 1.11.0 by @dependabot in #104
- `anyhow`:
  - Bump anyhow from 1.0.86 to 1.0.89 by @dependabot in #100
Full Changelog: v0.3.2...v0.4.0
v0.3.2
What's Changed
- MRG: update to sourmash-rs core r0.14.0 by @ctb in #52
- MRG: set zip permissions to 644 by @bluegenes in #53
- MRG: enable dayhoff, hp sketching by @bluegenes in #55
- bump version to 0.3.2 by @bluegenes in #54
Dependabot
- Bump tokio from 1.37.0 to 1.38.0 by @dependabot in #46
- Bump serde from 1.0.202 to 1.0.203 by @dependabot in #45
- Bump regex from 1.10.4 to 1.10.5 by @dependabot in #51
New Contributors
Full Changelog: v0.3.1...v0.3.2
v0.3.1
- fixes URL formatting bug in failure output
- adds new `urlsketch` command
- changes failure output format for both `gbsketch`, `urlsketch`. The new header is: `accession,name,moltype,md5sum,download_filename,url`, which matches the `urlsketch` input format.
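Because the failure file shares its header with the `urlsketch` input, a failure file can in principle be inspected and fed back in. The Python sketch below only checks that a file carries the header quoted above and loads its rows; `load_failures` is a hypothetical helper, and the row values (including the `example.com` URL) are placeholders rather than real output.

```python
import csv
import io

# Header quoted in the release notes above.
URLSKETCH_HEADER = [
    "accession", "name", "moltype", "md5sum", "download_filename", "url",
]


def load_failures(handle):
    """Load failure rows, insisting on the documented header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != URLSKETCH_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


# Placeholder failure file with a single made-up row.
failures = io.StringIO(
    "accession,name,moltype,md5sum,download_filename,url\n"
    "GCA_000000000.1,example genome,DNA,,example_genomic.fna.gz,"
    "https://example.com/example_genomic.fna.gz\n"
)
rows = load_failures(failures)
print(f"{len(rows)} failed download(s) could be retried")
```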
What's Changed
- fix url printing by @bluegenes in #36
- add `urlsketch` command by @bluegenes in #34
Dependabot and version updates
- Bump anyhow from 1.0.83 to 1.0.86 by @dependabot in #39
- Bump serde from 1.0.201 to 1.0.202 by @dependabot in #38
- Bump camino from 1.1.6 to 1.1.7 by @dependabot in #37
- bump version to 0.3.1 by @bluegenes in #43
Full Changelog: v0.3.0...v0.3.1
v0.3.0
This release fixes a bug where the wrong version may be downloaded #27.
The input format has changed slightly! Required columns are now: `accession,name,ftp_path`. The `ftp_path` column name must be present, but the column can be empty.
- if `ftp_path` is provided, it is used as the path for finding files associated with the accession. Otherwise, `gbsketch` will build the `ftp_path` from the accession.
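For readers unfamiliar with how a path can be derived from an accession, the Python sketch below follows the public NCBI genomes FTP layout, where the nine digits of a GCA/GCF accession are split into three-digit directories. This is an illustration of that layout, not the plugin's actual code; `accession_to_ftp_dir` is a hypothetical name, and the final assembly-specific folder (which includes the assembly name) cannot be derived from the accession alone.

```python
NCBI_GENOMES_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def accession_to_ftp_dir(accession):
    """Build the parent directory URL for a GCA_/GCF_ assembly accession."""
    prefix, rest = accession.split("_", 1)   # e.g. "GCF", "000001405.40"
    digits = rest.split(".")[0]              # drop the version suffix
    if prefix not in ("GCA", "GCF") or len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"not a GenBank/RefSeq assembly accession: {accession}")
    # Split the nine digits into three groups of three.
    groups = [digits[i:i + 3] for i in range(0, 9, 3)]
    return "/".join([NCBI_GENOMES_ROOT, prefix, *groups])


print(accession_to_ftp_dir("GCF_000001405.40"))
```

Supplying `ftp_path` directly skips this derivation and the extra directory lookup needed to find the assembly-specific folder.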
What's Changed
- optionally use ftp_path input for `gbsketch` by @bluegenes in #29
- prevent unnecessary downloads by also setting genomes-only/proteomes-only via params if not keeping fastas by @bluegenes in #30
- do not require signature output file if not sketching by @bluegenes in #31
Full Changelog: v0.2.3...v0.3.0
v0.2.3
What's Changed
- fix ci by @bluegenes in #6
- revert channel sizes by @bluegenes in #23
- bump version to 0.2.3 by @bluegenes in #24
Full Changelog: v0.2.2...v0.2.3
v0.2.2
Bugfix Release
- fix a bug where md5sum file error caused `directsketch` to hang
What's Changed
- fix error handling by @bluegenes in #19
- Bump serde from 1.0.200 to 1.0.201 by @dependabot in #12
- Bump anyhow from 1.0.82 to 1.0.83 by @dependabot in #11
- Bump serde_json from 1.0.116 to 1.0.117 by @dependabot in #10
New Contributors
- @dependabot made their first contribution in #12
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- changed progress reporting back from 5% --> 1%; adjusted to reflect start times better
- remove interval delay by @bluegenes in #16
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Major changes:
- #8 - actually use tokio threading, fully asynchronous file downloading + writing
- #9 - download md5sums and check them prior to sketching
- #14 - make sure we return an error if the md5sum can't be downloaded (rather than just continuing)
- #15 - safer tokio thread/runtime setting while still allowing pytest to run multiple iterations at once
Benchmarking shows this structure is much faster:

| software/version | command | acc details | time | max RAM |
|---|---|---|---|---|
| v0.1.0 | `gbsketch` | 9 fungal | 6min | 156 MB |
| main (v0.2.0) | `gbsketch` | 9 fungal | 10s | 156 MB |
| v0.1.0 | `gbsketch` | 49 fungal | 58min | 1.5 GB |
| main (v0.2.0) | `gbsketch` | 49 fungal | 1min 26s | 1.6 GB |
| main (v0.2.0) | `gbsketch` | 243 fungal | 4min | 1.16 GB |
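The "check md5sums prior to sketching" step listed under the major changes can be pictured with the short Python sketch below. It assumes checksum lines of the form `<md5>  ./<filename>`, and `parse_checksum_line` and `md5_matches` are hypothetical helpers; the plugin's own implementation may differ.

```python
import hashlib


def parse_checksum_line(line):
    """Split one checksum line into (filename, md5); format is an assumption."""
    md5, filename = line.split(maxsplit=1)
    filename = filename.strip()
    if filename.startswith("./"):
        filename = filename[2:]
    return filename, md5.lower()


def md5_matches(data, expected_md5):
    """Return True if the md5 of the downloaded bytes equals the expected value."""
    return hashlib.md5(data).hexdigest() == expected_md5.strip().lower()


# Placeholder checksum line; the hash is the md5 of empty content.
filename, expected = parse_checksum_line(
    "d41d8cd98f00b204e9800998ecf8427e  ./example_genomic.fna.gz"
)
downloaded = b""  # stand-in for the downloaded file contents
if md5_matches(downloaded, expected):
    print(f"{filename}: checksum ok, safe to sketch")
else:
    print(f"{filename}: checksum mismatch, record as failure")
```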
What's Changed
- check md5sums by @bluegenes in #9
- WIP: fully async with tokio threading by @bluegenes in #8
- return error if downloading md5sums fails by @bluegenes in #14
- safer tokio thread setting by @bluegenes in #15
Full Changelog: v0.1.0...v0.2.0