This repository has been archived by the owner on May 17, 2024. It is now read-only.
Releases: datafold/data-diff
v0.3.1 - Quickfix for v0.3.0
What's Changed
Full Changelog: v0.3.0...v0.3.1
v0.3.0 - New algorithm for in-db diffing (joindiff) + tons of new features and bugfixes!
Big points:
- Added a new algorithm for in-db diffing that uses OUTER JOIN, called "joindiff".
- Much faster than the original "hashdiff" algorithm!
- Automatically chosen if both dbs are the same
- Validates that the key column is unique and contains no NULLs (joindiff only)
- Explicitly switch between algorithms using the --algorithm parameter.
- New feature to materialize joindiff results to DB
- New feature that diffs the schemas when both dbs are the same
- Added DuckDB support (thanks @jardayn!)
- Better support for alphanumerics
- Better support for boolean types
- Added --version switch
- New and improved database and query interface, named "sqeleton"
- Tons of bugfixes and improvements!
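The idea behind an OUTER JOIN based diff, including the key validation mentioned above, can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module, not data-diff's actual implementation: the function names `check_key` and `outer_join_diff` are hypothetical, the full outer join is emulated with two LEFT JOINs so it runs on older SQLite versions, and table and column names are assumed to be trusted identifiers.

```python
import sqlite3


def check_key(conn, table, key):
    """Raise ValueError unless `key` is unique and non-NULL in `table`."""
    total, distinct, nulls = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}), "
        f"COALESCE(SUM({key} IS NULL), 0) FROM {table}"
    ).fetchone()
    if nulls:
        raise ValueError(f"{table}.{key} contains NULLs")
    if distinct != total:
        raise ValueError(f"{table}.{key} is not unique")


def outer_join_diff(conn, table_a, table_b, key, columns):
    """Return ('-', key, *values) for rows in A and ('+', key, *values) for rows in B
    whenever a key is missing on one side or any compared column differs."""
    for table in (table_a, table_b):
        check_key(conn, table, key)

    n = len(columns)
    a_cols = [f"a.{c}" for c in columns]
    b_cols = [f"b.{c}" for c in columns]
    nulls = ["NULL"] * n
    # IS NOT is SQLite's NULL-safe inequality; "0" keeps the SQL valid with no columns.
    differs = " OR ".join(f"a.{c} IS NOT b.{c}" for c in columns) or "0"
    left = ", ".join([f"a.{key}", "1", f"b.{key} IS NOT NULL", *a_cols, *b_cols])
    right = ", ".join([f"b.{key}", "0", "1", *nulls, *b_cols])

    # Emulated FULL OUTER JOIN: rows of A (missing in B or changed),
    # plus rows that exist only in B.
    sql = f"""
        SELECT {left}
        FROM {table_a} a LEFT JOIN {table_b} b ON a.{key} = b.{key}
        WHERE b.{key} IS NULL OR {differs}
        UNION ALL
        SELECT {right}
        FROM {table_b} b LEFT JOIN {table_a} a ON a.{key} = b.{key}
        WHERE a.{key} IS NULL
        ORDER BY 1
    """
    result = []
    for row_key, in_a, in_b, *values in conn.execute(sql):
        if in_a:
            result.append(("-", row_key, *values[:n]))
        if in_b:
            result.append(("+", row_key, *values[n:]))
    return result
```

Because the whole comparison is a single query, this approach only applies when both tables live in the same database, which matches the note above that joindiff is chosen automatically in that case.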
What's Changed
- Join-diff (in-db) + new query builder by @erezsh in #242
- Bugfix: Joindiff crashed when no numeric columns were used. by @erezsh in #255
- Deprecate use of FixedAlphanum by @erezsh in #254
- Refactor tests oct2022 by @erezsh in #253
- General tests now include Presto, Trino & Vertica; Includes small fixes by @erezsh in #256
- Added --materialize-all-rows switch + tests by @erezsh in #258
- Various small fixes and refactors by @erezsh in #260
- Downgrade mysql-connector-python to 8.0.29 by @erezsh in #262
- Update documentation link by @williebsweet in #263
- Small changes by @erezsh in #264
- Added link on how to get a slack invite by @jardayn in #265
- link to docs and incorporate roman/gerard feedback by @leoebfolsom in #266
- Tiny Cleanup by @erezsh in #267
- tests for unique key constraints (if possible) instead of always actively validating (+ tests) by @erezsh in #257
- Attempt to fix PR #269 by @erezsh in #272
- Contrib improvements + Fixed Test by @jardayn in #269
- Refactor dialect by @erezsh in #271
- Tests: Improvements to CI flow + fixes by @erezsh in #274
- Bugfix in alphanums (reported by Guarav Singh) by @erezsh in #277
- Fix databricks by @pik94 in #273
- Added support for Boolean types by @erezsh in #282
- Fixed broken "How To Use" links in README. by @daniel-leicht in #290
- Fix for issue #286 by @erezsh in #291
- Materialize: rename and reorder columns by @erezsh in #287
- Revised CLI output to be more understandable and detailed by @erezsh in #292
- New DB Driver guide update by @jardayn in #288
- Duckdb driver for Issue #176 by @jardayn in #276
- Update typing of TableSegment().count() by @MattDelac in #293
- Refactor common database interface into Sqeleton (databases, queries) by @erezsh in #285
- Added DDB as an extra by @jardayn in #296
- More Sqeleton refactoring by @erezsh in #295
- Added InfoTree as a more descriptive alternative to .stats by @erezsh in #297
- Refactor tests to use insert_rows_in_batches(), instead of internally… by @erezsh in #299
- CLI: Better errors + tiny bugfix by @erezsh in #303
- Rudderstack poc by @kylemcnair in #298
- add databases we support to readme by @leoebfolsom in #309
- Nov22 sqeleton refactor by @erezsh in #308
- Fix readme link by @dlawin in #310
- List tables from schema by @erezsh in #311
- Tests: Set bisection_factor=2 for much faster tests; Fix random failures in test_string_keys by @erezsh in #312
- Nov24 - Small fixes to tests by @erezsh in #313
- Adjustments for PR #314 by @erezsh in #315
- return all duplicated rows by @pik94 in #314
- Cleanup by @erezsh in #320
- Added version and --version switch (issue #318) by @erezsh in #319
- data-diff now uses database A's now instead of cli's now. by @erezsh in #306
- extract methods for stats by @dlawin in #300
- connect(): Added support for shared connection; Database.is_closed property by @erezsh in #323
- Better error messages in databases; Default database in clickhouse is now 'default'. by @erezsh in #325
- diff_tables() now accepts all JoinDiffer params by @erezsh in #326
- CLI: Automatically choose joindiff if dbs are the same (don't rely just on syntax) by @erezsh in #328
- Add version module and add version to tracking by @kylemcnair in #327
- Dec2 cleanup by @erezsh in #329
- fix link to docs by @leoebfolsom in #330
- Fix _normalize_table_path to always return a pair by @erezsh in #333
New Contributors
- @williebsweet made their first contribution in #263
- @jardayn made their first contribution in #265
- @daniel-leicht made their first contribution in #290
- @MattDelac made their first contribution in #293
- @kylemcnair made their first contribution in #298
- @dlawin made their first contribution in #310
Full Changelog: v0.2.8...v0.3.0
Let us know what you think in Discussions!
v0.3.0rc2 - New algorithm for in-db diffing (joindiff) + features and bugfixes
Pre-release
Big points
- Add new algorithm for in-db diffing that uses OUTER JOIN, called "joindiff".
- New feature to materialize joindiff results to DB
- A bunch of bugfixes and improvements
What's Changed
- Join-diff (in-db) + new query builder by @erezsh in #242
- Bugfix: Joindiff crashed when no numeric columns were used. by @erezsh in #255
- Deprecate use of FixedAlphanum by @erezsh in #254
- Refactor tests oct2022 by @erezsh in #253
- General tests now include Presto, Trino & Vertica; Includes small fixes by @erezsh in #256
- Added --materialize-all-rows switch + tests by @erezsh in #258
- Various small fixes and refactors by @erezsh in #260
- Downgrade mysql-connector-python to 8.0.29 by @erezsh in #262
- Update documentation link by @williebsweet in #263
- Small changes by @erezsh in #264
- Added link on how to get a slack invite by @jardayn in #265
- link to docs and incorporate roman/gerard feedback by @leoebfolsom in #266
- Tiny Cleanup by @erezsh in #267
- tests for unique key constraints (if possible) instead of always actively validating (+ tests) by @erezsh in #257
- Attempt to fix PR #269 by @erezsh in #272
- Contrib improvements + Fixed Test by @jardayn in #269
- Refactor dialect by @erezsh in #271
- Tests: Improvements to CI flow + fixes by @erezsh in #274
- Bugfix in alphanums (reported by Guarav Singh) by @erezsh in #277
- Fix databricks by @pik94 in #273
New Contributors
Full Changelog: v0.2.8...v0.3.0rc2
v0.2.8 - Bugfix in algorithm for an edge-case
What's Changed
- Bugfix in algorithm: Trigger download if the segment space is smaller than the bisection factor by @erezsh in #249
- v0.2.8 - Release PR by @erezsh in #251
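To make the edge case concrete, here is a minimal sketch of a checksum-and-bisect diff over an integer key range. It is an illustration under stated assumptions, not the library's code: tables are modelled as plain dicts, `rows_in`, `checksum` and `bisect_diff` are hypothetical names, and only the parameter names bisection_factor and bisection_threshold are taken from these release notes. The relevant rule is the `too_narrow` check: when the key space of a segment is smaller than the bisection factor, the segment cannot be split into that many parts, so its rows are downloaded and compared directly.

```python
import hashlib


def rows_in(table, lo, hi):
    """Rows of a dict-backed 'table' with lo <= key < hi, sorted by key."""
    return sorted((k, v) for k, v in table.items() if lo <= k < hi)


def checksum(rows):
    """Checksum of a segment; stands in for an aggregate hash computed in the database."""
    digest = hashlib.md5()
    for row in rows:
        digest.update(repr(row).encode())
    return digest.hexdigest()


def bisect_diff(a, b, lo, hi, bisection_factor=4, bisection_threshold=8):
    """Return the sorted (key, value) pairs that appear in only one of the two tables."""
    if bisection_factor < 2:
        raise ValueError("bisection_factor must be at least 2")

    seg_a, seg_b = rows_in(a, lo, hi), rows_in(b, lo, hi)
    if checksum(seg_a) == checksum(seg_b):
        return []  # identical segment, nothing to download

    key_space = hi - lo
    too_narrow = key_space < bisection_factor  # cannot split into enough parts
    small = max(len(seg_a), len(seg_b)) < bisection_threshold
    if too_narrow or small:
        # "Download" both segments and compare them locally.
        return sorted(set(seg_a) ^ set(seg_b))

    # Otherwise split the key range into bisection_factor sub-ranges and recurse.
    step = -(-key_space // bisection_factor)  # ceiling division
    diffs = []
    for start in range(lo, hi, step):
        diffs += bisect_diff(
            a, b, start, min(start + step, hi), bisection_factor, bisection_threshold
        )
    return diffs
```

In this sketch a changed row shows up twice, once with each table's value, while a row missing from one side shows up once.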
Full Changelog: v0.2.7...v0.2.8
v0.2.7 - Better alphanumerics, better threading, and small fixes
What's Changed
- Support for varying alphanums, with special characters by @erezsh in #235
- Re-wrote threading to use a thread-pool + priority queue. by @erezsh in #238
- Added support for specifying db-name in CLI instead of URI when using --conf by @erezsh in #248
- Added validation for UUID columns (Also fixes issue #245) by @erezsh in #247
Full Changelog: v0.2.6...v0.2.7
v0.2.6 - Support for Clickhouse, Vertica, and various bugfixes
- Support of Clickhouse by @pik94 in #217
- add support of Vertica db by @pik94 in #231
- Fix for pip extras (e.g. pip install data-diff[snowflake]) by @erezsh in #232
- Fixed support for diffing columns of different names by @erezsh in #230
- Bugfix in TableSegment: Sampling now respects the 'where' clause (issue #221) by @erezsh in #224
Other changes
- Better error messages. Move some parsing to before the connects. Tests now only connect if being run. by @erezsh in #222
- Small bugfixes and refactor by @erezsh in #223
- Refactors and fixes by @erezsh in #227
Full Changelog: v0.2.5...v0.2.6
v0.2.5 - Alphanum key columns; Certificate auth in snowflake & presto
New features
- Support for alphanumeric key columns
- Support certificate authentication in snowflake and presto
- Various bugfixes
What's Changed
- Fixed docstring in diff_tables() (Issue #182) by @nklsw in #183
- Bugfix for Oracle - didn't properly handle .rounds attribute. by @erezsh in #184
- Added support for auto-detecting mutual columns, and using patterns in -c by @erezsh in #185
- Added new guide for implementing a database driver by @erezsh in #189
- Update issue templates by @erezsh in #192
- Create CODE_OF_CONDUCT.md by @erezsh in #193
- Update README.md by @kning in #181
- Bugfix for mutual columns feature (6a4c443) by @erezsh in #198
- [Tests] now using connect() instead of connect_to_uri(); refactor by @erezsh in #202
- Refactor - nicer regexp parsing; Trino now inherits from Presto by @erezsh in #205
- Add extra documentation on installing drivers for postgresql by @cfernhout in #206
- Update README.md by @glebmezh in #209
- Various fixes (issue #211, #208) by @erezsh in #212
- Fix for merging PR #187 by @erezsh in #214
- Cleanup by @erezsh in #215
- Added optional tracking by @erezsh in #213
- Cleanup by @erezsh in #216
- Presto snowflake enhancement by @matthiasekundayo-eb in #187
- Fix tests for BigQuery by @erezsh in #218
New Contributors
- @nklsw made their first contribution in #183
- @kning made their first contribution in #181
- @matthiasekundayo-eb made their first contribution in #187
Full Changelog: v0.2.4...v0.2.5
v0.2.4
Main changes
New features:
New drivers:
Optimization:
Reliability:
Bugfixes and other fixes:
- Fix for the occasional failure in tests in 3.7 by @erezsh in #153
- Removed snowflake from list of dependencies (only a dev dep) by @erezsh in #161
- Update Preql version to 0.2.16 by @erezsh in #166
- Create CONTRIBUTING.md by @erezsh in #164
- Initial support for running the tests for multiple databases (replacing TestWithConnection) by @erezsh in #167
- Tests now cover oracle, Redshift, snowflake and bigquery; Various fixes to said drivers. by @erezsh in #170
- Small fix for Oracle, for when a database isn't specified. by @erezsh in #173
- Fix for CLI + tests for CLI (issue #175) by @erezsh in #177
- Print configuration during debug, but with passwords redacted by @erezsh in #172
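As an illustration of the last item, redacting passwords before printing a configuration might look like the following. This is a hedged sketch rather than the project's code: `redact_uri` and `redact_config` are hypothetical helpers, and the configuration is assumed to be a nested dict whose secrets appear either under a "password" key or inside a database URI.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri):
    """Replace the password in a scheme://user:password@host/... URI with ***."""
    parts = urlsplit(uri)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not (at and colon):
        return uri  # no password present, leave the URI untouched
    netloc = f"{user}:***@{hostport}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redact_config(config):
    """Return a copy of a nested config dict that is safe to print in debug logs."""
    redacted = {}
    for name, value in config.items():
        if isinstance(value, dict):
            redacted[name] = redact_config(value)
        elif name.lower() == "password":
            redacted[name] = "***"
        elif isinstance(value, str) and "://" in value:
            redacted[name] = redact_uri(value)
        else:
            redacted[name] = value
    return redacted
```

The original dict is left unmodified, so the real credentials remain available to the code that opens the connection.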
New Contributors
- @danthelion made their first contribution in #155
Full Changelog: v0.2.3...v0.2.4
v0.2.3 - Config files; Better UUID support.
- Added support for config files - specify the arguments to data-diff using a TOML file
- Added support for native UUIDs in Postgresql
What's Changed
- Fixed tests; bisection_threshold can now be inf by @erezsh in #134
- tests: parallel + snowflake, presto in CI + benchmark scripts by @sirupsen in #135
- Update README.md to include authenticator in Snowflake connection string by @franloza in #142
- Fix tests for PRs from contributors who don't have access to 'secrets'. by @erezsh in #147
- Corrections for PR #144 - fix UUID things by @erezsh in #148
- Fix UUID things by @pik94 in #144
- Added support for native UUIDs in postgresql. by @erezsh in #149
- Specify data-diff arguments using config files by @erezsh in #143
- Small Fixes by @erezsh in #151
New Contributors
Full Changelog: v0.2.2...v0.2.3
v0.2.2 - Support for UUIDs; Oracle schemas.
What's Changed
Main -
- Oracle: Added support for schemas (Issue #115) by @erezsh in #117
- [MySQL] Added varbinary by @erezsh in #132
- Support for UUID key column by @erezsh in #119
- Fix UUIDs + small fix for presto by @erezsh in #133
Also -
- Split Integer from Decimal to reduce casts in SQL. Added FractionalType. by @erezsh in #111
- tests: add bigint/int by @sirupsen in #126
- benchmark: add suite by @sirupsen in #125
Full Changelog: v0.2.1...v0.2.2