-
Notifications
You must be signed in to change notification settings - Fork 271
Let the differ choose a shared hashing algorithm #103
Comments
Agree. sha1 is also sometimes faster than md5, because some instruction-sets have specific instructions for it, but not for md5. crc32 is also generally faster than both of those. |
I think this should be broken down into 2 tasks:
I'd be happy to start working on this, if it hasn't been picked up already. |
@abhinav1912 it'd be awesome if you could pick this up! If you can do some research for (1) about what kinds of "alternate" negotiations we can do, and how you'd implement it, that'd be a great start before you start implementing it 👀 For example, for MSSQL, how can we do strings if we can't do md5 fast? |
@sirupsen good point, I'll start looking into the alternatives and probably create a table of algorithms for each database. Will reach out on the Slack channel to clarify a few questions though. |
That MD5 performance doesn't look that bad... I wonder why it came up so slow in our tests. Maybe the instance just didn't have enough CPU? Might be worth confirming that result first by writing a basic driver! |
The performance point raises another question: is there be some script/utility that can calculate perf metrics for the current build? If not, then this could be a good addition. |
@abhinav1912 working on it, will tag you in the PR — it's what generated what's in the README. Basically re-uses the type suite (otherwise usually benchmark suites get out of date) |
This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage. |
Right now we only support
md5
for hashing columns.However, for some databases that might not be feasible. For example, in
mssql
it's too slow, and sounds like Spanner only supports sha1.I think the solution should be to allow several implementations in subclasses of
Database
, and have the differ choose the best one that's shared between them. (or throw an error if there isn't one)See issues #51 and #99.
The text was updated successfully, but these errors were encountered: