-
Notifications
You must be signed in to change notification settings - Fork 85
(don't merge) Re-add siphash #222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(don't merge) Re-add siphash #222
Conversation
cbits/siphash.h
Outdated
#include <stdint.h> | ||
#include <stdlib.h> | ||
|
||
typedef uint64_t u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use these typedefs, they barely save any typing.
hashable.cabal
Outdated
@@ -61,6 +61,15 @@ flag random-initial-seed | |||
manual: True | |||
default: False | |||
|
|||
|
|||
flag sse2 | |||
description: Do we want to assume that a target supports SSE 2? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use SSE2 instruction set. Available on virtually any x86 CPU made since 2005.
i.e. don't ask, tell.
For sse4
write something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the latest text
version assumes that SSE2 is always available on x64, see e.g. https://hackage.haskell.org/package/text-1.2.5.0/src/cbits/cbits.c
#if defined(__x86_64__)
/* All the intrinsics used here are from SSE2,
* so every x86_64 CPU supports them.
*/
I actually don't care if i386 is slow. I don't have machines to test it anyway, so being simple on these is a virtue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I take from this that sse2 can be enabled without flag, and sse4 probably can be enabled by default at this point in time.
(I still need to do this btw).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sse2 on x64. On i386 i'd rather not have conditional branches at all, as they are hard to test nowadays.
You need to include includes in |
I introduced |
sorry Oleg I didn't already expect a review, I'll address all your comments. |
cbits/siphash.c
Outdated
return buffered + (len & 7); | ||
} | ||
|
||
b |= ((uint64_t) totallen) << 56; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a variant of siphash which doesn't force to include total length? This makes it uncomposable. Note how FNV-1 is hashing a stream, we just feed it data. (And also md5,sha1.sha256 etc.)
To repeat my previous comment, rebase on current master
and try to contain the changes inside LowLevel
module. It's not good that you need to reimplement how lazy test and bytestring hashable instances, you shouldn't. Int, bytearray/ptr buffer hash primitives should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I eliminated this totallen, I think this is NIH code. (can't find it in the paper)
It's still not easily compose-able because siphash has an internal state of 4 64 bit words.
These need to be initialized with k0 and k1, but after this step it sortoff becomes like fnv if we keep track of that internal state separatly and you can fold over it (as long as you keep track of the state).
I still need to test this out against that rechunck property test.
I think it's possible to express everything in terms of bytearray#, although I'm not sure how to get around the initilization/finilization steps.
8d8d99e
to
3a6f2b6
Compare
8be2e6c
to
080759a
Compare
This reverts commit d65f9de. Re-add fnv Make it sortoff work with siphash except linking Make it link Remove typedef Add siphash.h to includes folder Fix golden test for siphash If we can't make siphash as fast as fnv, we probably want to support both in a final version. But I'll do that later. Fix tests for chuncks in bytestring (I just forgot to port a bunch of code) Fix lazy text test suite as well Delete example file Rebase on master, make test suite pass once more
080759a
to
4d3fa6a
Compare
this is just the old implementation of siphash. with some tweaks to make it more streamable (still not great but good enough for benchmarks). -- squash log Use capi if available for offset Add more capi calls Flip expectation and actual args it showed up wrongly in the test suite Add more capi, add TODO Revert Hashable/Class to upstream master trying to contain changes to just that module Fix regression (once more) Don't expose the old functions Fix capi issues. Apprantly half of these functions weren't defined in the header. Tell people to use sse instead of asking Apply clang format Add siphash to benchmarks Add initializeState function, this will capture initizlie don't know how to do finalize yet Make it compile again (apparntly k1 and salt were used interchanbly) that's not a good idea Clean out the c code a bit. It was really strange. Make it build.. again? re-add header file I guess this will just keep on bein inconsistent remove cbits *.h from benchamark cabal file OMG that worked, lazy text now bytearray man, I doubted myself every step along the way, maybe I can do this? Draft for doing bytestring Make some headway on the regession tests Make testsuite pass once more Put in rounds as an argument Add compression round as args Add some common sense tests on previous state fails of course, not sure what's happening here Import <$> I guess different chars can also produce same hashes. Cleanup C-api for a bit it's still utterly broken however. I guess I need to try it out in C and see why the state isn't changing. Chunck -> Chunk Get rid of more c code idk maybe this do something with the sse stuff (after reading ghci comment) Figured out why no compression (lol) Update 64bit text once more Better explain 0 Split lines Lazy text can be big as well don't funroll by hand, add comments explaing what's going on Use properties instead of hardcoded test cases for statefullness Add prefix based tests as well hmm not so ez for text this also makes me doubt the bytestring implementaiton. I should check if with larger arbitrary instances that doesn't fall over (text tends to have larger bytestring representations then bytestring) Guess we really did solve the bytestring implementation This should pass CI Fix imports Fix ci for ghc 8.2.2 Add icescream cone Try fix ancient ghc Attempt fix ancient ghc II expose bytestring dependency ancient ghc import pure as well diddn't realize my style was so modern fix parse error maybe this will make conditional true? Fix properties for ancient ghc Fix parse error on imports add bytestring builder everywhere maybe solve cabal pain like this? disable property for ancient ghc try fix ancient ghc IX
ae69a36
to
b169fe9
Compare
I'm unsubscribing from this PR. Ping me when there's something to see again. |
we use xxhash now, I don't see forward for adding siphash |
This is a draft of getting siphash back into hashable.
This is the commit that removed it: d65f9de
I intend to benchmark this implementation, and also https://github.com/google/highwayhash#siphash
These things are left to be done:
See related issue: haskell-unordered-containers/unordered-containers#319