-
Notifications
You must be signed in to change notification settings - Fork 26
Hook up sha2 instruction and detect sha-supporting cpu #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@newpavlov are you taking new PRs here? |
@sakridge I can potentially review/merge your code (which at first glance looks fine to me), but we need @newpavlov to cut another release |
@tarcieri that would be awesome. thanks |
Any update on this? |
@sakridge I can review more thoroughly soon |
@sakridge I think we can do it directly in the Unfortunately without this method internal state of a hash function will not be kept in SIMD registers (I guess because |
Hi, no problem, thanks for the reply.
It sounds like we cannot for now given all the obstacles that you listed. Maybe we should go with something like this change for now which is ready than wait for all those things to be fixed. Can you also upload your change to a branch in whatever state it is in?
My benchmarking of this change shows much higher performance on ryzen-based parts. Can you give an example where it is regressed? |
To help push the intrinsic versio forward I created a version of |
I have also created a pure rust port here: RustCrypto/hashes#90 |
@dignifiedquire I tried your pure rust PR, it seems about 30% slower than this version, at least for the case I tested. |
Interesting, how did you test? |
I am definitely a huge fan of "pure Rust" as well, and would honestly prefer slightly slower To a certain extent this has some risks around constant time operation. But let me be blunt: ASM blobs have failed in practice over and over and over. They are inscrutable. They are a pain in the ass to link. And they are, for the most part, a fixed object which rarely see improvements due to the inscrutability, except when they have catastrophic security breakages. As far as asm blobs go, ring is the go-to crate in the Rust ecosystem, and I don't see any need for this project to compete with that (and perhaps in saying that, I'm also saying the concept of The goal of this org is "cryptographic algorithms written in pure Rust" and that is definitely my personal preference. |
I am curious where you are getting the 30% difference from, these are numbers I am getting on a AMD 3970X: Pure rust
Asm (this PR)
|
I had tested a bit of a pathological case. That is the benchmark, it's essentially a loop of hashing:
Maybe not an important case for most people, but our project is relying on this being as fast as possible. I was testing on AMD 2950x. |
It looks like |
Will close in favor of RustCrypto/hashes#167. |
Newer AMD ryzen-based cpus support sha2 hardware instructions. This hooks up that instruction and detects using the relevant cpuid flag.
This gives about 2.7x performance improvement on AMD Threadripper 1900x and 2950x.