Skip to content

Add x86 intrinsics support for sha1 and sha2 #167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 15, 2020
Merged

Conversation

newpavlov
Copy link
Member

@newpavlov newpavlov commented Jun 11, 2020

Written using #90 and the software implementation as a reference.

This PR also simplifies handling of ARM backend and deprecates asm-aarch64 feature.

@newpavlov newpavlov changed the title sha2: add x86 intrinsics support Add x86 intrinsics support for sha1 and sha2 Jun 11, 2020
@dignifiedquire
Copy link
Member

dignifiedquire commented Jun 11, 2020

@newpavlov I agree, your code looks much nicer :)

Some thoughts for my specific use cases

(1) Bringing back dynamic detection would be great, often times the code will be built once and then run on different cpus in my case.
(2) now that input_blocks is a thing again it will be more efficient to process multiple blocks at once, like I am doing here: https://github.com/filecoin-project/rust-fil-proofs/blob/master/sha2raw/src/sha256_intrinsics.rs#L12 Looks like you already did this
(3) currently all implementations do have some overhead and additional padding in my measurements, which is why I ended up implementing this "raw" construction https://github.com/filecoin-project/rust-fil-proofs/blob/master/sha2raw/src/sha256.rs#L25 which avoids going through the block_buffer and can only process appropriately sized and pre padded chunks. It would be great to find a way such that passing in slices of matching data could have the same speed.

@Fleshgrinder
Copy link

I second the dynamic detection request. This is important for software that is installed on many different systems.

@newpavlov
Copy link
Member Author

Bringing back dynamic detection would be great

The code is written with dynamic dispatch in mind, I just need to write CPUID detection using getrandom code as a reference (probably I will move it into a separate crate).

currently all implementations do have some overhead and additional padding in my measurements

If you don't need padding, then I think the easiest way will be to use the compression functions directly, which can be accessed after enabling the compress feature. I guess we also could add imports for initialization constants.

@newpavlov newpavlov marked this pull request as ready for review June 11, 2020 19:04
@newpavlov newpavlov requested a review from tarcieri June 11, 2020 19:04
@newpavlov
Copy link
Member Author

newpavlov commented Jun 11, 2020

Runtime detection is back! If you interested, it's powered by the cpuid-bool crate.

@linkmauve
Can you verify that crates are not broken on ARM?

Copy link
Member

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice, thank you!

@newpavlov newpavlov merged commit a380194 into master Jun 15, 2020
@newpavlov newpavlov deleted the sha2_intrinsics branch June 15, 2020 22:48
pub fn compress(state: &mut [u32; 5], blocks: &[[u8; 64]]) {
// TODO: Replace with https://github.com/rust-lang/rfcs/pull/2725
// after stabilization
if cpuid_bool::cpuid_bool!("sha", "sse2", "ssse3", "sse4.1") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code breaks platforms that don't support CPUID (e.g. SGX). You should be using https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html, which is the standard way all platforms support feature detection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that is_x86_feature_detected is not available in libcore and the linked PR proposes changes to change it. See #183 for discussion as how to fix SGX.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants