-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MD5_std.c: Use HH2() also in its 2x interleaved code #5636
base: bleeding-jumbo
Are you sure you want to change the base?
MD5_std.c: Use HH2() also in its 2x interleaved code #5636
Conversation
@solardiz I have no idea what formats use this? My only testing was adding a temprary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also change step 48 to use HH2. The corresponding non-interleaved version has HH2 there.
In scalar builds, the md5crypt format uses this. Also some/most/all(?) dynamic formats that use MD5 will probably use this. Looking at the dynamic formats stuff, quite possibly this is also frequently used even in SIMD builds, in cases where SIMD-enabled MD5 is somehow unusable/unused by a given dynamic format. The main md5crypt format will have
Was that in scalar or SIMD builds? |
SIMD, and I too think the dynamic formats use this a lot. |
3056523
to
a847bd8
Compare
I fixed step 48 now but testing with md5crypt indicate a regression (not bc step 48 but the whole thing) with 1-2% on M1 macbook. I can't remember whether the original change in the same file was ever benchmarked. Very slight regression (or just noise) seen on raw-md5. This should be benchmarked on some actual non-SIMD arch though. EDIT: Obviously non-simd build now |
3% regression on super single thread or 8 threads, but 0-0.5% boost running 32 threads. |
Besides speeds, I'd also look at
I suggest you try on cfarm SPARC machines https://portal.cfarm.net/machines/list/ as we don't have SIMD code for SPARC. On the other hand, the newer ones of those have MD5 instructions used by OpenSSL, so e.g. our md5crypt-long performs better than our md5crypt there. Oh, also they have POWER7, where we broke our SIMD when we started requiring POWER8+ because of our use of 64-bit vector elements. We should ideally fix that separately, but meanwhile scalar code is used on POWER7. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source code change looks good to me now.
Closes #5635