Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MD5_std.c: Use HH2() also in its 2x interleaved code #5636

Open
wants to merge 1 commit into
base: bleeding-jumbo
Choose a base branch
from

Conversation

magnumripper
Copy link
Member

Closes #5635

@magnumripper
Copy link
Member Author

@solardiz I have no idea what formats use this? My only testing was adding a temprary #warning showing that the code was indeed compiled in, then a full -test=0 -format=cpu. And bots seem happy. Consequently I don't know if there was a boost or regression...

Copy link
Member

@solardiz solardiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also change step 48 to use HH2. The corresponding non-interleaved version has HH2 there.

@solardiz
Copy link
Member

I have no idea what formats use this?

In scalar builds, the md5crypt format uses this. Also some/most/all(?) dynamic formats that use MD5 will probably use this.

Looking at the dynamic formats stuff, quite possibly this is also frequently used even in SIMD builds, in cases where SIMD-enabled MD5 is somehow unusable/unused by a given dynamic format.

The main md5crypt format will have X2 in its algorithm name. Unfortunately, it looks like this reporting isn't propagated into the dynamic formats.

My only testing was adding a temprary #warning showing that the code was indeed compiled in

Was that in scalar or SIMD builds?

@magnumripper
Copy link
Member Author

My only testing was adding a temprary #warning showing that the code was indeed compiled in

Was that in scalar or SIMD builds?

SIMD, and I too think the dynamic formats use this a lot.

@magnumripper
Copy link
Member Author

magnumripper commented Dec 25, 2024

I fixed step 48 now but testing with md5crypt indicate a regression (not bc step 48 but the whole thing) with 1-2% on M1 macbook. I can't remember whether the original change in the same file was ever benchmarked. Very slight regression (or just noise) seen on raw-md5. This should be benchmarked on some actual non-SIMD arch though.

EDIT: Obviously non-simd build now

@magnumripper
Copy link
Member Author

md5crypt .. regression .. with 1-2% on M1

3% regression on super single thread or 8 threads, but 0-0.5% boost running 32 threads.

@solardiz
Copy link
Member

Besides speeds, I'd also look at size MD5_std.o.

This should be benchmarked on some actual non-SIMD arch though.

I suggest you try on cfarm SPARC machines https://portal.cfarm.net/machines/list/ as we don't have SIMD code for SPARC. On the other hand, the newer ones of those have MD5 instructions used by OpenSSL, so e.g. our md5crypt-long performs better than our md5crypt there.

Oh, also they have POWER7, where we broke our SIMD when we started requiring POWER8+ because of our use of 64-bit vector elements. We should ideally fix that separately, but meanwhile scalar code is used on POWER7.

Copy link
Member

@solardiz solardiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code change looks good to me now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MD5_std.c should use HH2() also in its 2x interleaved code
2 participants