crypto/md5: improve ARM64 MD5 performance by optimizing ROUND3 function #69302

kdjdbbfk · 2024-09-06T02:33:26Z

This commit improves MD5 performance on ARM64 by optimizing the `ROUND3` macro in the `md5block_arm64.s` assembly file.

1. Refactored the `ROUND3` macro to improve the computation order, introducing a new `ROUND3FIRST` macro to handle the initial calculation more efficiently.
2. Optimized the XOR operations in the `ROUND3` macro to remove unnecessary instructions and improve instruction-level parallelism on ARM64.
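
For readers unfamiliar with MD5 round 3, the following Go sketch illustrates one way such an XOR reordering can work. It is not the assembly from this change. The function names `round3Textbook` and `round3Reordered` are hypothetical, and the assumption that `ROUND3FIRST` corresponds to the up-front `c ^ d` is mine. Each round-3 step needs a three-way XOR; two of its three inputs are already known before the previous step finishes, so their XOR can be computed off the critical path.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Additive constants and shift amounts for the first four round-3 steps (RFC 1321).
var (
	k3 = [4]uint32{0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c}
	s3 = [4]int{4, 11, 16, 23}
)

// round3Textbook recomputes the full three-way XOR in every step.
func round3Textbook(a, b, c, d uint32, x [4]uint32) (uint32, uint32, uint32, uint32) {
	a = b + bits.RotateLeft32(a+(b^c^d)+x[0]+k3[0], s3[0])
	d = a + bits.RotateLeft32(d+(a^b^c)+x[1]+k3[1], s3[1])
	c = d + bits.RotateLeft32(c+(d^a^b)+x[2]+k3[2], s3[2])
	b = c + bits.RotateLeft32(b+(c^d^a)+x[3]+k3[3], s3[3])
	return a, b, c, d
}

// round3Reordered keeps a partial XOR t of the two inputs that do not
// depend on the most recently updated word, so only one XOR sits on the
// dependency chain of each step.
func round3Reordered(a, b, c, d uint32, x [4]uint32) (uint32, uint32, uint32, uint32) {
	t := c ^ d // computed once up front (assumed analogue of ROUND3FIRST)
	a = b + bits.RotateLeft32(a+(b^t)+x[0]+k3[0], s3[0])
	t = b ^ c // independent of the new a
	d = a + bits.RotateLeft32(d+(a^t)+x[1]+k3[1], s3[1])
	t = a ^ b // independent of the new d
	c = d + bits.RotateLeft32(c+(d^t)+x[2]+k3[2], s3[2])
	t = d ^ a // independent of the new c
	b = c + bits.RotateLeft32(b+(c^t)+x[3]+k3[3], s3[3])
	return a, b, c, d
}

func main() {
	x := [4]uint32{0x01234567, 0x89abcdef, 0xfedcba98, 0x76543210}
	a1, b1, c1, d1 := round3Textbook(0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, x)
	a2, b2, c2, d2 := round3Reordered(0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, x)
	fmt.Println(a1 == a2 && b1 == b2 && c1 == c2 && d1 == d2) // prints "true"
}
```

Both functions produce identical state words; the reordered form only changes which XOR has to wait for the previous step's result.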

Performance was measured on an ARM64 Linux machine using Go's benchmark tooling. Each benchmark was run 10 times to reduce measurement noise. The following results were observed:

| Benchmark             | Old Time (sec/op) | New Time (sec/op) | Change |
|-----------------------|-------------------|-------------------|--------|
| Hash8Bytes-8          | 175.0ns ± 2%      | 175.0ns ± 1%      | ~      |
| Hash1K-8              | 2.065µs ± 0%      | 2.060µs ± 0%      | -0.22% |
| Hash8K-8              | 15.31µs ± 0%      | 15.29µs ± 0%      | -0.11% |
| Hash8BytesUnaligned-8 | 174.0ns ± 1%      | 174.0ns ± 1%      | ~      |
| Hash1KUnaligned-8     | 2.067µs ± 0%      | 2.059µs ± 0%      | -0.41% |
| Hash8KUnaligned-8     | 15.44µs ± 0%      | 15.45µs ± 0%      | ~      |

In terms of throughput:

| Benchmark             | Old Throughput (B/s) | New Throughput (B/s) | Change |
|-----------------------|----------------------|----------------------|--------|
| Hash8Bytes-8          | 43.58MiB/s ± 2%      | 43.69MiB/s ± 0%      | +0.24% |
| Hash1K-8              | 473.1MiB/s ± 0%      | 474.0MiB/s ± 0%      | +0.20% |
| Hash8K-8              | 510.4MiB/s ± 0%      | 511.0MiB/s ± 0%      | +0.11% |
| Hash8BytesUnaligned-8 | 43.80MiB/s ± 0%      | 43.82MiB/s ± 0%      | ~      |
| Hash1KUnaligned-8     | 472.5MiB/s ± 0%      | 474.3MiB/s ± 0%      | +0.38% |
| Hash8KUnaligned-8     | 506.1MiB/s ± 0%      | 505.8MiB/s ± 0%      | ~      |
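
As a rough way to reproduce numbers of this kind outside `go test`, here is a minimal sketch that times `md5.Sum` with `testing.Benchmark`. The helper name `measure` and the chosen sizes are illustrative assumptions; the figures in the tables come from the package's own benchmarks, not from this program.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"testing"
)

// measure benchmarks md5.Sum over a zero-filled buffer of the given size.
// (Hypothetical helper, not part of crypto/md5.)
func measure(size int) testing.BenchmarkResult {
	testing.Init() // required when calling testing.Benchmark outside "go test"
	buf := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size))
		for i := 0; i < b.N; i++ {
			md5.Sum(buf)
		}
	})
}

func main() {
	for _, size := range []int{8, 1024, 8192} {
		r := measure(size)
		// Throughput = bytes per op * ops / elapsed seconds, in MiB/s.
		mibPerSec := float64(r.Bytes) * float64(r.N) / r.T.Seconds() / (1 << 20)
		fmt.Printf("%5d bytes: %8d ns/op  %8.2f MiB/s\n", size, r.NsPerOp(), mibPerSec)
	}
}
```

A single run like this gives no variance estimate, which is why repeated runs were used for the tables above.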

When hashing a large file (e.g., a 3GB file), the runtime dropped from 8.65 seconds to 7.39 seconds, a reduction of roughly 15% (1.26 s of 8.65 s). This is a much larger gain than the microbenchmarks above show.
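
The exact program used for the large-file measurement is not shown. A minimal sketch of such a test, assuming the file is streamed through `crypto/md5` with `io.Copy`, could look like this; `hashReader` and `hashString` are hypothetical helper names.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
	"time"
)

// hashReader streams r through MD5 and returns the hex digest.
func hashReader(r io.Reader) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashString is a convenience wrapper for checking hashReader on short inputs.
func hashString(s string) (string, error) {
	return hashReader(strings.NewReader(s))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: md5time <file>")
		os.Exit(2)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	start := time.Now()
	sum, err := hashReader(f)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Elapsed time includes disk I/O, not just the MD5 block function.
	fmt.Printf("%s  %s  (%v)\n", sum, os.Args[1], time.Since(start))
}
```

Because the timing includes file reads, results from a test like this depend on page-cache state and storage speed as well as on the hash implementation.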

Overall, these optimizations yield small improvements (under 0.5%) in the microbenchmarks and a more noticeable benefit when hashing large files.

gopherbot · 2024-09-06T02:42:43Z

This PR (HEAD: 67f8686) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/611299.

Important tips:

- Don't comment on this PR. All discussion takes place in Gerrit.
- You need a Gmail or other Google account to log in to Gerrit.
- To change your code in response to feedback:
  - Push a new commit to the branch used by your GitHub PR.
  - A new "patch set" will then appear in Gerrit.
  - Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
  - Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
  - Multiple commits in the PR will be squashed by GerritBot.
- The title and description of the GitHub PR are used to construct the final commit message.
  - Edit these as needed via the GitHub web interface (not via Gerrit or git).
  - You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
- See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

gopherbot · 2024-09-06T02:49:14Z

Message from Gopher Robot:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/611299.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2024-09-06T02:56:06Z

This PR (HEAD: 85ec85f) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/611299.

gopherbot · 2024-09-06T03:02:57Z

Message from 赵静玉:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/611299.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2024-09-06T03:03:06Z

This PR (HEAD: 3149567) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/611299.

kdjdbbfk mentioned this pull request Sep 6, 2024

crypto/md5: modify md5block_arm64.s optimization md5 #66395

Closed

kdjdbbfk force-pushed the master branch from 67f8686 to 85ec85f Compare September 6, 2024 02:52

kdjdbbfk force-pushed the master branch from 85ec85f to 3149567 Compare September 6, 2024 02:58

kdjdbbfk changed the title ~~crypto/md5: Improve ARM64 MD5 performance by optimizing ROUND3 function~~ crypto/md5: improve ARM64 MD5 performance by optimizing ROUND3 function Sep 6, 2024
