Adding more assembly to AARCH64 (macOS) #14645
Copying 50G of randomfile.bin
So it is at least a little faster.
We currently do not have assembly for accelerating GCM with AARCH64, so the door is open to any PR that contains working code that is more performant than the generic code and is reasonably clean such that it can pass review. Once such a PR has been merged, an alternative solution, such as grabbing That being said, I have a few thoughts:
Lastly, I did a quick comparison on a Ryzen 7 5800X using OpenSSL 1.1.1s. The following disables PCLMULQDQ, AVX, AVX2 and AES-NI:
The following disables only AVX, AVX2 and AES-NI:
The following disables only AES-NI:
The following uses AES-NI:
Also, just in case you really did benchmark CCM rather than GCM, here are some CCM numbers. The following disables PCLMULQDQ, AVX, AVX2 and AES-NI:
The following disables only AVX, AVX2 and AES-NI:
The following disables only AES-NI:
The following uses AES-NI:
Oddly, it looks like OpenSSL 1.1.1s does not have an AVX/AVX2 implementation for aes-256-gcm or aes-256-ccm. It also appears to have no implementation to take advantage of PCLMULQDQ with SSE2 on aes-256-ccm. I am not sure whether SSE2 or SSE2 + PCLMULQDQ is equivalent to ARM's 128-bit SIMD to make the comparison fair, but these numbers suggest that does not really matter, given that the Apple numbers for a generic implementation outperform the OpenSSL numbers for everything but AES-NI on Zen 3. The assembly routine just makes it even better, although it is still significantly slower than AES-NI, which is why it might be worthwhile looking into using Apple's hardware encryption instructions if they exist. Although it should be well known at this point, these numbers confirm that AES GCM is faster than AES CCM. If you really did benchmark
Apple Mac Mini M1 (16K PAGESIZE kernel)
So I used As you can see in But as mentioned, I think this could do with the Meanwhile, I am adding I personally am not that interested in using the crypto built into macOS/Windows, as it is often restrictive, both in linking and in might-not-support-all-the-things-we-want (say blake3 for example). I also like using But with a In a prior commit I test for the CPU capabilities of AESV8 here: so you can think of it as the
This is a "can lundman really put in openssl assembly and make it work" proof-of-concept, and I am emboldened by the success I have had. I was unaware of this: #12171 So it would be straightforward to add in armv7 as well, then do some CPUID checks on capabilities to work out which one to call. Have we decided on how the ARM CPUID will be done (in ZFS)? Are we just exposing How does it work with ArmV7? How does Linux prefer to do that? I additionally tested
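The "check capabilities, then work out which one to call" idea can be sketched in userland C as below. This is only an illustration under stated assumptions: the `aes_impl_t` table, `cpu_has_armv8_aes()` and `aes_impl_select()` are hypothetical names, not the ZFS icp API; the macOS branch assumes the `hw.optional.arm.FEAT_AES` sysctl (available on recent macOS releases), and the Linux branch assumes `getauxval(AT_HWCAP)` with `HWCAP_AES`. In-kernel detection would look different on both platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Hypothetical descriptor for one AES implementation. */
typedef struct aes_impl {
    const char *name;
    int (*is_supported)(void);
} aes_impl_t;

/* Returns 1 if the CPU advertises the ARMv8 AES instructions, else 0. */
static int cpu_has_armv8_aes(void)
{
#if defined(__APPLE__)
    int val = 0;
    size_t len = sizeof(val);

    /* Assumed sysctl name; the lookup fails (and we return 0) on Intel Macs. */
    if (sysctlbyname("hw.optional.arm.FEAT_AES", &val, &len, NULL, 0) != 0)
        return 0;
    return val != 0;
#elif defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_AES) != 0;
#else
    return 0;
#endif
}

static int always_supported(void)
{
    return 1;
}

/* Ordered fastest first; the generic entry is the fallback. */
static const aes_impl_t aes_impls[] = {
    { "aesv8", cpu_has_armv8_aes },
    { "generic", always_supported },
};

/* Picks the first implementation whose capability check passes. */
static const aes_impl_t *aes_impl_select(void)
{
    size_t n = sizeof(aes_impls) / sizeof(aes_impls[0]);

    for (size_t i = 0; i < n; i++) {
        if (aes_impls[i].is_supported())
            return &aes_impls[i];
    }
    /* Not reached while the generic entry stays in the table. */
    assert(!"no usable AES implementation");
    return NULL;
}
```

An armv7 entry could be added to the same table with its own capability check, so the caller never needs to know which CPU it is running on.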
OK, I ported over the PR work and massaged it to work on macOS.
I'll post the changes I needed to make to the PR.
Looking into adding some possible speed-ups for the macOS port on M1 etc., and rather than working in a vacuum I thought I'd run it by everyone. Someone might already have looked into it, or there are other plans in the works.
I was tempted by the aes-gcm-armv8-unroll8_64.S but there isn't an easy way to plug in the whole aes-gcm that I see. So I started smaller, modeling after the aes_aesni.S work.
I picked aesv8_armx.S at random - honestly, it isn't clear to me which of the aarch64 options is the "newer"/"faster".
Confirmed valid with a hack call from zdb. Confirmed it runs in kernel. Not confirmed that it is faster though, I need more than 1 GB file pool to test that.
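A validity check of this kind can also be done as a small cross-check harness that feeds the same pseudo-random inputs to a reference routine and a candidate routine and counts disagreements. The sketch below is not the zdb hack call and does not implement AES: `ref_block`, `fast_block` and `broken_block` are hypothetical stand-ins (a plain XOR transform) used only so the harness itself compiles and runs; in practice the two function pointers would be the generic C routine and the assembly entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_LEN 16

typedef void (*block_fn_t)(const uint8_t *key, const uint8_t *in,
    uint8_t *out);

/* Stand-in "generic" routine: byte-wise XOR with the key (NOT AES). */
static void ref_block(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
    for (size_t i = 0; i < BLOCK_LEN; i++)
        out[i] = in[i] ^ key[i];
}

/* Stand-in "accelerated" routine: same transform, 64 bits at a time. */
static void fast_block(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
    uint64_t k[2], v[2];

    memcpy(k, key, BLOCK_LEN);
    memcpy(v, in, BLOCK_LEN);
    v[0] ^= k[0];
    v[1] ^= k[1];
    memcpy(out, v, BLOCK_LEN);
}

/* Deliberately wrong routine, to show that the harness catches errors. */
static void broken_block(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
    ref_block(key, in, out);
    out[0] ^= 1;
}

/* Returns how many of `rounds` pseudo-random inputs produced a mismatch. */
static unsigned cross_check(block_fn_t ref, block_fn_t cand, unsigned rounds,
    unsigned seed)
{
    uint8_t key[BLOCK_LEN], in[BLOCK_LEN], a[BLOCK_LEN], b[BLOCK_LEN];
    unsigned mismatches = 0;

    assert(ref != NULL && cand != NULL);
    srand(seed);
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < BLOCK_LEN; i++) {
            key[i] = (uint8_t)rand();
            in[i] = (uint8_t)rand();
        }
        ref(key, in, a);
        cand(key, in, b);
        if (memcmp(a, b, BLOCK_LEN) != 0)
            mismatches++;
    }
    return mismatches;
}
```

Calling `cross_check(ref_block, fast_block, 1000, 1)` should report zero mismatches, while passing `broken_block` as the candidate reports a mismatch on every round. A check like this only establishes agreement with the reference; it says nothing about speed.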
Working on this branch:
https://github.com/openzfsonosx/openzfs-fork/commits/unify_arm64
Specifically, this commit (while it exists)
openzfsonosx@4512a58
Thoughts?
@AttilaFueloep @mcmilk @ryao @behlendorf