arm Neon vector backend #19

Tarinn · 2022-11-03T10:13:53Z

This adds a new vector backend that uses Arm NEON SIMD instructions to implement the parallel formulae, conforming to the AVX2 implementation. Files outside the new NEON backend are updated so that this backend can be used when the required feature flag is set. The new backend is largely a copy of the AVX2 backend with the instructions changed to Arm NEON. The remaining changes from the 'base' AVX2 implementation mainly come from the need to split each 256-bit vector into two 128-bit ones and from differences in the input/output structure of the NEON instructions.
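To illustrate the 256-bit to 2x128-bit split described above, here is a minimal sketch. It is not the code from this PR: the `U32x8` type and the `add_half` helper are hypothetical names made up for the example, and it assumes an aarch64 target for the NEON path, with a scalar fallback elsewhere so it builds on any machine.

```rust
/// Hypothetical stand-in for a 256-bit vector of eight u32 lanes.
/// It is stored as two 128-bit halves because NEON registers are 128 bits wide.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U32x8 {
    lo: [u32; 4], // lanes 0..4
    hi: [u32; 4], // lanes 4..8
}

impl U32x8 {
    /// Split eight lanes into a low and a high 128-bit half.
    pub fn new(lanes: [u32; 8]) -> Self {
        let mut lo = [0u32; 4];
        let mut hi = [0u32; 4];
        lo.copy_from_slice(&lanes[..4]);
        hi.copy_from_slice(&lanes[4..]);
        U32x8 { lo, hi }
    }

    /// Join the two halves back into eight lanes.
    pub fn to_array(self) -> [u32; 8] {
        let mut out = [0u32; 8];
        out[..4].copy_from_slice(&self.lo);
        out[4..].copy_from_slice(&self.hi);
        out
    }

    /// Lane-wise wrapping addition: one AVX2 instruction becomes
    /// two 128-bit operations, one per half.
    pub fn wrapping_add(self, rhs: Self) -> Self {
        U32x8 {
            lo: add_half(self.lo, rhs.lo),
            hi: add_half(self.hi, rhs.hi),
        }
    }
}

/// NEON path (assumed target: aarch64).
#[cfg(target_arch = "aarch64")]
fn add_half(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    use core::arch::aarch64::{vaddq_u32, vld1q_u32, vst1q_u32};
    let mut out = [0u32; 4];
    // SAFETY: NEON is a baseline feature of aarch64 targets, and every
    // pointer refers to a live array of exactly four u32 values.
    unsafe {
        let sum = vaddq_u32(vld1q_u32(a.as_ptr()), vld1q_u32(b.as_ptr()));
        vst1q_u32(out.as_mut_ptr(), sum);
    }
    out
}

/// Scalar fallback with the same semantics for every other target.
#[cfg(not(target_arch = "aarch64"))]
fn add_half(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Calling `U32x8::new(a).wrapping_add(U32x8::new(b)).to_array()` adds the eight lanes pairwise. Operations that move data across the two halves, such as shuffles and permutes, do not map this directly and need extra care when ported from AVX2.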

On a Raspberry Pi 4, the vector backend was around 20% faster than the serial backend in the relevant variable-time multiplication benchmarks.

rubdos · 2022-12-02T06:57:52Z

@Tarinn I think you should squash the last two commits into d822a2c, since that commit accidentally added a bunch of large files that you removed later.

Added ARM neon backend support

073ac6c

Updated arm neon intrinsics for better speed-up

db30d66

Tarinn force-pushed the neon branch from 3f6c586 to db30d66 Compare December 2, 2022 09:49

This was referenced Dec 7, 2022

u32_backend is slower than u64_backend on armv7 Android dalek-cryptography/curve25519-dalek#449

Closed

NEON backend for aarch64 dalek-cryptography/curve25519-dalek#457

Open
