This adds a new vector backend that uses Arm NEON SIMD instructions to implement the parallel formulae, following the AVX2 implementation. Files outside of the new NEON backend are updated so that this backend can be used when the required feature flag is set. The new backend is largely a copy of the AVX2 backend with the instructions changed to Arm NEON. The other changes from the 'base' AVX2 implementation mainly stem from the need to split each 256-bit vector into two 128-bit ones and from differences in the input/output structure of Arm NEON instructions.
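To illustrate the 256-bit to 2x128-bit split mentioned above, here is a minimal, portable sketch. It is not the code from this PR: `U32x4` and `U32x8` are hypothetical names, and plain arrays stand in for the 128-bit NEON register type so the example compiles on any target. On aarch64, each per-half operation would presumably map to a single NEON instruction.

```rust
/// Hypothetical stand-in for one 128-bit register holding four u32 lanes.
/// Plain arrays are used here so the sketch builds on any architecture.
#[derive(Clone, Copy, Debug, PartialEq)]
struct U32x4([u32; 4]);

impl U32x4 {
    /// Lane-wise wrapping addition of two 128-bit halves.
    fn wrapping_add(self, rhs: U32x4) -> U32x4 {
        let mut out = [0u32; 4];
        for i in 0..4 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        U32x4(out)
    }
}

/// Hypothetical 256-bit vector of eight u32 lanes, stored as two 128-bit halves.
/// `lo` holds lanes 0..4 and `hi` holds lanes 4..8.
#[derive(Clone, Copy, Debug, PartialEq)]
struct U32x8 {
    lo: U32x4,
    hi: U32x4,
}

impl U32x8 {
    /// Split eight lanes into the low and high halves.
    fn new(lanes: [u32; 8]) -> U32x8 {
        U32x8 {
            lo: U32x4([lanes[0], lanes[1], lanes[2], lanes[3]]),
            hi: U32x4([lanes[4], lanes[5], lanes[6], lanes[7]]),
        }
    }

    /// Rejoin the two halves into a single eight-lane array.
    fn to_array(self) -> [u32; 8] {
        let mut out = [0u32; 8];
        out[..4].copy_from_slice(&self.lo.0);
        out[4..].copy_from_slice(&self.hi.0);
        out
    }

    /// A 256-bit lane-wise operation becomes two independent 128-bit operations.
    fn wrapping_add(self, rhs: U32x8) -> U32x8 {
        U32x8 {
            lo: self.lo.wrapping_add(rhs.lo),
            hi: self.hi.wrapping_add(rhs.hi),
        }
    }
}
```

Purely lane-wise operations such as this addition split cleanly. Operations that move data across the 128-bit boundary (shuffles, permutes, blends) need extra care in a split representation, which is likely where most of the divergence from the AVX2 code comes from.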
On a Raspberry Pi 4, the vector backend was around 20% faster than the serial backend in the relevant variable-time multiplication benchmarks.