Apple Neural Engine optimizations #97

ashvardanian · 2024-12-19T04:55:33Z

Apple chips provide several functional units capable of high-throughput matrix multiplication and AI inference. These units, selected through `computeUnits`, include the CPU, the GPU, and the Apple Neural Engine (ANE). For maximum compatibility, the `.all` option is used by default. Apple's scheduler does not always pick the fastest unit, so specifying the target device explicitly may yield significant performance gains, especially when the models are pre-compiled for the ANE.
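As a minimal sketch of selecting the target device explicitly, the snippet below loads a compiled Core ML model with a chosen `MLComputeUnits` value. The helper name `loadModel` and the `modelURL` parameter are hypothetical and not part of this PR. Core ML has no Neural-Engine-only option, so `.cpuAndNeuralEngine` (iOS 16+, macOS 13+) is the closest way to steer work away from the GPU.

```swift
import CoreML
import Foundation

/// Hypothetical helper: loads a compiled Core ML model (`.mlmodelc`)
/// and pins it to the requested compute units instead of the default `.all`.
func loadModel(
    at modelURL: URL,
    on units: MLComputeUnits = .cpuAndNeuralEngine
) throws -> MLModel {
    let configuration = MLModelConfiguration()
    // `.all` lets Core ML schedule across CPU, GPU, and ANE;
    // `.cpuAndNeuralEngine` excludes the GPU, `.cpuAndGPU` excludes the ANE.
    configuration.computeUnits = units
    return try MLModel(contentsOf: modelURL, configuration: configuration)
}
```

Passing `.cpuAndGPU` to the same helper gives a GPU-oriented baseline for comparison. Core ML may still fall back to the CPU for layers the chosen unit does not support.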

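| Model               | GPU Text Encoder | ANE Text Encoder | GPU Image Encoder | ANE Image Encoder |
| :------------------ | ---------------: | ---------------: | ----------------: | ----------------: |
| `english-small`     |          2.53 ms |          0.53 ms |           6.57 ms |           1.23 ms |
| `english-base`      |          2.54 ms |          0.61 ms |          18.90 ms |           3.79 ms |
| `english-large`     |          2.30 ms |          0.61 ms |          79.68 ms |          20.94 ms |
| `multilingual-base` |          2.34 ms |          0.50 ms |          18.98 ms |           3.77 ms |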

Measured on an Apple M4 iPad running iOS 18.2, with a batch size of 1 and the model pre-loaded into memory. The median latency is reported. The original encoders use f32 single-precision numbers for maximum compatibility and mostly rely on the GPU for computation. The quantized encoders use a mixture of i8, f16, and f32 numbers for maximum performance and mostly rely on the Apple Neural Engine (ANE) for computation.
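The PR does not show how the timings were collected. The following is only a sketch of one way to obtain a median latency for a pre-loaded model at batch size 1. The function names and the warm-up and run counts are assumptions chosen for illustration.

```swift
import Foundation

/// Returns the median of `samples`, or `nil` for an empty array.
func median(_ samples: [Double]) -> Double? {
    guard !samples.isEmpty else { return nil }
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    // Even count: average the two middle values. Odd count: take the middle one.
    return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

/// Runs `body` a few times untimed (warm-up), then times `runs` calls
/// and returns the median latency in milliseconds.
func medianLatencyMilliseconds(
    warmup: Int = 5,
    runs: Int = 100,
    _ body: () throws -> Void
) rethrows -> Double? {
    for _ in 0..<max(0, warmup) { try body() }
    var samples: [Double] = []
    samples.reserveCapacity(max(0, runs))
    for _ in 0..<max(0, runs) {
        let start = DispatchTime.now().uptimeNanoseconds
        try body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000) // ns to ms
    }
    return median(samples)
}
```

In use, `body` would wrap a single prediction call on an already loaded model, so that model loading and compilation stay outside the timed region. The median is less sensitive than the mean to occasional slow runs caused by scheduling or thermal effects.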

Co-authored-by: Kirill Solodskikh <[email protected]>
Co-authored-by: Azim Kurbanov <[email protected]>
Co-authored-by: Ruslan Aydarkhanov <[email protected]>
Co-authored-by: Andrey Ageev <[email protected]>

Add: Apple Neural Engine optimizations

00c92f2

ashvardanian force-pushed the main-dev branch from 238324b to 00c92f2 on December 20, 2024 at 12:29

ashvardanian merged commit 2dbcc42 into main Dec 20, 2024
4 checks passed
