Apple Neural Engine optimizations #97

Merged
merged 1 commit into from
Dec 20, 2024

ashvardanian
Contributor

Apple chips provide several functional units capable of high-throughput matrix multiplication and AI inference. The `computeUnits` setting selects among them: the CPU, the GPU, and the Neural Engine. For maximum compatibility, the `.all` option is used by default. However, Apple's scheduler does not always pick the fastest device, so it can be beneficial to specify the target explicitly. This matters most when the models are pre-compiled for the Apple Neural Engine, where it may yield significant performance gains.
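As a minimal sketch of pinning the compute units explicitly, the snippet below uses Core ML's `MLModelConfiguration` directly. The helper name `loadEncoder` and the `modelURL` parameter are assumptions made for illustration; they are not part of this PR, and the project's own loading API may wrap this differently.

```swift
import CoreML

/// Hypothetical helper: loads a compiled Core ML model and restricts where it may run.
/// `modelURL` is assumed to point at a compiled `.mlmodelc` bundle.
/// `.cpuAndNeuralEngine` requires iOS 16 / macOS 13 or newer.
@available(iOS 16.0, macOS 13.0, tvOS 16.0, watchOS 9.0, *)
func loadEncoder(
    at modelURL: URL,
    computeUnits: MLComputeUnits = .cpuAndNeuralEngine
) throws -> MLModel {
    let configuration = MLModelConfiguration()
    // `.all` lets the system scheduler choose; the other cases narrow the choice.
    configuration.computeUnits = computeUnits
    return try MLModel(contentsOf: modelURL, configuration: configuration)
}
```

Passing `.all` restores the default behavior, while `.cpuAndGPU` would keep a model off the Neural Engine, which may be the better fit for encoders that were not pre-compiled for it.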

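| Model             | GPU Text Encoder | ANE Text Encoder | GPU Image Encoder | ANE Image Encoder |
| :---------------- | ---------------: | ---------------: | ----------------: | ----------------: |
| english-small     |          2.53 ms |          0.53 ms |           6.57 ms |           1.23 ms |
| english-base      |          2.54 ms |          0.61 ms |          18.90 ms |           3.79 ms |
| english-large     |          2.30 ms |          0.61 ms |          79.68 ms |          20.94 ms |
| multilingual-base |          2.34 ms |          0.50 ms |          18.98 ms |           3.77 ms |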

Measured on an Apple M4 iPad running iOS 18.2, with a batch size of 1 and the model pre-loaded into memory. The original encoders use `f32` single-precision numbers for maximum compatibility and mostly rely on the GPU for computation. The quantized encoders use a mixture of `i8`, `f16`, and `f32` numbers for maximum performance and mostly rely on the Apple Neural Engine (ANE). All reported values are median latencies.
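For readers who want to reproduce this kind of measurement, here is a rough sketch of a median-latency harness. It is not the benchmark used for the numbers above: the warm-up count, the iteration count, and the function names `median`, `medianLatencyMilliseconds`, and `speedup` are all assumptions chosen for illustration.

```swift
import Foundation
import Dispatch

/// Median of the samples, or `nil` when the list is empty.
func median(_ samples: [Double]) -> Double? {
    guard !samples.isEmpty else { return nil }
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    // Even count: average the two middle values; odd count: take the middle one.
    return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

/// Runs `body` a few untimed times (assumed warm-up), then times `iterations` runs
/// and returns the median latency in milliseconds.
func medianLatencyMilliseconds(
    warmup: Int = 3,
    iterations: Int = 100,
    _ body: () throws -> Void
) rethrows -> Double? {
    for _ in 0..<max(0, warmup) { try body() }
    var samples: [Double] = []
    samples.reserveCapacity(max(0, iterations))
    for _ in 0..<max(0, iterations) {
        let start = DispatchTime.now().uptimeNanoseconds
        try body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000) // ns -> ms
    }
    return median(samples)
}

/// Ratio of a baseline latency to an optimized latency.
func speedup(baseline: Double, optimized: Double) -> Double {
    return baseline / optimized
}
```

The closure passed to `medianLatencyMilliseconds` would wrap a single encoder call on an already loaded model, matching the batch size of 1 described above. The median is less sensitive than the mean to occasional slow runs, which is one reason to report it. Applied to the table, `speedup(baseline: 2.53, optimized: 0.53)` gives roughly 4.8x for the `english-small` text encoder.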

Co-authored-by: Kirill Solodskikh <[email protected]>
Co-authored-by: Azim Kurbanov <[email protected]>
Co-authored-by: Ruslan Aydarkhanov <[email protected]>
Co-authored-by: Andrey Ageev <[email protected]>
ashvardanian merged commit 2dbcc42 into main on Dec 20, 2024
4 checks passed