Changed AutoVectorize to use return by value for better performance #1

Open: wants to merge 1 commit into base: main

@gitpy commented Nov 10, 2022

This is a proof of concept showing that the auto-vectorizer can reach performance similar to the manual SSE and ISPC versions.

The problem with returning by reference is that the optimizer is bad at making assumptions about memory: when results are written through reference parameters, the compiler has to assume they might alias the input data. The optimizer is also generally very good at widening on its own. Widening by hand can confuse it even more, because manually widened code is not in the compiler's canonical form for widening.
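
To illustrate the aliasing point, here is a minimal C++17 sketch of one integer reduction written both ways. It is not the PR's actual code: `sum_max_by_ref`, `sum_max_by_value` and `variants_agree` are hypothetical names. Integers are used because compilers usually only vectorize floating-point reductions when reassociation is allowed (e.g. `-ffast-math`). Whether either loop is actually vectorized depends on compiler, flags and target, and a compiler may still vectorize the by-reference version behind runtime alias checks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <tuple>
#include <vector>

// Hypothetical "return by reference" variant. sum_out and max_out are
// int32_t lvalues, so the compiler must assume they could alias elements of
// data. Unless it adds runtime alias checks, it has to store them back to
// memory on every iteration instead of keeping them in (vector) registers.
void sum_max_by_ref(const std::int32_t* data, std::size_t n,
                    std::int32_t& sum_out, std::int32_t& max_out) {
    assert(data != nullptr || n == 0);
    sum_out = 0;
    max_out = std::numeric_limits<std::int32_t>::min();
    for (std::size_t i = 0; i < n; ++i) {
        sum_out += data[i];
        max_out = std::max(max_out, data[i]);
    }
}

// Hypothetical "return by value" variant. The accumulators are locals whose
// address never escapes, so no aliasing is possible and the loop is a plain
// reduction the auto-vectorizer is free to widen itself.
std::tuple<std::int32_t, std::int32_t>
sum_max_by_value(const std::int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::int32_t sum = 0;
    std::int32_t max_val = std::numeric_limits<std::int32_t>::min();
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
        max_val = std::max(max_val, data[i]);
    }
    return {sum, max_val};
}

// Consistency check: both variants must produce identical results.
bool variants_agree(const std::vector<std::int32_t>& v) {
    std::int32_t sum_ref = 0;
    std::int32_t max_ref = 0;
    sum_max_by_ref(v.data(), v.size(), sum_ref, max_ref);
    auto [sum_val, max_val] = sum_max_by_value(v.data(), v.size());
    return sum_ref == sum_val && max_ref == max_val;
}
```

The two functions compute the same thing; only the way results leave the function differs, which is exactly what changes the optimizer's view of memory.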

I also switched main.cpp to -O1. This is mostly needed to get rid of the tuple boilerplate, and returning by value often benefits from the caller being optimized as well.
I checked the assembly: AutoVectorize is not inlined into the caller, so it gains no advantage from the -O1 change.
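
A hypothetical sketch of the -O1 point, assuming GCC or Clang: the kernel returns a `std::tuple` by value, the caller unpacks it with structured bindings, and the `[[gnu::noinline]]` attribute keeps the kernel out of line so optimizing the caller cannot inline it. This is an alternative to checking the assembly by hand, not what the PR does, and `kernel_min_max` and `range_of` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical kernel under test. [[gnu::noinline]] (GCC/Clang) keeps it out
// of line, so compiling the caller with -O1 cannot inline it into the
// benchmark and skew the comparison.
[[gnu::noinline]] std::tuple<std::int32_t, std::int32_t>
kernel_min_max(const std::int32_t* data, std::size_t n) {
    assert(n > 0 && "kernel_min_max needs at least one element");
    std::int32_t lo = data[0];
    std::int32_t hi = data[0];
    for (std::size_t i = 1; i < n; ++i) {
        lo = data[i] < lo ? data[i] : lo;
        hi = data[i] > hi ? data[i] : hi;
    }
    return {lo, hi};
}

// Caller: at -O0 the std::tuple construction and the std::get calls behind
// the structured binding are typically emitted as real function calls; at
// -O1 and above they are usually optimized away, leaving plain registers.
std::int32_t range_of(const std::vector<std::int32_t>& v) {
    auto [lo, hi] = kernel_min_max(v.data(), v.size());
    return hi - lo;
}
```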

The pull request as-is isn't really meant to be merged. It is more to show that the vectorizer goes a long way when the code is structured in a way the optimizer can work with.

My local numbers are:

SSE: 10.8873 ns average
  Total time for 100000 runs: 1088.73 μs
  ...

Autovectorize: 9.08702 ns average
  Total time for 100000 runs: 908.702 μs
  ...

ISPC: 10.2582 ns average
  Total time for 100000 runs: 1025.81 μs
  ...
