Changed AutoVectorize to use return by value for better performance #1
It's a proof of concept showing that the auto-vectorizer can achieve performance similar to the manual SSE and ISPC versions.
The problem with returning by reference is that the optimizer is bad at making assumptions about memory: it cannot easily prove that the output references don't alias the input data. In general the optimizer is also very good at widening on its own, and widening manually can confuse it even more, because the hand-widened code is no longer in the compiler's canonical form for widening.
I also switched main.cpp to -O1. This is mostly needed to get rid of the tuple boilerplate, and return by value in general often benefits from the caller being optimized as well.
I also checked the generated assembly: AutoVectorize is not inlined into the caller, so it gets no extra benefit from that.
The pull request isn't really meant to be merged as is. It is more to show that the vectorizer goes a long way when the code is structured in a way the optimizer can work with.
My local numbers are: