Sparse and AVX2 #172
Comments
The number of search threads might also have an impact on which is faster... |
311020_1 = Correctly display castling rights for Chess960. |
Thanks, so sparse AVX2 is still clearly better on AMD. Were these all tested on Ryzen 3900X? |
Yes, all on my Ryzen 3900X. I hope to get tests on other CPUs soon. |
Thanks again! So on Nehalem, no_sparse is now better than sparse, which was the other way around before the improvement. The Athlon results have a pretty high variance, but seem to suggest sparse is better. |
Thanks. On AMD, sparse=yes seems better. |
Maybe the cpu is overheating and then throttles down? |
@AlexB123 |
@syzygy1 |
Ah, I see now. |
What is the difference between SSSE3.exe and SSSE3_popcnt_mingw_10.exe? |
I think the fact that no_sparse now beats sparse on Zen 3 shows that AMD has improved their AVX2 implementation in Zen 3. |
SSSE3 and SSSE3_sparse are 32-bit builds (compiled with MinGW i686-8.1.0-posix-dwarf-rt_v6-rev0) |
OK, so for 64-bit SSSE3 on Zen 2, sparse=yes is still faster than sparse=no. But it seems sparse=no is now faster than sparse=yes for AVX2 on Zen 2. I thought sparse=yes was clearly faster before the AVX2 speed up. This suggests that sparse=no is now faster on all CPUs with AVX2. |
I just tested a Ryzen 4500U laptop and also found that sparse=yes was faster than sparse=no before the AVX2 speedup patch and is now slower. |
Pure being faster is pretty nice. Is it also stronger? |
No, Hybrid is still stronger.
BMI2: Score of Cfish_x64_120421_ELTO_BMI2 vs Cfish_x64_130421_ELTO_BMI2_Pure: 668 - 521 - 6564 [0.509]
AVX512_VNNI: Score of Cfish_x64_120421_ELTO_AVX512___VNNI vs Cfish_x64_130421_ELTO_AVX512_VNNI_Pure: 527 - 507 - 6038 [0.501] |
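For anyone reading the match lines above: the bracketed number appears to be the score fraction of the first engine, (wins + draws/2) / games, assuming the three counts are wins, losses and draws in that order. A minimal Python sketch that reproduces the bracketed values and converts them to an approximate Elo difference with the usual logistic formula. The function names are made up for this illustration, and no error margin is computed.

```python
import math

def score_fraction(wins, losses, draws):
    """Score of the first engine: a win counts 1, a draw counts 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_difference(score):
    """Standard logistic conversion from score fraction to Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Results quoted above, read as (wins, losses, draws) for the Hybrid build.
results = {
    "BMI2": (668, 521, 6564),
    "AVX512_VNNI": (527, 507, 6038),
}

for label, (w, l, d) in results.items():
    s = score_fraction(w, l, d)
    print(f"{label}: score {s:.3f}, roughly {elo_difference(s):+.1f} Elo for Hybrid")
```

Both differences are small, so whether they are significant depends on the error bars, which this sketch does not estimate.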
@syzygy1 even the x32 build is faster |
On my AVX2 laptop, sparse multiplication now turns out to be slower than the non-sparse multiplication. I suspect that this is not the case on some other AVX2 CPUs, in particular Zen 1.
I have therefore added a compilation option.
To compile with sparse multiplication:
make -j pgo sparse=yes
To compile without sparse multiplication:
make -j pgo sparse=no
The default is "sparse=yes", except for AVX2 targets (including BMI2, VNNI, AVX512), which default to "sparse=no".
If it is clear that "sparse=yes" is still faster on Zen 1 or on other CPUs with AVX2, I can make it the default on those CPUs. I cannot test this myself, so if anyone is willing to try sparse=yes/no on Zen 1 or other CPUs, that would be very welcome.
It would also be interesting to know if sparse=no is faster on any non-AVX2 CPUs.
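For readers unfamiliar with the two options, here is a rough Python sketch of the general idea only. It is not the Cfish code, which is SIMD C, and all names here are made up for the illustration. The non-sparse version visits every input of the layer. The sparse version first collects the nonzero inputs and only accumulates their weight columns, which pays off when most inputs are zero and the bookkeeping is cheaper than the skipped multiplications.

```python
def dense_affine(weights, bias, x):
    """out[i] = bias[i] + sum_j weights[i][j] * x[j], visiting every input."""
    return [b + sum(w * v for w, v in zip(row, x))
            for row, b in zip(weights, bias)]

def sparse_affine(weights, bias, x):
    """Same result, but only columns belonging to nonzero inputs are touched."""
    out = list(bias)
    nonzero = [j for j, v in enumerate(x) if v != 0]  # extra bookkeeping step
    for j in nonzero:
        v = x[j]
        for i, row in enumerate(weights):
            out[i] += row[j] * v
    return out

# Tiny made-up example: 2 outputs, 4 inputs, only one input is nonzero.
weights = [[1, -2, 3, 4],
           [0, 5, -1, 2]]
bias = [10, -3]
x = [0, 7, 0, 0]

assert dense_affine(weights, bias, x) == sparse_affine(weights, bias, x)
```

Which variant wins on real hardware depends on how fast the dense SIMD path is compared with the cost of finding the nonzero inputs, which is presumably why the answer differs between CPUs.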