Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a compiler bug on Clang 15+ with AVX-512 #4830

Closed
wants to merge 1 commit into from

Conversation

MinetaS
Copy link
Contributor

@MinetaS MinetaS commented Oct 17, 2023

No functional changes

@mstembera
Copy link
Contributor

I confirm this workaround fixes it. It would be nice if we could at least do -O1 since that also works but I don't know how using attributes or pragmas.

@MinetaS
Copy link
Contributor Author

MinetaS commented Oct 18, 2023

I confirm this workaround fixes it. It would be nice if we could at least do -O1 since that also works but I don't know how using attributes or pragmas.

That's what I was thinking about as well, but sadly clang supports optimize on/off only via pragma directive.
The only viable option would be controlling compiler flags in Makefile, but I'm not familiar with it...

@vondele
Copy link
Member

vondele commented Oct 18, 2023

I think more important than to try to enable -O1 or so, is to add a link/comment to the issue that has been filed with clang, so we remove the workaround in due time. Usually compiler developers tend to fix these wrong code problems if they have a clear reproducer.

@MinetaS
Copy link
Contributor Author

MinetaS commented Oct 22, 2023

That's the ultimate objective for sure, but it might be less likely to get spotted without sufficient analysis considering there are 20k+ open issues listed in LLVM issue tracker. Also submitting a well formed issue will take a lot of time and efforts because the bug requires rare conditions (AVX512 + LTO + >O2 opt) to happen.

So I think we definitely need some kind of assistance from some people w/ knowledge of LLVM optimization for the fix on LLVM's side.

Until then I believe this patch is good enough to cover the bug.

@vondele
Copy link
Member

vondele commented Oct 22, 2023

Of course, we don't have to provide a fix to LLVM, but opening an LLVM issue with the title 'wrong code generation with AVX512 + LTO + O3' with the precise steps to reproduce it (on their side) would be enough for me (e.g. provide them with the repos to checkout and the data to download, as well as the precise commands). Creating a reduced test case is out of scope, and typically these compiler people are pretty good at that themselves.

This won't be a blocker for this PR to be merged, but I think we should do a good effort to see this workaround obsoleted in the future.

@mstembera
Copy link
Contributor

@vondele I just realized that if we had syzygy coverage by default we would have known exactly when this first started happening. Also it would make it simpler to create an issue for the compiler developers because it wouldn't involve having to download and configure additional external files to reproduce it. In light of this event I just want to ask if maybe you change your mind about the value of #3100 ?
cc @Disservin

@Disservin
Copy link
Member

Disservin commented Feb 10, 2024

@mstembera btw, not sure if you've seen the activity just before your comment but I created a bug report to clang, and someone found a nice clean reproduction of the issue that we experienced, and it's indeed a miscompilation.

@mstembera
Copy link
Contributor

@Disservin No just a coincidence, I haven't seen it. Thanks for creating the bug report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants