Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use clang-12 to fix issue with LLVM_ABI_BREAKING_CHECKS #4760

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

karupayun
Copy link
Collaborator

This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in #4212 (comment).

The issue was temporary fixed in #4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80).

A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare.

Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.

@ThomasRaoux
Copy link
Collaborator

Do we need to update LLVM hash in the same PR?
Why does the macOS build doesn't need the same change as linux one?

@karupayun
Copy link
Collaborator Author

karupayun commented Sep 19, 2024

Do we need to update LLVM hash in the same PR?

It's actually not needed since not updating the llvm-hash will mean an old (in the cache) LLVM built with clang-11 but also LLVM_ABI_BREAKING_CHECKS=FORCE_OFF, but I needed to change it in order to test, so I included for completeness.

Why does the macOS build doesn't need the same change as linux one?

I'm actually not sure, but I guess that it's because mac-12 should be using a newer version of clang by default. I could not find which one they are using.

@ThomasRaoux
Copy link
Collaborator

does that mean any Triton build with clang <= 11 or gcc would show the error?

@chsigg
Copy link
Collaborator

chsigg commented Sep 23, 2024

Do we need to update LLVM hash in the same PR?

It sounds risky to update the build flags while keeping the hash, because the prebuilt LLVM stored in Azure would no longer match the build script.

does that mean any Triton build with clang <= 11 or gcc would show the error?

Any build of Triton with clang > 11 or gcc shows the error when linking against LLVM built with clang <= 11 (which is the case before this PR).

@karupayun
Copy link
Collaborator Author

Updated the llvm-hash (after #4791). Everything still green.

Waiting for your input @ThomasRaoux :)

@ThomasRaoux
Copy link
Collaborator

sorry for the delay, I was OOO.

does that mean any Triton build with clang <= 11 or gcc would show the error?

Any build of Triton with clang > 11 or gcc shows the error when linking against LLVM built with clang <= 11 (which is the case before this PR).

I'm still trying to understand that part. So right LLVM built with clang <= 11 is broken unless we set DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF in the LLVM build. If move to build LLVM with clang > 11 it will be solved for triton build with clang > 11 but then it would still mismatch when LLVM is built with clang > 11 and triton is built with clang <= 11 right?

I thought the problem was that this preprocessor mismatches between LLVM build and triton?

#if LLVM_ENABLE_ABI_BREAKING_CHECKS &&                                         \
    (!defined(__clang__) || __clang_major__ > 11)

Did I get that wrong?

@chsigg
Copy link
Collaborator

chsigg commented Oct 1, 2024

sorry for the delay, I was OOO.

does that mean any Triton build with clang <= 11 or gcc would show the error?

Any build of Triton with clang > 11 or gcc shows the error when linking against LLVM built with clang <= 11 (which is the case before this PR).

I'm still trying to understand that part. So right LLVM built with clang <= 11 is broken unless we set DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF in the LLVM build. If move to build LLVM with clang > 11 it will be solved for triton build with clang > 11 but then it would still mismatch when LLVM is built with clang > 11 and triton is built with clang <= 11 right?

That's correct. You can't pass hash containers across a component A built with clang <= 11 and a different component B built with clang > 11 or gcc, unless one uses DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF. Currently A=llvm and B=Triton, but it could also be the other way around.

I thought the problem was that this preprocessor mismatches between LLVM build and triton?

#if LLVM_ENABLE_ABI_BREAKING_CHECKS &&                                         \
    (!defined(__clang__) || __clang_major__ > 11)

Did I get that wrong?

No, you didn't get that wrong. This is the problem.

@ThomasRaoux
Copy link
Collaborator

sorry for the delay, I was OOO.

does that mean any Triton build with clang <= 11 or gcc would show the error?

Any build of Triton with clang > 11 or gcc shows the error when linking against LLVM built with clang <= 11 (which is the case before this PR).

I'm still trying to understand that part. So right LLVM built with clang <= 11 is broken unless we set DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF in the LLVM build. If move to build LLVM with clang > 11 it will be solved for triton build with clang > 11 but then it would still mismatch when LLVM is built with clang > 11 and triton is built with clang <= 11 right?

That's correct. You can't pass hash containers across a component A built with clang <= 11 and a different component B built with clang > 11 or gcc, unless one uses DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF. Currently A=llvm and B=Triton, but it could also be the other way around.

I thought the problem was that this preprocessor mismatches between LLVM build and triton?

#if LLVM_ENABLE_ABI_BREAKING_CHECKS &&                                         \
    (!defined(__clang__) || __clang_major__ > 11)

Did I get that wrong?

No, you didn't get that wrong. This is the problem.

Thanks for clarifying. So based on that if we were to keep DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF in the LLVM build it should work with any combination of gcc and llvm while if we remove it it would only work when building both LLVM and triton with LLVM > 11? Isn't it better to keep DLLVM_ABI_BREAKING_CHECKS=FORCE_OFF in this case? Is the motivation mainly clean up the build?

@chsigg
Copy link
Collaborator

chsigg commented Oct 1, 2024

The motivation is that we want to have checks. NDEBUG builds automatically disable LLVM_ABI_BREAKING_CHECKS, but we currently build with assertions, so ABI breaking checks should be enabled as well.

@ThomasRaoux
Copy link
Collaborator

The motivation is that we want to have checks. NDEBUG builds automatically disable LLVM_ABI_BREAKING_CHECKS, but we currently build with assertions, so ABI breaking checks should be enabled as well.

I see your point. What kind of problem could that catch?
I'm a bit worried about potentially breaking the workflow of users using gcc or an older clang, especially since the error generated is hard to track to this. If we want to do that we should at least make clang > 11 a requirement.
Let me discuss with others to see if they have a feel of how likely it is to break developers flow out there.

@ThomasRaoux
Copy link
Collaborator

actually, would that only break triton when built with clang <= 11? If so maybe we just need to require clang > 11 which would be reasonable

@chsigg
Copy link
Collaborator

chsigg commented Oct 2, 2024

Yes, only the issue is clang <= 11 on one side vs clang > 11 or gcc on the other. If we don't use clang <= 11 for LLVM or Triton, everything is fine.

Clang 11 is the default for the GitHub Ubuntu 20.04 runner image though, so there is a chance that people will run into this.

I can look into making the build fail explicitly for bad compiler combinations.

@ThomasRaoux
Copy link
Collaborator

I can look into making the build fail explicitly for bad compiler combinations.

That would be great. I think we should at least make clang minimum version to 12. It should be easy to do in cmake

@chsigg chsigg force-pushed the llvm-abi-breaking-checks-issue branch from 9ef20cc to c34395c Compare October 7, 2024 09:31
The issue raised in triton-lang#4212 (comment) and temporary fixed in triton-lang#4512 has now been fixed upstream.

The non-deterministic seed inside the hashing function
(included in this llvm commit: llvm/llvm-project@ce80c80) was disabled for clang <= 11 for non-pic builds.

This was removed again in xgupta/llvm-project@5255b81 and we can enable ABI-breaking checks again.

We build LLVM with PIC. If anyone ever builds with non-PIC, they will get a compilation error and need to disable LLVM_ABI_BREAKING_CHECKS manually.
@chsigg chsigg force-pushed the llvm-abi-breaking-checks-issue branch from c34395c to 78050f1 Compare October 7, 2024 09:33
@chsigg
Copy link
Collaborator

chsigg commented Oct 7, 2024

The non-pic workaround in LLVM has been removed (llvm/llvm-project#110952). We can therefore remove the LLVM_ABI_BREAKING_CHECKS=FORCE_OFF again. Anyone building LLVM with clang <= 11 and non-pic will get a compilation error. If this is acceptable for LLVM upstream, it should be acceptable for Triton users as well.

@ThomasRaoux
Copy link
Collaborator

The non-pic workaround in LLVM has been removed (llvm/llvm-project#110952). We can therefore remove the LLVM_ABI_BREAKING_CHECKS=FORCE_OFF again. Anyone building LLVM with clang <= 11 and non-pic will get a compilation error. If this is acceptable for LLVM upstream, it should be acceptable for Triton users as well.

We should still make clang 12 a minimum requirement to build triton or is this not needed?

@chsigg
Copy link
Collaborator

chsigg commented Oct 8, 2024

We should still make clang 12 a minimum requirement to build triton or is this not needed?

No, not needed. I doubt anyone will build Triton with non-pic, and if anyone does and happens to use clang-11 for that it will produce a compile error. Specifically, the risk of silent ODR-violations no longer exists.

Copy link
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good thanks for explaining the problem in detail. I'm probably doing something wrong but I can't find the hash 5255b819c1b117e6048f354c035f5d36f3afad2b in llvm?

@@ -1 +1 @@
61f8a7f618901797ee8663389a29722f29216a96
5255b819c1b117e6048f354c035f5d36f3afad2b
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird I can't find this hash in llvm repo?

@chsigg
Copy link
Collaborator

chsigg commented Oct 9, 2024

Yeah, sorry. I grabbed the wrong commit. The right one is def9550c9b389eaa3adbac3089acf06a61a6fdb5. I will just revert the llvm-hash change and then this should be mergable.

@chsigg chsigg merged commit e8d7957 into triton-lang:main Oct 10, 2024
7 checks passed
@chsigg chsigg deleted the llvm-abi-breaking-checks-issue branch October 10, 2024 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants