-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[X86] Fix RegAlloc issue by implementing TRI::getCustomEHPadPreservedMask #135518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This method needs to be implemented so that the register allocator can correctly split live ranges of loop invariant virtual registers. This default return value is correct for DWARF unwinder, which preserves all registers (except exception pointer/selector), but not for SjLj, which only preserves sp/fp and possibly bp (they should be reserved). In the added test `rcx` on the entry to the loop contains the argument to `foo`. It survives the call due to `preserve_allcc`, but not the unwind edge entering LBB0_3, and so must be recomputed before going to the next iteration. Similarly, `rdx` holds the address of a jump table and is recomputed on every iteration rather than once before the loop.
@llvm/pr-subscribers-backend-x86 Author: Sergei Barannikov (s-barannikov) ChangesThis method needs to be implemented so that the register allocator can In the added test Full diff: https://github.com/llvm/llvm-project/pull/135518.diff 3 Files Affected:
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index 1c4114f8cc9c6..68c227cb987de 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -525,6 +525,13 @@ X86RegisterInfo::getNoPreservedMask() const {
return CSR_NoRegs_RegMask;
}
+const uint32_t *
+X86RegisterInfo::getCustomEHPadPreservedMask(const MachineFunction &MF) const {
+ if (MF.getTarget().Options.ExceptionModel == ExceptionHandling::SjLj)
+ return getNoPreservedMask();
+ return TargetRegisterInfo::getCustomEHPadPreservedMask(MF);
+}
+
const uint32_t *X86RegisterInfo::getDarwinTLSCallPreservedMask() const {
return CSR_64_TLS_Darwin_RegMask;
}
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 5b6ac3c5da019..d9491cd225827 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -102,6 +102,8 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
const uint32_t *getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID) const override;
const uint32_t *getNoPreservedMask() const override;
+ const uint32_t *
+ getCustomEHPadPreservedMask(const MachineFunction &MF) const override;
// Calls involved in thread-local variable lookup save more registers than
// normal calls, so they need a different mask to represent this.
diff --git a/llvm/test/CodeGen/X86/sjlj-unwind-clobber.ll b/llvm/test/CodeGen/X86/sjlj-unwind-clobber.ll
new file mode 100644
index 0000000000000..c1d718a9f7031
--- /dev/null
+++ b/llvm/test/CodeGen/X86/sjlj-unwind-clobber.ll
@@ -0,0 +1,76 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=x86_64 -exception-model=sjlj -o - %s | FileCheck %s
+
+declare dso_local i32 @__gxx_personality_sj0(...)
+declare dso_local preserve_allcc void @foo(ptr)
+
+define void @test() personality ptr @__gxx_personality_sj0 {
+; CHECK-LABEL: test:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: pushq %rbp
+; CHECK-NEXT: movq %rsp, %rbp
+; CHECK-NEXT: pushq %r15
+; CHECK-NEXT: pushq %r14
+; CHECK-NEXT: pushq %r13
+; CHECK-NEXT: pushq %r12
+; CHECK-NEXT: pushq %rbx
+; CHECK-NEXT: subq $104, %rsp
+; CHECK-NEXT: movq $__gxx_personality_sj0, -104(%rbp)
+; CHECK-NEXT: movq $GCC_except_table0, -96(%rbp)
+; CHECK-NEXT: movq %rbp, -88(%rbp)
+; CHECK-NEXT: movq %rsp, -72(%rbp)
+; CHECK-NEXT: movq $.LBB0_3, -80(%rbp)
+; CHECK-NEXT: leaq -136(%rbp), %rdi
+; CHECK-NEXT: callq _Unwind_SjLj_Register@PLT
+; CHECK-NEXT: leaq -44(%rbp), %rcx
+; CHECK-NEXT: .LBB0_1: # %while.cond
+; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: movl $1, -128(%rbp)
+; CHECK-NEXT: .Ltmp0:
+; CHECK-NEXT: movq %rcx, %rdi
+; CHECK-NEXT: callq foo
+; CHECK-NEXT: .Ltmp1:
+; CHECK-NEXT: jmp .LBB0_2
+; CHECK-NEXT: .LBB0_3: # in Loop: Header=BB0_1 Depth=1
+; CHECK-NEXT: leaq -44(%rbp), %rcx
+; CHECK-NEXT: movl -128(%rbp), %eax
+; CHECK-NEXT: cmpl $1, %eax
+; CHECK-NEXT: jae .LBB0_5
+; CHECK-NEXT: # %bb.4: # in Loop: Header=BB0_1 Depth=1
+; CHECK-NEXT: leaq .LJTI0_0(%rip), %rdx
+; CHECK-NEXT: jmpq *(%rdx,%rax,8)
+; CHECK-NEXT: .LBB0_6: # %lpad
+; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
+; CHECK-NEXT: .Ltmp2:
+; CHECK-NEXT: movl -124(%rbp), %eax
+; CHECK-NEXT: movl -120(%rbp), %eax
+; CHECK-NEXT: jmp .LBB0_1
+; CHECK-NEXT: .LBB0_2: # %while.end
+; CHECK-NEXT: leaq -136(%rbp), %rdi
+; CHECK-NEXT: callq _Unwind_SjLj_Unregister@PLT
+; CHECK-NEXT: addq $104, %rsp
+; CHECK-NEXT: popq %rbx
+; CHECK-NEXT: popq %r12
+; CHECK-NEXT: popq %r13
+; CHECK-NEXT: popq %r14
+; CHECK-NEXT: popq %r15
+; CHECK-NEXT: popq %rbp
+; CHECK-NEXT: retq
+; CHECK-NEXT: .LBB0_5:
+; CHECK-NEXT: ud2
+entry:
+ %ptr = alloca i32, align 4
+ br label %while.cond
+
+while.cond:
+ invoke preserve_allcc void @foo(ptr %ptr)
+ to label %while.end unwind label %lpad
+
+lpad:
+ %lp = landingpad { ptr, i32 }
+ catch ptr null
+ br label %while.cond
+
+while.end:
+ ret void
+}
|
The first commit contains a test. I can craft a mir test instead if necessary, but it is going to be much less readable. The test is a stripped down version of this C++ source compiled to LLVM IR with [[clang::preserve_all]]
void foo(int *ptr);
void test() {
int x;
while (true) {
try {
foo(&x);
} catch (...) {
continue;
}
break;
}
} |
@@ -525,6 +525,13 @@ X86RegisterInfo::getNoPreservedMask() const { | |||
return CSR_NoRegs_RegMask; | |||
} | |||
|
|||
const uint32_t * | |||
X86RegisterInfo::getCustomEHPadPreservedMask(const MachineFunction &MF) const { | |||
if (MF.getTarget().Options.ExceptionModel == ExceptionHandling::SjLj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familar with SjLj. Is the following the same rule on both Linux and Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! I never thought about supporting SJLJ on Windows, but I think it should be the same there.
Unlike DWARF unwinder, SJLJ implementation does not actually unwind the stack (does not step through frames), it just does __builtin_longjmp
to the dispatch block. The builtin only restores program counter and stack pointer(s), as that is all that is saved at the beginning of each function that can potentially throw an exception. (Saving all registers at the beginning of each function would be too expensive.)
Also, there is a related workaround that adds the same "no-preserved" mask to an arbitrary instruction in the dispatch block. (It should be removed and any remaining issues fixed in the generic code, but I'm hesitant to do this as test SJLJ test coverage is pretty poor.)
Apparently, the link doesn't work, look for
// Add a register mask with no preserved registers. This results in all
// registers being marked as clobbered.
in X86ISelLowering.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give +1 here since I don't see any blocker issue. It's better if someone who is familiar with SjLj takes another look.
Thanks! |
Maybe someone from @llvm/pr-subscribers-backend-arm can help with the review? IIUC SJLJ-style exception handling is the default on ARM-Darwin. |
This method needs to be implemented so that the register allocator can
correctly split live ranges of loop invariant virtual registers.
This default return value is correct for DWARF unwinder, which preserves
all registers (except exception pointer/selector), but not for SjLj,
which only preserves sp/fp and possibly bp (they should be reserved).
In the added test
rcx
on the entry to the loop contains the argumentto
foo
. It survives the call due topreserve_allcc
, but not theunwind edge entering
LBB0_3
, and so must be recomputed before going tothe next iteration.
Similarly,
rdx
holds the address of a jump table and is recomputed onevery iteration rather than once before the loop as was before regalloc.