-
Notifications
You must be signed in to change notification settings - Fork 5k
Separate refactoring changes in 43250 #48199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate refactoring changes in 43250 #48199
Conversation
… in src/coreclr/jit/codegencommon.cpp
…n.cpp to CodeGen::genPushCalleeSavedRegisters in codegenarmarch.cpp
…p src/coreclr/jit/codegencommon.cpp
…/jit/codegenarm64.cpp src/coreclr/jit/compiler.h
…eclr/jit/codegenxarch.cpp
@dotnet/jit-contrib This is ready for review, PTAL. I am running spmi to validate that the changes are no-op on x64/x86/arm64 and that there are no unexpected diffs on arm32. Should have results later today. |
@dotnet/jit-contrib ping |
@echesakovMSFT Did you validate these are spmi no-diff as you said you were going to do? |
// registers have been saved. So instead of letting genAllocLclFrame use initReg as a temporary register, | ||
// always use REG_SCRATCH. We don't care if it trashes it, so ignore the initRegZeroed output argument. | ||
bool ignoreInitRegZeroed = false; | ||
genAllocLclFrame(compiler->compLclFrameSize, REG_SCRATCH, &ignoreInitRegZeroed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a little weird to put the call to do probing inside of a function that's supposed to push callee saved registers, but ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree. On other hand, genAllocLclFrame
is not allocating anything on Arm64 and should be removed as soon as I replace the stack probing logic with helper call/or "probe while adjusting sp" algorithm as we discussed earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I think that genPushCalleeSavedRegisters
should be renamed as genAllocLclFrameAndStoreCalleeSavedRegisters
on Arm64.
@@ -9658,7 +9658,7 @@ void CodeGen::genAllocLclFrame(unsigned frameSize, regNumber initReg, bool* pIni | |||
{ | |||
lastTouchDelta = frameSize; | |||
} | |||
else if (frameSize < compiler->getVeryLargeFrameSize()) | |||
else if (frameSize < 3 * pageSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this deserves a comment: why 3 * pageSize
?
@BruceForstall I did on win-x86/win-x64 with libraries collection - found no diffs. Haven't done yet with other platforms. |
… in src/coreclr/jit/codegenarm64.cpp
I re-did spmi-asmdiffs for this PR with all mch collections after fixing issues with spmi-asmdiffs on Arm64 with #48683. There are some benign diffs with this PR on win-x64 and win-x86. win-x64
An example, of the diffs --- "a/F:\\echesako\\Runtime_48199\\asm.windows.x64.Checked\\base/247425.dasm"
+++ "b/F:\\echesako\\Runtime_48199\\asm.windows.x64.Checked\\diff/247425.dasm"
@@ -27,16 +27,16 @@ G_M59477_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
call CORINFO_HELP_STACK_PROBE
mov rsp, r11
lea rbp, [rsp+61B20H]
- xor eax, eax
- mov qword ptr [rbp-61AB8H], rax
+ xor ecx, ecx
+ mov qword ptr [rbp-61AB8H], rcx
vxorps xmm4, xmm4
- mov rax, 0xD1FFAB1E
- vmovdqa xmmword ptr [rbp+rax-40H], xmm4
- vmovdqa xmmword ptr [rbp+rax-30H], xmm4
- vmovdqa xmmword ptr [rbp+rax-20H], xmm4
- add rax, 48
+ mov rcx, 0xD1FFAB1E
+ vmovdqa xmmword ptr [rbp+rcx-40H], xmm4
+ vmovdqa xmmword ptr [rbp+rcx-30H], xmm4
+ vmovdqa xmmword ptr [rbp+rcx-20H], xmm4
+ add rcx, 48
jne SHORT -5 instr
- mov qword ptr [rbp-40H], rax
+ mov qword ptr [rbp-40H], rcx
;; bbWeight=1 PerfScore 17.33 win-x86
An example, of the diffs --- "a/F:\\echesako\\Runtime_48199\\asm.windows.x86.Checked\\base/192467.dasm"
+++ "b/F:\\echesako\\Runtime_48199\\asm.windows.x86.Checked\\diff/192467.dasm"
@@ -42,14 +42,14 @@ G_M45492_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
vxorps xmm4, xmm4
vmovdqu xmmword ptr [ebp-6874H], xmm4
vmovdqu xmmword ptr [ebp-6864H], xmm4
- mov eax, -0x6840
- vmovdqu xmmword ptr [ebp+eax-14H], xmm4
- vmovdqu xmmword ptr [ebp+eax-04H], xmm4
- vmovdqu xmmword ptr [ebp+eax+0CH], xmm4
- add eax, 48
+ mov edx, -0x6840
+ vmovdqu xmmword ptr [ebp+edx-14H], xmm4
+ vmovdqu xmmword ptr [ebp+edx-04H], xmm4
+ vmovdqu xmmword ptr [ebp+edx+0CH], xmm4
+ add edx, 48
jne SHORT -5 instr
- mov dword ptr [ebp-14H], eax
- mov dword ptr [ebp-10H], eax
+ mov dword ptr [ebp-14H], edx
+ mov dword ptr [ebp-10H], edx
;; bbWeight=1 PerfScore 15.83
G_M45492_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
lea edi, [ebp-7098H] linux-arm
The diffs are expected due to removing the inlined stack probing and replacing it with a call to stack probe helper in commit 39501f1. An example of such diffs: @@ -1287,11 +1287,13 @@
G_M390_IG01: ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
push {r4,r5,r6,r7,r8,r10,r11,lr}
add r11, sp, 24
- movw r0, 0xd1ff
- sxth r0, r0
- ldr r0, [sp+r0]
- movw r0, 0xd1ff
- sub sp, sp, r0
+ movw r4, 0xd1ff
+ sub r4, sp, r4
+ movw r5, LOW RELOC 0xD1FFAB1E
+ movt r5, HIGH RELOC 0xD1FFAB1E
+ add r5, pc
+ blx r5 // CORINFO_HELP_STACK_PROBE
+ mov sp, r4
sub r2, r11, 0x800
movs r3, 253
movs r0, 0
@@ -1299,7 +1301,7 @@ G_M390_IG01: ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prol
stm r2!, {r0,r1}
subs r3, 1
bhi SHORT pc-6 (-3 instructions)
- ;; bbWeight=1 PerfScore 14.00
+ ;; bbWeight=1 PerfScore 16.00
G_M390_IG02: ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movw r4, LOW RELOC 0xD1FFAB1E
movt r4, HIGH RELOC 0xD1FFAB1E
@@ -11483,7 +11485,7 @@ G_M390_IG80: ; , epilog, nogc, extend
pop {r4,r5,r6,r7,r8,r10,r11,pc}
;; bbWeight=1 PerfScore 3.00 win-arm64 |
The issues in runtime-coreclr gcstress0x3-gcstress0xc are consistent with weekend test runs. |
Separate refactoring and functional changes in #43250. I will open another PR for changes that will implement stack probe helper on Arm64.
This one contains mostly refactoring changes (there is one functional change here - stop probing under sp on Arm32).