Skip to content

Separate refactoring changes in 43250 #48199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

echesakov
Copy link
Contributor

@echesakov echesakov commented Feb 12, 2021

Separate refactoring and functional changes in #43250. I will open another PR for changes that will implement stack probe helper on Arm64.

This one contains mostly refactoring changes (there is one functional change here - stop probing under sp on Arm32).

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 12, 2021
@echesakov echesakov marked this pull request as ready for review February 12, 2021 18:05
@echesakov
Copy link
Contributor Author

@dotnet/jit-contrib This is ready for review, PTAL.

I am running spmi to validate that the changes are no-op on x64/x86/arm64 and that there are no unexpected diffs on arm32. Should have results later today.

@echesakov echesakov marked this pull request as draft February 12, 2021 20:32
@echesakov echesakov marked this pull request as ready for review February 13, 2021 02:58
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Feb 16, 2021
@echesakov
Copy link
Contributor Author

@dotnet/jit-contrib ping

@BruceForstall
Copy link
Contributor

@echesakovMSFT Did you validate these are spmi no-diff as you said you were going to do?

// registers have been saved. So instead of letting genAllocLclFrame use initReg as a temporary register,
// always use REG_SCRATCH. We don't care if it trashes it, so ignore the initRegZeroed output argument.
bool ignoreInitRegZeroed = false;
genAllocLclFrame(compiler->compLclFrameSize, REG_SCRATCH, &ignoreInitRegZeroed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a little weird to put the call to do probing inside of a function that's supposed to push callee saved registers, but ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree. On other hand, genAllocLclFrame is not allocating anything on Arm64 and should be removed as soon as I replace the stack probing logic with helper call/or "probe while adjusting sp" algorithm as we discussed earlier.

Copy link
Contributor Author

@echesakov echesakov Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think that genPushCalleeSavedRegisters should be renamed as genAllocLclFrameAndStoreCalleeSavedRegisters on Arm64.

@@ -9658,7 +9658,7 @@ void CodeGen::genAllocLclFrame(unsigned frameSize, regNumber initReg, bool* pIni
{
lastTouchDelta = frameSize;
}
else if (frameSize < compiler->getVeryLargeFrameSize())
else if (frameSize < 3 * pageSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this deserves a comment: why 3 * pageSize?

@echesakov
Copy link
Contributor Author

Did you validate these are spmi no-diff as you said you were going to do?

@BruceForstall I did on win-x86/win-x64 with libraries collection - found no diffs. Haven't done yet with other platforms.

@echesakov
Copy link
Contributor Author

I re-did spmi-asmdiffs for this PR with all mch collections after fixing issues with spmi-asmdiffs on Arm64 with #48683.

There are some benign diffs with this PR on win-x64 and win-x86.
These are due to commit 03f64dc where initReg is no longer required to be rax/eax.
The constraint was imposed by the loop stack probing algorithm that the JIT doesn't have anymore.

win-x64

Found 4 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 2117
Total bytes of diff: 2117
Total bytes of delta: 0 (0.00% of base)


0 total files with Code Size differences (0 improved, 0 regressed), 4 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 4 unchanged.

--------------------------------------------------------------------------------

An example, of the diffs

--- "a/F:\\echesako\\Runtime_48199\\asm.windows.x64.Checked\\base/247425.dasm"
+++ "b/F:\\echesako\\Runtime_48199\\asm.windows.x64.Checked\\diff/247425.dasm"
@@ -27,16 +27,16 @@ G_M59477_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
        call     CORINFO_HELP_STACK_PROBE
        mov      rsp, r11
        lea      rbp, [rsp+61B20H]
-       xor      eax, eax
-       mov      qword ptr [rbp-61AB8H], rax
+       xor      ecx, ecx
+       mov      qword ptr [rbp-61AB8H], rcx
        vxorps   xmm4, xmm4
-       mov      rax, 0xD1FFAB1E
-       vmovdqa  xmmword ptr [rbp+rax-40H], xmm4
-       vmovdqa  xmmword ptr [rbp+rax-30H], xmm4
-       vmovdqa  xmmword ptr [rbp+rax-20H], xmm4
-       add      rax, 48
+       mov      rcx, 0xD1FFAB1E
+       vmovdqa  xmmword ptr [rbp+rcx-40H], xmm4
+       vmovdqa  xmmword ptr [rbp+rcx-30H], xmm4
+       vmovdqa  xmmword ptr [rbp+rcx-20H], xmm4
+       add      rcx, 48
        jne      SHORT  -5 instr
-       mov      qword ptr [rbp-40H], rax
+       mov      qword ptr [rbp-40H], rcx
                                                ;; bbWeight=1    PerfScore 17.33

win-x86

Found 4 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 1533
Total bytes of diff: 1533
Total bytes of delta: 0 (0.00% of base)


0 total files with Code Size differences (0 improved, 0 regressed), 4 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 4 unchanged.

--------------------------------------------------------------------------------

An example, of the diffs

--- "a/F:\\echesako\\Runtime_48199\\asm.windows.x86.Checked\\base/192467.dasm"
+++ "b/F:\\echesako\\Runtime_48199\\asm.windows.x86.Checked\\diff/192467.dasm"
@@ -42,14 +42,14 @@ G_M45492_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nog
        vxorps   xmm4, xmm4
        vmovdqu  xmmword ptr [ebp-6874H], xmm4
        vmovdqu  xmmword ptr [ebp-6864H], xmm4
-       mov      eax, -0x6840
-       vmovdqu  xmmword ptr [ebp+eax-14H], xmm4
-       vmovdqu  xmmword ptr [ebp+eax-04H], xmm4
-       vmovdqu  xmmword ptr [ebp+eax+0CH], xmm4
-       add      eax, 48
+       mov      edx, -0x6840
+       vmovdqu  xmmword ptr [ebp+edx-14H], xmm4
+       vmovdqu  xmmword ptr [ebp+edx-04H], xmm4
+       vmovdqu  xmmword ptr [ebp+edx+0CH], xmm4
+       add      edx, 48
        jne      SHORT  -5 instr
-       mov      dword ptr [ebp-14H], eax
-       mov      dword ptr [ebp-10H], eax
+       mov      dword ptr [ebp-14H], edx
+       mov      dword ptr [ebp-10H], edx
                                                ;; bbWeight=1    PerfScore 15.83
 G_M45492_IG02:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
        lea      edi, [ebp-7098H]

linux-arm


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 3511178
Total bytes of diff: 3511424
Total bytes of delta: 246 (0.01% of base)
    diff is a regression.


Top file regressions (bytes):
          10 : 211106.dasm (0.08% of base)
           8 : 174491.dasm (0.10% of base)
           6 : 246618.dasm (8.82% of base)
           6 : 238879.dasm (0.05% of base)
           4 : 177216.dasm (0.02% of base)
           4 : 188884.dasm (0.02% of base)
           4 : 6852.dasm (0.02% of base)
           4 : 108446.dasm (0.01% of base)
           4 : 180498.dasm (0.02% of base)
           4 : 206545.dasm (0.02% of base)
           4 : 3726.dasm (0.15% of base)
           4 : 38562.dasm (0.01% of base)
           4 : 38858.dasm (0.01% of base)
           4 : 168934.dasm (0.02% of base)
           4 : 171167.dasm (0.02% of base)
           4 : 173598.dasm (0.02% of base)
           4 : 186748.dasm (0.02% of base)
           4 : 206546.dasm (0.02% of base)
           4 : 186747.dasm (0.02% of base)
           4 : 23056.dasm (0.01% of base)

Top file improvements (bytes):
          -8 : 232082.dasm (-0.02% of base)
          -8 : 229946.dasm (-0.02% of base)
          -8 : 229995.dasm (-0.02% of base)
          -4 : 215563.dasm (-1.68% of base)
          -2 : 209938.dasm (-0.00% of base)
          -2 : 217037.dasm (-0.01% of base)
          -2 : 214258.dasm (-0.01% of base)
          -2 : 217703.dasm (-0.01% of base)

102 total files with Code Size differences (8 improved, 94 regressed), 5 unchanged.

Top method regressions (bytes):
          10 ( 0.08% of base) : 211106.dasm - TestApp:Main():int
           8 ( 0.10% of base) : 174491.dasm - MCCTest.MyClass:Main(System.String[]):int
           6 ( 8.82% of base) : 246618.dasm - BigFrames.Test:Test1()
           6 ( 0.05% of base) : 238879.dasm - ldloca_s_r8:test_float64():int
           4 ( 0.02% of base) : 177216.dasm - MCCTest.VTypeE:Check(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 188884.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 6852.dasm - HtmlEntities:.cctor()
           4 ( 0.01% of base) : 108446.dasm - Microsoft.Diagnostics.Tracing.Parsers.KernelTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.55.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.55.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
           4 ( 0.02% of base) : 180498.dasm - MCCTest.VTypeE:Check(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 206545.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this
           4 ( 0.15% of base) : 3726.dasm - System.Number:Dragon4(long,int,int,bool,int,bool,System.Span`1[Byte],byref):int
           4 ( 0.01% of base) : 38562.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrPrivateTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.64.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.64.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
           4 ( 0.01% of base) : 38858.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.64.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.64.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
           4 ( 0.02% of base) : 168934.dasm - MCCTest.VTypeE:Check(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 171167.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 173598.dasm - MCCTest.VTypeE:Check(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 186748.dasm - MCCTest.VTypeE:Check(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 206546.dasm - MCCTest.VTypeE:Check(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 186747.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this
           4 ( 0.01% of base) : 23056.dasm - Microsoft.CodeAnalysis.DesktopAssemblyIdentityComparer:.cctor()

Top method improvements (bytes):
          -8 (-0.02% of base) : 232082.dasm - decimalMDArrTest:Main():int
          -8 (-0.02% of base) : 229946.dasm - doubleMDArrTest:Main():int
          -8 (-0.02% of base) : 229995.dasm - ulongMDArrTest:Main():int
          -4 (-1.68% of base) : 215563.dasm - Packet256Tracer:Shade(Intersections,RayPacket256,Scene,int):VectorPacket256:this
          -2 (-0.00% of base) : 209938.dasm - r8NaNrem:Main():int
          -2 (-0.01% of base) : 217037.dasm - r8NaNmul:Main():int
          -2 (-0.01% of base) : 214258.dasm - lclfldsub:Main():int
          -2 (-0.01% of base) : 217703.dasm - r8NaNsub:Main():int

Top method regressions (percentages):
           2 (16.67% of base) : 247707.dasm - ldloc_s_i8:test_int64():int
           6 ( 8.82% of base) : 246618.dasm - BigFrames.Test:Test1()
           2 ( 2.56% of base) : 246619.dasm - BigFrames.Test:Test2()
           2 ( 1.52% of base) : 85877.dasm - Repro:Main():int
           2 ( 1.20% of base) : 80686.dasm - <Module>:TestLVNumCnt():int
           2 ( 0.90% of base) : 84769.dasm - initblk2:main():int
           2 ( 0.85% of base) : 86049.dasm - JitTest.BaseStruct:caller(bool):JitTest.BaseStruct
           2 ( 0.68% of base) : 215564.dasm - Packet256Tracer:GetNaturalColor(System.Runtime.Intrinsics.Vector256`1[Int32],VectorPacket256,VectorPacket256,VectorPacket256,Scene):VectorPacket256:this
           2 ( 0.42% of base) : 83875.dasm - initblk2:main():int
           4 ( 0.15% of base) : 3726.dasm - System.Number:Dragon4(long,int,int,bool,int,bool,System.Span`1[Byte],byref):int
           8 ( 0.10% of base) : 174491.dasm - MCCTest.MyClass:Main(System.String[]):int
          10 ( 0.08% of base) : 211106.dasm - TestApp:Main():int
           6 ( 0.05% of base) : 238879.dasm - ldloca_s_r8:test_float64():int
           2 ( 0.03% of base) : 170259.dasm - MCCTest.MyClass:Main(System.String[]):int
           2 ( 0.02% of base) : 155764.dasm - testout1:Func_0_4_5_4():System.Decimal
           2 ( 0.02% of base) : 160821.dasm - testout1:Func_0_4_5_4():System.Decimal
           2 ( 0.02% of base) : 168830.dasm - MCCTest.Common:CheckResult(MCCTest.VTypeD,int):int
           4 ( 0.02% of base) : 188884.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 206545.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this
           4 ( 0.02% of base) : 171167.dasm - MCCTest.VTypeE:Add(MCCTest.VTypeE):this

Top method improvements (percentages):
          -4 (-1.68% of base) : 215563.dasm - Packet256Tracer:Shade(Intersections,RayPacket256,Scene,int):VectorPacket256:this
          -8 (-0.02% of base) : 232082.dasm - decimalMDArrTest:Main():int
          -8 (-0.02% of base) : 229946.dasm - doubleMDArrTest:Main():int
          -8 (-0.02% of base) : 229995.dasm - ulongMDArrTest:Main():int
          -2 (-0.01% of base) : 214258.dasm - lclfldsub:Main():int
          -2 (-0.01% of base) : 217703.dasm - r8NaNsub:Main():int
          -2 (-0.01% of base) : 217037.dasm - r8NaNmul:Main():int
          -2 (-0.00% of base) : 209938.dasm - r8NaNrem:Main():int

102 total methods with Code Size differences (8 improved, 94 regressed), 5 unchanged.

--------------------------------------------------------------------------------

The diffs are expected due to removing the inlined stack probing and replacing it with a call to stack probe helper in commit 39501f1.

An example of such diffs:

@@ -1287,11 +1287,13 @@
 G_M390_IG01:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
             push    {r4,r5,r6,r7,r8,r10,r11,lr}
             add     r11, sp, 24
-            movw    r0, 0xd1ff
-            sxth    r0, r0
-            ldr     r0, [sp+r0]
-            movw    r0, 0xd1ff
-            sub     sp, sp, r0
+            movw    r4, 0xd1ff
+            sub     r4, sp, r4
+            movw    r5, LOW RELOC 0xD1FFAB1E
+            movt    r5, HIGH RELOC 0xD1FFAB1E
+            add     r5, pc
+            blx     r5         // CORINFO_HELP_STACK_PROBE
+            mov     sp, r4
             sub     r2, r11, 0x800
             movs    r3, 253
             movs    r0, 0
@@ -1299,7 +1301,7 @@ G_M390_IG01:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prol
             stm     r2!, {r0,r1}
             subs    r3, 1
             bhi     SHORT pc-6 (-3 instructions)
-                                               ;; bbWeight=1    PerfScore 14.00
+                                               ;; bbWeight=1    PerfScore 16.00
 G_M390_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movw    r4, LOW RELOC 0xD1FFAB1E
             movt    r4, HIGH RELOC 0xD1FFAB1E
@@ -11483,7 +11485,7 @@ G_M390_IG80:        ; , epilog, nogc, extend
             pop     {r4,r5,r6,r7,r8,r10,r11,pc}
                                                ;; bbWeight=1    PerfScore 3.00

win-arm64
No asm diffs

@dotnet dotnet deleted a comment from azure-pipelines bot Feb 25, 2021
@echesakov
Copy link
Contributor Author

The issues in runtime-coreclr gcstress0x3-gcstress0xc are consistent with weekend test runs.

@echesakov echesakov merged commit ee3f7da into dotnet:master Feb 26, 2021
@echesakov echesakov deleted the Separate-Refactor-Changes-Runtime-43250 branch February 26, 2021 23:25
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants