[NativeAOT] Do not emit safe point for TLS_GET_ADDR calls into native runtime. #102237

VSadov · 2024-05-15T02:39:37Z

The failure is caused by GC reporting of an uninitialized temp.

We declare a temp to hold the address of the managed TLS blob. The temp is initialized by indirecting into native TLS, and in some cases fetching the native TLS address involves a method call.

Roughly, it looks like this:

managed_ptr tlsRoot;

// unmanaged TLS pattern starts
...
...
unmanaged_ptr nativeTLS = TLS_GET_ADDR();   
nativeTLS += someAdjustment;                           // if this is a safe point, tlsRoot must be zero-inited
// unmanaged TLS pattern ends

// only here tlsRoot gets a real value
tlsRoot = nativeTLS[some indirection];

The optimizer assumes that if there is no safe point between the prolog and the first assignment, zero-initializing the temp is unnecessary. The problem is that the optimizer does not know that the TLS access may emit a call, and that call may introduce a safe point (as calls do by default).
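The optimizer's rule can be captured by a small model. The sketch below is hypothetical and simplified (the `Kind` and `Instr` types and the `needsZeroInit` function are illustrative, not JIT APIs): a GC-tracked temp needs a prolog zero-init only if a safe point is reached before the temp receives its real value.

```cpp
#include <vector>

// Hypothetical, simplified model of a straight-line method body after the prolog.
enum class Kind {
    Other,      // arithmetic, loads, etc.: never a safe point in this model
    Call,       // a call; may or may not be recorded as a safe point
    DefGcLocal  // first assignment to the GC-tracked temp (tlsRoot)
};

struct Instr {
    Kind kind;
    bool isSafePoint; // only meaningful for Kind::Call
};

// True if the GC-tracked temp must be zero-initialized in the prolog,
// i.e. a safe point is reached before the temp receives its real value.
bool needsZeroInit(const std::vector<Instr>& body) {
    for (const Instr& ins : body) {
        if (ins.kind == Kind::DefGcLocal) {
            return false; // defined before any safe point: eliding zero-init is safe
        }
        if (ins.kind == Kind::Call && ins.isSafePoint) {
            return true; // the GC could report the temp while it still holds garbage
        }
    }
    return false; // no safe point and no definition: nothing is ever reported
}
```

In this model the optimizer sees the TLS access as plain instructions (`{Other, Other, DefGcLocal}`), so `needsZeroInit` returns `false` and zero-init is elided. The emitted code, however, starts with a call that is a safe point (`{Call/safe point, Other, DefGcLocal}`), for which the same rule returns `true`. Emitting that call without a safe point makes the emitted code agree with the optimizer's decision again.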

The simplest fix is to not emit safe points for calls into native TLS. They cannot participate in GC stack walks anyway.

VSadov · 2024-05-15T14:35:54Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2024-05-15T14:36:10Z

Azure Pipelines successfully started running 1 pipeline(s).

VSadov · 2024-05-15T17:16:02Z

all linux-arm64 jobs with NativeAOT have passed. I will run tests one more time to be sure.

VSadov · 2024-05-15T17:16:49Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2024-05-15T17:16:59Z

Azure Pipelines successfully started running 1 pipeline(s).

kunalspathak

LGTM. The TLS inlining change on arm64 has been enabled for a few months now. Wondering if we tracked what triggered this failure now?

kunalspathak · 2024-05-15T19:12:37Z

src/coreclr/jit/emitarm64.cpp

@@ -8985,7 +8985,8 @@ void emitter::emitIns_Call(EmitCallType          callType,
                           regNumber        xreg /* = REG_NA */,
                           unsigned         xmul /* = 0     */,
                           ssize_t          disp /* = 0     */,
-                           bool             isJump /* = false */)
+                           bool             isJump /* = false */,
+                           bool             isNoGCframe /* = false */)


Can you include the new parameter in the method description, both in this file and in the other files?

Yes. A good point.

I am also not sure if isNoGCframe is the best name. I could not come up with anything better at the time, but now I think noSafePoint might be better.

I've renamed the parameter noSafePoint. I think it is better and easier to explain. I've also added comments about the parameter where we had any comments.

kunalspathak · 2024-05-15T19:21:33Z

src/coreclr/jit/emitarm64.cpp

@@ -9079,7 +9080,7 @@ void emitter::emitIns_Call(EmitCallType          callType,
    emitThisByrefRegs = byrefRegs;

    // for the purpose of GC safepointing tail-calls are not real calls
-    id->idSetIsNoGC(isJump || emitNoGChelper(methHnd));
+    id->idSetIsNoGC(isJump || isNoGCframe || emitNoGChelper(methHnd));


Instead of passing isNoGCframe around, we could have added a check inside emitNoGChelper() to skip TLS calls. However, checking the TLS methHnd might not serve the purpose, because on xarch we pass a dummy methodHandle == 1, while on arm64 it is the actual address.

I started with a solution where a special/fake JIT helper would be used to indicate that we are dealing with a TLS call to the native runtime. In a way that is right: this is a JIT helper, just one provided by the native runtime. That did not work, though, since on arm64 the methodHandle is not just a marker.

I also realized that we may want to opt more calls out of being safe points in the future, not for correctness but because we do not need them. We make nearly every call a safe point, but some calls would never be stack-walked.
Native calls are one example (not the entire P/Invoke here, just the actual calls to native methods). It is OK if they are safe points, but it is not necessary; something to think about anyway.

So I switched to a parameter that allows overriding the default. Currently it is only used for TLS calls, but we could use it for other cases.

This is basically the reason why I am passing around a parameter.
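The override can be illustrated with a stripped-down emitter. This is a hypothetical sketch, not the real `emitter::emitIns_Call`: `MethodHandle`, `CallInstr`, `isNoGCHelper` (a stand-in for `emitNoGChelper`) and `emitCall` are illustrative names. It mirrors the expression from the diff above, with the parameter under its final name, `noSafePoint`. Because the opt-out is an explicit flag rather than a property of the handle, it works the same whether the handle is a dummy marker (xarch) or a real address (arm64).

```cpp
#include <cstdint>

// Hypothetical stand-ins; these are not the real JIT emitter types.
using MethodHandle = std::uintptr_t;

// Stand-in for emitNoGChelper(): in this sketch, handle 42 plays the role
// of a helper that is known never to need a safe point (purely illustrative).
bool isNoGCHelper(MethodHandle methHnd) {
    return methHnd == 42;
}

struct CallInstr {
    MethodHandle target;
    bool isNoGC; // true => the call site is not recorded as a GC safe point
};

// The new trailing parameter defaults to false, so existing call sites keep
// the default "every call is a safe point" behavior without any changes.
CallInstr emitCall(MethodHandle methHnd, bool isJump = false, bool noSafePoint = false) {
    CallInstr id{methHnd, false};
    // For the purpose of GC safe-pointing, tail calls are not real calls;
    // noSafePoint lets a caller opt out explicitly, independent of the handle value.
    id.isNoGC = isJump || noSafePoint || isNoGCHelper(methHnd);
    return id;
}
```

An ordinary call such as `emitCall(0x1000)` remains a safe point, while a TLS call site would pass `noSafePoint = true`. A future opt-out (for example, plain native calls) would use the same flag without any changes to handle-based helper checks.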

VSadov · 2024-05-16T02:08:52Z

LGTM. The TLS inlining change on arm64 has been enabled for a few months now. Wondering if we tracked what triggered this failure now?

Good question. #95565 made safe points interruptible and that triggered the failure.
Prior to that, a safe point could not be "activated" unless we did a stack walk through it, which could not happen for this one since the callee is native. Now, if we catch a thread at a safe point, we can do a stack walk, so the safe point can be "activated".

Note that the presence of a safe point by itself is not a problem here. It is a well-formed safe point with correct GC info, so we could do stack walks. The problem is that the optimizer does not know that the TLS access may introduce a safe point, and it leaves GC locals observably uninitialized.
So the choice was: either teach the optimizer about TLS calls, or just not produce a safe point, since a native call does not need one. Not emitting a safe point is simpler.

Another question to ask: why was this not a problem in fully interruptible methods even before #95565, since we could start a stack walk at the same call site?
That is because the optimizer does not optimize away the initialization of GC locals in fully interruptible code.
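The explanation in the last few paragraphs reduces to a small boolean condition. The sketch below is a hypothetical model of that reasoning only (`MethodConfig` and the three functions are illustrative names, not runtime or JIT code). The failure needs two things at once: a stack walk that can start at the TLS call site, and a prolog in which zero-init of `tlsRoot` was elided.

```cpp
// Hypothetical, simplified model of the reasoning above; not runtime/JIT code.
struct MethodConfig {
    bool fullyInterruptible;      // the GC may stop the thread at (almost) any instruction
    bool tlsCallIsSafePoint;      // true before this fix, false after it
    bool safePointsInterruptible; // true once a thread can be caught at a safe point (#95565)
};

// Can a stack walk start while the thread is at the TLS call site?
bool canStackWalkAtTlsCall(const MethodConfig& m) {
    if (m.fullyInterruptible) {
        return true; // fully interruptible code can be walked at the call site
    }
    // Partially interruptible: only safe points are reportable, and before #95565
    // a safe point was "activated" only by walking through the callee,
    // which never happens for a native TLS helper.
    return m.tlsCallIsSafePoint && m.safePointsInterruptible;
}

// Does the optimizer skip zero-initializing the GC temp in the prolog?
bool zeroInitElided(const MethodConfig& m) {
    // Per the explanation above, GC-local initialization is not optimized
    // in fully interruptible code.
    return !m.fullyInterruptible;
}

// The failure: a stack walk at the call site reports a never-initialized GC temp.
bool uninitializedLocalReported(const MethodConfig& m) {
    return canStackWalkAtTlsCall(m) && zeroInitElided(m);
}
```

Evaluating the model: a partially interruptible method with the TLS call as a safe point is safe before #95565 and fails after it. Clearing `tlsCallIsSafePoint` (this fix) removes the failure. Fully interruptible methods never hit it, because their zero-init is kept.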

VSadov · 2024-05-16T03:22:01Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2024-05-16T03:22:10Z

Azure Pipelines successfully started running 1 pipeline(s).

VSadov · 2024-05-16T03:45:04Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2024-05-16T03:45:13Z

Azure Pipelines successfully started running 1 pipeline(s).

VSadov · 2024-05-16T16:19:26Z

Thanks!

… runtime. (dotnet#102237)
* Do not emit safe point for TLS_GET_ADDR calls into native runtime.
* formatting
* renamed parameter to `noSafePoint`, added comments.
* clang formatting, bane of my existence

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 15, 2024

dotnet-policy-service bot assigned VSadov May 15, 2024

Do not emit safe point for TLS_GET_ADDR calls into native runtime.

fe8be02

VSadov force-pushed the fix102140 branch from 1ded207 to fe8be02 Compare May 15, 2024 13:47

formatting

151c10f

VSadov marked this pull request as ready for review May 15, 2024 17:29

VSadov requested a review from kunalspathak May 15, 2024 17:29

kunalspathak approved these changes May 15, 2024


build-analysis bot mentioned this pull request May 15, 2024

NativeAOT legs timing out in CI #102239

Closed

renamed parameter to noSafePoint, added comments.

e9dae3e

clang formatting, bane of my existence

dcc593a

build-analysis bot mentioned this pull request May 16, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

2 tasks

VSadov merged commit 13d753c into dotnet:main May 16, 2024
127 of 135 checks passed

VSadov deleted the fix102140 branch May 16, 2024 16:19

github-actions bot locked and limited conversation to collaborators Jun 16, 2024
