-
Notifications
You must be signed in to change notification settings - Fork 5k
[NativeAOT] Do not emit safe point for TLS_GET_ADDR calls into native runtime. #102237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
all linux-arm64 jobs with NativeAOT have passed. I will run tests one more time to be sure. |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The TLS inlining change on arm64 has been enabled for few months now. Wondering if we tracked what triggered this failure now?
src/coreclr/jit/emitarm64.cpp
Outdated
@@ -8985,7 +8985,8 @@ void emitter::emitIns_Call(EmitCallType callType, | |||
regNumber xreg /* = REG_NA */, | |||
unsigned xmul /* = 0 */, | |||
ssize_t disp /* = 0 */, | |||
bool isJump /* = false */) | |||
bool isJump /* = false */, | |||
bool isNoGCframe /* = false */) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you include the new parameter in the method description? for this file and in other files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. A good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also not sure if isNoGCframe
is the best name. I could not come up with anything better at the time, but now I think noSafePoint
might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed the parameter noSafePoint
. I think it is better and easier to explain. I've also added comments about the parameter where we had any comments.
src/coreclr/jit/emitarm64.cpp
Outdated
@@ -9079,7 +9080,7 @@ void emitter::emitIns_Call(EmitCallType callType, | |||
emitThisByrefRegs = byrefRegs; | |||
|
|||
// for the purpose of GC safepointing tail-calls are not real calls | |||
id->idSetIsNoGC(isJump || emitNoGChelper(methHnd)); | |||
id->idSetIsNoGC(isJump || isNoGCframe || emitNoGChelper(methHnd)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing around isNoGCframe
, we could have added a check inside emitNoGChelper()
to ignore for TLS, but adding a check for TLS methHnd
might not solve the purpose, because for xarch we pass a dummy methodHandle == 1
and for arm64, it is the actual address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started with a solution where a special/fake JIT helper would be used to specify that we are dealing with a TLS call to the native runtime. In a way that is right - this is a JIT helper, just provided by the native runtime. That did not work though, since on arm64 the methodHandle
is not just a marker.
I also realized that we may want to opt-out more calls from being safeponts in the future. Not for correctness, but for "we do not need them" reason. We make nearly every call a safe point, but some calls would never be stackwalked.
Like native calls (not talking about entire pinvoke here, but the actual calls to native methods - it is ok if they are safepoints, but not necessary - a thing to think about anyways,..)
So I switched to a parameter that allows to override the default. Currently it would only be used for TLS calls, but we could use it for other cases.
This is basically the reason why I am passing around a parameter.
Good question. #95565 made safe points interruptible and that triggered the failure. Note that the presence of a safepoint by itself is not a problem here. It is a well-formed safepoint and has correct GC info, so we could do stack walks. The problem is that optimizer does not know that TLS access may introduce a safe point and leaves GC locals observably uninitialized. Another question to ask - Why was this not a problem in Fully-Interruptible methods even before #95565, since we could start a stack walk at the same call site? |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Thanks! |
… runtime. (dotnet#102237) * Do not emit safe point for TLS_GET_ADDR calls into native runtime. * formatting * renamed parameter to `noSafePoint`, added comments. * clang formatting, bane of my existence
Fixes: #102140
The reason for the failure is GC-reporting of uninitialized temp.
We declare a temp to hold the address to the managed TLS blob. The temp is initialized by indirecting into native TLS. In some cases fetching the native TLS involves a method call.
Roughly it looks like:
The optimizer assumes that if we did not see a safe point between prolog and the first assignment, then zero-initing is unnecessary. The problem is that optimizer does not know that TLS access may emit a call, and that call may introduce a safe point (as calls do by default).
A simplest fix would be to not emit safepoints for calls into native TLS. They cannot participate in GC stackwalks anyways.