-
Notifications
You must be signed in to change notification settings - Fork 5k
[NativeAOT] Follow up changes for the suspension loop routine. #73741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think this can be reviewed/merged. We can tune this a bit further, but we may get more improvements from fixing issues like in #73655, which is v.Next |
I've instrumented the code while developing to report suspension timings and looked at what reported by library tests. The latency results are noisy, but there was nothing extraordinary before or after the changes. |
I've also looked at timings reported by the following benchmark. using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;
class Program
{
static object[] arr1 = new Exception[10000];
enum E1
{
a
}
static object box = E1.a;
static void DoSomeWork()
{
var a1 = arr1;
ArgumentNullException val = new ArgumentNullException();
for (; ; )
{
Assign(a1, 1, val);
Assign(a1, 2, val);
Assign(a1, 3, val);
Assign(a1, 4, val);
Assign(a1, 5, val);
Assign(a1, 6, val);
Assign(a1, 7, val);
}
}
static bool doPoll;
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Assign(object[] a, int i, ArgumentNullException val)
{
if (val != null)
{
a[i] = val;
}
if (doPoll)
{
// enum->int unbox will perform a GC-polling FCALL (easy case for suspension)
// int unboxed = (int)box;
// unboxing is not an FCall in NativeAOT and does not poll. Hard to find benign operation that polls.
// use PInvoke instead as an "easy for suspension" case
Thread.Yield();
}
a[i] = null;
}
static void Main()
{
for (int i = 0; i < Environment.ProcessorCount - 1; i++)
{
new Thread(() =>
{
DoSomeWork();
}).Start();
}
Thread.Sleep(10);
for (; ; )
{
long longest = 0;
int iterations = 0;
doPoll ^= true;
// do 5sec of suspension/resume
var sw = Stopwatch.StartNew();
while (sw.ElapsedMilliseconds < 5000)
{
long oneIter = sw.ElapsedMilliseconds;
// Suspend/Resume EE
GC.GetTotalAllocatedBytes(precise: true);
oneIter = sw.ElapsedMilliseconds - oneIter;
longest = Math.Max(longest, oneIter);
iterations++;
// Let other threads actually resume
Thread.Yield();
}
sw.Stop();
{
Console.WriteLine($"frequent GC-polling : {doPoll}");
Console.WriteLine($"#suspensions in 5 sec. : {iterations}");
Console.WriteLine($"Average suspension took : {(double)sw.ElapsedMilliseconds / iterations}");
Console.WriteLine($"longest suspend was (msec): {longest}");
Console.WriteLine();
}
}
}
} |
Typical results on Windows:
|
Typical results on Unix (WSL2 Ubuntu 18.04 on the same machine)
|
|
Do you think we can refactor the thread suspension such that majority of the code is shared between NativeAOT and CoreCLR? It would be the ultimate place to be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
The main differences are call outs to hijack routines, detection if a thread is “suspended” and interaction with debugger and maybe threadabort support. |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
Now I see a bunch of cryptography and COM Interop failures. |
Linux and Linux musl both fail with:
|
Windows (both x64 and arm64):
or
|
Exists in the baseline - ignore. This API is trimming unsafe and it looks like it ran out of luck. We'll annotate it like that.
These are new tests from #72051 that just merged. We'll disable them. The feature doesn't work with trimming or AOT. |
I've created a PR with a trivial noop change #73856. |
Thanks!!! |
Last item for #67805
Turns out polling, while simpler, works better than CoreCLR-style event handshake, so I am keeping the pure polling style, while making it a bit more robust - added an observe-only pass, set a limit for per-iteration wait, avoid hijacking already redirected threads, etc. Otherwise it is the same algorithm as before.
In CoreCLR implementation, user threads signal an event as they suspend and suspending thread waits on the event with a 1 msec timeout. Porting that here resulted in regressions.
The problem is when we do timeout, and that happens often enough as some threads may not be hijacked on a first try, - that results in 1 msec delay, which is considerable, and on some OSes 15 msec, which is really a lot.
It is probably amplified by the fact that in NativeAOT synchronous GC polls are relatively rare, unless a thread does a lot of allocations or pinvokes.
We should reconsider using the event pattern in CoreCLR as well. It is questionable if the complexity adds any benefits.