Skip to content

[NativeAOT] Follow up changes for the suspension loop routine. #73741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 12, 2022

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Aug 11, 2022

Last item for #67805

Turns out polling, while simpler, works better than CoreCLR-style event handshake, so I am keeping the pure polling style, while making it a bit more robust - added an observe-only pass, set a limit for per-iteration wait, avoid hijacking already redirected threads, etc. Otherwise it is the same algorithm as before.

In CoreCLR implementation, user threads signal an event as they suspend and suspending thread waits on the event with a 1 msec timeout. Porting that here resulted in regressions.
The problem is when we do timeout, and that happens often enough as some threads may not be hijacked on a first try, - that results in 1 msec delay, which is considerable, and on some OSes 15 msec, which is really a lot.
It is probably amplified by the fact that in NativeAOT synchronous GC polls are relatively rare, unless a thread does a lot of allocations or pinvokes.

We should reconsider using the event pattern in CoreCLR as well. It is questionable if the complexity adds any benefits.

@ghost ghost assigned VSadov Aug 11, 2022
@VSadov VSadov marked this pull request as ready for review August 11, 2022 06:33
@dotnet dotnet deleted a comment from azure-pipelines bot Aug 11, 2022
@VSadov VSadov requested review from jkotas and janvorli August 11, 2022 14:26
@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

I think this can be reviewed/merged.

We can tune this a bit further, but we may get more improvements from fixing issues like in #73655, which is v.Next

@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

I've instrumented the code while developing to report suspension timings and looked at what reported by library tests. The latency results are noisy, but there was nothing extraordinary before or after the changes.

@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

I've also looked at timings reported by the following benchmark.
It measures 2 cases - trivial when threads actively poll and "hard" - when threads do not poll and need to be hijacked.

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

class Program
{
    static object[] arr1 = new Exception[10000];

    enum E1
    {
        a
    }

    static object box = E1.a;

    static void DoSomeWork()
    {
        var a1 = arr1;

        ArgumentNullException val = new ArgumentNullException();
        for (; ; )
        {
            Assign(a1, 1, val);
            Assign(a1, 2, val);
            Assign(a1, 3, val);
            Assign(a1, 4, val);
            Assign(a1, 5, val);
            Assign(a1, 6, val);
            Assign(a1, 7, val);
        }
    }

    static bool doPoll;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Assign(object[] a, int i, ArgumentNullException val)
    {
        if (val != null)
        {
            a[i] = val;
        }

        if (doPoll)
        {
            // enum->int  unbox will perform a GC-polling FCALL   (easy case for suspension)
            // int unboxed = (int)box;

            // unboxing is not an FCall in NativeAOT and does not poll. Hard to find benign operation that polls.
            // use PInvoke instead as an "easy for suspension" case
            Thread.Yield();
        }

        a[i] = null;
    }

    static void Main()
    {
        for (int i = 0; i < Environment.ProcessorCount - 1; i++)
        {
            new Thread(() =>
            {
                DoSomeWork();
            }).Start();
        }

        Thread.Sleep(10);

        for (; ; )
        {
            long longest = 0;
            int iterations = 0;
            doPoll ^= true;

            // do 5sec of suspension/resume
            var sw = Stopwatch.StartNew();
            while (sw.ElapsedMilliseconds < 5000)
            {
                long oneIter = sw.ElapsedMilliseconds;

                // Suspend/Resume EE
                GC.GetTotalAllocatedBytes(precise: true);

                oneIter = sw.ElapsedMilliseconds - oneIter;

                longest = Math.Max(longest, oneIter);
                iterations++;

                // Let other threads actually resume
                Thread.Yield();
            }
            sw.Stop();

            {
                Console.WriteLine($"frequent GC-polling        : {doPoll}");
                Console.WriteLine($"#suspensions in 5 sec.     : {iterations}");
                Console.WriteLine($"Average suspension took    : {(double)sw.ElapsedMilliseconds / iterations}");
                Console.WriteLine($"longest suspend  was (msec): {longest}");
                Console.WriteLine();
            }
        }
    }
}

@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

Typical results on Windows:

========  baseline (before the change)

frequent GC-polling        : True
#suspensions in 5 sec.     : 110773
Average suspension took    : 0.045137352965072715
longest suspend  was (msec): 1

frequent GC-polling        : False
#suspensions in 5 sec.     : 3891
Average suspension took    : 1.285016705217168
longest suspend  was (msec): 15

=========  new (after the change)

frequent GC-polling        : True
#suspensions in 5 sec.     : 113079
Average suspension took    : 0.044216874928147576
longest suspend  was (msec): 1

frequent GC-polling        : False
#suspensions in 5 sec.     : 4287
Average suspension took    : 1.166316771635176
longest suspend  was (msec): 5


========= CoreCLR

frequent GC-polling        : True
#suspensions in 5 sec.     : 110626
Average suspension took    : 0.04519733154954531
longest suspend  was (msec): 1

frequent GC-polling        : False
#suspensions in 5 sec.     : 1181
Average suspension took    : 4.239627434377646
longest suspend  was (msec): 61

@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

Typical results on Unix (WSL2 Ubuntu 18.04 on the same machine)

========  baseline (before the change)

frequent GC-polling        : True
#suspensions in 5 sec.     : 89629
Average suspension took    : 0.055785515848665056
longest suspend  was (msec): 2

frequent GC-polling        : False
#suspensions in 5 sec.     : 38184
Average suspension took    : 0.13094489838675885
longest suspend  was (msec): 2


========== new (after the change)

frequent GC-polling        : True
#suspensions in 5 sec.     : 90128
Average suspension took    : 0.05547665542339784
longest suspend  was (msec): 2

frequent GC-polling        : False
#suspensions in 5 sec.     : 57635
Average suspension took    : 0.08675284115554785
longest suspend  was (msec): 2

========== CoreCLR

frequent GC-polling        : True
#suspensions in 5 sec.     : 130544
Average suspension took    : 0.03830126240960902
longest suspend  was (msec): 2

frequent GC-polling        : False
#suspensions in 5 sec.     : 5045
Average suspension took    : 0.9916749256689792
longest suspend  was (msec): 13

@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

  • The change is a slight improvement, but does not change much overall. That is expected as it is roughly the same approach.
  • CoreCLR is slightly faster in "easy" case. This is an ideal case and small differences are not very important as long as the results are not abnormally bad.
  • CoreCLR is slower in "harder" case, likely due to pauses introduced by event timeouts. We can look at that in v.Next on the CoreCLR side
  • Linux is faster in the "harder" case. That is likely because hijacks run in signal handlers and thus in parallel to each other. We see that in CoreCLR as well.

@jkotas
Copy link
Member

jkotas commented Aug 11, 2022

We can look at that in v.Next on the CoreCLR side

Do you think we can refactor the thread suspension such that majority of the code is shared between NativeAOT and CoreCLR? It would be the ultimate place to be.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@VSadov
Copy link
Member Author

VSadov commented Aug 11, 2022

Do you think we can refactor the thread suspension such that majority of the code is shared between NativeAOT and CoreCLR?

The main differences are call outs to hijack routines, detection if a thread is “suspended” and interaction with debugger and maybe threadabort support.
I think that could be tricky to abstract out, but may be doable..

@VSadov
Copy link
Member Author

VSadov commented Aug 12, 2022

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Aug 12, 2022

Now I see a bunch of cryptography and COM Interop failures.

@VSadov
Copy link
Member Author

VSadov commented Aug 12, 2022

Linux and Linux musl both fail with:

System.NullReferenceException : Object reference not set to an instance of an object.
   at System.Security.Cryptography.PasswordDeriveBytes.ComputeBaseValue() + 0x14
   at System.Security.Cryptography.PasswordDeriveBytes.GetBytes(Int32) + 0x3a
   at System.Security.Cryptography.DeriveBytesTests.PasswordDeriveBytesTests.TestKnownValue_GetBytes(HashAlgorithmName, String, Byte[], Int32, Byte[]) + 0xea
   at System.Security.Cryptography.Csp!<BaseAddress>+0xe46df3
   at System.Reflection.DynamicInvokeInfo.Invoke(Object, IntPtr, Object[], BinderBundle, Boolean) + 0xfd

@VSadov
Copy link
Member Author

VSadov commented Aug 12, 2022

Windows (both x64 and arm64):
A bunch of failures like

System.NotSupportedException : COM Interop requires ComWrapper instance registered for marshalling.
   at System.Runtime.InteropServices.ComWrappers.ComObjectForInterface(IntPtr) + 0x64
   at System.Transactions.DtcProxyShim.DtcProxyShimFactory.DtcGetTransactionManagerExW(String, String, Guid&, Int32, Object, ITransactionDispenser&) + 0x130
   at System.Transactions.DtcProxyShim.DtcProxyShimFactory.ConnectToProxy(String, Guid, Object, Boolean&, Byte[]&, Re

or

[FAIL] System.Transactions.Tests.OleTxTests.Volatile_and_durable_enlistments(volatileCount: 1)
Assert.Equal() Failure
Expected: Aborted
Actual:   Committed
   at System.Transactions.Tests.TestEnlistment.Rollback(Enlistment) + 0x47
   at System.Transactions.VolatileEnlistmentAborting.EnterState(InternalEnlistment) + 0xbe
   at System.Transactions.TransactionStateAborted.EnterState(InternalTransaction) + 0x9f
   at System.Transactions.TransactionStatePromoted.EnterState(InternalTransaction) + 0x31f
   at System.Transactions.EnlistableStates.EnlistDurable(InternalTransaction, Guid, IEnlistmentNotification, Enlis

@MichalStrehovsky
Copy link
Member

System.NullReferenceException : Object reference not set to an instance of an object.

Exists in the baseline - ignore. This API is trimming unsafe and it looks like it ran out of luck. We'll annotate it like that.

System.Transactions

These are new tests from #72051 that just merged. We'll disable them. The feature doesn't work with trimming or AOT.

@VSadov
Copy link
Member Author

VSadov commented Aug 12, 2022

I've created a PR with a trivial noop change #73856.
I do not think I have seen a failure in this PR that was not also in the noop PR, so I assume this PR does not introduce new failures and can be merged.

@VSadov VSadov merged commit 2369156 into dotnet:main Aug 12, 2022
@VSadov VSadov deleted the suspLoop branch August 12, 2022 23:37
@VSadov
Copy link
Member Author

VSadov commented Aug 12, 2022

Thanks!!!

@VSadov VSadov mentioned this pull request Aug 13, 2022
9 tasks
@ghost ghost locked as resolved and limited conversation to collaborators Sep 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants