[NativeAOT] ConcurrentDictionary is slower #68891
Comments
Yes, this looks very much like a result of #67805. I also wonder why ConcurrentDictionary is tested with workstation GC.
By default BDN does not enforce any GC settings, so the defaults are used.
It would be more realistic to test ConcurrentDictionary with server GC, so that concurrency is not limited by single-threaded GC. Anyway, the most likely problem for this regression has been fixed for Windows as of #70769. Thanks for raising this issue!
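For reference, a minimal sketch of how Server GC could be opted into for a BenchmarkDotNet run (the config class name here is illustrative, not from the original benchmark):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Hypothetical config: run the benchmark job with Server GC so the
// ConcurrentDictionary measurements are not bottlenecked by workstation GC.
public class ServerGcConfig : ManualConfig
{
    public ServerGcConfig()
    {
        AddJob(Job.Default.WithGcServer(true));
    }
}
```

A benchmark class would then reference it with `[Config(typeof(ServerGcConfig))]`.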
It might have helped a little, but the NativeAOT profile shows that we're spending a lot of time in infrastructure around Monitor.Enter/Exit (e.g. DeadEntryCollector.Finalize). Taking a lock on an object is a bit more expensive on NativeAOT than in CoreCLR, especially if we then quickly throw away the objects we were locking on and make new ones. The trace seems to be dominated by the cost of calling Monitor.Enter on an object for the first time, and of discarding the locking information we keep for objects that were collected. The benchmark doesn't seem very real-world, in the sense that I wouldn't expect ConcurrentDictionaries to be used for such short periods of time that the cost of first/last using them dominates.
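As an illustration of the pattern described above (not the actual benchmark code), repeatedly locking a freshly allocated object exercises exactly the paths the trace points at: the first Monitor.Enter on an object, and later reclaiming lock bookkeeping for objects that have been collected.

```csharp
// Illustrative only: each iteration locks a brand-new object and immediately
// discards it, so the runtime keeps paying the "first lock on this object" cost
// and later has to clean up the lock data associated with the dead objects.
for (int i = 0; i < 100_000; i++)
{
    var gate = new object();
    lock (gate)
    {
        // trivial critical section
    }
}
```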
I've used the latest bits and it turned out that it's now slower. The results I got for ILCompiler:
-------------------- Histogram --------------------
[290.537 us ; 300.306 us) | @
[300.306 us ; 312.848 us) | @@@@
[312.848 us ; 324.572 us) | @@@@@@@@
[324.572 us ; 336.464 us) | @@@@@@
[336.464 us ; 347.056 us) | @
---------------------------------------------------
BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.739/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.7.22323.2
[Host] : .NET 7.0.0 (7.0.22.32108), X64 RyuJIT
Job-ELUXMM : .NET 7.0.0 (7.0.22.32108), X64 RyuJIT
Job-EEOWYD : .NET 7.0.0-preview.5.22254.9, X64 NativeAOT
Latest bits:
-------------------- Histogram --------------------
[317.286 us ; 387.411 us) | @@@
[387.411 us ; 441.485 us) |
[441.485 us ; 501.692 us) | @@
[501.692 us ; 571.816 us) | @@@@@@@@@@@@@@@
---------------------------------------------------
BenchmarkDotNet=v0.13.1.1799-nightly, OS=Windows 11 (10.0.22000.739/21H2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=7.0.100-preview.7.22323.19
[Host] : .NET 7.0.0 (7.0.22.32207), X64 RyuJIT
Job-SXLBJF : .NET 7.0.0 (7.0.22.32207), X64 RyuJIT
Job-NTQTOB : .NET 7.0.0-preview.6.22323.6, X64 NativeAOT
This would explain the slowness that I have observed in other, simpler benchmarks.
The new round of tests is on a machine with 2x the cores compared to the original. Maybe that made the situation with locks a bit worse. Also, while GC suspension in NativeAOT on Windows should be functionally on par with CoreCLR (i.e. unlikely to pause/hang), it misses some optimizations. I wonder if that is important. We plan to implement those, but reliability is the first priority. The scenario does look like it is impacted by locks a lot. ConcurrentDictionary uses locks internally and may dynamically add locks as needed. I looked at the NativeAOT implementation of object locks and it is relatively heavy on the lock-creation path. I have some ideas for how that could be improved, but I'm not sure how much that would help.
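As a possible mitigation on the benchmark side (an assumption, not something verified against this trace), the number of internal locks can be fixed up front via the ConcurrentDictionary constructor, so no locks need to be added dynamically during the measurement:

```csharp
using System;
using System.Collections.Concurrent;

// concurrencyLevel pins the number of internal locks; capacity avoids resizes.
// The values here are illustrative.
var dict = new ConcurrentDictionary<int, int>(
    concurrencyLevel: Environment.ProcessorCount,
    capacity: 512);
```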
Most of the ConcurrentDictionary micro benchmarks are a few times slower compared to .NET. Examples:
System.Collections.CreateAddAndClear.ConcurrentDictionary(Size: 512)
System.Collections.CtorFromCollection.ConcurrentDictionary(Size: 512)
Microsoft.Extensions.Caching.Memory.Tests.MemoryCacheTests.AddThenRemove_ExpirationTokens
Microsoft.Extensions.Caching.Memory.Tests.MemoryCacheTests.AddThenRemove_AbsoluteExpiration
Repro:
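A minimal standalone sketch (assuming the same shape as the dotnet/performance CreateAddAndClear benchmark; not the original repro code) that can be run under both JIT and NativeAOT with BenchmarkDotNet:

```csharp
using System.Collections.Concurrent;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ConcurrentDictionaryBench
{
    [Params(512)]
    public int Size;

    // Construct, fill, and clear the dictionary within a single iteration, so
    // the setup/teardown costs (including taking the internal locks for the
    // first time) are part of what gets measured.
    [Benchmark]
    public ConcurrentDictionary<int, int> CreateAddAndClear()
    {
        var dict = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < Size; i++)
            dict.TryAdd(i, i);
        dict.Clear();
        return dict;
    }
}

public class Program
{
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}
```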
I took a quick look at numbers reported by VTune and it seems that it might be caused by #67805, but I am not 100% sure so I am reporting a new issue (locking itself might just be slower).
cc @jkotas @MichalStrehovsky