-
Notifications
You must be signed in to change notification settings - Fork 5k
Reduce background GC allocations #114320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Reduce background GC allocations #114320
Conversation
@@ -96,22 +92,64 @@ internal sealed class GCMemoryInfoData | |||
/// </remarks> | |||
public readonly struct GCMemoryInfo | |||
{ | |||
private readonly GCMemoryInfoData _data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will change the atomicity of GCMemoryInfo
. Is there anything depending on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All uses in the repo just read a value from the struct and throw it away (except one ArrayTest that stores the struct in a static readonly field). Also, it's just an implementation detail, and the old implementation before #37879 did not wrap a class, so it would be unsafe for anyone else to have taken a dependency on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that the implementation was changed to use class internally to avoid large amount of copying. This is likely a performance regression for a typical GCMemoryInfo use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unlikely to affect performance for typical use cases (which is just reading some values and throwing it away). It should at least reduce GC pressure for typical use cases that always throw away the allocated object. @Maoni0 Maybe you have some more insight here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this change makes ReadOnlySpan<System.GCGenerationInfo> GenerationInfo
getters dangerous. These getters assume that the lifetime for the underlying storage is managed by the GC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that object stack allocation might be able to optimize at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that object stack allocation might be able to optimize at some point?
Possibly. cc @dotnet/jit-contrib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that object stack allocation might be able to optimize at some point?
Possibly. cc @dotnet/jit-contrib
Escape analysis doesn't handle objects with finalizers, since the runtime allocation helper for finalizable objects immediately causes an object reference to escape.
But it seems like if the JIT could prove that object was created within a try/finally with proper invocation of Dispose
in the finally, stack allocation might be possible. I believe with foreach
enumerations the enumerator constructor invocation is not typically within the try... will need to double-check.
In order to stack allocate we'd also need to ensure that the various enumerator methods were inlined. The methods here are fairly large so not clear if that would happen with current heuristics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AndyAyersMS What about the GC.GetGCMemoryInfo
, which allocates an object without a finalizer, but does send it to native code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it seems like if the JIT could prove that object was created within a try/finally with proper invocation of
Dispose
in the finally, stack allocation might be possible. I believe withforeach
enumerations the enumerator constructor invocation is not typically within the try... will need to double-check.
You're correct, the enumerator is retrieved outside of the try block. But why should it matter in .Net Core since it doesn't have thread aborts? Is there something else that could interrupt it between the enumerator constructor and the try?
Maybe Roslyn can change it so that it does retrieve inside of the try if it's necessary.
Also, would Dispose
need to call GC.SuppressFinalize
for the JIT proof? (Currently the CWT enumerator does not.)
As I have said in the issue, I am not sure whether it is worth it to try to eliminate some of these background allocations. |
} | ||
|
||
/// <summary>Provides an enumerator for the table that can be used on the stack without allocation.</summary> | ||
internal ref struct RefEnumerator : IEnumerator<KeyValuePair<TKey, TValue>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #53296 for a long discussion about why it is not the best idea to expose struct enumerator for ConditionalWeakTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see you have mentioned in a comment - but that does not make it a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm well aware of the dangers, which is why it's internal only. I left a lengthy comment about it on the EnumerateOnStack()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not interested in spreading more unsafe or otherwise dangerous code throughout the libraries. We believe we have too much of it and we are actively working on reducing it.
Adjust cpp impls to take struct pointers.
Fixes #101536
I tried to build this to test locally, but was unable to.
Changes:
ConditionalWeakTable
objects without allocations.GCMemoryInfoData
to remove allocations after the first call toGC.GetGCMemoryInfo()
.Both of these are used by
SharedArrayPool.Trim()
which is called from Gen2GcCallback.For 2, it looks like the
extern void GetMemoryInfo(GCMemoryInfoData data, int kind)
function could beextern void GetMemoryInfo(GCMemoryInfo* data, int kind)
, passing a struct pointer instead of class to be always zero alloc, but I don't know enough about the system or C++. @Maoni0cc @jkotas