-
Notifications
You must be signed in to change notification settings - Fork 5k
System.IO.Pipelines.Pipe.ReadAsync fails to cancel on CancellationToken #58909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This is seriously impacting our pipelines. We've yet to catch this on a local dev machine, but several of our pipelines fail mulltiple times daily because of this bug. One in particular hangs in 75% of runs! Please look into this soon. I wonder if there's some kind of race condition in or around this code: runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Lines 47 to 61 in 033b2ac
|
Anything that updates the PipeAwaitable should be in a lock. The only thing that could change state here is the cancellationToken itself (since the caller owns that). The registration is released here
It seems you're also using the sync context. I dunno if that's related but does it repro if you disable it? |
This looked interesting, so I did a little digging and here is what I found: runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Lines 46 to 60 in 57bfe47
If the cancellation token gets cancelled just before executing line runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Line 58 in 57bfe47
then the callback will be invoked synchronously. The callback is actually: runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/Pipe.cs Lines 1043 to 1051 in 033b2ac
which calls runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Lines 139 to 150 in 57bfe47
However if we read carefully in runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Lines 21 to 23 in 57bfe47
It uses the runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Line 58 in 57bfe47
The summary is that //cc @davidfowl |
@zlatanov amazing analysis! Inline execution of the callbacks is always a pain to deal with. Also, we use the registration to reduce the size of the PipeAwaitable. |
@davidfowl In order to keep the use of the registration an easy solution would be to modify the callback: runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/Pipe.cs Lines 1114 to 1122 in 57bfe47
to have something like this at the beginning: if ( Monitor.IsEntered(SyncObj))
{
// We have synchronous callback from UnsafeRegister, but because we use the result from it
// we can't process it synchronously.
ThreadPool.UnsafeQueueUserWorkItem(s_signalReaderAwaitable, this);
return;
}
//... the rest of the method as is |
I'm wondering if there's any downside is removing the IsCancellationRequested check in |
Somebody left a pretty telling comment there: |
I think it's dangerous to add it there, because it only hides the issue. In the future if The issue here is that the callback is fired synchronously and in that code's execution path, some method is using the The best solution would be to be able to set a dummy |
I'm not concerned about code refactoring. We'll add some heavy comments to explain why it's important. I'd like to keep the fix as localized to PipeAwaitable as possible.
The simplest solution is to always store both the token and the registration and make the |
True, but what I don't like about this, is the fact that the callback has already run and could potentially have already changed the state of the awaitable. |
I'm now not sure how this happens on .NET Framework since the cancellation token is assigned before registering for the callback... |
How sure are we that the pipeline uses the correct dll? I can't open the build pipeline at the moment to try and download the artifacts for .NET Framework and use reflection to make sure the we're not chasing ghosts. |
@AArnott ? |
I can confirm from the dump that the System.IO.Pipelines.dll loaded in the process is 5.0.1+b02e13abab3d6f499751af2ad1ae6333fb4e633d.
The dump file isn't of the .NET Framework. It's a .NET Core process (coreclr.dll 4,700,21,35901 @Commit: 2aee3eb10b6303dff09cbe5ac99b6fae4644f346). If the bug you found that can be blamed for the cause of the hang in this dump can't explain a failure on .NET Framework then I will find a .NET Framework dump as well that we can investigate. |
This still repros after the fix is applied (albeit less frequently). As evidenced by the Here are my notes from windbg and VS investigation of the hang dump:
The source code for that V1Formatter.ReadFrameAsync at the 0th await is ultimately a call to PipeReaderStream.ReadAsyncInternal as can be seen in the VS visualization of the async state machines: And as you can see from the windbg notes, both of those ReadFrameAsync calls have already been canceled. |
@AArnott I can't download the artifact from either source. It seems both are internal to Microsoft organization only. Can you share it some other way so I can access it? |
I'm afraid not in this case. The original dump with which I created this issue came from an open source repo, but this new one comes from a private one. If I can repro on the OSS one, I'll share that publicly. |
I double checked the fix by @davidfowl and it seems solid. However then I decided to look into runtime/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeAwaitable.cs Lines 59 to 69 in 467f5c6
However after reading the source code in CancellationTokenSource, this seems to not always be the case: runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Lines 585 to 594 in 57bfe47
|
Doesn't it just assume that if default is returned the callback has been invoked? That holds. |
You're right. Can we consider another case then. When signaling the runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Lines 602 to 605 in 57bfe47
which sets the state to runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Lines 630 to 638 in 57bfe47
which resets the runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Line 512 in 57bfe47
Then the registering thread will try to create new registrations object. Everything is fine until these lines in the runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Lines 591 to 594 in 57bfe47
and more specifically runtime/src/libraries/System.Private.CoreLib/src/System/Threading/CancellationTokenSource.cs Line 69 in 57bfe47
Is it possible that the I specifically ask because of this note in the documentation here: On a multiprocessor system, a volatile read operation does not guarantee
to obtain the latest value written to that memory location by any processor. |
In your sequence of events, we're only at this point because the thread read a null _registrations. That read is a full fence and can't be reordered with the subsequent read you cite of _state. _state was written by the other thread with a full fence prior to nulling out _registrations, also with a full fence. The reordering described isn't possible then. |
I took a look at this and it looks like you're using pre CancellationTokenSource rewrite when the My conclusion: since the read isn't canceled and the pipe isn't completed then the hang is expected and points to a bug in the test or application code. |
Thank you, @BrennanConroy. I'll take another look at the dump with that insight. |
Ok, I found a cancellation bug in Nerdbank.Streams this time around (dotnet/Nerdbank.Streams#414). So till evidence suggests otherwise, I will consider this bug fixed with the original PR. |
Uh oh!
There was an error while loading. Please reload this page.
Description
I have a hang dump of a process that is stuck in a
PipeReader.ReadAsync
call that was canceled viaCancellationToken
but which never canceled the returnedValueTask<ReadResult>
. This resulted in the hang.I do not have a reliable repro of it but I have a rash of pipelines and repos that all suffer from frequent hangs in test runs and ultimately they all lead into this same function so I believe they are all caused by this one problem.
Dump file. Failed pipeline.
Looking at the dump file in WinDbg I took these notes, which form a kind of continuation chain mixed with some state in each of the objects. Below you can see that the CancellationToken that was passed to
PipeReader.ReadAsync
is in fact cancelled, but that thePipe._readerAwaitable
field has noCancellationTokenRegistration
for it. You can also see thatReadAsync
has not completed, so we're blocked waiting forReadAsync
to honor theCancellationToken
yet it never will because it wasn't even registered to notice.Configuration
Windows 10
The hangs show up on .NET Framework and .NET Core 3.1 at least.
x64 process
System.IO.Pipelines v5.00.120.57516 (nuget package version v5.0.1)
Regression?
Unknown.
Other information
This bug was originally filed against Nerdbank.Streams as dotnet/Nerdbank.Streams#387 before I tracked it into System.IO.Pipelines.
The text was updated successfully, but these errors were encountered: