Skip to content

Send connection WINDOW_UPDATE before RTT PING #97881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 13, 2024

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Feb 2, 2024

Fixes #97131.

Some servers in GCP have a non-standard ping accounting mechanism, meaning they reset their unsolicited PING counter when they receive DATA, HEADERS or WINDOW_UPDATE. To comply with this behavior, this PR adjusts the RTT logic ensuring that a connection WINDOW_UPDATE is being sent out before we send an RTT PING by applying two changes:

  • Swap the order of the normal connection WINDOW_UPDATE we attempt after every DATA frame received and the sending of the RTT PING so WINDOW_UPDATE goes first.
  • In case we need to send an RTT PING, and we didn't send a normal connection WINDOW_UPDATE (threshold not reached), send a WINDOW_UPDATE anyways with current _pendingWindowUpdate credits. Do not send a PING if this is not possible (_pendingWindowUpdate == 0). Just like RTT PINGs, such WINDOW_UPDATEs are relatively rare and should not hurt performance.

Ideally, this fix should be backported to all supported versions.

@ghost ghost added the area-System.Net.Http label Feb 2, 2024
@ghost ghost assigned antonfirsov Feb 2, 2024
@ghost
Copy link

ghost commented Feb 2, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #97131.

Some servers in GCP have a non-standard ping accounting mechanism, meaning they reset their unsolicited PING counter when receiving DATA, HEADERS or WINDOW_UPDATE. To comply with this behavior, this PR adjusts the RTT logic ensuring that a connection WINDOW_UPDATE is being sent out before we send an RTT PING by doing two changes:

  • Swap the order of the normal connection WINDOW_UPDATE we attempt after every DATA frame received and the sending of the RTT PING
  • In case we need to send an RTT PING, and we didn't send a normal connection WINDOW_UPDATE (because of reaching the threshold), send a WINDOW_UPDATE anyways by sending out all the _pendingWindowUpdate credits. Do not send a PING if this is not possible (_pendingWindowUpdate == 0).

Ideally, this fix should be backported to all supported versions.

Author: antonfirsov
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

Comment on lines 1797 to 1800
if (_pendingWindowUpdate == 0) return false;
LogExceptions(SendWindowUpdateAsync(0, _pendingWindowUpdate));
_pendingWindowUpdate = 0;
return true;
Copy link
Member Author

@antonfirsov antonfirsov Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExtendWindow has unnecessary synchronization around _pendingWindowUpdate manipulation which is didn't add to this method. See #97878.

Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lite on unit tests. What about tests for scenarios that will and won't trigger a ping? And a test that fails if WINDOWS_UPDATE isn't present.

@antonfirsov
Copy link
Member Author

Pushed a test in e93920a.

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I don't fully understand the window & ping management but I don't have any concerns. Hopefully testing was done to ensure correctness.
And I guess excessive updates is the biggest risk of this change, right?

@antonfirsov
Copy link
Member Author

excessive updates is the biggest risk of this change, right?

The amount of WINDOW_UPDATE-s we send did not get significantly higher. Now we essentially do the same as the grpc-go client. Their code has been around for quite a while, and I'm not aware of any servers having an issue with this logic.

https://github.com/grpc/grpc-go/blob/d41b01db97ca2e3627b2c9949fffe8f152a4255d/internal/transport/http2_client.go#L1128-L1140

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antonfirsov

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Side question: as far as I understand our current logic, we'll keep sending pings once every ~2 seconds, but only update the RTT if it gets lower.
Are the updates after the initial burst meaningful? That is, we expect the RTT might decrease, but we don't care if it increases?

@antonfirsov
Copy link
Member Author

antonfirsov commented Feb 10, 2024

[Deleted my previous comment because it did not make sense.]

Adjusting RTT upwards would risk growing the window "accidentally" in case of an occasional high delay measurement:

// The condition to extend the window is:
// (_deliveredBytes / dt) * rtt > _streamWindowSize * _windowScaleThresholdMultiplier

With an unstable connection the minimum delay is hopefully constant in long run for most practical SocketsHttpHandler use cases, or at least this is the assumption behind the current strategy.

@antonfirsov
Copy link
Member Author

/azp run runtime-libraries-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@antonfirsov
Copy link
Member Author

CI failures are unrelated, the massive outerloop failures are caused by #96035.

@antonfirsov antonfirsov merged commit fae6720 into dotnet:main Feb 13, 2024
@antonfirsov
Copy link
Member Author

/backport to release/8.0-staging

Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/7892640557

@antonfirsov
Copy link
Member Author

/backport to release/7.0-staging

Copy link
Contributor

Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/7892642858

@antonfirsov
Copy link
Member Author

/backport to release/6.0-staging

Copy link
Contributor

Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/7892646182

@antonfirsov
Copy link
Member Author

/backport to release/8.0-staging

@github-actions github-actions bot unlocked this conversation Mar 19, 2025
Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/13948929374

Copy link
Contributor

@antonfirsov backporting to "release/8.0-staging" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: send a connection WINDOW_UPDATE before RTT PING
Using index info to reconstruct a base tree...
M	src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs
CONFLICT (content): Merge conflict in src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 send a connection WINDOW_UPDATE before RTT PING
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 19, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[8.0] HTTP/2 request to cloud hosted services is aborted by server with ENHANCE_YOUR_CALM
6 participants