Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Stop the PMTUD search at the interface MTU #2135

Merged
merged 35 commits into from
Dec 18, 2024

Conversation

larseggert
Copy link
Collaborator

@larseggert larseggert commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU, and only start from 1280 when that fails?

WIP

Should we optimistically *start* the search at the interface MTU, and
only start from 1280 when that fails?
Copy link

codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.30%. Comparing base (b070663) to head (2b85cfb).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2135   +/-   ##
=======================================
  Coverage   93.29%   93.30%           
=======================================
  Files         113      113           
  Lines       36709    36727   +18     
  Branches    36709    36727   +18     
=======================================
+ Hits        34248    34268   +20     
+ Misses       1678     1676    -2     
  Partials      783      783           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@larseggert larseggert marked this pull request as ready for review September 26, 2024 13:08
@mxinden
Copy link
Collaborator

mxinden commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU

Are there other projects using this optimistic approach?

If I understand RFC 8899 correctly the local interface MTU is the end value, not the start value.

The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).

https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2

Copy link

github-actions bot commented Sep 26, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to b070663.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@larseggert
Copy link
Collaborator Author

All true, but in practice, the local interface is most often the limiting hop.

Copy link

github-actions bot commented Sep 26, 2024

Benchmark results

Performance differences relative to ad8f390.

decode 4096 bytes, mask ff: 💚 Performance has improved.
       time:   [11.198 µs 11.242 µs 11.290 µs]
       change: [-5.9617% -5.5346% -5.0685%] (p = 0.00 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low severe
3 (3.00%) low mild
11 (11.00%) high severe

decode 1048576 bytes, mask ff: 💚 Performance has improved.
       time:   [2.8827 ms 2.8902 ms 2.8995 ms]
       change: [-7.4445% -7.0117% -6.5851%] (p = 0.00 < 0.05)

Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe

decode 4096 bytes, mask 7f: 💚 Performance has improved.
       time:   [19.516 µs 19.548 µs 19.588 µs]
       change: [-1.9813% -1.4895% -1.1080%] (p = 0.00 < 0.05)

Found 17 outliers among 100 measurements (17.00%)
2 (2.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
10 (10.00%) high severe

decode 1048576 bytes, mask 7f: 💚 Performance has improved.
       time:   [5.0715 ms 5.0842 ms 5.0987 ms]
       change: [-3.4917% -2.3386% -1.5961%] (p = 0.00 < 0.05)

Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) high severe

decode 4096 bytes, mask 3f: 💚 Performance has improved.
       time:   [5.5312 µs 5.5475 µs 5.5699 µs]
       change: [-21.078% -19.727% -18.519%] (p = 0.00 < 0.05)

Found 17 outliers among 100 measurements (17.00%)
7 (7.00%) low mild
2 (2.00%) high mild
8 (8.00%) high severe

decode 1048576 bytes, mask 3f: 💚 Performance has improved.
       time:   [1.4168 ms 1.4268 ms 1.4396 ms]
       change: [-19.583% -18.994% -18.313%] (p = 0.00 < 0.05)

Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [98.506 ns 98.819 ns 99.144 ns]
       change: [-0.4338% +0.0330% +0.4766%] (p = 0.89 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [116.82 ns 117.14 ns 117.49 ns]
       change: [-0.1888% +0.1344% +0.4678%] (p = 0.43 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
2 (2.00%) high mild
9 (9.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [116.63 ns 117.20 ns 117.86 ns]
       change: [-0.0128% +0.6115% +1.2425%] (p = 0.06 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) low severe
1 (1.00%) high mild
10 (10.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.408 ns 97.540 ns 97.683 ns]
       change: [+0.0501% +0.7347% +1.5543%] (p = 0.05 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.13 ms 111.27 ms 111.49 ms]
       change: [-1.0423% -0.7981% -0.5448%] (p = 0.00 < 0.05)

Found 17 outliers among 100 measurements (17.00%)
9 (9.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [5.5276 µs 5.6914 µs 5.8570 µs]
       change: [-15.302% -0.4349% +17.872%] (p = 0.94 > 0.05)

Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: 💔 Performance has regressed.
       time:   [42.866 ms 42.945 ms 43.024 ms]
       change: [+42.829% +45.519% +48.464%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/varying-seeds: 💔 Performance has regressed.
       time:   [43.121 ms 43.195 ms 43.268 ms]
       change: [+36.398% +38.723% +41.287%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed: 💔 Performance has regressed.
       time:   [42.670 ms 42.734 ms 42.806 ms]
       change: [+68.397% +72.050% +75.985%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-true/same-seed: 💔 Performance has regressed.
       time:   [42.933 ms 42.986 ms 43.039 ms]
       change: [+46.924% +51.065% +55.444%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [869.17 ms 878.56 ms 888.07 ms]
       thrpt:  [112.60 MiB/s 113.82 MiB/s 115.05 MiB/s]
change:
       time:   [-0.2526% +1.3795% +2.9237%] (p = 0.09 > 0.05)
       thrpt:  [-2.8406% -1.3607% +0.2533%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved.
       time:   [297.26 ms 299.20 ms 301.15 ms]
       thrpt:  [33.207 Kelem/s 33.423 Kelem/s 33.640 Kelem/s]
change:
       time:   [-7.0953% -5.9407% -4.8692%] (p = 0.00 < 0.05)
       thrpt:  [+5.1184% +6.3159% +7.6372%]

Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed.
       time:   [34.226 ms 34.430 ms 34.655 ms]
       thrpt:  [28.856  elem/s 29.044  elem/s 29.218  elem/s]
change:
       time:   [+1.4019% +2.2722% +3.1384%] (p = 0.00 < 0.05)
       thrpt:  [-3.0429% -2.2218% -1.3825%]

Found 17 outliers among 100 measurements (17.00%)
8 (8.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [1.6597 s 1.6757 s 1.6916 s]
       thrpt:  [59.115 MiB/s 59.676 MiB/s 60.250 MiB/s]
change:
       time:   [-0.6191% +0.6644% +2.0114%] (p = 0.34 > 0.05)
       thrpt:  [-1.9717% -0.6600% +0.6230%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing MTU Mean [ms] Min [ms] Max [ms]
gquiche gquiche 1504 522.0 ± 11.2 508.6 541.0
neqo gquiche reno on 1504 770.3 ± 11.3 750.5 784.1
neqo gquiche reno 1504 756.1 ± 14.0 738.4 786.8
neqo gquiche cubic on 1504 752.8 ± 7.1 742.8 761.5
neqo gquiche cubic 1504 750.6 ± 9.7 739.4 769.4
msquic msquic 1504 140.6 ± 64.3 95.0 335.7
neqo msquic reno on 1504 211.6 ± 10.8 198.2 230.4
neqo msquic reno 1504 214.1 ± 10.9 197.5 229.1
neqo msquic cubic on 1504 244.0 ± 50.8 210.4 364.2
neqo msquic cubic 1504 215.9 ± 9.8 199.0 227.0
gquiche neqo reno on 1504 690.6 ± 94.5 563.5 825.3
gquiche neqo reno 1504 708.8 ± 75.8 609.8 822.9
gquiche neqo cubic on 1504 671.9 ± 84.1 547.5 797.1
gquiche neqo cubic 1504 692.6 ± 92.7 566.1 827.8
msquic neqo reno on 1504 472.8 ± 8.3 462.4 485.5
msquic neqo reno 1504 493.7 ± 8.8 484.0 505.6
msquic neqo cubic on 1504 485.2 ± 11.6 465.4 510.8
msquic neqo cubic 1504 490.6 ± 48.0 457.9 623.9
neqo neqo reno on 1504 494.5 ± 22.3 459.7 530.2
neqo neqo reno 1504 473.2 ± 28.0 443.3 525.4
neqo neqo cubic on 1504 560.0 ± 37.4 506.9 648.3
neqo neqo cubic 1504 545.5 ± 25.2 498.2 567.1

⬇️ Download logs

@larseggert
Copy link
Collaborator Author

This PR exposed a bug in mtu 🫤 mozilla/mtu#26

@mxinden
Copy link
Collaborator

mxinden commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU

Are there other projects using this optimistic approach?

If I understand RFC 8899 correctly the local interface MTU is the end value, not the start value.

The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).

https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2

All true, but in practice, the local interface is most often the limiting hop.

Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.

We only start probing once the connection is confirmed.

fn set_confirmed(&mut self) -> Res<()> {
self.set_state(State::Confirmed);
if self.conn_params.pmtud_enabled() {
self.paths
.primary()
.ok_or(Error::InternalError)?
.borrow_mut()
.pmtud_mut()
.start();

Say that a client's path MTU is smaller than their local interface MTU. Given that probing only starts once confirmed, i.e. after receiving HANDSHAKE_DONE from the server, initial HTTP requests would not be delayed, but only one subsequent flight of requests would be delayed by up to one PTO. Is that correct?

Thus this optimization, and really all of PMTUD probing, assumes that the potential delay of one subsequent flight of HTTP requests by up to one PTO is worth the trade off of potentially increasing the overall connection throughput.

Is that correct?

@larseggert
Copy link
Collaborator Author

Should we also optimistically start the search at the interface MTU

Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.
We only start probing once the connection is confirmed.

This would need to change. What I think we should do is roughly this:

  • Start sending at the local interface MTU when the connection starts (i.e., for the Initial).
  • If we detect a loss n times, we revert to pobing up from 1280.

n should probably be something like 2, so we don't cause undue delay.

@mxinden
Copy link
Collaborator

mxinden commented Sep 27, 2024

In the case where a client's path MTU is smaller than their local interface MTU, this would add a delay of 2*PTO to every connection establishment, right? If so, isn't that a high cost for the potential increase in throughput? Or is such scenario just very rare?

@larseggert
Copy link
Collaborator Author

larseggert commented Sep 27, 2024

Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

We could also cache a probed MTU towards a destination IP, like the OS does for a TCP MSS it has determined.

@mxinden
Copy link
Collaborator

mxinden commented Sep 27, 2024

Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

How about we:

  1. Role out PMTUD on Firefox Nightly without the optimization (i.e. starting at the local interface MTU).
  2. Enable the optimization, monitoring whether connection establishment times stay stable.

@larseggert
Copy link
Collaborator Author

I was thinking we just log the log the local interface MTU together with the discovered PMTUD, and check for differences.

Copy link
Collaborator

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good to me. Minor comments.

@larseggert larseggert marked this pull request as draft October 8, 2024 05:51
@larseggert larseggert marked this pull request as ready for review October 8, 2024 12:47
@larseggert larseggert marked this pull request as draft October 8, 2024 12:48
@larseggert larseggert marked this pull request as ready for review October 8, 2024 13:10
i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Nov 30, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Dec 1, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127

UltraBlame original commit: a80b258672c95bf02014f72b7fde8609b6f507cc
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Dec 1, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127

UltraBlame original commit: a80b258672c95bf02014f72b7fde8609b6f507cc
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Dec 1, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127

UltraBlame original commit: a80b258672c95bf02014f72b7fde8609b6f507cc
@larseggert
Copy link
Collaborator Author

@mxinden should we update more deps on m-c before merging this, or do we want to handle that when we vendor neqo into m-c the next time?

@mxinden
Copy link
Collaborator

mxinden commented Dec 17, 2024

Other than the rayon (:heavy_check_mark:) update and the windows-bindgen audit, what else is missing @larseggert?

@larseggert
Copy link
Collaborator Author

Other than the rayon (:heavy_check_mark:) update and the windows-bindgen audit, what else is missing @larseggert?

I think windows-metadata, but that is in the same category as windows-bindgen.

@mxinden
Copy link
Collaborator

mxinden commented Dec 17, 2024

If it is audits only, I suggest to "handle that when we vendor neqo into m-c the next time".

Though note, auditing windows-bindgen from scratch might be a significant undertaking.

@larseggert larseggert enabled auto-merge December 18, 2024 06:00
Copy link
Collaborator

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any of the existing unit tests cover the case where Pmtud stops at the interface MTU?

larseggert and others added 3 commits December 18, 2024 13:16
Co-authored-by: Max Inden <[email protected]>
Signed-off-by: Lars Eggert <[email protected]>
Co-authored-by: Max Inden <[email protected]>
Signed-off-by: Lars Eggert <[email protected]>
Co-authored-by: Max Inden <[email protected]>
Signed-off-by: Lars Eggert <[email protected]>
@larseggert larseggert disabled auto-merge December 18, 2024 13:56
@larseggert
Copy link
Collaborator Author

Though note, auditing windows-bindgen from scratch might be a significant undertaking.

I don't think we need to - the author is trusted.

@larseggert larseggert requested a review from mxinden December 18, 2024 15:02
Copy link
Collaborator

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any of the existing unit tests cover the case where Pmtud stops at the interface MTU?

Only remaining question. Otherwise this is ready to merge from my end.

@larseggert larseggert added this pull request to the merge queue Dec 18, 2024
Merged via the queue into mozilla:main with commit ab56672 Dec 18, 2024
62 of 64 checks passed
@larseggert larseggert deleted the feat-stop-pmtud-at-iface-mtu branch December 18, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants