
(D)CUBIC, NewReno and Prague refactoring #1818

Draft: wants to merge 17 commits into base: master
Conversation

@hfstco hfstco (Collaborator) commented Jan 14, 2025

WORK IN PROGRESS

Refactor of the CC code. I moved many common parts to cc_common, but I want to keep the code readable and understandable, so that what a CC algorithm is doing remains clear just from looking at its own file. Some parts therefore stay in the CC algorithm's file even if they duplicate code from another CC algorithm (this avoids overly generic picoquic_cc_do_congestion_avoidance(cc_algo, cc_param) style functions).

Additionally, I try to change CWIN (and other path variables) only inside the CC algorithm's own code file. Most of the common CC helper functions return new values, or the amounts by which to increase them, rather than modifying the path directly.
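
A minimal sketch of that pattern, with hypothetical names rather than the actual cc_common API: the shared helper returns an increment, and only the algorithm-specific code touches the path variable.

```c
#include <stdint.h>

/* Shared helper (cc_common style): compute the slow-start increment and
 * return it instead of modifying the path directly. */
static uint64_t cc_slow_start_increase(uint64_t nb_bytes_acknowledged)
{
    return nb_bytes_acknowledged;
}

/* Algorithm-specific file: the only place where CWIN actually changes. */
static void newreno_on_ack_slow_start(uint64_t* cwin, uint64_t nb_bytes_acknowledged)
{
    *cwin += cc_slow_start_increase(nb_bytes_acknowledged);
}
```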

While doing some coverage runs, I noticed that CUBIC never enters the recovery phase. (alg_state is never set to picoquic_cubic_alg_recovery.) Even when we call enter_recovery(), we immediately transition to SS or CA. Of course, recovery_sequence is set. Is this intended, or can we remove the recovery state from the enum? Or should we set the state to recovery while recovery_sequence is set?
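
One possible shape for that last option, as a hypothetical sketch (enum and field names are illustrative, not picoquic's actual ones):

```c
#include <stdint.h>

typedef enum {
    cubic_alg_slow_start,
    cubic_alg_recovery,
    cubic_alg_congestion_avoidance
} cubic_alg_state_t;

/* Report the recovery state while packets sent before the loss event
 * (i.e. below recovery_sequence) are still being acknowledged. */
static cubic_alg_state_t cubic_effective_state(cubic_alg_state_t alg_state,
    uint64_t highest_acknowledged, uint64_t recovery_sequence)
{
    if (highest_acknowledged < recovery_sequence) {
        return cubic_alg_recovery;
    }
    return alg_state;
}
```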

Changes:

  • cubic/dcubic notification callback refactor
  • dcubic calls cubic notification callback except for delay variations
  • CWND increase functions (which build on each other) for all CC algorithms ((D)CUBIC, NewReno and Prague) in cc_common
  • app limited for all CC algorithms (as part of CWND increase functions, except NewReno currently)
  • moved some CC common functions (e.g. update_bandwidth()) to cc_common
  • moved common parts of congestion avoidance to cc_common
  • loss filter for Prague
  • introduced HyStart++ constants (an illustrative set is sketched after this list)
  • added some test cases to improve code coverage
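
For the HyStart++ item above, here is an illustrative set of the constants defined in RFC 9406; the names and exact representation used by this PR may differ:

```c
/* HyStart++ constants per RFC 9406 (illustrative; the PR's names may differ). */
#define HYSTARTPP_MIN_RTT_THRESH       4000   /* 4 ms, in microseconds */
#define HYSTARTPP_MAX_RTT_THRESH      16000   /* 16 ms, in microseconds */
#define HYSTARTPP_MIN_RTT_DIVISOR         8
#define HYSTARTPP_N_RTT_SAMPLE            8
#define HYSTARTPP_CSS_GROWTH_DIVISOR      4
#define HYSTARTPP_CSS_ROUNDS              5
```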

TODO:

  • changes that require many updates to the test suite (memlog, keylog_test, packet_trace, ready_to_send, ready_to_skip, ready_to_zfin, ready_to_zero, pacing_update, quality_update, multipath_callback, multipath_quality, multipath_stream_af):
       - loss filter in NewReno
       - app limited for NewReno 
  • different betas for (D)CUBIC / calculation of beta as in Prague (a sketch of that style of computation follows this list)
  • (more test cases to improve coverage)
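
For the beta calculation item above, a rough sketch of DCTCP/Prague-style scaling, where the reduction depends on the observed ECN marking fraction instead of a fixed constant (hypothetical names and EWMA gain; not the actual picoquic Prague code):

```c
#include <stdint.h>

#define PRAGUE_ALPHA_SHIFT 4 /* EWMA gain g = 1/16 */

typedef struct {
    uint64_t alpha_q16; /* ECN marking estimate alpha, scaled by 2^16 */
} prague_like_state_t;

/* Once per RTT: alpha = (1 - g) * alpha + g * F, where F is the fraction
 * of acknowledged bytes that carried a CE mark. */
static void prague_like_update_alpha(prague_like_state_t* st,
    uint64_t marked_bytes, uint64_t acked_bytes)
{
    uint64_t f_q16 = (acked_bytes > 0) ? (marked_bytes << 16) / acked_bytes : 0;
    st->alpha_q16 = st->alpha_q16 - (st->alpha_q16 >> PRAGUE_ALPHA_SHIFT)
        + (f_q16 >> PRAGUE_ALPHA_SHIFT);
}

/* On a congestion signal: cwin = cwin * (1 - alpha / 2), i.e. the effective
 * beta depends on the marking fraction. */
static uint64_t prague_like_reduced_cwin(const prague_like_state_t* st, uint64_t cwin)
{
    return cwin - ((cwin * st->alpha_q16) >> 17);
}
```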

Any feedback, changes, and help would be appreciated.

@hfstco hfstco linked an issue Jan 14, 2025 that may be closed by this pull request
@huitema huitema (Collaborator) commented Jan 14, 2025

Wow! That's a lot of work. I will review later, but this is impressive.

Regarding recovery: yes, that's a bug. We should make sure that recovery is properly entered for cubic, etc. But I share your concern about trying to change too much in a single PR. It is probably better to first complete the code cleanup you are doing while keeping test results unchanged, then do a series of smaller PRs, each fixing an individual issue. That way, we can separate the refactoring, which should not change test results, from individual improvements, for which we can verify that the test results change in the expected ways.

@hfstco hfstco changed the title (D)CUBIC, NewReno and Prague refactor (D)CUBIC, NewReno and Prague refactoring Jan 14, 2025
@hfstco hfstco requested a review from huitema January 14, 2025 17:44
@huitema huitema (Collaborator) left a comment


I left a series of comments, mostly asking for a few comments in the code in places where we will need further action, plus a suggested name change for one of the cc functions.

I see that all tests are passing, with just a few requiring minor adjustment of the completion time target. For me, that's the key. If you address the comments below, we can probably merge that now. I would not try to solve all problems in this one large PR, because I am concerned about potential merge conflicts if we wait much longer.

We should then open specialized PRs. First, one PR to add support for recovery. Probably another to verify the "app limited" detection and reaction. And another for DCubic, with additional tests to simulate competition on the same link between Cubic (plain) and DCubic, and investigate potential issues.

picoquic/cc_common.c (review thread, resolved)
picoquic/cc_common.c (review thread, resolved)
picoquic/cc_common.c (review thread, resolved)
*
* Isn't win_cubic always larger than cubic_state->W_reno?
* If clause is unnecessary here.
*/
/* Pick the largest */
@huitema (Collaborator) commented:
Let's review that closely. The Cubic specification assumes a set of "dimensions", such as times in seconds and windows expressed in packets. The picoquic code expects CWIN to be expressed in bytes, with a typical target being "Cwin = bandwidth * RTT", with bandwidth in bytes per second and RTT in seconds. It actually uses "Cwin = bandwidth * RTTus / 1000000", since times are measured in microseconds. I think that w_reno and w_cubic are both computed in bytes, and that the test here just follows the spec. I would expect w_reno to be larger than w_cubic when the RTT is small.
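
For illustration, a minimal sketch of that dimensioning (the name is hypothetical, not picoquic's actual API):

```c
#include <stdint.h>

/* Target CWIN in bytes, from bandwidth in bytes/second and RTT in microseconds. */
static uint64_t target_cwin_bytes(uint64_t bandwidth_bytes_per_second, uint64_t rtt_usec)
{
    return (bandwidth_bytes_per_second * rtt_usec) / 1000000;
}
```

For example, 750,000 bytes/s (6 Mbps) and an RTT of 10,000 microseconds yield a 7,500 byte target.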

@hfstco (Collaborator, Author) commented:

OK, I wrote this comment early in December. I can't clearly remember anymore and will have to check that again.

@huitema huitema (Collaborator) commented Jan 21, 2025

Just a counter example:
[figure: window growth over time for the 6 Mbps / 10 ms example, comparing Cubic and Reno]

This is a bit silly, but it gets to the point. Cubic computes a variable "K", which is the time from the start of an era until the window returns to W_max (per the formula):

K = cubic_root(W_max*(1-beta_cubic)/C) 

The "Cubic" formula is (kinda) independent of the RTT. That's by design. The authors wanted to ensure fairness between connections of different RTT that share the same bottleneck. Otherwise, the connections that have a smaller RTT will get faster feedback, will track the bottleneck faster, etc.

In the example above, I set W_max to 5 -- that's the value of the BDP for a 6 Mbps link with 10 ms delay and 1500-byte packets. Cubic sets K to 3.75 seconds. That's 375 RTTs! By the time we get to T=K, Reno has increased CWND by 375.

[figure: window growth over time for the 1.2 Gbps / 10 ms example, comparing Cubic and Reno]

That effect persists even if we consider the same RTT and a higher bandwidth. The curve above is for a 1.2 Gbps link with the same 10 ms RTT. Cubic computes K = 8.55 seconds. Reno reaches W_max before that, at T = 5 sec, and it keeps increasing the window after that, being more aggressive than Cubic for the remainder of the concave period. So no, win_cubic is not always larger than cubic_state->W_reno, especially not when the RTT is small.
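
As an illustration of the two estimates being compared, here is a small standalone sketch (not picoquic code) using the RFC 8312 formulas and constants, in packets. With W_max = 5 and a 10 ms RTT, the Reno-friendly estimate quickly exceeds the cubic curve:

```c
#include <math.h>
#include <stdio.h>

#define C_CUBIC 0.4
#define BETA_CUBIC 0.7

int main(void)
{
    double w_max = 5.0;   /* packets */
    double rtt = 0.010;   /* seconds */
    double k = cbrt(w_max * (1.0 - BETA_CUBIC) / C_CUBIC);

    for (double t = 0.0; t <= 2.0 * k; t += k / 4.0) {
        /* Cubic window: W_cubic(t) = C*(t-K)^3 + W_max */
        double w_cubic = C_CUBIC * pow(t - k, 3.0) + w_max;
        /* Reno-friendly estimate from RFC 8312, section 4.2 */
        double w_est = w_max * BETA_CUBIC +
            (3.0 * (1.0 - BETA_CUBIC) / (1.0 + BETA_CUBIC)) * (t / rtt);
        printf("t=%.2fs  w_cubic=%.2f  w_est=%.2f\n", t, w_cubic, w_est);
    }
    return 0;
}
```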

picoquic/cubic.c (review thread, resolved)
picoquictest/edge_cases.c (review thread, outdated, resolved)
picoquictest/l4s_test.c (review thread, resolved)
picoquictest/h3zerotest.c (review thread, resolved)
picoquictest/satellite_test.c (review thread, resolved)
picoquictest/satellite_test.c (review thread, resolved)
Linked issue that may be closed by this pull request: Refactor Cubic, D-Cubic and Prague code