The purpose of this repository is to provide a Practitioner's Guide to implementing a userspace-only network packet congestion control scheme for use with a low-latency UDP messaging library on intra-corporate networks. This work will be complete when:
- Concepts for congestion control, packet drop/loss are well explained
- Implementation choices are well described
- Code is provided to concretely demonstrate concepts and implementation
- A guide is provided on how congestion control might be incorporated into a DPDK messaging library
The following papers are a good starting place. Many are co-authored by Google, Intel, and Microsoft and/or describe work used in production. Packets moving between data centers over a WAN are not in scope; this work is for packets moving inside a corporate network. No code uses the kernel; this is all user-space work typically done with DPDK/RDMA.
- [1] ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY
- [2] ECN Github
- [3] TIMELY: RTT-based Congestion Control for the Datacenter
- [4] TIMELY Power Point Slides
- [5] TIMELY Source Code Snippet referenced and used in eRPC
- [6] Datacenter RPCs can be General and Fast aka eRPC
- [7] eRPC Source Code
- [8] Carousel: Scalable Traffic Shaping at End Hosts
- [9] Receiver-Driven RDMA Congestion Control by Differentiating Congestion Types in Datacenter Networks
Points of orientation:
- [1] contains a correction to [3] and also describes DCQCN with ECN, but not Carousel. ECN requires switches/routers to be ECN-enabled. Equinix, for example, does not or cannot do ECN.
- [1] concludes DCQCN is better than Timely.
- [9] describes another approach that handily beats Timely; however, the "deployment of RCC relies on high precision clock synchronization throughout the datacenter network. Some recent research efforts can reduce the upper bound of clock synchronization within a datacenter to a few hundred nanoseconds, which is sufficient for our work."
- Timely congestion control therefore imposes the fewest requirements on the network, and is probably the worst performing. Note that most DPDK/RDMA work, like RAMCloud, relies on lossless packet hardware that is atypical in corporate data centers. This is another reason why Timely is in scope.
- eRPC [6,7] uses Carousel [8] and Timely [3,4,5] to good effect.
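As a point of reference for the Timely discussion above, here is a minimal C++ sketch of a per-ACK rate update along the lines of Algorithm 1 in [3]. It is not the eRPC code in [5,7] and it does not include the correction described in [1]. The class name, the parameter names, and every default value are hypothetical and were picked only so the arithmetic is easy to follow. The handling of the first RTT sample, the HAI counter, and the rate clamp are simplifying assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, illustrative values only. None are taken from [3] or eRPC.
struct TimelyParams {
  double minRttUs = 50.0;        // RTT floor used to normalize the gradient
  double tLowUs = 50.0;          // RTT below this: always additive increase
  double tHighUs = 500.0;        // RTT above this: always multiplicative decrease
  double alpha = 0.5;            // EWMA weight given to the newest RTT difference
  double beta = 0.5;             // multiplicative decrease factor
  double deltaMbps = 10.0;       // additive increase step
  double minRateMbps = 10.0;     // assumed clamp: never pace below this
  double maxRateMbps = 10000.0;  // assumed clamp: never pace above this
  int haiThreshold = 5;          // consecutive non-positive gradients before HAI
};

class TimelySketch {
 public:
  TimelySketch(const TimelyParams& params, double startRateMbps)
      : p_(params), rateMbps_(startRateMbps) {
    assert(p_.minRttUs > 0.0 && startRateMbps > 0.0);
  }

  // Feed one RTT sample in microseconds; returns the new sending rate in Mbps.
  double update(double newRttUs) {
    assert(newRttUs > 0.0);
    if (!haveSample_) {  // assumption: the first sample yields a zero difference
      prevRttUs_ = newRttUs;
      haveSample_ = true;
    }
    const double newDiffUs = newRttUs - prevRttUs_;
    prevRttUs_ = newRttUs;
    // Smooth the RTT difference, then normalize it by the minimum RTT.
    rttDiffUs_ = (1.0 - p_.alpha) * rttDiffUs_ + p_.alpha * newDiffUs;
    const double gradient = rttDiffUs_ / p_.minRttUs;

    if (newRttUs < p_.tLowUs) {
      rateMbps_ += p_.deltaMbps;
    } else if (newRttUs > p_.tHighUs) {
      rateMbps_ *= 1.0 - p_.beta * (1.0 - p_.tHighUs / newRttUs);
    } else if (gradient <= 0.0) {
      // Simplified HAI: step 5x after enough consecutive non-positive gradients.
      ++nonPositive_;
      const int n = (nonPositive_ >= p_.haiThreshold) ? 5 : 1;
      rateMbps_ += n * p_.deltaMbps;
    } else {
      nonPositive_ = 0;
      rateMbps_ *= 1.0 - p_.beta * gradient;
    }
    rateMbps_ = std::max(p_.minRateMbps, std::min(rateMbps_, p_.maxRateMbps));
    return rateMbps_;
  }

  double rateMbps() const { return rateMbps_; }

 private:
  TimelyParams p_;
  double rateMbps_;
  double prevRttUs_ = 0.0;
  double rttDiffUs_ = 0.0;
  int nonPositive_ = 0;
  bool haveSample_ = false;
};
```

With these defaults and a starting rate of 1000 Mbps, RTT samples of 100 us and then 120 us first raise the rate to 1010 Mbps (zero gradient) and then scale it by 0.9 to about 909 Mbps (gradient 0.2).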
In an empty directory, do the following, assuming you have a C++ toolchain and cmake installed:
- git clone https://github.com/gshanemiller/congestion.git
- cd congestion && mkdir build && cd build
- cmake ..
- cmake --build .
All tasks/libraries can be found in the './build' directory once the build completes.
Some test programs produce data plotted with R (a free statistics program) using a provided R script. See individual READMEs for details. You might encounter an error saying ggplot2 is unknown or not found. To fix missing R dependencies, run the following commands in the R CLI:
# R packages used here
install.packages("ggplot2")
install.packages("gridExtra")
You only need to do this once. R finds the external package repositories and installs from them on its own. R does not need to be restarted.
Congestion control involves four major problems:
- Detect and correct packet drop/loss. Detection usually involves timestamps and sequence numbers. Resending packets is more involved
- Detect and respond to congestion by not sending too much data too soon e.g. [1-5, 9]
- Determine when to send new data without exacerbating congestion e.g. [8]
- Do all of the above without wasting CPU
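To make the first problem concrete, here is a minimal C++ sketch of receiver-side drop/reorder detection using sequence numbers alone. The names `SeqTracker` and `SeqEvent` are hypothetical and not part of this repository. The sketch assumes 64-bit sequence numbers that start at 0 and never wrap, and it does not attempt resending, which, as noted above, is more involved.

```cpp
#include <cstdint>
#include <set>

// How an arriving packet relates to what the receiver expected.
enum class SeqEvent { InOrder, Gap, Reordered, Duplicate };

class SeqTracker {
 public:
  // Classify one arriving sequence number and update the tracking state.
  SeqEvent onReceive(std::uint64_t seq) {
    if (seq == nextExpected_) {
      ++nextExpected_;
      return SeqEvent::InOrder;
    }
    if (seq > nextExpected_) {
      // Everything skipped is now a candidate for loss. A production version
      // would bound this loop so a corrupt sequence number cannot stall it.
      for (std::uint64_t s = nextExpected_; s < seq; ++s) {
        missing_.insert(s);
      }
      nextExpected_ = seq + 1;
      return SeqEvent::Gap;
    }
    // seq is older than expected: either it fills a known gap (it was only
    // reordered, not lost) or it has been seen before.
    if (missing_.erase(seq) > 0) {
      return SeqEvent::Reordered;
    }
    return SeqEvent::Duplicate;
  }

  // Sequence numbers skipped so far and not yet seen.
  const std::set<std::uint64_t>& missing() const { return missing_; }

 private:
  std::uint64_t nextExpected_ = 0;
  std::set<std::uint64_t> missing_;
};
```

Entries that stay in `missing()` longer than some timeout would be treated as dropped. Choosing that timeout is where timestamps come in, and it is left out of this sketch.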
I suggest the following milestone trajectory. It's valid provided Timely is the go-to congestion control method. I report other methods above; however, they arguably impose impractical constraints.
Milestones
- 0: Provide theoretical motivation for Timely based on [1,3].
- 1: Simulate Timely in C++. The goal is to get a good implementation of the model.
- 2: Figure out timestamps. eRPC uses rdtsc. Would NIC timestamps be better? For Mellanox NICs, how and where are they obtained?
- 3: When to use Timely? eRPC has a complicated way to selectively ignore Timely or use it.
- 4: Combine 1-3 into an implementation closer to production code.
- 5: Extend (4) by adding kernel UDP sockets and eRPC packet-level pacing. ACKs here form the RTT. Run a sender/receiver pair to test.
- 6: Extend (5) by adding sequence numbers to detect packet drop/reorder. Figure out a way to resend lost data in order.
- 7: Describe the Carousel architecture. Delineate where the work in (6) stops and Carousel starts.
- 8: Document the architecture to add Carousel to (6).
- 9: Implement the design in (8) and test/validate it.
- 10: Document how to bring the work in (9) into a DPDK setting.
- 11: Implement (10) using code in the Reinvent library.
Status by milestone:
- 0: DONE (see congestion.pdf sections 3 and 4)
- 1: DONE (see Timely Basic and Timely eRPC)
- 2: STARTED
- 3: Not started
- 4: Not started
- 5: Not started
- 6: Not started
- 7: ALMOST DONE (see congestion.pdf section 5)
- 8: Not started
- 9: Not started
- 10: Not started
- 11: Not started
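Milestones 5 and 7 involve packet-level pacing in the style of Carousel [8]. As a rough illustration of the idea, the C++ sketch below schedules packet ids on a timing wheel and releases them once their send time arrives. It is a simplification written for this README, not the Carousel or eRPC implementation. `TimingWheel`, `txGapNs`, the slot width, and the slot count are all hypothetical. Send times beyond the wheel's horizon are clamped to the last slot, so a real implementation would re-check the timestamp on release.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Time to serialize `bytes` at `rateMbps`, in nanoseconds. This is the spacing
// a pacer would leave before the next packet of the same flow.
inline std::uint64_t txGapNs(std::uint64_t bytes, double rateMbps) {
  return static_cast<std::uint64_t>(bytes * 8.0 * 1000.0 / rateMbps);
}

class TimingWheel {
 public:
  // slotCount and slotNs must both be greater than zero.
  TimingWheel(std::size_t slotCount, std::uint64_t slotNs, std::uint64_t startNs)
      : slots_(slotCount), slotNs_(slotNs), nowTick_(startNs / slotNs) {}

  // Schedule packetId for release at or after sendAtNs.
  void insert(std::uint64_t packetId, std::uint64_t sendAtNs) {
    std::uint64_t tick = sendAtNs / slotNs_;
    const std::uint64_t lastTick = nowTick_ + slots_.size() - 1;
    if (tick < nowTick_) tick = nowTick_;  // overdue: next slot to be drained
    if (tick > lastTick) tick = lastTick;  // beyond horizon: clamp (assumption)
    slots_[tick % slots_.size()].push_back(packetId);
  }

  // Advance the wheel to nowNs and return every packet id that became due.
  std::vector<std::uint64_t> advance(std::uint64_t nowNs) {
    std::vector<std::uint64_t> due;
    const std::uint64_t target = nowNs / slotNs_;
    if (target < nowTick_) return due;  // this tick was already drained
    std::uint64_t steps = target - nowTick_ + 1;
    if (steps > slots_.size()) steps = slots_.size();  // one full turn at most
    for (std::uint64_t i = 0; i < steps; ++i) {
      std::vector<std::uint64_t>& slot = slots_[(nowTick_ + i) % slots_.size()];
      due.insert(due.end(), slot.begin(), slot.end());
      slot.clear();
    }
    nowTick_ = target + 1;
    return due;
  }

 private:
  std::vector<std::vector<std::uint64_t>> slots_;
  std::uint64_t slotNs_;
  std::uint64_t nowTick_;  // first tick that has not been drained yet
};
```

A sender would compute each packet's send time as the previous send time plus `txGapNs(size, rate)`, where the rate comes from the congestion control step, call `insert`, and then poll `advance` with the current clock from its event loop. For example, a 1250-byte packet at 1000 Mbps gives a gap of 10000 ns.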