Ietf quic #962

Merged 4 commits on Jun 19, 2024
1 change: 1 addition & 0 deletions include/conf/service.h
@@ -42,6 +42,7 @@
#define IP_VS_SVC_F_SIP_HASH 0x0100 /* sip hash target */
#define IP_VS_SVC_F_QID_HASH 0x0200 /* quic cid hash target */
#define IP_VS_SVC_F_MATCH 0x0400 /* snat match */
#define IP_VS_SVC_F_QUIC 0x0800 /* quic/h3 protocol */
#define IP_VS_SVC_F_SCHED_SH_FALLBACK IP_VS_SVC_F_SCHED1 /* SH fallback */
#define IP_VS_SVC_F_SCHED_SH_PORT IP_VS_SVC_F_SCHED2 /* SH use port */

181 changes: 181 additions & 0 deletions include/ipvs/quic.h
@@ -0,0 +1,181 @@
/*
* DPVS is a software load balancer (Virtual Server) based on DPDK.
*
* Copyright (C) 2021 iQIYI (www.iqiyi.com).
* All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#ifndef __DPVS_QUIC_H__
#define __DPVS_QUIC_H__

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include "ipvs/service.h"
#include "conf/inet.h"

/*
* In order to support QUIC connection migration, DPVS makes an agreement on
* the format of QUIC Connection ID(CID) into which backend address information
* is encoded. Specifically, backend server should generate its QUIC CIDs complying
* with the format defined as below.
*
* DPVS QUIC Connection ID Format {
* First Octet (8),
* L3 Address Length (3),
* L4 Address Flag (1),
* L3 Address (8...64),
* [ L4 Address (16) ]
* Nonce (32...140)
* }
*
* The notation used in the CID format definition follows the notational
* conventions of RFC 9000. For a detailed explanation, please refer to
* https://datatracker.ietf.org/doc/html/rfc9000#name-notational-conventions.
*
* First Octet: 8 bits
* Allows for compatibility with IETF QUIC-LB drafts. Not used in DPVS.
* https://datatracker.ietf.org/doc/html/draft-ietf-quic-load-balancers-19
* L3 Address Length: 3 bits
* The length of L3 Address in bytes, minus 1. Adding 1 to the 3-bit value gives
* the actual length, which is in the range 1...8.
* If the length is less than the full address length, i.e. 4 bytes for IPv4 or
* 16 bytes for IPv6, the higher address bytes are truncated.
* L4 Address Flag: 1 bit
* Indicates whether L4 Address is included in this CID.
* 1 - L4 Address is included
* 0 - L4 Address is not included
* L3 Address: 8, 16, 24, 32, 40, 48, 56, 64 bits
* IPv4/IPv6 address with high bytes trimmed if necessary.
* Its length is specified by L3 Address Length.
* L4 Address: 16 bits, optional
* UDP port number.
* Nonce: 32 ~ 140 bits, constrained by the CID's max length of 160 bits
* This is a server-independent field, usually filled with randomly generated data.
* The minimum length is 32 bits to satisfy the entropy requirement of the QUIC protocol.
*
* DPVS QUIC CID adopts a variable-length encoding. The server information takes
* a fixed 4 bits for the address length and L4 flag, and a variable 8 ~ 80 bits for the
* L3 and L4 addresses (8 ~ 64 bits of L3 Address plus an optional 16-bit L4 Address).
* DPVS need not encode the whole L3/L4 Address, which keeps the CID short. For example,
* if all backend servers are in the private network CIDR 192.168.0.0/16 and listen on the
* same server port, then using the lowest 16 bits of L3 Address without L4 Address is appropriate.
*
* Note the server info in the QUIC CID is not encrypted, and we don't plan to implement a QUIC
* server id allocator as required in the IETF QUIC-LB drafts. This is just a simple, stateless,
* clear-text encoding, which may be subject to a security vulnerability: an external observer
* can exploit it to correlate the CIDs of a QUIC connection more easily.
*/

#define DPVS_QUIC_DCID_BYTES_MIN 7

struct quic_server {
    uint16_t wildcard;      // enum value: 8, 16, 24, 32, 40, 48, 56, 64
    uint16_t port;          // network endian
    union inet_addr addr;
};

// Generate a QUIC CID accepted by DPVS. The function demonstrates an implementation
// of a CID generator that may be used by QUIC server applications on RS.
//
// For example, given
// cidlen: 10, l3len:2, l4len:2,
// svr_ip:192.168.111.222(0xC0A86FDE), svr_port:8029(0x1F5D)
// the function generates QUIC CIDs like
// XX36 FDE1 F5DX XXXX XXXX
// where 'X' denotes a random hexadecimal digit.
//
// Params:
// af: l3 address family, valid values are (AF_INET, AF_INET6)
// cidlen: the expected cid total length in bytes, no less than DPVS_QUIC_DCID_BYTES_MIN
// and no more than 20 (the 160-bit CID limit)
// l3len: length in bytes of l3 address to be encoded in cid, valid values are integers (1...8),
// and at most 4 for AF_INET since an IPv4 address has only 4 bytes
// l4len: length in bytes of l4 address to be encoded in cid, valid values are (0, 2)
// svr_ip: l3 address
// svr_port: l4 address (port number in host byte order, as in the example above)
// cid: the result cid buffer, the buffer size must be no less than cidlen
static inline int quic_cid_generator(int af, int cidlen,
        int l3len, int l4len, const union inet_addr *svr_ip,
        uint16_t svr_port, char *cid) {
    char rdbuf[20];
    int i, fd, ret, entropy, l4flag;
    const char *l3addr;
    uint16_t l4addr;

    // Validate first: cidlen must fit rdbuf (a QUIC CID is at most 20 bytes),
    // and an IPv4 address cannot supply more than 4 bytes.
    if (cidlen < DPVS_QUIC_DCID_BYTES_MIN ||
            cidlen > (int)sizeof(rdbuf) ||
            l3len > 8 || l3len < 1 ||
            (AF_INET == af && l3len > 4) ||
            (l4len != 0 && l4len != 2) ||
            cidlen < l3len + l4len + 5)
        return -1;

    entropy = cidlen - l3len - l4len + 1;
    l4flag = l4len > 0 ? 1 : 0;
    // Point at the lowest l3len bytes of the address.
    if (AF_INET == af)
        l3addr = (const char *)svr_ip + (4 - l3len);
    else
        l3addr = (const char *)svr_ip + (16 - l3len);
    l4addr = svr_port;

    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;
    ret = read(fd, rdbuf, entropy);
    close(fd);      // the descriptor is no longer needed, success or not
    if (ret != entropy)
        return -1;

    // First octet is random; the second holds length, flag and the
    // first address nibble. The rest of the address is shifted by 4 bits.
    cid[0] = rdbuf[0];
    cid[1] = (((l3len - 1) & 0x7) << 5)
        | ((l4flag & 0x1) << 4)
        | ((*l3addr >> 4) & 0xf);
    for (i = 0; i < l3len; i++) {
        if (i == l3len - 1)
            cid[2+i] = ((*l3addr & 0xf) << 4);
        else
            cid[2+i] = ((*l3addr & 0xf) << 4) | ((*(l3addr+1) >> 4) & 0xf);
        l3addr++;
    }
    if (l4len > 0) {
        cid[l3len+1] &= 0xf0;
        cid[l3len+1] |= ((l4addr >> 12) & 0xf);
        l4addr <<= 4;
        cid[l3len+2] = (l4addr >> 8) & 0xff;
        cid[l3len+3] = l4addr & 0xff;
    }
    // Fill the remaining half byte and all trailing bytes with the nonce.
    cid[l3len+l4len+1] |= (rdbuf[1] & 0xf);
    memcpy(&cid[l3len+l4len+2], &rdbuf[2], entropy - 3);
    return 0;
}

static inline void quic_dump_server(const struct quic_server *qsvr,
        char *buf, int bufsize) {
    int af;
    char addrbuf[64] = { 0 };

    buf[0] = '\0';
    af = qsvr->wildcard > 32 ? AF_INET6 : AF_INET; // an approximation, not accurate
    if (NULL == inet_ntop(af, &qsvr->addr, addrbuf, sizeof(addrbuf)))
        return;
    if (AF_INET == af)
        snprintf(buf, bufsize, "%s:%d", addrbuf, ntohs(qsvr->port));
    else
        snprintf(buf, bufsize, "[%s]:%d", addrbuf, ntohs(qsvr->port));
}

// Parse backend server address information from mbuf into qsvr.
int quic_parse_server(const struct rte_mbuf *,
const struct dp_vs_iphdr *,
struct quic_server *);

// Schedule a dpvs conn using the backend server specified by qsvr.
// Return NULL if the backend server doesn't exist in the svc's rs list.
struct dp_vs_conn* quic_schedule(const struct dp_vs_service *,
const struct quic_server *,
const struct dp_vs_iphdr *,
struct rte_mbuf *);

#endif
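
For reference, the CID layout documented in `include/ipvs/quic.h` can be decoded with a few shifts and masks. The sketch below only illustrates that layout; it is not the `quic_parse_server` implementation from this PR. `struct cid_server_info`, `cid_decode` and `cid_decode_selfcheck` are hypothetical names, and the sketch assumes the caller already holds the raw destination CID bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch (not struct quic_server). */
struct cid_server_info {
    int l3len;          /* address bytes carried in the CID, 1..8 */
    int has_port;       /* 1 if the L4 Address field is present */
    uint8_t l3addr[8];  /* low-order l3len bytes of the server address */
    uint16_t port;      /* host byte order; 0 when absent */
};

/* Decode the server fields of a CID. Returns 0 on success, -1 if too short. */
static int cid_decode(const uint8_t *cid, int cidlen, struct cid_server_info *out)
{
    int i, l3len, has_port;

    if (cidlen < 2)
        return -1;
    l3len = ((cid[1] >> 5) & 0x7) + 1;  /* the 3-bit field stores length - 1 */
    has_port = (cid[1] >> 4) & 0x1;
    /* The address starts at bit 12, so it ends half a byte into the next byte. */
    if (cidlen < 2 + l3len + (has_port ? 2 : 0))
        return -1;

    memset(out, 0, sizeof(*out));
    out->l3len = l3len;
    out->has_port = has_port;
    for (i = 0; i < l3len; i++)
        out->l3addr[i] = (uint8_t)(((cid[1 + i] & 0xf) << 4) | (cid[2 + i] >> 4));
    if (has_port) {
        const uint8_t *p = &cid[1 + l3len];
        out->port = (uint16_t)(((p[0] & 0xf) << 12) | (p[1] << 4) | (p[2] >> 4));
    }
    return 0;
}

/* Checks the decoder against the example CID given in the header comment
 * (random nibbles set to zero here). */
static void cid_decode_selfcheck(void)
{
    const uint8_t cid[10] = { 0x00, 0x36, 0xFD, 0xE1, 0xF5, 0xD0, 0, 0, 0, 0 };
    struct cid_server_info info;

    assert(cid_decode(cid, (int)sizeof(cid), &info) == 0);
    assert(info.l3len == 2 && info.has_port == 1);
    assert(info.l3addr[0] == 0x6F && info.l3addr[1] == 0xDE);
    assert(info.port == 8029);
}
```

The decoded address bytes are only the trailing part of the server address, so a lookup against the real server list still has to match on a suffix rather than on the full address.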
1 change: 1 addition & 0 deletions include/ipvs/service.h
@@ -45,6 +45,7 @@
#define DP_VS_SVC_F_SIP_HASH IP_VS_SVC_F_SIP_HASH
#define DP_VS_SVC_F_QID_HASH IP_VS_SVC_F_QID_HASH
#define DP_VS_SVC_F_MATCH IP_VS_SVC_F_MATCH
#define DP_VS_SVC_F_QUIC IP_VS_SVC_F_QUIC

/* virtual service */
struct dp_vs_service {
32 changes: 0 additions & 32 deletions kmod/toa/example_nat64/nginx/README.md

This file was deleted.

80 changes: 80 additions & 0 deletions patch/nginx/README.md
@@ -0,0 +1,80 @@
Nginx Patches for DPVS
-----

This directory holds nginx patch files for DPVS. Specifically, it contains the following patches.

* TOA patch for retrieving the originating client IP/port under DPVS NAT64 translation
* UOA patch for retrieving the originating client IP/port under DPVS UDP FNAT/NAT64 translation in QUIC/HTTP3
* QUIC Server Connection ID patch for connection migration

## TOA NAT64

Nginx can get the originating client IP address and port NAT64'ed by DPVS through the nginx variables 'toa_remote_addr' and 'toa_remote_port' respectively. This works if and only if the TOA kernel module has been installed successfully on the nginx server.

This is an example configuration of nginx with the TOA patch for NAT64.

```
http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format nat64 '$remote_addr $toa_remote_addr :$toa_remote_port - $remote_user [$time_local] '
                     '"$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" '
                     '$request_length $upstream_response_time $upstream_addr';

    access_log logs/access.log nat64;

    # more other configs ......

}
```

## UOA QUIC/HTTP3

Nginx can get the originating client IP address and port NAT'ed by DPVS through the nginx variables 'uoa_remote_addr' and 'uoa_remote_port' respectively. IPv4-IPv4 NAT, IPv6-IPv6 NAT, and NAT64 (IPv6-IPv4 NAT) are all supported. This works if and only if the UOA kernel module has been installed successfully on the nginx server.

This is an example configuration of nginx with the UOA patch.

```
http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http3" '
                    '"$http_x_forwarded_for" $request_length $upstream_response_time $upstream_addr';

    log_format quic '$remote_addr $uoa_remote_addr :$uoa_remote_port - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http3" '
                    '"$http_x_forwarded_for" $request_length $upstream_response_time $upstream_addr';

    access_log logs/access.log main;

    # more other configs ......


    server {
        listen 443 quic reuseport;
        listen 443 ssl;

        server_name qlb-test.qiyi.domain;

        access_log logs/quic.access.log quic;

        ssl_certificate      certs/cert.pem;
        ssl_certificate_key  certs/key.pem;

        location / {
            add_header Alt-Svc 'h3=":2443"; ma=86400';
            root   html;
            index  index.html index.htm;
        }
    }
}
```

## QUIC Server Connection ID

Supporting QUIC connection migration requires changes to the QUIC Server Connection ID (SCID) in both DPVS and Nginx. DPVS depends on the server IP/port information encoded in the SCID to schedule a migrating connection to the nginx server where the previous connection resides, and Nginx relies on the socket cookie embedded in the SCID to have a migrating connection processed on the same listening socket as the previous one. Note that Nginx uses eBPF (bpf_sk_select_reuseport) for QUIC connection migration, which requires Linux 5.7+.

The patch adds the Nginx server address information to the SCID and fixes its collision with Nginx's socket cookie. The encoded server address consists of the 24 least significant bits (LSB) for IPv4 and the 32 LSB for IPv6, and complies with the DPVS DCID format specification defined in [ipvs/quic.h](../../include/ipvs/quic.h). The server port is not included in the SCID.
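
As an illustration of the IPv4 case described above, the following C sketch writes the 24 LSB of an IPv4 address into an already random-filled SCID, with the L4 flag cleared. It is not the code from the nginx patch: `scid_put_ipv4` is a hypothetical helper, the address is assumed to be passed in host byte order, and the length checks simply mirror the limits stated in `ipvs/quic.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the nginx patch's code): overwrite the server-info
 * bits of a random-filled CID with the low l3len bytes of an IPv4 address.
 * ip is in host byte order; the L4 flag is left at 0 (no port encoded).
 * Returns 0 on success, -1 on invalid arguments. */
static int scid_put_ipv4(uint8_t *cid, int cidlen, uint32_t ip, int l3len)
{
    int i;
    uint8_t addr[4];

    if (l3len < 1 || l3len > 4 ||
        cidlen < 7 || cidlen > 20 || cidlen < l3len + 5)
        return -1;

    /* addr[0] is the most significant byte that is kept. */
    for (i = 0; i < l3len; i++)
        addr[i] = (uint8_t)(ip >> (8 * (l3len - 1 - i)));

    /* Second octet: 3-bit (length - 1), 1-bit L4 flag, first address nibble. */
    cid[1] = (uint8_t)(((l3len - 1) << 5) | (addr[0] >> 4));
    for (i = 0; i < l3len - 1; i++)
        cid[2 + i] = (uint8_t)(((addr[i] & 0xf) << 4) | (addr[i + 1] >> 4));
    /* The last address nibble shares a byte with the first nonce nibble,
     * which is preserved. */
    cid[1 + l3len] = (uint8_t)(((addr[l3len - 1] & 0xf) << 4)
                               | (cid[1 + l3len] & 0xf));
    return 0;
}
```

With `ip = 0xC0A86FDE` (192.168.111.222) and `l3len = 3`, the SCID takes the form `XX4A 86FD EX...`, where `X` stands for nonce nibbles that the helper leaves untouched.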