
virtio-net enables features without checking if the peer supports them #264

Open
slp opened this issue Feb 6, 2025 · 4 comments

Comments

@slp (Contributor) commented Feb 6, 2025

Here we expose many features without confirming that the peer actually supports them:

        let avail_features = 1 << VIRTIO_NET_F_GUEST_CSUM
            | 1 << VIRTIO_NET_F_CSUM
            | 1 << VIRTIO_NET_F_GUEST_TSO4
            | 1 << VIRTIO_NET_F_HOST_TSO4
            | 1 << VIRTIO_NET_F_GUEST_UFO
            | 1 << VIRTIO_NET_F_HOST_UFO
            | 1 << VIRTIO_NET_F_MAC
            | 1 << VIRTIO_RING_F_EVENT_IDX
            | 1 << VIRTIO_F_VERSION_1;

QEMU only enables checksum, TSO and UFO if the peer supports vnet_hdr. We should do the same thing and negotiate with the peer before enabling those features.
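A minimal sketch of that gating, assuming a hypothetical capability query on the backend (a `peer_has_vnet_hdr` flag the backend would supply; this is not part of libkrun's current API, and the constants are the ones already used in device.rs):

// Hypothetical sketch: only offer offload features when the backend
// reports that the peer handles the virtio-net header, mirroring
// QEMU's qemu_has_vnet_hdr() check.
fn available_features(peer_has_vnet_hdr: bool) -> u64 {
    // Always safe to offer, independent of the peer.
    let mut features: u64 = 1 << VIRTIO_NET_F_MAC
        | 1 << VIRTIO_RING_F_EVENT_IDX
        | 1 << VIRTIO_F_VERSION_1;

    // Offload bits require a peer that produces/consumes vnet headers.
    if peer_has_vnet_hdr {
        features |= 1 << VIRTIO_NET_F_GUEST_CSUM
            | 1 << VIRTIO_NET_F_CSUM
            | 1 << VIRTIO_NET_F_GUEST_TSO4
            | 1 << VIRTIO_NET_F_HOST_TSO4
            | 1 << VIRTIO_NET_F_GUEST_UFO
            | 1 << VIRTIO_NET_F_HOST_UFO;
    }
    features
}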

cc/ @mtjhrc

@nirs (Contributor) commented Feb 6, 2025

Looking at the QEMU code integrating with vmnet, it uses:

static NetClientInfo net_vmnet_shared_info = {
    .type = NET_CLIENT_DRIVER_VMNET_SHARED,
    .size = sizeof(VmnetState),
    .receive = vmnet_receive_common,
    .cleanup = vmnet_cleanup_common,
};

So net_vmnet_shared_info.has_vnet_hdr is NULL, and qemu_has_vnet_hdr() will always return false.

I think QEMU can be improved to enable vmnet_enable_tso_key and vmnet_enable_checksum_offload_key, and in that case it can enable TSO and checksum offload in the virtio driver.

But this is the easy case when the same program controls both the virtio device and the vmnet interface. For libkrun we control the virtio device, but we don't know how the vmnet interface was configured. I think we need to add options to enable or disable offloading features.

When I enable the vmnet_enable_tso_key and vmnet_enable_checksum_offload_key options in vmnet-helper, it works with current krunkit/libkrun, but this gives very inconsistent performance. The best approach would be to be able to enable or disable offloading both in libkrun (via krunkit options) and in the program creating the vmnet interface. This way you can optimize for the use case.

Related libvirt options to control offloading:
https://libvirt.org/formatdomain.html#setting-nic-driver-specific-options

vmnet offloading features (no public docs, documented in vmnet.h):

/*!
 * @constant vmnet_enable_tso_key
 * Enable TCP segmentation offload. Note, when this is enabled, the
 * interface may generate large (64K) TCP frames. It must also
 * be prepared to accept large TCP frames as well.
 */
extern const char *
vmnet_enable_tso_key API_AVAILABLE(macos(11.0)) API_UNAVAILABLE(ios, watchos, tvos);

/*!
 * @constant vmnet_enable_checksum_offload_key
 * Enable checksum offload for this interface. The checksums that are
 * offloaded are: IPv4 header checksum, UDP checksum (IPv4 and IPv6), 
 * and TCP checksum (IPv4 and IPv6).
 *
 * In order to perform the offload function, all packets flowing in and
 * out of the vmnet_interface instance are verified to pass basic 
 * IPv4, IPv6, UDP, and TCP sanity checks. A packet that fails any
 * of these checks is simply dropped.
 *
 * On output, checksums are automatically computed as necessary
 * on each packet sent using vmnet_write().
 *
 * On input, checksums are verified as necessary. If any checksum verification
 * fails, the packet is dropped and not delivered to vmnet_read().
 *
 * Note that the checksum offload function for UDP and TCP checksums is unable
 * to deal with fragmented IPv4/IPv6 packets. The VM client networking stack
 * must handle UDP and TCP checksums on fragmented packets itself.
 */
extern const char *
vmnet_enable_checksum_offload_key API_AVAILABLE(macos(12.0)) API_UNAVAILABLE(ios, watchos, tvos);
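
For reference, a rough sketch of how the program creating the vmnet interface could opt in to these keys, written as Rust FFI (the xpc_* functions and vmnet key symbols are the real C names from libxpc and vmnet.h; the Rust declarations and the helper function are my own untested assumptions, and linking against the vmnet framework is assumed):

use std::os::raw::{c_char, c_void};

type XpcObjectT = *mut c_void;

extern "C" {
    static vmnet_enable_tso_key: *const c_char;
    static vmnet_enable_checksum_offload_key: *const c_char;
    fn xpc_dictionary_create(
        keys: *const *const c_char,
        values: *const XpcObjectT,
        count: usize,
    ) -> XpcObjectT;
    fn xpc_dictionary_set_bool(xdict: XpcObjectT, key: *const c_char, value: bool);
}

// Build the interface description later passed to vmnet_start_interface().
unsafe fn interface_desc_with_offloads() -> XpcObjectT {
    let desc = xpc_dictionary_create(std::ptr::null(), std::ptr::null(), 0);
    // Both keys default to false; the VMM's virtio-net offload feature
    // bits must match whatever is enabled here.
    xpc_dictionary_set_bool(desc, vmnet_enable_tso_key, true);
    xpc_dictionary_set_bool(desc, vmnet_enable_checksum_offload_key, true);
    desc
}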

@nirs (Contributor) commented Feb 6, 2025

Handling QEMU's dgram backend (-net dgram) looks similar:

static NetClientInfo net_dgram_socket_info = {
    .type = NET_CLIENT_DRIVER_DGRAM,
    .size = sizeof(NetDgramState),
    .receive = net_dgram_receive,
    .cleanup = net_dgram_cleanup,
};

So with -net dgram, QEMU doesn't enable offloading, and this indeed works with vmnet-helper (which doesn't enable offloading by default).

Example of a device backend that may enable offloading:

static NetClientInfo net_tap_info = {
    .type = NET_CLIENT_DRIVER_TAP,
    .size = sizeof(TAPState),
    .receive = tap_receive,
    .receive_iov = tap_receive_iov,
    .poll = tap_poll,
    .cleanup = tap_cleanup,
    .has_ufo = tap_has_ufo,
    .has_uso = tap_has_uso,
    .has_vnet_hdr = tap_has_vnet_hdr,
    .has_vnet_hdr_len = tap_has_vnet_hdr_len,
    .set_offload = tap_set_offload,
    .set_vnet_hdr_len = tap_set_vnet_hdr_len,
    .set_vnet_le = tap_set_vnet_le,
    .set_vnet_be = tap_set_vnet_be,
    .set_steering_ebpf = tap_set_steering_ebpf,
};

@mtjhrc (Collaborator) commented Feb 6, 2025

Yes, this doesn't seem to be correct. It is quite likely we shouldn't be enabling these flags even for the passt backend, actually. The virtio-net device is based on Firecracker's (which used a TAP device), and I didn't really give a lot of thought to specifying the virtio features (mostly I wasn't sure what they were doing, and it worked, so I forgot about it).

So we should really remove the features, and have an API for enabling them (if they are useful or supported by the backend).
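
One possible shape for such an API, as a sketch only (OffloadConfig and avail_features() are illustrative names, not libkrun's actual API); krunkit could then map command-line options onto this struct to match however the vmnet interface was configured:

// Hypothetical sketch of an opt-in offload API; all names here are
// illustrative, not libkrun's actual API.
#[derive(Default)]
pub struct OffloadConfig {
    pub csum: bool, // VIRTIO_NET_F_CSUM / VIRTIO_NET_F_GUEST_CSUM
    pub tso4: bool, // VIRTIO_NET_F_HOST_TSO4 / VIRTIO_NET_F_GUEST_TSO4
    pub ufo: bool,  // VIRTIO_NET_F_HOST_UFO / VIRTIO_NET_F_GUEST_UFO
}

fn avail_features(offloads: &OffloadConfig) -> u64 {
    // Base features, independent of the backend.
    let mut features: u64 =
        1 << VIRTIO_NET_F_MAC | 1 << VIRTIO_RING_F_EVENT_IDX | 1 << VIRTIO_F_VERSION_1;
    if offloads.csum {
        features |= 1 << VIRTIO_NET_F_CSUM | 1 << VIRTIO_NET_F_GUEST_CSUM;
    }
    if offloads.tso4 {
        features |= 1 << VIRTIO_NET_F_HOST_TSO4 | 1 << VIRTIO_NET_F_GUEST_TSO4;
    }
    if offloads.ufo {
        features |= 1 << VIRTIO_NET_F_HOST_UFO | 1 << VIRTIO_NET_F_GUEST_UFO;
    }
    features
}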

@nirs (Contributor) commented Feb 6, 2025

I did quick performance testing with and without TSO and checksum offload. It shows that in most cases disabling offloading improves performance significantly and gives more consistent results. However, in one test (iperf3 -R), enabling offloading is 5 times faster!

TSO and checksum offload enabled

I kept the feature bits in libkrun and enabled the vmnet offloading options.

| network              | vm       | iperf3        | iperf3 -R     |
|----------------------|----------|---------------|---------------|
| gvproxy              | krunkit  |  1.40 Gbits/s | 20.00 Gbits/s |
| vmnet-helper shared  | krunkit  |  1.38 Gbits/s | 46.20 Gbits/s |
| vmnet-helper bridged | krunkit  |  1.37 Gbits/s | 45.70 Gbits/s |
| vmnet-helper shared  | vfkit    |  4.27 Gbits/s |  8.09 Gbits/s |
| vmnet-helper bridged | vfkit    |  4.30 Gbits/s | 10.50 Gbits/s |

TSO and checksum offload disabled

I removed the feature bits in libkrun and disabled the vmnet offloading options.

| network              | vm       | iperf3        | iperf3 -R     |
|----------------------|----------|---------------|---------------|
| gvproxy              | krunkit  |  1.47 Gbits/s |  2.58 Gbits/s |
| vmnet-helper shared  | krunkit  | 10.10 Gbits/s |  8.38 Gbits/s |
| vmnet-helper shared  | krunkit  | 10.10 Gbits/s |  8.38 Gbits/s |
| gvproxy              | vfkit    |  1.43 Gbits/s |  2.84 Gbits/s |
| vmnet-helper shared  | vfkit    | 10.70 Gbits/s |  8.41 Gbits/s |
| vmnet-helper bridged | vfkit    | 12.10 Gbits/s | 11.50 Gbits/s |

I did not check any real workloads; maybe offloading can improve performance when connecting 2 VMs (possible with vmnet, not possible with gvproxy).

Test details

All tests used #263.

libkrun change to remove the feature bits:

diff --git a/src/devices/src/virtio/net/device.rs b/src/devices/src/virtio/net/device.rs
index 62695c1..788cffb 100644
--- a/src/devices/src/virtio/net/device.rs
+++ b/src/devices/src/virtio/net/device.rs
@@ -93,15 +93,14 @@ pub struct Net {
 impl Net {
     /// Create a new virtio network device using the backend
     pub fn new(id: String, cfg_backend: VirtioNetBackend, mac: [u8; 6]) -> Result<Self> {
-        let avail_features = 1 << VIRTIO_NET_F_GUEST_CSUM
-            | 1 << VIRTIO_NET_F_CSUM
-            | 1 << VIRTIO_NET_F_GUEST_TSO4
-            | 1 << VIRTIO_NET_F_HOST_TSO4
-            | 1 << VIRTIO_NET_F_GUEST_UFO
-            | 1 << VIRTIO_NET_F_HOST_UFO
-            | 1 << VIRTIO_NET_F_MAC
-            | 1 << VIRTIO_RING_F_EVENT_IDX
-            | 1 << VIRTIO_F_VERSION_1;
+        let avail_features =
+            1 << VIRTIO_NET_F_MAC | 1 << VIRTIO_RING_F_EVENT_IDX | 1 << VIRTIO_F_VERSION_1;
+        //1 << VIRTIO_NET_F_GUEST_CSUM
+        //| 1 << VIRTIO_NET_F_CSUM
+        //| 1 << VIRTIO_NET_F_GUEST_TSO4
+        //| 1 << VIRTIO_NET_F_HOST_TSO4
+        //| 1 << VIRTIO_NET_F_GUEST_UFO
+        //| 1 << VIRTIO_NET_F_HOST_UFO

nirs mentioned this issue Feb 6, 2025
nirs added a commit to nirs/libkrun that referenced this issue Feb 6, 2025

Connecting to the socket sets the remote address so we can use
send()/recv() or write()/read() instead of sendto()/recvfrom().

Advantages:
- Simpler code, no need to keep the remote address.
- The server will want to connect to the client address to ensure it
  receives frames only from the connected client. Such a server will want
  to remove the unix socket once the client connected[2], which does not
  work with the current code.
- Once the socket is connected, the same backend can be used to handle a
  passed file descriptor[1].
- iperf3 -R is 1.33 times faster (46.6 Gbits/s vs 35.0 Gbits/s).

Tested with:
- [x] gvproxy
- [x] vmnet-helper

For testing results see
containers#264 (comment)

[1] containers/krunkit#24
[2] https://github.com/nirs/vmnet-helper/blob/5c6a595ba3e76314e1d0bef2b0160388439d69ec/helper.c#L475

Signed-off-by: Nir Soffer <[email protected]>
nirs added a commit to nirs/libkrun that referenced this issue Feb 7, 2025
This fixes the compatibility with vmnet framework defaults, and shows
better and more consistent performance in most cases.

vmnet has the partly documented vmnet_enable_tso_key and
vmnet_enable_checksum_offload_key options. Enabling these options allows
vmnet-helper to work with current krunkit.

Before (TSO and checksum offload enabled):

| network              | vm       | iperf3        | iperf3 -R     |
|----------------------|----------|---------------|---------------|
| gvproxy              | krunkit  |  1.40 Gbits/s | 20.00 Gbits/s |
| vmnet-helper shared  | krunkit  |  1.38 Gbits/s | 46.20 Gbits/s |
| vmnet-helper bridged | krunkit  |  1.37 Gbits/s | 45.70 Gbits/s |
| vmnet-helper shared  | vfkit    |  4.27 Gbits/s |  8.09 Gbits/s |
| vmnet-helper bridged | vfkit    |  4.30 Gbits/s | 10.50 Gbits/s |

After (TSO and checksum offload disabled):

| network              | vm       | iperf3        | iperf3 -R     |
|----------------------|----------|---------------|---------------|
| gvproxy              | krunkit  |  1.47 Gbits/s |  2.58 Gbits/s |
| vmnet-helper shared  | krunkit  | 10.10 Gbits/s |  8.38 Gbits/s |
| vmnet-helper shared  | krunkit  | 10.10 Gbits/s |  8.38 Gbits/s |
| gvproxy              | vfkit    |  1.43 Gbits/s |  2.84 Gbits/s |
| vmnet-helper shared  | vfkit    | 10.70 Gbits/s |  8.41 Gbits/s |
| vmnet-helper bridged | vfkit    | 12.10 Gbits/s | 11.50 Gbits/s |

It seems that using offloading makes sense only in special cases, so it
should be optional.

Part-of: containers#264
slp pushed a commit that referenced this issue Feb 18, 2025
nirs added a commit to nirs/libkrun that referenced this issue Feb 21, 2025