
Update instructions to use -netdev stream #76

Closed
wants to merge 1 commit

Conversation

tamird

@tamird tamird commented Nov 17, 2024

Since QEMU 7.2, wrappers are no longer needed[0]; update the
documentation to instruct users to prefer the new options added in
2022[1] but not documented until 2024[2].

The client can be entirely removed in a future release.

[0] https://john-millikin.com/improved-unix-socket-networking-in-qemu-7.2
[1] qemu/qemu@5166fe0
[2] qemu/qemu@178413a

Signed-off-by: Tamir Duberstein <[email protected]>
@nirs nirs left a comment

Looks good, did you test the modified examples?

@tamird
Author

tamird commented Nov 17, 2024

I did test, yeah. I also updated the test script which I hope runs in CI.

There is one downside to removing the client: QEMU's socket backend supports both stream and datagram sockets, and automatically interrogates the socket to determine its type. Currently we are using stream sockets, but they appear to be less efficient than datagram sockets because each message is length-prefixed, and we use two syscalls to read each packet. If we want to move to datagram sockets in the future (and it seems likely that we would) then we can no longer hide this detail because we'll need to change -netdev stream to -netdev dgram.
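To make the overhead concrete, here is a minimal sketch, assuming a 4-byte network-order length prefix like the one QEMU's stream netdev uses (shown over a local Python socketpair, not QEMU itself; the helper names are illustrative):

```python
import socket
import struct

def send_frame(sock: socket.socket, payload: bytes) -> None:
    # Stream transport: prefix each packet with its length so the
    # receiver can find frame boundaries in the byte stream.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed mid-frame")
        buf += chunk
    return buf

def recv_frame(sock: socket.socket) -> bytes:
    # Two reads per packet: one for the 4-byte prefix, one for the body.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
send_frame(a, b"ethernet-frame")
print(recv_frame(b))  # b'ethernet-frame'
```

With a datagram socket, each recv() returns exactly one packet, so both the length prefix and the second syscall disappear.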

@nirs
Member

nirs commented Nov 17, 2024

If we want to move to datagram sockets in the future (and it seems likely that we would) then we can no longer hide this detail because we'll need to change -netdev stream to -netdev dgram.

We should move to a dgram socket: it simplifies the code and is more efficient, since we eliminate the length prefix on both sides (qemu/lima and socket_vmnet).

This is mostly important for lima, where lima copies every packet read from the socket_vmnet unix socket to the datagram socket connected to vz. If we support datagram sockets in socket_vmnet, we can pass the file descriptor of the datagram socket connected to vz to socket_vmnet and eliminate the expensive copy in lima.

@tamird
Author

tamird commented Nov 17, 2024

I think moving to datagram sockets is going to be a good deal more complex; since dgram communication is inherently unidirectional, each VM will need two sockets (one for inbound and one for outbound communication). This is described in the blog I linked above.

This means we'll certainly need a client wrapper responsible for allocating an ephemeral address for the receiving socket and we'll also need a way to communicate that socket's address to the socket_vmnet daemon.

All in all it isn't difficult to see why stream sockets were initially chosen.

@nirs
Member

nirs commented Nov 17, 2024

Datagram sockets are not unidirectional. You can read and write packets from the same file descriptor.

I'm using a single dgram socket to integrate vfkit with vmnet. I'm creating a datagram socketpair. One end is passed to vfkit using:

--device=virtio-net,fd=vm-fd

This is used to create a file handle device attachment in vz. The other end is connected to a helper process that creates a vmnet interface.

The helper process reads packets from the file descriptor and writes them to the vmnet interface, and reads packets from the vmnet interface and writes them to the file descriptor.

The same should work with lima and qemu.
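A quick sketch of the point about bidirectionality, using a Python socketpair as a stand-in for the vm/helper pair (the variable names are illustrative):

```python
import socket

# A connected datagram socketpair: both ends send and receive on the
# same file descriptor, so one socket carries traffic in both
# directions (vm -> helper and helper -> vm).
vm_fd, helper_fd = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

vm_fd.send(b"outbound-packet")
print(helper_fd.recv(2048))  # b'outbound-packet'
helper_fd.send(b"inbound-packet")
print(vm_fd.recv(2048))      # b'inbound-packet'
```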

@tamird
Author

tamird commented Nov 17, 2024

Do you use an abstract socket address to bind vm-fd?

@nirs
Member

nirs commented Nov 17, 2024

Do you use an abstract socket address to bind vm-fd?

I'm using socketpair(2): it returns two connected sockets without an address.

@tamird
Author

tamird commented Nov 17, 2024

I see. That's probably using abstract addresses under the covers.

Another question: why do we create a single vmnet interface for all clients rather than one per client?

@nirs
Member

nirs commented Nov 17, 2024

I see. That's probably using abstract addresses under the covers.

The sockets have an empty address. Also, the abstract address namespace is a Linux extension; on macOS there is no such thing.

Another question: why do we create a single vmnet interface for all clients rather than one per client?

I'm not sure. I'm creating one vmnet interface per vm, and each helper process forwards packets between one vmnet interface and one vm. This model is much simpler and gives much better performance.

Creating one vmnet interface per vm means that vmnet is responsible for forwarding packets between vms. In socket_vmnet we forward packets between vms and from the vmnet interface to all vms. Because socket_vmnet does not know the vms' mac addresses, it copies all packets from every socket to all other sockets and the vmnet interface, or from the vmnet interface to all vms. This scales very badly (#58).

@tamird
Author

tamird commented Nov 17, 2024

I'm not sure. I'm creating one vmnet interface per vm, and each helper process forwards packets between one vmnet interface and one vm. This model is much simpler and gives much better performance.

Creating one vmnet interface per vm means that vmnet is responsible for forwarding packets between vms. In socket_vmnet we forward packets between vms and from the vmnet interface to all vms. Because socket_vmnet does not know the vms' mac addresses, it copies all packets from every socket to all other sockets and the vmnet interface, or from the vmnet interface to all vms. This scales very badly (#58).

Right, this is what I expected. Perhaps changing this would be a good place to start?

@nirs
Member

nirs commented Nov 17, 2024

I'm not sure. I'm creating one vmnet interface per vm, and each helper process forwards packets between one vmnet interface and one vm. This model is much simpler and gives much better performance.
Creating one vmnet interface per vm means that vmnet is responsible for forwarding packets between vms. In socket_vmnet we forward packets between vms and from the vmnet interface to all vms. Because socket_vmnet does not know the vms' mac addresses, it copies all packets from every socket to all other sockets and the vmnet interface, or from the vmnet interface to all vms. This scales very badly (#58).

Right, this is what I expected. Perhaps changing this would be a good place to start?

This is a major redesign and I'm not sure it is worth the effort, because the helper process model, where every vm has a small and simple helper process forwarding packets between the vm and vmnet, is simpler. For example, we don't need a launchd service, since the helper is created and managed by the program starting the vm (e.g. lima or minikube).

If we want to keep a single daemon serving multiple vms, moving to one vmnet interface per vm seems like the right design.

With this model, socket_vmnet needs to keep a control socket for receiving datagram socket descriptors from other processes. When a datagram socket is passed, it will start a vmnet interface for that socket and start forwarding packets between the socket and vmnet.

When the vmnet interface is created, we get a mac address from vmnet. This mac address must be used by the vm so that vmnet can forward packets back to the vm. This requires a protocol to return the mac address from vmnet to the vm, which can be implemented using the control socket.

Currently lima generates a unique mac address for every instance based on the instance name, so the instance is more likely to get the same IP address each time. With vmnet we cannot control the mac address, but we can specify an interface UUID; using the same UUID will return the same mac address. lima can generate the UUID in the same way it generates the mac address, so each instance will have a unique and constant UUID. The protocol for starting a vmnet interface will need to accept the UUID and return the mac address provided by vmnet for this UUID.

This requires changes in programs using socket_vmnet, like lima or minikube.
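A sketch of how such a stable UUID could be derived from the instance name (the namespace constant and function name below are assumptions for illustration, not lima's actual scheme):

```python
import uuid

# Any fixed namespace works; it only has to stay constant across runs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "socket_vmnet.example")

def interface_uuid(instance_name: str) -> uuid.UUID:
    # uuid5 is deterministic: the same instance name always yields the
    # same UUID, so vmnet would return the same mac address every boot.
    return uuid.uuid5(NAMESPACE, instance_name)

print(interface_uuid("default") == interface_uuid("default"))  # True
```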

@tamird
Author

tamird commented Nov 17, 2024

This is a major redesign and I'm not sure it is worth the effort, because the helper process model, where every vm has a small and simple helper process forwarding packets between the vm and vmnet, is simpler. For example, we don't need a launchd service, since the helper is created and managed by the program starting the vm (e.g. lima or minikube).

This is of course simpler, but it requires running the helper process as root - or have you found a way to avoid that?

@nirs
Member

nirs commented Nov 17, 2024

This is of course simpler, but it requires running the helper process as root - or have you found a way to avoid that?

There is no way to avoid that. This is a limitation set by Apple, and the reason we need socket_vmnet. If you use vz in a program from the App Store, you may be able to get the required entitlement and use native networking without vmnet. This is much faster than using vmnet, but it cannot work for an open source project where the program can be built by anyone without getting permission from Apple.

You can add a sudoers rule to allow your program to run the helper as root. This is the same solution used by lima when you want to manage socket_vmnet with lima. Lima creates a sudoers rule for you (try limactl sudoers), and you need to install it in /etc/sudoers.d/. Then lima runs socket_vmnet for you when you start an instance using a lima:shared or lima:bridged network.
https://lima-vm.io/docs/config/network/#vmnet-networks

@tamird
Author

tamird commented Nov 18, 2024

Yes, makes sense. I think we should probably head toward the redesign you mentioned. XPC seems to do everything we need. Would an XPC interface work for vz?

@nirs
Member

nirs commented Nov 18, 2024

Yes, makes sense. I think we should probably head toward the redesign you mentioned. XPC seems to do everything we need. Would an XPC interface work for vz?

Not sure how you want to use XPC. For vz, the interface is a connected datagram socket file descriptor and a mac address when configuring the network device.

@tamird
Author

tamird commented Nov 18, 2024

With this model, socket_vmnet needs to keep a control socket for receiving datagram socket descriptors from other processes. When a datagram socket is passed, it will start a vmnet interface for that socket and start forwarding packets between the socket and vmnet.

The XPC interface would be here. It's the interface over which the client passes the vmnet configuration + the datagram socket.

@nirs
Member

nirs commented Nov 18, 2024

With this model, socket_vmnet needs to keep a control socket for receiving datagram socket descriptors from other processes. When a datagram socket is passed, it will start a vmnet interface for that socket and start forwarding packets between the socket and vmnet.

The XPC interface would be here. It's the interface over which the client passes the vmnet configuration + the datagram socket.

The datagram socket is a file descriptor in the process creating the vz virtual machine. We have the other end of the socket pair, which needs to be passed to socket_vmnet, and this requires a unix socket.
https://liujunming.top/2024/07/14/File-Descriptor-Transfer-over-Unix-Domain-Sockets/

Maybe XPC supports this, but requiring it means it will be hard to integrate with other tools that try to work on multiple platforms, like lima and minikube. Lima uses this module to pass fds over a unix socket:
https://pkg.go.dev/github.com/ftrvxmtrx/fd?utm_source=godoc#pkg-overview

The same code can be used by minikube, so this seems like the right way to communicate with socket_vmnet. The rest can be very simple JSON messages and responses, compatible with anything that can use a unix socket and JSON.
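For reference, a minimal sketch of the underlying SCM_RIGHTS mechanism that module wraps, shown with Python 3.9's send_fds/recv_fds (the pipe stands in for the vm's datagram socket; the variable names are illustrative):

```python
import os
import socket

# One end of a unix stream socketpair plays the client, the other the
# daemon's control socket.
client, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()  # stand-in for the vm's datagram socket descriptor

# Pass the descriptor (plus one byte of data) over the unix socket.
socket.send_fds(client, [b"x"], [w])
msg, fds, _flags, _addr = socket.recv_fds(daemon, 1024, 1)

os.write(fds[0], b"hello")  # the daemon writes through the received fd
print(os.read(r, 5))        # b'hello'
```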

I don't think we should use any Apple-only technology as the public interface. That is fine for the internal implementation, or if you control the entire system.

Let's move this discussion to a new issue.

@tamird
Author

tamird commented Nov 18, 2024

Sounds good. I'll go ahead and close this since we're almost certainly going to keep a client in place.

@tamird tamird closed this Nov 18, 2024
@tamird tamird deleted the no-client branch November 18, 2024 16:36
@tamird
Author

tamird commented Nov 18, 2024

I opened #77.
