
Update instructions to use -netdev stream #76

Closed
wants to merge 1 commit

Conversation

tamird

@tamird tamird commented Nov 17, 2024

Since QEMU 7.2, wrappers are no longer needed[0]; update the
documentation to instruct users to prefer the new options added in
2022[1] but not documented until 2024[2].

The client can be entirely removed in a future release.

[0] https://john-millikin.com/improved-unix-socket-networking-in-qemu-7.2
[1] qemu/qemu@5166fe0
[2] qemu/qemu@178413a

Signed-off-by: Tamir Duberstein <[email protected]>
@nirs nirs left a comment

Looks good, did you test the modified examples?

@tamird
Author

tamird commented Nov 17, 2024

I did test, yeah. I also updated the test script which I hope runs in CI.

There is one downside to removing the client: QEMU's socket backend supports both stream and datagram sockets, and automatically interrogates the socket to determine its type. Currently we are using stream sockets, but they appear to be less efficient than datagram sockets because each message is length-prefixed, and we use two syscalls to read each packet. If we want to move to datagram sockets in the future (and it seems likely that we would) then we can no longer hide this detail because we'll need to change -netdev stream to -netdev dgram.
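To make the overhead concrete, here is a minimal sketch, assuming a 4-byte network-order length prefix like the one QEMU's stream netdev uses (shown over a local Python socketpair, not QEMU itself; the helper names are illustrative):

```python
import socket
import struct

def send_frame(sock: socket.socket, payload: bytes) -> None:
    # Stream transport: prefix each packet with its length so the
    # receiver can find frame boundaries in the byte stream.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed mid-frame")
        buf += chunk
    return buf

def recv_frame(sock: socket.socket) -> bytes:
    # Two reads per packet: one for the 4-byte prefix, one for the body.
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
send_frame(a, b"ethernet-frame")
print(recv_frame(b))  # b'ethernet-frame'
```

With a datagram socket, each recv() returns exactly one packet, so both the length prefix and the second syscall disappear.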

@nirs
Member

nirs commented Nov 17, 2024

If we want to move to datagram sockets in the future (and it seems likely that we would) then we can no longer hide this detail because we'll need to change -netdev stream to -netdev dgram.

We should move to a dgram socket: it simplifies the code and is more efficient, since we eliminate the length prefix on both sides (qemu/lima and socket_vmnet).

This is mostly important for lima, where lima copies every packet read from the socket_vmnet unix socket to the datagram socket connected to vz. If we support datagram sockets in socket_vmnet, we can pass the file descriptor of the datagram socket connected to vz to socket_vmnet and eliminate the expensive copy in lima.

@tamird
Author

tamird commented Nov 17, 2024

I think moving to datagram sockets is going to be a good deal more complex; since dgram communication is inherently unidirectional, each VM will need two sockets (one for inbound and one for outbound communication). This is described in the blog I linked above.

This means we'll certainly need a client wrapper responsible for allocating an ephemeral address for the receiving socket and we'll also need a way to communicate that socket's address to the socket_vmnet daemon.

All in all it isn't difficult to see why stream sockets were initially chosen.

@nirs
Member

nirs commented Nov 17, 2024

Datagram sockets are not unidirectional. You can read and write packets from the same file descriptor.

I'm using a single dgram socket to integrate vfkit with vmnet. I'm creating a datagram socketpair. One end is passed to vfkit using:

--device=virtio-net,fd=vm-fd

This is used to create a file handle device attachment in vz. The other end is connected to a helper process that creates a vmnet interface.

The helper process reads packets from the file descriptor and writes them to the vmnet interface, and reads packets from the vmnet interface and writes them to the file descriptor.

The same should work with lima and qemu.
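A quick sketch of the point about bidirectionality, using a Python socketpair as a stand-in for the vm/helper pair (the variable names are illustrative):

```python
import socket

# A connected datagram socketpair: both ends send and receive on the
# same file descriptor, so one socket carries traffic in both
# directions (vm -> helper and helper -> vm).
vm_fd, helper_fd = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

vm_fd.send(b"outbound-packet")
print(helper_fd.recv(2048))  # b'outbound-packet'
helper_fd.send(b"inbound-packet")
print(vm_fd.recv(2048))      # b'inbound-packet'
```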

@tamird
Author

tamird commented Nov 17, 2024

Do you use an abstract socket address to bind vm-fd?

@nirs
Member

nirs commented Nov 17, 2024

Do you use an abstract socket address to bind vm-fd?

I'm using socketpair(2): it returns two connected sockets without an address.

@tamird
Author

tamird commented Nov 17, 2024

I see. That's probably using abstract addresses under the covers.

Another question: why do we create a single vmnet interface for all clients rather than one per client?

@nirs
Member

nirs commented Nov 17, 2024

I see. That's probably using abstract addresses under the covers.

The sockets have an empty address. Also, the abstract address namespace is a Linux extension; on macOS there is no such thing.

Another question: why do we create a single vmnet interface for all clients rather than one per client?

I'm not sure. I'm creating one vmnet interface per vm, and each helper process forwards packets between one vmnet interface and one vm. This model is much simpler and gives much better performance.

Creating one vmnet interface per vm means that vmnet is responsible for forwarding packets between vms. In socket_vmnet we forward packets between vms and from the vmnet interface to all vms. Because socket_vmnet does not know the vms' mac addresses, it copies all packets from every socket to all other sockets and the vmnet interface, or from the vmnet interface to all vms. This scales very badly (#58).

@tamird
Author

tamird commented Nov 17, 2024

I'm not sure. I'm creating one vmnet interface per vm, and each helper process forwards packets between one vmnet interface and one vm. This model is much simpler and gives much better performance.

Creating one vmnet interface per vm means that vmnet is responsible for forwarding packets between vms. In socket_vmnet we forward packets between vms and from the vmnet interface to all vms. Because socket_vmnet does not know the vms' mac addresses, it copies all packets from every socket to all other sockets and the vmnet interface, or from the vmnet interface to all vms. This scales very badly (#58).

Right, this is what I expected. Perhaps changing this would be a good place to start?

@nirs
Member

nirs commented Nov 17, 2024

I'm not sure. I'm creating one vmnet interface per vm, and each helper process forwards packets between one vmnet interface and one vm. This model is much simpler and gives much better performance.
Creating one vmnet interface per vm means that vmnet is responsible for forwarding packets between vms. In socket_vmnet we forward packets between vms and from the vmnet interface to all vms. Because socket_vmnet does not know the vms' mac addresses, it copies all packets from every socket to all other sockets and the vmnet interface, or from the vmnet interface to all vms. This scales very badly (#58).

Right, this is what I expected. Perhaps changing this would be a good place to start?

This is a major redesign and I'm not sure it is worth the effort, because the helper process model, where every vm has a small and simple helper process forwarding packets between the vm and vmnet, is simpler. For example, we don't need a launchd service, since the helper is created and managed by the program starting the vm (e.g. lima or minikube).

If we want to keep a single daemon serving multiple vms, moving to one vmnet interface per vm seems like the right design.

With this model, socket_vmnet needs to keep a control socket for receiving datagram socket descriptors from other processes. When a datagram socket is passed, it will start a vmnet interface for that socket and start forwarding packets between the socket and vmnet.

When the vmnet interface is created, we get a mac address from vmnet. This mac address must be used by the vm so that vmnet can forward packets back to the vm. This requires a protocol to return the mac address from vmnet to the vm, which can be implemented using the control socket.

Currently lima generates a unique mac address for every instance based on the instance name, so the instance is more likely to get the same IP address each time. With vmnet we cannot control the mac address, but we can specify an interface UUID; using the same UUID will return the same mac address. lima can generate the UUID in the same way it generates the mac address, so each instance will have a unique and constant UUID. The protocol for starting a vmnet interface will need to accept the UUID and return the mac address provided by vmnet for this UUID.

This requires changes in programs using socket_vmnet, like lima or minikube.
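A sketch of how such a stable UUID could be derived from the instance name (the namespace constant and function name below are assumptions for illustration, not lima's actual scheme):

```python
import uuid

# Any fixed namespace works; it only has to stay constant across runs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "socket_vmnet.example")

def interface_uuid(instance_name: str) -> uuid.UUID:
    # uuid5 is deterministic: the same instance name always yields the
    # same UUID, so vmnet would return the same mac address every boot.
    return uuid.uuid5(NAMESPACE, instance_name)

print(interface_uuid("default") == interface_uuid("default"))  # True
```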

@tamird
Author

tamird commented Nov 17, 2024

This is a major redesign and I'm not sure it is worth the effort, because the helper process model, where every vm has a small and simple helper process forwarding packets between the vm and vmnet, is simpler. For example, we don't need a launchd service, since the helper is created and managed by the program starting the vm (e.g. lima or minikube).

This is of course simpler, but it requires running the helper process as root - or have you found a way to avoid that?

@nirs
Member

nirs commented Nov 17, 2024

This is of course simpler, but it requires running the helper process as root - or have you found a way to avoid that?

There is no way to avoid that. This is a limitation set by Apple, and the reason we need socket_vmnet. If you use vz in a program from the App Store, you may be able to get the required entitlement and use native networking without vmnet. This is much faster than using vmnet, but it cannot work for an open source project where the program can be built by anyone without getting permission from Apple.

You can add a sudoers rule to allow your program to run the helper as root. This is the same solution used by lima when you want to manage socket_vmnet with lima. Lima creates a sudoers rule for you (try limactl sudoers), and you need to install it in /etc/sudoers.d/. Then lima runs socket_vmnet for you when you start an instance using a lima:shared or lima:bridged network.
https://lima-vm.io/docs/config/network/#vmnet-networks

@tamird
Author

tamird commented Nov 18, 2024

Yes, makes sense. I think we should probably head toward the redesign you mentioned. XPC seems to do everything we need. Would an XPC interface work for vz?

@nirs
Member

nirs commented Nov 18, 2024

Yes, makes sense. I think we should probably head toward the redesign you mentioned. XPC seems to do everything we need. Would an XPC interface work for vz?

Not sure how you want to use XPC. For vz, the interface is a connected datagram socket file descriptor and a mac address when configuring the network device.

@tamird
Author

tamird commented Nov 18, 2024

With this model, socket_vmnet needs to keep a control socket for receiving datagram socket descriptors from other processes. When a datagram socket is passed, it will start a vmnet interface for that socket and start forwarding packets between the socket and vmnet.

The XPC interface would be here. It's the interface over which the client passes the vmnet configuration + the datagram socket.

@nirs
Member

nirs commented Nov 18, 2024

With this model, socket_vmnet needs to keep a control socket for receiving datagram socket descriptors from other processes. When a datagram socket is passed, it will start a vmnet interface for that socket and start forwarding packets between the socket and vmnet.

The XPC interface would be here. It's the interface over which the client passes the vmnet configuration + the datagram socket.

The datagram socket is a file descriptor in the process creating the vz virtual machine. We have the other end of the socket pair, which needs to be passed to socket_vmnet, and this requires a unix socket.
https://liujunming.top/2024/07/14/File-Descriptor-Transfer-over-Unix-Domain-Sockets/

Maybe XPC supports this, but requiring it means it will be hard to integrate with other tools that try to work on multiple platforms, like lima and minikube. Lima uses this module to pass fds over a unix socket:
https://pkg.go.dev/github.com/ftrvxmtrx/fd?utm_source=godoc#pkg-overview

The same code can be used by minikube, so this seems like the right way to communicate with socket_vmnet. The rest can be very simple JSON messages and responses, compatible with anything that can use a unix socket and JSON.
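For reference, a minimal sketch of the underlying SCM_RIGHTS mechanism that module wraps, shown with Python 3.9's send_fds/recv_fds (the pipe stands in for the vm's datagram socket; the variable names are illustrative):

```python
import os
import socket

# One end of a unix stream socketpair plays the client, the other the
# daemon's control socket.
client, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()  # stand-in for the vm's datagram socket descriptor

# Pass the descriptor (plus one byte of data) over the unix socket.
socket.send_fds(client, [b"x"], [w])
msg, fds, _flags, _addr = socket.recv_fds(daemon, 1024, 1)

os.write(fds[0], b"hello")  # the daemon writes through the received fd
print(os.read(r, 5))        # b'hello'
```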

I don't think we should use any Apple-only technology as the public interface. That is fine for the internal implementation, or if you control the entire system.

Let's move this discussion to a new issue.

@tamird
Author

tamird commented Nov 18, 2024

Sounds good. I'll go ahead and close this since we're almost certainly going to keep a client in place.

@tamird tamird closed this Nov 18, 2024
@tamird tamird deleted the no-client branch November 18, 2024 16:36
@tamird
Author

tamird commented Nov 18, 2024

I opened #77.
