If you look at benchmarks of tinc, you will quickly find that for many real-world workloads the largest consumer of CPU time is TUN/TAP I/O.
I did some work on sendmmsg in the past but primarily ran into architectural issues: tinc was never built to handle a queue of packets (but this can change!).
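For context, batching UDP sends with sendmmsg looks roughly like this (a minimal sketch, not tinc code; the helper name and the assumption that all packets are already encrypted and headed to the same peer are mine):

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative helper: send up to `count` already-encrypted packets to one
 * peer in a single system call. Error handling and partial sends are left out. */
static int send_batch(int sock, struct sockaddr_in *dst,
                      unsigned char **pkts, size_t *lens, unsigned count) {
    struct mmsghdr msgs[64];
    struct iovec iovs[64];

    if (count > 64)
        count = 64;

    memset(msgs, 0, sizeof(msgs));
    for (unsigned i = 0; i < count; i++) {
        iovs[i].iov_base = pkts[i];
        iovs[i].iov_len  = lens[i];
        msgs[i].msg_hdr.msg_name    = dst;
        msgs[i].msg_hdr.msg_namelen = sizeof(*dst);
        msgs[i].msg_hdr.msg_iov     = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen  = 1;
    }

    /* One syscall instead of `count` separate sendto() calls. */
    return sendmmsg(sock, msgs, count, 0);
}
```

The syscall itself is the easy part; the architectural problem mentioned above is that tinc's code path hands packets through one at a time, so there is never a queue to fill those arrays from.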
If you really want performance from tinc, build ktincd (a Linux kernel tinc). I've debated it numerous times.
It was originally going to be one of my next experiments after the AES protocol changes were merged (but they never were).
The networking side of it wouldn't be too hard: tinc is structured well enough that adapting it to a Linux netdev would not be too difficult. Configuration, though, is potentially a real nightmare.
However, tinc doesn't have the architecture in place for batching on the TAP side, and that's what holds me back. I'm not certain I want to make that level of change without guidance.
The advantage of io_uring is that you don't have to batch things at all in the application. You can still do single packet read()/write()/send()/recv() calls, but instead of them being system calls, you enqueue them on the io_uring. You can also have buffers shared between userspace and kernelspace, so you can theoretically avoid copies being made. However, I don't know how well that works compared to packet mmap.
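To illustrate, a single-packet read queued on the ring instead of issued as a read() syscall looks roughly like this (a minimal sketch using liburing, not tinc code; the TUN fd setup is elided and the function name is just for illustration):

```c
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

#define PKT_BUF 2048

/* Sketch: queue one read on a fd via io_uring instead of calling read().
 * A real event loop would keep several reads in flight and queue the
 * corresponding UDP sends on the same ring. */
static int read_one_packet(struct io_uring *ring, int fd, unsigned char *buf) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    if (!sqe)
        return -1;

    io_uring_prep_read(sqe, fd, buf, PKT_BUF, 0);
    io_uring_submit(ring);                  /* one syscall can cover many queued SQEs */

    struct io_uring_cqe *cqe;
    if (io_uring_wait_cqe(ring, &cqe) < 0)  /* wait for the completion */
        return -1;

    int n = cqe->res;                       /* bytes read, or -errno */
    io_uring_cqe_seen(ring, cqe);
    return n;
}

int main(void) {
    struct io_uring ring;
    static unsigned char buf[PKT_BUF];

    if (io_uring_queue_init(64, &ring, 0) < 0)
        return 1;

    /* Stand-in fd: a real daemon would pass its /dev/net/tun descriptor here. */
    int n = read_one_packet(&ring, STDIN_FILENO, buf);
    printf("read %d bytes\n", n);

    io_uring_queue_exit(&ring);
    return 0;
}
```

The shared-buffer part (registered buffers) is a separate, optional step on top of this; the sketch only shows replacing the per-packet syscall with an enqueued request.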
This blog post by Tailscale sounds promising. It points out that the Linux TUN device supports TSO/GRO offload.
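To make that concrete, here is a rough sketch of enabling the vnet header and offloads on a TUN fd (my reading of the if_tun API, not code from the blog post; the function name is just for illustration). Once this is set, every packet on the fd is prefixed with a struct virtio_net_hdr and a single read() can return a large, unsegmented "super-packet":

```c
#include <fcntl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Sketch: open /dev/net/tun with the vnet header enabled, which is the
 * prerequisite for TSO/GRO on the TUN side. Each read()/write() on the fd
 * then starts with a struct virtio_net_hdr (from <linux/virtio_net.h>). */
static int open_tun_offload(const char *name) {
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TUN | IFF_NO_PI | IFF_VNET_HDR;
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }

    /* Tell the kernel which offloads userspace is willing to handle. */
    unsigned offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6;
    if (ioctl(fd, TUNSETOFFLOAD, offloads) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```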
Also, there is another post on using GSO (Generic Segmentation Offload) to send multiple UDP packets from a single large buffer.
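Roughly, UDP GSO via the UDP_SEGMENT option (Linux 4.18+) looks like this: you hand the kernel one large buffer and it splits it into seg_size-byte datagrams on the way out. A minimal sketch, with an illustrative function name:

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* define it ourselves if the libc headers lack it */
#endif

/* Sketch: one sendmsg() carrying many UDP payloads; the kernel segments the
 * buffer into datagrams of `seg_size` bytes each. */
static ssize_t send_gso(int sock, const struct sockaddr_in *dst,
                        const void *buf, size_t len, uint16_t seg_size) {
    char ctrl[CMSG_SPACE(sizeof(uint16_t))] = {0};
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = {
        .msg_name       = (void *)dst,
        .msg_namelen    = sizeof(*dst),
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = ctrl,
        .msg_controllen = sizeof(ctrl),
    };

    /* Per-message segment size; it can also be set once per socket with
     * setsockopt(sock, IPPROTO_UDP, UDP_SEGMENT, ...). */
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = IPPROTO_UDP;   /* same value as SOL_UDP */
    cm->cmsg_type  = UDP_SEGMENT;
    cm->cmsg_len   = CMSG_LEN(sizeof(uint16_t));
    memcpy(CMSG_DATA(cm), &seg_size, sizeof(uint16_t));

    return sendmsg(sock, &msg, 0);
}
```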
Both techniques reduce the number of network stack traversals. Unfortunately, these features do not seem to be well documented.