There's a constant stream of kernel vulnerabilities in Netfilter, including e.g. the recent CVE-2024-1086, that are exposed to users due to containers - such as user and network namespaces created by a host user specifically to perform the attack (exploit programs invoke unshare on their own). The only mitigations with upstream and Red Hat kernels are user.max_user_namespaces=0, user.max_net_namespaces=0, or blacklisting Netfilter kernel module(s). Unfortunately, these break commonly needed functionality. Ubuntu/AppArmor is able to disable just unprivileged users' creation of namespaces, which breaks only a little bit less.
We may want to invent a knob of our own that would limit access only to Netfilter and only in containers (user/network namespaces). Further, it could support an intermediate setting where it'd disallow Netfilter in nested containers, but leave it allowed (and exposed for attack, unfortunately) in top-level containers. A use case mentioned to me is:
The most obvious use case I'm thinking of is Kubernetes in Docker, for example: a KinD container runs Kubernetes inside it, and Kubernetes uses Netfilter for kube-proxy.
In terms of implementation, we'd probably need to hook nfnetlink_rcv (static and not exported, but accessed via a function pointer, so it should be intact), but a problem is that with our current kretprobe hooks we "can't" prevent the original function from being called, and I don't see a non-invasive way to make it a no-op for one call.
nfnetlink_rcv uses netlink_net_capable(skb, CAP_NET_ADMIN), which makes me wonder whether we'd possibly want a knob to restrict access to all of Netlink instead, which we could perhaps do by hooking __netlink_ns_capable (exported).
And this makes me further wonder whether we could have a knob to restrict all uses of CAP_NET_ADMIN in non-init namespaces, which we could do from the security_capable LSM hook as used by ns_capable_common (the latter is static and not exported). We already hook security_capable for task integrity checking and pCFI (we hook it via kretprobe for consistency with our other hooks, not the way it was meant to be hooked). So, if we're fine with not limiting this to Netfilter or even Netlink, what we could do is add a check of security_capable arguments 2 and 3 (namespace and capability) in our p_capable_ret (or switch to proper LSM hooking).
A question is then why a sysadmin would want to allow user+network namespaces at all. A possible reason is that network namespaces are apparently sometimes used (by some systemd services) to give up network access, which I guess would continue to work without a usable CAP_NET_ADMIN in there. Another reason is that our knob could make CAP_NET_ADMIN ineffective only starting with a certain namespace nesting depth (the sysctl value).
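To make the proposed decision logic concrete, here is a minimal userspace sketch of the check on the namespace and capability arguments, combined with the nesting-depth threshold. It is a model only, not kernel or LKRG code: `struct model_user_ns`, `model_ns_depth`, `model_deny_capable`, and the `deny_from_depth` parameter are hypothetical names, and the knob semantics (0 means off, N means deny at depth N and deeper) are an assumption about how such a sysctl might be defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP_NET_ADMIN 12 /* same value as in linux/capability.h */

/* Hypothetical userspace stand-in for the kernel's struct user_namespace. */
struct model_user_ns {
    const struct model_user_ns *parent; /* NULL for the initial namespace */
};

/* Nesting depth: 0 for the initial namespace, 1 for its children, etc. */
static int model_ns_depth(const struct model_user_ns *ns)
{
    int depth = 0;

    assert(ns != NULL);
    while (ns->parent) {
        depth++;
        ns = ns->parent;
    }
    return depth;
}

/*
 * Decide whether a capability check should be forced to fail.
 * deny_from_depth models the assumed sysctl value:
 *   0 (or negative) - knob is off, never deny
 *   1               - deny CAP_NET_ADMIN in every non-init namespace
 *   2               - allow top-level containers, deny nested ones
 */
static bool model_deny_capable(const struct model_user_ns *ns, int cap,
                               int deny_from_depth)
{
    if (cap != CAP_NET_ADMIN)
        return false; /* other capabilities are left alone */
    if (deny_from_depth <= 0)
        return false; /* knob disabled */
    return model_ns_depth(ns) >= deny_from_depth;
}
```

With a value of 2, a namespace created directly from the initial one keeps a usable CAP_NET_ADMIN (so the KinD outer container would still work), while anything nested inside it would not. A real implementation would presumably read the depth from the kernel's own namespace structure rather than walking parent pointers on every call.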
solardiz changed the title from "Add a knob to limit Netfilter access from containers" to "Add knob(s) to limit Netfilter, Netlink, or all CAP_NET_ADMIN access from containers" on Apr 17, 2024