Add knob(s) to limit Netfilter, Netlink, or all CAP_NET_ADMIN access from containers #331

Open
solardiz opened this issue Apr 8, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@solardiz
Contributor

solardiz commented Apr 8, 2024

There's a constant stream of kernel vulnerabilities in Netfilter (recently, e.g., CVE-2024-1086) that are exposed to unprivileged users through containers, such as user and network namespaces created by a host user specifically to perform the attack (exploit programs invoke unshare on their own). The only mitigations available with upstream and Red Hat kernels are setting user.max_user_namespaces=0 or user.max_net_namespaces=0, or blacklisting the Netfilter kernel module(s). Unfortunately, these break commonly needed functionality. Ubuntu/AppArmor can disable just unprivileged users' creation of namespaces, which breaks only slightly less.

We may want to invent a knob of our own that would limit access only to Netfilter and only in containers (user/network namespaces). Further, it could support an intermediate setting that disallows Netfilter in nested containers but leaves it allowed (and, unfortunately, exposed to attack) in top-level containers. A use case that was mentioned to me:

The most obvious use cases I'm thinking of are Kubernetes in Docker by example, KinD container will run kubernetes inside it and kubernetes is using netfilter for kube-proxy

@solardiz solardiz added the enhancement New feature or request label Apr 8, 2024
@solardiz
Contributor Author

In terms of implementation, we'd probably need to hook nfnetlink_rcv (it's static and not exported, but it's called via a function pointer, so it should remain intact rather than inlined). A problem is that with our current kretprobe hooks we "can't" prevent the original function from being called, and I don't see a non-invasive way to make it a no-op for one call.

It uses netlink_net_capable(skb, CAP_NET_ADMIN), which makes me wonder whether we'd rather want a knob to restrict access to all of Netlink instead. We could perhaps do that by hooking __netlink_ns_capable (exported).

This further makes me wonder whether we could have a knob to restrict all uses of CAP_NET_ADMIN in non-init namespaces, which we could do from the security_capable LSM hook as used by ns_capable_common (the latter is static and not exported). We already hook security_capable for task integrity checking and pCFI (via kretprobe, for consistency with our other hooks, rather than the way it was meant to be hooked). So, if we're fine with not limiting this to Netfilter or even Netlink, we could add a check of security_capable arguments 2 and 3 (namespace and capability) in our p_capable_ret (or switch to proper LSM hooking).

The question then is why a sysadmin would want to allow user+network namespaces at all. One possible reason is that network namespaces are apparently sometimes used (by some systemd services) to give up network access, which I guess would continue to work without a usable CAP_NET_ADMIN in there. Another reason is that our knob could make CAP_NET_ADMIN ineffective only starting from a certain namespace nesting depth (the sysctl value).

@solardiz solardiz changed the title Add a knob to limit Netfilter access from containers Add knob(s) to limit Netfilter, Netlink, or all CAP_NET_ADMIN access from containers Apr 17, 2024