Containers should get a `/run/user/<UID>/` tmpfs volume mount (maybe opt-in) to match Linux w/ systemd #4776

nicowilliams · 2024-07-26T19:43:53Z

Enhancement Description

One-line enhancement description (can be used as a release note): Containers should get a /run/user/<UID>/ tmpfs volume mount to match Linux w/ systemd
Kubernetes Enhancement Proposal: TBD
Discussion Link: Containers should get a /run/user/<UID>/ tmpfs volume mount kubernetes#126394
Primary contact (assignee):
Responsible SIGs:
Enhancement target (which target equals to which milestone):
- Alpha release target (x.y):
- Beta release target (x.y):
- Stable release target (x.y):
Alpha
- KEP (k/enhancements) update PR(s):
- Code (k/k) update PR(s):
- Docs (k/website) update PR(s):

What would you like to be added?

Containers should get a /run/user/<UID>/ tmpfs volume mount.

Why is this needed?

Generally, this is needed for conformance with the de facto standard that systemd sets by providing this directory. Part of this is in FHS, and part in XDG (see below). systemd provides it for the same sorts of reasons that we need it: to store temporary data on a per-user basis under well-known names, without exposure to attacks that rely on /tmp/ being world-writable (the "sticky" bit is insufficient to fully protect against such attacks). Why bother for Kubernetes? Because libraries may want to use /run/user/<UID>/, and it is much easier for them if that directory always exists than to handle its absence. Making the directory's presence universal means that Kubernetes needs to provide it, at least optionally.

We've implemented a token cache system that resembles Kerberos credentials caches in that they are kept in temporary storage, preferably tmpfs, but /tmp/ is not an appropriate place (see below) when that code is used on multi-user systems. We want to use temporary storage because we don't want these token caches to survive reboots, for example.

Kubernetes pods and containers are not multi-user systems, but libraries that do this sort of caching need to easily support many kinds of environments. Therefore it would be nice to have /run/user/<UID>/ be universally available, so that libraries can use it w/o concern about use on multi-user systems vs. single-user systems. Where /run/user/<UID>/ exists it is created by PAM, but obviously if it were to exist in Kubernetes containers it should be created by Kubernetes.

Background:

/tmp/ is not appropriate for caches of this sort because on multi-user systems other users can mount attacks on such caches. When coded defensively such attacks can amount to no more than a denial-of-service, but still, it would be easier to safely code such caches if they could use a temporary location that is guaranteed to have the correct permissions (0700) and where none of the parent directories can have world-writable permissions like 0777 or 01777.

Kerberos client libraries typically use files named /tmp/krb5cc_<UID>, or directories named similarly -- well-known names, not mkstemp()ed names, as these need to be easily found without having to look through a possibly-huge directory listing. In Unix terms /run/user/<UID>/ is very new, and generally it is only ever created via PAM at login. PAM is not used in starting containers in Kubernetes, therefore /run/user/<UID>/ does not exist in Kubernetes containers.

Because such files have to have well-known names, they can be subject to attack on multi-user systems, e.g., by creating /tmp/krb5cc_1000 as a symlink to some other file, or creating it as a regular 0666 mode file, etc. Clients need to use O_NOFOLLOW and/or lstat(2) and fstat(2) to make sure that they open only regular, non-symlink files, and they need to check that the file is owned by the UID that getuid() returns and that the file has appropriate permissions (0600).
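The checks described above can be sketched roughly as follows. This is only an illustration in Python, not our actual code: the function name `open_cache_file` is hypothetical, and the use of O_NONBLOCK (so that a planted FIFO cannot block the open) is an extra assumption beyond what is described above.

```python
import os
import stat


def open_cache_file(path: str) -> int:
    """Open a well-known cache file defensively and return a file descriptor.

    Raises OSError (PermissionError for the explicit checks) if the path is a
    symlink, is not a regular file, is not owned by the calling user, or is
    accessible to group or other.
    """
    # O_NOFOLLOW makes the open fail if the final path component is a symlink.
    # O_NONBLOCK keeps the open from blocking if someone planted a FIFO there.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK)
    try:
        # fstat() inspects the object we actually opened, not the path name,
        # so the result cannot be swapped out from under us afterwards.
        st = os.fstat(fd)
        if not stat.S_ISREG(st.st_mode):
            raise PermissionError(f"{path}: not a regular file")
        if st.st_uid != os.getuid():
            raise PermissionError(f"{path}: not owned by the calling user")
        if stat.S_IMODE(st.st_mode) & 0o077:
            raise PermissionError(f"{path}: accessible to group or other")
        return fd
    except Exception:
        os.close(fd)
        raise
```

A caller would do something like `fd = open_cache_file("/tmp/krb5cc_1000")` and treat any OSError as "no usable cache". With a guaranteed-private /run/user/<UID>/ most of this checking would become a belt-and-braces measure rather than a necessity.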

Because we use aud in our tokens to limit their applicability, we also have apps that need many tokens, so we want to be able to cache them. Because some of our apps are multi-process apps, or invoke external short-lived programs that fetch and use tokens, a file-based cache reduces load on our issuers more than in-memory caches alone can. Our file-based token cache currently uses a 0700 directory, /tmp/tokens_<UID>/, containing 0600 regular files, one token per file. Each file is named after a hash of the token's issuer and audience modulo the maximum number of tokens allowed in the cache, and holds the issuer, audience, expiration, token, and other metadata. Ideally we would have /run/user/<UID>/, so that we could use /run/user/<UID>/tokens/ (or some such). (Clearly, managing the namespace in /run/user/<UID>/ may eventually be a problem, but at this time it is not, and anyway it won't be a problem for the Kubernetes community to manage.)
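To make the cache layout above concrete, here is a minimal sketch of the naming scheme. Everything specific in it is an assumption for illustration: the SHA-256 hash, the `MAX_TOKENS` value, the JSON file format, the field names, and the function names are not taken from our implementation.

```python
import hashlib
import json
import os
import tempfile
import time

MAX_TOKENS = 64  # hypothetical cap on the number of cache slots


def slot_path(cache_dir, issuer, audience, max_tokens=MAX_TOKENS):
    """Map (issuer, audience) to a well-known file name inside cache_dir."""
    digest = hashlib.sha256(f"{issuer}\0{audience}".encode()).digest()
    slot = int.from_bytes(digest[:8], "big") % max_tokens
    return os.path.join(cache_dir, f"{slot:04d}")


def store_token(cache_dir, issuer, audience, expiration, token):
    """Write one token per file, replacing whatever occupied the slot."""
    # Note: exist_ok=True does not re-check the mode of an existing directory.
    os.makedirs(cache_dir, mode=0o700, exist_ok=True)
    path = slot_path(cache_dir, issuer, audience)
    record = {"iss": issuer, "aud": audience, "exp": expiration, "token": token}
    # mkstemp() creates the temporary file with mode 0600; os.replace() then
    # swaps it into place atomically so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(record, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return path


def load_token(cache_dir, issuer, audience):
    """Return the cached token, or None on a miss, a collision, or expiry."""
    try:
        with open(slot_path(cache_dir, issuer, audience)) as f:
            record = json.load(f)
    except FileNotFoundError:
        return None
    # Different (issuer, audience) pairs can share a slot because of the
    # modulo, so the stored metadata has to be compared before use.
    if record["iss"] != issuer or record["aud"] != audience:
        return None
    if record["exp"] <= time.time():
        return None
    return record["token"]
```

Taking the hash modulo a fixed maximum bounds the number of files in the directory; the cost is that two issuer/audience pairs can evict each other, which is why the reader compares the stored issuer and audience before trusting a hit.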

/run/ is part of the FHS. /run/user/<UID>/ is not part of the FHS, but a) it exists on Linux systems running systemd, and b) it is part of XDG and exists on FreeBSD. (I don't have a FreeBSD system to test on, but I think perhaps they use /run/user/${USER}/ rather than /run/user/<UID>/. Using a UID is better than a username because it's always possible to call getuid() to get the UID, while a username is not always available or as easy to obtain.)
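For illustration, this is roughly the directory-selection logic a library has to carry today because /run/user/<UID>/ is not guaranteed to exist. It is a sketch under assumptions: the function names, the `tokens` subdirectory name, and the parameters are hypothetical, and consulting the XDG_RUNTIME_DIR environment variable first is my assumption about how a library would honor XDG.

```python
import os
import stat


def _is_private_dir(path, uid):
    """True if path is a real directory owned by uid with no group/other access."""
    try:
        st = os.lstat(path)  # lstat() so a symlink is rejected, not followed
    except OSError:
        return False
    return (
        stat.S_ISDIR(st.st_mode)
        and st.st_uid == uid
        and not (stat.S_IMODE(st.st_mode) & 0o077)
    )


def runtime_cache_dir(name="tokens", env=None, run_user_root="/run/user"):
    """Pick a per-user location for a temporary cache directory.

    Prefers $XDG_RUNTIME_DIR, then <run_user_root>/<UID>, and only falls back
    to a well-known name under /tmp/ when neither is a private directory.
    """
    env = os.environ if env is None else env
    uid = os.getuid()
    for base in (env.get("XDG_RUNTIME_DIR"), os.path.join(run_user_root, str(uid))):
        if base and _is_private_dir(base, uid):
            return os.path.join(base, name)
    # Fallback: the caller must still create this 0700 and open files
    # defensively, because /tmp/ is world-writable.
    return f"/tmp/{name}_{uid}"
```

If Kubernetes always provided the directory, the fallback branch (and the defensive handling it implies) would not be needed inside containers.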


kannon92 · 2024-08-22T17:54:24Z

/sig node

Bharadwajshivam28 · 2024-10-03T23:57:46Z

Hey @kannon92 is someone working on this or should I take it?

kannon92 · 2024-10-04T13:25:57Z

Please read up on the KEP process. https://github.com/kubernetes/enhancements/tree/master/keps#kubernetes-enhancement-proposals-keps is a good short doc.

This issue tracks the work but it is not considered the KEP. To work on this, we would like to see a KEP that follows the community process.

k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 26, 2024

nicowilliams mentioned this issue Jul 26, 2024

Containers should get a /run/user/<UID>/ tmpfs volume mount kubernetes/kubernetes#126394

Open

k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 22, 2024
