Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3619: Cleanup After Freeze #3874

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
232 changes: 10 additions & 222 deletions keps/sig-node/3619-supplemental-groups-policy/README.md
Original file line number Diff line number Diff line change
@@ -1,76 +1,6 @@
<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.

To get started with this template:

- [ ] **Pick a hosting SIG.**
Make sure that the problem space is something the SIG is interested in taking
up. KEPs should not be checked in without a sponsoring SIG.
- [ ] **Create an issue in kubernetes/enhancements**
When filing an enhancement tracking issue, please make sure to complete all
fields in that template. One of the fields asks for a link to the KEP. You
can leave that blank until this KEP is filed, and then go back to the
enhancement and add the link.
- [ ] **Make a copy of this template directory.**
Copy this template into the owning SIG's directory and name it
`NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
leading-zero padding) assigned to your enhancement above.
- [ ] **Fill out as much of the kep.yaml file as you can.**
At minimum, you should fill in the "Title", "Authors", "Owning-sig",
"Status", and date-related fields.
- [ ] **Fill out this file as best you can.**
At minimum, you should fill in the "Summary" and "Motivation" sections.
These should be easy if you've preflighted the idea of the KEP with the
appropriate SIG(s).
- [ ] **Create a PR for this KEP.**
Assign it to people in the SIG who are sponsoring this process.
- [ ] **Merge early and iterate.**
Avoid getting hung up on specific details and instead aim to get the goals of
the KEP clarified and merged quickly. The best way to do this is to just
start with the high-level sections and fill out details incrementally in
subsequent PRs.

Just because a KEP is merged does not mean it is complete or approved. Any KEP
marked as `provisional` is a working document and subject to change. You can
denote sections that are under active debate as follows:

```
<<[UNRESOLVED optional short context or usernames ]>>
Stuff that is being argued.
<<[/UNRESOLVED]>>
```

When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions
focused. If you disagree with what is already in a document, open a new PR
with suggested changes.

One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
You do not need a new KEP to move from beta to GA, for example. If
new details emerge that belong in the KEP, edit the KEP. Once a feature has become
"implemented", major changes should get new KEPs.

The canonical place for the latest set of instructions (and the likely source
of this file) is [here](/keps/NNNN-kep-template/README.md).

**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
it is marked `implementable`, must be approved by each of the KEP approvers.
If none of those approvers are still appropriate, then changes to that list
should be approved by the remaining approvers and/or the owning SIG (or
SIG Architecture for cross-cutting KEPs).
-->
# KEP-3619: Fine-grained SupplementalGroups control

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.

Ensure the TOC is wrapped with
<code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
tags, and then generate with `hack/update-toc.sh`.
Expand Down Expand Up @@ -147,18 +77,18 @@ checklist items _must_ be updated for the enhancement to be released.

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [x] (R) Production readiness review completed
- [x] (R) Production readiness review approved
- [x] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Expand All @@ -173,25 +103,6 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

## Summary

<!--
This section is incredibly important for producing high-quality, user-focused
documentation such as release notes or a development roadmap. It should be
possible to collect this information before implementation begins, in order to
avoid requiring implementors to split their attention between writing release
notes and implementing the feature itself. KEP editors and SIG Docs
should help to ensure that the tone and content of the `Summary` section is
useful for a wide audience.

A good summary is probably at least a paragraph in length.

Both in this section and below, follow the guidelines of the [documentation
style guide]. In particular, wrap lines to a reasonable length, to make it
easier for reviewers to cite specific portions, and to minimize diff churn on
updates.

[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

The KEP seeks to provide a way to choose correct behavior with how Container Runtimes (Containerd and CRI-O) are applying `SupplementalGroups` to the first container processes. The KEP describes the work needed to be done in Kubernetes or connected projects to make sure customers have a clear migration path - including detection and safe upgrade - if any of their workflows took a dependency on this arguably erroneous behavior.

### The issue
Expand Down Expand Up @@ -247,15 +158,6 @@ uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(group-in-image),60000

## Motivation

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to [experience reports] to
demonstrate the interest in a KEP within the wider Kubernetes community.

[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
-->

As described above, how supplemental groups attached to the first container process is complicated and not OCI image spec compliant.

Moreover, this causes security considerations as follows. When a cluster enforces some security policy for pods that protects the value of `RunAsGroup` and `SupplementalGroups`, the effect of its enforcement is limited, i.e., cluster users can easily bypass the policy enforcement just by using a custom image. If such a bypass happened, it would be unexpected behavior for most cluster administrators because the enforcement is almost useless. Moreover, the bypass will cause unexpected file access permission. In some use cases, the unexpected file access permission will be a security concern. For example, using `hostPath` volumes could be a severe problem because UID/GIDs matter in accessing files/directories in the volumes.
Expand All @@ -266,36 +168,17 @@ Thus, this KEP proposes to offer a new API field named `SupplementalGroupsPolicy

### Goals

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
know that this has succeeded?
-->

- To Provide a new API field to control exactly which groups the container process belongs to
- Ensure there are clear steps documented for end users to detect if their workload is affected
- (Optional) provide helper APIs and/or tooling to simplify the detection

### Non-Goals

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

- To provide a cluster-wide control method.
- To change the default behavior (a potentially breaking change)

## Proposal

<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?.
The "Design Details" section below is for the real
nitty-gritty.
-->

This KEP proposes changes both on Kubernets API and CRI levels.

### Kubernetes API
Expand Down Expand Up @@ -351,14 +234,6 @@ message ContainerUser {

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->


#### Story 1: Deploy a Security Policy to enforce `SupplementalGroupsPolicy` field

Assume a multi-tenant kubernetes cluster with `hostPath` volumes below situations:
Expand Down Expand Up @@ -394,42 +269,16 @@ Please note that a security policy without `supplementalGroupsPolicy` would lead

### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

The proposal affects to the CRI implementations (e.g., containerd, cri-o, gVisor, etc.)

### Risks and Mitigations

<!--
What are the risks of this proposal, and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Kubernetes ecosystem.

How will security be reviewed, and by whom?

How will UX be reviewed, and by whom?

Consider including folks who also work outside the SIG or subproject.
-->

- How to track the support status in CRI implementations of this proposal?
- This feature is mainly implemented inside each CRI implementation.
- How to feature-gate this feature in CRI implementations?

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

### Kubernetes API

#### SupplementalGroupsPolicy in PodSecurityContext
Expand Down Expand Up @@ -648,34 +497,6 @@ We expect no non-infra related flakes in the last month as a GA graduation crite

### Graduation Criteria

<!--
**Note:** *Not required until targeted at a release.*

Define graduation milestones.

These may be defined in terms of API maturity, [feature gate] graduations, or as
something else. The KEP should keep this high-level with a focus on what
signals will be looked at to determine graduation.

Consider the following in developing the graduation criteria for this enhancement:
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
- [Feature gate][feature gate] lifecycle
- [Deprecation policy][deprecation-policy]

Clearly define what graduation means by either linking to the [API doc
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning)
or by redefining what graduation means.

In general we try to use the same stages (alpha, beta, GA), regardless of how the
functionality is accessed.

[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
-->

Because this KEP's core implementation(i.e. `SupplementalGroupsPolicy` handling) lies inside of CRI implementations(e.g. containerd, cri-o), the graduation criteria contains the support statuses of the updated CRI by container runtimes.

#### Alpha
Expand All @@ -700,33 +521,8 @@ Because this KEP's core implementation(i.e. `SupplementalGroupsPolicy` handling)

### Upgrade / Downgrade Strategy

<!--
If applicable, how will the component be upgraded and downgraded? Make sure
this is in the test plan.

Consider the following in developing an upgrade/downgrade strategy for this
enhancement:
- What changes (in invocations, configurations, API use, etc.) is an existing
cluster required to make on upgrade, in order to maintain previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing
cluster required to make on upgrade, in order to make use of the enhancement?
-->

### Version Skew Strategy

<!--
If applicable, how will the component handle version skew with other
components? What are the guarantees? Make sure this is in the test plan.

Consider the following in developing a version skew strategy for this
enhancement:
- Does this enhancement involve coordinating behavior in the control plane and
in the kubelet? How does an n-2 kubelet without this feature available behave
when this feature is used?
- Will any other components on the node change? For example, changes to CSI,
CRI or CNI may require updating that component before the kubelet.
-->

- CRI must support this feature, especially when using `SupplementalGroupsPolicy=Strict`.
- kubelet must be at least the version of control-plane components.

Expand Down Expand Up @@ -1062,6 +858,8 @@ Are there any tests that were run/should be run to understand performance charac
and validate the declared limits?
-->

No.

### Troubleshooting

<!--
Expand Down Expand Up @@ -1107,6 +905,8 @@ Major milestones might include:
- when the KEP was retired or superseded
-->

- 2023-02-10: Initial KEP published.

## Drawbacks

<!--
Expand All @@ -1117,12 +917,6 @@ N/A

## Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

### Introducing `RutimeClass`

As described in the [Motivation](#motivation) section, cluster administrators would need to deploy a custom low-level container runtime(e.g., [pfnet-research/strict-supplementalgroups-container-runtime](https://github.com/pfnet-research/strict-supplementalgroups-container-runtime)) that modifies OCI container runtime spec(`config.json`) produced by CRI implementations (e.g., containerd, cri-o). A custom `RuntimeClass` would be introduced for it.
Expand All @@ -1137,10 +931,4 @@ We could just fix CRI implementations directly without introducing new APIs. Th

## Infrastructure Needed (Optional)

<!--
Use this section if you need things from the project/SIG. Examples include a
new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.
-->

N/A