ServiceAccount and PodSecurityPolicy usage

While investigating privilege problems when adding Istio to SCF we saw that SCF (and Fissile) use service accounts and pod security policies (PSPs) in a way that none of us seemed to understand.

The current implementation seems to be based on a misunderstanding of how service accounts and PSPs actually work. It can be significantly simplified while maintaining the same or even better security.

How do service accounts and PSPs work?

Each pod is created using a service account, which will need to have one or more PSPs bound to it. The pod spec includes a securityContext that specifies any required settings like privilege mode or Linux capabilities.
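To make the relationship concrete, here is a minimal sketch of a pod that names its service account and requests specific settings in a securityContext. All names (the namespace scf, example-account, example-role, the image) are placeholders invented for illustration, not taken from SCF.

```yaml
# Hypothetical pod; every name below is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: example-role
  namespace: scf
spec:
  # The PSPs bound to this service account decide whether the pod is admitted.
  serviceAccountName: example-account
  containers:
    - name: example-role
      image: example.org/example-role:latest
      securityContext:
        # Only what is requested here is granted to the container.
        privileged: false
        capabilities:
          add: ["SYS_RESOURCE"]
```

A PSP bound to example-account would have to allow the SYS_RESOURCE capability for this pod to be admitted.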

The admission controller will check each PSP bound to the service account in alphabetical order to find one that satisfies all requested settings from securityContext. If none is found, the pod is not created.

The permissions of the pod will be those requested by the securityContext; it will not get all permissions that the selected PSP potentially allows. For this reason, it doesn't really matter which PSP was used to validate the request.

Do we need an unprivileged service account?

Unprivileged service account in this document means a service account that doesn't have any PSP bound to it that grants privileged mode to the container.
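As a sketch of what such an unprivileged service account could look like, the manifests below define a PSP that does not allow privileged mode and bind it to a service account through the use verb. The names example-nonprivileged, example-account, and the namespace scf are assumptions made for this example.

```yaml
# Hypothetical PSP that does not grant privileged mode.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-nonprivileged
spec:
  privileged: false
  allowPrivilegeEscalation: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["configMap", "secret", "emptyDir", "persistentVolumeClaim"]
---
# Role that allows "using" exactly that PSP.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-example-nonprivileged
  namespace: scf
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["example-nonprivileged"]
    verbs: ["use"]
---
# Binding the role to the service account is what "binds" the PSP to it.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-account-nonprivileged
  namespace: scf
subjects:
  - kind: ServiceAccount
    name: example-account
    namespace: scf
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-example-nonprivileged
```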

The only time an unprivileged service account makes sense is when it also grants the right to create pods, either directly or indirectly via deployments, stateful sets, replication controllers, jobs, daemon sets, or any other type of k8s resource that creates pods. In this case, restricting the PSP of the service account makes sense, but it is only fully effective as long as no other service account in the same namespace grants privileged mode.

Privilege escalation via pod creation is a well-known issue with Kubernetes. The Kubernetes documentation describes it under the heading "Privilege escalation via pod creation":

Caution: System administrators, use care when granting access to pod creation. A user granted permission to create pods (or controllers that create pods) in the namespace can: read all secrets in the namespace; read all config maps in the namespace; and impersonate any service account in the namespace and take any action the account could take. This applies regardless of authorization mode.

Therefore, having additional service accounts with restricted PSPs does not provide any additional security, since we already need at least one privileged service account (for diego-cell).

Privileged containers and Linux capabilities

Another misunderstanding in the Fissile model is the entangling of privileged mode and the ALL capability set. We have no explicit setting in the role-manifest to specify privileged: true in the securityContext. Instead, we have overloaded capabilities: [ALL] to imply it. This is wrong because privileged mode also implies access to host system devices, which we don't need in most cases (probably just for diego-cell, to allow mounting volumes inside the garden-runc containers).
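The two settings are separate fields in a Kubernetes container securityContext. The fragments below are an illustrative sketch of the distinction, not output generated by Fissile.

```yaml
# Variant A: every Linux capability, but no privileged mode
# (no blanket access to host devices).
securityContext:
  privileged: false
  capabilities:
    add: ["ALL"]
---
# Variant B: privileged mode, which additionally exposes host devices.
securityContext:
  privileged: true
```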

What should we do?

Note that the following actions are based on assumptions that may turn out to be false, so some things may be more complex than anticipated.

Remove capabilities overrides in values.yaml

This feature was added in fissile#351. It was used to selectively add capabilities to some roles on specific platforms only. This was a mistake; the capabilities should have been added globally in the role manifest. The platforms that didn't "need" the capabilities just had them enabled by default or were not checking for them.

There should be no legitimate use for this feature if the role manifest is correct.

Use the privileged PSP everywhere

Get rid of the bosh_containerization.pod-security-policy key in the role manifest.
Use bosh_containerization.run.service-account to request a service account with the exact bindings needed.

Provide a mechanism to request privileged mode for containers

Define bosh_containerization.run.privileged. It should apply to the securityContext of the container and not of the pod, just like bosh_containerization.run.capabilities. Just like capabilities are combined, setting a single job to privileged makes all jobs in the same instance group privileged (because they currently all run in the same container).
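A role manifest using the proposed key might look roughly like the sketch below. The surrounding layout (instance_groups, jobs, properties) and the names example-group, example-job-a, and example-job-b are assumptions for illustration; only the bosh_containerization.run keys come from this proposal.

```yaml
# Assumed role manifest layout; group and job names are placeholders.
instance_groups:
- name: example-group
  jobs:
  - name: example-job-a
    properties:
      bosh_containerization:
        run:
          privileged: true              # proposed key
  - name: example-job-b
    properties:
      bosh_containerization:
        run:
          capabilities: [SYS_RESOURCE]  # existing key
```

Because both jobs run in the same container, example-job-b would also run privileged in this sketch, even though it only asks for a single capability.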

The allowPrivilegeEscalation setting for the container securityContext should be set statically in Fissile to be always false unless the container is privileged or the capabilities include ALL or SYS_ADMIN. This is necessary for the configgin-role to update the labels on its own pod. See fissile#426 and fissile#444 for more information.
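As a sketch of the intended outcome, the generated container securityContext could take one of the following two shapes. The capability SYS_RESOURCE is only an example value.

```yaml
# Not privileged, and neither ALL nor SYS_ADMIN requested:
# escalation is switched off.
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    add: ["SYS_RESOURCE"]
---
# SYS_ADMIN requested: the field is left unset here, because Kubernetes
# treats allowPrivilegeEscalation as true whenever the container is
# privileged or has CAP_SYS_ADMIN.
securityContext:
  capabilities:
    add: ["SYS_ADMIN"]
```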

Get rid of cluster roles

There should be no need to use cluster roles and cluster role bindings. We should be able to convert all cluster roles to namespaced roles.

Document all roles

We have an extensive list of role privileges with no documentation explaining why each one is needed. At a minimum, this should be documented inside the role manifest, so that individual privileges can be removed when they are no longer required.

Update the documentation

There are references to the current implementation in the wiki (Managing PSPs and Overriding capabilities) as well as in the docs (capabilities.md). These need to be updated to match any changes.

Future work

Determine role bindings for Eirini

We should not use a ClusterRole that grants every right on the cluster to Eirini, as this is practically the same as disabling RBAC. Instead, we should create a Role that grants all access in the EIRINI_KUBE_NAMESPACE, and bind it to the Eirini service account in the SCF namespace through a RoleBinding that also lives in the EIRINI_KUBE_NAMESPACE.

An illustrative example of this can be found in this gist.
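Independently of that gist, a minimal sketch of the Role and cross-namespace RoleBinding described above could look as follows. The namespace names eirini (standing in for EIRINI_KUBE_NAMESPACE) and scf, the service account name eirini, and the object name eirini-all-access are assumptions for this example.

```yaml
# Role granting full access, but only inside the Eirini namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: eirini-all-access
  namespace: eirini        # stands in for EIRINI_KUBE_NAMESPACE
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
# The binding lives next to the Role, while its subject is the
# service account from the SCF namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: eirini-all-access
  namespace: eirini
subjects:
  - kind: ServiceAccount
    name: eirini
    namespace: scf
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: eirini-all-access
```

With this arrangement the Eirini service account has no rights outside the one namespace it manages.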

Determine how to enable Istio

Once the use of PSPs has been simplified we need to figure out how to configure Istio without granting excessive privileges when the user doesn't want to use it.

The GitHub PR for this is: Add permission definitions for Istio automatic sidecar injection #2082 SCF.

Also check: Istio install failed when PSP enabled #6806 istio.

Limit the scope of the test service accounts

The brains acceptance tests currently require a cluster role with wide privileges. It should be possible to limit this to a single namespace, plus a cross-namespace role binding, similar to Eirini.

Authors

Jan Dubois and Thulio Ferraz Assis
