ServiceAccount and PodSecurityPolicy usage
While investigating privilege problems when adding Istio to SCF we saw that SCF (and Fissile) use service accounts and pod security policies (PSPs) in a way that none of us seemed to understand.
The current implementation seems to be based on a misunderstanding of how these actually work, and they can be significantly simplified while maintaining the same or even better security.
Each pod is created using a service account, which needs to have one or more PSPs bound to it. The pod spec includes a securityContext that specifies any required settings like privileged mode or Linux capabilities.
The admission controller will check each PSP bound to the service account in alphabetical order to find one that satisfies all requested settings from securityContext. If none is found, the pod is not created.
The permissions of the pod will be those requested by the securityContext; it will not get all permissions that the selected PSP potentially allows. For this reason, it doesn't really matter which PSP was used to validate the request.
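The selection logic described above can be modelled in a few lines. This is a simplified sketch and not the actual admission controller: the function name, the PSP dictionary fields (`privileged`, `allowed_capabilities`) and the PSP names are made up for illustration, and only privileged mode and added capabilities are checked.

```python
from typing import Dict, Optional


def select_psp(psps: Dict[str, dict], security_context: dict) -> Optional[str]:
    """Return the name of the first PSP, in alphabetical order, that allows the request."""
    wants_privileged = security_context.get("privileged", False)
    wants_caps = set(security_context.get("capabilities", []))
    for name in sorted(psps):
        policy = psps[name]
        allowed = set(policy.get("allowed_capabilities", []))
        if wants_privileged and not policy.get("privileged", False):
            continue
        # "*" stands for "any capability may be added" in this model.
        if "*" not in allowed and not wants_caps <= allowed:
            continue
        return name
    return None  # no PSP satisfies the request, so the pod is not created


# Hypothetical PSPs bound to a single service account.
bound = {
    "privileged": {"privileged": True, "allowed_capabilities": ["*"]},
    "nonprivileged": {"privileged": False, "allowed_capabilities": ["SYS_RESOURCE"]},
}

print(select_psp(bound, {"capabilities": ["SYS_RESOURCE"]}))  # nonprivileged
print(select_psp(bound, {"privileged": True}))                # privileged
```

In both cases the pod only receives what its own securityContext asked for; the selected PSP merely validates the request.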
Unprivileged service account in this document means a service account that doesn't have any PSP bound to it that grants privileged mode to the container.
The only time an unprivileged service account makes sense is when it also grants the right to create pods, either directly or indirectly via deployments, stateful sets, replication controllers, jobs, daemon sets, and any other type of k8s resource that creates pods. In this case, restricting the PSP of the service account makes sense, but it is only fully effective as long as no other service account in the same namespace grants privileged mode.
Privilege escalation via pod creation is a well-known and documented issue with Kubernetes:
Privilege escalation via pod creation:
Caution: System administrators, use care when granting access to pod creation. A user granted permission to create pods (or controllers that create pods) in the namespace can: read all secrets in the namespace; read all config maps in the namespace; and impersonate any service account in the namespace and take any action the account could take. This applies regardless of authorization mode.
Therefore having additional service accounts with restricted PSPs is not providing any additional security since we already need at least one privileged service account (for diego-cell).
Another misunderstanding in the Fissile model is the entangling of privileged mode and the ALL capabilities. We have no explicit setting in the role-manifest to specify privileged: true in the securityContext. Instead, we have overloaded capabilities: [ALL] to imply it. It is wrong because privileged mode also implies access to host system devices, which we don't need in most cases (probably just for diego-cell to allow mounting volumes inside the garden-runc containers).
Note that the following actions are based on assumptions that may turn out to be false, so some things may be more complex than anticipated.
This feature was added in fissile#351. It was used to selectively add capabilities to some roles on specific platforms only. This was a mistake; the capabilities should have been added globally in the role manifest. The platforms that didn't "need" the capabilities just had them enabled by default or were not checking for them.
There should be no legitimate use for this feature if the role manifest is correct.
- Get rid of the bosh_containerization.pod-security-policy key in the role manifest.
- Use bosh_containerization.run.service-account to request a service account with the exact bindings needed.
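As an illustration of what "the exact bindings needed" could look like for a PSP, the sketch below builds a namespaced Role that grants the `use` verb on a single PSP, plus the RoleBinding for a service account. The function name, the object names and the example arguments are hypothetical, and the `policy` API group is an assumption (older clusters served PSPs from `extensions`). The manifests are printed as JSON, which `kubectl apply -f` also accepts.

```python
import json


def psp_binding(namespace, service_account, psp_name):
    """Build a Role and RoleBinding that let one service account use one PSP."""
    name = f"{service_account}-psp"  # hypothetical naming scheme
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": ["policy"],  # assumption: PSPs served from the policy group
            "resources": ["podsecuritypolicies"],
            "resourceNames": [psp_name],
            "verbs": ["use"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": name,
        },
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
    }
    return [role, binding]


# Example values only; the namespace and PSP name are placeholders.
for manifest in psp_binding("scf", "diego-cell", "privileged"):
    print(json.dumps(manifest, indent=2))
```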
Define bosh_containerization.run.privileged. It should apply to the securityContext of the container and not of the pod, just like bosh_containerization.run.capabilities. Just like capabilities are combined, setting a single job to privileged makes all jobs in the same instance group privileged (because they currently all run in the same container).
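The combination rule could be sketched as follows. This is not Fissile code: the function name and the per-job dictionaries (with `privileged` and `capabilities` keys standing in for the `bosh_containerization.run` settings) are assumptions made for the example.

```python
def container_security_context(jobs):
    """Merge the run settings of all jobs in an instance group into one
    container-level securityContext."""
    # Capabilities are the union of what the individual jobs request.
    caps = sorted({cap for job in jobs for cap in job.get("capabilities", [])})
    # A single privileged job makes the shared container privileged.
    context = {"privileged": any(job.get("privileged", False) for job in jobs)}
    if caps:
        context["capabilities"] = {"add": caps}
    return context


# Hypothetical instance group with two jobs sharing one container.
jobs = [
    {"capabilities": ["SYS_RESOURCE"]},
    {"privileged": True, "capabilities": ["NET_ADMIN"]},
]
print(container_security_context(jobs))
```

With this shape, capabilities: [ALL] no longer has to imply privileged mode; the two settings are requested and merged independently.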
The allowPrivilegeEscalation setting for the container securityContext should be set statically in Fissile to be always false unless the container is privileged or the capabilities include ALL or SYS_ADMIN.
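Expressed as code, the rule is a single boolean expression. The function below is a sketch of the proposed behaviour, not existing Fissile code; its name and parameters are hypothetical.

```python
def allow_privilege_escalation(privileged, capabilities):
    """Return the value to emit for allowPrivilegeEscalation in the
    container securityContext."""
    # False by default; true only for privileged containers or when the
    # requested capabilities include ALL or SYS_ADMIN.
    return bool(privileged) or bool({"ALL", "SYS_ADMIN"} & set(capabilities))


print(allow_privilege_escalation(False, ["SYS_RESOURCE"]))  # False
print(allow_privilege_escalation(False, ["SYS_ADMIN"]))     # True
print(allow_privilege_escalation(True, []))                 # True
```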
This is necessary for the configgin-role to update the labels on its own pod. See fissile#426 and fissile#444 for more information.
There should be no need to use cluster roles and cluster role bindings. We should be able to convert all cluster roles to namespaced roles.
We have an extensive list of role privileges with no documentation of why each one is needed. At a minimum, this should be documented inside the role manifest, so that individual privileges can be removed when they are no longer required.
There are references to the current implementation in the wiki: Managing PSPs and Overriding capabilities as well as the docs: capabilities.md. These need to be updated to match any changes.
We should not use a ClusterRole that grants every right on the cluster to Eirini, as this is practically the same as disabling RBAC. Instead, we should create a Role that grants all access in the EIRINI_KUBE_NAMESPACE, and bind it to the Eirini service account in the SCF namespace through a RoleBinding that also lives in the EIRINI_KUBE_NAMESPACE.
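A minimal sketch of such a cross-namespace binding is shown below. The function name, the `eirini-all` object name, the `eirini` service account name and the example namespaces are placeholders, not the names used in SCF. The important detail is that both the Role and the RoleBinding live in the Eirini namespace, while the subject of the binding points at the service account in the SCF namespace.

```python
import json


def eirini_rbac(eirini_namespace, scf_namespace, service_account="eirini"):
    """Build a Role and RoleBinding granting full access in one namespace
    to a service account that lives in another namespace."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "eirini-all", "namespace": eirini_namespace},
        # All access, but only inside eirini_namespace.
        "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "eirini-all", "namespace": eirini_namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "eirini-all",
        },
        # The subject may live in a different namespace than the binding.
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": scf_namespace,
        }],
    }
    return [role, binding]


# Placeholder namespaces for illustration.
for manifest in eirini_rbac("eirini", "scf"):
    print(json.dumps(manifest, indent=2))
```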
An illustrative example of this can be found in this gist.
Once the use of PSPs has been simplified we need to figure out how to configure Istio without granting excessive privileges when the user doesn't want to use it.
The GitHub PR for this is: Add permission definitions for Istio automatic sidecar injection #2082 SCF.
Also check: Istio install failed when PSP enabled #6806 istio.
The brains acceptance tests currently require a cluster role with wide privileges. It should be possible to limit this to a single namespace, plus a cross-namespace role binding, similar to Eirini.
Jan Dubois and Thulio Ferraz Assis