Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce RBAC privileges for single namespace deployment #626

Closed
wants to merge 1 commit into from

Conversation

holzman
Copy link
Contributor

@holzman holzman commented Oct 20, 2022

Overview

The dask-gateway helm chart deploys three microservices: gateway, controller, traefik into a single namespace.

When a user requests a new dask cluster, a number of Kubernetes custom resource definition objects are created, as well as pods for dask-scheduler and dask-workers. By default, these are deployed into the same namespace as the microservices, but can be configured to deploy to a separate namespace via the gateway.backend.namespace helm value.

For the same-namespace deployment, the chart creates cluster-wide permissions that are unnecessarily permissive. This PR checks if the user has set gateway.backend.namespace to a different namespace. In the case of a single-namespace deployment, it moves ClusterRole permissions to namespaced Role permissions
whenever possible. It also adds an option to the traefik chart so that it only listens on a single namespace.

In addition, for the single-namespace deployment, the code itself is also unnecessarily broad - for example, it performs cluster-wide queries for daskcluster when it should restrict itself to namespaced queries. This PR does not address this, but #594 does.

What I wrote previously:

If we are deploying into a single namespace (gateway.backend.Namespace is either null or the same as the release namespace), move as many privileges as possible from cluster-wide to namespaced ones.

This is one possible mitigation to the security concerns discussed in #250.

@consideRatio
Copy link
Collaborator

It sounds reasonable! I suspect we have tried this in the past but failed because we didn't specify the namespace part for traefik as you do here. Not sure.

I'd like to have a good overview of the kind of permissions we can and can't reduce to not being cluster wide before merging this, and preferably, some notes about it in the helm templates to help us maintain this long term.

@consideRatio
Copy link
Collaborator

@holzman we used traefik 2.6.x, now 2.9.x if you rebase this PR on main. Just wanted to mention that in case you would run into a bug fixed already in traefik.

@consideRatio
Copy link
Collaborator

@holzman I'm low on maintenance capacity, but note that #594 is open as well - I think you look to accomplish similar things. It would be good if you help coordinate with that PR in some way.

@holzman
Copy link
Contributor Author

holzman commented Oct 20, 2022

Hi Erik -

I was going to propose something similar to #594 as a follow-up PR - this PR only changes the helm chart and doesn't change any dask-gateway code.

I can certainly integrate this with #594 if you'd prefer to consolidate.

@holzman holzman force-pushed the single-namespace-rbac branch from 05cfbdb to 082cdc4 Compare October 20, 2022 18:27
@consideRatio
Copy link
Collaborator

Help me and other maintainers overview the changes as best as you can in the PR description, if this should include other changes or not is hard for me to evaluate without the overview.

@holzman
Copy link
Contributor Author

holzman commented Oct 21, 2022

Ok - I'll fix up the code (tests are failing because it's missing some privileges for the multi-namespace case) and write up a clear overview for this PR.

If we are deploying into a single namespace (gateway.backend.Namespace is
either null or the same as the release namespace), move as many privileges
as possible from cluster-wide to namespaced ones.

Signed-off-by: Burt Holzman <[email protected]>
@holzman holzman force-pushed the single-namespace-rbac branch from 082cdc4 to 8f6a2e7 Compare October 21, 2022 19:47
@holzman
Copy link
Contributor Author

holzman commented Oct 24, 2022

Ah, this exposed an issue with the current tests - they assume that the scheduler/worker namespace is "default" and don't cover a test case where the scheduler/worker namespace is different.

I'll submit a clean separate PR that only extends the test coverage.

@consideRatio
Copy link
Collaborator

Closed temporarily as motivated in #594 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants