[RFC] Trust Store for TLS and SSH #3366

Draft pull request adding `rfcs/0000-trust-store-for-tls-and-ssh/README.md`.
# RFC-NNNN Trust Store for TLS and SSH

**Status:** provisional

**Creation date:** 2022-12-02

**Last update:** 2022-12-02

## Summary

This RFC consolidates and formalizes the supported ways to establish trusted
connections with remote servers via Transport Layer Security (TLS) and Secure
Shell (SSH). This results in new ways to configure trust, and gives administrators
the ability to disable some of the existing options.

## Motivation

The current model could be improved by allowing controller-level trust
configurations, so that multiple objects connecting to the same server don't each
need to specify overrides. This approach aligns with the canonical OS-level
implementations of both TLS and SSH, which rely on a global trust store to define
machine-level trust, while users and applications can further expand the set of
trusted servers (or CAs) unless blocked by administrators.

Known hosts (used by SSH connections) and CA bundles (used by TLS) are not
particularly sensitive information, privacy considerations aside. Before this RFC,
the officially supported approach relied on secrets to pass this information to
the controllers. The same secret is also used to provide user credentials, which
are far more sensitive, making this sub-optimal from a security standpoint.

### Goals

- Consolidate the officially supported trust settings across the Flux ecosystem.
- Formalize support for configuring trust at controller-level.
- Add a toggle to block object-level trust overrides.
- Enable users to surface trust information securely.

### Non-Goals

- Maintain backwards compatibility with older versions of Flux.

## Proposal

To configure system-wide trust, Flux would rely on the well-established OS-level
trust stores. When the trust store needs to be mounted dynamically, this will be
done by mounting Kubernetes `Secret` and `ConfigMap` objects. When an immutable
trust store is required, users can build their own version of the controllers with
the settings baked in.

TLS and SSH use different techniques to establish the identity of remote servers,
each relying on its own trust store.

The sections below dive into the specifics of each one, highlighting their
details, the changes required and examples of the proposed usage.

A new way to configure object-level trust store overrides is also proposed,
together with a controller-level toggle to disable it.

### SSH

In SSH, the remote server identity is based on [Trust on first use]. On the first
connection to a new server, the user confirms whether or not to trust that server
based on the server's public key fingerprint.

In the context of Flux, which provides no user interaction, if the remote server's
fingerprint is not in the configured set of known hosts, the connection is aborted.
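Without a user to ask, the check reduces to a strict lookup: accept the key if a matching entry exists, abort otherwise. The Go sketch below illustrates that rule under simplifying assumptions: it only handles plain comma-separated host names (no hashed `|1|` entries, wildcards, `[host]:port` or `@cert-authority`/`@revoked` markers), and both function names are hypothetical. A production implementation would more likely rely on `golang.org/x/crypto/ssh/knownhosts`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hostKeyKnown reports whether knownHosts has an entry for host with exactly
// the given key type and base64-encoded key. Simplified: plain host names only.
func hostKeyKnown(knownHosts, host, keyType, keyB64 string) bool {
	sc := bufio.NewScanner(strings.NewReader(knownHosts))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		fields := strings.Fields(line)
		if len(fields) < 3 || fields[1] != keyType || fields[2] != keyB64 {
			continue // malformed line or a different key
		}
		for _, h := range strings.Split(fields[0], ",") {
			if h == host {
				return true
			}
		}
	}
	return false
}

// checkHostKey never prompts: an unknown or mismatched key aborts the connection.
func checkHostKey(knownHosts, host, keyType, keyB64 string) error {
	if !hostKeyKnown(knownHosts, host, keyType, keyB64) {
		return fmt.Errorf("ssh: no known hosts entry for %s matches the presented %s key; aborting", host, keyType)
	}
	return nil
}

func main() {
	knownHosts := "# example only\ngithub.com,gh.example ecdsa-sha2-nistp256 AAAAkey1\n"
	fmt.Println(checkHostKey(knownHosts, "github.com", "ecdsa-sha2-nistp256", "AAAAkey1"))
	fmt.Println(checkHostKey(knownHosts, "git.example.com", "ecdsa-sha2-nistp256", "AAAAkey1"))
}
```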

#### Controller-level Known Hosts

To set controller-level known hosts, we propose using the existing system-wide
OpenSSH file on disk: [/etc/ssh/ssh_known_hosts].

Users would be able to configure this OS-level trust store by mounting either
a `ConfigMap` or a `Secret` directly into the Flux controllers.

`ConfigMap` example:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flux-trust-store
  namespace: flux-system
data:
  known_hosts: |
    github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
```

Patch required on the main `kustomization.yaml`:
```yaml
# Append with `/-` so the controllers' existing volumes and mounts are preserved.
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/volumeMounts/-
        value:
          name: ssh-trust-store
          mountPath: /etc/ssh/ssh_known_hosts
          subPath: known_hosts
          readOnly: true
      - op: add
        path: /spec/template/spec/volumes/-
        value:
          name: ssh-trust-store
          configMap:
            name: flux-trust-store
    target:
      kind: Deployment
      name: "(kustomize|image-automation|source|image-reflector|helm|notification)-controller"
```

#### Object-level Known Hosts Expansion

A new field is to be introduced into the existing kinds `ImageUpdateAutomation` and
`GitRepository` to allow users to expand on the controller-level known hosts for
SSH operations:

```yaml
spec:
  trustStore:
    ssh:
      secretRef:
      configMapRef:
```

> **Review comment** (Member, on lines +113 to +115): I find this very problematic.
> Why would we allow a namespaced object owned by a tenant to alter the trust store
> for all tenants? This also breaks Flux bootstrap: since you can't control which CR
> is reconciled first, if people add a trust store in one CR, we can't reconcile
> that first to make it available to others.
>
> **Reply** (@pjbgf, Member Author, Dec 9, 2022): It does not alter the trust store
> for all tenants; it only expands the trust store for that object.
>
> **Review comment** (Member): How does this cope with the `known_hosts` from the
> `secretRef`? Can you explain the differences in the doc, please?
>
> **Reply** (Member Author): I will rephrase to make that clearer. But basically,
> that's what we already do at present based on the CA bundles provided by secrets;
> this would be based on a field instead.

The trust store can be expanded by either setting `spec.trustStore.ssh.secretRef` or
`spec.trustStore.ssh.configMapRef`, not both. Either option should contain the data
under a `known_hosts` key.

Known hosts configured this way will be aggregated with the ones defined at both
system and controller levels.
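A minimal Go sketch of that aggregation, assuming each source has already been read into memory (the system/controller-level file, then the `known_hosts` key of the referenced `Secret` or `ConfigMap`). In this sketch merging is purely additive: later sources can add entries but never drop earlier ones. `aggregateKnownHosts` is a hypothetical name.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// aggregateKnownHosts merges known_hosts data from several sources, in order,
// dropping blank lines, comments and exact duplicate entries.
func aggregateKnownHosts(sources ...[]byte) []byte {
	seen := make(map[string]bool)
	var out bytes.Buffer
	for _, src := range sources {
		sc := bufio.NewScanner(bytes.NewReader(src))
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			if line == "" || strings.HasPrefix(line, "#") || seen[line] {
				continue
			}
			seen[line] = true
			out.WriteString(line)
			out.WriteByte('\n')
		}
	}
	return out.Bytes()
}

func main() {
	system := []byte("github.com ssh-ed25519 AAAAkeyA\n")
	controller := []byte("# internal servers\ngit.corp.example ssh-ed25519 AAAAkeyB\n")
	object := []byte("github.com ssh-ed25519 AAAAkeyA\ngit.team.example ssh-ed25519 AAAAkeyC\n")
	fmt.Print(string(aggregateKnownHosts(system, controller, object)))
}
```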

#### Pre-populated trust store

Flux container images would be pre-populated with [/etc/ssh/ssh_known_hosts] entries
for the main Git SaaS providers. As a result, users will only need to update their
SSH trust store for custom or less well-known servers.

### TLS

In TLS, the remote server identity is based on [public key infrastructure], and
trust is based on confirming that the remote server's certificate was issued by a
"trusted" Certificate Authority (CA).

The OS-level trust store contains the trusted root CAs and any other certificates
that should be trusted by the machine. Note that CAs can issue certificates for
other CAs, forming a hierarchical chain of trust. Certificates that do not chain up
to a trusted CA, such as your own self-signed certificates, are considered
untrustworthy by default. TLS connections to untrusted remote servers are aborted.
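To make the chain-of-trust check concrete, the Go sketch below (standard library only) creates a throwaway private CA and a server certificate it issued, then verifies the server certificate against different root pools. The CA names and the `newCert`/`verifyAgainst` helpers are hypothetical; a TLS client performs the equivalent check during the handshake and aborts the connection when it fails.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert creates a short-lived ECDSA certificate for cn. When parent is nil
// the certificate is self-signed; otherwise it is signed by parent/parentKey.
func newCert(cn string, isCA bool, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		BasicConstraintsValid: true,
		IsCA:                  isCA,
	}
	if isCA {
		tmpl.KeyUsage = x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature
	} else {
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}
		tmpl.DNSNames = []string{cn}
	}
	signer, signerKey := tmpl, key // self-signed by default
	if parent != nil {
		signer, signerKey = parent, parentKey
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, signer, &key.PublicKey, signerKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyAgainst checks whether leaf chains up to one of the given roots.
func verifyAgainst(leaf *x509.Certificate, roots ...*x509.Certificate) error {
	pool := x509.NewCertPool()
	for _, r := range roots {
		pool.AddCert(r)
	}
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: leaf.DNSNames[0]})
	return err
}

func main() {
	ca, caKey, err := newCert("Example Private CA", true, nil, nil)
	if err != nil {
		panic(err)
	}
	leaf, _, err := newCert("git.example.com", false, ca, caKey)
	if err != nil {
		panic(err)
	}
	other, _, err := newCert("Unrelated CA", true, nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("issuing CA trusted:  ", verifyAgainst(leaf, ca))
	fmt.Println("issuing CA untrusted:", verifyAgainst(leaf, other))
}
```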

#### Controller-level Trusted Certificates

**Note:** this requires no changes to the controllers, as it relies on the way the
Go standard library discovers the OS-level TLS trust store. This RFC only formalizes
it as a supported approach.

To trust CAs that are not among the trusted root CAs, the OS-level trust store
needs to be updated by mounting either a `ConfigMap` or a `Secret` directly into the
Flux controllers.

`Secret` example:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: flux-trust-store
  namespace: flux-system
data:
  customCA.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lVZHNBdGlYM2dOMHVrN2RkeEFTV1lFL3RkdjB3d0NnWUlLb1pJemowRUF3SXcKR1RFWE1CVUdBMVVFQXhNT1pYaGhiWEJzWlM1amIyMGdRMEV3SGhjTk1qQXdOREUzTURneE9EQXdXaGNOTWpVdwpOREUyTURneE9EQXdXakFaTVJjd0ZRWURWUVFERXc1bGVHRnRjR3hsTG1OdmJTQkRRVEJaTUJNR0J5cUdTTTQ5CkFnRUdDQ3FHU000OUF3RUhBMElBQks3aC81RDhiVjkzTW1FZGh1MDJKc1M2dWdCOHM2UHpSbDNQVjR4czNTYnIKUk5ra001OSt4M2IwaVd4L2k3NnFQWXBOTG9pVlVWWFFtQTlZKzREYk14aWpVekJSTUE0R0ExVWREd0VCL3dRRQpBd0lCQmpBUEJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRR3lVaVUxUUVaaU1BcWpzbklZVHdaCjR5cDV3ekFQQmdOVkhSRUVDREFHaHdSL0FBQUJNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJUUR6ZHR2S2RFOE8KMStXUlRaOU11U2lGWWNyRXo3Wm5lN1ZYb3VERUtxS0VpZ0lnTTRXbGJEZXVOQ0ticWhxait4WlYwcGEzcndlYgpPRDhFampDTVk2OVJNTzA9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
```

Patch required on the main `kustomization.yaml`:
```yaml
# Append with `/-` so the controllers' existing volumes and mounts are preserved.
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/volumeMounts/-
        value:
          name: tls-trust-store
          mountPath: /etc/ssl/certs/ca-cert-flux.pem
          subPath: customCA.pem
          readOnly: true
      - op: add
        path: /spec/template/spec/volumes/-
        value:
          name: tls-trust-store
          secret:
            secretName: flux-trust-store
    target:
      kind: Deployment
      name: "(kustomize|image-automation|source|image-reflector|helm|notification)-controller"
```

#### Object-level Trusted Certificates Expansion

A new field is to be introduced into the existing kinds `Bucket`, `GitRepository`,
`HelmRepository`, `OCIRepository`, `ImageUpdateAutomation`, `Provider` and
`ImageRepository` to allow users to expand on the controller-level trusted CAs for
HTTPS operations:

```yaml
spec:
  trustStore:
    tls:
      secretRef:
      configMapRef:
```

The trust store can be expanded by either setting `spec.trustStore.tls.secretRef`
or `spec.trustStore.tls.configMapRef`, not both. Either option should contain the
data under a `caFile` key.

CA bundles configured this way will be aggregated with the ones defined at both
system and controller levels.
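For HTTPS clients, this aggregation maps onto the Go standard library cert pool API: start from a copy of the system pool, which already includes any controller-level certificates mounted into the certificate directory, and append the object's `caFile` bundle. A minimal sketch; `transportWithExtraCAs` is hypothetical, and falling back to an empty pool when no system pool is available is an assumption of this sketch rather than defined behavior.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
)

// transportWithExtraCAs returns an HTTP transport that trusts the system (and
// therefore controller-level) root CAs plus any PEM certificates in caFile,
// i.e. the data under the `caFile` key of the referenced Secret or ConfigMap.
func transportWithExtraCAs(caFile []byte) (*http.Transport, error) {
	// SystemCertPool returns a copy: appending to it does not affect other pools.
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		// Assumption for this sketch: fall back to an empty pool.
		pool = x509.NewCertPool()
	}
	if len(caFile) > 0 && !pool.AppendCertsFromPEM(caFile) {
		return nil, fmt.Errorf("caFile does not contain any valid PEM certificate")
	}
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.TLSClientConfig = &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}
	return tr, nil
}

func main() {
	if _, err := transportWithExtraCAs([]byte("not a certificate")); err != nil {
		fmt.Println("rejected:", err)
	}
	tr, err := transportWithExtraCAs(nil)
	if err != nil {
		panic(err)
	}
	client := &http.Client{Transport: tr}
	fmt.Println("custom RootCAs set:", client.Transport.(*http.Transport).TLSClientConfig.RootCAs != nil)
}
```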

#### Pre-populated trust store

Flux container images already come with pre-populated root CAs, which are
automatically updated by the Linux distribution used in the base images.
As a result, users only need to update their TLS trust store when accessing
web servers whose certificates were not signed by a publicly trusted CA.

### Enabling Object-Level Trust Store

Object-level trust store expansion is disabled by default. To enable it, start
the controller with:

`--insecure-object-trust-store={tls-only,ssh-only,both}`

The flag defaults to `disabled` (`--insecure-object-trust-store=disabled`).

> **Review comment** (Member, on lines +222 to +223): I would love to see this
> expanded a little to provide reasoning for this default setting. With the current
> Flux version 0.37, the trust store can always be expanded for objects such as
> HelmRepositories or GitRepositories by defining a `caFile` field in the referenced
> Secret. Given many admins are unlikely to deviate from the defaults, users would no
> longer be able to do that. Please correct me if I'm wrong here.
>
> **Reply** (Member Author): This is certainly a point up for debate. The security
> invariant here was "Flux should only connect to trusted servers", and the criterion
> used was that most users would expect Flux to be secure by default.
>
> > Given many admins are unlikely to deviate from the defaults
>
> This is exactly the point. If something works, users won't look up the
> documentation and try to understand the impact of security-sensitive settings.
> Hence, ideally, we should always prioritise security by design and by default.
>
> The idea is that, by pre-populating the controller-level SSH trust store with the
> top SaaS Git providers, a considerable number of users would not have to worry
> about this. It will simply work securely, just like TLS. Enterprise customers
> should then configure this once, at controller level, and be done with it, for both
> single and multi-tenancy. The only use case I see for object-level trust stores in
> production would be a Namespace-as-a-Service multi-tenancy model, which would
> further weaken an already fragile security posture.
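The accepted values of `--insecure-object-trust-store` map naturally onto two booleans. Below is a minimal Go sketch of how a controller could parse the flag and reject object-level trust stores it has not been told to accept. The type, function and method names are hypothetical; the standard library `flag` package is used only to keep the sketch self-contained; and since the RFC does not yet define how reconciliation should react, `check` simply returns an error.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// objectTrustStoreMode records which object-level expansions are accepted.
type objectTrustStoreMode struct {
	AllowTLS bool
	AllowSSH bool
}

// parseObjectTrustStoreMode maps the flag values proposed by this RFC.
func parseObjectTrustStoreMode(v string) (objectTrustStoreMode, error) {
	switch v {
	case "disabled":
		return objectTrustStoreMode{}, nil
	case "tls-only":
		return objectTrustStoreMode{AllowTLS: true}, nil
	case "ssh-only":
		return objectTrustStoreMode{AllowSSH: true}, nil
	case "both":
		return objectTrustStoreMode{AllowTLS: true, AllowSSH: true}, nil
	}
	return objectTrustStoreMode{}, fmt.Errorf("invalid --insecure-object-trust-store value %q", v)
}

// check is a hypothetical reconcile-time guard: an object that sets a trust
// store kind that is not enabled gets an error instead of being reconciled.
func (m objectTrustStoreMode) check(hasTLS, hasSSH bool) error {
	if hasTLS && !m.AllowTLS {
		return fmt.Errorf("spec.trustStore.tls is set but object-level TLS trust stores are disabled")
	}
	if hasSSH && !m.AllowSSH {
		return fmt.Errorf("spec.trustStore.ssh is set but object-level SSH trust stores are disabled")
	}
	return nil
}

func main() {
	fs := flag.NewFlagSet("controller", flag.ExitOnError)
	value := fs.String("insecure-object-trust-store", "disabled",
		"object-level trust store expansion: disabled, tls-only, ssh-only or both")
	_ = fs.Parse(os.Args[1:]) // ExitOnError exits the process on parse failure
	mode, err := parseObjectTrustStoreMode(*value)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("mode: %+v\n", mode)
	fmt.Println("object with spec.trustStore.ssh:", mode.check(false, true))
}
```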

### User Stories

#### Story 1

> As a tenant, I want to be able to expand trust settings so that I can
> connect to my own servers without needing to ask an administrator.

#### Story 2

> As a Platform admin, I want to configure all trusted servers at the controller
> level and block any specific team from overriding those settings.

#### Story 3

> As a Security Auditor, I want to be able to review all Known Hosts and CA Bundles
> being used within a Flux instance, without requiring RBAC access to more sensitive
> information.

### Alternatives

#### Consume controller-level settings via two new flags

Two new flags would be added to the controllers (`--tls-ca-bundles-secret` and
`--ssh-known-hosts-secret`), allowing secrets to be consumed at startup time.

This would establish a "Flux-specific" approach, not aligned with existing tools
and applications that may need to coexist in the same container. As a result, a
Flux controller may trust a server while other applications within the container
would not, or vice versa.

#### Remove object-level trust store settings

Instead of creating a toggle to disable object-level trust settings, the entire
feature could have been deprecated. We decided to keep the feature, as this allows
for an easier transition.

#### Skip the implementation of the object-level blocker

Instead of creating a built-in feature to block the use of object-level Trust
Store expansion, we could rely on other tools and mechanisms within the Kubernetes
ecosystem (e.g. OPA) to enable users to achieve the same outcome.

Due to the importance of Flux in bootstrapping clusters, such an important
requirement (enforcing trust at controller level) should be inherent to the
controllers, instead of delegated to third-party components.

## Design Details

### Auto-populating SSH Trust Store

Flux container images that access Git SSH servers (e.g. Source Controller, Image
Automation Controller and Flux CLI) will contain entries in [/etc/ssh/ssh_known_hosts]
for the most popular Git SaaS providers.

Each provider will contain one entry for each supported host key algorithm.
The `ssh_known_hosts` will be a static file in the respective repositories, and
the Dockerfile will simply copy it into the final image.

The known hosts will be updated via automation, which will issue PRs for the maintainers
to review and then approve. As a result, the trusted known hosts will be deterministic
based on the container image version used, in the same way that CAs are.
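Because the file is static and changed only through reviewed PRs, a lightweight sanity check could run in CI before such a PR is merged. The Go sketch below (hypothetical helper names, standard library only) checks that every entry has a host, a key type and a base64 key, and that the declared key type matches the algorithm name embedded at the start of the key blob, which uses the SSH `string` encoding from RFC 4251 (a big-endian uint32 length followed by the bytes). Marker lines such as `@cert-authority` are not handled.

```go
package main

import (
	"bufio"
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"strings"
)

const githubECDSAKey = "AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg="

// blobKeyType returns the algorithm name stored at the start of a key blob.
func blobKeyType(keyB64 string) (string, error) {
	blob, err := base64.StdEncoding.DecodeString(keyB64)
	if err != nil {
		return "", fmt.Errorf("invalid base64 key: %w", err)
	}
	if len(blob) < 4 {
		return "", fmt.Errorf("key blob too short")
	}
	n := binary.BigEndian.Uint32(blob[:4])
	if uint64(n) > uint64(len(blob)-4) {
		return "", fmt.Errorf("key type length %d exceeds blob size", n)
	}
	return string(blob[4 : 4+int(n)]), nil
}

// validateKnownHosts checks every non-comment line of a static known hosts file.
func validateKnownHosts(data string) error {
	sc := bufio.NewScanner(strings.NewReader(data))
	for lineNo := 1; sc.Scan(); lineNo++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		f := strings.Fields(line)
		if len(f) < 3 {
			return fmt.Errorf("line %d: expected host, key type and key", lineNo)
		}
		embedded, err := blobKeyType(f[2])
		if err != nil {
			return fmt.Errorf("line %d: %w", lineNo, err)
		}
		if embedded != f[1] {
			return fmt.Errorf("line %d: declared %q but key blob is %q", lineNo, f[1], embedded)
		}
	}
	return sc.Err()
}

func main() {
	fmt.Println(validateKnownHosts("github.com ecdsa-sha2-nistp256 " + githubECDSAKey))
	fmt.Println(validateKnownHosts("github.com ssh-ed25519 " + githubECDSAKey))
}
```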

### Refreshing Controller-level Trust Store Values

The proposed approach relies heavily on built-in functionality in Kubernetes
and Linux distributions. Therefore, the files on disk will be automatically
refreshed when the mounted [Secrets] or [ConfigMaps] change. Note that Kubernetes
does not propagate updates to volumes mounted with `subPath`, so automatic refresh
only applies to mounts that do not use it.

SSH operations would need to read the file again on each operation, which
is analogous to the existing "load from memory" approach.

For TLS, the system certificate pool is cached on first use and won't be refreshed
until the controller is restarted. In some instances, repeated failures by the
controller to establish connections with a remote server could cause the Pod
to be restarted, resulting in the TLS certificates being refreshed.

[Secrets]: https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically
[ConfigMaps]: https://kubernetes.io/docs/concepts/configuration/configmap/#mounted-configmaps-are-updated-automatically

### CA Trust Location and Auto Discovery

**Note:** this requires no changes to the controllers. The below only calls out
the existing Go standard library behavior.

The CA trust store location `/etc/ssl/certs/` referenced here is the default
location in Alpine distributions, which are currently used as the base for all
Flux images. Users can also use the other default locations defined in the
[Go standard library]. Alternatively, a custom CA trust store directory can be set
via [SSL_CERT_DIR].

When the system certificate pool is first needed, Go loads the first CA bundle file
it finds among its well-known locations (such as `/etc/ssl/certs/ca-certificates.crt`),
and then appends the certificates from every file in its certificate directories
(such as `/etc/ssl/certs/`), skipping symlinks that point to files in the same
directory. Therefore, from a Go perspective, new `.pem` files placed in that
directory will be taken into account, even when they are not bundled into
`/etc/ssl/certs/ca-certificates.crt`.

[Go standard library]: https://github.com/golang/go/blob/master/src/crypto/x509/root_linux.go#L18
[SSL_CERT_DIR]: https://github.com/golang/go/blob/master/src/crypto/x509/root_unix.go#L53

### SSH and TLS references

The new fields `spec.trustStore.tls` and `spec.trustStore.ssh` are analogous
to Kubernetes `EnvFromSource`, in which either a `configMapRef` or a `secretRef`
can be defined, but not both.
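A minimal Go sketch of how such a field and its "one or the other, not both" rule could be modelled. All type and method names are hypothetical (not an existing Flux API), and requiring at least one reference whenever `tls` or `ssh` is present is an assumption of this sketch. It uses `errors.Join`, available since Go 1.20.

```go
package main

import (
	"errors"
	"fmt"
)

// LocalObjectReference is a hypothetical reference to an object in the same namespace.
type LocalObjectReference struct {
	Name string `json:"name"`
}

// TrustStoreSource is a hypothetical type for spec.trustStore.tls and
// spec.trustStore.ssh, shaped like Kubernetes EnvFromSource.
type TrustStoreSource struct {
	SecretRef    *LocalObjectReference `json:"secretRef,omitempty"`
	ConfigMapRef *LocalObjectReference `json:"configMapRef,omitempty"`
}

// TrustStore is a hypothetical type for spec.trustStore.
type TrustStore struct {
	TLS *TrustStoreSource `json:"tls,omitempty"`
	SSH *TrustStoreSource `json:"ssh,omitempty"`
}

// Validate enforces that exactly one reference is set when the source is present.
func (s *TrustStoreSource) Validate(field string) error {
	if s == nil {
		return nil
	}
	switch {
	case s.SecretRef != nil && s.ConfigMapRef != nil:
		return fmt.Errorf("%s: secretRef and configMapRef are mutually exclusive", field)
	case s.SecretRef == nil && s.ConfigMapRef == nil:
		return fmt.Errorf("%s: one of secretRef or configMapRef must be set", field)
	}
	return nil
}

// Validate checks both trust store kinds and joins any errors.
func (t *TrustStore) Validate() error {
	if t == nil {
		return nil
	}
	return errors.Join(t.TLS.Validate("spec.trustStore.tls"), t.SSH.Validate("spec.trustStore.ssh"))
}

func main() {
	ts := &TrustStore{TLS: &TrustStoreSource{
		SecretRef:    &LocalObjectReference{Name: "ca-bundle"},
		ConfigMapRef: &LocalObjectReference{Name: "ca-bundle"},
	}}
	fmt.Println(ts.Validate())
}
```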

> **Review comment** (@darkowlzz, Contributor, Dec 7, 2022): Maybe we can also have
> some details about how the controllers would behave for different values of
> `--insecure-object-trust-store={tls-only,ssh-only,both,disabled}`.
>
> When it's disabled but a source definition specifies a trust store, should the
> object reconciliation stall? The configuration is invalid based on the current
> controller-level configuration, and can only be fixed by restarting the controller
> with a new config or by updating the source definition with a different config.
>
> Also, maybe some description of the error messaging when it's completely disabled
> and partially disabled, and how it would be communicated to the user.

## Implementation History

<!--
Major milestones in the lifecycle of the RFC such as:
- The first Flux release where an initial version of the RFC was available.
- The version of Flux where the RFC graduated to general availability.
- The version of Flux where the RFC was retired or superseded.
-->

[/etc/ssh/ssh_known_hosts]: https://en.wikibooks.org/wiki/OpenSSH/Client_Configuration_Files#/etc/ssh/ssh_known_hosts
[public key infrastructure]: https://en.wikipedia.org/wiki/Public_key_infrastructure
[Trust on first use]: https://en.wikipedia.org/wiki/Trust_on_first_use