From e375abb73b50786d0597b147104ed8c3d2b0a44f Mon Sep 17 00:00:00 2001
From: Arno Uhlig
Date: Mon, 8 Jul 2024 14:18:15 +0200
Subject: [PATCH 1/5] (feat) central cluster ADR

---
 .../Greenhouse-ADR-6-central_cluster.md | 78 +++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 architecture-decision-records/Greenhouse-ADR-6-central_cluster.md

diff --git a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
new file mode 100644
index 0000000..4487950
--- /dev/null
+++ b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
@@ -0,0 +1,78 @@
+# ADR-6 Central cluster
+
+## Decision Contributors
+
+- Arno Uhlig
+- Ivo Gosemann
+- David Rochow
+- Martin Vossen
+- David Gogl
+- Fabian Ruff
+- Richard Tief
+- Tommy Sauer
+- Timo Johner
+
+## Status
+
+- Proposed
+
+## Context and Problem Statement
+
+The central cluster in Greenhouse hosts non-organization specific core components as well as organization-specific metadata and configuration.
+Organizations are isolated by namespaces and permissions (RBAC) are restricted to Greenhouse resource.
+Granting more permissions would increase the attack surface and introduce additional risks.
+
+Another aspect to consider is billing.
+The shared nature of the central cluster and underlying infrastructure does not allow tenant-specific measurement and billing of consumed resources.
+Thus workload in the central cluster is charged on the provider.
+
+Moreover, workload within the central cluster is neither transparent nor accessible to the customer.
+It cannot be configured, its metrics, logs, etc. are not exposed and access (kubectl exec/delete pod) is restricted.
+Thus operations of all workload within the central cluster is on the provider.
+
+From a network perspective and as documented in the security concept, communication is only uni-directional from the central to the remote clusters.
+
+Currently, the central Prometheus Alertmanager (AM) is being run within the central cluster for each organization as part of the alerts plugin.
+Since Prometheis servers push alerts to the AM, it is exposed via an ingress resource incl. TLS certificates and DNS records.
+While this contributes to simplicity and easiness of use, this violates the security concept and introduces additional costs for the provider.
+Moreover, it assumes the network zone of the central Greenhouse cluster is a good fit across all organizations and cloud providers.
+
+Use cases being:
+1) Prometheus Alertmanager for holistic alerting capabilities
+2) Thanos query and ruler component for organization-wide access to decentralized metric stores
+3) Grafana/Plutono for holistic dashboards
+
+## Related Decision Records
+
+N/A
+
+## Decision Drivers
+
+* **Network Compatibility**
+  It assumes that the network zone of the central Greenhouse cluster is suitable for all organizations and cloud providers.
+
+* **Security aspects**
+  Increased permissions and capabilities enlarge the attack surface, introducing risks.
+
+* **Operational concerns**
+  User-configurable workloads in the central cluster are not transparent to customers and must be managed by the Greenhouse team.
+
+* **Billing**
+  Tenant-specific resources must be charged to the respective tenant.
+
+* **Easiness of use**
+  Greenhouse should offer an easy way to manage operational aspects with a low entry barrier.
+
+## Decision
+
+* No user-configurable plugins should be allowed in the Greenhouse central cluster.
+* Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
+* Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
+  They cannot be configured by a user and are fully managed by Greenhouse.
+* A customer has to onboard at least one cluster to instantiate plugins with a backend.
+
+---
+
+## Evaluated options, technical details, etc.
+
+N/A
From dbf8875ce9a8d9c2a32eda8a5d11f091c2eddb44 Mon Sep 17 00:00:00 2001
From: Arno Uhlig
Date: Mon, 8 Jul 2024 15:11:32 +0200
Subject: [PATCH 2/5] reference existing adrs

---
 .../Greenhouse-ADR-3-location_of_plugins.md | 4 ++++
 .../Greenhouse-ADR-6-central_cluster.md | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/architecture-decision-records/Greenhouse-ADR-3-location_of_plugins.md b/architecture-decision-records/Greenhouse-ADR-3-location_of_plugins.md
index a3b3338..deeba2f 100644
--- a/architecture-decision-records/Greenhouse-ADR-3-location_of_plugins.md
+++ b/architecture-decision-records/Greenhouse-ADR-3-location_of_plugins.md
@@ -35,6 +35,10 @@ During developement the question got raised whatever it is a good decision/archi
 > User story: Customer onboards a newly created cluster and requires an Ingress to expose applications. Via Greenhouse the Ingress Plugin can be configured which results in a deployment of the ingress controller within the customer cluster.
 > The PluginConfig, dashboard reflects the current status of relevant underlying resources.

+## Related Decision Records
+
+Superseded by [Greenhouse-ADR-6-central_cluster.md](Greenhouse-ADR-6-central_cluster.md)
+
 ## Decision Drivers

 * Should work with/ focus on the for MVP in scope Applications
diff --git a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
index 4487950..7565130 100644
--- a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
+++ b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
@@ -44,7 +44,7 @@ Use cases being:

 ## Related Decision Records

-N/A
+Supersedes [Greenhouse-ADR-3-location_of_plugins.md](Greenhouse-ADR-3-location_of_plugins.md)

 ## Decision Drivers

From 90d2a17b47d4d96ea87e51dfe6c493b4ac1c6d3e Mon Sep 17 00:00:00 2001
From: David Rochow
Date: Thu, 8 Aug 2024 08:22:33 +0200
Subject: [PATCH 3/5] docs(adr-6): added details to options

---
 .../Greenhouse-ADR-6-central_cluster.md | 113 ++++++++++++++++--
 1 file changed, 106 insertions(+), 7 deletions(-)

diff --git a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
index 7565130..f29f219 100644
--- a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
+++ b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
@@ -33,7 +33,7 @@ Thus operations of all workload within the central cluster is on the provider.
 From a network perspective and as documented in the security concept, communication is only uni-directional from the central to the remote clusters.

 Currently, the central Prometheus Alertmanager (AM) is being run within the central cluster for each organization as part of the alerts plugin.
-Since Prometheis servers push alerts to the AM, it is exposed via an ingress resource incl. TLS certificates and DNS records.
+Since Prometheus servers push alerts to the AM, it is exposed via an ingress resource incl. TLS certificates and DNS records.
 While this contributes to simplicity and easiness of use, this violates the security concept and introduces additional costs for the provider.
 Moreover, it assumes the network zone of the central Greenhouse cluster is a good fit across all organizations and cloud providers.

@@ -41,6 +41,7 @@ Use cases being:
 1) Prometheus Alertmanager for holistic alerting capabilities
 2) Thanos query and ruler component for organization-wide access to decentralized metric stores
 3) Grafana/Plutono for holistic dashboards
+4) Heureka having multiple agents running on multiple clusters and data beeing consumed centrally

 ## Related Decision Records

@@ -65,14 +66,112 @@ Supersedes [Greenhouse-ADR-3-location_of_plugins.md](Greenhouse-ADR-3-location_o

 ## Decision

-* No user-configurable plugins should be allowed in the Greenhouse central cluster.
-* Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
-* Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
-  They cannot be configured by a user and are fully managed by Greenhouse.
-* A customer has to onboard at least one cluster to instantiate plugins with a backend.
+Go with Option 1 - Central Admin Plugins:
+  * No user-configurable plugins should be allowed in the Greenhouse central cluster.
+  * Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
+  * Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
+    They cannot be configured by a user and are fully managed by Greenhouse.
+  * A customer has to onboard at least one cluster to instantiate plugins with a backend.

 ---

 ## Evaluated options, technical details, etc.

-N/A
+### Option 1: Central Admin Plugins
+
+```mermaid
+flowchart LR
+    subgraph CentralCluster["Central Cluster"]
+        centralVPNPod["VPN Pod"]
+        centralAdminPlugin["Admin Plugin"]
+        centralAdminPlugin --> centralVPNPod
+    end
+    subgraph Cluster1["Cluster 1"]
+        direction LR
+        c1VPNPod["VPN Pod"]
+        c1API["Remote Plugin"]
+        c1VPNPod --> c1API
+    end
+    subgraph Cluster2["Cluster 2"]
+        direction LR
+        c2VPNPod["VPN Pod"]
+        c2API["Remote Plugin"]
+        c2VPNPod --> c2API
+    end
+    subgraph Cluster3["Cluster 3"]
+        direction LR
+        c3VPNPod["VPN Pod"]
+        c3API["Remote Plugin"]
+        c3VPNPod --> c3API
+    end
+    user["User"] -. Via Greenhouse .-> centralAdminPlugin
+    centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod
+```
+
+* No user-configurable plugins should be allowed in the Greenhouse central cluster.
+* Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
+* Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
+  They cannot be configured by a user and are fully managed by Greenhouse.
+* A customer has to onboard at least one cluster to instantiate plugins with a backend.
+
+#### Pros
+
+* operationally well manageable by Greenhouse
+* restricting the central cluster to admin plugins ensures that consumers cannot misconfigure it
+* works with "store local query global" scenarios
+
+#### Contra
+
+* Puts a hard dependency on central cluster availability for all plugins with a backend
+  * is that already the case??
+* Limits decentralization of Greenhouse applications
+* Additional effort required to make central data collection scenarios work
+
+### Option 2: Per Org Central Communication cluster
+
+```mermaid
+
+flowchart LR
+    subgraph CentralCluster["Central Cluster"]
+        centralVPNPod["VPN Pod"]
+    end
+    subgraph Cluster1["Communication Cluster"]
+        direction LR
+        c1VPNPod["VPN Pod"]
+        c1API["Remote Plugin"]
+        c1VPNPod --> c1API
+    end
+    subgraph Cluster2["Cluster 2"]
+        direction LR
+        c2VPNPod["VPN Pod"]
+        c2API["Remote Plugin"]
+        c2VPNPod --> c2API
+    end
+    subgraph Cluster3["Cluster 3"]
+        direction LR
+        c3VPNPod["VPN Pod"]
+        c3API["Remote Plugin"]
+        c3VPNPod --> c3API
+    end
+    user["User"] -. Via Greenhouse .-> centralVPNPod
+    centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod
+    c1VPNPod <-. WireGuard Tunnel .-> c3VPNPod & c2VPNPod
+```
+
+* Each organization has its own communication cluster
+* The communication cluster interconnects the organization's clusters and establishes bi-directional connections between them
+* The communication cluster is owned by the respective organization / consumer
+* No plugins are allowed on either the communication cluster or the central cluster
+* As before, no cluster is allowed to communicate with the central cluster
+
+#### Pros
+
+* allows consumers to establish full interconnectivity between clusters
+* enables the common plugin use case where data is collected decentrally and stored centrally
+* all plugins live in the consumer clusters
+
+#### Contra
+
+* additional operational complexity
+* consumers may rely on the interconnectivity solution for applications other than Greenhouse, increasing the blast radius of potential misconfigurations
+* additional security risk for consumers as interconnected clusters potentially allow attackers to move between remote clusters
\ No newline at end of file
From 8df3ea05e98585b2aed712e5cde1c4a14c6eec37 Mon Sep 17 00:00:00 2001
From: Ivo Gosemann
Date: Wed, 14 Aug 2024 10:08:11 +0200
Subject: [PATCH 4/5] add option 3: central cluster per org

---
 .../Greenhouse-ADR-6-central_cluster.md | 55 ++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
index f29f219..41c668b 100644
--- a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
+++ b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
@@ -174,4 +174,57 @@ flowchart LR

 * additional operational complexity
 * consumers may rely on the interconnectivity solution for applications other than Greenhouse, increasing the blast radius of potential misconfigurations
-* additional security risk for consumers as interconnected clusters potentially allow attackers to move between remote clusters
\ No newline at end of file
+* additional security risk for consumers as interconnected clusters potentially allow attackers to move between remote clusters
+
+### Option 3: Greenhouse Cluster per Organization
+
+```mermaid
+
+flowchart LR
+    subgraph CentralClusterOrgA["Central Cluster Org A"]
+        adminPluginOrgA["Admin Plugin"]
+    end
+    subgraph CentralClusterOrgB["Central Cluster Org B"]
+        adminPluginOrgB["Admin Plugin"]
+        centralVPNPod["VPN Pod"]
+        adminPluginOrgB --> centralVPNPod
+    end
+    subgraph Cluster1["Cluster 1"]
+        direction LR
+        c1API["Remote Plugin"]
+    end
+    subgraph Cluster2["Cluster 2"]
+        direction LR
+        c2API["Remote Plugin"]
+    end
+    subgraph Cluster3["Cluster 3"]
+        direction LR
+        c3VPNPod["VPN Pod"]
+        c3API["Remote Plugin"]
+        c3VPNPod --> c3API
+    end
+    userOrgA["Org A User"] -. Via Greenhouse .-> adminPluginOrgA
+    userOrgB["Org B User"] -. Via Greenhouse .-> adminPluginOrgB
+    adminPluginOrgA -. Direct Access .-> c1API & c2API
+    centralVPNPod -. WireGuard Tunnel .-> c3VPNPod
+```
+
+- Each organization has its own central cluster, owned and operated by the organization
+- Greenhouse (eventually) also provides a Managed Greenhouse Central Cluster
+- Admin plugins are allowed in the central cluster and may be configured by the organization
+- Access to previously shared Greenhouse components is possible
+
+#### Pros
+
+- Organizations can choose a suitable network zone for their central cluster, allowing direct access to their clusters
+- No dependencies between Orgs on AdminPlugin updates due to CRD changes
+- Organizations can configure AdminPlugins to their needs
+- Greenhouse logs and metrics are accessible to the organization
+- No shared costs that are not billable to the organization
+- Can be run for a trial period to evaluate the concept, as it can be reverted to the current state if needed
+
+#### Contra
+
+- Harder to support organizations without access to their Greenhouse central cluster
+- No longer an out-of-the-box (OOB) solution for organizations
+- Increased operational complexity for organizations
From 6d092529725b5b79f05caa14b059226a3b780fe3 Mon Sep 17 00:00:00 2001
From: Ivo Gosemann
Date: Mon, 2 Dec 2024 10:18:54 +0100
Subject: [PATCH 5/5] formatting

---
 .../Greenhouse-ADR-6-central_cluster.md | 52 ++++++++++---------
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
index 41c668b..16f7000 100644
--- a/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
+++ b/architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
@@ -19,27 +19,28 @@
 ## Context and Problem Statement

 The central cluster in Greenhouse hosts non-organization specific core components as well as organization-specific metadata and configuration.
-Organizations are isolated by namespaces and permissions (RBAC) are restricted to Greenhouse resource.
+Organizations are isolated by namespaces and permissions (RBAC) are restricted to Greenhouse resources.
 Granting more permissions would increase the attack surface and introduce additional risks.

-Another aspect to consider is billing.
-The shared nature of the central cluster and underlying infrastructure does not allow tenant-specific measurement and billing of consumed resources.
+Another aspect to consider is billing.
+The shared nature of the central cluster and underlying infrastructure does not allow tenant-specific measurement and billing of consumed resources.
 Thus workload in the central cluster is charged on the provider.

-Moreover, workload within the central cluster is neither transparent nor accessible to the customer.
-It cannot be configured, its metrics, logs, etc. are not exposed and access (kubectl exec/delete pod) is restricted.
-Thus operations of all workload within the central cluster is on the provider.
+Moreover, workload within the central cluster is neither transparent nor accessible to the customer.
+It cannot be configured, its metrics, logs, etc. are not exposed and access (kubectl exec/delete pod) is restricted.
+Thus operations of all workload within the central cluster is on the provider.

-From a network perspective and as documented in the security concept, communication is only uni-directional from the central to the remote clusters.
+From a network perspective and as documented in the security concept, communication is only uni-directional from the central to the remote clusters.

 Currently, the central Prometheus Alertmanager (AM) is being run within the central cluster for each organization as part of the alerts plugin.
-Since Prometheus servers push alerts to the AM, it is exposed via an ingress resource incl. TLS certificates and DNS records.
-While this contributes to simplicity and easiness of use, this violates the security concept and introduces additional costs for the provider.
-Moreover, it assumes the network zone of the central Greenhouse cluster is a good fit across all organizations and cloud providers.
+Since Prometheus servers push alerts to the AM, it is exposed via an ingress resource incl. TLS certificates and DNS records.
+While this contributes to simplicity and easiness of use, this violates the security concept and introduces additional costs for the provider.
+Moreover, it assumes the network zone of the central Greenhouse cluster is a good fit across all organizations and cloud providers.

 Use cases being:
+
 1) Prometheus Alertmanager for holistic alerting capabilities
-2) Thanos query and ruler component for organization-wide access to decentralized metric stores
+2) Thanos query and ruler component for organization-wide access to decentralized metric stores
 3) Grafana/Plutono for holistic dashboards
 4) Heureka having multiple agents running on multiple clusters and data beeing consumed centrally

@@ -49,29 +50,30 @@ Supersedes [Greenhouse-ADR-3-location_of_plugins.md](Greenhouse-ADR-3-location_o

 ## Decision Drivers

-* **Network Compatibility**
+- **Network Compatibility**
   It assumes that the network zone of the central Greenhouse cluster is suitable for all organizations and cloud providers.

-* **Security aspects**
+- **Security aspects**
   Increased permissions and capabilities enlarge the attack surface, introducing risks.

-* **Operational concerns**
+- **Operational concerns**
   User-configurable workloads in the central cluster are not transparent to customers and must be managed by the Greenhouse team.

-* **Billing**
+- **Billing**
   Tenant-specific resources must be charged to the respective tenant.

-* **Easiness of use**
+- **Easiness of use**
   Greenhouse should offer an easy way to manage operational aspects with a low entry barrier.

 ## Decision

 Go with Option 1 - Central Admin Plugins:
-  * No user-configurable plugins should be allowed in the Greenhouse central cluster.
-  * Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
-  * Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
-    They cannot be configured by a user and are fully managed by Greenhouse.
-  * A customer has to onboard at least one cluster to instantiate plugins with a backend.
+
+- No user-configurable plugins should be allowed in the Greenhouse central cluster.
+- Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
+- Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
+  They cannot be configured by a user and are fully managed by Greenhouse.
+- A customer has to onboard at least one cluster to instantiate plugins with a backend.

 ---

@@ -108,11 +110,11 @@ flowchart LR
     centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod
 ```

-* No user-configurable plugins should be allowed in the Greenhouse central cluster.
-* Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
-* Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
+- No user-configurable plugins should be allowed in the Greenhouse central cluster.
+- Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
+- Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
   They cannot be configured by a user and are fully managed by Greenhouse.
-* A customer has to onboard at least one cluster to instantiate plugins with a backend.
+- A customer has to onboard at least one cluster to instantiate plugins with a backend.

 #### Pros