
[WIP] 📖 Add Load Balancer Provider proposal #4389

Closed
wants to merge 3 commits

Conversation

@detiber (Member) commented Mar 29, 2021

What this PR does / why we need it:

Adds the Load Balancer Provider proposal, which was initially discussed in #1250 and further iterated on in https://docs.google.com/document/d/1wJrtd3hgVrUnZsdHDXQLXmZE3cbXVB5KChqmNusBGpE/edit#

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1250

Related: #4095 #3715

@k8s-ci-robot (Contributor)

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) and the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) on Mar 29, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from detiber after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) on Mar 29, 2021
@detiber (Member Author) commented Mar 29, 2021

Currently a Draft PR, as I finish migrating content from https://docs.google.com/document/d/1wJrtd3hgVrUnZsdHDXQLXmZE3cbXVB5KChqmNusBGpE/edit# and address outstanding comments on the doc

@k8s-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the size/XL label on Mar 29, 2021
@fabriziopandini (Member) left a comment

@detiber I really, really appreciate the hard effort in documenting controller behaviours and in separating all the use cases that this proposal covers. Thank you!
I have added a few small suggestions and a question to improve my understanding of this work.


##### NFR2

This proposal MUST be compatible with the [Cluster API Provider AWS v1alpha4 API](https://docs.google.com/document/d/1EeD2KjFfeKoD-o0LJTMMWh9GUy-15yG80S5Uoj7oKcA/edit)
Member

Why this provider specific requirement?

@sbueringer (Member) Mar 30, 2021

Is it part of the "compatibility guarantee" to ensure whatever we're implementing here is compatible with our current implementations?

Not sure if it's realistic to extend this to all our "official" providers (imho we should do it on a best-effort basis depending on if we get feedback from everyone)

Contributor

I'm concerned about this proposal making a lot of AWS-specific assumptions, with requirements / user stories / examples being mostly based on AWS infrastructure. The goal Must be optional to start, to limit the changes required of infrastructure providers implies that at some point it won't be optional, which means we should really make sure it works for all infrastructures, not just AWS or even the "official" providers.

Member

I think I phrased it a bit unclearly :)

I didn't mean to suggest only supporting "official" providers. I just meant that it looks like we have a lot to consider with all the different implementations. We can only make sure to be compatible with all the use cases and variants we know about; everything else will be hard to grasp.


### Implementation Details/Notes/Constraints

#### High level Sequence Diagrams
Member

The diagrams are great, but I'm wondering if we can simplify things a little bit by removing managers and focusing only on contract changes (the API Server can act as manager). Another option is to remove the fetch operations...


Reconciliation:
- Fetch the referenced provider CR as `cp`
- If `cp.Spec.Endpoint` is empty, set `cp.Spec.Endpoint` to `Spec.ControlPlane.Endpoint`
Member

Maybe we should point out that cp.Spec.Endpoint is a new requirement before introducing it here without further context.

- If `cp.Spec.Endpoint` is non-empty and is equal to `Spec.ControlPlane.Endpoint`:
- Set `Status.ControlPlaneReady` and conditions based on state of `cp`
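To make the quoted endpoint-propagation contract easier to follow, here is a minimal illustrative sketch of the relationship it describes. The resource kinds and exact field paths (controlPlane.endpoint on the Cluster, endpoint on the control plane resource) are assumptions for readability, not text taken from the proposal.

```yaml
# Hypothetical illustration of the reconciliation steps quoted above.
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: my-cluster
spec:
  controlPlane:
    endpoint:                 # Spec.ControlPlane.Endpoint in the quoted steps
      host: lb.example.com
      port: 6443
  controlPlaneRef:            # the referenced provider CR, "cp" in the quoted steps
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
    kind: KubeadmControlPlane
    name: my-cluster-cp
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
kind: KubeadmControlPlane
metadata:
  name: my-cluster-cp
spec:
  endpoint:                   # cp.Spec.Endpoint; copied from the Cluster when empty
    host: lb.example.com
    port: 6443
```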

##### Scenario: Cluster API managed endpoint
Member

Suggested change
##### Scenario: Cluster API managed endpoint
##### Scenario: Cluster API managed control plane and endpoint

Comment on lines +234 to +235
- Add spec.Endpoint, if not defined it will be populated by the load balancer provider (if defined)
- Add optional LoadBalancerRef (if needed, based on the provider)
@fabriziopandini (Member) Mar 30, 2021

This is the part where I still need to properly understand this proposal.
In my mind, the cluster load balancer should sit at the Cluster level, fill in an endpoint, and the control plane should use it.
Instead, here we are modelling the load balancer as a detail of the control plane, and as a consequence we have to roll up information across four levels (infraLB, LB, KCP, Cluster) instead of three (infraLB, LB, Cluster).
It would be great if we could provide a little bit more context on why we are choosing this design.

Member Author

@fabriziopandini consider control plane implementations that do not have separately managed load balancers, such as those backed by managed Kubernetes services. The current implementations for EKS and AKS require hacking things such that creation and management of the control plane is split between the "infrastructure provider" and the "control plane provider", to the point that the EKS implementation sets both the infrastructureRef and controlPlaneRef to the same AWSManagedControlPlane reference: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/46f54e483889596c4033dbbb23cec764576269b2/templates/cluster-template-eks.yaml#L10-L17. The current Azure implementation has separate resources for infrastructure and controlPlane, but the infrastructure resource just mirrors the values from the controlPlane resource: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/15c0e592ace20ca93bbb7c0cacfb8e67f89b5125/exp/controllers/azuremanagedcluster_controller.go#L144-L147

I think trying to keep the apiserver Load Balancer tied specifically to the Cluster, rather than making it an attribute of the Control Plane, will make it difficult (if not impossible) to avoid the hacky behavior that we currently have around this.

Another aspect that I believe is important is that when bootstrapping a Machine-based Control Plane, there is a bit of a dance that needs to be played between the load balancer and the control plane; having the required gating between components, related conditions, etc. happen through the ControlPlane would provide better visibility and context to the user for troubleshooting purposes than trying to correlate them across the Cluster and ControlPlane resources.

Longer term, I also think it opens up the potential for automated defaulting. I don't think we have enough capability for introspection or for providers to give "hints" yet, but if we hypothetically had a way for infrastructure providers to say "if someone is creating a FooCluster, then use FooLoadBalancer by default", then a Control Plane provider that requires an externally managed load balancer (such as KubeadmControlPlane) could automatically create a FooLoadBalancer resource (with certain default attributes) for the user if no LoadBalancerRef is specified. With the LoadBalancer being an attribute of the Control Plane, we could then fill in the rest of the blanks when creating the LoadBalancer (primarily the targetKind and selector). If we wanted to provide similar defaulting with the apiserver load balancer being an attribute of the Cluster directly, we would also need control plane providers to publish hinting information (including whether any load balancer should be created/required at all).

Hopefully that helps provide additional context; if so, I can work on incorporating it directly into the doc.
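As an illustration of the design being discussed (the load balancer modelled as a detail of the control plane), a control plane resource carrying the proposed fields might look roughly like the following. The endpoint and loadBalancerRef fields follow the quoted diff lines; the kind and group of the referenced load balancer are hypothetical.

```yaml
# Hypothetical example only: a Machine-based control plane referencing its load balancer.
apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
kind: KubeadmControlPlane
metadata:
  name: my-cluster-cp
spec:
  replicas: 3
  endpoint:                    # populated by the load balancer provider if not set by the user
    host: lb.example.com
    port: 6443
  loadBalancerRef:             # optional, per the quoted diff lines above
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AWSLoadBalancer      # hypothetical provider-specific load balancer kind
    name: my-cluster-apiserver
```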

Member

Thanks for the detailed explanation.
I think that documenting that we are modelling the load balancer as a detail of the control plane in order to provide better support for managed control plane solutions (like EKS) would be great.

- 'disabled' (user requested disable complete)
- other provider-relevant statuses for in-between states
- Used by users/external systems to signal to the load balancer provider that a resource should be temporarily removed from the LB:
- loadbalancer.cluster.x-k8s.io/disabled:
Member

nit: maybe we should use another verb instead of disabled, so we avoid confusion with status: disabled

Member Author

@fabriziopandini Do you think that disable would be better? Since status: disabled means that this annotation has been acted on?

Member

Maybe loadbalancer.cluster.x-k8s.io/force-disabled to express the intent from external actors, and loadbalancer.cluster.x-k8s.io/status: disabled to report that the action has been acted on.

Member

I'm not sure if it's a good idea, but does it make any sense to align a bit to the ccm annotation node.kubernetes.io/exclude-from-external-load-balancers?

In OpenStack you can just disable a lb member or you can completely remove it. What would be the intention in this case?

Or phrased differently: does it matter to us whether the lb member is temporarily disabled/removed, or do we want to just remove/disable it until it's actively added/enabled again?
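For readers following the annotation discussion above, a target object carrying the proposed annotations might look roughly like this. The annotation names come from the lines quoted above (with the force-disabled spelling still an open question in this thread); the values shown are illustrative assumptions.

```yaml
# Hypothetical example of the proposed annotations on a load balancer target.
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Machine
metadata:
  name: my-cluster-cp-abc12
  annotations:
    loadbalancer.cluster.x-k8s.io/name: my-cluster-apiserver   # which LB the target belongs to
    loadbalancer.cluster.x-k8s.io/status: disabled              # written by the load balancer provider
    loadbalancer.cluster.x-k8s.io/disabled: ""                  # set by a user/external system to request removal
```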


- If Spec.Selector is not defined:
- Spec.Selector.MatchLabels = `{'cluster.x-k8s.io/cluster-name': Spec.ClusterName, 'loadbalancer.cluster.x-k8s.io/name': Metadata.Name }`

Member

Add a default for targetKind = Machines?
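To visualise the default being discussed, a LoadBalancer object after defaulting might carry a selector like the one below. The kind, API group, and the targetKind field are assumptions based on the surrounding discussion (including the suggested Machine default), not confirmed API.

```yaml
# Hypothetical LoadBalancer after defaulting; field names are illustrative.
apiVersion: cluster.x-k8s.io/v1alpha4
kind: LoadBalancer
metadata:
  name: my-cluster-apiserver
spec:
  clusterName: my-cluster
  targetKind: Machine                                            # possible default raised above
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster                  # from Spec.ClusterName
      loadbalancer.cluster.x-k8s.io/name: my-cluster-apiserver   # from Metadata.Name
```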

@sbueringer (Member)

@detiber is this PR already ready for review or would it be better if I wait a bit? :)

@detiber (Member Author) commented Mar 30, 2021

@detiber is this PR already ready for review or would it be better if I wait a bit? :)

It's still very much a work in progress as I try to continue to address things from the google doc, but please do review. It'll be at least a few days before I make any more changes as I focus on KubeCon prep.

@sbueringer (Member) left a comment

@detiber Thanks for pushing this forward!!


## Summary

We propose a first-class Load Balancer construct for Cluster API for the purpose of providing a pluggable model for load balancing Kubernetes cluster control planes such that the load balancer implementation isn't tied directly into the Infrastructure Cluster resource. This will open the possibility for multiple swappable options in cloud environments and the ability to re-use load balancer implementations when possible, especially for bare metal and data center environments with heterogeneous infrastructure.
Member

Should we also already mention here that the lb provider can be used even when the cluster controller is not used (e.g. because the cluster is externally managed)?


There are also cases within existing infrastructure provider implementations where it would be beneficial to allow for easier swapping of different load balancer implementations that may provide different benefits under different circumstances.

We've also seen use cases where there is a need for a Load Balancer to front non-control plane services, such as Ingress. Any functionality that we add should also support these use cases in addition to just control plane load balancing.
Member

Is there a clear separation between the CAPI lb providers and existing cloud providers which implement services of type LoadBalancer, or is a certain overlap in functionality intended?

Is the idea that existing cloud providers can/should still be used in CAPI-managed clusters?

Member Author

I think there currently is a clear separation between the two, mainly because of limitations of the existing CCM implementations. I covered a bit in the Alternative proposals section, but it could definitely use a bit more elaboration here, thanks!

Contributor

So in Azure there is actually direct overlap (not sure if the same applies to others), because each VM can only be associated with one LB of type "internal" and one LB of type "public", which means we have to reuse the same LB the cloud provider creates for services of type LoadBalancer for things like node egress. See my comment here for more context: kubernetes-sigs/cluster-api-provider-azure#1229 (comment)

Is the idea that existing cloud providers can/should still be used in CAPI-managed clusters?

This should definitely be true, I wouldn't even suggest adding it as a goal

@sbueringer (Member) Mar 30, 2021

Agree, there is a clear separation about what they can do.

I just think it's interesting what our (the CAPI project's) view of the existing implementations is. Do we expect them to coexist, or will our implementations supersede the existing ccm implementation (of course only in CAPI-managed clusters)? Or do we not have an opinion, and don't need one, because they can coexist without any effort on either side? The overlap I currently see:

  • we will depend on external CCMs so that Nodes get ready, once the in-tree implementation has been removed (although that's not the service controller, so it's separate functionality in the CCM)
  • we're currently also cleaning up CCM lbs during cluster deletion (at least in CAPA (e2e-test); I would also like to implement this in CAPO, but I don't know how other providers are handling this)

Contributor

this relates to this discussion: #3075

Member Author

@CecileRobertMichon thank you for the additional context. If I'm understanding correctly, this overlap already exists today, and I wouldn't expect anything in this proposal to greatly affect that overlap more than it exists currently.

I do want to better understand the use case, though. Is it okay if I reach out next week to discuss things further over a higher-bandwidth medium, so that I can better understand the nuances involved and incorporate more of the Azure-specific requirements into the proposal?

Member

Was there any follow-up here?

Contributor

Even if this isn't applicable to all platforms, I think an example would help here. E.g. User Story 4: allow bringing an AWS ELB into the fold on a vSphere (on AWS, VMC) cluster, which at the moment wouldn't be possible using just the CCM alone

### Goals

- Provide a common abstraction for Load Balancers between various providers
- Allow for reuse of load balancer providers between infrastructure providers where possible
Member

I assume the first step when implementing this will be to somewhat extract the existing lb implementations from existing providers. I would assume that providers like CAPO/CAPA/... will still keep the code in the respective repository. Will the extracted lb providers be published via additional release artifacts of CAPO/CAPA/...? (I'm only referring to "default" lb providers which only make sense in combination with a specific infra provider, e.g. OpenStack Octavia)

I assume we will have two kinds of lb providers: LB providers which only make sense in combination with a specific infra provider, and LB providers which are usable with different infra providers. Do we want some kind of compatibility matrix documented somewhere central, or should this be included in the documentation of the lb providers? Do we want to make the lb providers easily discoverable? I suppose we just add a section to the cluster-api book?

I think from a test perspective it would make sense to leave "infra provider" (CAPO/CAPA) specific lb providers like OpenStack Octavia in the respective "infra provider" repositories, similar to CAPD in the main cluster-api repo.

Member Author

I assume the first step when implementing this will be to somewhat extract the existing lb implementations from existing providers. I would assume that providers like CAPO/CAPA/... will still keep the code in the respective repository. Will the extracted lb providers be published via additional release artifacts of CAPO/CAPA/...? (I'm only referring to "default" lb providers which only make sense in combination with a specific infra provider, e.g. OpenStack Octavia)

I assume we will have two kinds of lb providers: LB providers which only make sense in combination with a specific infra provider, and LB providers which are usable with different infra providers. Do we want some kind of compatibility matrix documented somewhere central, or should this be included in the documentation of the lb providers? Do we want to make the lb providers easily discoverable? I suppose we just add a section to the cluster-api book?

I do think we'll need to figure out documentation and how to portray the support matrix somehow. As far as discoverability, some of the available options can be exposed through clusterctl or the related operator work as built-in providers, similar to how existing infrastructure providers are exposed today.

I think from a test perspective it would make sense to leave "infra provider" (CAPO/CAPA) specific lb providers like OpenStack Octavia in the respective "infra provider" repositories, similar to CAPD in the main cluster-api repo.

Indeed, I do believe that it'll make sense to keep provider-specific implementations in the same repositories where their current functionality exists, but how they exist there may need to change to facilitate their use.

For example, there are situations with a "provider" deployed on top of another "provider", think vSphere (or even OpenStack) on top of a public cloud, leveraging an LB solution more native to the public cloud provider. While not all combinations can or should be supported, I know there is a desire to support this on the CAPV side.

conditions: (List of infra providers condition for the load balancer)
```

#### Annotations
Member

I assume those annotations should be set on the targetKind?

Member Author

Indeed, I'll clarify


Generic Cluster API resources take precedence on infra resources
If fields subject to contract are set on CAPI resources, the provider is responsible of copying them back to infra resources before starting to reconcile
If fields subject to contract aren’t set on CAPI resources, CAPI controllers will take care of pulling the values from the infrastructure resources
Member

Or to phrase it differently: Only CAPI controllers are writing to CAPI resources (as usual).
(CAPI resources defined as the resources in the group cluster.x-k8s.io, and CAPI controllers are the controllers owning them)

(just to confirm my assumption)

Member Author

I'll work to add additional clarification here and ensure there is consistency with other parts of the proposal around this.

The idea here is that a user shouldn't have to set desired configuration specifically on the provider-specific types; they should be able to set it on the generic CAPI types, and that should be honored by all components in the reconciliation chain. We also don't want "child" resources modifying their "parent" resources, but the "parent" resource may modify the "child" resource.
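A small sketch of the precedence rule described above, using the existing controlPlaneEndpoint field as the example; the direction of the copy is the point, and the concrete resources are just an illustration.

```yaml
# Illustration only: a value set on the generic CAPI resource wins and is copied
# down to the infrastructure resource by the provider; if the CAPI field were
# empty, the CAPI controller would instead pull the value up from the infra resource.
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: my-cluster
spec:
  controlPlaneEndpoint:        # set by the user on the generic ("parent") resource
    host: lb.example.com
    port: 6443
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AWSCluster
metadata:
  name: my-cluster
spec:
  controlPlaneEndpoint:        # copied here by the infrastructure provider
    host: lb.example.com
    port: 6443
```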


Kubernetes already provides a load balancer construct of Service Type=Loadbalancer. The service API allows pointing at arbitrary IPs, which could in fact be the machines of a workload cluster’s control plane.

While we believe this approach has longer term benefits, there are a few reasons that it does not fit the short term requirements:
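For context on the alternative quoted above: pointing a Service at arbitrary IPs is typically done with a selector-less Service plus a manually managed Endpoints object, roughly as in the sketch below (IPs and names are placeholders).

```yaml
# Sketch of the quoted alternative: a selector-less Service of type LoadBalancer
# in the management cluster, fronting the workload cluster's control plane IPs.
apiVersion: v1
kind: Service
metadata:
  name: my-cluster-apiserver
spec:
  type: LoadBalancer
  ports:
  - port: 6443
    targetPort: 6443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-cluster-apiserver    # must match the Service name
subsets:
- addresses:
  - ip: 10.0.0.11               # control plane machine IPs
  - ip: 10.0.0.12
  - ip: 10.0.0.13
  ports:
  - port: 6443
```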
Member

I think a major difference is in which clusters the resources, specifying the loadbalancers, are deployed.

In our case (our managed Kubernetes-as-a-Service) we currently have a clean separation:

  • users of the workload cluster can define services of type LoadBalancer in the workload cluster (reconciled via ccm) (CSI also works the same way)
    • this is also somewhat relevant to be able to completely backup/restore a workload cluster
  • management cluster operators can define control plane LoadBalancer in the management cluster

If the non-control-plane loadbalancers are now supposed to be configured in the management cluster, and we want to enable workload clusters to configure them, we have to give workload cluster users access to the management cluster. But maybe that's just something we have to figure out downstream because it's specific to our use case / service.

I think overall I really like the idea that the lb providers can be used for control-plane and non-control-plane loadbalancers. I assume the currently available cloud providers don't really fit in the big picture of CAPI-managed clusters. I'm just not entirely sure we have a clear picture yet of how non-control-plane loadbalancers fit.

Member Author

I think a major difference is in which clusters the resources, specifying the loadbalancers, are deployed.

Yes, I think this is the important distinction that I need to clarify better in the proposal.

In our case (our managed Kubernetes-as-a-Service) we currently have a clean separation:

  • users of the workload cluster can define services of type LoadBalancer in the workload cluster (reconciled via ccm) (CSI also works the same way)

    • this is also somewhat relevant to be able to completely backup/restore a workload cluster
  • management cluster operators can define control plane LoadBalancer in the management cluster

If now the non-control-plane loadbalancers should be configured in the management cluster and we want to enable workload clusters to configure them, we have to give workload cluster users access to the management cluster. But maybe that's just something we have to figure out downstream because it's specific to our use case / service.

I don't think we want to take over that use case; my thought around non-controlPlane load balancers was infrastructure that would have chicken/egg scenarios with cluster creation. Say I'm standing up n clusters, and for each one I have requirements to configure both load balancing for the control plane and provider-specific ingress for the cluster worker nodes (or a subset of the worker nodes). While there may be some combinations of ingress solutions that could work with a subset of CCM providers to provide that functionality, that functionality isn't universal.

I think overall I really like the idea that the lb providers can be used for control-plane and non-control-plane loadbalancers. I assume the currently available cloud providers don't really fit in the big picture of CAPI-managed clusters. I'm just not entirely sure if we have clear picture yet how non-control-plane loadbalancers fit.

Hopefully the above helps clarify things; the intention is not to try to assume control over things that can be filled by CCM providers, but rather to supplement functionality where they don't currently meet it.

In the longer term, it would be great if CCM implementations were able to run in a multi-tenant deployment on a management cluster and could fill the needs of both management cluster managed load balancers and workload cluster managed load balancers; however, I think it would take significant effort (and time) to drive the functionality needed to support both of those deployment scenarios and use cases on the CCM side.

CP provider issues machine delete
LB provider processes hook, quiesces, removes its hook annotation
Machine is deleted

Member

I know WIP :). I think we need an example for non-control-plane loadbalancers to think those cases through.
Some questions:

  • How would non-control-plane loadbalancers be "connected" to clusters? I assume similar to MachineDeployments today?
  • Is it possible to reference targetKinds of multiple clusters in non-control-plane loadbalancers?

Member Author

I know WIP :). I think we need an example for non-control-plane loadbalancers to think those cases through.
Some questions:

  • How would non-control-plane loadbalancers be "connected" to clusters? I assume similar to MachineDeployments today?

My thought is that they would be somewhat disjoint, connected to the Cluster through the Spec.ClusterName field, at least to start. From the perspective of the Cluster, we could handle it similar to how we handle other "descendants" of the Cluster, such as MachineDeployments, MachineSets, and Machines that have 1:many relationships.

  • Is it possible to reference targetKinds of multiple clusters in non-control-plane loadbalancers?

Is there a use case where that would be desirable? I'd probably lean towards 'no' without a really compelling use case where it would be needed.
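As a purely hypothetical illustration of the "connected through Spec.ClusterName" idea, a non-control-plane load balancer fronting a subset of workers might look like the following; the kind, group, and label choices are assumptions, not part of the proposal text quoted here.

```yaml
# Hypothetical non-control-plane LoadBalancer: tied to the Cluster only via
# spec.clusterName, and selecting worker Machines of one MachineDeployment by label.
apiVersion: cluster.x-k8s.io/v1alpha4
kind: LoadBalancer
metadata:
  name: my-cluster-ingress
spec:
  clusterName: my-cluster
  targetKind: Machine
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      cluster.x-k8s.io/deployment-name: my-cluster-ingress-workers   # assumed worker pool label
```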

@sbueringer (Member) Mar 31, 2021

Is it possible to reference targetKinds of multiple clusters in non-control-plane loadbalancers?

Is there a use case where that would be desirable? I'd probably lean towards 'no' without a really compelling use case where it would be needed.

I agree. I was just wondering. As far as I know the OpenStack ccm allows loadbalancer re-use between different Kubernetes clusters. If nobody really needs it, I wouldn't support it as it adds additional complexity.



### Goals

- Provide a common abstraction for Load Balancers between various providers
Contributor

How would provider-specific customizations be handled? One example I'm thinking of is allowing a custom number of frontend public IPs (kubernetes-sigs/cluster-api-provider-azure#1229), which is crucial to avoid SNAT port exhaustion in large clusters on Azure, since you can't attach multiple SLBs.

Member Author

I'll make sure to add some additional context to the proposal regarding this, but the idea is that they will be exposed through provider specific load balancer types, similar to how provider-specific configuration is exposed through the provider-specific *Cluster and *Machine types today.

- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to keep previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to make use of the enhancement?

### For infrastructure providers taht provide LB constructs
Contributor

Suggested change
### For infrastructure providers taht provide LB constructs
### For infrastructure providers that provide LB constructs

- @fabriziopandini
- @JoelSpeed
creation-date: 2020-10-26
last-updated: 2021-03-25
Contributor

nit: the file name has a typo, date is 35 instead of 25

User wants Cluster API to manage the endpoint for the cluster control plane.

Preconditions:
- `Spec.ControlPlane.Endpoint` is empty
Contributor

Should we close #3715 in favor of this proposal?

Member Author

I'm planning to attempt to reconcile #3715, as well as some conflicting language in this proposal that is along similar lines to #3715, next week.

@vincepri (Member) left a comment

First round of reviews from my side. I got a bit lost at some point in the details of the proposal, where I wasn't sure which controllers/reconcilers I was reviewing anymore.

I also had a question on why we need the endpoint on both the Cluster and the ControlPlane resource.




- Provide a common abstraction for Load Balancers between various providers
- Allow for reuse of load balancer providers between infrastructure providers where possible
- Load Balancer must be able to interoperate with alternative Control Plane Providers
Member

Suggested change
- Load Balancer must be able to interoperate with alternative Control Plane Providers
- Load Balancer must be able to interoperate with any Control Plane Providers

Maybe? I assumed this was implying an alternative to KCP, although I wanted to be more generic.

Comment on lines +111 to +112
##### Story 3a - Custom endpoints for consumers
As a platform operator, in my environment we use PowerDNS for DNS registration with OpenStack, an internal ACME server and a 3rd party load balancer provider. I would like to provide a cluster as a service offering to my development teams, and provide a friendly DNS endpoint for the cluster. When a new cluster is provisioned, I want the load balancer to be configured, a DNS entry created via the PowerDNS API and a certificate to be issued for end users to connect to the API server with a known CA certificate. A static IP is assumed for the load balancer.
Member

How are we planning to support this use case?


### New Types

#### Load Balancer
Member

I haven't gone far down the proposal yet; is there going to be a controller in CAPI managing these types?

Member

Follow-up: if there is no controller in CAPI itself, or if the controller only reconciles status, should we consider making this type embeddable in the infrastructure-specific provider instead, and provide utilities to update its status rather than a full controller?

#### Defaults

- If Spec.Selector is not defined:
- Spec.Selector.MatchLabels = `{'cluster.x-k8s.io/cluster-name': Spec.ClusterName, 'loadbalancer.cluster.x-k8s.io/name': Metadata.Name }`
Member

Should we also force the control plane label here?

Comment on lines +336 to +337
- loadbalancer.cluster.x-k8s.io/name: `<name of lb>`
- loadbalancer.cluster.x-k8s.io/status: `<status>`
Member

Are infrastructure providers in charge of setting this annotation on a LoadBalancer resource? If so, wouldn't it be more useful to have these values under spec or status instead and have them as a contract?


#### Conditions

- DNSResolvable
Member

Suggested change
- DNSResolvable
- IsReachable

Generic enough that it could work with IP-based load balancers as well

#### Conditions

- DNSResolvable
- LBExists
Member

Suggested change
- LBExists
- Exists

Do we need the LB prefix?


![With Load Balancer Provider](images/load-balancer/cluster-creation-w-loadbalancer-sequence.png)

![Without Load Balancer Provider](images/load-balancer/cluster-creation-wo-loadbalancer-sequence.png)]
Member

typo: additional ]


- Load Balancer must be able to interoperate with alternative Control Plane Providers
- Continued reconciliation of members of the Load Balancer (quiescing of connections, allow for graceful termination/removal of membership as part of deletion mechanisms)
- Should provide an upgrade path for existing Clusters (specifically vsphere clusters and the current Load Balancer provider)
- Should not be tied to a specific Control Plane provider implementation (to support alternative Control Plane provider implementations)
Contributor

Is this a duplicate of interoperate with alternative Control Plane Providers on L#78?

- Should provide an upgrade path for existing Clusters (specifically vsphere clusters and the current Load Balancer provider)
- Should not be tied to a specific Control Plane provider implementation (to support alternative Control Plane provider implementations)
- Should not be a required component of a "Cluster", since some control plane implementations do not require a load balancer (for example, Control Plane providers based on managed Kubernetes services)
- Must be optional to start, to limit the changes required of infrastructure providers
Contributor

Given the complexity that having separate LB CRs adds (moar CRs for users to understand, extra controllers to run and update), do we plan to enforce this on all providers in the long term?

Could we consider this to be an "if you need it" or power user feature and change this requirement to Must be optional?


- Existing Service Type=LoadBalancer implementations do not support co-deployment with other implementations

- ??? (I know there were other reasons, but blanking on them at the moment, need to finish out this list)
Contributor

Can you use an AWS CCM to implement SotLB (Service of type LoadBalancer) if the cluster itself is running on vSphere/VMC? Does that need to go here? (Or rather, a generic version of it)

Comment on lines +557 to +563
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AWSCluster
metadata:
  name: my-cluster
spec:
  region: eu-west-1
  sshKeyName: default
Contributor

Normally an AWSCluster would itself create a load balancer; how does it know not to do that in this case, where it isn't needed?

## Motivation

In private data center environments, there is no common infrastructure provider for load balancers vs. the provider for the machines themselves. Virtualized environments may use one vendor for virtual machines, e.g. VMware vSphere, KVM, or Xen, and a variety of vendors for networks and load balancers, e.g. F5, VMware NSX-T, Cisco ACI etc… This requires the ability to plug in different infrastructure providers for networking constructs with that of the machine provisioner. The most common use is the provisioning of a load balancer to sit in front of the Kubernetes control plane instances in order to provide a stable endpoint. Many cluster operators will also want to provision a stable machine-based load balancer to front all workload services from that provisioned cluster.

@enxebre (Member) Apr 15, 2021

Many cluster operators will also want to provision a stable machine-based load balancer to front all workload services from that provisioned cluster.

This is mentioned here, but then I think all the use cases and examples are tied to the controlPlane. Do we want to focus on the controlPlane use case for now?
Does the approach in this proposal provide benefits over a service lb type for both use cases?

@vincepri (Member) commented Jul 7, 2021

Closing this for now given that a group of folks is going to take it over to a Google doc for further collaboration

/close

@k8s-ci-robot (Contributor)

@vincepri: Closed this PR.

In response to this:

Closing this for now given that a group of folks is going to take it over to a Google doc for further collaboration

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer (Member)

@JoelSpeed @randomvariable @davidewatson

Hey folks,
just wanted to ask if you are still interested in pursuing this proposal. (we are currently still tracking it in our meeting notes)

@JoelSpeed (Contributor)

I'm still interested in the idea of supporting multiple load balancers (internal and external), though the provider part itself is a much larger piece of work which I am unlikely to be able to contribute to in the near term.
