[WIP] 📖 Add Load Balancer Provider proposal #4389
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Currently a Draft PR, as I finish migrating content from https://docs.google.com/document/d/1wJrtd3hgVrUnZsdHDXQLXmZE3cbXVB5KChqmNusBGpE/edit# and address outstanding comments on the doc
@detiber I really, really appreciate the hard effort in documenting controller behaviours and in separating all the use cases that this proposal covers. Thank you.
I have added a few small suggestions and a question for my better understanding of this work.
##### NFR2

This proposal MUST be compatible with the [Cluster API Provider AWS v1alpha4 API](https://docs.google.com/document/d/1EeD2KjFfeKoD-o0LJTMMWh9GUy-15yG80S5Uoj7oKcA/edit)
Why this provider-specific requirement?
Is it part of the "compatibility guarantee" to ensure whatever we're implementing here is compatible with our current implementations?
Not sure if it's realistic to extend this to all our "official" providers (imho we should do it on a best-effort basis, depending on whether we get feedback from everyone)
I'm concerned about this proposal making a lot of AWS-specific assumptions, with requirements / user stories / examples being mostly based on AWS infrastructure. The goal "Must be optional to start, to limit the changes required of infrastructure providers" implies that at some point it won't be optional, which means we should really make sure it works for all infrastructures, not just AWS or even the "official" providers.
I think I phrased it a bit unclearly :)
I didn't want to suggest only supporting "official" providers. I just meant that it looks like we have really a lot to consider with all the different implementations. We can only make sure to be compatible with all the use cases and variants we know about; everything else will be hard to grasp.
### Implementation Details/Notes/Constraints

#### High level Sequence Diagrams
The diagrams are great, but I'm wondering if we can simplify things a little bit by removing managers and focusing only on contract changes (the API Server can act as manager). Another option is to remove the fetch operations...
Reconciliation:
- Fetch the referenced provider CR as `cp`
- If `cp.Spec.Endpoint` is empty, set `cp.Spec.Endpoint` to `Spec.ControlPlane.Endpoint`
Maybe we should point out that `cp.Spec.Endpoint` is a new requirement before introducing it here without further context.
- If `cp.Spec.Endpoint` is non-empty and is equal to `Spec.ControlPlane.Endpoint`:
  - Set `Status.ControlPlaneReady` and conditions based on state of `cp`
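For illustration, the quoted flow maps roughly onto manifests like the following. Field names mirror the proposal text above (`Spec.ControlPlane.Endpoint` on the Cluster, `cp.Spec.Endpoint` on the referenced control plane resource); the exact shape is a sketch, not settled API:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: my-cluster
spec:
  controlPlane:
    endpoint:                      # Spec.ControlPlane.Endpoint in the proposal text
      host: api.my-cluster.example.com
      port: 6443
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
kind: KubeadmControlPlane          # the referenced provider CR, `cp`
metadata:
  name: my-cluster-control-plane
spec:
  endpoint:                        # cp.Spec.Endpoint, copied down from the Cluster when empty
    host: api.my-cluster.example.com
    port: 6443
```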
##### Scenario: Cluster API managed endpoint
Suggested change: `##### Scenario: Cluster API managed endpoint` → `##### Scenario: Cluster API managed control plane and endpoint`
- Add spec.Endpoint, if not defined it will be populated by the load balancer provider (if defined)
- Add optional LoadBalancerRef (if needed, based on the provider)
This is the part where I still need to properly understand this proposal.
In my mind, the cluster load balancer should sit at the Cluster level, fill in an endpoint, and the control plane should use it.
Instead, here we are modelling the load balancer as a detail of the control plane, and as a consequence we have to roll up information through four levels (infraLB, LB, KCP, Cluster) instead of three (infraLB, LB, Cluster).
It would be great if we can provide a little bit more context on why we are choosing this design.
@fabriziopandini consider control plane implementations that do not have separately managed load balancers, such as those backed by managed Kubernetes services. The current implementations for EKS and AKS require hacking things such that creation and management of the control plane is split among "infrastructure provider" and "control plane provider", to the point that the EKS implementation sets both the `infrastructureRef` and `controlPlaneRef` to the same `AWSManagedControlPlane` reference: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/46f54e483889596c4033dbbb23cec764576269b2/templates/cluster-template-eks.yaml#L10-L17. The current Azure implementation has a separate resource for infrastructure and controlPlane, but the infrastructure resource just mirrors the values from the controlPlane resource: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/15c0e592ace20ca93bbb7c0cacfb8e67f89b5125/exp/controllers/azuremanagedcluster_controller.go#L144-L147
I think trying to keep the apiserver Load Balancer tied specifically to the Cluster, rather than making it an attribute of the Control Plane, will make it difficult (if not impossible) to avoid the hacky behavior that we currently have around this.
Another aspect that I believe is important is that when bootstrapping a Machine-based Control Plane, there is a bit of a dance that needs to happen between the load balancer and the control plane. Having the required gating between components, related conditions, etc. happen through the ControlPlane would provide better visibility and context to the user for troubleshooting purposes than trying to correlate them across the Cluster and ControlPlane resources.
Longer term, I also think it opens up the potential for automated defaulting. I don't think we have enough capability for introspection or for providers to give "hints" yet, but if we hypothetically had a way for infrastructure providers to say "if someone is creating a `FooCluster`, then use `FooLoadBalancer` by default", then a Control Plane provider that requires an externally managed load balancer (such as `KubeadmControlPlane`) could automatically create a `FooLoadBalancer` resource (with certain default attributes) for the user if no LoadBalancerRef is specified. With the LoadBalancer being an attribute of the Control Plane, we could then fill in the rest of the blanks when creating the `LoadBalancer` (primarily the `targetKind` and `selector`). If we wanted to provide similar defaulting with the apiserver load balancer being an attribute of the Cluster directly, we would need control plane providers to also publish hinting information (including whether any load balancer should be created/required).
Hopefully that helps provide additional context; if so, I can work on incorporating it directly in the doc.
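To make the defaulting idea above concrete, a rough sketch of what it could look like; `FooLoadBalancer`, the `loadBalancerRef` field, and the defaults shown are hypothetical illustrations of the proposed types, not settled API:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  replicas: 3
  # Hypothetical field: if omitted, a control plane provider that requires an
  # externally managed load balancer could default one from an infrastructure
  # provider "hint" (FooCluster -> FooLoadBalancer).
  loadBalancerRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: FooLoadBalancer
    name: my-cluster-apiserver
---
# Hypothetical provider-specific LB resource that the control plane provider
# could create and fill in (targetKind, selector) on the user's behalf.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: FooLoadBalancer
metadata:
  name: my-cluster-apiserver
spec:
  targetKind: Machine
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      cluster.x-k8s.io/control-plane: ""
```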
Thanks for the detailed explanation.
I think that documenting that we are modelling the load balancer as a detail of the control plane in order to provide better support for managed control plane solutions (like EKS) would be great.
- 'disabled' (user requested disable complete)
- other provider-relevant statuses for in-between states
- Used by users/external systems to signal to the load balancer provider that a resource should be temporarily removed from the LB:
- loadbalancer.cluster.x-k8s.io/disabled:
nit: might be we should use another verb instead of `disabled`, so we avoid confusion with `status: disabled`.
@fabriziopandini Do you think that `disable` would be better? Since `status: disabled` means that this annotation has been acted on?
Might be `loadbalancer.cluster.x-k8s.io/force-disabled` to express the intent from external actors, and `loadbalancer.cluster.x-k8s.io/status: disabled` to report that the action has been acted on.
I'm not sure if it's a good idea, but does it make any sense to align a bit with the ccm annotation `node.kubernetes.io/exclude-from-external-load-balancers`?
In OpenStack you can just disable a lb member or you can completely remove it. What would be the intention in this case?
Or phrased differently: does it matter for us whether the lb member is temporarily disabled/removed, or do we just want to remove/disable it until it's actively added/enabled again?
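For illustration, assuming the annotations end up on the target resources (e.g. Machines), a temporary removal request might look like the sketch below; the exact keys are still being debated in this thread:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Machine
metadata:
  name: my-cluster-control-plane-abc12
  annotations:
    # Set by a user/external system to ask the load balancer provider to
    # temporarily pull this member out of the LB (naming under discussion:
    # disabled vs disable vs force-disabled).
    loadbalancer.cluster.x-k8s.io/disabled: ""
    # Reported back by the load balancer provider once the request has been
    # acted on.
    loadbalancer.cluster.x-k8s.io/status: disabled
```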
- If Spec.Selector is not defined:
  - Spec.Selector.MatchLabels = `{'cluster.x-k8s.io/cluster-name': Spec.ClusterName, 'loadbalancer.cluster.x-k8s.io/name': Metadata.Name }`
Add a default for `targetKind` = Machines?
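A sketch of what the defaults could produce on the proposed `LoadBalancer` type, assuming a `targetKind` default of `Machine` as suggested above; the group/version and field names are illustrative only:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4   # group/version assumed for illustration
kind: LoadBalancer
metadata:
  name: my-cluster-apiserver
spec:
  clusterName: my-cluster
  targetKind: Machine                   # proposed default
  selector:                             # defaulted when not set by the user
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      loadbalancer.cluster.x-k8s.io/name: my-cluster-apiserver
```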
@detiber is this PR already ready for review or would it be better if I wait a bit? :)
It's still very much a work in progress as I try to continue to address things from the Google doc, but please do review. It'll be at least a few days before I make any more changes as I focus on KubeCon prep.
@detiber Thanks for pushing this forward!!
## Summary

We propose a first-class Load Balancer construct for Cluster API for the purpose of providing a pluggable model for load balancing Kubernetes cluster control planes such that the load balancer implementation isn't tied directly into the Infrastructure Cluster resource. This will open the possibility for multiple swappable options in cloud environments and the ability to re-use load balancer implementations when possible, especially for bare metal and data center environments with heterogeneous infrastructure.
Should we also already mention here that the lb provider can be used even when the cluster controller is not used (e.g. because the cluster is externally managed)?
There are also cases within existing infrastructure provider implementations where it would be beneficial to allow for easier swapping of different load balancer implementations that may provide different benefits under different circumstances.

We've also seen use cases where there is a need for a Load Balancer to front non-control plane services, such as Ingress. Any functionality that we add should also support these use cases in addition to just control plane load balancing.
Is there a clear separation between the CAPI lb providers and existing cloud providers which implement services of type LoadBalancer, or is a certain overlap in functionality intended?
Is the idea that existing cloud providers can/should still be used in CAPI-managed clusters?
I think there currently is a clear separation between the two, mainly because of limitations of the existing CCM implementations. I covered a bit in the Alternative proposals section, but it could definitely use a bit more elaboration here, thanks!
So in Azure there is actually direct overlap (not sure if the same applies to others), because each VM can only be associated with one LB of type "internal" and one LB of type "public", which means we have to reuse the same LB the cloud provider creates for services of type LoadBalancer for things like node egress. See my comment here for more context: kubernetes-sigs/cluster-api-provider-azure#1229 (comment)

> Is the idea that existing cloud providers can/should still be used in CAPI-managed clusters?

This should definitely be true, I wouldn't even suggest adding it as a goal.
Agree, there is a clear separation in what they can do.
I just think it's interesting what our (as the CAPI project) view of the existing implementations is. Do we expect them to coexist, or will our implementations supersede the existing ccm implementation (of course only in CAPI-managed clusters)? Or do we not have an opinion, and don't need one, as they can coexist without any effort on either side? The overlap I currently see:
- we will depend on external CCMs so the Nodes get ready as soon as the in-tree implementation has been removed (although that's not the service controller, so it's separate functionality in the CCM)
- we're currently also cleaning up CCM lbs during cluster deletion (at least in CAPA (e2e-test); I'd also like to implement this in CAPO, but I don't know how other providers are handling this)
this relates to this discussion: #3075
@CecileRobertMichon thank you for the additional context. If I'm understanding correctly, this overlap already exists today, and I wouldn't expect anything in this proposal to greatly affect that overlap more than it exists currently.
I do want to better understand the use case, though. Is it okay if I reach out next week to discuss things further over a higher-bandwidth medium so that I can better understand the nuances involved and incorporate more of the Azure-specific requirements into the proposal?
Was there any follow-up here?
Even if this isn't applicable to all platforms, I think an example would help here. E.g. User Story 4: allow bringing an AWS ELB into the fold on a vSphere (on AWS, VMC) cluster, which at the moment wouldn't be possible using CCM alone.
### Goals

- Provide a common abstraction for Load Balancers between various providers
- Allow for reuse of load balancer providers between infrastructure providers where possible
I assume the first step when implementing this will be to somewhat extract the existing lb implementations from existing providers. I would assume that providers like CAPO/CAPA/... will still keep the code in the respective repository. Will the extracted lb providers be published via additional release artifacts of CAPO/CAPA/...? (I'm only referring to "default" lb providers which only make sense in combination with a specific infra provider, e.g. OpenStack Octavia.)
I assume we will have two kinds of lb providers: LB providers which only make sense in combination with a specific infra provider, and LB providers which are usable with different infra providers. Do we want some kind of compatibility matrix documented somewhere central, or should this be included in the documentation of the lb providers? Do we want to make the lb providers easily discoverable? I suppose we just add a section to the cluster-api book?
I think from a test perspective it would make sense to leave "infra provider" (CAPO/CAPA) specific lb providers like OpenStack Octavia in the respective "infra provider" repositories, similar to CAPD in the main cluster-api repo.
> I assume the first step when implementing this will be to somewhat extract the existing lb implementations from existing providers. I would assume that providers like CAPO/CAPA/... will still keep the code in the respective repository. Will the extracted lb providers be published via additional release artifacts of CAPO/CAPA/...? (I'm only referring to "default" lb providers which only make sense in combination with a specific infra provider, e.g. OpenStack Octavia.)
>
> I assume we will have two kinds of lb providers: LB providers which only make sense in combination with a specific infra provider, and LB providers which are usable with different infra providers. Do we want some kind of compatibility matrix documented somewhere central, or should this be included in the documentation of the lb providers? Do we want to make the lb providers easily discoverable? I suppose we just add a section to the cluster-api book?

I do think we'll need to figure out documentation and how to portray the support matrix somehow. As far as discoverability, some of the available options can be exposed through `clusterctl` or the related operator work as built-in providers, similar to how existing infrastructure providers are exposed today.

> I think from a test perspective it would make sense to leave "infra provider" (CAPO/CAPA) specific lb providers like OpenStack Octavia in the respective "infra provider" repositories, similar to CAPD in the main cluster-api repo.

Indeed, I do believe that it'll make sense to keep provider-specific implementations in the same repositories where their current functionality exists, but how they exist there may need to change to facilitate their use.
For example, there are situations such as a "provider" deployed on top of another "provider": think vSphere (or even OpenStack) on top of a public cloud, leveraging an LB solution more native to the public cloud provider. While not all combinations can or should be supported, I know there is a desire to support this on the CAPV side.
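On discoverability, one possible shape is an entry in the clusterctl configuration, analogous to how custom infrastructure providers are registered today; note that a `LoadBalancerProvider` type and the URL below are purely hypothetical:

```yaml
# ~/.cluster-api/clusterctl.yaml (sketch)
providers:
  # Hypothetical provider type and URL; today clusterctl only knows about core,
  # bootstrap, control plane, and infrastructure providers.
  - name: "octavia"
    url: "https://example.com/releases/latest/load-balancer-components.yaml"
    type: "LoadBalancerProvider"
```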
conditions: (List of infra providers condition for the load balancer)
```

#### Annotations
I assume those annotations should be set on the `targetKind`?
Indeed, I'll clarify
Generic Cluster API resources take precedence on infra resources
If fields subject to contract are set on CAPI resources, the provider is responsible of copying them back to infra resources before starting to reconcile
If fields subject to contract aren’t set on CAPI resources, CAPI controllers will take care of pulling the values from the infrastructure resources
Or to phrase it differently: only CAPI controllers are writing to CAPI resources (as usual).
(CAPI resources defined as the resources in the group `cluster.x-k8s.io`, and CAPI controllers as the controllers owning them.)
(just to confirm my assumption)
I'll work to add additional clarification here and ensure there is consistency with other parts of the proposal around this.
The idea here is that a user shouldn't have to set desired configuration specifically on the provider-specific types; they should be able to set it on the generic CAPI types and have that honored by all components in the reconciliation chain. We also don't want "child" resources modifying their "parent" resources, but the "parent" resource may modify the "child" resource.
Kubernetes already provides a load balancer construct of Service Type=Loadbalancer. The service API allows pointing at arbitrary IPs, which could in fact be the machines of a workload cluster’s control plane.

While we believe this approach has longer term benefits, there are a few reasons that it does not fit the short term requirements:
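For context, the quoted alternative refers to the existing Kubernetes pattern of a selector-less Service of type LoadBalancer backed by manually managed Endpoints pointing at the control plane machines; the names and IPs below are example values:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-cluster-apiserver
spec:
  type: LoadBalancer      # no selector, so endpoints are managed manually
  ports:
    - port: 6443
      targetPort: 6443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-cluster-apiserver   # must match the Service name
subsets:
  - addresses:
      - ip: 10.0.0.10          # workload cluster control plane machine IPs
      - ip: 10.0.0.11
      - ip: 10.0.0.12
    ports:
      - port: 6443
```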
I think a major difference is in which cluster the resources specifying the loadbalancers are deployed.
In our case (our managed Kubernetes-as-a-Service) we currently have a clean separation:
- users of the workload cluster can define services of type LoadBalancer in the workload cluster (reconciled via ccm) (CSI also works the same way)
  - this is also somewhat relevant to be able to completely backup/restore a workload cluster
- management cluster operators can define the control plane LoadBalancer in the management cluster

If the non-control-plane loadbalancers should now be configured in the management cluster and we want to enable workload clusters to configure them, we have to give workload cluster users access to the management cluster. But maybe that's just something we have to figure out downstream because it's specific to our use case / service.
I think overall I really like the idea that the lb providers can be used for control-plane and non-control-plane loadbalancers. I assume the currently available cloud providers don't really fit in the big picture of CAPI-managed clusters. I'm just not entirely sure we have a clear picture yet of how non-control-plane loadbalancers fit.
> I think a major difference is in which cluster the resources specifying the loadbalancers are deployed.

Yes, I think this is the important distinction that I need to clarify better in the proposal.

> In our case (our managed Kubernetes-as-a-Service) we currently have a clean separation:
>
> - users of the workload cluster can define services of type LoadBalancer in the workload cluster (reconciled via ccm) (CSI also works the same way)
>   - this is also somewhat relevant to be able to completely backup/restore a workload cluster
> - management cluster operators can define the control plane LoadBalancer in the management cluster
>
> If the non-control-plane loadbalancers should now be configured in the management cluster and we want to enable workload clusters to configure them, we have to give workload cluster users access to the management cluster. But maybe that's just something we have to figure out downstream because it's specific to our use case / service.

I don't think we want to take over that use case; my thought around non-control-plane load balancers was infrastructure that would have chicken/egg scenarios with cluster creation. Say I'm standing up n clusters, and for each one I have requirements to configure both load balancing for the control plane and provider-specific ingress for the cluster worker nodes (or a subset of the worker nodes). While there may be some combinations of ingress solutions that could work with a subset of CCM providers to provide that functionality, that functionality isn't universal.

> I think overall I really like the idea that the lb providers can be used for control-plane and non-control-plane loadbalancers. I assume the currently available cloud providers don't really fit in the big picture of CAPI-managed clusters. I'm just not entirely sure we have a clear picture yet of how non-control-plane loadbalancers fit.

Hopefully the above helps clarify things; the intention is not to try and assume control over things that can be filled by CCM providers, but rather to supplement functionality where they don't currently meet it.
Longer term, it would be great if CCM implementations were able to run in a multi-tenancy based deployment on a management cluster and could fill the needs of both management cluster managed load balancers and workload cluster managed load balancers; however, I think it would take significant effort (and time) to drive the functionality needed to support both of those deployment scenarios and use cases on the CCM side.
CP provider issues machine delete
LB provider processes hook, quiesces, removes its hook annotation
Machine is deleted
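One way the quoted sequence could be wired up is via the existing machine deletion phase hook annotations; the hook phase, name, and owner value below are illustrative only:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Machine
metadata:
  name: my-cluster-control-plane-abc12
  annotations:
    # Placed by the LB provider while the machine is an active LB member; the
    # Machine controller pauses deletion at this phase until the LB provider
    # quiesces connections and removes the annotation.
    pre-terminate.delete.hook.machine.cluster.x-k8s.io/load-balancer: load-balancer-provider
```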
I know WIP :). I think we need an example for non-control-plane loadbalancers to think those cases through.
Some questions:
- How would non-control-plane loadbalancers be "connected" to clusters? I assume similar to MachineDeployments today?
- Is it possible to reference `targetKinds` of multiple clusters in non-control-plane loadbalancers?
> I know WIP :). I think we need an example for non-control-plane loadbalancers to think those cases through.
> Some questions:
>
> - How would non-control-plane loadbalancers be "connected" to clusters? I assume similar to MachineDeployments today?

My thought is that they would be somewhat disjoint, connected to the Cluster through the `Spec.ClusterName` field, at least to start. From the perspective of the Cluster, we could handle it similar to how we handle other "descendants" of the Cluster, such as MachineDeployments, MachineSets, and Machines, that have 1:many relationships.

> - Is it possible to reference `targetKinds` of multiple clusters in non-control-plane loadbalancers?

Is there a use case where that would be desirable? I'd probably lean towards 'no' without a really compelling use case where it would be needed.
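A sketch of what such a disjoint, non-control-plane load balancer could look like, tied to the Cluster only via `spec.clusterName` and a selector over a subset of worker Machines; the fields are illustrative of the proposed type, not final:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: LoadBalancer
metadata:
  name: my-cluster-ingress
spec:
  clusterName: my-cluster        # connects the LB to the Cluster, like other descendants
  targetKind: Machine
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: my-cluster
      # Only the ingress-facing worker pool (label value is an example):
      cluster.x-k8s.io/deployment-name: my-cluster-ingress-workers
```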
> > Is it possible to reference `targetKinds` of multiple clusters in non-control-plane loadbalancers?
>
> Is there a use case where that would be desirable? I'd probably lean towards 'no' without a really compelling use case where it would be needed.

I agree, I was just wondering. As far as I know the OpenStack ccm allows loadbalancer re-use between different Kubernetes clusters. If nobody really needs it, I wouldn't support it, as it adds additional complexity.
### Goals

- Provide a common abstraction for Load Balancers between various providers
How would provider-specific customizations be handled? One example I'm thinking of is allowing a custom number of frontend public IPs (kubernetes-sigs/cluster-api-provider-azure#1229), which is crucial to avoid SNAT port exhaustion in large clusters in Azure, since you can't attach multiple SLBs.
I'll make sure to add some additional context to the proposal regarding this, but the idea is that they will be exposed through provider-specific load balancer types, similar to how provider-specific configuration is exposed through the provider-specific *Cluster and *Machine types today.
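For example, a provider-specific LB type could carry knobs like the frontend IP count mentioned above; the `AzureLoadBalancer` kind and its fields below are hypothetical illustrations, not an existing API:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureLoadBalancer          # hypothetical provider-specific type
metadata:
  name: my-cluster-apiserver
spec:
  sku: Standard
  # Provider-specific setting, e.g. extra frontend public IPs to avoid SNAT
  # port exhaustion in large clusters (see cluster-api-provider-azure#1229).
  frontendIPsCount: 3
```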
- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to keep previous behavior?
- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to make use of the enhancement?

### For infrastructure providers taht provide LB constructs
Suggested change: `### For infrastructure providers taht provide LB constructs` → `### For infrastructure providers that provide LB constructs`
- @fabriziopandini
- @JoelSpeed
creation-date: 2020-10-26
last-updated: 2021-03-25
nit: the file name has a typo, date is 35 instead of 25
User wants Cluster API to manage the endpoint for the cluster control plane.

Preconditions:
- `Spec.ControlPlane.Endpoint` is empty
Should we close #3715 in favor of this proposal?
First round of reviews from my side. I got a bit lost at some point through the details of the proposal, where I wasn't sure which controller/reconcilers I was reviewing anymore.
I also had a question on why we need `endpoint` on both the Cluster and the ControlPlane resource.
- Provide a common abstraction for Load Balancers between various providers
- Allow for reuse of load balancer providers between infrastructure providers where possible
- Load Balancer must be able to interoperate with alternative Control Plane Providers
Suggested change: `- Load Balancer must be able to interoperate with alternative Control Plane Providers` → `- Load Balancer must be able to interoperate with any Control Plane Providers`
Maybe? I assumed this was implying alternatives to KCP, although I wanted to be more generic.
##### Story 3a - Custom endpoints for consumers
As a platform operator, in my environment we use PowerDNS for DNS registration with OpenStack, an internal ACME server and a 3rd party load balancer provider. I would like to provide a cluster as a service offering to my development teams, and provide a friendly DNS endpoint for the cluster. When a new cluster is provisioned, I want the load balancer to be configured, a DNS entry created via the PowerDNS API and a certificate to be issued for end users to connect to the API server with a known CA certificate. A static IP is assumed for the load balancer.
How are we planning to support this use case?
### New Types

#### Load Balancer
I haven't gone far down the proposal yet; is there going to be a controller in CAPI managing these types?
Follow-up: if there is no controller in CAPI itself, or if the controller only reconciles status, should we consider making this type embeddable in the infrastructure-specific provider instead, and provide utilities to update its status instead of a full controller?
#### Defaults

- If Spec.Selector is not defined:
  - Spec.Selector.MatchLabels = `{'cluster.x-k8s.io/cluster-name': Spec.ClusterName, 'loadbalancer.cluster.x-k8s.io/name': Metadata.Name }`
Should we also force the control plane label here?
- loadbalancer.cluster.x-k8s.io/name: `<name of lb>`
- loadbalancer.cluster.x-k8s.io/status: `<status>`
Are infrastructure providers in charge of setting this annotation on a LoadBalancer resource? If so, wouldn't it be more useful to have these values under spec or status instead and have them as a contract?
#### Conditions

- DNSResolvable
Suggested change: `- DNSResolvable` → `- IsReachable`
Generic enough that it could work with IP-based load balancers as well.
#### Conditions

- DNSResolvable
- LBExists
Suggested change: `- LBExists` → `- Exists`
Do we need the LB prefix?
![With Load Balancer Provider](images/load-balancer/cluster-creation-w-loadbalancer-sequence.png)

![Without Load Balancer Provider](images/load-balancer/cluster-creation-wo-loadbalancer-sequence.png)]
typo: additional `]`
pods:
  cidrBlocks:
    - 192.168.0.0/16
controlPlaneRef:
It seems the examples here do not align with the proposed changes to Cluster here: https://github.com/kubernetes-sigs/cluster-api/pull/4389/files#diff-3dabbc43888d2653843c8851f0293425b1edcfa8d613d654d5a8f1dbab17e979R173
- Load Balancer must be able to interoperate with alternative Control Plane Providers
- Continued reconciliation of members of the Load Balancer (quiescing of connections, allow for graceful termination/removal of membership as part of deletion mechanisms)
- Should provide an upgrade path for existing Clusters (specifically vsphere clusters and the current Load Balancer provider)
- Should not be tied to a specific Control Plane provider implementation (to support alternative Control Plane provider implementations)
Is this a duplicate of `interoperate with alternative Control Plane Providers` on L#78?
- Should provide an upgrade path for existing Clusters (specifically vsphere clusters and the current Load Balancer provider)
- Should not be tied to a specific Control Plane provider implementation (to support alternative Control Plane provider implementations)
- Should not be a required component of a "Cluster", since some control plane implementations do not require a load balancer (for example, Control Plane providers based on managed Kubernetes services)
- Must be optional to start, to limit the changes required of infrastructure providers
Given the complexity that having separate LB CRs adds (moar CRs for users to understand, extra controllers to run and update), do we plan to enforce this on all providers in the long term?
Could we consider this to be an "if you need it" or power user feature and change this requirement to `Must be optional`?
- Existing Service Type=LoadBalancer implementations do not support co-deployment with other implementations

- ??? (I know there were other reasons, but blanking on them at the moment, need to finish out this list)
Can you use an AWS CCM to implement SotLB if the cluster itself is running on vSphere/VMC? Does that need to go here? (Or rather, a generic version of it)
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AWSCluster
metadata:
  name: my-cluster
spec:
  region: eu-west-1
  sshKeyName: default
Normally an AWSCluster would itself create a load balancer; how does it know not to do that in this case, where it isn't needed?
## Motivation

In private data center environments, there is no common infrastructure provider for load balancers vs. the provider for the machines themselves. Virtualized environments may use one vendor for virtual machines, e.g. VMware vSphere, KVM, or Xen, and a variety of vendors for networks and load balancers, e.g. F5, VMware NSX-T, Cisco ACI etc… This requires the ability to plug in different infrastructure providers for networking constructs with that of the machine provisioner. The most common use is the provisioning of a load balancer to sit in front of the Kubernetes control plane instances in order to provide a stable endpoint. Many cluster operators will also want to provision a stable machine-based load balancer to front all workload services from that provisioned cluster.
> Many cluster operators will also want to provision a stable machine-based load balancer to front all workload services from that provisioned cluster.

This is mentioned here, but then I think all use cases and examples are tied to the controlPlane. Do we want to focus on the controlPlane use case for now?
Does the approach in this proposal provide benefits over a service lb type for both use cases?
Closing this for now given that a group of folks is going to take it over to a Google doc for further collaboration. /close
@vincepri: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@JoelSpeed @randomvariable @davidewatson Hey folks,
I'm still interested in the idea of supporting multiple load balancers (internal and external), though the provider part itself is a much larger piece of work which I am unlikely to be able to contribute to in the near term.
What this PR does / why we need it:
Adds the Load Balancer Provider, that was initially discussed in #1250, and further iterated on in https://docs.google.com/document/d/1wJrtd3hgVrUnZsdHDXQLXmZE3cbXVB5KChqmNusBGpE/edit#
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #1250
Related: #4095 #3715