Add cluster-api based cloudprovider #1866
Conversation
As mentioned in the PR description, we use an annotation to delete a specific machine during scale down. This is the corresponding cluster-api implementation: kubernetes-sigs/cluster-api#726
I did a PoC of scale from 0 in the OpenShift implementation: openshift#89. This PR is derived from the OpenShift implementation.
A couple of nits, but overall this just works, thanks for your work! :)))
Resolved review threads on:
- cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller.go
- cluster-autoscaler/cloudprovider/clusterapi/clusterapi_machinedeployment.go
- cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go
- cluster-autoscaler/cloudprovider/clusterapi/clusterapi_provider.go
The following users are mentioned in OWNERS file(s) but are not members of the kubernetes org.
Tested and works like a charm. frobware was extremely helpful with setting it up. Unfortunately, at this moment it depends on the nodelink-controller, and OpenShift at some point renamed cluster.k8s.io to openshift.machine.io, so you need to use a commit from before those changes (e87de64297ff7b9352b72d7512f1f50abf9c15a4) to get it working outside OpenShift. (Can't wait to abandon this dependency.)
Merging the following PRs from OpenShift will remove the nodelink-controller dependency:
If I push these two commits into this PR, you will have to manually accommodate the following PR to make OpenStack work again:
Scale down can fail unless you have PR #2096.
…1 alias

This is largely to be consistent with other usages (in the community), but really to be at parity with the upstream PR [1], which uses this import alias already. This also makes it easier to backport changes made from openshift/autoscaler into upstream.

[1] kubernetes#1866
We're evaluating this feature, and overall it looks quite solid with our cluster-api (v1alpha1) implementation. Is there any reason this PR gets so little feedback / approval?
/lgtm
Access to this is required by cloudprovider/clusterapi.
Enable cloudprovider/clusterapi.
This adds a new cloudprovider based on the cluster-api project: https://github.com/kubernetes-sigs/cluster-api
These are copied to facilitate testing. They are not meant to reflect upstream clusterapi/v1alpha1; in fact, fields have been removed. They are here to support the switch to unstructured types in the tests without having to rewrite all of the unit tests.
The autoscaler expects a provider implementation's node groups to implement the Nodes() function so that it returns all instances belonging to the group, regardless of whether they have become a Kubernetes node or not. This information is then used, for example, to detect unregistered nodes: https://github.com/kubernetes/autoscaler/blob/bf3a9fb52e3214dff0bea5ef2b97f17ad00a7702/cluster-autoscaler/clusterstate/clusterstate.go#L307-L311
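A minimal sketch of what this means for a node group implementation; the `machinesForNodeGroup` helper and the receiver fields are hypothetical stand-ins for the lookup the controller actually performs:

```go
import "k8s.io/autoscaler/cluster-autoscaler/cloudprovider"

// Nodes returns one cloudprovider.Instance per machine backing this
// node group, whether or not the machine has registered as a node yet.
// clusterstate compares this list against registered nodes to detect
// unregistered (still-booting or failed) instances.
func (ng *nodegroup) Nodes() ([]cloudprovider.Instance, error) {
	machines, err := ng.machineController.machinesForNodeGroup(ng) // hypothetical helper
	if err != nil {
		return nil, err
	}
	instances := make([]cloudprovider.Instance, 0, len(machines))
	for _, machine := range machines {
		id := ""
		if machine.Spec.ProviderID != nil {
			id = *machine.Spec.ProviderID
		}
		// Include the machine even while ProviderID is still unset;
		// dropping it would hide the pending instance from the autoscaler.
		instances = append(instances, cloudprovider.Instance{Id: id})
	}
	return instances, nil
}
```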
We index on providerID, but it turns out that those values on the node and the machine are not always consistent: some encode the region, some do not, for example. This commit normalizes all values through normalizedProviderString(). To ensure that we catch all call sites, I've introduced a new type and made the find() functions take this type in lieu of a string. Unit tests have also been adjusted to introduce a 'test:///' prefix on the providerID value to further validate the change. This change allows CAPI to work out of the box, assuming v1alpha2. It's also reasonable to assert that this consistency should be enforced elsewhere, so to make this behaviour easily revertable I'm leaving it as a separate commit in this patch series.
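A sketch of the normalization described above; the split-on-`/` rule is an assumption about how region-qualified and unqualified providerIDs get reduced to a common key:

```go
import "strings"

// normalizedProviderID is a distinct type so the compiler forces every
// lookup through normalization; the find() helpers accept this type
// rather than a raw string.
type normalizedProviderID string

// normalizedProviderString reduces a providerID such as
// "aws:///us-east-1a/i-0123456789" or "test:///i-0123456789" to its
// final path segment, so node and machine values index identically.
func normalizedProviderString(s string) normalizedProviderID {
	split := strings.Split(s, "/")
	return normalizedProviderID(split[len(split)-1])
}
```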
@MaciekPytel @enxebre @elmiko @hardikdr @detiber: I rebased for the updated vendor in PR #2914. I also added
/lgtm
/lgtm thanks @frobware!
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: MaciekPytel. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
This is a new cloudprovider implementation based on the cluster-api project.
This PR has been cut from openshift@b38dd11 and updated to reflect changes in autoscaler master (1.15); the openshift version is using 1.13. This implementation has been working well for many months and will scale up/down via a MachineSet or a MachineDeployment.
Known limitations:
Node groups are represented when a MachineSet or MachineDeployment has positive scaling values. The min/max values are encoded as annotations on the respective objects, for example:
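A sketch of what this looks like on a MachineSet, assuming the annotation keys follow the cluster.k8s.io prefix used elsewhere in this PR (check the provider's constants for the exact names):

```yaml
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
metadata:
  name: workers
  annotations:
    # Assumed annotation keys; the exact names may differ per version.
    cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "6"
spec:
  replicas: 3
```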
To map between nodes and machines we currently depend on the following annotation getting added to the node object, for example:
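A sketch of the node-side annotation, assuming the key and the `<machine-namespace>/<machine-name>` value format set by the nodelink-controller:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-0
  annotations:
    # Assumed key and value format: <machine-namespace>/<machine-name>.
    cluster.k8s.io/machine: default/workers-abc12
```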
We currently do this using a nodelink-controller, but we have future plans to remove this and rely on the `node.Spec.ProviderID` value.

For scale down, the cloudprovider implementation annotates the machine object with the `cluster.k8s.io/delete-machine` annotation, and the machine controller will drain the node, delete the machine, then finally delete the node.
Using `cluster.k8s.io/delete-machine` will force the `betterDelete` deletion policy in the machineset controller. The default deletion policy is `random`, but machines annotated with `cluster.k8s.io/delete-machine` will be deleted in preference.
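To illustrate, a machine marked for removal looks something like this (a sketch; as far as I can tell the deletion policy only checks for the key's presence, so the value shown is illustrative):

```yaml
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: workers-abc12
  annotations:
    # The key's presence marks this machine for preferential deletion
    # on the next MachineSet scale down; the value is illustrative.
    cluster.k8s.io/delete-machine: "2019-03-27T10:00:00Z"
```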