✨ Implement MachineDrainRules #11353

Merged

Conversation

sbueringer
Member

@sbueringer sbueringer commented Oct 30, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #11240

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label labels Oct 30, 2024
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 30, 2024
@sbueringer sbueringer added the area/machine Issues or PRs related to machine lifecycle management label Oct 30, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Oct 30, 2024
@sbueringer
Member Author

/assign @fabriziopandini

/assign @JoelSpeed

@JoelSpeed If you have time, would be great if you can review the new API (api/v1beta1/machinedrainrules_types.go). No changes compared to the proposal (as far as I can tell :)).

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
@sbueringer
Member Author

sbueringer commented Oct 30, 2024

@enxebre @JoelSpeed Thx for the quick reviews. Findings are either fixed or answered in the conversations.

(I'll look into the e2e test failure, needs a minor fix in the in-memory provider)

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

Member

@fabriziopandini fabriziopandini left a comment

awesome job!

api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
docs/book/src/reference/api/labels-and-annotations.md Outdated Show resolved Hide resolved
internal/controllers/machine/drain/filters.go Show resolved Hide resolved
internal/controllers/machine/drain/filters.go Outdated Show resolved Hide resolved
internal/controllers/machine/drain/drain.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
api/v1beta1/machinedrainrules_types.go Outdated Show resolved Hide resolved
internal/webhooks/machinedrainrules.go Show resolved Hide resolved
@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 31, 2024
@sbueringer
Member Author

@JoelSpeed I think everything is either fixed or answered.

@sbueringer sbueringer force-pushed the pr-machine-drain-rules branch from b8a7e87 to 05cae45 Compare October 31, 2024 14:27
@sbueringer
Member Author

/test pull-cluster-api-e2e-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-main

// +listType=atomic
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=32
// +kubebuilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))",message="entries in pods must be unique"
Member

one note on these, what's the minimum cluster version that these would work in?

Member Author

1.28. We cannot use CEL, I missed that. CI on main would have caught it eventually.

Contributor

@JoelSpeed JoelSpeed Nov 1, 2024

Where did you get 1.28 from?

I think this validation would work from 1.26 clusters onwards

Edit: CEL validations have been on by default from 1.25, forced on from 1.29

Member Author

Sorry. Misread the comment from Vince. v1.28 is the minimum Kubernetes mgmt cluster version that Cluster API v1.9 will support (source: https://main.cluster-api.sigs.k8s.io/reference/versions#core-provider-cluster-api-controller)

According to the Kubernetes website, validation rules became stable with Kubernetes 1.29: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-rules

Member Author

@sbueringer sbueringer Nov 1, 2024

Not sure what that means for our Kubernetes version support. I think currently we just state that we support >= 1.28.

We didn't specify details like: "We support 1.28, but only if you don't disable CEL"
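
The version facts cited in this thread can be summarized in a small sketch. The thresholds come from the comments above (CEL validation rules on by default from 1.25, stable and always on from 1.29, and v1.28 as the minimum mgmt cluster version for CAPI v1.9); the type, constant, and function names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// celSupport is a hypothetical label for how CRD validation rules (CEL)
// behave on a given Kubernetes minor version.
type celSupport string

const (
	celNotDefault  celSupport = "not enabled by default"
	celOnByDefault celSupport = "on by default, can be disabled"
	celAlwaysOn    celSupport = "always on"
)

// minMgmtMinorCAPI19 is the minimum mgmt cluster minor version stated in
// this thread for Cluster API v1.9 (Kubernetes v1.28).
const minMgmtMinorCAPI19 = 28

// celSupportFor maps a Kubernetes 1.<minor> version to the CEL status
// described in the discussion: on by default from 1.25, always on from 1.29.
func celSupportFor(minor int) celSupport {
	switch {
	case minor >= 29:
		return celAlwaysOn
	case minor >= 25:
		return celOnByDefault
	default:
		return celNotDefault
	}
}

func main() {
	for _, minor := range []int{minMgmtMinorCAPI19, 29} {
		fmt.Printf("Kubernetes 1.%d: CEL validation rules are %s\n", minor, celSupportFor(minor))
	}
}
```

The minimum supported version falls into the middle case, which is why "we support >= 1.28" alone does not guarantee that the CEL markers take effect.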

Contributor

Hmm, good point. I would wager most people don't disable beta, enabled-by-default Kube features, but we could in theory break someone by adding this.

What release is this going into? Could a release note be added for this release saying we support 1.28 but you must not have disabled CEL?

Member Author

@sbueringer sbueringer Nov 1, 2024

This goes into CAPI v1.9 in December.

> Could a release note be added for this release saying we support 1.28 but you must not have disabled CEL?

Not sure if it's worth it, just to get 2 CEL validations in there :)

Random example:

Starting with 1.30 kubernetes version and 1.27 LTS versions the beta APIs will be disabled by default when you upgrade to them.
https://learn.microsoft.com/en-us/azure/aks/upgrade-aks-cluster?tabs=azure-cli#before-you-begin

(But this is not about CEL, it's only for >= 1.30 and I don't know if what they do with beta APIs is comparable to beta feature gates. In general I don't know what policies folks follow for beta features)

Contributor

Damn you AKS 😂 Do we risk it?

Or do we implement the same validations in the webhooks for now and leave the CEL markers in place (disabled somehow?), with a TODO to re-enable them and delete the webhook code once the minimum Kubernetes version for CAPI is bumped to 1.29?

Member Author

@sbueringer sbueringer Nov 4, 2024

I would suggest keeping it simple and doing the following:

  • Implement the validation in the webhook for now
  • Open a follow-up issue to track the API validation discussion in general and also add a specific sub-task for this validation

Context: the current plan is that CAPI v1.10 supports mgmt clusters >= v1.29 (which means that once we have branched off release-1.9 in 3 weeks, this is not a blocker anymore anyway).
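
For illustration, the uniqueness check expressed by the CEL rule `self.all(x, self.exists_one(y, x == y))` could be done in Go roughly as follows. This is a minimal sketch, not the code merged in this PR: `podSelector` is a hypothetical stand-in for an entry of the `pods` list, and plain `error` values are used instead of the field error types a real webhook would return, so that the example only needs the standard library.

```go
package main

import (
	"fmt"
	"reflect"
)

// podSelector is a hypothetical stand-in for one entry of the "pods" list.
// The real type in api/v1beta1/machinedrainrules_types.go is not shown here.
type podSelector struct {
	Selector          map[string]string
	NamespaceSelector map[string]string
}

// validateUniquePods reports one error for every entry that is equal to an
// earlier entry. An empty result means every entry occurs exactly once,
// which is what the CEL rule self.all(x, self.exists_one(y, x == y)) requires.
// Note: reflect.DeepEqual treats a nil map and an empty map as different.
func validateUniquePods(pods []podSelector) []error {
	var errs []error
	for i := range pods {
		// Quadratic comparison is acceptable because the list is capped
		// at 32 items by the MaxItems marker.
		for j := 0; j < i; j++ {
			if reflect.DeepEqual(pods[i], pods[j]) {
				errs = append(errs, fmt.Errorf(
					"pods[%d]: entries in pods must be unique (duplicate of pods[%d])", i, j))
				break
			}
		}
	}
	return errs
}

func main() {
	pods := []podSelector{
		{Selector: map[string]string{"app": "logging"}},
		{Selector: map[string]string{"app": "web"}},
		{Selector: map[string]string{"app": "logging"}}, // same as pods[0]
	}
	for _, err := range validateUniquePods(pods) {
		fmt.Println(err)
	}
}
```

Doing the check in the webhook keeps the validation independent of whether CEL is enabled on a v1.28 mgmt cluster.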

@sbueringer
Member Author

sbueringer commented Nov 1, 2024

/test pull-cluster-api-e2e-mink8s-main
(to confirm CI would have caught the issue with CEL)

Update: Looks like it works because CEL is enabled by default in 1.28 (xref: #11353 (comment)).

@sbueringer
Member Author

/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-main

@sbueringer
Member Author

@fabriziopandini @JoelSpeed @vincepri PTAL :)

@enxebre
Member

enxebre commented Nov 4, 2024

lgtm

Contributor

@JoelSpeed JoelSpeed left a comment

API and validations LGTM

@sbueringer
Member Author

@enxebre PTAL, thx :)

@sbueringer
Member Author

/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-main

@enxebre
Member

enxebre commented Nov 4, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: e2de3e9859256144dca80d0e87a62675aed8dc4c

@fabriziopandini
Member

Great work
/lgtm
/approve

/hold feel free to cancel anytime

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2024
@sbueringer
Member Author

Thx everyone!

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2024
@sbueringer
Member Author

sbueringer commented Nov 5, 2024

Flake is independent of this PR and will be fixed in a separate PR

@k8s-ci-robot k8s-ci-robot merged commit f9cd33f into kubernetes-sigs:main Nov 5, 2024
22 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Nov 5, 2024
@sbueringer sbueringer deleted the pr-machine-drain-rules branch November 5, 2024 12:23
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/machine Issues or PRs related to machine lifecycle management cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
6 participants