
KEP 1618: Optional garbage collection of finished Workloads #2742

Status: Open. Wants to merge 15 commits into base `main`.
Conversation

@mwysokin (Contributor) commented Aug 1, 2024

What type of PR is this?

/kind documentation
/kind api-change

What this PR does / why we need it:

Which issue(s) this PR fixes:

KEP for #1618

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 1, 2024
@k8s-ci-robot:

Hi @mwysokin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 1, 2024
netlify bot commented Aug 1, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 2efe734
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66b60ece996fd600085cf14a

@mimowo (Contributor) commented Aug 1, 2024

/cc @tenzen-y @PBundyra

@mimowo (Contributor) commented Aug 1, 2024

nit, please say "KEP for #1618" in the PR description instead of "Fixes". Otherwise the robot will close the issue after merging the PR :)

@mwysokin (Contributor, Author) commented Aug 1, 2024

/assign

@mimowo (Contributor) left a comment

First pass, it lgtm overall. Left some comments to improve it.

Review threads on keps/1618-optional-gc-of-workloads/README.md (resolved)
@mimowo (Contributor) commented Aug 2, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 2, 2024
…ories into 1 and move the description of the behavior to a new section under Design Details.
leading to unnecessary use of etcd storage and a gradual increase in Kueue's memory footprint. Some of these objects,
like finished Workloads, do not contain any useful information that could be used for additional purposes, such as debugging.
That's why, based on user feedback in [#1618](https://github.com/kubernetes-sigs/kueue/issues/1618)
or [#1789](https://github.com/kubernetes-sigs/kueue/issues/1789), a mechanism for garbage collection of finished Workloads
@mimowo (Contributor) Aug 8, 2024

QQ: I'm wondering if the KEP and the initial implementation tackle this issue, because the workloads in the context of this issue are not necessarily finished. It might still be ok to mention in motivation for the generic KEP on workload cleanups, but I would like to clarify if the first version will cover it.

@mwysokin (Contributor, Author) Aug 8, 2024

I checked, and even though the extension to the code would be minimal, I'd rather keep it as a separate task so as not to pollute and further delay this feature. I will update the KEP once I have something more polished than just a draft version of how the removal of orphaned Workloads would look. For now, I'm going to remove the orphaned Workloads issue from the KEP. WDYT?

Contributor:

Thank you for checking. I think removing them from the KEP makes sense, because:

Contributor:

Actually, on second thought, I think it is better to honor the user's choice of using "orphan" mode. So I would suggest just transitioning them to the finished state. This way we could again link the issue and the KEP.

@mimowo (Contributor) left a comment

LGTM, one remaining question from me: https://github.com/kubernetes-sigs/kueue/pull/2742/files#r1708818068. @tenzen-y would you like to give it a pass?

@mimowo (Contributor) left a comment

/lgtm
/approve
/hold
To wait for feedback from @tenzen-y as another pair of eyes.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 8, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 8, 2024
@k8s-ci-robot:

LGTM label has been added.

Git tree hash: 4b4f2f1108238ec7b0edef46b807fe0cc471a4d8

@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, mwysokin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2024
@PBundyra (Contributor) commented Aug 8, 2024

LGTM from my side as well

@tenzen-y (Member) commented Aug 8, 2024

> /lgtm /approve /hold To wait for feedback from @tenzen-y as another pair of eyes.

ACK

@tenzen-y (Member) commented Aug 8, 2024

/remove-kind api-change

@k8s-ci-robot k8s-ci-robot removed the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Aug 8, 2024

- Support the deletion of finished Workload objects.
- Introduce configuration of global retention policies for Kueue-authored Kubernetes objects,
starting with finished Workloads.
Member:

Does this mean that you will work on the deletion of objects other than finished Workloads in the next iteration?

But you mention "Support the deletion/expiration of any Kueue-authored Kubernetes object other than finished Workloads." in the Non-Goals section.

@mwysokin (Contributor, Author) Aug 9, 2024

The goal was to include some extensibility in the design so that other objects could be added without modifying the root-level API (Jobs, etc.). That's the main reason the API key for Workloads is nested. I haven't seen any issues for the removal of other completed objects (other than orphaned Workloads, #1789), but if there's any interest in extending the feature I'd be happy to work on it. Figuring out how to remove every object that Kueue owns wasn't the goal here, though, which is where the goals/non-goals items came from. I'm also happy to work on any updates to this KEP if we decide to extend the cleanup process to other objects.

Member:

I would recommend limiting the scope of this KEP to Kueue-managed objects, not only to Kueue Workload objects, since we actually aim to extend the API for future extensibility to other Kueue-managed resources like AdmissionCheck.

v1beta1 means that we cannot bring breaking changes to APIs without bumping the API version, so I would like to take care of future API possibilities in this KEP.

So, my recommendation is to remove "starting with finished Workloads." from the Goals and move that statement to the proposal. Then mention that we do not aim to add APIs for Jobs like batch/Job or RayJob, as we discussed in another thread...
I'm wondering if we need to add other mechanisms for Job retention in the future, when we find actual use cases.

@mwysokin (Contributor, Author) Aug 14, 2024

Got it. Let me propose another set of changes before the end of the week.

Contributor:

> Introduce configuration of global retention policies for Kueue-authored Kubernetes objects

This sentence covers both Workload and ProvisioningRequest. We don't need to clean up ProvisioningRequests explicitly, because they are children of the Workloads.

So, for now, we only need to take care of Workloads. But I agree we should keep the API extensible to support cleanup of finished Jobs; that can be addressed in a separate KEP.

Comment on lines +208 to +218
- **R**: In clusters with a large number of existing finished Workloads
(thousands, tens of thousands, etc.), it may take a significant amount of time
to delete all previously finished Workloads that qualify for deletion. Since the
reconciliation loop is effectively synchronous, this may impact the reconciliation
of new jobs. From the user's perspective, it may seem like Kueue's initialization
time is very long for that initial run until all expired objects are deleted.\
**M**: Unfortunately, there is no easy way to mitigate this without complicating the code.
A potential improvement would be to have a dedicated internal queue that handles deletion
outside the reconciliation loop, or to delete expired objects in batches so that after
a batch of deletions is completed, any remaining expired objects would be skipped until
the next run of the loop. However, this would go against the synchronous nature of the reconciliation loop.
Member:

I believe that we can mitigate this risk by strictly setting the QPS only when admins enable this feature. Once we limit the client QPS and burst in the Kueue Configuration, we can mitigate the burst of kube-apiserver load:

```yaml
clientConnection:
  qps: 50
  burst: 100
```

So, I would propose mentioning limiting the client QPS when admins enable this feature in an existing cluster. Could you mention this mitigation here?

-->


### API
Member:

TBH, this API looks half-baked to me, because when we add new policies to garbage collect other resources, like integration Jobs, we would need to hard-code the API name here, e.g. FinishedRayClusterRetention. But Kueue integration Jobs are an extensible interface for external Jobs like AppWrapper or in-house ones, so I guess it would be challenging to maintain retention policies for all integrations:

```yaml
[...]
objectRetentionPolicies:
  finishedWorkloadRetention: 10s
  failedRayClusterRetention: 30s
  # How to specify in-house or other OSS resource policies here?
  # In upstream Kueue, we cannot find all integrations,
  # since the integration framework could be implemented elsewhere,
  # in private or public repositories.
[...]
```

From another perspective, even if we restrict which objects a retention duration can be specified for, this API is only slightly extensible: although the name says it configures retention policies for "objects", in this case we can actually specify only Workload retention policies.

```yaml
objectRetentionPolicies:
  finishedWorkloadRetention: 10s
  succeededWorkloadRetention: 30s
```

In conclusion, which objects will you support in this feature across all iterations? Only Workloads, or arbitrary objects?

Contributor (Author):

TBH, I started working on this KEP without much of a vision; rather, I wanted to implement #1618 and make the API flexible enough that it could be extended with other object types in the future. After reading what you said, it indeed makes sense to come up with a consistent approach for all objects so that users can benefit from Kueue's extensibility.

I don't have much experience with KEPs, so I'm not sure what should happen now. I'd certainly like to ship the feature so that it doesn't slowly fade away in the in-review stage, but at the same time, if you feel it's half-baked, maybe I need to take a step back and come up with a more flexible approach. All feedback and guidance is really welcome 🙇

I'll do some research over the weekend and I'll try to propose something early next week.

Contributor:

There are a couple of questions / aspects I see touched here:

  1. should we make it generic to cover Job CRDs?
  2. should we nest the API inside objectRetentionPolicies?

Re 1: There is an intrinsic asymmetry between workloads and Job CRDs. Job CRDs are created by users who can specify spec.ttlSecondsAfterFinished. Workloads are created by Kueue, and so users don't have this control. So, I think it is justified to start from Workloads only and have the API field specific to them.

Another API maintained by Kueue that comes to my mind is ProvisioningRequest. I think having finishedProvisioningRequestRetention will make sense too (but we can leave it for later as currently we delete them eagerly anyway).

Re 2: I think it makes sense to nest the API inside objectRetentionPolicies to be future-proof in case we have some other objects with lifetime maintained by Kueue (like provisioning requests, or different policies for finished workloads depending on success /failure).

Member:

> Re 1: There is an intrinsic asymmetry between workloads and Job CRDs. Job CRDs are created by users who can specify spec.ttlSecondsAfterFinished. Workloads are created by Kueue, and so users don't have this control. So, I think it is justified to start from Workloads only and have the API field specific to them.

Yes, I agree with you. TBH, I'd prefer not to implement garbage collection of Jobs, since I think deleting Jobs is the responsibility of the Job's own controller, like ttlSecondsAfterFinished. But this proposed API is extensible to the Job API, which is why I asked "which objects will you support in this feature".

> Another API maintained by Kueue that comes to my mind is ProvisioningRequest. I think having finishedProvisioningRequestRetention will make sense too (but we can leave it for later as currently we delete them eagerly anyway).

I agree with you. I believe that the separate and dedicated CRDs for this feature are overkill. So, I would recommend not introducing the dedicated CRD.

> Re 2: I think it makes sense to nest the API inside objectRetentionPolicies to be future-proof in case we have some other objects with lifetime maintained by Kueue (like provisioning requests, or different policies for finished workloads depending on success/failure).

If we want to support additional Kueue-managed objects in the future, could we add a dedicated field for each object, something like this? I guess the criteria for garbage collection depend on the object's kind.

```yaml
workloadRetentionPolicy:
  onConditions:
    [metav1.Condition]
admissionCheckRetentionPolicy:
  onConditions:
    [metav1.Condition]
  onResourceFlavor:
    name: xxx
    ...
```
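For illustration, the per-object policies sketched in the YAML above could be modeled roughly as follows in Go. All type and field names here are assumptions made for this example (`conditionRef` stands in for `metav1.Condition`); this is not the actual Kueue Configuration API:

```go
package main

import (
	"fmt"
	"time"
)

// conditionRef is a simplified stand-in for metav1.Condition
// (only the type/status pair matters for matching).
type conditionRef struct {
	Type   string
	Status string
}

// workloadRetentionPolicy mirrors the proposed workloadRetentionPolicy key.
type workloadRetentionPolicy struct {
	OnConditions []conditionRef
	After        time.Duration
}

// admissionCheckRetentionPolicy mirrors the proposed
// admissionCheckRetentionPolicy key, with its extra flavor criterion.
type admissionCheckRetentionPolicy struct {
	OnConditions     []conditionRef
	OnResourceFlavor string
	After            time.Duration
}

// retentionConfig groups one dedicated field per object kind.
type retentionConfig struct {
	Workloads       *workloadRetentionPolicy
	AdmissionChecks *admissionCheckRetentionPolicy
}

// matches reports whether an object's observed conditions satisfy a policy:
// every condition named by the policy must be present with the same status.
func matches(policy []conditionRef, observed map[string]string) bool {
	for _, c := range policy {
		if observed[c.Type] != c.Status {
			return false
		}
	}
	return true
}

func main() {
	cfg := retentionConfig{
		Workloads: &workloadRetentionPolicy{
			OnConditions: []conditionRef{{Type: "Finished", Status: "True"}},
			After:        10 * time.Minute,
		},
	}
	observed := map[string]string{"Finished": "True", "Admitted": "True"}
	fmt.Println(matches(cfg.Workloads.OnConditions, observed))
}
```

The point of the sketch is the shape: one dedicated, optional field per object kind, so each kind can carry its own garbage-collection criteria.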

Contributor (Author):

This looks a bit more complex than I originally anticipated, but I agree that it's quite extensible. TBH, I lack the expertise to evaluate whether this would be a consistent and good way for Kueue to support this, so I'll defer to the other maintainers (@mimowo, @alculquicondor). Once we reach consensus, I'll be happy to implement it.

Contributor:

I prefer 2. It looks more future-proof.

Member:

That sounds good to me.

Member:

This is a side note, though. To evolve the API more easily, I want to find a way to guard new API fields :)

IIUC, k/k has the feature gate guard for the new API fields like here: https://github.com/kubernetes/kubernetes/blob/1c4f540669397d031d464877ec3d2b8ac6be9b2f/pkg/apis/core/types.go#L543-L548

We cannot break a new API field without bumping the API version under the current API evolution steps. I think a new API without a guard makes API evolution harder, since we need to consider future compatibility and extensibility in the first implementation.

@alculquicondor (Contributor) Aug 16, 2024

The feature gate only implies that the field will be ignored when disabled. It doesn't mean that we can change the API in a future version of kubernetes, without changing the API version.

Member:

Oh, I see. That was not my expected behavior.
Thanks for letting me know.

Comment on lines 351 to 379
[//]: # (### Graduation Criteria)

[//]: # ()
[//]: # (<!--)

[//]: # ()
[//]: # (Clearly define what it means for the feature to be implemented and)

[//]: # (considered stable.)

[//]: # ()
[//]: # (If the feature you are introducing has high complexity, consider adding graduation)

[//]: # (milestones with these graduation criteria:)

[//]: # (- [Maturity levels &#40;`alpha`, `beta`, `stable`&#41;][maturity-levels])

[//]: # (- [Feature gate][feature gate] lifecycle)

[//]: # (- [Deprecation policy][deprecation-policy])

[//]: # ()
[//]: # ([feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md)

[//]: # ([maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions)

[//]: # ([deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/)

[//]: # (-->)
Member:

Could you revert these commented-out sections, since we need to revisit them when we want to graduate this feature to Beta/Stable?

Contributor (Author):

I explicitly commented it out instead of removing it so that it could be revisited later, but maybe it needs to be there from the beginning? It's my first KEP ever, so I'm learning as I go, and I'll follow your guidance 🙏

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 9, 2024
@k8s-ci-robot:

@mwysokin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kueue-verify-main | 2efe734 | link | true | `/test pull-kueue-verify-main` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@alculquicondor (Contributor):

/assign



and make progress.
-->

- Support the deletion/expiration of any Kueue-authored Kubernetes object other than finished Workloads.
Contributor:

Suggested change: replace

> - Support the deletion/expiration of any Kueue-authored Kubernetes object other than finished Workloads.

with

> - Support the deletion/expiration of jobs or other Kubernetes objects not authored by Kueue.


## Summary

<!--
Contributor:

Please remove the comments

nitty-gritty.
-->

Add a new API called `objectRetentionPolicies`,
Contributor:

Suggested change: replace

> Add a new API called `objectRetentionPolicies`,

with

> Add a new field called `objectRetentionPolicies` to the Kueue Configuration API,

Add a new API called `objectRetentionPolicies`,
which will enable specifying a retention policy for finished Workloads under an option
called `FinishedWorkloadRetention`. This API section could be extended in the future
with options for other Kueue-authored objects.
Contributor:

Suggested change: replace

> with options for other Kueue-authored objects.

with

> with options for other Kueue-managed objects.

-->

- Initially, other naming was considered for config keys. Namely, instead of "Objects,"
the word "Resources" was used, but based on feedback from `@alculquicondor`, it was changed
Contributor:

We don't usually keep the names of the people who suggested the changes.

-->


### API
Contributor:

I think that API looks too complex. For the most part, we will be happy just cleaning up finished Workloads.

Let me propose the following:

```yaml
objectRetentionPolicies:
  workloads:
    afterFinished: 10m
```

It leaves us the possibility to extend to other objects, and even to add conditions, but I prefer we don't give that flexibility yet.

Comment on lines +286 to +295
[//]: # (##### Prerequisite testing updates)

[//]: # ()
[//]: # (<!--)

[//]: # (Based on reviewers feedback describe what additional tests need to be added prior)

[//]: # (implementing this enhancement to ensure the enhancements have also solid foundations.)

[//]: # (-->)
Contributor:

You can remove this if there aren't any requirements.

@mwysokin (Contributor, Author):
FYI, I just wanted to let everyone know that I haven't abandoned the work. I was on vacation last week, and I'll be back in September. Sorry for the delay 🙇

@tenzen-y (Member):

> FYI I just wanted to let everyone know that I didn't abandon the work. I've been on vacation last week and I'll be back in September. Sorry for the delay 🙇

No worries. I'm looking forward to moving this discussion forward once you come back!

@mwysokin (Contributor, Author):

I'm really sorry it's been dragging. I'm coming back to it starting Monday.
