-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrade kueue default version to v0.9.0 to support TAS #3260
base: experimental
Are you sure you want to change the base?
upgrade kueue default version to v0.9.0 to support TAS #3260
Conversation
@@ -41,15 +41,15 @@ variable "kueue" { | |||
description = "Install and configure [Kueue](https://kueue.sigs.k8s.io/docs/overview/) workload scheduler. A configuration yaml/template file can be provided with config_path to be applied right after kueue installation. If a template file provided, its variables can be set to config_template_vars." | |||
type = object({ | |||
install = optional(bool, false) | |||
version = optional(string, "v0.8.1") | |||
version = optional(string, "v0.9.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not update the default , but use the version we want from the blueprint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain why we should not update the default? My rationale for updating was that the changes should be backward compatible across versions and TAS support is independent of the node type we choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you on all versions being backward compatible and updating the default eventually @ighosh98. We should wait till we know for a fact that 0.9.0 is stable though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check with the Kueue team once on their confidence wrt the stability question. My understanding is that if they release a version, it should be stable.
If the concern is about TAS being in alpha phase and we want to reduce the blast radius associated with it, then I am aligned with changing the default to the older version until we are out oh alpha.
Let me know which one it is.
condition = !var.kueue.install || contains(["v0.8.1"], var.kueue.version) | ||
error_message = "Supported version of Kueue is v0.8.1" | ||
condition = !var.kueue.install || contains(["v0.9.0", "v0.8.1"], var.kueue.version) | ||
error_message = "Default Supported version of Kueue is v0.9.0. Cluster toolkit also supports v0.8.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Update error message to just output supported versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating this
config_path = optional(string, null) | ||
config_template_vars = optional(map(any), null) | ||
}) | ||
default = {} | ||
|
||
validation { | ||
condition = !var.kueue.install || contains(["v0.8.1"], var.kueue.version) | ||
error_message = "Supported version of Kueue is v0.8.1" | ||
condition = !var.kueue.install || contains(["v0.9.0", "v0.8.1"], var.kueue.version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Store this in a list, and use it in the error message as well as the condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting clarification: Isn't ["v0.9.0", "v0.8.1"] already a list? Are you recommending that we should store it in a new variable and then use it elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, use a new variable and avoid the hardcoded versions in the error message. Use the variable in the error message
Added support for kueue v0.9.0 which was released recently with TAS enabled for all clusters. As aligned offline, Kueue-v0.9.0 would now be the default Kueue version choice for customers.
kueue-v0.9.0 has TAS enabled in alpha phase. Have added necessary tolerations to kueue-v0.9.0 file to make the manifest compatible with cluster-toolkit.
Testing Information
Testing was done on a2-high clusters.
Ran the command:
kubectl get pods -o custom-columns="Name:.metadata.name,Host:.spec.nodeSelector.cloud\.google\.com/gce-topology-host" | sort -k2
Job Annotation:
Output:
Was testing host topology compliance required job. Observe that all pods are on the same host:
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.