Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add ValidateRayClusterSpec to Webhook #2739

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

CheyuWu
Copy link
Contributor

@CheyuWu CheyuWu commented Jan 13, 2025

Why are these changes needed?

ValidateRayClusterSpec validation should also be invoked in the optional validation webhook

  • The validation webhook is usually the recommended way to validate a k8s CR, but users can choose not to enable it. That is the reason we need to validate both reconciliation and the webhook.
  • Add constant.go to store some necessary variables which copy from controllers/ray/utils
  • Add utils.go to store some necessary functions which copy from controllers/ray/utils

Related issue number

Closes #2694

Related Discussion:

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

const RAY_REDIS_ADDRESS = "RAY_REDIS_ADDRESS"

// Ray GCS FT related annotations
const RayFTEnabledAnnotationKey = "ray.io/ft-enabled"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create references to them with the same name in the package utils?

I think it should be something like this

package utils

const RayContainerIndex = rayv1.RayContainerIndex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have a misunderstanding

@CheyuWu
Copy link
Contributor Author

CheyuWu commented Jan 13, 2025

  • Reference all of the constant variables in utils
  • Reference the EnvVarExists func in utils

}
if r.Annotations[RayFTEnabledAnnotationKey] != "true" &&
len(r.Spec.HeadGroupSpec.Template.Spec.Containers) > 0 &&
r.Spec.HeadGroupSpec.Template.Spec.Containers[RayContainerIndex].Env != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r.Spec.HeadGroupSpec.Template.Spec.Containers[RayContainerIndex].Env != nil check is not necessary.

return field.Invalid(
field.NewPath("spec").Child("gcsFaultToleranceOptions"),
r.Spec.GcsFaultToleranceOptions,
fmt.Sprintf("GcsFaultToleranceOptions should be nil when %s is disabled", RayFTEnabledAnnotationKey),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when %s is disabled might be confusing.

Suggested change
fmt.Sprintf("GcsFaultToleranceOptions should be nil when %s is disabled", RayFTEnabledAnnotationKey),
fmt.Sprintf("GcsFaultToleranceOptions should be nil when %s annotation is set to false", RayFTEnabledAnnotationKey),

return field.Invalid(
field.NewPath("spec").Child("headGroupSpec").Child("template").Child("spec").Child("containers").Index(0).Child("env"),
RAY_REDIS_ADDRESS,
fmt.Sprintf("%s should not be set when %s is disabled", RAY_REDIS_ADDRESS, RayFTEnabledAnnotationKey),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Sprintf("%s should not be set when %s is disabled", RAY_REDIS_ADDRESS, RayFTEnabledAnnotationKey),
fmt.Sprintf("%s environment variable should not be set when %s annotation is not set to true", RAY_REDIS_ADDRESS, RayFTEnabledAnnotationKey),

@CheyuWu CheyuWu force-pushed the feat/gcsFT/webhook_val branch from e579e66 to ce2c194 Compare January 14, 2025 15:38
@CheyuWu
Copy link
Contributor Author

CheyuWu commented Jan 14, 2025

  • Rebase upstream master
  • Remove r.Spec.HeadGroupSpec.Template.Spec.Containers[RayContainerIndex].Env != nil check
  • Change confusing message

Copy link
Contributor

@rueian rueian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow this https://github.com/ray-project/kuberay/pull/2749/files to update error messages.

@CheyuWu CheyuWu force-pushed the feat/gcsFT/webhook_val branch from ce2c194 to 8ba5b91 Compare January 15, 2025 16:54
@CheyuWu
Copy link
Contributor Author

CheyuWu commented Jan 15, 2025

  • Rebase upstream/master
  • Add unit test for EnvVarExists
  • Fix the test case name in raycluster_controller_unit_test.go
  • Update the func logic, test case, and error message in https://github.com/ray-project/kuberay/pull/2749/files
  • Move IsGCSFaultToleranceEnabled to rayv1/pods to avoid import cycle and add unit test
  • Add REDIS_PASSWORD to ray/v1/constant.go

PTAL, thx

@CheyuWu
Copy link
Contributor Author

CheyuWu commented Jan 15, 2025

  • Add containers spec in HeadGroupSpec and add containers spec in WorkerGroupSpec to ensure the new change won't interfere with the test case

@CheyuWu
Copy link
Contributor Author

CheyuWu commented Jan 15, 2025

  • Add RayStartParams in workerGroupSpec

Comment on lines 108 to 109
field.NewPath("spec").Child("workerGroupSpecs"),
r.Spec.WorkerGroupSpecs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
field.NewPath("spec").Child("workerGroupSpecs"),
r.Spec.WorkerGroupSpecs,
field.NewPath("spec").Child("workerGroupSpecs").Index(i),
workerGroup,

@CheyuWu
Copy link
Contributor Author

CheyuWu commented Jan 16, 2025

  • Add invalid workerGroupSpec's index
  • Assign error workgroup value

@kevin85421
Copy link
Member

kevin85421 commented Jan 17, 2025

I will revisit this PR after the release. I’m not sure whether the change is worth making the codebase more complex to support the validation webhook, which most KubeRay users don’t use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GCS FT] GCS FT misconfiguration
3 participants