ELB health check fails with Kubernetes >=v1.30.x #5139
Comments
/triage accepted
I'm running into this as well.
/priority critical-urgent
This was discussed at the office hours on 14th October 2024. The summary is that:
/help
@richardcase: Guidelines: Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed. In response to this:
/milestone v2.8.0
/milestone v2.7.2
I tried setting the TLS cipher suites, but that didn't work: https://gist.github.com/richardcase/47118a404bc832904c399ba1360462f2
@richardcase I wasn't able to apply your spec directly because of some IAM issues, but I was able to create the cluster by explicitly setting this public AMI:
AWSCluster
and KCP:
I got it working with this additional argument in the template (I'm using
Do we really want to hardcode these less secure settings in the template? That makes it very likely that users will adopt them blindly. I'd rather consider other options:
In the office hours we talked about switching the default load balancer type to NLB. Next steps:
This issue is labeled with You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
The reconciler fails afterwards. It looks like the load balancer naming scheme has changed as well (
Even if it did work, it would break existing clients: there is no way to convert an existing ELB, so you have to create a new one and then migrate the existing clients manually (or with the help of DNS, e.g. Route 53).
This looks interesting as the enabler of what I consider the proper way forward (read on). Since the working alternative (NLB) uses a TCP health check by default, why not make the default classic ELB use a TCP health check too, as a workaround for now? I agree that verifying SSL/TLS goes beyond a plain TCP check, but it is still not as comprehensive as a proper HTTPS health check, so in my opinion this change does not weaken the defaults much. Note that non-classic ELBs have no raw SSL/TLS health checks. I think the way forward is to migrate to NLB with HTTPS health checks, as they should be the most comprehensive (and they work), i.e.:
See also notes below. Other notes:
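To make the TCP vs. SSL vs. HTTPS comparison concrete, here is a minimal Go sketch of the three probe styles run against a local TLS server. It is not how AWS implements its checks; it only assumes the semantics of the check types (TCP: a connection opens; SSL: a TLS handshake completes; HTTPS: a GET returns 200 OK). The `/healthz` path, the 503 "not ready" handler, and all function names are hypothetical, and skipping certificate verification is an assumption about the probe, not something stated in this thread.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

const probeTimeout = 2 * time.Second

// tcpProbe: healthy if a TCP connection can be opened.
func tcpProbe(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, probeTimeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// sslProbe: the TLS handshake must also complete (certificate not verified).
func sslProbe(addr string) error {
	d := &net.Dialer{Timeout: probeTimeout}
	conn, err := tls.DialWithDialer(d, "tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return err
	}
	return conn.Close()
}

// httpsProbe: anything other than 200 OK counts as unhealthy.
func httpsProbe(url string) error {
	client := &http.Client{
		Timeout:   probeTimeout,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	defer client.CloseIdleConnections()
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unhealthy: status %d", resp.StatusCode)
	}
	return nil
}

// newNotReadyServer starts a local TLS server whose hypothetical /healthz
// answers 503, standing in for a listener that is up but not ready.
func newNotReadyServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "not ready", http.StatusServiceUnavailable)
	})
	srv := httptest.NewUnstartedServer(mux)
	// The bare TCP probe closes without a handshake; silence that log noise.
	srv.Config.ErrorLog = log.New(io.Discard, "", 0)
	srv.StartTLS()
	return srv
}

func main() {
	srv := newNotReadyServer()
	defer srv.Close()
	addr := srv.Listener.Addr().String()
	fmt.Println("TCP:  ", tcpProbe(addr))
	fmt.Println("SSL:  ", sslProbe(addr))
	fmt.Println("HTTPS:", httpsProbe(srv.URL+"/healthz"))
}
```

Against this server the TCP and SSL probes both pass while the HTTPS probe fails with status 503, which is the difference in coverage being weighed when choosing a default check type.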
/kind bug
What steps did you take and what happened:
I followed the quickstart documentation with Kubernetes v1.30.5 and a custom-built AMI (there are no public AMIs for that version or for the default v1.31.0).
The ELB health check fails and the cluster is stuck after the first control-plane instance is created. The AWS console shows that 0 of 1 instances are in service.
What did you expect to happen:
The defaults should result in a working cluster.
Anything else you would like to add:
Changing the health check to TCP in the AWS console fixed the check, but a webhook here rejects the same change in the AWSCluster spec, and even after removing the webhook, the new value from AWSCluster was never applied to the ELB.
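For reference, a Classic ELB health check target is written as `PROTOCOL:PORT`, with a ping path appended for HTTP and HTTPS targets, so the console change above amounts to switching the target, presumably from `SSL:<port>` to `TCP:<port>`. The Go sketch below builds and validates such strings. The function name is hypothetical, port 6443 (the usual apiserver port) is an assumption, and rejecting a path on TCP/SSL targets is this sketch's own validation rule.

```go
package main

import (
	"fmt"
	"strings"
)

// healthCheckTarget (hypothetical helper) builds a Classic ELB target string.
// TCP and SSL targets are "PROTOCOL:PORT"; HTTP and HTTPS need a ping path.
func healthCheckTarget(protocol string, port int, path string) (string, error) {
	if port < 1 || port > 65535 {
		return "", fmt.Errorf("port %d out of range 1-65535", port)
	}
	switch p := strings.ToUpper(protocol); p {
	case "TCP", "SSL":
		if path != "" {
			return "", fmt.Errorf("%s targets take no path", p)
		}
		return fmt.Sprintf("%s:%d", p, port), nil
	case "HTTP", "HTTPS":
		if !strings.HasPrefix(path, "/") {
			return "", fmt.Errorf("%s targets need a ping path starting with /", p)
		}
		return fmt.Sprintf("%s:%d%s", p, port, path), nil
	default:
		return "", fmt.Errorf("unsupported protocol %q", protocol)
	}
}

func main() {
	// 6443 is an assumed apiserver port; /readyz is one apiserver health endpoint.
	cases := []struct{ proto, path string }{{"SSL", ""}, {"TCP", ""}, {"HTTPS", "/readyz"}}
	for _, c := range cases {
		target, err := healthCheckTarget(c.proto, 6443, c.path)
		fmt.Println(target, err)
	}
}
```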
Setting this on the apiserver and other control-plane components allowed the ELB health check to pass.
There is some discussion about this in the Kubernetes Slack: https://kubernetes.slack.com/archives/C3QUFP0QM/p1726622974749509
Environment:
kubectl version: v1.30.5
/etc/os-release: