Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing SKE NodePool from eu01-1 -> eu01-m fails #384

Open
roberth1988 opened this issue May 31, 2024 · 6 comments
Open

Changing SKE NodePool from eu01-1 -> eu01-m fails #384

roberth1988 opened this issue May 31, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@roberth1988
Copy link

How to reproduce:

Create SKE-Cluster, change NodePool availability_zone eu01-1 and change it to eu01-m. You will get an error: changing the order of AvailabilityZones it not allowed

Expected behaviour:

  • Node Pool gets replaced
@DiogoFerrao DiogoFerrao added the bug Something isn't working label May 31, 2024
@DiogoFerrao
Copy link
Contributor

Hey Robert,

Thank you for reporting this issue and in general for being active in doing so!

I'll be investigating this problem and will provide an update soon.

@DiogoFerrao
Copy link
Contributor

Hey @roberth1988, I have looked into it and it seems to be a case for adding a RequiresReplace modifier to the attribute.

Unless I'm missing something, I don't think we can replace the individual NodePool with the SKE API, so we would need to replace the entire cluster.

I will create a PR for this today, but if you have more information on the Node Pool replacement please reach out!

@roberth1988
Copy link
Author

roberth1988 commented Jun 3, 2024

Hi @DiogoFerrao I don't think replacing the whole cluster is the way. Please check out for example Azure Terraform / AKS.

They have a temporary_name_for_rotation... Ref: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster

Changing certain properties of the default_node_pool is done by cycling the system node pool of the cluster. 
When cycling the system node pool, it doesn't perform cordon and drain, and it will disrupt rescheduling pods currently running on the previous system node pool.temporary_name_for_rotation must be specified when changing any of the following properties: enable_host_encryption, enable_node_public_ip, fips_enabled, kubelet_config, linux_os_config, max_pods, only_critical_addons_enabled, os_disk_size_gb, os_disk_type, os_sku, pod_subnet_id, snapshot_id, ultra_ssd_enabled, vnet_subnet_id, vm_size, zones.

So they rotate the Node-Pool. That's something you could do too ... or by default add a suffix on the node pool name(s)
So you can replace them by creating first a new one ... then deleting the old one.

Replacing the cluster in such a case is a too huge interruption.

@DiogoFerrao
Copy link
Contributor

You have a good point there and thank you for the suggestion. I have looked at the implementation in the Azure provider, and it seems that it will work with SKE as well.

In general, it has some issues:

  • It breaks volumes that may be attached to the nodes (since they are region specific)
  • It may break node pool name specific configuration, such as kubernetes scheduling contraints or others

I tend to agree that having this is better than throwing an error, as long as its properly documented and warnings are shown. We will investigate this further and work on a solution for the problem.

In the meantime, I think we can temporarily replace the entire cluster with the plan modifier, which at least allows users to perform this operation. Once we implement the Node Pool rotation we would remove this.

@DiogoFerrao DiogoFerrao added enhancement New feature or request and removed bug Something isn't working labels Jun 3, 2024
@roberth1988
Copy link
Author

Hi @DiogoFerrao you break even more when replacing the whole cluster when we think about volumes.
When a SKE instance is replaced, it looses also completely the knowledge / information about PVC, so you need to create them manually. Further you loose all ConfigMaps, Secrets etc. etc.

So replacing on a NodePool change the whole cluster is even more harmful in the way of how Kubernetes should work.
I heavily recommend moving on to implement the rotation logic, so that's more how cloud works :)

@DiogoFerrao
Copy link
Contributor

I completely agree @roberth1988, I was suggesting implementing a stop gap solution while we plan the implementation of your recomendation in our backlog.

However, you raised important concerns about this temporary approach, so we will leave it with the error for the time being and report back to this issue when we move to implement a better solution, following your suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants