Restore from backup fails to bring up healthy clusters for longer cluster and machine pool names #640

Open
ev-hines opened this issue Jan 6, 2025 · 0 comments


ev-hines commented Jan 6, 2025

Rancher Server Setup

  • Rancher version: 2.9.2
  • Installation option (Docker install/Helm Chart): Helm Chart
  • Kubernetes Version and Engine: 1.30.5+rke2

Describe the bug
Because of name length limits on Kubernetes resources, the following resourceNameRegexp rule does not match all machine-state secrets. Restores of the affected clusters then break because their secrets come back empty. Any other regexp rule that selects resources by a specific name suffix is probably affected by the same problem.

    resourceNameRegexp: "machine-plan$|rke-state$|machine-state$|machine-driver-secret$|machine-provision$|^harvesterconfig|^registryconfig-auth"
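To illustrate the mismatch, the sketch below compiles the rule above with Go's standard `regexp` package and tests it against the three secret names from the console output in this issue. It also tests one hypothetical short name (`demo-pool-abcde-fghij-machine-state`, invented for illustration) whose suffix is intact. The helper `matchesRule` is illustrative, not part of any Rancher component.

```go
package main

import (
	"fmt"
	"regexp"
)

// resourceNameRegexp is the rule quoted in this issue, compiled with Go's RE2 engine.
var resourceNameRegexp = regexp.MustCompile(
	`machine-plan$|rke-state$|machine-state$|machine-driver-secret$|machine-provision$|^harvesterconfig|^registryconfig-auth`)

// matchesRule (hypothetical helper) reports whether a name would be selected by the rule.
func matchesRule(name string) bool {
	return resourceNameRegexp.MatchString(name)
}

func main() {
	names := []string{
		// Hypothetical short name: the "-machine-state" suffix is intact.
		"demo-pool-abcde-fghij-machine-state",
		// Truncated names copied from the console output in this issue.
		"osc-nonprd-test-test-control-plane-az1-jnt7g-ckwkv-machin-1be8a",
		"osc-nonprd-test-test-control-plane-az2-2sl6m-nv9h2-machin-d565f",
		"osc-nonprd-test-test-control-plane-az3-hmztw-dk65z-machin-c8097",
	}
	for _, n := range names {
		// Only the first (untruncated) name matches; the others end in "machin-<hash>".
		fmt.Printf("%-65s matched=%v\n", n, matchesRule(n))
	}
}
```

Only the short name is selected. None of the truncated names ends with any of the listed suffixes, so none of them is included in the backup.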

To Reproduce
Steps to reproduce the behavior:

  1. Create a cluster with a machine pool such that the cluster and machine pool names have a combined length of at least 39 characters (a lower total of around 30-35 will likely also trigger the bug, but 39 is what I tested)
  2. Create a backup
  3. Tear down the rancher cluster and spin up a new one
  4. Restore the cluster from the backup
  5. The downstream cluster will never become healthy after the restore and instead shows "waiting for machine <project>/<machine_name> driver config to be saved" indefinitely.
  6. The rke.cattle.io/machine-state secret for that node will show 0 keys.
  7. Check the backup tarball and you will see that it contains no machine-state secrets.

Expected behavior
The machine-state secrets are properly backed up.

Screenshots
Screenshot attached


Console output of secrets with 0 keys

    fleet-default                                             osc-nonprd-test-test-control-plane-az1-jnt7g-ckwkv-machin-1be8a         rke.cattle.io/machine-state                   0      13m
    fleet-default                                             osc-nonprd-test-test-control-plane-az2-2sl6m-nv9h2-machin-d565f         rke.cattle.io/machine-state                   0      13m
    fleet-default                                             osc-nonprd-test-test-control-plane-az3-hmztw-dk65z-machin-c8097         rke.cattle.io/machine-state                   0      13m

Additional context
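As shown above, this seems to happen because the backup selects secrets to back up by name suffix rather than by label or secret type. When a generated name gets too long, it is truncated while staying within the allowed length limit, so the suffix is cut off (for example, "machine-state" becomes "machin-1be8a" in the output above) and the regexp no longer matches.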
