feat: Update cloud-init customization #11

Merged on Oct 18, 2023 (2 commits)

Conversation

Collaborator

@dlipovetsky dlipovetsky commented Oct 12, 2023

Description

Changes relative to upstream:

  • Add explanatory comments
  • Do not use stderr output of preKubeadmCommands to indicate an error with bootstrapping

Changes relative to our fork:

  • Do not enable IPv6
  • Do not remove cloud-init logs and seed
  • Do not disable VMware customization
  • Do not disable network configuration
  • Do not truncate cloud-init-output.log
  • Do not report status of HTTP proxy configuration
  • Do not configure cloud-init to remove SSH keys on first boot
  • Remove commands that are already executed as a result of being defined in preKubeadmCommands

@dlipovetsky dlipovetsky force-pushed the dlipovetsky/cloud-init branch 2 times, most recently from e024eb1 to 80126df on October 12, 2023 21:18
{{ if .ControlPlane }}
- '[ ! -f /run/kubeadm/konvoy-set-kube-proxy-configuration.sh ] && sudo reboot'
Collaborator

Were you able to boot the VM and create a cluster after removing this file? I remember that the preKubeadmCommands were failing if I removed them. I will test it out later to confirm.

Collaborator Author

Good question!

You're right that any preKubeadmCommand that requires an ordinary file in /run will fail after a reboot, because ordinary files in /run do not persist across reboots. We recently (in https://github.com/mesosphere/konvoy2/pull/2337) moved all patch scripts from /run to /etc for this reason.

Your question made me wonder about the two reboot calls left in this template:

{{ if .ControlPlane }}
- '[ ! -f /root/control_plane.sh ] && sudo reboot'
- '[ ! -f /run/kubeadm/kubeadm.yaml ] && sudo reboot'
- bash /root/control_plane.sh
{{ else }}
- '[ ! -f /root/node.sh ] && sudo reboot'
- '[ ! -f /run/kubeadm/kubeadm-join-config.yaml ] && sudo reboot'
- bash /root/node.sh
{{ end }}

It seems harmless to reboot if the kubeadm config (/run/kubeadm/kubeadm.yaml or /run/kubeadm/kubeadm-join-config.yaml) is not there (yet?).

But if we reboot because the bootstrap script (/root/control_plane.sh or /root/node.sh) is missing, and the kubeadm config happens to already be present, we will lose the kubeadm config after the reboot, leading to further reboots, without end.

At this time (66ecf82), I can successfully reboot either a control plane or a worker machine.

I think it may be better to remove the remaining reboot calls. I will experiment.

Collaborator Author

I've added a comment that explains why the reboot call is necessary. I've also moved these checks out to their own script, and use a separate log file to keep track.

This is what the log looks like:

# cat /var/log/capvcd/replace-userdata-files.log
2023-10-17 22:07:37 Checking for kubeadm configuration file
2023-10-17 22:07:37 kubeadm configuration file not found, cleaning cloud-init cache and rebooting
2023-10-17 22:08:12 Checking for kubeadm configuration file
2023-10-17 22:08:12 kubeadm configuration file found, exiting
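
For readers who want to see the shape of such a check, here is a minimal sketch that would produce log lines like the ones above. The script itself is not shown in this conversation, so everything except the log path, the kubeadm config path, and the log messages is an assumption: the function names, the overridable variables, and the use of plain `cloud-init clean` (without `--logs`) to clear the cache are all illustrative.

```shell
#!/usr/bin/env bash
# Sketch only. The real script from this PR is not shown in the conversation;
# log(), check_userdata(), CLEAN_CMD and REBOOT_CMD are hypothetical names.

LOG_FILE="${LOG_FILE:-/var/log/capvcd/replace-userdata-files.log}"
KUBEADM_CONFIG="${KUBEADM_CONFIG:-/run/kubeadm/kubeadm.yaml}"
# Assumption: plain "cloud-init clean" is how the cache is cleared.
# Both commands are overridable so the logic can be tried without rebooting.
CLEAN_CMD="${CLEAN_CMD:-cloud-init clean}"
REBOOT_CMD="${REBOOT_CMD:-reboot}"

log() {
  # Append a timestamped line in the same format as the log excerpt above.
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

check_userdata() {
  mkdir -p "$(dirname "$LOG_FILE")"
  log "Checking for kubeadm configuration file"
  if [ -f "$KUBEADM_CONFIG" ]; then
    log "kubeadm configuration file found, exiting"
    return 0
  fi
  log "kubeadm configuration file not found, cleaning cloud-init cache and rebooting"
  # With the cache cleaned, cloud-init treats the next boot as a first boot.
  # Word splitting of the two variables is intentional (command plus arguments).
  $CLEAN_CMD
  $REBOOT_CMD
}
```

In this sketch, `check_userdata` would be called before the bootstrap script runs. Keeping the log outside cloud-init's own logs is what lets the history survive across the reboot.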

Collaborator

@supershal supershal Oct 17, 2023

Pretty clever. Just reiterating the logic: until the user-data is available, the VM will reboot and cloud-init will run as if it's the first boot. Once the user-data is available, bootstrap.sh will run kubeadm init/join.

Collaborator Author

> until the user-data is available, the VM will reboot and cloud-init will run as if it's the first boot. Once the user-data is available, bootstrap.sh will run kubeadm init/join.

Correct. The upstream cloud-init configuration also did this, but it used inline commands instead of a script, and it did not explain the reasoning behind any of it.

* Use shell script to clean cloud-init cache and reboot.
* Fix error handling of bootstrap script. Do not interpret stderr output
  as an indicator of failure. Do not rely on trap and errexit, because
  it does not work for command lists.
* Include last lines of output for error context.
* Ensure we have an IPv4 address for localhost.
* Remove unnecessary cloud-init configuration to preserve SSH host keys.
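
The error-handling points in this commit message can be illustrated with a short sketch. This is not the code from the PR: `run_bootstrap` and `errexit_demo` are hypothetical names, and the default of 20 context lines is an arbitrary choice. The sketch judges success by exit status alone, treats stderr as ordinary output, and prints the tail of the log on failure.

```shell
#!/usr/bin/env bash
# Sketch only; run_bootstrap and errexit_demo are hypothetical names.

# run_bootstrap SCRIPT LOG [CONTEXT_LINES]
# Runs SCRIPT with stdout and stderr both sent to LOG, and decides success
# from the exit status alone. On failure, prints the last lines of LOG.
run_bootstrap() {
  local script=$1 log=$2 context_lines=${3:-20} status=0
  bash "$script" > "$log" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    echo "bootstrap failed with exit status $status; last $context_lines lines of output:"
    tail -n "$context_lines" "$log"
  fi
  return "$status"
}

# Why errexit alone is not enough: a failing command in an && list (other
# than the last one) does not trigger it, so this still prints "reached".
errexit_demo() {
  bash -c 'set -e; false && echo skipped; echo reached'
}
```

A script that only writes warnings to stderr and exits 0 counts as a success here, while any non-zero exit status is reported together with the end of its output.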
Collaborator

@supershal supershal left a comment

Thank you for testing this out.

@dkoshkin dkoshkin left a comment

Thank you for all the comments and really digging deep into this! Great changes and so much simpler to understand.

@dlipovetsky dlipovetsky merged commit ac3388b into d2iq/release-1.1.0-1 Oct 18, 2023
1 check passed
dlipovetsky added a commit that referenced this pull request Oct 18, 2023
* feat: Update cloud-init customization

Changes relative to upstream:
* Use shell script to clean cloud-init cache and reboot.
* Fix error handling of bootstrap script. Do not interpret stderr output
  as an indicator of failure. Do not rely on trap and errexit, because
  it does not work for command lists.
* Include last lines of output for error context.
* Ensure we have an IPv4 address for localhost.
* Remove unnecessary cloud-init configuration to preserve SSH host keys.

Changes relative to our fork:
* Do not remove cloud-init logs and seed on reboot
* Do not truncate cloud-init-output.log on reboot
* Do not report status of HTTP proxy configuration
* Remove redundant commands (already executed as a result of being defined in `preKubeadmCommands`)
* Do not disable VMware customization
* Do not disable network configuration

Signed-off-by: Daniel Lipovetsky <[email protected]>
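
The commit message does not say how "Ensure we have an IPv4 address for localhost" is implemented. One plausible reading, shown here purely as an assumption, is an idempotent check of the hosts file. The function name `ensure_ipv4_localhost` and the choice of `/etc/hosts` as the default target are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only; the PR's actual approach is not shown in this conversation.

# ensure_ipv4_localhost [HOSTS_FILE]
# Appends "127.0.0.1 localhost" unless an uncommented 127.0.0.1 entry
# already names localhost. Safe to run more than once.
ensure_ipv4_localhost() {
  local hosts_file=${1:-/etc/hosts}
  if ! grep -Eq '^127\.0\.0\.1[[:space:]]+(.*[[:space:]])?localhost([[:space:]]|$)' "$hosts_file"; then
    printf '127.0.0.1 localhost\n' >> "$hosts_file"
  fi
}
```

Because the function only appends when the entry is missing, running it on every boot leaves the file unchanged after the first run.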
3 participants