If you would like to use kni-upi-lab with virtual masters and workers instead of baremetal, follow these steps.
When a virtualized install is requested, we will create VMs for your master and worker nodes based on the replica counts set in your `cluster/install-config.yaml`. We also create vBMC controllers for each VM. After creating those VMs, we inject their data (IPMI details, interface names, MAC addresses, etc.) back into `cluster/install-config.yaml` as well as into `cluster/site-config.yaml`.
The VMs are created with their reboot behavior set to destroy the VM rather than actually reboot it. This is due to peculiar behavior of vBMC and its interaction with libvirt: when a VM is set to PXE boot via IPMI operating through vBMC, the VM definition loses its disk boot option. The VM must then be destroyed, set to disk boot via IPMI/vBMC, and then started again to be able to boot from the disk. Leaving the VM reboot behavior at the default (which is to just restart the guest OS) results in the VM looping in PXE boot mode, because it re-uses its modified VM definition that lacks the disk boot option. Since we want to PXE boot just once, our approach is this (a rough sketch of the sequence follows the list below):
- Run a helper script that `virsh start`s any VM belonging to the cluster that is currently powered off.
- Set the VMs to be destroyed on reboot.
- Use vBMC to PXE boot the VMs (and start them), sleep a few seconds, and then set the VMs to disk boot.
- Since the VMs are already booting, the disk-boot command does not affect the current boot. It just modifies the VM definitions.
- VMs finish PXE booting and attempt to reboot, but are powered off instead due to our settings.
- Our helper script sees that the VMs are powered off and starts them.
- Since our disk-boot command modified the VM definitions to boot from disk, they now boot from disk as they're started.
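
The snippet below is a minimal sketch of that sequence, assuming the cluster VMs share a name prefix and that each VM's vBMC endpoint is reachable with `ipmitool`. The prefix, address, port, and credentials are placeholders, and this is not the repo's actual helper script; it also shows a single vBMC endpoint for brevity, while in practice each VM has its own.

```bash
#!/bin/bash
# Minimal sketch of the PXE-once-then-disk flow described above.
# The VM name prefix, vBMC address/port, and IPMI credentials are placeholders.
VM_PREFIX="kni1"
IPMI="ipmitool -I lanplus -H 127.0.0.1 -p 6230 -U admin -P password"

cluster_vms() { virsh list --all --name | grep "^${VM_PREFIX}"; }

# 1. Each domain carries <on_reboot>destroy</on_reboot> in its XML (settable
#    with `virsh edit <vm>`), so a guest-initiated reboot powers the VM off
#    instead of restarting it. Verify:
for vm in $(cluster_vms); do
    virsh dumpxml "$vm" | grep on_reboot
done

# 2. PXE boot once via vBMC, then flip the boot device back to disk while the
#    VM is already booting; the later command only modifies the definition.
#    (Each VM really has its own vBMC port; one endpoint shown for brevity.)
$IPMI chassis bootdev pxe
$IPMI chassis power on
sleep 5
$IPMI chassis bootdev disk

# 3. Watcher loop: any cluster VM that powered itself off after PXE booting
#    is started again and, thanks to step 2, now boots from disk.
while true; do
    for vm in $(cluster_vms); do
        [ "$(virsh domstate "$vm")" = "shut off" ] && virsh start "$vm"
    done
    sleep 10
done
```

The key point is that the `chassis bootdev disk` call only rewrites the VM definition while the VM is already PXE booting, so the next start (performed by the watcher loop) comes up from disk.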
- Ensure that `libvirt` is installed on your provisioning host and that the `default` storage pool is created and started (a quick verification sketch appears after this list).
- Ensure that `virt-install` is installed on your provisioning host.
- Ensure that `virtualbmc` is installed on your provisioning host.
- Edit `common.sh` as described in the README, but set the `VIRTUALIZED_INSTALL` variable to `true`.
- Edit your `cluster/install-config.yaml` as described in the README, but do not worry about `platform.hosts`, as it will be automatically populated for you.
- Edit your `cluster/site-config.yaml` as described in the README, but do not worry about `provisioningInfrastructure.hosts`, `provisioningInfrastructure.virtualMasters`, or `provisioningInfrastructure.virtualWorkers`, as these will be automatically populated for you.
- Now execute:

  ```
  ./prep_bm_host.sh
  make all
  make con-start
  ./scripts/manage.sh deploy cluster
  ```
- Monitor `oc get nodes` until you see your masters in the `Ready` state. If this step seems to be taking a long time, please see troubleshooting below.
- Once the masters are in the `Ready` state, execute `./scripts/manage.sh deploy workers`.
- Monitor `oc get nodes` until you see your workers in the `Ready` state. If this step seems to be taking a long time, please see troubleshooting below.
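
As referenced above, the following is a rough pre-flight and monitoring sketch using generic `virsh`, `vbmc`, and `oc` commands; it is not a script shipped with this repo, and package or CLI names may differ on your distro.

```bash
# Rough pre-flight checks for the prerequisites above, plus a simple way to
# watch node status; generic libvirt/virtualbmc/oc usage, not a repo script.
command -v virsh        || echo "libvirt client (virsh) not found"
command -v virt-install || echo "virt-install not found"
command -v vbmc         || echo "virtualbmc (vbmc) not found"

# The 'default' storage pool must exist and be running:
virsh pool-info default
virsh pool-start default 2>/dev/null || true
virsh pool-autostart default

# After `deploy cluster` or `deploy workers`, watch until the relevant nodes
# report Ready:
watch -n 30 oc get nodes
```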
Troubleshooting:

- Sometimes the libvirt VMs get stuck when the guest OS attempts to reboot during the installation process. If you are interactively deploying, simply executing `virsh destroy <VM name>` on a stuck node should fix the problem. The easiest way to detect whether this problem is happening is to connect to the VM with `vncviewer` and check if the console is just a persistent blinking cursor. We have found that this issue usually resolves itself, but it takes around 30 minutes to do so.
- Sometimes the auto-approval process fails when attempting to add worker nodes to the cluster. We are currently investigating this issue. As a workaround, you can simply execute `./scripts/auto_approver.sh` to manually approve the workers (once their CSRs appear in `oc get csr`).
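
If you prefer to approve the pending worker CSRs by hand rather than run `./scripts/auto_approver.sh`, a generic `oc` one-liner such as the following (standard OpenShift CSR handling, not code from this repo) does the same job:

```bash
# Approve every CSR that has no status yet (i.e. is still Pending).
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
    | xargs --no-run-if-empty oc adm certificate approve
```

Each worker typically submits a second CSR shortly after the first is approved, so you may need to run this a couple of times.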