VM spinup hang (potentially with nested virt) #161

Open
dougbtv opened this issue Jan 16, 2018 · 6 comments
dougbtv commented Jan 16, 2018

From a report received in email:

I have filled in the inventory…etc, but when I run the virthost-setup.yml file, it seems to get stuck at this point:

TASK [vm-spinup : Run spinup for each host that doesn't exist] *********************************

Waited a good few hours, and I get no output, or anything. Checking the libvirt log file on the host, there’s nothing obviously wrong:

2018-01-16 12:10:34.164+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>, 2018-01-04-19:31:34, c1bm.rdu2.centos.org), qemu version: 1.5.3 (qemu-kvm-1.5.3-141.el7_4.6), hostname: kube-cluster
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name kube-master -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu Westmere -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid aed43740-d153-44a4-b429-f5a64bbd03ea -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-kube-master/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/images/kube-master/kube-master.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/images/kube-master/kube-master-cidata.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1e:0d:b3,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-4-kube-master/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -global qxl-vga.vgamem_mb=16 -global qxl-vga.max_outputs=1 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)

It seems to deploy the first VM, but gets stuck:

[root@kube-cluster qemu]# virsh list
Id    Name                           State
----------------------------------------------------
4     kube-master                    running

Running the spinup.sh script manually seems to try the same thing, but also doesn't progress.

[root@kube-cluster ~]# ./spinup.sh kube-master 2048 4
Tue, 16 Jan 2018 11:51:18 +0000 Destroying the kube-master domain (if it exists)...
Tue, 16 Jan 2018 11:51:18 +0000 Copying template image...
Tue, 16 Jan 2018 11:51:20 +0000 Generating ISO for cloud-init...
Tue, 16 Jan 2018 11:51:20 +0000 Installing the domain and adjusting the configuration...
[INFO] Installing with the following parameters:
virt-install --import --name kube-master --ram 2048 --vcpus 4 --disk     kube-master.qcow2,format=qcow2,bus=virtio --disk kube-master-cidata.iso,device=cdrom --network     bridge=br0,model=virtio  --os-type=linux --os-variant=rhel6 --noautoconsole
 
Starting install...
Domain creation completed.
 
^C
[root@kube-cluster ~]#

I then tried running the command manually and got this error:

[root@kube-cluster ~]# virt-install --import --name kube-master --ram 2048 --vcpus 4 --disk     kube-master.qcow2,format=qcow2,bus=virtio --disk kube-master-cidata.iso,device=cdrom --network     bridge=br0,model=virtio  --os-type=linux --os-variant=rhel6 --noautoconsole
ERROR    Error: --disk kube-master.qcow2,format=qcow2,bus=virtio: Size must be specified for non existent volume 'kube-master.qcow2'
[root@kube-cluster ~]#
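For what it's worth, the qemu log further up shows the disk images living under /home/images/kube-master/, so one guess (unconfirmed) is that the relative --disk paths only resolve when the command is run from that directory; run from ~, virt-install can't find kube-master.qcow2 and assumes it should create a new volume, hence the size error. A hedged way to check, with the paths taken from the log above:

# Sketch only: the existing kube-master domain would need to be destroyed/undefined first.
cd /home/images/kube-master
virt-install --import --name kube-master --ram 2048 --vcpus 4 \
    --disk /home/images/kube-master/kube-master.qcow2,format=qcow2,bus=virtio \
    --disk /home/images/kube-master/kube-master-cidata.iso,device=cdrom \
    --network bridge=br0,model=virtio \
    --os-type=linux --os-variant=rhel6 --noautoconsole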

dougbtv commented Jan 16, 2018

One possible work-around is to skip the virtualization host setup entirely, if you can spin up a few VMs by hand. That is: create some CentOS 7 virtual machines manually, skip the virthost-setup.yml playbook, and then run the kube-install.yml playbook with a manually created inventory that matches the virtual machines you spun up. In the case of the Multus with CRD article, I run just two guests, kube-master and kube-node-1. Here's an example inventory (and example playbook run command) in a github gist.
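The gist itself isn't reproduced here, but here's a rough, hypothetical sketch of the idea. The hostnames, addresses, group names, and SSH user below are placeholders, not values from this issue; only kube-install.yml, kube-master, and kube-node-1 come from the description above:

# Hand-write an inventory pointing at the manually created guests (placeholder values):
cat > manual.inventory <<'EOF'
kube-master ansible_host=192.168.122.11
kube-node-1 ansible_host=192.168.122.12

[master]
kube-master

[nodes]
kube-node-1

[all:vars]
ansible_user=centos
EOF

# Skip virthost-setup.yml entirely and point the install playbook at that inventory:
ansible-playbook -i manual.inventory kube-install.yml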

I actually hadn't been brave enough to try it with nested virtualization (had only tested on bare metal), but... No time better than the present!

I have two possible ideas about the cause...

  • The script isn't finding the IP address of the virtual machine and is hanging there. This happens around lines 115-132 of the spinup.sh script, which seems like a likely place for it to hang for a long time, because it goes into a loop.
    • It's untested (with the Multus setup), but there's a variable you can change in the example Multus vars that I think could be of interest: bridge_networking=true. You might try changing it to false (see the sketch after this list). I added another file to the gist to show some extra variables that could change.
  • The output saying the volume is non-existent seems like something failed with the disk (small disk size on the virtual virtualization host, or maybe the operating system image didn't download properly); you could try a rebuild with more disk space.
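For the first idea, here's a hedged example of what flipping that variable could look like. bridge_networking is the variable named above, but the inventory filename and the choice to pass it as an extra var are just illustrative:

# Re-run the virthost setup with bridge networking turned off (illustrative invocation):
ansible-playbook -i virthost.inventory virthost-setup.yml -e "bridge_networking=false"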


dougbtv commented Jan 16, 2018

Was able to replicate this by starting up a (rather large: 100 GB disk, 24 GB RAM) VM on an existing virthost and running the playbooks against it; I experienced the same hang.

In the process of locating the issue. The first thing I tried was changing the subnet for virbr0 (didn't work); dumped output here: http://pasteall.org/770750

It's apparently looping continually on this line in spinup.sh:

IP=$(grep -B1 $MAC /var/lib/libvirt/dnsmasq/$BRIDGE.status | head \
                 -n 1 | awk '{print $2}' | sed -e s/\"//g -e s/,//)
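To illustrate why that can spin forever, here's a paraphrase of what that section of spinup.sh roughly amounts to -- this is not the literal content of lines 115-132, and the retry cap is my own addition:

# Paraphrase of the IP-discovery loop: if the guest never DHCPs, the
# $BRIDGE.status file never gains an entry for $MAC and the loop never ends.
TRIES=0
while [ -z "$IP" ] && [ "$TRIES" -lt 60 ]; do
    IP=$(grep -B1 "$MAC" "/var/lib/libvirt/dnsmasq/$BRIDGE.status" | head \
             -n 1 | awk '{print $2}' | sed -e 's/"//g' -e 's/,//')
    sleep 5
    TRIES=$((TRIES + 1))
done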


dougbtv commented Jan 16, 2018

Apparently the box has not DHCP'd at all?

[root@nestedvirt ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     test123                        running

[root@nestedvirt ~]# virsh domiflist test123
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     virbr0     virtio      52:54:00:d9:26:51

[root@nestedvirt ~]# nmap -sP 192.168.1.123/24 | grep 52:54:00:d9:26:51 -B 3

(note: nmap returned no output; that's intentional in these notes)
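A couple of other checks that could confirm (or rule out) the no-DHCP theory; the assumption here is that virbr0 belongs to the libvirt network named "default":

# Ask libvirt's dnsmasq for its current leases on the default network:
virsh net-dhcp-leases default

# And look for the guest's MAC directly in the status file that spinup.sh greps:
grep -B2 52:54:00:d9:26:51 /var/lib/libvirt/dnsmasq/virbr0.status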


dougbtv commented Jan 30, 2018

Still digging at this one; currently trying to figure out why the machines aren't necessarily starting properly. The script hangs in the same spot. I tried using a GUI to get virt-manager going because I wanted to sanity check my work with virsh console.... Turns out I'm still having the same issue: I can't console into the boxen started with ./spinup.sh, even from virt-manager. Using virt-manager to start up a machine did work, though, and I could console into it.
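One hedged diagnostic idea (not a confirmed cause): compare the console/serial devices of a spinup.sh-created guest against a virt-manager-created one. test123 is the guest from the earlier comment; the second domain name is a placeholder:

# Check whether each domain actually has serial/console devices defined:
virsh dumpxml test123 | grep -iA3 -e '<serial' -e '<console'
virsh dumpxml some-virt-manager-guest | grep -iA3 -e '<serial' -e '<console'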


dougbtv commented Jan 31, 2018

Documented a procedure for trying out a fix in ansible-role-vm-spinup in this gist: https://gist.github.com/dougbtv/6d9fd054c3db58124f3a302acff403fc


dougbtv commented Feb 1, 2018

Reporter also noted that:

I did get it working by going back to some of your original config (setting the bridge to virbr0 and disabling bridge networking)
