VM spinup hang (potentially with nested virt) #161

Open
dougbtv opened this issue Jan 16, 2018 · 6 comments
dougbtv commented Jan 16, 2018

From a report received in email:

I have filled in the inventory…etc, but when I run the virthost-setup.yml file, it seems to get stuck at this point:

TASK [vm-spinup : Run spinup for each host that doesn't exist] *********************************

Waited a good few hours, and I get no output, or anything. Checking the libvirt log file on the host, there’s nothing obviously wrong:

2018-01-16 12:10:34.164+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>, 2018-01-04-19:31:34, c1bm.rdu2.centos.org), qemu version: 1.5.3 (qemu-kvm-1.5.3-141.el7_4.6), hostname: kube-cluster
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name kube-master -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu Westmere -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid aed43740-d153-44a4-b429-f5a64bbd03ea -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-4-kube-master/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/images/kube-master/kube-master.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/images/kube-master/kube-master-cidata.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1e:0d:b3,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-4-kube-master/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -global qxl-vga.vgamem_mb=16 -global qxl-vga.max_outputs=1 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)

It seems to deploy the first VM, but gets stuck:

[root@kube-cluster qemu]# virsh list
Id    Name                           State
----------------------------------------------------
4     kube-master                    running

Running the spinup.sh script manually seems to try the same thing, but also doesn't progress.

[root@kube-cluster ~]# ./spinup.sh kube-master 2048 4
Tue, 16 Jan 2018 11:51:18 +0000 Destroying the kube-master domain (if it exists)...
Tue, 16 Jan 2018 11:51:18 +0000 Copying template image...
Tue, 16 Jan 2018 11:51:20 +0000 Generating ISO for cloud-init...
Tue, 16 Jan 2018 11:51:20 +0000 Installing the domain and adjusting the configuration...
[INFO] Installing with the following parameters:
virt-install --import --name kube-master --ram 2048 --vcpus 4 --disk     kube-master.qcow2,format=qcow2,bus=virtio --disk kube-master-cidata.iso,device=cdrom --network     bridge=br0,model=virtio  --os-type=linux --os-variant=rhel6 --noautoconsole
 
Starting install...
Domain creation completed.
 
^C
[root@kube-cluster ~]#

I then tried running the command manually and got this error:

[root@kube-cluster ~]# virt-install --import --name kube-master --ram 2048 --vcpus 4 --disk     kube-master.qcow2,format=qcow2,bus=virtio --disk kube-master-cidata.iso,device=cdrom --network     bridge=br0,model=virtio  --os-type=linux --os-variant=rhel6 --noautoconsole
ERROR    Error: --disk kube-master.qcow2,format=qcow2,bus=virtio: Size must be specified for non existent volume 'kube-master.qcow2'
[root@kube-cluster ~]#
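For what it's worth, the qemu log further up shows the disk images living under /home/images/kube-master/, so one guess (unconfirmed) is that the relative --disk paths only resolve when the command is run from that directory; run from ~, virt-install can't find kube-master.qcow2 and assumes it should create a new volume, hence the size error. A hedged way to check, with the paths taken from the log above:

# Sketch only: the existing kube-master domain would need to be destroyed/undefined first.
cd /home/images/kube-master
virt-install --import --name kube-master --ram 2048 --vcpus 4 \
    --disk /home/images/kube-master/kube-master.qcow2,format=qcow2,bus=virtio \
    --disk /home/images/kube-master/kube-master-cidata.iso,device=cdrom \
    --network bridge=br0,model=virtio \
    --os-type=linux --os-variant=rhel6 --noautoconsole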

dougbtv commented Jan 16, 2018

One possible work-around is to skip the virtualization host setup entirely, if you can spin up a few VMs by hand. That is: create some CentOS 7 virtual machines manually, skip the virthost-setup.yml playbook, and then run the kube-install.yml playbook with a manually created inventory that matches the virtual machines you spun up. In the case of the Multus with CRD article, I run just two guests, kube-master and kube-node-1. Here's an example inventory (and example playbook run command) in a github gist.
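The gist itself isn't reproduced here, but here's a rough, hypothetical sketch of the idea. The hostnames, addresses, group names, and SSH user below are placeholders, not values from this issue; only kube-install.yml, kube-master, and kube-node-1 come from the description above:

# Hand-write an inventory pointing at the manually created guests (placeholder values):
cat > manual.inventory <<'EOF'
kube-master ansible_host=192.168.122.11
kube-node-1 ansible_host=192.168.122.12

[master]
kube-master

[nodes]
kube-node-1

[all:vars]
ansible_user=centos
EOF

# Skip virthost-setup.yml entirely and point the install playbook at that inventory:
ansible-playbook -i manual.inventory kube-install.yml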

I actually hadn't been brave enough to try it with nested virtualization (had only tested on bare metal), but... No time better than the present!

I have two possible ideas about the cause...

  • The script isn't finding the IP address of the virtual machine and is hanging there. This happens around lines 115-132 of the spinup.sh script, which seems like a likely place for it to hang for a long time, because it goes into a loop.
    • It's untested (with the Multus setup), but there's a variable you can change in the example Multus vars that I think could be of interest: bridge_networking=true. You might try changing it to false (see the sketch after this list). I added another file to the gist to show some extra variables that could change.
  • The output saying the volume is non-existent seems like something failed with the disk (small disk size on the virtual virtualization host, or maybe the operating system image didn't download properly); you could try a rebuild with more disk space.
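For the first idea, here's a hedged example of what flipping that variable could look like. bridge_networking is the variable named above, but the inventory filename and the choice to pass it as an extra var are just illustrative:

# Re-run the virthost setup with bridge networking turned off (illustrative invocation):
ansible-playbook -i virthost.inventory virthost-setup.yml -e "bridge_networking=false"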


dougbtv commented Jan 16, 2018

Was able to replicate this by starting up a (rather large: 100 GB disk, 24 GB RAM) VM on an existing virthost and running the playbooks against it; I experienced the same hang.

In the process of locating the issue. The first thing I tried was changing the subnet for virbr0 (didn't work); dumped output here: http://pasteall.org/770750

It's apparently looping continually on this line in spinup.sh:

IP=$(grep -B1 $MAC /var/lib/libvirt/dnsmasq/$BRIDGE.status | head \
                 -n 1 | awk '{print $2}' | sed -e s/\"//g -e s/,//)
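To illustrate why that can spin forever, here's a paraphrase of what that section of spinup.sh roughly amounts to -- this is not the literal content of lines 115-132, and the retry cap is my own addition:

# Paraphrase of the IP-discovery loop: if the guest never DHCPs, the
# $BRIDGE.status file never gains an entry for $MAC and the loop never ends.
TRIES=0
while [ -z "$IP" ] && [ "$TRIES" -lt 60 ]; do
    IP=$(grep -B1 "$MAC" "/var/lib/libvirt/dnsmasq/$BRIDGE.status" | head \
             -n 1 | awk '{print $2}' | sed -e 's/"//g' -e 's/,//')
    sleep 5
    TRIES=$((TRIES + 1))
done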


dougbtv commented Jan 16, 2018

Apparently the box has not DHCP'd at all?

[root@nestedvirt ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     test123                        running

[root@nestedvirt ~]# virsh domiflist test123
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     virbr0     virtio      52:54:00:d9:26:51

[root@nestedvirt ~]# nmap -sP 192.168.1.123/24 | grep 52:54:00:d9:26:51 -B 3

(note: nmap returned no output; that's intentional in these notes)
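A couple of other checks that could confirm (or rule out) the no-DHCP theory; the assumption here is that virbr0 belongs to the libvirt network named "default":

# Ask libvirt's dnsmasq for its current leases on the default network:
virsh net-dhcp-leases default

# And look for the guest's MAC directly in the status file that spinup.sh greps:
grep -B2 52:54:00:d9:26:51 /var/lib/libvirt/dnsmasq/virbr0.status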


dougbtv commented Jan 30, 2018

Still digging at this one; currently trying to figure out why the machines aren't necessarily starting properly. The script hangs in the same spot. I tried using a GUI to get virt-manager going because I wanted to sanity check my work with virsh console.... Turns out I'm still having the same issue: I can't console into the boxen started with ./spinup.sh, even from virt-manager. Using virt-manager to start up a machine did work, though, and I could console into it.
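One hedged diagnostic idea (not a confirmed cause): compare the console/serial devices of a spinup.sh-created guest against a virt-manager-created one. test123 is the guest from the earlier comment; the second domain name is a placeholder:

# Check whether each domain actually has serial/console devices defined:
virsh dumpxml test123 | grep -iA3 -e '<serial' -e '<console'
virsh dumpxml some-virt-manager-guest | grep -iA3 -e '<serial' -e '<console'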


dougbtv commented Jan 31, 2018

Documented a procedure for trying out a fix in ansible-role-vm-spinup in this gist: https://gist.github.com/dougbtv/6d9fd054c3db58124f3a302acff403fc


dougbtv commented Feb 1, 2018

Reporter also noted that:

I did get it working by going back to some of your original config (setting the bridge to virbr0 and disabling bridge networking)
