[libvirt] problems starting several qemu VMs simultaneously

Hi,

I grabbed today's git head of libvirt, created a VM (a clean install of Ubuntu Oneiric, installed through virt-manager), and cloned it 3 times. Then I did:

serge@ubuntu:~$ for i in `seq 1 4`; do virsh start o$i > /tmp/o$i 2>&1 & done
[1] 12184
[2] 12185
[3] 12186
[4] 12187
serge@ubuntu:~$ virsh list
error: Failed to list active domains
error: End of file while reading data: Input/output error
serge@ubuntu:~$ virsh list
 Id Name                 State
----------------------------------------------------
  5 o2                   shut off
  7 o3                   shut off
  8 o4                   shut off

[1]   Exit 1                  virsh start o$i > /tmp/o$i 2>&1
[2]   Done                    virsh start o$i > /tmp/o$i 2>&1
serge@ubuntu:~$ virsh list
 Id Name                 State
----------------------------------------------------
  5 o2                   running
  7 o3                   running
  8 o4                   running

[3]-  Done                    virsh start o$i > /tmp/o$i 2>&1
[4]+  Done                    virsh start o$i > /tmp/o$i 2>&1
serge@ubuntu:~$ cat /tmp/o1
error: Failed to start domain o1
error: Unable to wait for child process: Bad file descriptor

It's quite reproducible. An Ubuntu bug report was filed at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/961217

It's also very reminiscent of some LXC startup handshake issues which have been fixed.

You can find the last chunk of the log file at http://people.canonical.com/~serge/libvirtd-parallel-startup.log

-serge
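The shell loop above can also be driven from a script that records which start failed. The Python sketch below is an illustration, not Serge's script: it assumes domains named o1 to o4 already exist and that `virsh` is on PATH, and the function name `parallel_start` is hypothetical. It launches every `virsh start` concurrently, issues `virsh list` immediately while the starts are still in flight, then collects each exit code and combined output.

```python
import subprocess


def parallel_start(domains, virsh="virsh"):
    """Hypothetical reproducer: start domains concurrently, list while they start.

    Returns (list_returncode, {domain: (returncode, combined_output)}).
    """
    # Launch all 'virsh start' commands without waiting for any of them.
    starts = {
        dom: subprocess.Popen(
            [virsh, "start", dom],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for dom in domains
    }
    # Run 'virsh list' right away, while the starts are still in progress.
    listing = subprocess.run([virsh, "list"], capture_output=True, text=True)
    results = {}
    for dom, proc in starts.items():
        out, _ = proc.communicate()  # wait for this start and read its output
        results[dom] = (proc.returncode, out)
    return listing.returncode, results
```

For example, `parallel_start(["o1", "o2", "o3", "o4"])` returns the `virsh list` exit code plus one `(returncode, output)` pair per domain; a non-zero code marks the failed start, as with o1 above. The `virsh` parameter exists only so the command can be swapped out when testing the sketch.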

At 03/22/2012 06:54 AM, Serge Hallyn Wrote:
> Hi,
> I grabbed today's git head of libvirt. Created a VM (clean install of ubuntu oneiric, installed through virt-manager), and cloned it 3 times. Then I did
> serge@ubuntu:~$ for i in `seq 1 4`; do virsh start o$i > /tmp/o$i 2>&1 & done
> [1] 12184
> [2] 12185
> [3] 12186
> [4] 12187
> serge@ubuntu:~$ virsh list
> error: Failed to list active domains
> error: End of file while reading data: Input/output error
I cannot reproduce this problem on RHEL6.
> You can find the last chunk of the log file at http://people.canonical.com/~serge/libvirtd-parallel-startup.log

This URL cannot be opened: You don't have permission to access /~serge/libvirtd-parallel-startup.log on this server.

Thanks
Wen Congyang

> -serge
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list

Quoting Wen Congyang (wency@cn.fujitsu.com):
> At 03/22/2012 06:54 AM, Serge Hallyn Wrote:
> I cannot reproduce this problem on RHEL6.

...

Hmm, today I can't either, with various packages and built binaries. I wonder if it has to do with the kernel. I'll re-try an install from scratch.

Thanks for trying!

> This url cannot be opened: You don't have permission to access /~serge/libvirtd-parallel-startup.log on this server.

Sorry, fixed.

-serge

Quoting Serge Hallyn (serge.hallyn@canonical.com):
> Quoting Wen Congyang (wency@cn.fujitsu.com):
> > At 03/22/2012 06:54 AM, Serge Hallyn Wrote:
> > > serge@ubuntu:~$ for i in `seq 1 4`; do virsh start o$i > /tmp/o$i 2>&1 & done
> > > [1] 12184
> > > [2] 12185
> > > [3] 12186
> > > [4] 12187
> > > serge@ubuntu:~$ virsh list
> > > error: Failed to list active domains
> > > error: End of file while reading data: Input/output error

(Note that most of the time, virsh list actually succeeds; only one of the virsh starts fails.)
> > I cannot reproduce this problem on RHEL6.
> ...
> Hmm, today I can't either. With various packages and built binaries. I wonder if it has to do with the kernel. I'll re-try an install from scratch.
It's not the kernel (well, at least not the Oneiric kernel). However, I have to run 'virsh list' very fast, so that it happens while the other domains are still starting up. That seems to be the trigger: without it, all domains start up fine; with it, more often than not I get at least one failure.

I'll try to reproduce with RHEL6, but setup will take me some time. Or maybe I'll first add some debugging info to qemuDomainObjBeginJobInternal() and qemuProcessStart(). Since virsh always seems to lose the connection while waiting for the handshake from the child, that seems a good place to check.

-serge
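The failing start reports "Unable to wait for child process: Bad file descriptor", and the suspicion above is the parent/child handshake. The Python sketch below is not libvirt code and does not establish the root cause; it only illustrates a generic pipe handshake (the child writes one byte, the parent blocks reading it) and shows that if the parent's read end is closed prematurely, for instance by a hypothetical bug elsewhere in a multithreaded daemon, the read fails with EBADF, whose message is "Bad file descriptor". The function name `handshake` and the `close_early` switch are hypothetical.

```python
import errno
import os


def handshake(close_early=False):
    """Hypothetical pipe handshake; returns what the parent observed."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end, signal readiness with one byte.
        try:
            os.close(read_fd)
            os.write(write_fd, b"1")
        except OSError:
            pass  # the parent may already have closed its end
        finally:
            os._exit(0)  # never return into the parent's code path
    # Parent: drop the write end so EOF is seen if the child dies silently.
    os.close(write_fd)
    if close_early:
        # Simulate the descriptor being closed out from under the waiter.
        os.close(read_fd)
    try:
        data = os.read(read_fd, 1)
        result = "ready" if data == b"1" else "child exited without handshake"
    except OSError as exc:
        if exc.errno != errno.EBADF:
            raise
        result = os.strerror(exc.errno)
    finally:
        if not close_early:
            os.close(read_fd)
        os.waitpid(pid, 0)  # reap the child
    return result
```

With `close_early=False` the parent sees the handshake byte; with `close_early=True` the read fails with EBADF instead, which is one way a "Bad file descriptor" error can surface while waiting for a child.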

Quoting Serge Hallyn (serge.hallyn@canonical.com):
> Quoting Serge Hallyn (serge.hallyn@canonical.com):
> > Quoting Wen Congyang (wency@cn.fujitsu.com):
> > > At 03/22/2012 06:54 AM, Serge Hallyn Wrote:
> > > I cannot reproduce this problem on RHEL6.

At last, found the trigger. I can now reproduce this on an up-to-date Fedora 16, using http://people.canonical.com/~serge/breaklibvirt.sh (run as root from /home/$user).

The trigger is the qemu hook. If I don't have a qemu hook, I can do 100 runs of the parallel starts and lists with no failures. But introduce a slow hook, and the very first run always fails.

-serge
participants (2)
- Serge Hallyn
- Wen Congyang