On Mon, Jan 27, 2014 at 11:28:31AM +0000, Daniel P. Berrange wrote:
On Fri, Jan 24, 2014 at 05:17:02PM +0100, Martin Kletzander wrote:
> On Fri, Jan 24, 2014 at 12:56:43PM +0000, Daniel P. Berrange wrote:
> > On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
> > > There are 8 servers with 8 VMs on each server. All the qcow images are
> > > on the NFS share on the same external server, and we are starting all
> > > 64 VMs at the same time.
> > > Each VM is 2.5GB x 64 VMs = 160GB = 1280Gb. Reading all of that data
> > > over a 1GbE interface would take 1280 sec = 21.3 min; since not all of
> > > each image is read on boot, it takes only 5 min.
> >
> > That's interesting, but it still doesn't explain the failures. QEMU will
> > start listening on its monitor socket before it even opens any of the
> > disk images, so the time it takes to read disk images on boot should have
> > no relevance to timeouts waiting for the monitor socket. All QEMU does
> > between exec of its binary and listening on the monitor socket is load
> > the libraries it is linked against and a few misc pieces like the BIOS
> > firmware blobs. I just can't see a reason why that would take anywhere
> > near 5 minutes - it should be a matter of a few seconds at worst.
> >
>
> I think it does a little bit more than that, but I have no proof of
> it. If you look at most occurrences of this error wrt virt-manager
> (I'm not sure why, maybe because people using virsh deal with it
> themselves), you'll find that most of them are caused by a managed
> save. When QEMU is loading the saved state, it takes more than the
> 3 seconds we had before, and the machine fails to start. The thing is
> that there is nothing else weird on those machines; removing the
> managed save solves everything. That's why I think QEMU at least loads
> some initialization values (in some special cases) before listening,
> although I haven't been able to reproduce it.
Hmm, I was thinking it might be something related to socket connect/accept
synchronization. QEMU will listen() very early, but won't accept() until
very late in startup. I've just confirmed in a test though that connect()
will succeed even if the app doesn't call accept(), since the kernel will
complete the connection at the protocol level and just queue the client.
So that doesn't explain it yet.
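The check was essentially of this shape (a minimal standalone sketch; the
socket path and names are just illustrative, not anything QEMU or libvirt
actually uses):

/* Sketch of the connect()-without-accept() test: bind() + listen() on a
 * UNIX socket, never call accept(), then connect() to it from the same
 * process.  The connect() succeeds because the kernel completes the
 * handshake and parks the client in the listen backlog. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    const char *path = "/tmp/accept-test.sock";  /* illustrative path */
    struct sockaddr_un addr;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 5) < 0) {
        perror("server setup");
        return 1;
    }
    /* Note: accept() is deliberately never called on 'srv'. */

    int cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cli < 0 || connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }
    printf("connect() succeeded without any accept()\n");

    close(cli);
    close(srv);
    unlink(path);
    return 0;
}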
I did a test with QEMU by adding a 'sleep(20)' into the QEMU main()
function in vl.c. It only causes QEMU startup failures if we put the
sleep right after parsing command line args, i.e. before the monitor
socket exists. Once QEMU has done a listen() on the socket, libvirt
handles arbitrary delays without issue.
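
For completeness, here is the failure side of the same picture as a toy
sketch (again my own illustrative code, not QEMU or libvirt source): until
the listen() has happened, every connect() attempt fails straight away,
which is what libvirt keeps retrying until its startup timeout expires.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Try one connect() against 'path' and report the outcome. */
static void try_connect(const char *path, const char *phase)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return;
    }
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        printf("%s: connect failed: %s\n", phase, strerror(errno));
    else
        printf("%s: connect succeeded\n", phase);
    close(fd);
}

int main(void)
{
    const char *path = "/tmp/monitor-test.sock";  /* illustrative path */
    struct sockaddr_un addr;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);

    /* Like QEMU sleeping right after arg parsing: no socket yet. */
    try_connect(path, "no socket yet");

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    try_connect(path, "bound, not listening");

    /* Like QEMU after the listen(): the delay before accept() no longer
     * matters to the client. */
    if (listen(srv, 5) < 0) {
        perror("listen");
        return 1;
    }
    try_connect(path, "listening, no accept");

    close(srv);
    unlink(path);
    return 0;
}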
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|