[libvirt] [PATCH 0/2] Try to get rid of most monitor timeout errors

This is basically v3 of the patch Pavel Fux sent [1] with the addition of changing the default as discussed in the same thread [2].

Martin

[1] https://www.redhat.com/archives/libvir-list/2014-January/msg00060.html
[2] https://www.redhat.com/archives/libvir-list/2014-January/msg00367.html

Martin Kletzander (1):
  qemu: Change the default unix monitor timeout

Pavel Fux (1):
  qemu: Add support for changing timeout value to open unix monitor socket

 src/qemu/libvirtd_qemu.aug         |  3 +++
 src/qemu/qemu.conf                 | 12 ++++++++++++
 src/qemu/qemu_conf.c               |  2 ++
 src/qemu/qemu_conf.h               |  2 ++
 src/qemu/qemu_monitor.c            | 20 +++++++++++++++++---
 src/qemu/test_libvirtd_qemu.aug.in |  1 +
 6 files changed, 37 insertions(+), 3 deletions(-)

-- 
1.8.5.2

From: Pavel Fux <pavel@stratoscale.com>

Adding an option to change the monitor socket opening timeout; the current default is 3 seconds and in some cases it's not enough.

Signed-off-by: Pavel Fux <pavel@stratoscale.com>
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
Notes:
    I modified the description in the config file, made use of the
    opaque argument in qemuMonitorOpen and rebased it on current master.
    I also added the config options in the augeas test to make
    'make check' pass.

 src/qemu/libvirtd_qemu.aug         |  3 +++
 src/qemu/qemu.conf                 | 12 ++++++++++++
 src/qemu/qemu_conf.c               |  2 ++
 src/qemu/qemu_conf.h               |  2 ++
 src/qemu/qemu_monitor.c            | 18 ++++++++++++++++--
 src/qemu/test_libvirtd_qemu.aug.in |  1 +
 6 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug
index a9ff421..29e756b 100644
--- a/src/qemu/libvirtd_qemu.aug
+++ b/src/qemu/libvirtd_qemu.aug
@@ -85,6 +85,8 @@ module Libvirtd_qemu =
                  | int_entry "migration_port_min"
                  | int_entry "migration_port_max"
 
+   let monitor_entry = int_entry "monitor_socket_open_timeout"
+
    (* Each entry in the config is one of the following ... *)
    let entry = vnc_entry
              | spice_entry
@@ -96,6 +98,7 @@ module Libvirtd_qemu =
              | device_entry
              | rpc_entry
              | network_entry
+             | monitor_entry
 
    let comment = [ label "#comment" . del /#[ \t]*/ "# " . store /([^ \t\n][^\n]*)?/ . del /\n/ "\n" ]
    let empty = [ label "#empty" . eol ]
diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 17f1b10..6217b49 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -463,3 +463,15 @@
 #
 #migration_port_min = 49152
 #migration_port_max = 49215
+
+
+# Override the time (in seconds) for which libvirt waits for the
+# qemu monitor to show up.
+#
+# If you sometimes get the message "monitor socket did not show up: No
+# such file or directory", that could be because libvirt did not wait
+# long enough; you can try increasing this timeout.
+#
+# Default is 3
+#
+#monitor_socket_open_timeout = 60
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 4378791..7f9c7f6 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -575,6 +575,8 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg,
 
     GET_VALUE_STR("migration_address", cfg->migrationAddress);
 
+    GET_VALUE_LONG("monitor_socket_open_timeout", cfg->monitorSocketOpenTimeout);
+
     ret = 0;
 
 cleanup:
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index 1f44a76..4bbb86b 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -164,6 +164,8 @@ struct _virQEMUDriverConfig {
     char *migrationAddress;
     int migrationPortMin;
     int migrationPortMax;
+
+    int monitorSocketOpenTimeout;
 };
 
 /* Main driver state */
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 1fa1492..f34527a 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -34,6 +34,7 @@
 #include "qemu_monitor_json.h"
 #include "qemu_domain.h"
 #include "qemu_process.h"
+#include "qemu_conf.h"
 #include "virerror.h"
 #include "viralloc.h"
 #include "virlog.h"
@@ -267,14 +268,24 @@ static void qemuMonitorDispose(void *obj)
 
 static int
-qemuMonitorOpenUnix(const char *monitor, pid_t cpid)
+qemuMonitorOpenUnix(const char *monitor, pid_t cpid, virQEMUDriverPtr driver)
 {
+    virQEMUDriverConfigPtr cfg = NULL;
     struct sockaddr_un addr;
     int monfd;
     int timeout = 3; /* In seconds */
     int ret;
     size_t i = 0;
 
+    if (driver) {
+        cfg = virQEMUDriverGetConfig(driver);
+        if (cfg->monitorSocketOpenTimeout > 0) {
+            timeout = cfg->monitorSocketOpenTimeout;
+        }
+        virObjectUnref(cfg);
+        cfg = NULL;
+    }
+
     if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
         virReportSystemError(errno, "%s",
                              _("failed to create socket"));
@@ -849,11 +860,14 @@ qemuMonitorOpen(virDomainObjPtr vm,
     int fd;
     bool hasSendFD = false;
     qemuMonitorPtr ret;
+    virQEMUDriverPtr driver = opaque;
 
     switch (config->type) {
     case VIR_DOMAIN_CHR_TYPE_UNIX:
         hasSendFD = true;
-        if ((fd = qemuMonitorOpenUnix(config->data.nix.path, vm ? vm->pid : 0)) < 0)
+        if ((fd = qemuMonitorOpenUnix(config->data.nix.path,
+                                      vm ? vm->pid : 0,
+                                      driver)) < 0)
             return NULL;
         break;
diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in
index 81fedd6..8d58178 100644
--- a/src/qemu/test_libvirtd_qemu.aug.in
+++ b/src/qemu/test_libvirtd_qemu.aug.in
@@ -72,3 +72,4 @@ module Test_libvirtd_qemu =
 { "migration_address" = "127.0.0.1" }
 { "migration_port_min" = "49152" }
 { "migration_port_max" = "49215" }
+{ "monitor_socket_open_timeout" = "60" }
-- 
1.8.5.2

On Thu, Jan 09, 2014 at 09:22:05AM +0100, Martin Kletzander wrote:
From: Pavel Fux <pavel@stratoscale.com>
Adding an option to change the monitor socket opening timeout; the current default is 3 seconds and in some cases it's not enough.
Signed-off-by: Pavel Fux <pavel@stratoscale.com> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> ---
Notes: I modified the description in the config file, made use of the opaque argument in qemuMonitorOpen and rebased it on current master.
I also added the config options in augeas test to make the 'make check' pass.
IMHO we shouldn't add this config parameter. This kind of parameter is basically saying "our code doesn't work by default, set this to fix it", which is just horrible behaviour. Furthermore, an admin won't even find out about this until the worst possible time. Just increase the default timeout if we need to. Even better would be to figure out how we can properly fix this to avoid any need for a timeout at all.

Regards,
Daniel

-- 
|: http://berrange.com      -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org       -o- http://virt-manager.org :|
|: http://autobuild.org     -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Jan 10, 2014 at 02:18:37PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 09, 2014 at 09:22:05AM +0100, Martin Kletzander wrote:
From: Pavel Fux <pavel@stratoscale.com>
Adding an option to change the monitor socket opening timeout; the current default is 3 seconds and in some cases it's not enough.
Signed-off-by: Pavel Fux <pavel@stratoscale.com> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> ---
Notes: I modified the description in the config file, made use of the opaque argument in qemuMonitorOpen and rebased it on current master.
I also added the config options in augeas test to make the 'make check' pass.
IMHO we shouldn't add this config parameter. This kind of parameter is basically saying "our code doesn't work by default, set this to fix it", which is just horrible behaviour. Furthermore, an admin won't even find out about this until the worst possible time. Just increase the default timeout if we need to. Even better would be to figure out how we can properly fix this to avoid any need for a timeout at all.
The same can be said about e.g. audio-related options in the config file or (when going to extremes) debug logs. Yes, there might be problems and this is a way admins/users can check where a particular problem might be. And the very fact that we need to change this value now in fact proves that it might need another change in the future. Having this particular value configurable is merely an _option_ for admins/users and I see no drawback at all in it.

As Rich suggested (and Cole copied), check out the number of hits for:
https://www.google.co.uk/search?q="monitor+socket+did+not+show+up"

Many of them are related to the domains having a managed save, but since nobody looked for a root cause (as far as I know), this might be related to this very problem.

Does this mean ACK to [2/2] and NACK to [1/2], then?

Martin

P.S.: I also forgot to mention that this would most probably resolve these bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=892273
https://bugzilla.redhat.com/show_bug.cgi?id=895901
https://bugzilla.redhat.com/show_bug.cgi?id=987088
https://bugzilla.redhat.com/show_bug.cgi?id=1051364
Regards,
Daniel
-- 
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Ping? Can we at least change the default [2/2] or should I send a v2 for that one?

Martin

On Fri, Jan 10, 2014 at 04:27:40PM +0100, Martin Kletzander wrote:
On Fri, Jan 10, 2014 at 02:18:37PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 09, 2014 at 09:22:05AM +0100, Martin Kletzander wrote:
From: Pavel Fux <pavel@stratoscale.com>
Adding an option to change the monitor socket opening timeout; the current default is 3 seconds and in some cases it's not enough.
Signed-off-by: Pavel Fux <pavel@stratoscale.com> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> ---
Notes: I modified the description in the config file, made use of the opaque argument in qemuMonitorOpen and rebased it on current master.
I also added the config options in augeas test to make the 'make check' pass.
IMHO we shouldn't add this config parameter. This kind of parameter is basically saying "our code doesn't work by default, set this to fix it", which is just horrible behaviour. Furthermore, an admin won't even find out about this until the worst possible time. Just increase the default timeout if we need to. Even better would be to figure out how we can properly fix this to avoid any need for a timeout at all.
The same can be said about e.g. audio-related options in the config file or (when going to extremes) debug logs. Yes, there might be problems and this is a way admins/users can check where a particular problem might be. And the very fact that we need to change this value now in fact proves that it might need another change in the future. Having this particular value configurable is merely an _option_ for admins/users and I see no drawback at all in it.
As Rich suggested (and Cole copied), check out the number of hits for: https://www.google.co.uk/search?q="monitor+socket+did+not+show+up"
Many of them are related to the domains having a managed save, but since nobody looked for a root cause (as far as I know), this might be related to this very problem.
Does this mean ACK to [2/2] and NACK to [1/2] then?
Martin
P.S.: I also forgot to mention that this would most probably resolve these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=892273 https://bugzilla.redhat.com/show_bug.cgi?id=895901 https://bugzilla.redhat.com/show_bug.cgi?id=987088 https://bugzilla.redhat.com/show_bug.cgi?id=1051364
Regards,
Daniel

I agree, there is no harm in adding a configuration option; different setups require different timeout values. My setup was 8 servers booted with PXE boot and running on an nfs rootfs, with 8 vms on each. When I tried to start all of them together the bottleneck was the network, and it took about 5 minutes until they all started. I think a good solution will include changes in qemu's behavior, but the solution that I suggested is much better than the current state. We would be very happy not to have to manage our own version of libvirt.

Silence gives consent?

On Thu, Jan 16, 2014 at 5:49 PM, Martin Kletzander <mkletzan@redhat.com> wrote:
Ping? Can we at least change the default [2/2] or should I send a v2 for that one?
Martin
On Fri, Jan 10, 2014 at 02:18:37PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 09, 2014 at 09:22:05AM +0100, Martin Kletzander wrote:
From: Pavel Fux <pavel@stratoscale.com>
Adding an option to change the monitor socket opening timeout; the current default is 3 seconds and in some cases it's not enough.
Signed-off-by: Pavel Fux <pavel@stratoscale.com> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> ---
On Fri, Jan 10, 2014 at 04:27:40PM +0100, Martin Kletzander wrote:
Notes: I modified the description in the config file, made use of the opaque argument in qemuMonitorOpen and rebased it on current master.
I also added the config options in augeas test to make the 'make check' pass.
IMHO we shouldn't add this config parameter. This kind of parameter is basically saying "our code doesn't work by default, set this to fix it", which is just horrible behaviour. Furthermore, an admin won't even find out about this until the worst possible time. Just increase the default timeout if we need to. Even better would be to figure out how we can properly fix this to avoid any need for a timeout at all.
The same can be said about e.g. audio-related options in the config file or (when going to extremes) debug logs. Yes, there might be problems and this is a way admins/users can check where a particular problem might be. And the very fact that we need to change this value now in fact proves that it might need another change in the future. Having this particular value configurable is merely an _option_ for admins/users and I see no drawback at all in it.
As Rich suggested (and Cole copied), check out the number of hits for: https://www.google.co.uk/search?q="monitor+socket+did+not+show+up"
Many of them are related to the domains having a managed save, but since nobody looked for a root cause (as far as I know), this might be related to this very problem.
Does this mean ACK to [2/2] and NACK to [1/2] then?
Martin
P.S.: I also forgot to mention that this would most probably resolve these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=892273 https://bugzilla.redhat.com/show_bug.cgi?id=895901 https://bugzilla.redhat.com/show_bug.cgi?id=987088 https://bugzilla.redhat.com/show_bug.cgi?id=1051364
Regards,
Daniel

On Thu, Jan 23, 2014 at 04:40:40PM +0200, Pavel Fux wrote:
I agree, there is no harm in adding a configuration option; different setups require different timeout values. My setup was 8 servers booted with PXE boot and running on an nfs rootfs, with 8 vms on each. When I tried to start all of them together the bottleneck was the network, and it took about 5 minutes until they all started.
That doesn't make any sense. The waiting code here is about the QEMU process' initial startup sequence - ie the gap between exec'ing the QEMU binary and it listening on the monitor socket. PXE / nfsroot doesn't get involved there at all. Even if it were involved, if you're seeing 5 minute delays with only 8 vms on the host, something is seriously screwed with your host. This isn't a compelling reason to add this config option to libvirt.

Daniel

There are 8 servers with 8 vms on each server. All the qcow images are on an nfs share on the same external server, and we are starting all 64 vms at the same time. Each vm is 2.5GB x 64 vms = 160GB = 1280Gb; to read all of that data over a 1GbE interface would take 1280sec = 21.3min. Not all of the image is read on boot, so it takes only 5min. Anyway, one timeout won't solve all problems - why not 31? Or 60?

On Thu, Jan 23, 2014 at 4:44 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Thu, Jan 23, 2014 at 04:40:40PM +0200, Pavel Fux wrote:
I agree, there is no harm in adding a configuration option; different setups require different timeout values. My setup was 8 servers booted with PXE boot and running on an nfs rootfs, with 8 vms on each. When I tried to start all of them together the bottleneck was the network, and it took about 5 minutes until they all started.
That doesn't make any sense. The waiting code here is about the QEMU process' initial startup sequence - ie the gap between exec'ing the QEMU binary, and it listening on the monitor socket. PXE / nfsroot doesn't get involved there at all. Even if it were involve, if you're seeing 5 minute delays with only 8 vms on the host, something is seriously screwed with your host. This isn't a compelling reason to add this config option to libvirt.
Daniel
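As an editorial aside for readers following the thread: Pavel's bandwidth estimate above is plain arithmetic and can be checked mechanically. The figures below are taken straight from his message; the script itself is purely illustrative and not part of the patches:

```python
# Sanity-check of the transfer-time estimate quoted in the thread.
servers = 8            # hosts
vms_per_server = 8     # guests per host
image_gb = 2.5         # qcow image size per guest, in gigabytes

total_gb = servers * vms_per_server * image_gb    # total data in gigabytes
total_gbit = total_gb * 8                         # the same, in gigabits
seconds = total_gbit / 1.0                        # over a 1 Gb/s NFS link

print(total_gb, total_gbit, round(seconds / 60, 1))  # 160.0 1280.0 21.3
```

This reproduces the 160GB / 1280Gb / 21.3min figures in the message, supporting the claim that a cold read of every image would dominate startup time.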

On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
There are 8 servers with 8 vms on each server. All the qcow images are on an nfs share on the same external server, and we are starting all 64 vms at the same time. Each vm is 2.5GB x 64 vms = 160GB = 1280Gb; to read all of that data over a 1GbE interface would take 1280sec = 21.3min. Not all of the image is read on boot, so it takes only 5min.
That's interesting, but it still doesn't explain the failures. QEMU will start listening on its monitor socket before it even opens any of the disk images, so the time it takes to read disk images on boot should have no relevance to timeouts waiting for the monitor socket. All QEMU does between exec of the binary and listening on the monitor socket is load the libraries it is linked against and a few misc pieces like BIOS firmware blobs. I just can't see a reason why this would take anywhere near 5 minutes - it should be a matter of a few seconds at worst.

Daniel

On Fri, Jan 24, 2014 at 12:56:43PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
There are 8 servers with 8 vms on each server. All the qcow images are on an nfs share on the same external server, and we are starting all 64 vms at the same time. Each vm is 2.5GB x 64 vms = 160GB = 1280Gb; to read all of that data over a 1GbE interface would take 1280sec = 21.3min. Not all of the image is read on boot, so it takes only 5min.
That's interesting, but it still doesn't explain the failures. QEMU will start listening on its monitor socket before it even opens any of the disk images, so the time it takes to read disk images on boot should have no relevance to timeouts waiting for the monitor socket. All QEMU does between exec of the binary and listening on the monitor socket is load the libraries it is linked against and a few misc pieces like BIOS firmware blobs. I just can't see a reason why this would take anywhere near 5 minutes - it should be a matter of a few seconds at worst.
I think it does a little bit more than that, but I have no proof of it. When you look at most occurrences of this error wrt virt-manager (I'm not sure why, maybe because people using virsh deal with it themselves), you'll find that most of them are caused by a managed save. When qemu is loading, it takes more than those 3 seconds we had before, and it fails to start the machine. The thing is that there is nothing else weird on those machines; removing the managed save solves everything. And that's why I think it at least loads some initialization values (in some special cases), although I haven't been able to reproduce that. The machine can have high load on the same resource where the bottleneck for the first lines of code is (or even binary initialization, as Michal mentioned IIRC). And after that the machine is all right.

Martin

On Fri, Jan 24, 2014 at 05:17:02PM +0100, Martin Kletzander wrote:
On Fri, Jan 24, 2014 at 12:56:43PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
There are 8 servers with 8 vms on each server. All the qcow images are on an nfs share on the same external server, and we are starting all 64 vms at the same time. Each vm is 2.5GB x 64 vms = 160GB = 1280Gb; to read all of that data over a 1GbE interface would take 1280sec = 21.3min. Not all of the image is read on boot, so it takes only 5min.
That's interesting, but it still doesn't explain the failures. QEMU will start listening on its monitor socket before it even opens any of the disk images, so the time it takes to read disk images on boot should have no relevance to timeouts waiting for the monitor socket. All QEMU does between exec of the binary and listening on the monitor socket is load the libraries it is linked against and a few misc pieces like BIOS firmware blobs. I just can't see a reason why this would take anywhere near 5 minutes - it should be a matter of a few seconds at worst.
I think it does a little bit more than that, but I have no proof for it. When you look for most occurrences of this error wrt virt-manager (I'm not sure why, maybe because people using virsh deal with it themselves), you'll find that most of them are caused by a managed save. When qemu is loading, it takes more than those 3 seconds we had before, and it fails to start the machine. The thing is that there is nothing else weird on those machines, removing the managed save solves everything. And that's why I think it at least loads some initialization values (in some special cases), although I haven't been able to reproduce that.
Hmm, I was thinking it might be something related to socket connect/accept synchronization. QEMU will listen() very early, but won't accept() until very late in startup. I've just confirmed in a test, though, that connect() will succeed even if the app doesn't call accept(), since the kernel will complete the connection at the protocol level and just queue the client. So that doesn't explain it yet.

Daniel
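Daniel's connect()-without-accept() observation is easy to reproduce outside libvirt. A minimal sketch follows (in Python rather than C, purely for illustration; the socket path is made up and this is not libvirt's test):

```python
import os
import socket
import tempfile

# A unix socket server that listens but deliberately never calls
# accept(): the kernel completes incoming connections at the protocol
# level and queues them, so a client's connect() still succeeds.
path = os.path.join(tempfile.mkdtemp(), "monitor.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(5)          # listen() only - no accept() anywhere

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)      # succeeds despite the missing accept()
print("connected")

client.close()
server.close()
os.unlink(path)
```

This matches Daniel's finding: a slow-to-accept() QEMU alone cannot explain the "monitor socket did not show up" failures, because connect() completes as soon as listen() has been called.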

On Mon, Jan 27, 2014 at 11:28:31AM +0000, Daniel P. Berrange wrote:
On Fri, Jan 24, 2014 at 05:17:02PM +0100, Martin Kletzander wrote:
On Fri, Jan 24, 2014 at 12:56:43PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
There are 8 servers with 8 vms on each server. All the qcow images are on an nfs share on the same external server, and we are starting all 64 vms at the same time. Each vm is 2.5GB x 64 vms = 160GB = 1280Gb; to read all of that data over a 1GbE interface would take 1280sec = 21.3min. Not all of the image is read on boot, so it takes only 5min.
That's interesting, but it still doesn't explain the failures. QEMU will start listening on its monitor socket before it even opens any of the disk images, so the time it takes to read disk images on boot should have no relevance to timeouts waiting for the monitor socket. All QEMU does between exec of the binary and listening on the monitor socket is load the libraries it is linked against and a few misc pieces like BIOS firmware blobs. I just can't see a reason why this would take anywhere near 5 minutes - it should be a matter of a few seconds at worst.
I think it does a little bit more than that, but I have no proof for it. When you look for most occurrences of this error wrt virt-manager (I'm not sure why, maybe because people using virsh deal with it themselves), you'll find that most of them are caused by a managed save. When qemu is loading, it takes more than those 3 seconds we had before, and it fails to start the machine. The thing is that there is nothing else weird on those machines, removing the managed save solves everything. And that's why I think it at least loads some initialization values (in some special cases), although I haven't been able to reproduce that.
Hmm, I was thinking it might be something related to socket connect/accept synchronization. QEMU will listen() very early, but won't accept() until very late in startup. I've just confirmed in a test though that connect() will succeed even if the app doesn't call accept(), since the kernel will complete the connection at the protocol level and just queue the client. So that doesn't explain it yet.
I did a test with QEMU by adding a 'sleep(20)' into the QEMU main() method in vl.c. It only causes QEMU startup failures if we put the sleep right after parsing command line args. Once QEMU has done a listen() on the socket, libvirt handles arbitrary delays without issue.

Daniel

There are a number of reported issues where we fail to start a domain. It turns out that, in some scenarios like high load, the 3 second timeout is not enough for qemu to start up to the phase where the socket is created. Since the timeout is configurable and there is no downside to waiting longer, raise it right to 30 seconds.

Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/qemu/qemu.conf      | 2 +-
 src/qemu/qemu_monitor.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 6217b49..4936d88 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -472,6 +472,6 @@
 # such file or directory", that could be because libvirt did not wait
 # long enough; you can try increasing this timeout.
 #
-# Default is 3
+# Default is 30
 #
 #monitor_socket_open_timeout = 60
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index f34527a..6a437b1 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -273,7 +273,7 @@ qemuMonitorOpenUnix(const char *monitor, pid_t cpid, virQEMUDriverPtr driver)
     virQEMUDriverConfigPtr cfg = NULL;
     struct sockaddr_un addr;
     int monfd;
-    int timeout = 3; /* In seconds */
+    int timeout = 30; /* In seconds */
     int ret;
     size_t i = 0;
-- 
1.8.5.2

On Thu, Jan 09, 2014 at 09:22:06AM +0100, Martin Kletzander wrote:
There are a number of reported issues where we fail to start a domain. It turns out that, in some scenarios like high load, the 3 second timeout is not enough for qemu to start up to the phase where the socket is created. Since the timeout is configurable and there is no downside to waiting longer, raise it right to 30 seconds.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/qemu/qemu.conf      | 2 +-
 src/qemu/qemu_monitor.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 6217b49..4936d88 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -472,6 +472,6 @@
 # such file or directory", that could be because libvirt did not wait
 # long enough; you can try increasing this timeout.
 #
-# Default is 3
+# Default is 30
 #
 #monitor_socket_open_timeout = 60
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index f34527a..6a437b1 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -273,7 +273,7 @@ qemuMonitorOpenUnix(const char *monitor, pid_t cpid, virQEMUDriverPtr driver)
     virQEMUDriverConfigPtr cfg = NULL;
     struct sockaddr_un addr;
     int monfd;
-    int timeout = 3; /* In seconds */
+    int timeout = 30; /* In seconds */
     int ret;
     size_t i = 0;
ACK. It is safe to wait longer, since in the loop we kill() to check if QEMU is still running or not.

Daniel
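For readers unfamiliar with the loop being discussed: the qemu driver retries the connect until the timeout expires, probing the child with kill(pid, 0) between attempts so that a QEMU which has already died fails fast instead of eating the whole timeout. The sketch below mimics that pattern in Python for illustration only; names, structure, and errno handling are assumptions, not libvirt's actual code:

```python
import errno
import os
import socket
import time

def open_monitor(path, cpid, timeout=30):
    """Wait up to `timeout` seconds for a unix socket to show up,
    bailing out early if the watched process has already exited."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cpid:
            try:
                os.kill(cpid, 0)   # signal 0: liveness probe, sends nothing
            except ProcessLookupError:
                raise RuntimeError("process died before the socket showed up")
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return s               # the monitor socket is up
        except OSError as e:
            s.close()
            # Socket not there yet (or not listening): retry until deadline.
            if e.errno not in (errno.ENOENT, errno.ECONNREFUSED):
                raise
        time.sleep(0.1)            # brief pause between attempts
    raise TimeoutError("monitor socket did not show up")
```

Because the liveness probe runs on every iteration, a generous timeout costs nothing when QEMU crashes early, which is the point behind "it is safe to wait longer".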

On Thu, Jan 16, 2014 at 04:11:07PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 09, 2014 at 09:22:06AM +0100, Martin Kletzander wrote:
There are a number of reported issues where we fail to start a domain. It turns out that, in some scenarios like high load, the 3 second timeout is not enough for qemu to start up to the phase where the socket is created. Since the timeout is configurable and there is no downside to waiting longer, raise it right to 30 seconds.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/qemu/qemu.conf      | 2 +-
 src/qemu/qemu_monitor.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 6217b49..4936d88 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -472,6 +472,6 @@
 # such file or directory", that could be because libvirt did not wait
 # long enough; you can try increasing this timeout.
 #
-# Default is 3
+# Default is 30
 #
 #monitor_socket_open_timeout = 60
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index f34527a..6a437b1 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -273,7 +273,7 @@ qemuMonitorOpenUnix(const char *monitor, pid_t cpid, virQEMUDriverPtr driver)
     virQEMUDriverConfigPtr cfg = NULL;
     struct sockaddr_un addr;
     int monfd;
-    int timeout = 3; /* In seconds */
+    int timeout = 30; /* In seconds */
     int ret;
     size_t i = 0;
ACK.
It is safe to wait longer, since in the loop we kill() to check if QEMU is still running or not.
Pushed, thanks. Martin

At least in my case changing the value to 30 seconds is not enough; we had to change it to 5 minutes. I suggest you let the user change it as he wishes.

On Thu, Jan 16, 2014 at 6:21 PM, Martin Kletzander <mkletzan@redhat.com> wrote:
On Thu, Jan 16, 2014 at 04:11:07PM +0000, Daniel P. Berrange wrote:
On Thu, Jan 09, 2014 at 09:22:06AM +0100, Martin Kletzander wrote:
There are a number of reported issues where we fail to start a domain. It turns out that, in some scenarios like high load, the 3 second timeout is not enough for qemu to start up to the phase where the socket is created. Since the timeout is configurable and there is no downside to waiting longer, raise it right to 30 seconds.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/qemu/qemu.conf      | 2 +-
 src/qemu/qemu_monitor.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 6217b49..4936d88 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -472,6 +472,6 @@
 # such file or directory", that could be because libvirt did not wait
 # long enough; you can try increasing this timeout.
 #
-# Default is 3
+# Default is 30
 #
 #monitor_socket_open_timeout = 60
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index f34527a..6a437b1 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -273,7 +273,7 @@ qemuMonitorOpenUnix(const char *monitor, pid_t cpid, virQEMUDriverPtr driver)
     virQEMUDriverConfigPtr cfg = NULL;
     struct sockaddr_un addr;
     int monfd;
-    int timeout = 3; /* In seconds */
+    int timeout = 30; /* In seconds */
     int ret;
     size_t i = 0;
ACK.
It is safe to wait longer, since in the loop we kill() to check if QEMU is still running or not.
Pushed, thanks.
Martin

On Mon, Jan 20, 2014 at 04:33:09PM +0200, Pavel Fux wrote:
At least in my case changing the value to 30 seconds is not enough; we had to change it to 5 minutes.
What is the scenario in which you're seeing this problem? Is it a problem when you are running lots of machines at once on a host? Waiting 5 minutes for a QEMU process to start is really a ridiculous amount of time - I'd really question whether the VMs can even do any useful work at all if the system is being that slow.

Daniel

On 09.01.2014 09:22, Martin Kletzander wrote:
This is basically v3 of the patch Pavel Fux sent [1] with the addition of changing the default as discussed in the same thread [2].
Martin
[1] https://www.redhat.com/archives/libvir-list/2014-January/msg00060.html [2] https://www.redhat.com/archives/libvir-list/2014-January/msg00367.html
Martin Kletzander (1): qemu: Change the default unix monitor timeout
Pavel Fux (1): qemu: Add support for changing timeout value to open unix monitor socket
 src/qemu/libvirtd_qemu.aug         |  3 +++
 src/qemu/qemu.conf                 | 12 ++++++++++++
 src/qemu/qemu_conf.c               |  2 ++
 src/qemu/qemu_conf.h               |  2 ++
 src/qemu/qemu_monitor.c            | 20 +++++++++++++++++---
 src/qemu/test_libvirtd_qemu.aug.in |  1 +
 6 files changed, 37 insertions(+), 3 deletions(-)
-- 1.8.5.2
I'm not going to ACK these until there's an agreement upstream, but just to express my opinion: I like these patches. I'm not sure about qemu internals, but I see some cases where this might be handy:

a) qemu accesses something on disk prior to creating the eventloop thread (and thus accept()-ing us on the monitor). The file qemu is accessing might be on an NFS which is currently unresponsive. Either it will be back in a while and then we can allow domain startup, or it won't - then we need to kill the qemu. And what 'a while' means is use-case specific => it should be configurable.

b) the system is temporarily under heavy load - e.g. I/O load - which means the kernel is not able to link and preload qemu and its libraries.

Michal
participants (4)
- Daniel P. Berrange
- Martin Kletzander
- Michal Privoznik
- Pavel Fux