[libvirt] [PATCH 0/2] RFC: make QEMU timeout monitor replies

This is an experimental patch to make the QEMU driver time out on monitor replies after 30 seconds. I have only tested it to the extent of booting a guest. I'm interested in whether people can get a bogus timeout to occur in any testing they perform. By 'bogus timeout', I mean libvirt reporting a timeout when QEMU is still working correctly. Ideally we should only time out when QEMU has been sent SIGSTOP, or has live-locked itself.

If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.

NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.

* src/qemu/qemu_monitor.c: Timeout waiting for reply after 30 seconds
---
 src/qemu/qemu_monitor.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 89a3f64..d9b6600 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -781,10 +781,19 @@ char *qemuMonitorNextCommandID(qemuMonitorPtr mon)
 }
 
 
+/* Give up waiting for reply after 30 seconds */
+#define QEMU_MONITOR_WAIT_TIME (1000ull * 30)
+
 int qemuMonitorSend(qemuMonitorPtr mon,
                     qemuMonitorMessagePtr msg)
 {
     int ret = -1;
+    unsigned long long now;
+    unsigned long long then;
+
+    if (virTimeMs(&now) < 0)
+        return -1;
+    then = now + QEMU_MONITOR_WAIT_TIME;
 
     /* Check whether qemu quited unexpectedly */
     if (mon->lastError.code != VIR_ERR_OK) {
@@ -798,9 +807,15 @@ int qemuMonitorSend(qemuMonitorPtr mon,
     qemuMonitorUpdateWatch(mon);
 
     while (!mon->msg->finished) {
-        if (virCondWait(&mon->notify, &mon->lock) < 0) {
-            qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                            _("Unable to wait on monitor condition"));
+        if (virCondWaitUntil(&mon->notify, &mon->lock, then) < 0) {
+            if (errno == ETIMEDOUT)
+                qemuReportError(VIR_ERR_OPERATION_TIMEOUT,
+                                "%s", _("no reply received from qemu"));
+            else
+                virReportSystemError(errno,
+                                     "%s", _("cannot wait on monitor condition"));
+            /* Ensure no further monitor commands can be run */
+            virCopyLastError(&mon->lastError);
             goto cleanup;
         }
     }
-- 
1.7.4.4
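
[Editorial aside: virCondWaitUntil() is libvirt's wrapper around a deadline-based condition wait. A minimal standalone sketch of the same pattern using plain pthreads, not libvirt's wrappers — the names and the wait_for_reply() helper are illustrative only, not code from the patch:

    /*
     * Sketch of a deadline-based wait: pthread_cond_timedwait() takes an
     * absolute deadline and returns ETIMEDOUT once it passes, so spurious
     * wakeups simply re-enter the loop until the reply arrives or the
     * deadline expires.
     */
    #include <errno.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t notify = PTHREAD_COND_INITIALIZER;
    static bool finished;   /* set, under 'lock', by the thread parsing the reply */

    /* Wait for 'finished', giving up after timeout_ms milliseconds. */
    static int wait_for_reply(unsigned long long timeout_ms)
    {
        struct timespec then;
        int ret = 0;

        clock_gettime(CLOCK_REALTIME, &then);
        then.tv_sec += timeout_ms / 1000;
        then.tv_nsec += (timeout_ms % 1000) * 1000000;
        if (then.tv_nsec >= 1000000000) {
            then.tv_sec++;
            then.tv_nsec -= 1000000000;
        }

        pthread_mutex_lock(&lock);
        while (!finished && ret == 0)
            ret = pthread_cond_timedwait(&notify, &lock, &then);
        pthread_mutex_unlock(&lock);

        if (ret == ETIMEDOUT) {
            fprintf(stderr, "no reply received\n");
            return -1;
        }
        return ret ? -1 : 0;
    }

The patch above applies exactly this idea: the absolute deadline 'then' is computed once, before sending the command, so the total wait is bounded regardless of how many times the condition variable wakes up.]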

On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.
NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.
The thing I'm most concerned about is that it is far too easy to get into such situations, especially since the disk cache subsystem in the Linux kernel is not the best thing in the world. While I agree that running guests on a loaded host is not very clever and guests should rather be migrated elsewhere, such a situation doesn't have to be intentional. In other words, in case of a malfunction of some kind (some processes go crazy, network disruptions, ...), QEMU may need more than the timeout to respond, and we will penalize an innocent QEMU process because we won't be able to control it any more even after the issues get fixed. Jirka

On 06/22/2011 11:05 AM, Jiri Denemark wrote:
On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.
NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.
The thing I'm most concerned about is that it is far too easy to get into such situations, especially since the disk cache subsystem in the Linux kernel is not the best thing in the world. While I agree that running guests on a loaded host is not very clever and guests should rather be migrated elsewhere, such a situation doesn't have to be intentional. In other words, in case of a malfunction of some kind (some processes go crazy, network disruptions, ...), QEMU may need more than the timeout to respond, and we will penalize an innocent QEMU process because we won't be able to control it any more even after the issues get fixed.
Is there any way to measure time spent by the child process, rather than just relying on wall-time elapsed? That is, when libvirt hits 30 seconds of wall time in waiting for a monitor, can it then check whether the child process has accumulated any execution time (likely hung) vs. no execution time (likely a starved system situation), and only give up in the former case? -- Eric Blake
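
[Editorial aside: a rough illustration of that check — Linux-specific, and not part of the proposed patch; the helper name and field parsing are only a sketch. The idea is to sample utime+stime from /proc/<pid>/stat before starting the wait and again once the deadline expires:

    /*
     * Read the CPU time (user + system, in clock ticks) accumulated by a
     * process.  utime/stime are the 14th and 15th fields of /proc/<pid>/stat;
     * parsing starts after the last ')' so the command name cannot confuse us.
     */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    static int qemu_cpu_ticks(pid_t pid, unsigned long long *ticks)
    {
        char path[64], buf[4096], *p;
        unsigned long long utime, stime;
        FILE *fp;

        snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
        if (!(fp = fopen(path, "r")))
            return -1;
        if (!fgets(buf, sizeof(buf), fp)) {
            fclose(fp);
            return -1;
        }
        fclose(fp);

        if (!(p = strrchr(buf, ')')))
            return -1;
        /* 11 fields (state ... cmajflt) sit between the ')' and utime. */
        if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %llu %llu",
                   &utime, &stime) != 2)
            return -1;

        *ticks = utime + stime;
        return 0;
    }

If the delta over the 30 seconds of wall time is zero, the process was most likely stopped or starved of CPU rather than live-locked, which matches the distinction Eric describes.]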

On Wed, Jun 22, 2011 at 11:26:27AM -0600, Eric Blake wrote:
On 06/22/2011 11:05 AM, Jiri Denemark wrote:
On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.
NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.
The thing I'm most concerned about is that it is far too easy to get into such situations, especially since the disk cache subsystem in the Linux kernel is not the best thing in the world. While I agree that running guests on a loaded host is not very clever and guests should rather be migrated elsewhere, such a situation doesn't have to be intentional. In other words, in case of a malfunction of some kind (some processes go crazy, network disruptions, ...), QEMU may need more than the timeout to respond, and we will penalize an innocent QEMU process because we won't be able to control it any more even after the issues get fixed.
Is there any way to measure time spent by the child process, rather than just relying on wall-time elapsed? That is, when libvirt hits 30 seconds of wall time in waiting for a monitor, can it then check whether the child process has accumulated any execution time (likely hung) vs. no execution time (likely a starved system situation), and only give up in the former case?
Well a STOP'ed child process won't accumulate any execution time, and you won't be able to discriminate just based on this, but I think we should be able to poke Linux to see if the process is in D state, for example. If we do mark the guest as non-responding, then being able to provide useful error information upon the associated API failure, like "Failed to contact domain: process stopped", "Failed to contact domain: blocked on I/O", or "Failed to contact domain: process looping", would be a really good thing. That probing and reporting can be done as a separate step though. Daniel
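
[Editorial aside: a sketch of that kind of probing — again Linux-specific and not part of either patch; the function name is hypothetical and the message strings just echo the examples above. It reads the "State:" line from /proc/<pid>/status and maps it onto a human-readable hint:

    /*
     * Translate the kernel's view of the QEMU process into a diagnostic.
     * 'T' covers the SIGSTOP/gdb case, 'D' uninterruptible I/O wait, and
     * 'R' a process that is runnable yet not answering the monitor.
     */
    #include <stdio.h>
    #include <sys/types.h>

    static const char *qemu_state_hint(pid_t pid)
    {
        char path[64], line[256];
        char state = '?';
        FILE *fp;

        snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
        if (!(fp = fopen(path, "r")))
            return "process has gone away";

        while (fgets(line, sizeof(line), fp)) {
            if (sscanf(line, "State: %c", &state) == 1)
                break;
        }
        fclose(fp);

        switch (state) {
        case 'T': return "process stopped";   /* SIGSTOP / gdb */
        case 'D': return "blocked on I/O";    /* uninterruptible sleep */
        case 'R': return "process looping";   /* running but unresponsive */
        default:  return "not responding";
        }
    }
]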

On 24.06.2011 07:19, Daniel Veillard wrote:
On Wed, Jun 22, 2011 at 11:26:27AM -0600, Eric Blake wrote:
On 06/22/2011 11:05 AM, Jiri Denemark wrote:
On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.
NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.
The thing I'm most concerned about is that it is far too easy to get into such situations, especially since the disk cache subsystem in the Linux kernel is not the best thing in the world. While I agree that running guests on a loaded host is not very clever and guests should rather be migrated elsewhere, such a situation doesn't have to be intentional. In other words, in case of a malfunction of some kind (some processes go crazy, network disruptions, ...), QEMU may need more than the timeout to respond, and we will penalize an innocent QEMU process because we won't be able to control it any more even after the issues get fixed.
Is there any way to measure time spent by the child process, rather than just relying on wall-time elapsed? That is, when libvirt hits 30 seconds of wall time in waiting for a monitor, can it then check whether the child process has accumulated any execution time (likely hung) vs. no execution time (likely a starved system situation), and only give up in the former case?
Well a STOP'ed child process won't accumulate any execution time, and you won't be able to discriminate just based on this, but I think we should be able to poke Linux to see if the process is in D state, for example. If we do mark the guest as non-responding, then being able to provide useful error information upon the associated API failure, like "Failed to contact domain: process stopped", "Failed to contact domain: blocked on I/O", or "Failed to contact domain: process looping",
would be a really good thing. That probing and reporting can be done as a separate step though.
Daniel
To me this looks like solving the halting problem. That means that for some cases we might be able to tell that qemu will not answer anymore, but for others we will not. I agree that if qemu (and thus the libvirt API call) does not return in ~30 seconds, users get anxious, but it would be nice if we could at least send destroy to unresponsive domains. Michal
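
[Editorial aside: a hedged sketch of what a management client could do once monitor calls can fail with VIR_ERR_OPERATION_TIMEOUT, using only the public API. The suspend_or_destroy() wrapper is hypothetical, and whether destroy itself can always make progress against a wedged QEMU is exactly the open question in this subthread:

    /*
     * If a monitor-backed operation times out, fall back to virDomainDestroy(),
     * which the QEMU driver implements by signalling the process rather than
     * talking to the monitor.
     */
    #include <libvirt/libvirt.h>
    #include <libvirt/virterror.h>
    #include <stdio.h>

    static int suspend_or_destroy(virDomainPtr dom)
    {
        virErrorPtr err;

        if (virDomainSuspend(dom) == 0)
            return 0;

        err = virGetLastError();
        if (err && err->code == VIR_ERR_OPERATION_TIMEOUT) {
            fprintf(stderr, "monitor unresponsive, destroying domain\n");
            return virDomainDestroy(dom);
        }
        return -1;
    }
]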

On Wed, Jun 22, 2011 at 07:05:08PM +0200, Jiri Denemark wrote:
On Wed, Jun 22, 2011 at 16:47:18 +0100, Daniel P. Berrange wrote:
If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.
NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.
The thing I'm most concerned about is that it is far too easy to get into such situations, especially since the disk cache subsystem in the Linux kernel is not the best thing in the world. While I agree that running guests on a loaded host is not very clever and guests should rather be migrated elsewhere, such a situation doesn't have to be intentional. In other words, in case of a malfunction of some kind (some processes go crazy, network disruptions, ...), QEMU may need more than the timeout to respond, and we will penalize an innocent QEMU process because we won't be able to control it any more even after the issues get fixed.
It's clearly a trade-off, and that is the reason why it must be configurable; 30s is a lot already. It's a first shot, and I'm sure feedback will suggest adding more logic around that basic timeout-based error detection. Right now the problem is that never failing the call is a serious issue, and it can block the whole process too (like on daemon restart, when trying to reconnect to a stuck guest). Daniel

On 06/22/2011 09:47 AM, Daniel P. Berrange wrote:
If the QEMU process has been stopped (kill -STOP/gdb), or the QEMU process has live-locked itself, then we will never get a reply from the monitor. We should not wait forever in this case, but instead timeout after a reasonable amount of time.
NB if the host has high CPU load, or a single monitor command intentionally takes a long time, then this will cause bogus failures. In the case of high CPU load, arguably the guest should have been migrated elsewhere, since you can't effectively manage guests on a host if QEMU is taking > 30 seconds to reply to simple commands. Since we use background migration, there should not be any commands which take significant time to execute any more.
* src/qemu/qemu_monitor.c: Timeout waiting for reply after 30 seconds
---
 src/qemu/qemu_monitor.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)
This didn't make it into RC1, so I'm now torn on whether it is important enough to be in RC2 or whether it is enough of a feature to defer to post-0.9.3. ACK to the code, once we decide whether (when?) to apply it. -- Eric Blake

This makes the monitor timeout limit configurable via the qemu.conf configuration file.

* src/conf/capabilities.h: Allow private data to be passed to the virDomainObjPtr private allocation function
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c, src/test/test_driver.c, src/uml/uml_driver.c, src/vmware/vmware_driver.c: Update for API change in capabilities
* src/qemu/qemu.conf, src/qemu/test_libvirtd_qemu.aug, src/qemu/libvirtd_qemu.aug: Add 'monitor_timeout' config parameter
* src/qemu/qemu_conf.c, src/qemu/qemu_conf.h: Load monitor timeout
* src/qemu/qemu_domain.c, src/qemu/qemu_domain.h, src/qemu/qemu_driver.c, src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h, src/qemu/qemu_process.c: Use configurable monitor timeout
---
 src/conf/capabilities.h         |    3 ++-
 src/libxl/libxl_driver.c        |    2 +-
 src/lxc/lxc_driver.c            |    2 +-
 src/qemu/libvirtd_qemu.aug      |    1 +
 src/qemu/qemu.conf              |   12 ++++++++++++
 src/qemu/qemu_conf.c            |    6 ++++++
 src/qemu/qemu_conf.h            |    2 ++
 src/qemu/qemu_domain.c          |   15 ++++++++-------
 src/qemu/qemu_domain.h          |    3 ++-
 src/qemu/qemu_driver.c          |    2 +-
 src/qemu/qemu_monitor.c         |    8 ++++----
 src/qemu/qemu_monitor.h         |    1 +
 src/qemu/qemu_process.c         |    1 +
 src/qemu/test_libvirtd_qemu.aug |    4 ++++
 src/test/test_driver.c          |    2 +-
 src/uml/uml_driver.c            |    2 +-
 src/vmware/vmware_driver.c      |    2 +-
 17 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h
index e2fa1d6..dd4ad93 100644
--- a/src/conf/capabilities.h
+++ b/src/conf/capabilities.h
@@ -145,7 +145,8 @@ struct _virCaps {
     const char *defaultDiskDriverName;
     const char *defaultDiskDriverType;
     int defaultConsoleTargetType;
-    void *(*privateDataAllocFunc)(void);
+    void *privateDataOpaque;
+    void *(*privateDataAllocFunc)(void *);
     void (*privateDataFreeFunc)(void *);
     int (*privateDataXMLFormat)(virBufferPtr, void *);
     int (*privateDataXMLParse)(xmlXPathContextPtr, void *);
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 5a5951f..19edb4f 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -79,7 +79,7 @@ libxlDriverUnlock(libxlDriverPrivatePtr driver)
 }
 
 static void *
-libxlDomainObjPrivateAlloc(void)
+libxlDomainObjPrivateAlloc(void *opaque ATTRIBUTE_UNUSED)
 {
     libxlDomainObjPrivatePtr priv;
 
diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index d0f7158..5f18f84 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -82,7 +82,7 @@ static void lxcDriverUnlock(lxc_driver_t *driver)
     virMutexUnlock(&driver->lock);
 }
 
-static void *lxcDomainObjPrivateAlloc(void)
+static void *lxcDomainObjPrivateAlloc(void *opaque ATTRIBUTE_UNUSED)
 {
     lxcDomainObjPrivatePtr priv;
 
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug
index 66858ae..c0a1e95 100644
--- a/src/qemu/libvirtd_qemu.aug
+++ b/src/qemu/libvirtd_qemu.aug
@@ -49,6 +49,7 @@ module Libvirtd_qemu =
                  | bool_entry "set_process_name"
                  | int_entry "max_processes"
                  | str_entry "lock_manager"
+                 | int_entry "monitor_timeout"
 
    (* Each enty in the config is one of the following three ... *)
    let entry = vnc_entry
diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 2c50d9d..f8d1187 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -287,3 +287,15 @@
 # this
 #
 # lock_manager = "fcntl"
+
+# To change the amount of time we wait for a reply
+# from the QEMU monitor. QEMU monitor replies should
+# be pretty quick (< 5 seconds), but if the host is
+# seriously badly loaded this may take more time.
+# The default timeout, 30 seconds, ought to be plenty.
+# If the host is so loaded that 30 seconds is too
+# short it may be desirable to migrate some VMs to
+# another host to reduce load, rather than increase
+# the libvirt timeout.
+#
+# monitor_timeout = 30
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 46f6976..0024b35 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -81,6 +81,7 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
     /* Setup critical defaults */
     driver->dynamicOwnership = 1;
     driver->clearEmulatorCapabilities = 1;
+    driver->monTimeout = 30;
 
     if (!(driver->vncListen = strdup("127.0.0.1"))) {
         virReportOOMError();
@@ -444,6 +445,11 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
             VIR_ERROR(_("Failed to load lock manager %s"), p->str);
     }
 
+    p = virConfGetValue (conf, "monitor_timeout");
+    CHECK_TYPE ("monitor_timeout", VIR_CONF_LONG);
+    if (p) driver->monTimeout = p->l;
+
+
     virConfFree (conf);
     return 0;
 }
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index bf6dcf4..2d08086 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -127,6 +127,8 @@ struct qemud_driver {
     virSysinfoDefPtr hostsysinfo;
 
     virLockManagerPluginPtr lockManager;
+
+    int monTimeout;
 };
 
 typedef struct _qemuDomainCmdlineDef qemuDomainCmdlineDef;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index fab316f..8b39183 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -79,9 +79,10 @@ void qemuDomainEventQueue(struct qemud_driver *driver,
 }
 
 
-static void *qemuDomainObjPrivateAlloc(void)
+static void *qemuDomainObjPrivateAlloc(void *opaque)
 {
     qemuDomainObjPrivatePtr priv;
+    struct qemud_driver *driver = opaque;
 
     if (VIR_ALLOC(priv) < 0)
         return NULL;
@@ -93,6 +94,7 @@ static void *qemuDomainObjPrivateAlloc(void)
         ignore_value(virCondDestroy(&priv->jobCond));
         goto initfail;
     }
+    priv->monTimeout = driver->monTimeout;
 
     return priv;
 
@@ -454,9 +456,11 @@ qemuDomainDefNamespaceHref(void)
 }
 
 
-void qemuDomainSetPrivateDataHooks(virCapsPtr caps)
+void qemuDomainSetPrivateDataHooks(struct qemud_driver *driver,
+                                   virCapsPtr caps)
 {
     /* Domain XML parser hooks */
+    caps->privateDataOpaque = driver;
     caps->privateDataAllocFunc = qemuDomainObjPrivateAlloc;
     caps->privateDataFreeFunc = qemuDomainObjPrivateFree;
     caps->privateDataXMLFormat = qemuDomainObjPrivateXMLFormat;
@@ -483,9 +487,6 @@ void qemuDomainSetNamespaceHooks(virCapsPtr caps)
  * successful calls must be followed by EndJob eventually
  */
 
-/* Give up waiting for mutex after 30 seconds */
-#define QEMU_JOB_WAIT_TIME (1000ull * 30)
-
 int qemuDomainObjBeginJob(virDomainObjPtr obj)
 {
     qemuDomainObjPrivatePtr priv = obj->privateData;
@@ -494,7 +495,7 @@ int qemuDomainObjBeginJob(virDomainObjPtr obj)
 
     if (virTimeMs(&now) < 0)
         return -1;
-    then = now + QEMU_JOB_WAIT_TIME;
+    then = now + (priv->monTimeout * 1000ull);
 
     virDomainObjRef(obj);
 
@@ -535,7 +536,7 @@ int qemuDomainObjBeginJobWithDriver(struct qemud_driver *driver,
 
     if (virTimeMs(&now) < 0)
         return -1;
-    then = now + QEMU_JOB_WAIT_TIME;
+    then = now + (priv->monTimeout * 1000ull);
 
     virDomainObjRef(obj);
     qemuDriverUnlock(driver);
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 3d041fc..2a0ba6d 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -81,6 +81,7 @@ struct _qemuDomainObjPrivate {
     int monJSON;
     bool monError;
     unsigned long long monStart;
+    int monTimeout;
     bool gotShutdown;
 
     int nvcpupids;
@@ -105,7 +106,7 @@ void qemuDomainEventFlush(int timer ATTRIBUTE_UNUSED, void *opaque);
 void qemuDomainEventQueue(struct qemud_driver *driver,
                           virDomainEventPtr event);
 
-void qemuDomainSetPrivateDataHooks(virCapsPtr caps);
+void qemuDomainSetPrivateDataHooks(struct qemud_driver *driver, virCapsPtr caps);
 void qemuDomainSetNamespaceHooks(virCapsPtr caps);
 
 int qemuDomainObjBeginJob(virDomainObjPtr obj) ATTRIBUTE_RETURN_CHECK;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 01587e8..8aea541 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -241,7 +241,7 @@ qemuCreateCapabilities(virCapsPtr oldcaps,
         caps->defaultDiskDriverType = "raw";
     }
 
-    qemuDomainSetPrivateDataHooks(caps);
+    qemuDomainSetPrivateDataHooks(driver, caps);
     qemuDomainSetNamespaceHooks(caps);
 
     if (virGetHostUUID(caps->host.host_uuid)) {
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index d9b6600..a118d49 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -51,6 +51,7 @@ struct _qemuMonitor {
     int fd;
     int watch;
     int hasSendFD;
+    int timeout;
 
     virDomainObjPtr vm;
 
@@ -653,6 +654,7 @@ qemuMonitorPtr
 qemuMonitorOpen(virDomainObjPtr vm,
                 virDomainChrSourceDefPtr config,
                 int json,
+                int timeout,
                 qemuMonitorCallbacksPtr cb)
 {
     qemuMonitorPtr mon;
@@ -684,6 +686,7 @@ qemuMonitorOpen(virDomainObjPtr vm,
     mon->fd = -1;
     mon->refs = 1;
     mon->vm = vm;
+    mon->timeout = timeout;
     mon->json = json;
     mon->cb = cb;
     qemuMonitorLock(mon);
@@ -781,9 +784,6 @@ char *qemuMonitorNextCommandID(qemuMonitorPtr mon)
 }
 
 
-/* Give up waiting for reply after 30 seconds */
-#define QEMU_MONITOR_WAIT_TIME (1000ull * 30)
-
 int qemuMonitorSend(qemuMonitorPtr mon,
                     qemuMonitorMessagePtr msg)
 {
@@ -793,7 +793,7 @@ int qemuMonitorSend(qemuMonitorPtr mon,
 
     if (virTimeMs(&now) < 0)
         return -1;
-    then = now + QEMU_MONITOR_WAIT_TIME;
+    then = now + (mon->timeout * 1000ull);
 
     /* Check whether qemu quited unexpectedly */
     if (mon->lastError.code != VIR_ERR_OK) {
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 3bb0269..1cb9b44 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -130,6 +130,7 @@ char *qemuMonitorEscapeShell(const char *in);
 qemuMonitorPtr qemuMonitorOpen(virDomainObjPtr vm,
                                virDomainChrSourceDefPtr config,
                                int json,
+                               int timeout,
                                qemuMonitorCallbacksPtr cb);
 
 void qemuMonitorClose(qemuMonitorPtr mon);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b441137..efd72db 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -734,6 +734,7 @@ qemuConnectMonitor(struct qemud_driver *driver, virDomainObjPtr vm)
     priv->mon = qemuMonitorOpen(vm,
                                 priv->monConfig,
                                 priv->monJSON,
+                                priv->monTimeout,
                                 &monitorCallbacks);
 
     /* Safe to ignore value since ref count was incremented above */
diff --git a/src/qemu/test_libvirtd_qemu.aug b/src/qemu/test_libvirtd_qemu.aug
index b1f9114..4cd7957 100644
--- a/src/qemu/test_libvirtd_qemu.aug
+++ b/src/qemu/test_libvirtd_qemu.aug
@@ -115,6 +115,8 @@ vnc_auto_unix_socket = 1
 max_processes = 12345
 
 lock_manager = \"fcntl\"
+
+monitor_timeout = 60
 "
 
    test Libvirtd_qemu.lns get conf =
@@ -240,3 +242,5 @@ lock_manager = \"fcntl\"
 { "max_processes" = "12345" }
 { "#empty" }
 { "lock_manager" = "fcntl" }
+{ "#empty" }
+{ "monitor_timeout" = "60" }
diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index 6c8b9cf..ead8db9 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -132,7 +132,7 @@ static void testDriverUnlock(testConnPtr driver)
     virMutexUnlock(&driver->lock);
 }
 
-static void *testDomainObjPrivateAlloc(void)
+static void *testDomainObjPrivateAlloc(void *opaque ATTRIBUTE_UNUSED)
 {
     testDomainObjPrivatePtr priv;
 
diff --git a/src/uml/uml_driver.c b/src/uml/uml_driver.c
index e557fe8..8d3dc5e 100644
--- a/src/uml/uml_driver.c
+++ b/src/uml/uml_driver.c
@@ -77,7 +77,7 @@ struct _umlDomainObjPrivate {
 
 static int umlShutdown(void);
 
-static void *umlDomainObjPrivateAlloc(void)
+static void *umlDomainObjPrivateAlloc(void *opaque ATTRIBUTE_UNUSED)
 {
     umlDomainObjPrivatePtr priv;
 
diff --git a/src/vmware/vmware_driver.c b/src/vmware/vmware_driver.c
index 5e2c4ba..2f8640d 100644
--- a/src/vmware/vmware_driver.c
+++ b/src/vmware/vmware_driver.c
@@ -49,7 +49,7 @@ vmwareDriverUnlock(struct vmware_driver *driver)
 }
 
 static void *
-vmwareDataAllocFunc(void)
+vmwareDataAllocFunc(void *opaque ATTRIBUTE_UNUSED)
 {
     vmwareDomainPtr dom;
 
-- 
1.7.4.4

On 06/22/2011 09:47 AM, Daniel P. Berrange wrote:
This makes the monitor timeout limit configurable via the qemu.conf configuration file.
* src/conf/capabilities.h: Allow private data to be passed to the virDomainObjPtr private allocation function
* src/libxl/libxl_driver.c, src/lxc/lxc_driver.c, src/test/test_driver.c, src/uml/uml_driver.c, src/vmware/vmware_driver.c: Update for API change in capabilities
* src/qemu/qemu.conf, src/qemu/test_libvirtd_qemu.aug,
Do we have any HTML documentation for conf files, or is the mention of the new option in qemu.conf sufficient?
+++ b/src/qemu/qemu.conf
@@ -287,3 +287,15 @@
 # this
 #
 # lock_manager = "fcntl"
+
+# To change the amount of time we wait for a reply
+# from the QEMU monitor. QEMU monitor replies should
+# be pretty quick (< 5 seconds), but if the host is
+# seriously badly loaded this may take more time.
Double adverb 'seriously badly' doesn't sound very professional, but I don't know that I have a better suggestion.
+    p = virConfGetValue (conf, "monitor_timeout");
+    CHECK_TYPE ("monitor_timeout", VIR_CONF_LONG);
+    if (p) driver->monTimeout = p->l;
arbitrary user value...
@@ -494,7 +495,7 @@ int qemuDomainObjBeginJob(virDomainObjPtr obj)
 
     if (virTimeMs(&now) < 0)
         return -1;
-    then = now + QEMU_JOB_WAIT_TIME;
+    then = now + (priv->monTimeout * 1000ull);
which can then lead to integer overflow. We should have some sanity checking on the user's input. Perhaps just cap things so that if the multiply would overflow, we change driver->monTimeout to -1 (anyone with that large a timeout won't be alive to wait around to see that we stopped short of infinity :), as well as have code that special-cases -1 as a request to block rather than time out. -- Eric Blake
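
[Editorial aside: one possible shape of that sanity check — the sanitizeMonitorTimeout() helper and the -1 convention are hypothetical, not code from the patch. The value read from the config file arrives as a long and is stored into an int, so it should be clamped before use, and -1 could be reserved to mean "block without a deadline":

    /*
     * Clamp a user-supplied monitor timeout (in seconds) to a sane range.
     * -1 means "wait forever"; anything else is bounded so the later
     * millisecond conversion cannot misbehave.
     */
    #include <limits.h>

    #define MON_TIMEOUT_DEFAULT 30
    #define MON_TIMEOUT_MAX     (INT_MAX / 1000)

    static int sanitizeMonitorTimeout(long value)
    {
        if (value == -1)
            return -1;                  /* block without a deadline */
        if (value <= 0)
            return MON_TIMEOUT_DEFAULT; /* reject nonsense, keep the default */
        if (value > MON_TIMEOUT_MAX)
            return MON_TIMEOUT_MAX;
        return (int)value;
    }
]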
participants (5)
- Daniel P. Berrange
- Daniel Veillard
- Eric Blake
- Jiri Denemark
- Michal Privoznik