[libvirt] [PATCH 0/3 v2] More graceful handling of QEMU monitor failures

An update of https://www.redhat.com/archives/libvir-list/2011-May/msg00657.html Currently when libvirt has a serious error doing I/O and/or parsing of the QEMU monitor, it will kill off the guest. Application developers have expressed a desire for more graceful handling of this scenario. In particular to allow the guest OS to continue to run, without any further monitor interactons, and then kill/restart it at a time which is convenient to the guest admin/apps. New in this posting: - Change the name of the event to 'CONTROL_ERROR' and use the generic callback, instead of passing a type field - Add python dispatch code - Add 3rd patch to improve qemu monitor error reporting

This introduces a new domain VIR_DOMAIN_EVENT_ID_CONTROL_ERROR Which uses the existing generic callback typedef void (*virConnectDomainEventGenericCallback)(virConnectPtr conn, virDomainPtr dom, void *opaque); This event is intended to be emitted when there is a failure in some part of the domain virtualization system. Whether the domain continues to run/exist after the failure is an implementation detail specific to the hypervisor. The idea is that with some types of failure, hypervisors may prefer to leave the domain running in a "degraded" mode of operation. For example, if something goes wrong with the QEMU monitor, it is possible to leave the guest OS running quite happily. The mgmt app will simply loose the ability todo various tasks. The mgmt app can then choose how/when to deal with the failure that occured. * daemon/remote.c: Dispatch of new event * examples/domain-events/events-c/event-test.c: Demo catch of event * include/libvirt/libvirt.h.in: Define event ID and callback * src/conf/domain_event.c, src/conf/domain_event.h: Internal event handling * src/remote/remote_driver.c: Receipt of new event from daemon * src/remote/remote_protocol.x: Wire protocol for new event --- daemon/remote.c | 29 ++++++++++++++++++++++++ examples/domain-events/events-c/event-test.c | 19 ++++++++++++++++ include/libvirt/libvirt.h.in | 1 + python/libvirt-override.c | 3 ++ src/conf/domain_event.c | 23 +++++++++++++++++++ src/conf/domain_event.h | 2 + src/libvirt_private.syms | 2 + src/remote/remote_driver.c | 31 ++++++++++++++++++++++++++ src/remote/remote_protocol.x | 7 +++++- 9 files changed, 116 insertions(+), 1 deletions(-) diff --git a/daemon/remote.c b/daemon/remote.c index 80783b3..59a7377 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -380,6 +380,34 @@ static int remoteRelayDomainEventGraphics(virConnectPtr conn ATTRIBUTE_UNUSED, } +static int remoteRelayDomainEventControlError(virConnectPtr conn ATTRIBUTE_UNUSED, + virDomainPtr dom, + void *opaque) +{ + struct qemud_client *client = opaque; + remote_domain_event_control_error_msg data; + + if (!client) + return -1; + + VIR_DEBUG("Relaying domain control error %s %d", dom->name, dom->id); + + virMutexLock(&client->lock); + + /* build return data */ + memset(&data, 0, sizeof data); + make_nonnull_domain(&data.dom, dom); + + remoteDispatchDomainEventSend(client, + REMOTE_PROC_DOMAIN_EVENT_CONTROL_ERROR, + (xdrproc_t)xdr_remote_domain_event_control_error_msg, &data); + + virMutexUnlock(&client->lock); + + return 0; +} + + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventReboot), @@ -388,6 +416,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventIOError), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventGraphics), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventIOErrorReason), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventControlError), }; verify(ARRAY_CARDINALITY(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/examples/domain-events/events-c/event-test.c b/examples/domain-events/events-c/event-test.c index 2da58b8..4766a0d 100644 --- a/examples/domain-events/events-c/event-test.c +++ b/examples/domain-events/events-c/event-test.c @@ -245,6 +245,17 @@ static int myDomainEventGraphicsCallback(virConnectPtr conn ATTRIBUTE_UNUSED, return 0; } +static int myDomainEventControlErrorCallback(virConnectPtr conn ATTRIBUTE_UNUSED, + virDomainPtr dom, + void *opaque ATTRIBUTE_UNUSED) +{ + printf("%s EVENT: Domain %s(%d) control error\n", __func__, virDomainGetName(dom), + virDomainGetID(dom)); + + return 0; +} + + static void myFreeFunc(void *opaque) { char *str = opaque; @@ -278,6 +289,7 @@ int main(int argc, char **argv) int callback5ret = -1; int callback6ret = -1; int callback7ret = -1; + int callback8ret = -1; struct sigaction action_stop; memset(&action_stop, 0, sizeof action_stop); @@ -336,6 +348,11 @@ int main(int argc, char **argv) VIR_DOMAIN_EVENT_ID_GRAPHICS, VIR_DOMAIN_EVENT_CALLBACK(myDomainEventGraphicsCallback), strdup("callback graphics"), myFreeFunc); + callback8ret = virConnectDomainEventRegisterAny(dconn, + NULL, + VIR_DOMAIN_EVENT_ID_CONTROL_ERROR, + VIR_DOMAIN_EVENT_CALLBACK(myDomainEventControlErrorCallback), + strdup("callback control error"), myFreeFunc); if ((callback1ret != -1) && (callback2ret != -1) && @@ -360,6 +377,8 @@ int main(int argc, char **argv) virConnectDomainEventDeregisterAny(dconn, callback5ret); virConnectDomainEventDeregisterAny(dconn, callback6ret); virConnectDomainEventDeregisterAny(dconn, callback7ret); + if (callback8ret != -1) + virConnectDomainEventDeregisterAny(dconn, callback8ret); } VIR_DEBUG("Closing connection"); diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 7cd6e13..448951f 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -2525,6 +2525,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_IO_ERROR = 4, /* virConnectDomainEventIOErrorCallback */ VIR_DOMAIN_EVENT_ID_GRAPHICS = 5, /* virConnectDomainEventGraphicsCallback */ VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON = 6, /* virConnectDomainEventIOErrorReasonCallback */ + VIR_DOMAIN_EVENT_ID_CONTROL_ERROR = 7, /* virConnectDomainEventGenericCallback */ /* * NB: this enum value will increase over time as new events are diff --git a/python/libvirt-override.c b/python/libvirt-override.c index a151e78..d15a865 100644 --- a/python/libvirt-override.c +++ b/python/libvirt-override.c @@ -3485,6 +3485,9 @@ libvirt_virConnectDomainEventRegisterAny(ATTRIBUTE_UNUSED PyObject * self, case VIR_DOMAIN_EVENT_ID_GRAPHICS: cb = VIR_DOMAIN_EVENT_CALLBACK(libvirt_virConnectDomainEventGraphicsCallback); break; + case VIR_DOMAIN_EVENT_ID_CONTROL_ERROR: + cb = VIR_DOMAIN_EVENT_CALLBACK(libvirt_virConnectDomainEventGenericCallback); + break; } if (!cb) { diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index b85765e..34a9d91 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -875,6 +875,24 @@ virDomainEventPtr virDomainEventGraphicsNewFromObj(virDomainObjPtr obj, } +virDomainEventPtr virDomainEventControlErrorNewFromDom(virDomainPtr dom) +{ + virDomainEventPtr ev = + virDomainEventNewInternal(VIR_DOMAIN_EVENT_ID_CONTROL_ERROR, + dom->id, dom->name, dom->uuid); + return ev; +} + + +virDomainEventPtr virDomainEventControlErrorNewFromObj(virDomainObjPtr obj) +{ + virDomainEventPtr ev = + virDomainEventNewInternal(VIR_DOMAIN_EVENT_ID_CONTROL_ERROR, + obj->def->id, obj->def->name, obj->def->uuid); + return ev; +} + + /** * virDomainEventQueuePop: * @evtQueue: the queue of events @@ -1004,6 +1022,11 @@ void virDomainEventDispatchDefaultFunc(virConnectPtr conn, cbopaque); break; + case VIR_DOMAIN_EVENT_ID_CONTROL_ERROR: + (cb)(conn, dom, + cbopaque); + break; + default: VIR_WARN("Unexpected event ID %d", event->eventID); break; diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index efc05f9..68dc8a8 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -167,6 +167,8 @@ virDomainEventPtr virDomainEventGraphicsNewFromObj(virDomainObjPtr obj, virDomainEventGraphicsAddressPtr remote, const char *authScheme, virDomainEventGraphicsSubjectPtr subject); +virDomainEventPtr virDomainEventControlErrorNewFromDom(virDomainPtr dom); +virDomainEventPtr virDomainEventControlErrorNewFromObj(virDomainObjPtr obj); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 7b6151c..6036afe 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -370,6 +370,8 @@ virDomainEventCallbackListPurgeMarked; virDomainEventCallbackListRemove; virDomainEventCallbackListRemoveConn; virDomainEventCallbackListRemoveID; +virDomainEventControlErrorNewFromDom; +virDomainEventControlErrorNewFromObj; virDomainEventDispatch; virDomainEventDispatchDefaultFunc; virDomainEventFree; diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 8c69743..97e9e2f 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -4093,6 +4093,33 @@ no_memory: } +static virDomainEventPtr +remoteDomainReadEventControlError(virConnectPtr conn, XDR *xdr) +{ + remote_domain_event_control_error_msg msg; + virDomainPtr dom; + virDomainEventPtr event = NULL; + memset (&msg, 0, sizeof msg); + + /* unmarshall parameters, and process it*/ + if (! xdr_remote_domain_event_control_error_msg(xdr, &msg) ) { + remoteError(VIR_ERR_RPC, "%s", + _("unable to demarshall reboot event")); + return NULL; + } + + dom = get_nonnull_domain(conn,msg.dom); + if (!dom) + return NULL; + + event = virDomainEventControlErrorNewFromDom(dom); + xdr_free ((xdrproc_t) &xdr_remote_domain_event_control_error_msg, (char *) &msg); + + virDomainFree(dom); + return event; +} + + static virDrvOpenStatus ATTRIBUTE_NONNULL (1) remoteSecretOpen(virConnectPtr conn, virConnectAuthPtr auth, int flags) { @@ -5903,6 +5930,10 @@ processCallDispatchMessage(virConnectPtr conn, struct private_data *priv, event = remoteDomainReadEventGraphics(conn, xdr); break; + case REMOTE_PROC_DOMAIN_EVENT_CONTROL_ERROR: + event = remoteDomainReadEventControlError(conn, xdr); + break; + default: VIR_DEBUG("Unexpected event proc %d", hdr->proc); break; diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index f0da95d..61f994f 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -2042,6 +2042,10 @@ struct remote_domain_migrate_confirm3_args { int cancelled; }; +struct remote_domain_event_control_error_msg { + remote_nonnull_domain dom; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -2291,7 +2295,8 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_MIGRATE_PERFORM3 = 216, /* skipgen skipgen */ REMOTE_PROC_DOMAIN_MIGRATE_FINISH3 = 217, /* skipgen skipgen */ REMOTE_PROC_DOMAIN_MIGRATE_CONFIRM3 = 218, /* skipgen skipgen */ - REMOTE_PROC_DOMAIN_SET_SCHEDULER_PARAMETERS_FLAGS = 219 /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_SET_SCHEDULER_PARAMETERS_FLAGS = 219, /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_EVENT_CONTROL_ERROR = 220 /* skipgen skipgen */ /* * Notice how the entries are grouped in sets of 10 ? -- 1.7.4.4

Currently whenever there is any failure with parsing the monitor, this is treated in the same was as end-of-file (ie QEMU quit). The domain is terminated, if not already dead. With this change, failures in parsing the monitor stream do not result in the death of QEMU. The guest continues running unchanged, but all further use of the monitor will be disabled. The VMM_FAILURE event will be emitted, and the mgmt application can decide when to kill/restart the guest to re-gain control * src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Run a different callback for monitor EOF vs error conditions. * src/qemu/qemu_process.c: Emit VMM_FAILURE event when monitor fails --- src/qemu/qemu_monitor.c | 45 +++++++++++++++++++++++++---------------- src/qemu/qemu_monitor.h | 7 +++-- src/qemu/qemu_process.c | 50 ++++++++++++++++++++++++++++++++++------------ 3 files changed, 68 insertions(+), 34 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 5186f99..28e5252 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -517,7 +517,8 @@ static void qemuMonitorUpdateWatch(qemuMonitorPtr mon) static void qemuMonitorIO(int watch, int fd, int events, void *opaque) { qemuMonitorPtr mon = opaque; - int quit = 0, failed = 0; + bool error = false; + bool eof = false; /* lock access to the monitor and protect fd */ qemuMonitorLock(mon); @@ -528,27 +529,27 @@ qemuMonitorIO(int watch, int fd, int events, void *opaque) { if (mon->fd != fd || mon->watch != watch) { VIR_ERROR(_("event from unexpected fd %d!=%d / watch %d!=%d"), mon->fd, fd, mon->watch, watch); - failed = 1; + error = true; } else { if (!mon->lastErrno && events & VIR_EVENT_HANDLE_WRITABLE) { int done = qemuMonitorIOWrite(mon); if (done < 0) - failed = 1; + error = 1; events &= ~VIR_EVENT_HANDLE_WRITABLE; } if (!mon->lastErrno && events & VIR_EVENT_HANDLE_READABLE) { int got = qemuMonitorIORead(mon); if (got < 0) - failed = 1; + error = true; /* Ignore hangup/error events if we read some data, to * give time for that data to be consumed */ if (got > 0) { events = 0; if (qemuMonitorIOProcess(mon) < 0) - failed = 1; + error = true; } else events &= ~VIR_EVENT_HANDLE_READABLE; } @@ -572,36 +573,44 @@ qemuMonitorIO(int watch, int fd, int events, void *opaque) { mon->msg->lastErrno = EIO; virCondSignal(&mon->notify); } - quit = 1; + eof = 1; } else if (events) { VIR_ERROR(_("unhandled fd event %d for monitor fd %d"), events, mon->fd); - failed = 1; + error = 1; } } + if (eof || error) + mon->lastErrno = EIO; + + qemuMonitorUpdateWatch(mon); + /* We have to unlock to avoid deadlock against command thread, * but is this safe ? I think it is, because the callback * will try to acquire the virDomainObjPtr mutex next */ - if (failed || quit) { - void (*eofNotify)(qemuMonitorPtr, virDomainObjPtr, int) + if (eof) { + void (*eofNotify)(qemuMonitorPtr, virDomainObjPtr) = mon->cb->eofNotify; virDomainObjPtr vm = mon->vm; - /* If qemu quited unexpectedly, and we may try to send monitor - * command later. But we have no chance to wake up it. So set - * mon->lastErrno to EIO, and check it before sending monitor - * command. - */ - if (!mon->lastErrno) - mon->lastErrno = EIO; + /* Make sure anyone waiting wakes up now */ + virCondSignal(&mon->notify); + if (qemuMonitorUnref(mon) > 0) + qemuMonitorUnlock(mon); + VIR_DEBUG("Triggering EOF callback"); + (eofNotify)(mon, vm); + } else if (error) { + void (*errorNotify)(qemuMonitorPtr, virDomainObjPtr) + = mon->cb->errorNotify; + virDomainObjPtr vm = mon->vm; /* Make sure anyone waiting wakes up now */ virCondSignal(&mon->notify); if (qemuMonitorUnref(mon) > 0) qemuMonitorUnlock(mon); - VIR_DEBUG("Triggering EOF callback error? %d", failed); - (eofNotify)(mon, vm, failed); + VIR_DEBUG("Triggering error callback"); + (errorNotify)(mon, vm); } else { if (qemuMonitorUnref(mon) > 0) qemuMonitorUnlock(mon); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 05c3359..0adef81 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -67,9 +67,10 @@ struct _qemuMonitorCallbacks { virDomainObjPtr vm); void (*eofNotify)(qemuMonitorPtr mon, - virDomainObjPtr vm, - int withError); - /* XXX we'd really like to avoid virCOnnectPtr here + virDomainObjPtr vm); + void (*errorNotify)(qemuMonitorPtr mon, + virDomainObjPtr vm); + /* XXX we'd really like to avoid virConnectPtr here * It is required so the callback can find the active * secret driver. Need to change this to work like the * security drivers do, to avoid this diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 01b15e0..4eca63b 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -100,12 +100,13 @@ extern struct qemud_driver *qemu_driver; */ static void qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED, - virDomainObjPtr vm, - int hasError) + virDomainObjPtr vm) { struct qemud_driver *driver = qemu_driver; virDomainEventPtr event = NULL; qemuDomainObjPrivatePtr priv; + int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN; + const char *auditReason = "shutdown"; VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name); @@ -120,32 +121,54 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED, } priv = vm->privateData; - if (!hasError && priv->monJSON && !priv->gotShutdown) { + if (priv->monJSON && !priv->gotShutdown) { VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; " "assuming the domain crashed", vm->def->name); - hasError = 1; + eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED; + auditReason = "failed"; } event = virDomainEventNewFromObj(vm, - VIR_DOMAIN_EVENT_STOPPED, - hasError ? - VIR_DOMAIN_EVENT_STOPPED_FAILED : - VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN); + VIR_DOMAIN_EVENT_STOPPED, eventReason); qemuProcessStop(driver, vm, 0, - hasError ? - VIR_DOMAIN_SHUTOFF_CRASHED : VIR_DOMAIN_SHUTOFF_SHUTDOWN); - qemuAuditDomainStop(vm, hasError ? "failed" : "shutdown"); + qemuAuditDomainStop(vm, auditReason); if (!vm->persistent) virDomainRemoveInactive(&driver->domains, vm); else virDomainObjUnlock(vm); - if (event) { + if (event) qemuDomainEventQueue(driver, event); - } + qemuDriverUnlock(driver); +} + + +/* + * This is invoked when there is some kind of error + * parsing data to/from the monitor. The VM can continue + * to run, but no further monitor commands will be + * allowed + */ +static void +qemuProcessHandleMonitorError(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm) +{ + struct qemud_driver *driver = qemu_driver; + virDomainEventPtr event = NULL; + + VIR_DEBUG("Received error on %p '%s'", vm, vm->def->name); + + qemuDriverLock(driver); + virDomainObjLock(vm); + + event = virDomainEventControlErrorNewFromObj(vm); + if (event) + qemuDomainEventQueue(driver, event); + + virDomainObjUnlock(vm); qemuDriverUnlock(driver); } @@ -626,6 +649,7 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon, static qemuMonitorCallbacks monitorCallbacks = { .destroy = qemuProcessHandleMonitorDestroy, .eofNotify = qemuProcessHandleMonitorEOF, + .errorNotify = qemuProcessHandleMonitorError, .diskSecretLookup = qemuProcessFindVolumeQcowPassphrase, .domainShutdown = qemuProcessHandleShutdown, .domainStop = qemuProcessHandleStop, -- 1.7.4.4

On Tue, May 24, 2011 at 18:06:42 +0100, Daniel P. Berrange wrote:
Currently whenever there is any failure with parsing the monitor, this is treated in the same was as end-of-file (ie QEMU quit). The domain is terminated, if not already dead.
With this change, failures in parsing the monitor stream do not result in the death of QEMU. The guest continues running unchanged, but all further use of the monitor will be disabled.
The VMM_FAILURE event will be emitted, and the mgmt application can decide when to kill/restart the guest to re-gain control
* src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Run a different callback for monitor EOF vs error conditions. * src/qemu/qemu_process.c: Emit VMM_FAILURE event when monitor fails --- src/qemu/qemu_monitor.c | 45 +++++++++++++++++++++++++---------------- src/qemu/qemu_monitor.h | 7 +++-- src/qemu/qemu_process.c | 50 ++++++++++++++++++++++++++++++++++------------ 3 files changed, 68 insertions(+), 34 deletions(-) ... diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 01b15e0..4eca63b 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -100,12 +100,13 @@ extern struct qemud_driver *qemu_driver; */ static void qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED, - virDomainObjPtr vm, - int hasError) + virDomainObjPtr vm) { struct qemud_driver *driver = qemu_driver; virDomainEventPtr event = NULL; qemuDomainObjPrivatePtr priv; + int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN; + const char *auditReason = "shutdown";
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
@@ -120,32 +121,54 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED, }
priv = vm->privateData; - if (!hasError && priv->monJSON && !priv->gotShutdown) { + if (priv->monJSON && !priv->gotShutdown) { VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; " "assuming the domain crashed", vm->def->name); - hasError = 1; + eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED; + auditReason = "failed"; }
event = virDomainEventNewFromObj(vm, - VIR_DOMAIN_EVENT_STOPPED, - hasError ? - VIR_DOMAIN_EVENT_STOPPED_FAILED : - VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN); + VIR_DOMAIN_EVENT_STOPPED, eventReason);
qemuProcessStop(driver, vm, 0, - hasError ? - VIR_DOMAIN_SHUTOFF_CRASHED : VIR_DOMAIN_SHUTOFF_SHUTDOWN);
I don't think this change is correct. Wee need to correct reason to be passed to qemuProcessStop so that virDomainGetState can report it. This way we would always report normal shutdown. Jirka

Currently whenever there is any failure with parsing the monitor, this is treated in the same was as end-of-file (ie QEMU quit). The domain is terminated, if not already dead. With this change, failures in parsing the monitor stream do not result in the death of QEMU. The guest continues running unchanged, but all further use of the monitor will be disabled. The VMM_FAILURE event will be emitted, and the mgmt application can decide when to kill/restart the guest to re-gain control * src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h: Run a different callback for monitor EOF vs error conditions. * src/qemu/qemu_process.c: Emit VMM_FAILURE event when monitor fails v2: Fix reason passed to qemuProcessStop() --- src/qemu/qemu_monitor.c | 45 +++++++++++++++++++++++--------------- src/qemu/qemu_monitor.h | 7 +++-- src/qemu/qemu_process.c | 55 ++++++++++++++++++++++++++++++++++------------ 3 files changed, 71 insertions(+), 36 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 5186f99..28e5252 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -517,7 +517,8 @@ static void qemuMonitorUpdateWatch(qemuMonitorPtr mon) static void qemuMonitorIO(int watch, int fd, int events, void *opaque) { qemuMonitorPtr mon = opaque; - int quit = 0, failed = 0; + bool error = false; + bool eof = false; /* lock access to the monitor and protect fd */ qemuMonitorLock(mon); @@ -528,27 +529,27 @@ qemuMonitorIO(int watch, int fd, int events, void *opaque) { if (mon->fd != fd || mon->watch != watch) { VIR_ERROR(_("event from unexpected fd %d!=%d / watch %d!=%d"), mon->fd, fd, mon->watch, watch); - failed = 1; + error = true; } else { if (!mon->lastErrno && events & VIR_EVENT_HANDLE_WRITABLE) { int done = qemuMonitorIOWrite(mon); if (done < 0) - failed = 1; + error = 1; events &= ~VIR_EVENT_HANDLE_WRITABLE; } if (!mon->lastErrno && events & VIR_EVENT_HANDLE_READABLE) { int got = qemuMonitorIORead(mon); if (got < 0) - failed = 1; + error = true; /* Ignore hangup/error events if we read some data, to * give time for that data to be consumed */ if (got > 0) { events = 0; if (qemuMonitorIOProcess(mon) < 0) - failed = 1; + error = true; } else events &= ~VIR_EVENT_HANDLE_READABLE; } @@ -572,36 +573,44 @@ qemuMonitorIO(int watch, int fd, int events, void *opaque) { mon->msg->lastErrno = EIO; virCondSignal(&mon->notify); } - quit = 1; + eof = 1; } else if (events) { VIR_ERROR(_("unhandled fd event %d for monitor fd %d"), events, mon->fd); - failed = 1; + error = 1; } } + if (eof || error) + mon->lastErrno = EIO; + + qemuMonitorUpdateWatch(mon); + /* We have to unlock to avoid deadlock against command thread, * but is this safe ? I think it is, because the callback * will try to acquire the virDomainObjPtr mutex next */ - if (failed || quit) { - void (*eofNotify)(qemuMonitorPtr, virDomainObjPtr, int) + if (eof) { + void (*eofNotify)(qemuMonitorPtr, virDomainObjPtr) = mon->cb->eofNotify; virDomainObjPtr vm = mon->vm; - /* If qemu quited unexpectedly, and we may try to send monitor - * command later. But we have no chance to wake up it. So set - * mon->lastErrno to EIO, and check it before sending monitor - * command. - */ - if (!mon->lastErrno) - mon->lastErrno = EIO; + /* Make sure anyone waiting wakes up now */ + virCondSignal(&mon->notify); + if (qemuMonitorUnref(mon) > 0) + qemuMonitorUnlock(mon); + VIR_DEBUG("Triggering EOF callback"); + (eofNotify)(mon, vm); + } else if (error) { + void (*errorNotify)(qemuMonitorPtr, virDomainObjPtr) + = mon->cb->errorNotify; + virDomainObjPtr vm = mon->vm; /* Make sure anyone waiting wakes up now */ virCondSignal(&mon->notify); if (qemuMonitorUnref(mon) > 0) qemuMonitorUnlock(mon); - VIR_DEBUG("Triggering EOF callback error? %d", failed); - (eofNotify)(mon, vm, failed); + VIR_DEBUG("Triggering error callback"); + (errorNotify)(mon, vm); } else { if (qemuMonitorUnref(mon) > 0) qemuMonitorUnlock(mon); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 05c3359..0adef81 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -67,9 +67,10 @@ struct _qemuMonitorCallbacks { virDomainObjPtr vm); void (*eofNotify)(qemuMonitorPtr mon, - virDomainObjPtr vm, - int withError); - /* XXX we'd really like to avoid virCOnnectPtr here + virDomainObjPtr vm); + void (*errorNotify)(qemuMonitorPtr mon, + virDomainObjPtr vm); + /* XXX we'd really like to avoid virConnectPtr here * It is required so the callback can find the active * secret driver. Need to change this to work like the * security drivers do, to avoid this diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 01b15e0..811fa28 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -100,12 +100,14 @@ extern struct qemud_driver *qemu_driver; */ static void qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED, - virDomainObjPtr vm, - int hasError) + virDomainObjPtr vm) { struct qemud_driver *driver = qemu_driver; virDomainEventPtr event = NULL; qemuDomainObjPrivatePtr priv; + int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN; + int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN; + const char *auditReason = "shutdown"; VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name); @@ -120,32 +122,54 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED, } priv = vm->privateData; - if (!hasError && priv->monJSON && !priv->gotShutdown) { + if (priv->monJSON && !priv->gotShutdown) { VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; " "assuming the domain crashed", vm->def->name); - hasError = 1; + eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED; + stopReason = VIR_DOMAIN_SHUTOFF_FAILED; + auditReason = "failed"; } event = virDomainEventNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED, - hasError ? - VIR_DOMAIN_EVENT_STOPPED_FAILED : - VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN); - - qemuProcessStop(driver, vm, 0, - hasError ? - VIR_DOMAIN_SHUTOFF_CRASHED : - VIR_DOMAIN_SHUTOFF_SHUTDOWN); - qemuAuditDomainStop(vm, hasError ? "failed" : "shutdown"); + eventReason); + qemuProcessStop(driver, vm, 0, stopReason); + qemuAuditDomainStop(vm, auditReason); if (!vm->persistent) virDomainRemoveInactive(&driver->domains, vm); else virDomainObjUnlock(vm); - if (event) { + if (event) qemuDomainEventQueue(driver, event); - } + qemuDriverUnlock(driver); +} + + +/* + * This is invoked when there is some kind of error + * parsing data to/from the monitor. The VM can continue + * to run, but no further monitor commands will be + * allowed + */ +static void +qemuProcessHandleMonitorError(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm) +{ + struct qemud_driver *driver = qemu_driver; + virDomainEventPtr event = NULL; + + VIR_DEBUG("Received error on %p '%s'", vm, vm->def->name); + + qemuDriverLock(driver); + virDomainObjLock(vm); + + event = virDomainEventControlErrorNewFromObj(vm); + if (event) + qemuDomainEventQueue(driver, event); + + virDomainObjUnlock(vm); qemuDriverUnlock(driver); } @@ -626,6 +650,7 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon, static qemuMonitorCallbacks monitorCallbacks = { .destroy = qemuProcessHandleMonitorDestroy, .eofNotify = qemuProcessHandleMonitorEOF, + .errorNotify = qemuProcessHandleMonitorError, .diskSecretLookup = qemuProcessFindVolumeQcowPassphrase, .domainShutdown = qemuProcessHandleShutdown, .domainStop = qemuProcessHandleStop, -- 1.7.4.4

Currently the QEMU monitor I/O handler code uses errno values to report errors. This results in a sub-optimal error messages on certain conditions, in particular when parsing JSON strings malformed data simply results in 'EINVAL'. This changes the code to use the standard libvirt error reporting APIs. The virError is stored against the qemuMonitorPtr struct, and when a monitor API is run, any existing stored error is copied into that thread's error local * src/qemu/qemu_monitor.c, src/qemu/qemu_monitor.h, src/qemu/qemu_monitor_json.c, src/qemu/qemu_monitor_text.c: Use virError APIs for all monitor I/O handling code --- src/qemu/qemu_monitor.c | 161 ++++++++++++++++++++++++++++-------------- src/qemu/qemu_monitor.h | 11 ++- src/qemu/qemu_monitor_json.c | 85 +++++++++++----------- src/qemu/qemu_monitor_text.c | 51 +++++++------- 4 files changed, 183 insertions(+), 125 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 28e5252..09a1b8e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -69,7 +69,9 @@ struct _qemuMonitor { /* If anything went wrong, this will be fed back * the next monitor msg */ - int lastErrno; + virError lastError; + + int nextSerial; unsigned json: 1; unsigned json_hmp: 1; @@ -312,6 +314,10 @@ qemuMonitorOpenPty(const char *monitor) } +/* This method processes data that has been received + * from the monitor. Looking for async events and + * replies/errors. + */ static int qemuMonitorIOProcess(qemuMonitorPtr mon) { @@ -344,10 +350,8 @@ qemuMonitorIOProcess(qemuMonitorPtr mon) mon->buffer, mon->bufferOffset, msg); - if (len < 0) { - mon->lastErrno = errno; + if (len < 0) return -1; - } if (len < mon->bufferOffset) { memmove(mon->buffer, mon->buffer + len, mon->bufferOffset - len); @@ -378,11 +382,6 @@ qemuMonitorIOWriteWithFD(qemuMonitorPtr mon, char control[CMSG_SPACE(sizeof(int))]; struct cmsghdr *cmsg; - if (!mon->hasSendFD) { - errno = EINVAL; - return -1; - } - memset(&msg, 0, sizeof(msg)); iov[0].iov_base = (void *)data; @@ -423,6 +422,12 @@ qemuMonitorIOWrite(qemuMonitorPtr mon) if (!mon->msg || mon->msg->txOffset == mon->msg->txLength) return 0; + if (mon->msg->txFD != -1 && !mon->hasSendFD) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Monitor does not support sending of file descriptors")); + return -1; + } + if (mon->msg->txFD == -1) done = write(mon->fd, mon->msg->txBuffer + mon->msg->txOffset, @@ -437,7 +442,8 @@ qemuMonitorIOWrite(qemuMonitorPtr mon) if (errno == EAGAIN) return 0; - mon->lastErrno = errno; + virReportSystemError(errno, "%s", + _("Unable to write to monitor")); return -1; } mon->msg->txOffset += done; @@ -459,7 +465,7 @@ qemuMonitorIORead(qemuMonitorPtr mon) if (avail < 1024) { if (VIR_REALLOC_N(mon->buffer, mon->bufferLength + 1024) < 0) { - errno = ENOMEM; + virReportOOMError(); return -1; } mon->bufferLength += 1024; @@ -476,7 +482,8 @@ qemuMonitorIORead(qemuMonitorPtr mon) if (got < 0) { if (errno == EAGAIN) break; - mon->lastErrno = errno; + virReportSystemError(errno, "%s", + _("Unable to read from monitor")); ret = -1; break; } @@ -503,7 +510,7 @@ static void qemuMonitorUpdateWatch(qemuMonitorPtr mon) VIR_EVENT_HANDLE_HANGUP | VIR_EVENT_HANDLE_ERROR; - if (!mon->lastErrno) { + if (mon->lastError.code == VIR_ERR_OK) { events |= VIR_EVENT_HANDLE_READABLE; if (mon->msg && mon->msg->txOffset < mon->msg->txLength) @@ -528,61 +535,84 @@ qemuMonitorIO(int watch, int fd, int events, void *opaque) { #endif if (mon->fd != fd || mon->watch != watch) { - VIR_ERROR(_("event from unexpected fd %d!=%d / watch %d!=%d"), mon->fd, fd, mon->watch, watch); + if (events & VIR_EVENT_HANDLE_HANGUP) + eof = true; + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("event from unexpected fd %d!=%d / watch %d!=%d"), + mon->fd, fd, mon->watch, watch); + error = true; + } else if (mon->lastError.code != VIR_ERR_OK) { + if (events & VIR_EVENT_HANDLE_HANGUP) + eof = true; error = true; } else { - if (!mon->lastErrno && - events & VIR_EVENT_HANDLE_WRITABLE) { - int done = qemuMonitorIOWrite(mon); - if (done < 0) - error = 1; + if (events & VIR_EVENT_HANDLE_WRITABLE) { + if (qemuMonitorIOWrite(mon) < 0) + error = true; events &= ~VIR_EVENT_HANDLE_WRITABLE; } - if (!mon->lastErrno && + + if (!error && events & VIR_EVENT_HANDLE_READABLE) { int got = qemuMonitorIORead(mon); - if (got < 0) + events &= ~VIR_EVENT_HANDLE_READABLE; + if (got < 0) { error = true; - /* Ignore hangup/error events if we read some data, to - * give time for that data to be consumed */ - if (got > 0) { + } else if (got == 0) { + eof = true; + } else { + /* Ignore hangup/error events if we read some data, to + * give time for that data to be consumed */ events = 0; if (qemuMonitorIOProcess(mon) < 0) error = true; - } else - events &= ~VIR_EVENT_HANDLE_READABLE; + } } - /* If IO process resulted in an error & we have a message, - * then wakeup that waiter */ - if (mon->lastErrno && mon->msg && !mon->msg->finished) { - mon->msg->lastErrno = mon->lastErrno; - mon->msg->finished = 1; - virCondSignal(&mon->notify); + if (!error && + events & VIR_EVENT_HANDLE_HANGUP) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("End of file from monitor")); + eof = 1; + events &= ~VIR_EVENT_HANDLE_HANGUP; } - qemuMonitorUpdateWatch(mon); - - if (events & (VIR_EVENT_HANDLE_HANGUP | - VIR_EVENT_HANDLE_ERROR)) { - /* If IO process resulted in EOF & we have a message, - * then wakeup that waiter */ - if (mon->msg && !mon->msg->finished) { - mon->msg->finished = 1; - mon->msg->lastErrno = EIO; - virCondSignal(&mon->notify); - } - eof = 1; - } else if (events) { - VIR_ERROR(_("unhandled fd event %d for monitor fd %d"), - events, mon->fd); + if (!error && !eof && + events & VIR_EVENT_HANDLE_ERROR) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Error while waiting for monitor")); + error = 1; + } + if (!error && events) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unhandled event %d for monitor fd %d"), + events, mon->fd); error = 1; } } - if (eof || error) - mon->lastErrno = EIO; + if (error || eof) { + if (mon->lastError.code != VIR_ERR_OK) { + /* Already have an error, so clear any new error */ + virResetLastError(); + } else { + virErrorPtr err = virGetLastError(); + if (!err) + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Error while processing monitor IO")); + virCopyLastError(&mon->lastError); + virResetLastError(); + } + + VIR_DEBUG("Error on monitor %s", NULLSTR(mon->lastError.message)); + /* If IO process resulted in an error & we have a message, + * then wakeup that waiter */ + if (mon->msg && !mon->msg->finished) { + mon->msg->finished = 1; + virCondSignal(&mon->notify); + } + } qemuMonitorUpdateWatch(mon); @@ -738,14 +768,28 @@ void qemuMonitorClose(qemuMonitorPtr mon) } +char *qemuMonitorNextCommandID(qemuMonitorPtr mon) +{ + char *id; + + if (virAsprintf(&id, "libvirt-%d", ++mon->nextSerial) < 0) { + virReportOOMError(); + return NULL; + } + return id; +} + + int qemuMonitorSend(qemuMonitorPtr mon, qemuMonitorMessagePtr msg) { int ret = -1; /* Check whether qemu quited unexpectedly */ - if (mon->lastErrno) { - msg->lastErrno = mon->lastErrno; + if (mon->lastError.code != VIR_ERR_OK) { + VIR_DEBUG("Attempt to send command while error is set %s", + NULLSTR(mon->lastError.message)); + virSetError(&mon->lastError); return -1; } @@ -753,12 +797,21 @@ int qemuMonitorSend(qemuMonitorPtr mon, qemuMonitorUpdateWatch(mon); while (!mon->msg->finished) { - if (virCondWait(&mon->notify, &mon->lock) < 0) + if (virCondWait(&mon->notify, &mon->lock) < 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to wait on monitor condition")); goto cleanup; + } } - if (mon->lastErrno == 0) - ret = 0; + if (mon->lastError.code != VIR_ERR_OK) { + VIR_DEBUG("Send command resulted in error %s", + NULLSTR(mon->lastError.message)); + virSetError(&mon->lastError); + goto cleanup; + } + + ret = 0; cleanup: mon->msg = NULL; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 0adef81..910865b 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -49,12 +49,16 @@ struct _qemuMonitorMessage { int txOffset; int txLength; + /* Used by the text monitor reply / error */ char *rxBuffer; int rxLength; + /* Used by the JSON monitor to hold reply / error */ + void *rxObject; - int finished; - - int lastErrno; + /* True if rxBuffer / rxObject are ready, or a + * fatal error occurred on the monitor channel + */ + bool finished; qemuMonitorPasswordHandler passwordHandler; void *passwordOpaque; @@ -137,6 +141,7 @@ int qemuMonitorRef(qemuMonitorPtr mon); int qemuMonitorUnref(qemuMonitorPtr mon) ATTRIBUTE_RETURN_CHECK; /* These APIs are for use by the internal Text/JSON monitor impl code only */ +char *qemuMonitorNextCommandID(qemuMonitorPtr mon); int qemuMonitorSend(qemuMonitorPtr mon, qemuMonitorMessagePtr msg); int qemuMonitorHMPCommandWithFd(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 2d8a390..87cd45d 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -111,43 +111,38 @@ qemuMonitorJSONIOProcessLine(qemuMonitorPtr mon, VIR_DEBUG("Line [%s]", line); - if (!(obj = virJSONValueFromString(line))) { - VIR_DEBUG("Parsing JSON string failed"); - errno = EINVAL; + if (!(obj = virJSONValueFromString(line))) goto cleanup; - } if (obj->type != VIR_JSON_TYPE_OBJECT) { - VIR_DEBUG("Parsed JSON string isn't an object"); - errno = EINVAL; + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Parsed JSON reply '%s' isn't an object"), line); + goto cleanup; } if (virJSONValueObjectHasKey(obj, "QMP") == 1) { - VIR_DEBUG("Got QMP capabilities data"); ret = 0; - goto cleanup; - } - - if (virJSONValueObjectHasKey(obj, "event") == 1) { + virJSONValueFree(obj); + } else if (virJSONValueObjectHasKey(obj, "event") == 1) { ret = qemuMonitorJSONIOProcessEvent(mon, obj); - goto cleanup; - } - - if (msg) { - if (!(msg->rxBuffer = strdup(line))) { - errno = ENOMEM; - goto cleanup; + } else if (virJSONValueObjectHasKey(obj, "error") == 1 || + virJSONValueObjectHasKey(obj, "return") == 1) { + if (msg) { + msg->rxObject = obj; + msg->finished = 1; + ret = 0; + } else { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected JSON reply '%s'"), line); } - msg->rxLength = strlen(line); - msg->finished = 1; } else { - VIR_DEBUG("Ignoring unexpected JSON message [%s]", line); + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown JSON reply '%s'"), line); } - ret = 0; - cleanup: - virJSONValueFree(obj); + if (ret < 0) + virJSONValueFree(obj); return ret; } @@ -166,7 +161,7 @@ int qemuMonitorJSONIOProcess(qemuMonitorPtr mon, int got = nl - (data + used); char *line = strndup(data + used, got); if (!line) { - errno = ENOMEM; + virReportOOMError(); return -1; } used += got + strlen(LINE_ENDING); @@ -195,11 +190,24 @@ qemuMonitorJSONCommandWithFd(qemuMonitorPtr mon, int ret = -1; qemuMonitorMessage msg; char *cmdstr = NULL; + char *id = NULL; + virJSONValuePtr exe; *reply = NULL; memset(&msg, 0, sizeof msg); + exe = virJSONValueObjectGet(cmd, "execute"); + if (exe) { + if (!(id = qemuMonitorNextCommandID(mon))) + goto cleanup; + if (virJSONValueObjectAppendString(cmd, "id", id) < 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to append command 'id' string")); + goto cleanup; + } + } + if (!(cmdstr = virJSONValueToString(cmd))) { virReportOOMError(); goto cleanup; @@ -215,33 +223,24 @@ qemuMonitorJSONCommandWithFd(qemuMonitorPtr mon, ret = qemuMonitorSend(mon, &msg); - VIR_DEBUG("Receive command reply ret=%d errno=%d %d bytes '%s'", - ret, msg.lastErrno, msg.rxLength, msg.rxBuffer); + VIR_DEBUG("Receive command reply ret=%d rxObject=%p", + ret, msg.rxObject); - /* If we got ret==0, but not reply data something rather bad - * went wrong, so lets fake an EIO error */ - if (!msg.rxBuffer && ret == 0) { - msg.lastErrno = EIO; - ret = -1; - } - if (ret == 0) { - if (!((*reply) = virJSONValueFromString(msg.rxBuffer))) { - qemuReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot parse JSON doc '%s'"), msg.rxBuffer); - goto cleanup; + if (!msg.rxObject) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing monitor reply object")); + ret = -1; + } else { + *reply = msg.rxObject; } } - if (ret < 0) - virReportSystemError(msg.lastErrno, - _("cannot send monitor command '%s'"), cmdstr); - cleanup: + VIR_FREE(id); VIR_FREE(cmdstr); VIR_FREE(msg.txBuffer); - VIR_FREE(msg.rxBuffer); return ret; } diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 106f2d3..3b42e7a 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -153,7 +153,8 @@ int qemuMonitorTextIOProcess(qemuMonitorPtr mon ATTRIBUTE_UNUSED, #endif if (msg->passwordHandler) { int i; - /* Try and handle the prompt */ + /* Try and handle the prompt. The handler is required + * to report a normal libvirt error */ if (msg->passwordHandler(mon, msg, start, passwd - start + strlen(PASSWORD_PROMPT), @@ -168,7 +169,8 @@ int qemuMonitorTextIOProcess(qemuMonitorPtr mon ATTRIBUTE_UNUSED, /* Handled, so skip forward over password prompt */ start = passwd; } else { - errno = EACCES; + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Password request seen, but no handler available")); return -1; } } @@ -183,8 +185,10 @@ int qemuMonitorTextIOProcess(qemuMonitorPtr mon ATTRIBUTE_UNUSED, * BASIC_PROMPT we can reasonably reliably cope */ if (want) { if (VIR_REALLOC_N(msg->rxBuffer, - msg->rxLength + want + 1) < 0) + msg->rxLength + want + 1) < 0) { + virReportOOMError(); return -1; + } memcpy(msg->rxBuffer + msg->rxLength, start, want); msg->rxLength += want; msg->rxBuffer[msg->rxLength] = '\0'; @@ -234,30 +238,26 @@ qemuMonitorTextCommandWithHandler(qemuMonitorPtr mon, ret = qemuMonitorSend(mon, &msg); - VIR_DEBUG("Receive command reply ret=%d errno=%d %d bytes '%s'", - ret, msg.lastErrno, msg.rxLength, msg.rxBuffer); + VIR_DEBUG("Receive command reply ret=%d rxLength=%d rxBuffer='%s'", + ret, msg.rxLength, msg.rxBuffer); /* Just in case buffer had some passwords in */ memset(msg.txBuffer, 0, msg.txLength); VIR_FREE(msg.txBuffer); - /* To make life safer for callers, already ensure there's at least an empty string */ - if (msg.rxBuffer) { - *reply = msg.rxBuffer; - } else { - *reply = strdup(""); - if (!*reply) { - virReportOOMError(); - return -1; + if (ret >= 0) { + /* To make life safer for callers, already ensure there's at least an empty string */ + if (msg.rxBuffer) { + *reply = msg.rxBuffer; + } else { + *reply = strdup(""); + if (!*reply) { + virReportOOMError(); + return -1; + } } } - if (ret < 0) { - virReportSystemError(msg.lastErrno, - _("cannot send monitor command '%s'"), cmd); - VIR_FREE(*reply); - } - return ret; } @@ -297,15 +297,16 @@ qemuMonitorSendDiskPassphrase(qemuMonitorPtr mon, pathStart = strstr(data, DISK_ENCRYPTION_PREFIX); pathEnd = strstr(data, DISK_ENCRYPTION_POSTFIX); if (!pathStart || !pathEnd || pathStart >= pathEnd) { - errno = -EINVAL; + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to extract disk path from %s"), + data); return -1; } /* Extra the path */ pathStart += strlen(DISK_ENCRYPTION_PREFIX); - path = strndup(pathStart, pathEnd - pathStart); - if (!path) { - errno = ENOMEM; + if (!(path = strndup(pathStart, pathEnd - pathStart))) { + virReportOOMError(); return -1; } @@ -325,7 +326,7 @@ qemuMonitorSendDiskPassphrase(qemuMonitorPtr mon, msg->txLength + passphrase_len + 1 + 1) < 0) { memset(passphrase, 0, passphrase_len); VIR_FREE(passphrase); - errno = ENOMEM; + virReportOOMError(); return -1; } @@ -763,7 +764,7 @@ qemuMonitorSendVNCPassphrase(qemuMonitorPtr mon ATTRIBUTE_UNUSED, * to be sent back */ if (VIR_REALLOC_N(msg->txBuffer, msg->txLength + passphrase_len + 1 + 1) < 0) { - errno = ENOMEM; + virReportOOMError(); return -1; } -- 1.7.4.4

On Tue, May 24, 2011 at 06:06:40PM +0100, Daniel P. Berrange wrote:
An update of
https://www.redhat.com/archives/libvir-list/2011-May/msg00657.html
Currently when libvirt has a serious error doing I/O and/or parsing of the QEMU monitor, it will kill off the guest. Application developers have expressed a desire for more graceful handling of this scenario. In particular to allow the guest OS to continue to run, without any further monitor interactons, and then kill/restart it at a time which is convenient to the guest admin/apps.
New in this posting:
- Change the name of the event to 'CONTROL_ERROR' and use the generic callback, instead of passing a type field - Add python dispatch code - Add 3rd patch to improve qemu monitor error reporting
Okay reviewed, principle sounds good we can probably use the event for other hypervisors. I just had to modify 1/3 to add a description of the event message in the protocol ABI check file src/remote_protocol-structs ACK, and pushed, Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/
participants (3)
-
Daniel P. Berrange
-
Daniel Veillard
-
Jiri Denemark