[libvirt] [PATCH] qemu: Report errors from iohelper

Currently, we use iohelper when saving/restoring a domain. However, if there's some kind of error (like I/O) it is not propagated to libvirt. Since it is not qemu who is doing the actual write() it will not get error. The iohelper does. Therefore we should check for iohelper errors as it makes libvirt more user friendly. --- src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 7 ++++- src/util/virfile.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 2 + 4 files changed, 55 insertions(+), 2 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 60f9c7f..26eb7b1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1332,6 +1332,7 @@ virDomainListSnapshots; virFileLoopDeviceAssociate; virFileClose; virFileDirectFdFlag; +virFileWrapperFdCatchError; virFileWrapperFdClose; virFileWrapperFdFree; virFileWrapperFdNew; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8af316f..c8e97cf 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2945,6 +2945,7 @@ endjob: if (rc < 0) VIR_WARN("Unable to resume guest CPUs after save failure"); } + virFileWrapperFdCatchError(wrapperFd); } if (qemuDomainObjEndAsyncJob(driver, vm) == 0) vm = NULL; @@ -3282,9 +3283,11 @@ doCoreDump(struct qemud_driver *driver, cleanup: VIR_FORCE_CLOSE(fd); - virFileWrapperFdFree(wrapperFd); - if (ret != 0) + if (ret != 0) { + virFileWrapperFdCatchError(wrapperFd); unlink(path); + } + virFileWrapperFdFree(wrapperFd); return ret; } diff --git a/src/util/virfile.c b/src/util/virfile.c index 5b00ead..4d7538e 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -135,6 +135,7 @@ virFileDirectFdFlag(void) * read-write is not supported, just a single direction. */ struct _virFileWrapperFd { virCommandPtr cmd; /* Child iohelper process to do the I/O. */ + int err_fd; /* FD to read stderr of @cmd */ }; #ifndef WIN32 @@ -229,6 +230,14 @@ virFileWrapperFdNew(int *fd, const char *name, unsigned int flags) virCommandAddArg(ret->cmd, "0"); } + /* In order to catch iohelper stderr, we must: + * - pass a FD to virCommand (-1 to auto-allocate one) + * - change iohelper's env so virLog functions print to stderr + */ + ret->err_fd = -1; + virCommandSetErrorFD(ret->cmd, &ret->err_fd); + virCommandAddEnvPair(ret->cmd, "LIBVIRT_LOG_OUTPUTS", "1:stderr"); + if (virCommandRunAsync(ret->cmd, NULL) < 0) goto error; @@ -280,6 +289,42 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd) return virCommandWait(wfd->cmd, NULL); } + +/** + * virFileWrapperFdCatchError: + * @wfd: fd wrapper, or NULL + * + * Read the stderr of iohelper and VIR_WARN() about it. + */ +void +virFileWrapperFdCatchError(virFileWrapperFdPtr wfd) +{ + char *err = NULL; + char ebuf[1024]; + size_t nread = 0; + size_t err_size = 0; + + if (!wfd) + return; + + while ((nread = saferead(wfd->err_fd, ebuf, sizeof(ebuf)))) { + if (VIR_REALLOC_N(err, err_size + nread + 1) < 0) { + virReportOOMError(); + goto cleanup; + } + memcpy(err + err_size, ebuf, nread); + err_size += nread; + err[err_size] = '\0'; + } + + if (err) + VIR_WARN("iohelper reports: %s", err); + +cleanup: + VIR_FREE(err); +} + + /** * virFileWrapperFdFree: * @wfd: fd wrapper, or NULL @@ -295,6 +340,8 @@ virFileWrapperFdFree(virFileWrapperFdPtr wfd) if (!wfd) return; + VIR_FORCE_CLOSE(wfd->err_fd); + virCommandFree(wfd->cmd); VIR_FREE(wfd); } diff --git a/src/util/virfile.h b/src/util/virfile.h index c885b73..80daf86 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -90,6 +90,8 @@ int virFileWrapperFdClose(virFileWrapperFdPtr dfd); void virFileWrapperFdFree(virFileWrapperFdPtr dfd); +void virFileWrapperFdCatchError(virFileWrapperFdPtr dfd); + int virFileLock(int fd, bool shared, off_t start, off_t len); int virFileUnlock(int fd, off_t start, off_t len); -- 1.7.8.6

On 10/23/2012 03:39 AM, Michal Privoznik wrote:
Currently, we use iohelper when saving/restoring a domain. However, if there's some kind of error (like I/O) it is not propagated to libvirt. Since it is not qemu who is doing the actual write() it will not get error. The iohelper does. Therefore we should check for iohelper errors as it makes libvirt more user friendly. --- src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 7 ++++- src/util/virfile.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 2 + 4 files changed, 55 insertions(+), 2 deletions(-)
I agree that catching iohelper errors is needed, but I'm worried that you might introduce deadlock with this approach (then again, it may just be theoretical):
+++ b/src/qemu/qemu_driver.c @@ -2945,6 +2945,7 @@ endjob: if (rc < 0) VIR_WARN("Unable to resume guest CPUs after save failure"); } + virFileWrapperFdCatchError(wrapperFd); } if (qemuDomainObjEndAsyncJob(driver, vm) == 0) vm = NULL; @@ -3282,9 +3283,11 @@ doCoreDump(struct qemud_driver *driver,
cleanup: VIR_FORCE_CLOSE(fd); - virFileWrapperFdFree(wrapperFd); - if (ret != 0) + if (ret != 0) { + virFileWrapperFdCatchError(wrapperFd); unlink(path);
You aren't doing any reading of the stderr file descriptor until you get to the cleanup label. But suppose that iohelper encounters situations that cause it to write more than PIPE_BUF bytes to stderr, but while still continuing to run. Then, because no one is reading the other end of the pipe, iohelper blocks on writes to stderr instead of proceeding on with the rest of its operations, and thus never gets to a point where it will exit in order to kick libvirtd over to the cleanup label to start reading the stderr pipe in the first place.
+++ b/src/util/virfile.c @@ -135,6 +135,7 @@ virFileDirectFdFlag(void) * read-write is not supported, just a single direction. */ struct _virFileWrapperFd { virCommandPtr cmd; /* Child iohelper process to do the I/O. */ + int err_fd; /* FD to read stderr of @cmd */ };
#ifndef WIN32 @@ -229,6 +230,14 @@ virFileWrapperFdNew(int *fd, const char *name, unsigned int flags) virCommandAddArg(ret->cmd, "0"); }
+ /* In order to catch iohelper stderr, we must: + * - pass a FD to virCommand (-1 to auto-allocate one) + * - change iohelper's env so virLog functions print to stderr + */ + ret->err_fd = -1; + virCommandSetErrorFD(ret->cmd, &ret->err_fd); + virCommandAddEnvPair(ret->cmd, "LIBVIRT_LOG_OUTPUTS", "1:stderr");
Would it work to use virCommandSetErrorBuffer() instead, or does that rely on using virCommandRun() to properly populate the buffer via poll(), where we are stuck using virCommandRunAsync() instead? Or, since we are already using virCommandRunAsync(), does that mean we need to figure out how to incorporate a poll() to favorably handle stderr in addition to everything else that iohelper is already handling? Or am I worrying too much, and if iohelper hits any issue where it would be logging to stderr in the first place, it will be exiting before filling up the stderr pipe and thus never deadlocking? At any rate, depending on how that question is answered, this code does indeed look useful. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 23.10.2012 16:51, Eric Blake wrote:
On 10/23/2012 03:39 AM, Michal Privoznik wrote:
Currently, we use iohelper when saving/restoring a domain. However, if there's some kind of error (like I/O) it is not propagated to libvirt. Since it is not qemu who is doing the actual write() it will not get error. The iohelper does. Therefore we should check for iohelper errors as it makes libvirt more user friendly. --- src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 7 ++++- src/util/virfile.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 2 + 4 files changed, 55 insertions(+), 2 deletions(-)
I agree that catching iohelper errors is needed, but I'm worried that you might introduce deadlock with this approach (then again, it may just be theoretical):
+++ b/src/qemu/qemu_driver.c @@ -2945,6 +2945,7 @@ endjob: if (rc < 0) VIR_WARN("Unable to resume guest CPUs after save failure"); } + virFileWrapperFdCatchError(wrapperFd); } if (qemuDomainObjEndAsyncJob(driver, vm) == 0) vm = NULL; @@ -3282,9 +3283,11 @@ doCoreDump(struct qemud_driver *driver,
cleanup: VIR_FORCE_CLOSE(fd); - virFileWrapperFdFree(wrapperFd); - if (ret != 0) + if (ret != 0) { + virFileWrapperFdCatchError(wrapperFd); unlink(path);
You aren't doing any reading of the stderr file descriptor until you get to the cleanup label. But suppose that iohelper encounters situations that cause it to write more than PIPE_BUF bytes to stderr, but while still continuing to run. Then, because no one is reading the other end of the pipe, iohelper blocks on writes to stderr instead of proceeding on with the rest of its operations, and thus never gets to a point where it will exit in order to kick libvirtd over to the cleanup label to start reading the stderr pipe in the first place.
+++ b/src/util/virfile.c @@ -135,6 +135,7 @@ virFileDirectFdFlag(void) * read-write is not supported, just a single direction. */ struct _virFileWrapperFd { virCommandPtr cmd; /* Child iohelper process to do the I/O. */ + int err_fd; /* FD to read stderr of @cmd */ };
#ifndef WIN32 @@ -229,6 +230,14 @@ virFileWrapperFdNew(int *fd, const char *name, unsigned int flags) virCommandAddArg(ret->cmd, "0"); }
+ /* In order to catch iohelper stderr, we must: + * - pass a FD to virCommand (-1 to auto-allocate one) + * - change iohelper's env so virLog functions print to stderr + */ + ret->err_fd = -1; + virCommandSetErrorFD(ret->cmd, &ret->err_fd); + virCommandAddEnvPair(ret->cmd, "LIBVIRT_LOG_OUTPUTS", "1:stderr");
Would it work to use virCommandSetErrorBuffer() instead, or does that rely on using virCommandRun() to properly populate the buffer via poll(), where we are stuck using virCommandRunAsync() instead? Or, since we are already using virCommandRunAsync(), does that mean we need to figure out how to incorporate a poll() to favorably handle stderr in addition to everything else that iohelper is already handling?
Yeah, I've tried te virCommandSetErrorBuffer() way firstly. But it requires the poll() which is wrapped in virCommandProcessIO(). However, it is static so it would needed to be dig out and run in a separate thread. But we can run virCommandRun directly then. So I think I can incorporate our event loop. Just add a callback once ret->err_fd is initialized.
Or am I worrying too much, and if iohelper hits any issue where it would be logging to stderr in the first place, it will be exiting before filling up the stderr pipe and thus never deadlocking?
Well, with current implementation of iohelper there is only one place where it doesn't die right after reporting an error: src/util/iohelper.c:160 I don't know how possible is to hit that.
At any rate, depending on how that question is answered, this code does indeed look useful.
Okay, since we would have to keep this limitation in mind when changing iohelper (and honestly, I'll forget immediately after the push) I think we should make this safer and read the error pipe in the event loop. I'll update the patch and resend. Michal
participants (2)
-
Eric Blake
-
Michal Privoznik