[libvirt] [PATCH v3 0/4] qemu: Report better error on dump/migrate failure

Changes from [v2]: * Move error reporting from virFileWrapperFdFree() to virFileWrapperFdClose(). Changes from [v1]: * Use VIR_FREE() followed by VIR_ALLOC_N() instead of manually setting the last (and only) byte of the array returned by VIR_REALLOC_N() to zero. [v2] https://www.redhat.com/archives/libvir-list/2019-February/msg00782.html [v1] https://www.redhat.com/archives/libvir-list/2019-February/msg00156.html Andrea Bolognani (4): util: Make it safe to call virFileWrapperFdClose() multiple times qemu: Always call virFileWrapperFdClose() util: Move error reporting back to virFileWrapperFdClose() util: Report error in virFileWrapperFdClose() src/qemu/qemu_driver.c | 12 ++++++------ src/util/virfile.c | 18 ++++++++++++++---- 2 files changed, 20 insertions(+), 10 deletions(-) -- 2.20.1

We'll want to use this function in the cleanup path soon, and in order to be able to do that we need to make sure we can call it multiple times on the same virFileWrapperFd without side effects. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index 271bf5e796..42add5a2cd 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -175,6 +175,7 @@ virFileDirectFdFlag(void) /* Opaque type for managing a wrapper around a fd. For now, * read-write is not supported, just a single direction. */ struct _virFileWrapperFd { + bool closed; /* Whether virFileWrapperFdClose() has been already called */ virCommandPtr cmd; /* Child iohelper process to do the I/O. */ char *err_msg; /* stderr of @cmd */ }; @@ -323,16 +324,21 @@ virFileWrapperFdNew(int *fd ATTRIBUTE_UNUSED, * callers can conditionally create a virFileWrapperFd wrapper but * unconditionally call the cleanup code. To avoid deadlock, only * call this after closing the fd resulting from virFileWrapperFdNew(). + * + * This function can be safely called multiple times on the same @wfd. */ int virFileWrapperFdClose(virFileWrapperFdPtr wfd) { int ret; - if (!wfd) + if (!wfd || wfd->closed) return 0; ret = virCommandWait(wfd->cmd, NULL); + + wfd->closed = true; + return ret; } -- 2.20.1

Right now we're reporting errors in virFileWrapperFdFree(), but that's hardly the appropriate place to do so, as free functions are supposed to do nothing more than release allocated resources. We want to move that code back into virFileWrapperFdClose(), but before we can do that we need to make sure the function is actually called every time we're done processing the wrapped file. The cleanup path is the obvious candidate. For two of the users we can just move the call, but for the other two we need to duplicate it instead in order not to alter the existing behavior. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/qemu_driver.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5118f4ad42..30f69b339b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3231,6 +3231,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver, cleanup: VIR_FORCE_CLOSE(fd); + qemuFileWrapperFDClose(vm, wrapperFd); virFileWrapperFdFree(wrapperFd); virObjectUnref(cfg); @@ -3834,9 +3835,10 @@ doCoreDump(virQEMUDriverPtr driver, cleanup: VIR_FORCE_CLOSE(fd); + qemuFileWrapperFDClose(vm, wrapperFd); + virFileWrapperFdFree(wrapperFd); if (ret != 0) unlink(path); - virFileWrapperFdFree(wrapperFd); VIR_FREE(compressedpath); virObjectUnref(cfg); return ret; @@ -7043,17 +7045,16 @@ qemuDomainRestoreFlags(virConnectPtr conn, ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path, false, QEMU_ASYNC_JOB_START); - if (virFileWrapperFdClose(wrapperFd) < 0) - VIR_WARN("Failed to close %s", path); qemuProcessEndJob(driver, vm); cleanup: virDomainDefFree(def); VIR_FORCE_CLOSE(fd); + virFileWrapperFdClose(wrapperFd); + virFileWrapperFdFree(wrapperFd); virQEMUSaveDataFree(data); VIR_FREE(xmlout); - virFileWrapperFdFree(wrapperFd); if (vm && ret < 0) qemuDomainRemoveInactiveJob(driver, vm); virDomainObjEndAPI(&vm); @@ -7318,14 +7319,13 @@ qemuDomainObjRestore(virConnectPtr conn, ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path, start_paused, asyncJob); - if (virFileWrapperFdClose(wrapperFd) < 0) - VIR_WARN("Failed to close %s", path); cleanup: virQEMUSaveDataFree(data); VIR_FREE(xmlout); virDomainDefFree(def); VIR_FORCE_CLOSE(fd); + virFileWrapperFdClose(wrapperFd); virFileWrapperFdFree(wrapperFd); return ret; } -- 2.20.1

On Tue, Feb 19, 2019 at 05:52:29PM +0100, Andrea Bolognani wrote:
Right now we're reporting errors in virFileWrapperFdFree(), but that's hardly the appropriate place to do so, as free functions are supposed to do nothing more than release allocated resources.
We want to move that code back into virFileWrapperFdClose(), but before we can do that we need to make sure the function is actually called every time we're done processing the wrapped file. The cleanup path is the obvious candidate.
For two of the users we can just move the call, but for the other two we need to duplicate it instead in order not to alter the existing behavior.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/qemu_driver.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5118f4ad42..30f69b339b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3231,6 +3231,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
cleanup: VIR_FORCE_CLOSE(fd); + qemuFileWrapperFDClose(vm, wrapperFd);
Don't we need to check & propagate the return status of this, otherwise callers would mistakenly think qemuDomainSaveMemory has succeeeded, despite qemuFileWrapperFDClose having raised an error. Likewise all the other places below.
virFileWrapperFdFree(wrapperFd); virObjectUnref(cfg);
@@ -3834,9 +3835,10 @@ doCoreDump(virQEMUDriverPtr driver,
cleanup: VIR_FORCE_CLOSE(fd); + qemuFileWrapperFDClose(vm, wrapperFd); + virFileWrapperFdFree(wrapperFd); if (ret != 0) unlink(path); - virFileWrapperFdFree(wrapperFd); VIR_FREE(compressedpath); virObjectUnref(cfg); return ret; @@ -7043,17 +7045,16 @@ qemuDomainRestoreFlags(virConnectPtr conn,
ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path, false, QEMU_ASYNC_JOB_START); - if (virFileWrapperFdClose(wrapperFd) < 0) - VIR_WARN("Failed to close %s", path);
qemuProcessEndJob(driver, vm);
cleanup: virDomainDefFree(def); VIR_FORCE_CLOSE(fd); + virFileWrapperFdClose(wrapperFd); + virFileWrapperFdFree(wrapperFd); virQEMUSaveDataFree(data); VIR_FREE(xmlout); - virFileWrapperFdFree(wrapperFd); if (vm && ret < 0) qemuDomainRemoveInactiveJob(driver, vm); virDomainObjEndAPI(&vm); @@ -7318,14 +7319,13 @@ qemuDomainObjRestore(virConnectPtr conn,
ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path, start_paused, asyncJob); - if (virFileWrapperFdClose(wrapperFd) < 0) - VIR_WARN("Failed to close %s", path);
cleanup: virQEMUSaveDataFree(data); VIR_FREE(xmlout); virDomainDefFree(def); VIR_FORCE_CLOSE(fd); + virFileWrapperFdClose(wrapperFd); virFileWrapperFdFree(wrapperFd); return ret; } -- 2.20.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, 2019-02-19 at 17:01 +0000, Daniel P. Berrangé wrote:
On Tue, Feb 19, 2019 at 05:52:29PM +0100, Andrea Bolognani wrote: [...]
@@ -3231,6 +3231,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
cleanup: VIR_FORCE_CLOSE(fd); + qemuFileWrapperFDClose(vm, wrapperFd);
Don't we need to check & propagate the return status of this, otherwise callers would mistakenly think qemuDomainSaveMemory has succeeeded, despite qemuFileWrapperFDClose having raised an error. Likewise all the other places below.
In cases where qemuFileWrapperFDClose() returning an error was not considered an overall failure in the existing code, I have preserved that behavior. Then again, we're ultimately going to call virReportError() from it instead of just logging a message with VIR_WARN(), so perhaps not returning an overall failure would be confusing... -- Andrea Bolognani / Red Hat / Virtualization

On Wed, Feb 20, 2019 at 01:16:34PM +0100, Andrea Bolognani wrote:
On Tue, 2019-02-19 at 17:01 +0000, Daniel P. Berrangé wrote:
On Tue, Feb 19, 2019 at 05:52:29PM +0100, Andrea Bolognani wrote: [...]
@@ -3231,6 +3231,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
cleanup: VIR_FORCE_CLOSE(fd); + qemuFileWrapperFDClose(vm, wrapperFd);
Don't we need to check & propagate the return status of this, otherwise callers would mistakenly think qemuDomainSaveMemory has succeeeded, despite qemuFileWrapperFDClose having raised an error. Likewise all the other places below.
In cases where qemuFileWrapperFDClose() returning an error was not considered an overall failure in the existing code, I have preserved that behavior.
FWIW, I consider that existing code to be buggy. It is akin to ignoring the return status of close(). Data you think you've written can in fact be lost. We usually only ignore close() return value if we're already in an error cleanup scenario. Any success codepaths should always honour the close() status. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

virFileWrapperFdFree(), like all free functions, is supposed to only release allocated resources, so error reporting is better suited for virFileWrapperFdClose(). This reverts most of commit b0c3e931804a. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index 42add5a2cd..8e045279f0 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -337,6 +337,9 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd) ret = virCommandWait(wfd->cmd, NULL); + if (wfd->err_msg && *wfd->err_msg) + VIR_WARN("iohelper reports: %s", wfd->err_msg); + wfd->closed = true; return ret; @@ -357,9 +360,6 @@ virFileWrapperFdFree(virFileWrapperFdPtr wfd) if (!wfd) return; - if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); - virCommandAbort(wfd->cmd); VIR_FREE(wfd->err_msg); -- 2.20.1

On Tue, Feb 19, 2019 at 05:52:30PM +0100, Andrea Bolognani wrote:
virFileWrapperFdFree(), like all free functions, is supposed to only release allocated resources, so error reporting is better suited for virFileWrapperFdClose().
This reverts most of commit b0c3e931804a.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c index 42add5a2cd..8e045279f0 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -337,6 +337,9 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
ret = virCommandWait(wfd->cmd, NULL);
+ if (wfd->err_msg && *wfd->err_msg) + VIR_WARN("iohelper reports: %s", wfd->err_msg); + wfd->closed = true;
return ret; @@ -357,9 +360,6 @@ virFileWrapperFdFree(virFileWrapperFdPtr wfd) if (!wfd) return;
- if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); - virCommandAbort(wfd->cmd);
It feels like this could be reverted too. We already call virCommandFree which I think should call Abort for us - though not 100% confident since the scenario in which "reap = true" is set is quite complex.
VIR_FREE(wfd->err_msg);
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, 2019-02-19 at 16:59 +0000, Daniel P. Berrangé wrote:
On Tue, Feb 19, 2019 at 05:52:30PM +0100, Andrea Bolognani wrote:
virFileWrapperFdFree(), like all free functions, is supposed to only release allocated resources, so error reporting is better suited for virFileWrapperFdClose().
This reverts most of commit b0c3e931804a.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c index 42add5a2cd..8e045279f0 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -337,6 +337,9 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
ret = virCommandWait(wfd->cmd, NULL);
+ if (wfd->err_msg && *wfd->err_msg) + VIR_WARN("iohelper reports: %s", wfd->err_msg); + wfd->closed = true;
return ret; @@ -357,9 +360,6 @@ virFileWrapperFdFree(virFileWrapperFdPtr wfd) if (!wfd) return;
- if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); - virCommandAbort(wfd->cmd);
It feels like this could be reverted too. We already call virCommandFree which I think should call Abort for us - though not 100% confident since the scenario in which "reap = true" is set is quite complex.
According to virCommandRunAsync()'s documentation There are two approaches to child process cleanup. 1. Use auto-cleanup, by passing NULL for pid. The child will be auto-reaped by virCommandFree, unless you reap it earlier via virCommandWait or virCommandAbort. We call virCommandRunAsync() with pid=NULL in virFileWrapperFdNew(), and then both call virCommandWait() in virFileWrapperFdClose() *and* virCommandFree() in virFileWrapperFdFree(), so I think you're right that we don't need to call virCommandAbort() explicitly and we can simply revert all of b0c3e931804a. -- Andrea Bolognani / Red Hat / Virtualization

libvirt_iohelper is used internally by the virFileWrapperFd APIs; more specifically, in the QEMU driver we have the doCoreDump() and qemuDomainSaveMemory() helper functions as users, and those in turn end up being called by the implementation of several driver APIs. By calling virReportError() if libvirt_iohelper has failed, we overwrite whatever generic error message QEMU might have raised with the more useful one generated by the helper program. After this commit, the user will be able to see the error directly instead of having to dig in the journal or libvirtd log. https://bugzilla.redhat.com/show_bug.cgi?id=1578741 Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index 8e045279f0..1b61cbd127 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -337,8 +337,12 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd) ret = virCommandWait(wfd->cmd, NULL); + /* If the command used to process IO has produced errors, it's fair + * to assume those will be more relevant to the user than whatever + * eg. QEMU can figure out on its own, so it's okay if we end up + * discarding an existing error */ if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", wfd->err_msg); wfd->closed = true; -- 2.20.1

On Tue, Feb 19, 2019 at 05:52:31PM +0100, Andrea Bolognani wrote:
libvirt_iohelper is used internally by the virFileWrapperFd APIs; more specifically, in the QEMU driver we have the doCoreDump() and qemuDomainSaveMemory() helper functions as users, and those in turn end up being called by the implementation of several driver APIs.
By calling virReportError() if libvirt_iohelper has failed, we overwrite whatever generic error message QEMU might have raised with the more useful one generated by the helper program.
After this commit, the user will be able to see the error directly instead of having to dig in the journal or libvirtd log.
https://bugzilla.redhat.com/show_bug.cgi?id=1578741
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c index 8e045279f0..1b61cbd127 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -337,8 +337,12 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
ret = virCommandWait(wfd->cmd, NULL);
+ /* If the command used to process IO has produced errors, it's fair + * to assume those will be more relevant to the user than whatever + * eg. QEMU can figure out on its own, so it's okay if we end up + * discarding an existing error */ if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", wfd->err_msg);
ret needs to be set to -1 in this case
wfd->closed = true;
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, 2019-02-19 at 16:58 +0000, Daniel P. Berrangé wrote:
On Tue, Feb 19, 2019 at 05:52:31PM +0100, Andrea Bolognani wrote: [...]
@@ -337,8 +337,12 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
ret = virCommandWait(wfd->cmd, NULL);
+ /* If the command used to process IO has produced errors, it's fair + * to assume those will be more relevant to the user than whatever + * eg. QEMU can figure out on its own, so it's okay if we end up + * discarding an existing error */ if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", wfd->err_msg);
ret needs to be set to -1 in this case
I expect that a command who produced output on stderr also exited with a non-zero return code, but explicitly erroring out if the former condition is detected can't possibly hurt I guess. -- Andrea Bolognani / Red Hat / Virtualization

On Wed, Feb 20, 2019 at 01:34:15PM +0100, Andrea Bolognani wrote:
On Tue, 2019-02-19 at 16:58 +0000, Daniel P. Berrangé wrote:
On Tue, Feb 19, 2019 at 05:52:31PM +0100, Andrea Bolognani wrote: [...]
@@ -337,8 +337,12 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
ret = virCommandWait(wfd->cmd, NULL);
+ /* If the command used to process IO has produced errors, it's fair + * to assume those will be more relevant to the user than whatever + * eg. QEMU can figure out on its own, so it's okay if we end up + * discarding an existing error */ if (wfd->err_msg && *wfd->err_msg) - VIR_WARN("iohelper reports: %s", wfd->err_msg); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", wfd->err_msg);
ret needs to be set to -1 in this case
I expect that a command who produced output on stderr also exited with a non-zero return code, but explicitly erroring out if the former condition is detected can't possibly hurt I guess.
The alternative is to change the conditional to if (ret != 0 && wfd->err_msg && *wfd->err_msg) ... If we think there's a risk of the command printing stuff to stderr in a non-error scenario. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (2)
-
Andrea Bolognani
-
Daniel P. Berrangé