[libvirt] [PATCH 0/2] Fix a couple get all domain stats issues

One is longer term (patch1), while the other is sourced in this release (4.10.0) when IOThread stats were added. John Ferlan (2): qemu: Save qemuDomainGetStats error qemu: Don't fail stats collection due to IOThread capability src/qemu/qemu_driver.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) -- 2.17.2

During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping out the reason we failed leaving the caller with no idea why the collection failed. To fix this, let's save the error and restore it prior to return so that a caller such as 'virsh domstats' doesn't get the generic "error: An error occurred, but the cause is unknown". Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 7d9e17e72c..2fb8eef609 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -21092,6 +21092,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, unsigned int flags) { virQEMUDriverPtr driver = conn->privateData; + virErrorPtr save_err = NULL; virDomainObjPtr *vms = NULL; virDomainObjPtr vm; size_t nvms; @@ -21160,6 +21161,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING) domflags |= QEMU_DOMAIN_STATS_BACKING; if (qemuDomainGetStats(conn, vm, stats, &tmp, domflags) < 0) { + save_err = virSaveLastError(); if (HAVE_JOB(domflags) && vm) qemuDomainObjEndJob(driver, vm); @@ -21184,6 +21186,10 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, cleanup: virDomainStatsRecordListFree(tmpstats); virObjectListFreeCount(vms, nvms); + if (save_err) { + virSetError(save_err); + virFreeError(save_err); + } return ret; } -- 2.17.2

On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping
A (probably) silly question, but why do we have to call virResetLastError as within virDomainFree in the first place? Erik

On Wed, Nov 28, 2018 at 12:49:14PM +0100, Erik Skultety wrote:
On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping
A (probably) silly question, but why do we have to call virResetLastError as within virDomainFree in the first place?
Well virDomainFree can return an error just like any other API can :-) See virCheckDomainReturn(...) Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Nov 28, 2018 at 12:08:51PM +0000, Daniel P. Berrangé wrote:
On Wed, Nov 28, 2018 at 12:49:14PM +0100, Erik Skultety wrote:
On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping
A (probably) silly question, but why do we have to call virResetLastError as within virDomainFree in the first place?
Well virDomainFree can return an error just like any other API can :-) See virCheckDomainReturn(...)
In which case we'd get into virRaiseErrorFull which already resets the error object inside, that's why I'm asking whether there's another reason for it. Erik

On Wed, Nov 28, 2018 at 01:26:17PM +0100, Erik Skultety wrote:
On Wed, Nov 28, 2018 at 12:08:51PM +0000, Daniel P. Berrangé wrote:
On Wed, Nov 28, 2018 at 12:49:14PM +0100, Erik Skultety wrote:
On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping
A (probably) silly question, but why do we have to call virResetLastError as within virDomainFree in the first place?
Well virDomainFree can return an error just like any other API can :-) See virCheckDomainReturn(...)
In which case we'd get into virRaiseErrorFull which already resets the error object inside, that's why I'm asking whether there's another reason for it.
That is insuficient as APIs need to guarantee that when they *don't* take the error path, the last error is clear. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Nov 28, 2018 at 12:28:21PM +0000, Daniel P. Berrangé wrote:
On Wed, Nov 28, 2018 at 01:26:17PM +0100, Erik Skultety wrote:
On Wed, Nov 28, 2018 at 12:08:51PM +0000, Daniel P. Berrangé wrote:
On Wed, Nov 28, 2018 at 12:49:14PM +0100, Erik Skultety wrote:
On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping
A (probably) silly question, but why do we have to call virResetLastError as within virDomainFree in the first place?
Well virDomainFree can return an error just like any other API can :-) See virCheckDomainReturn(...)
In which case we'd get into virRaiseErrorFull which already resets the error object inside, that's why I'm asking whether there's another reason for it.
That is insuficient as APIs need to guarantee that when they *don't* take the error path, the last error is clear.
Right, makes sense, thanks. Erik

On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping out the reason we failed leaving the caller with no idea why the collection failed.
To fix this, let's save the error and restore it prior to return so that a caller such as 'virsh domstats' doesn't get the generic "error: An error occurred, but the cause is unknown".
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 7d9e17e72c..2fb8eef609 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -21092,6 +21092,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, unsigned int flags) { virQEMUDriverPtr driver = conn->privateData; + virErrorPtr save_err = NULL;
git grep virError src/qemu shows most files use orig_err as the variable name
virDomainObjPtr *vms = NULL; virDomainObjPtr vm; size_t nvms; @@ -21160,6 +21161,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING) domflags |= QEMU_DOMAIN_STATS_BACKING; if (qemuDomainGetStats(conn, vm, stats, &tmp, domflags) < 0) { + save_err = virSaveLastError();
Since virDomainStatsRecordListFree is the one resetting the error, I think we should only save it right before that call. This would cause a memory leak if qemuDomainGetStats would fail for more than one domain.
if (HAVE_JOB(domflags) && vm) qemuDomainObjEndJob(driver, vm);
@@ -21184,6 +21186,10 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, cleanup:
Here. And we have a specific helper for that: virErrorPreserveLast(&orig_err);
virDomainStatsRecordListFree(tmpstats); virObjectListFreeCount(vms, nvms); + if (save_err) { + virSetError(save_err); + virFreeError(save_err); + }
virErrorRestore(&orig_err); With that addressed: Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

On Thu, Dec 06, 2018 at 02:49:59PM +0100, Ján Tomko wrote:
On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping out the reason we failed leaving the caller with no idea why the collection failed.
To fix this, let's save the error and restore it prior to return so that a caller such as 'virsh domstats' doesn't get the generic "error: An error occurred, but the cause is unknown".
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 7d9e17e72c..2fb8eef609 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -21092,6 +21092,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, unsigned int flags) { virQEMUDriverPtr driver = conn->privateData; + virErrorPtr save_err = NULL;
git grep virError src/qemu shows most files use orig_err as the variable name
virDomainObjPtr *vms = NULL; virDomainObjPtr vm; size_t nvms; @@ -21160,6 +21161,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING) domflags |= QEMU_DOMAIN_STATS_BACKING; if (qemuDomainGetStats(conn, vm, stats, &tmp, domflags) < 0) { + save_err = virSaveLastError();
Since virDomainStatsRecordListFree is the one resetting the error, I think we should only save it right before that call.
This would cause a memory leak if qemuDomainGetStats would fail for more than one domain.
Ehm, disregard this sentence. (Thanks Erik for pointing that out) But I still consider the cleanup section a beter place for this. Jano
if (HAVE_JOB(domflags) && vm) qemuDomainObjEndJob(driver, vm);
@@ -21184,6 +21186,10 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, cleanup:
Here. And we have a specific helper for that: virErrorPreserveLast(&orig_err);
virDomainStatsRecordListFree(tmpstats); virObjectListFreeCount(vms, nvms); + if (save_err) { + virSetError(save_err); + virFreeError(save_err); + }
virErrorRestore(&orig_err);
With that addressed: Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jano
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 12/6/18 10:15 AM, Ján Tomko wrote:
On Thu, Dec 06, 2018 at 02:49:59PM +0100, Ján Tomko wrote:
On Tue, Nov 27, 2018 at 11:23:22AM -0500, John Ferlan wrote:
During qemuConnectGetAllDomainStats if qemuDomainGetStats causes a failure, then when collecting more than one domain's worth of statistics the loop in virDomainStatsRecordListFree would call virDomainFree which would call virResetLastError effectively wiping out the reason we failed leaving the caller with no idea why the collection failed.
To fix this, let's save the error and restore it prior to return so that a caller such as 'virsh domstats' doesn't get the generic "error: An error occurred, but the cause is unknown".
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 7d9e17e72c..2fb8eef609 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -21092,6 +21092,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, unsigned int flags) { virQEMUDriverPtr driver = conn->privateData; + virErrorPtr save_err = NULL;
git grep virError src/qemu shows most files use orig_err as the variable name
OK - I'll modify... Of course the example I cut-n-paste'd from was save_err - chances were slim, but possible.
virDomainObjPtr *vms = NULL; virDomainObjPtr vm; size_t nvms; @@ -21160,6 +21161,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING) domflags |= QEMU_DOMAIN_STATS_BACKING; if (qemuDomainGetStats(conn, vm, stats, &tmp, domflags) < 0) { + save_err = virSaveLastError();
Since virDomainStatsRecordListFree is the one resetting the error, I think we should only save it right before that call.
This would cause a memory leak if qemuDomainGetStats would fail for more than one domain.
Ehm, disregard this sentence. (Thanks Erik for pointing that out) But I still consider the cleanup section a beter place for this.
It was a 50/50 coin flip for placement. I'm fine with and will move this to just before the virDomainStatsRecordListFree. Perhaps the more common model used throughout the code. I hedged my "bets" by putting it here because I thought it was clearer as to why an error would be saved.
if (HAVE_JOB(domflags) && vm) qemuDomainObjEndJob(driver, vm);
@@ -21184,6 +21186,10 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, cleanup:
Here. And we have a specific helper for that: virErrorPreserveLast(&orig_err);
We should be "more consistent" in the code regarding usage to avoid future copypasta's, but going with Preserve and Restore is fine by me. And by more consistent the underlying question/tone is - should the virSaveLastError w/ paired virSetError/virFreeError all be replaced with the Preserve/Restore... I see that could cause a lot of short term churn, but perhaps the long term gain is worth it. The second question then becomes all at one time or module my module... Thanks for the review - the changes are made here at least. John
virDomainStatsRecordListFree(tmpstats); virObjectListFreeCount(vms, nvms); + if (save_err) { + virSetError(save_err); + virFreeError(save_err); + }
virErrorRestore(&orig_err);
With that addressed: Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jano
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Commit 212dc9286 made a generic qemuDomainGetIOThreadsMon which would fail if the QEMU_CAPS_OBJECT_IOTHREAD didn't exist. Then commit d1eac927 used that helper for the collection of all domain stats. However, if the capability doesn't exist, then the entire stats collection fails. Since the IOThread stats were meant to be if available only, thus rather than failing if the capability doesn't exist, let's just not collect the stats. Restore the caps failure logic for qemuDomainGetIOThreadsLive. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2fb8eef609..60e29577ad 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -5490,16 +5490,9 @@ qemuDomainGetIOThreadsMon(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuMonitorIOThreadInfoPtr **iothreads) { - qemuDomainObjPrivatePtr priv; + qemuDomainObjPrivatePtr priv = vm->privateData; int niothreads = 0; - priv = vm->privateData; - if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_OBJECT_IOTHREAD)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("IOThreads not supported with this binary")); - return -1; - } - qemuDomainObjEnterMonitor(driver, vm); niothreads = qemuMonitorGetIOThreads(priv->mon, iothreads); if (qemuDomainObjExitMonitor(driver, vm) < 0 || niothreads < 0) @@ -5514,6 +5507,7 @@ qemuDomainGetIOThreadsLive(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainIOThreadInfoPtr **info) { + qemuDomainObjPrivatePtr priv; qemuMonitorIOThreadInfoPtr *iothreads = NULL; virDomainIOThreadInfoPtr *info_ret = NULL; int niothreads = 0; @@ -5529,6 +5523,13 @@ qemuDomainGetIOThreadsLive(virQEMUDriverPtr driver, goto endjob; } + priv = vm->privateData; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_OBJECT_IOTHREAD)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("IOThreads not supported with this binary")); + goto endjob; + } + if ((niothreads = qemuDomainGetIOThreadsMon(driver, vm, &iothreads)) < 0) goto endjob; @@ -20874,6 +20875,7 @@ qemuDomainGetStatsIOThread(virQEMUDriverPtr driver, int *maxparams, unsigned int privflags ATTRIBUTE_UNUSED) { + qemuDomainObjPrivatePtr priv; size_t i; qemuMonitorIOThreadInfoPtr *iothreads = NULL; int niothreads; @@ -20882,6 +20884,10 @@ qemuDomainGetStatsIOThread(virQEMUDriverPtr driver, if (!virDomainObjIsActive(dom)) return 0; + priv = dom->privateData; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_OBJECT_IOTHREAD)) + return 0; + if ((niothreads = qemuDomainGetIOThreadsMon(driver, dom, &iothreads)) < 0) return -1; -- 2.17.2

On Tue, Nov 27, 2018 at 11:23:23AM -0500, John Ferlan wrote:
Commit 212dc9286 made a generic qemuDomainGetIOThreadsMon which would fail if the QEMU_CAPS_OBJECT_IOTHREAD didn't exist. Then commit d1eac927 used that helper for the collection of all domain stats. However, if the capability doesn't exist, then the entire stats collection fails. Since the IOThread stats were meant to be if available only, thus rather than failing if the capability doesn't exist, let's just not collect the stats. Restore the caps failure logic for qemuDomainGetIOThreadsLive.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2fb8eef609..60e29577ad 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -5490,16 +5490,9 @@ qemuDomainGetIOThreadsMon(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuMonitorIOThreadInfoPtr **iothreads) { - qemuDomainObjPrivatePtr priv; + qemuDomainObjPrivatePtr priv = vm->privateData; int niothreads = 0;
- priv = vm->privateData; - if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_OBJECT_IOTHREAD)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("IOThreads not supported with this binary")); - return -1; - } - qemuDomainObjEnterMonitor(driver, vm); niothreads = qemuMonitorGetIOThreads(priv->mon, iothreads); if (qemuDomainObjExitMonitor(driver, vm) < 0 || niothreads < 0) @@ -5514,6 +5507,7 @@ qemuDomainGetIOThreadsLive(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainIOThreadInfoPtr **info) { + qemuDomainObjPrivatePtr priv; qemuMonitorIOThreadInfoPtr *iothreads = NULL; virDomainIOThreadInfoPtr *info_ret = NULL; int niothreads = 0; @@ -5529,6 +5523,13 @@ qemuDomainGetIOThreadsLive(virQEMUDriverPtr driver, goto endjob; }
+ priv = vm->privateData; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_OBJECT_IOTHREAD)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("IOThreads not supported with this binary")); + goto endjob; + } + if ((niothreads = qemuDomainGetIOThreadsMon(driver, vm, &iothreads)) < 0) goto endjob;
@@ -20874,6 +20875,7 @@ qemuDomainGetStatsIOThread(virQEMUDriverPtr driver, int *maxparams, unsigned int privflags ATTRIBUTE_UNUSED) { + qemuDomainObjPrivatePtr priv;
dom is locked when this function is called, you can initialize priv here
size_t i; qemuMonitorIOThreadInfoPtr *iothreads = NULL; int niothreads; @@ -20882,6 +20884,10 @@ qemuDomainGetStatsIOThread(virQEMUDriverPtr driver, if (!virDomainObjIsActive(dom)) return 0;
+ priv = dom->privateData; + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_OBJECT_IOTHREAD)) + return 0; + if ((niothreads = qemuDomainGetIOThreadsMon(driver, dom, &iothreads)) < 0) return -1;
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

Thanks for the quick peek on patch1; however, I was hoping to have these patches be considered for the current release seeing as patch2 fixes a condition introduced in the release and because the next release won't be until mid January or so. The downside of not pushing patch2 is that running 'virsh domstats' will fail completely on a domain that doesn't have the IOThread capability. Without patch1, you'd only get the generic cause is unknown reason. If it's felt that the patches can wait for the next release that's fine, but no harm in asking... Tks, John On 11/27/18 11:23 AM, John Ferlan wrote:
One is longer term (patch1), while the other is sourced in this release (4.10.0) when IOThread stats were added.
John Ferlan (2): qemu: Save qemuDomainGetStats error qemu: Don't fail stats collection due to IOThread capability
src/qemu/qemu_driver.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-)

ping? Tks, John On 11/27/18 11:23 AM, John Ferlan wrote:
One is longer term (patch1), while the other is sourced in this release (4.10.0) when IOThread stats were added.
John Ferlan (2): qemu: Save qemuDomainGetStats error qemu: Don't fail stats collection due to IOThread capability
src/qemu/qemu_driver.c | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-)
participants (4)
-
Daniel P. Berrangé
-
Erik Skultety
-
John Ferlan
-
Ján Tomko