[libvirt] [PATCH v2 0/5] Introduce VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT

v2 of: https://www.redhat.com/archives/libvir-list/2018-June/msg00546.html diff to v1: - Rename to STATS_NOWAIT - John's review worked in - added comment to qemuDomainObjBeginJobNowait Michal Privoznik (5): qemu_domain: Document qemuDomainObjBeginJob qemuDomainObjBeginJobInternal: Remove spurious @ret assignment qemu_domain: Introduce qemuDomainObjBeginJobNowait Introduce VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT virsh: Introduce --nowait to domstats include/libvirt/libvirt-domain.h | 2 ++ src/libvirt-domain.c | 12 ++++++++ src/qemu/qemu_domain.c | 60 +++++++++++++++++++++++++++++++++++----- src/qemu/qemu_domain.h | 4 +++ src/qemu/qemu_driver.c | 15 ++++++++-- tools/virsh-domain-monitor.c | 7 +++++ tools/virsh.pod | 16 +++++++---- 7 files changed, 101 insertions(+), 15 deletions(-) -- 2.16.4

Provide a small comment on the function and its parameters. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2119233907..be36a9e3e4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6353,8 +6353,22 @@ qemuDomainJobAllowed(qemuDomainObjPrivatePtr priv, qemuDomainJob job) /* Give up waiting for mutex after 30 seconds */ #define QEMU_JOB_WAIT_TIME (1000ull * 30) -/* - * obj must be locked before calling +/** + * qemuDomainObjBeginJobInternal: + * @driver: qemu driver + * @obj: domain object + * @job: qemuDomainJob to start + * @asyncJob: qemuDomainAsyncJob to start + * + * Acquires job for a domain object which must be locked before + * calling. If there's already a job running waits up to + * QEMU_JOB_WAIT_TIME after which the functions fails reporting + * an error. + * + * Returns: 0 on success, + * -2 if unable to start job because of timeout or + * maxQueuedJobs limit, + * -1 otherwise. */ static int ATTRIBUTE_NONNULL(1) qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, -- 2.16.4

The variable is initialized to -1 already. There's no way it can be overwritten by the time control gets to the line I'm removing. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_domain.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index be36a9e3e4..21d54938b6 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6490,7 +6490,6 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, else blocker = priv->job.asyncOwnerAPI; - ret = -1; if (errno == ETIMEDOUT) { if (blocker) { virReportError(VIR_ERR_OPERATION_TIMEOUT, -- 2.16.4

The aim of this API is to allow the caller do best effort. Some functions can work even when acquiring the job fails (e.g. qemuConnectGetAllDomainStats()). But what they can't bear is delay if they have to wait up to 30 seconds for each domain that is processing some other job. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 43 ++++++++++++++++++++++++++++++++++++++----- src/qemu/qemu_domain.h | 4 ++++ 2 files changed, 42 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 21d54938b6..a01067049e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6359,11 +6359,15 @@ qemuDomainJobAllowed(qemuDomainObjPrivatePtr priv, qemuDomainJob job) * @obj: domain object * @job: qemuDomainJob to start * @asyncJob: qemuDomainAsyncJob to start + * @nowait: don't wait trying to acquire @job * * Acquires job for a domain object which must be locked before * calling. If there's already a job running waits up to * QEMU_JOB_WAIT_TIME after which the functions fails reporting - * an error. + * an error unless @nowait is set. + * If @nowait is true this function tries to acquire job and if + * it fails, then it returns immediately without waiting. No + * error is reported in this case. * * Returns: 0 on success, * -2 if unable to start job because of timeout or @@ -6374,7 +6378,8 @@ static int ATTRIBUTE_NONNULL(1) qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, virDomainObjPtr obj, qemuDomainJob job, - qemuDomainAsyncJob asyncJob) + qemuDomainAsyncJob asyncJob, + bool nowait) { qemuDomainObjPrivatePtr priv = obj->privateData; unsigned long long now; @@ -6414,12 +6419,18 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, } while (!nested && !qemuDomainNestedJobAllowed(priv, job)) { + if (nowait) + goto cleanup; + VIR_DEBUG("Waiting for async job (vm=%p name=%s)", obj, obj->def->name); if (virCondWaitUntil(&priv->job.asyncCond, &obj->parent.lock, then) < 0) goto error; } while (priv->job.active) { + if (nowait) + goto cleanup; + VIR_DEBUG("Waiting for job (vm=%p name=%s)", obj, obj->def->name); if (virCondWaitUntil(&priv->job.cond, &obj->parent.lock, then) < 0) goto error; @@ -6536,7 +6547,7 @@ int qemuDomainObjBeginJob(virQEMUDriverPtr driver, qemuDomainJob job) { if (qemuDomainObjBeginJobInternal(driver, obj, job, - QEMU_ASYNC_JOB_NONE) < 0) + QEMU_ASYNC_JOB_NONE, false) < 0) return -1; else return 0; @@ -6551,7 +6562,7 @@ int qemuDomainObjBeginAsyncJob(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv; if (qemuDomainObjBeginJobInternal(driver, obj, QEMU_JOB_ASYNC, - asyncJob) < 0) + asyncJob, false) < 0) return -1; priv = obj->privateData; @@ -6580,9 +6591,31 @@ qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver, return qemuDomainObjBeginJobInternal(driver, obj, QEMU_JOB_ASYNC_NESTED, - QEMU_ASYNC_JOB_NONE); + QEMU_ASYNC_JOB_NONE, + false); } +/** + * qemuDomainObjBeginJobNowait: + * + * @driver: qemu driver + * @obj: domain object + * @job: qemuDomainJob to start + * + * Acquires job for a domain object which must be locked before + * calling. If there's already a job running it returns + * immediately without any error reported. + * + * Returns: see qemuDomainObjBeginJobInternal + */ +int +qemuDomainObjBeginJobNowait(virQEMUDriverPtr driver, + virDomainObjPtr obj, + qemuDomainJob job) +{ + return qemuDomainObjBeginJobInternal(driver, obj, job, + QEMU_ASYNC_JOB_NONE, true); +} /* * obj must be locked and have a reference before calling diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index fd8d9b5305..9e2da0a37c 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -510,6 +510,10 @@ int qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver, virDomainObjPtr obj, qemuDomainAsyncJob asyncJob) ATTRIBUTE_RETURN_CHECK; +int qemuDomainObjBeginJobNowait(virQEMUDriverPtr driver, + virDomainObjPtr obj, + qemuDomainJob job) + ATTRIBUTE_RETURN_CHECK; void qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj); -- 2.16.4

On 06/15/2018 04:18 AM, Michal Privoznik wrote:
The aim of this API is to allow the caller do best effort. Some
s/do/to do/
functions can work even when acquiring the job fails (e.g. qemuConnectGetAllDomainStats()). But what they can't bear is delay if they have to wait up to 30 seconds for each domain that is processing some other job.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 43 ++++++++++++++++++++++++++++++++++++++----- src/qemu/qemu_domain.h | 4 ++++ 2 files changed, 42 insertions(+), 5 deletions(-)
Couple of nits noted, one above, one below... Reviewed-by: John Ferlan <jferlan@redhat.com> John
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 21d54938b6..a01067049e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6359,11 +6359,15 @@ qemuDomainJobAllowed(qemuDomainObjPrivatePtr priv, qemuDomainJob job) * @obj: domain object * @job: qemuDomainJob to start * @asyncJob: qemuDomainAsyncJob to start + * @nowait: don't wait trying to acquire @job * * Acquires job for a domain object which must be locked before * calling. If there's already a job running waits up to * QEMU_JOB_WAIT_TIME after which the functions fails reporting - * an error. + * an error unless @nowait is set.
Paragraph break have a empty line for readability.
+ * If @nowait is true this function tries to acquire job and if + * it fails, then it returns immediately without waiting. No + * error is reported in this case. * * Returns: 0 on success, * -2 if unable to start job because of timeout or
[...]

https://bugzilla.redhat.com/show_bug.cgi?id=1552092 If there's a long running job it might cause us to wait 30 seconds before we give up acquiring the job. This is problematic to interactive applications that fetch stats repeatedly every few seconds. The solution is to introduce VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT flag which tries to acquire job but does not wait if acquiring failed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- include/libvirt/libvirt-domain.h | 2 ++ src/libvirt-domain.c | 12 ++++++++++++ src/qemu/qemu_driver.c | 15 ++++++++++++--- 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 3ef7c24528..796f2e1408 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -2055,6 +2055,8 @@ typedef enum { VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF = VIR_CONNECT_LIST_DOMAINS_SHUTOFF, VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER = VIR_CONNECT_LIST_DOMAINS_OTHER, + VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT = 1 << 29, /* report statistics that can be obtained + immediately without any blocking */ VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING = 1 << 30, /* include backing chain for block stats */ VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS = 1U << 31, /* enforce requested stats */ } virConnectGetAllDomainStatsFlags; diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 4a899f31c8..c71f2e6877 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -11502,6 +11502,12 @@ virConnectGetDomainCapabilities(virConnectPtr conn, * fields for offline domains if the statistics are meaningful only for a * running domain. * + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT in + * @flags means when libvirt is unable to fetch stats for any of + * the domains (for whatever reason) only a subset of statistics + * is returned for the domain. That subset being statistics that + * don't involve querying the underlying hypervisor. + * * Similarly to virConnectListAllDomains, @flags can contain various flags to * filter the list of domains to provide stats for. * @@ -11586,6 +11592,12 @@ virConnectGetAllDomainStats(virConnectPtr conn, * fields for offline domains if the statistics are meaningful only for a * running domain. * + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT in + * @flags means when libvirt is unable to fetch stats for any of + * the domains (for whatever reason) only a subset of statistics + * is returned for the domain. That subset being statistics that + * don't involve querying the underlying hypervisor. + * * Note that any of the domain list filtering flags in @flags may be rejected * by this function. * diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 42069ee617..35038889a3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -20428,6 +20428,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, virCheckFlags(VIR_CONNECT_LIST_DOMAINS_FILTERS_ACTIVE | VIR_CONNECT_LIST_DOMAINS_FILTERS_PERSISTENT | VIR_CONNECT_LIST_DOMAINS_FILTERS_STATE | + VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT | VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING | VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS, -1); @@ -20462,9 +20463,17 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, virObjectLock(vm); - if (HAVE_JOB(privflags) && - qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) == 0) - domflags |= QEMU_DOMAIN_STATS_HAVE_JOB; + if (HAVE_JOB(privflags)) { + int rv; + + if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT) + rv = qemuDomainObjBeginJobNowait(driver, vm, QEMU_JOB_QUERY); + else + rv = qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY); + + if (rv == 0) + domflags |= QEMU_DOMAIN_STATS_HAVE_JOB; + } /* else: without a job it's still possible to gather some data */ if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING) -- 2.16.4

On 06/15/2018 04:18 AM, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1552092
If there's a long running job it might cause us to wait 30 seconds before we give up acquiring the job. This is problematic to interactive applications that fetch stats repeatedly every few seconds.
The solution is to introduce VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT flag which tries to acquire job but does not wait if acquiring failed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- include/libvirt/libvirt-domain.h | 2 ++ src/libvirt-domain.c | 12 ++++++++++++ src/qemu/qemu_driver.c | 15 ++++++++++++--- 3 files changed, 26 insertions(+), 3 deletions(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John Is the 30 seconds qemu specific? Could drop it from the commit message - if you feel so compelled.

On 06/19/2018 01:25 AM, John Ferlan wrote:
On 06/15/2018 04:18 AM, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1552092
If there's a long running job it might cause us to wait 30 seconds before we give up acquiring the job. This is problematic to interactive applications that fetch stats repeatedly every few seconds.
The solution is to introduce VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT flag which tries to acquire job but does not wait if acquiring failed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- include/libvirt/libvirt-domain.h | 2 ++ src/libvirt-domain.c | 12 ++++++++++++ src/qemu/qemu_driver.c | 15 ++++++++++++--- 3 files changed, 26 insertions(+), 3 deletions(-)
Reviewed-by: John Ferlan <jferlan@redhat.com>
John
Is the 30 seconds qemu specific? Could drop it from the commit message - if you feel so compelled.
Yes & no. Each driver has its own timeout but all set it to 30 seconds: #define LXC_JOB_WAIT_TIME (1000ull * 30) #define LIBXL_JOB_WAIT_TIME (1000ull * 30) #define VZ_JOB_WAIT_TIME (1000 * 30) #define QEMU_JOB_WAIT_TIME (1000ull * 30) Therefore I rather keep 30seconds in the commit message. Michal

This new switch can be used to set VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT flag for stats fetching API. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- tools/virsh-domain-monitor.c | 7 +++++++ tools/virsh.pod | 16 +++++++++++----- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c index 8cbb3db37c..886f7f16b5 100644 --- a/tools/virsh-domain-monitor.c +++ b/tools/virsh-domain-monitor.c @@ -1992,6 +1992,10 @@ static const vshCmdOptDef opts_domstats[] = { .type = VSH_OT_BOOL, .help = N_("add backing chain information to block stats"), }, + {.name = "nowait", + .type = VSH_OT_BOOL, + .help = N_("report only stats that are accessible instantly"), + }, VIRSH_COMMON_OPT_DOMAIN_OT_ARGV(N_("list of domains to get stats for"), 0), {.name = NULL} }; @@ -2087,6 +2091,9 @@ cmdDomstats(vshControl *ctl, const vshCmd *cmd) if (vshCommandOptBool(cmd, "backing")) flags |= VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING; + if (vshCommandOptBool(cmd, "nowait")) + flags |= VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT; + if (vshCommandOptBool(cmd, "domain")) { if (VIR_ALLOC_N(domlist, 1) < 0) goto cleanup; diff --git a/tools/virsh.pod b/tools/virsh.pod index 3f3314a87e..7cb8c8a6e4 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -968,11 +968,11 @@ that require a block device name (such as I<domblkinfo> or I<snapshot-create> for disk snapshots) will accept either target or unique source names printed by this command. -=item B<domstats> [I<--raw>] [I<--enforce>] [I<--backing>] [I<--state>] -[I<--cpu-total>] [I<--balloon>] [I<--vcpu>] [I<--interface>] [I<--block>] -[I<--perf>] [[I<--list-active>] [I<--list-inactive>] [I<--list-persistent>] -[I<--list-transient>] [I<--list-running>] [I<--list-paused>] -[I<--list-shutoff>] [I<--list-other>]] | [I<domain> ...] +=item B<domstats> [I<--raw>] [I<--enforce>] [I<--backing>] [I<--nowait>] +[I<--state>] [I<--cpu-total>] [I<--balloon>] [I<--vcpu>] [I<--interface>] +[I<--block>] [I<--perf>] [[I<--list-active>] [I<--list-inactive>] +[I<--list-persistent>] [I<--list-transient>] [I<--list-running>] +[I<--list-paused>] [I<--list-shutoff>] [I<--list-other>]] | [I<domain> ...] Get statistics for multiple or all domains. Without any argument this command prints all available statistics for all domains. @@ -1123,6 +1123,12 @@ daemon supports the selected group of stats. Flag I<--enforce> forces the command to fail if the daemon doesn't support the selected group. +When collecting stats libvirtd may wait for some time if there's +already another job running on given domain for it to finish. +This may cause unnecessary delay in delivering stats. Using +I<--nowait> suppresses this behaviour. On the other hand +some statistics might be missing for such domain. + =item B<domiflist> I<domain> [I<--inactive>] Print a table showing the brief information of all virtual interfaces -- 2.16.4

On 06/15/2018 04:18 AM, Michal Privoznik wrote:
This new switch can be used to set VIR_CONNECT_GET_ALL_DOMAINS_STATS_NOWAIT flag for stats fetching API.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- tools/virsh-domain-monitor.c | 7 +++++++ tools/virsh.pod | 16 +++++++++++----- 2 files changed, 18 insertions(+), 5 deletions(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John
participants (3)
-
John Ferlan
-
Michal Privoznik
-
Michal Prívozník