On 06/13/2018 05:34 PM, John Ferlan wrote:
$SUBJ: "Introduce" and "NO_WAIT"
On 06/07/2018 07:59 AM, Michal Privoznik wrote:
>
https://bugzilla.redhat.com/show_bug.cgi?id=1552092
>
> If there's a long running job it might cause us to wait 30
> seconds before we give up acquiring job. This may cause trouble
s/job/the job/
s/may cause trouble/is problematic/
> to interactive applications that fetch stats repeatedly every few
> seconds.
>
> Solution is to introduce
The solution is...
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT flag which tries to
> acquire job but does not wait if acquiring failed.
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> include/libvirt/libvirt-domain.h | 1 +
> src/libvirt-domain.c | 10 ++++++++++
> src/qemu/qemu_driver.c | 15 ++++++++++++---
> 3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index da773b76cb..1a1d34620d 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -2055,6 +2055,7 @@ typedef enum {
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF = VIR_CONNECT_LIST_DOMAINS_SHUTOFF,
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER = VIR_CONNECT_LIST_DOMAINS_OTHER,
>
> + VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT = 1 << 29, /* ignore stalled
domains */
"stalled"? How about "don't wait on other jobs"
Well, my hidden idea was also that we can "misuse" this flag to not wait
on other places too. For instance, if we find out (somehow) that a
domain is in D state, we would consider it stale even without any job
running on it. Okay, we have no way of detecting if qemu is in D state
right now, but you get my point. If we don't lock this flag down to just
domain jobs (that not all drivers have btw), we can use it more widely.
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING = 1 << 30, /* include backing
chain for block stats */
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS = 1U << 31, /* enforce
requested stats */
> } virConnectGetAllDomainStatsFlags;
> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> index d44b553c74..906b097adb 100644
> --- a/src/libvirt-domain.c
> +++ b/src/libvirt-domain.c
> @@ -11502,6 +11502,11 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
> * fields for offline domains if the statistics are meaningful only for a
> * running domain.
> *
> + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT in
> + * @flags means when libvirt is unable to fetch stats for any of
> + * the domains (for whatever reason) such domain is silently
> + * ignored.
> + *
So we're adding this for both capabilities and...
Not really, this is jut git generating misleading diff (for human). In
fact this flag is added to virConnectGetAllDomainStats() and
virDomainListGetStats().
> * Similarly to virConnectListAllDomains, @flags can contain various flags to
> * filter the list of domains to provide stats for.
> *
> @@ -11586,6 +11591,11 @@ virConnectGetAllDomainStats(virConnectPtr conn,
> * fields for offline domains if the statistics are meaningful only for a
> * running domain.
> *
> + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT in
> + * @flags means when libvirt is unable to fetch stats for any of
> + * the domains (for whatever reason) such domain is silently
> + * ignored.
> + *
...stats in the API documentation, but...
BTW: the domain isn't silently ignored, instead only a subset of
statistics is returned for the domain. That subset being statistics
that don't involve querying the underlying hypervisor.
OKay.
> * Note that any of the domain list filtering flags in @flags may be rejected
> * by this function.
> *
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 38ea865ce3..28338de05f 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -20446,6 +20446,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn,
...only adding the check in the DomainStats?
Sure. Because what virDomainListGetStats does is it calls
driver->connectGetAllDomainStats() just like
virConnectGetAllDomainStats() does. So at the driver side there's only
one function serving two public APIs.
> virCheckFlags(VIR_CONNECT_LIST_DOMAINS_FILTERS_ACTIVE |
> VIR_CONNECT_LIST_DOMAINS_FILTERS_PERSISTENT |
> VIR_CONNECT_LIST_DOMAINS_FILTERS_STATE |
> + VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT |
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING |
> VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS, -1);
>
> @@ -20480,9 +20481,17 @@ qemuConnectGetAllDomainStats(virConnectPtr conn,
>
> virObjectLock(vm);
>
> - if (HAVE_JOB(privflags) &&
> - qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) == 0)
> - domflags |= QEMU_DOMAIN_STATS_HAVE_JOB;
Existing, but if BeginJob fails for some "other" reason, then one
wonders how much of the next steps actually happen.
Not sure what do you mean. Any StatsWorker callback (from
qemuDomainGetStats()) that needs to talk to monitor checks if grabbing
job succeeded or not. If it didn't the callback returns immediately. You
can see this live - just put sleep(60) right at the beginning of
qemuAgentGetTime(), and then from one console run 'virsh domtime $dom'
and from another one 'virsh domstats'.
Furthermore, we
don't clear the LastError.
> + if (HAVE_JOB(privflags)) {
> + int rv;
> +
> + if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT)
> + rv = qemuDomainObjBeginJobInstant(driver, vm, QEMU_JOB_QUERY);
Let's assume rv = -1 for a moment and it's the last domain that caused
the failure - does that mean the caller then re...
unfinished sen... :-)
> + else
> + rv = qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY);
> +
> + if (rv == 0)
> + domflags |= QEMU_DOMAIN_STATS_HAVE_JOB;
to add to my comment above, if rv < 0, then should we clear the last
error since this code doesn't seem to care...
Probably yeah. But that can be done outside of this patch set.
Michal