On 06/14/2018 04:10 PM, John Ferlan wrote:
[...]
>>> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
>>> index da773b76cb..1a1d34620d 100644
>>> --- a/include/libvirt/libvirt-domain.h
>>> +++ b/include/libvirt/libvirt-domain.h
>>> @@ -2055,6 +2055,7 @@ typedef enum {
>>> VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF = VIR_CONNECT_LIST_DOMAINS_SHUTOFF,
>>> VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER = VIR_CONNECT_LIST_DOMAINS_OTHER,
>>>
>>> + VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT = 1 << 29, /* ignore stalled domains */
>>
>> "stalled"? How about "don't wait on other jobs"
>
> Well, my hidden idea was also that we can "misuse" this flag to avoid
> waiting in other places too. For instance, if we find out (somehow)
> that a domain is in D state, we would consider it stale even without
> any job running on it. Okay, we have no way of detecting if qemu is in
> D state right now, but you get my point. If we don't lock this flag
> down to just domain jobs (which not all drivers have, btw), we can use
> it more widely.
>
> Ahhh, I see what you're driving at - a flag name that we can reuse for
> other conditions that delay stats collection because of some underlying
> problem...
> a/k/a: {AVOID|IGNORE}_BLOCKING_CONDITIONS...
>>
>>> VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING = 1 << 30, /* include backing chain for block stats */
>>> VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS = 1U << 31, /* enforce requested stats */
> Would the new flag be mutually exclusive with ENFORCE? I want some
> list of stats, but don't want to wait for the answer.
I thought about this and I don't think so. The current behaviour is:
even with ENFORCE you might still miss some stats if acquiring the job
fails. ENFORCE merely tells libvirt to check whether the requested
stats are available (= the driver knows how to fetch the requested
stats).
Therefore I don't think these two flags are mutually exclusive.
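Just to illustrate how I'd expect callers to combine them - a minimal
sketch (the stats groups here are arbitrary; conn is an already open
connection):

  virDomainStatsRecordPtr *records = NULL;
  int nrecords;

  /* Request CPU + balloon stats; ENFORCE_STATS validates that the
   * driver can fetch those groups, while BEST_EFFORT skips domains
   * whose job we cannot grab instead of blocking on them. */
  nrecords = virConnectGetAllDomainStats(conn,
                                         VIR_DOMAIN_STATS_CPU_TOTAL |
                                         VIR_DOMAIN_STATS_BALLOON,
                                         &records,
                                         VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT |
                                         VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS);
  if (nrecords >= 0)
      virDomainStatsRecordListFree(records);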
>>> } virConnectGetAllDomainStatsFlags;
>>> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
>>> index d44b553c74..906b097adb 100644
>>> --- a/src/libvirt-domain.c
>>> +++ b/src/libvirt-domain.c
>>> @@ -11502,6 +11502,11 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
>>> * fields for offline domains if the statistics are meaningful only for a
>>> * running domain.
>>> *
>>> + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT in
>>> + * @flags means when libvirt is unable to fetch stats for any of
>>> + * the domains (for whatever reason) such domain is silently
>>> + * ignored.
>>> + *
>>
>> So we're adding this for both capabilities and...
>
> Not really, this is just git generating a misleading diff (for
> humans). In fact this flag is added to virConnectGetAllDomainStats()
> and virDomainListGetStats().
>
>>
>>> * Similarly to virConnectListAllDomains, @flags can contain various flags to
>>> * filter the list of domains to provide stats for.
>>> *
>>> @@ -11586,6 +11591,11 @@ virConnectGetAllDomainStats(virConnectPtr conn,
>>> * fields for offline domains if the statistics are meaningful only for a
>>> * running domain.
>>> *
>>> + * Passing VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT in
>>> + * @flags means when libvirt is unable to fetch stats for any of
>>> + * the domains (for whatever reason) such domain is silently
>>> + * ignored.
>>> + *
>>
>> ...stats in the API documentation, but...
>>
>> BTW: the domain isn't silently ignored, instead only a subset of
>> statistics is returned for the domain. That subset being statistics
>> that don't involve querying the underlying hypervisor.
>
> Okay.
>
>>
>>> * Note that any of the domain list filtering flags in @flags may be rejected
>>> * by this function.
>>> *
>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>> index 38ea865ce3..28338de05f 100644
>>> --- a/src/qemu/qemu_driver.c
>>> +++ b/src/qemu/qemu_driver.c
>>> @@ -20446,6 +20446,7 @@ qemuConnectGetAllDomainStats(virConnectPtr conn,
>>
>> ...only adding the check in the DomainStats?
>
> Sure. Because what virDomainListGetStats does is call
> driver->connectGetAllDomainStats(), just like
> virConnectGetAllDomainStats() does. So on the driver side there's only
> one function serving two public APIs.
>
> The ... are all related - you've answered the concern above. I was too
> lazy to look at the actual placement - just took the git diff output
> and assumed Capabilities.
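FWIW, both public APIs reduce to the same driver hook; roughly this
(simplified sketch of src/libvirt-domain.c, from memory):

  /* virConnectGetAllDomainStats(): no explicit domain list */
  ret = conn->driver->connectGetAllDomainStats(conn, NULL, 0,
                                               stats, retStats, flags);

  /* virDomainListGetStats(): caller-provided list of domains */
  ret = conn->driver->connectGetAllDomainStats(conn, doms, ndoms,
                                               stats, retStats, flags);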
>>
>>> virCheckFlags(VIR_CONNECT_LIST_DOMAINS_FILTERS_ACTIVE |
>>> VIR_CONNECT_LIST_DOMAINS_FILTERS_PERSISTENT |
>>> VIR_CONNECT_LIST_DOMAINS_FILTERS_STATE |
>>> + VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT |
>>> VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING |
>>> VIR_CONNECT_GET_ALL_DOMAINS_STATS_ENFORCE_STATS, -1);
>>>
>>> @@ -20480,9 +20481,17 @@ qemuConnectGetAllDomainStats(virConnectPtr conn,
>>>
>>> virObjectLock(vm);
>>>
>>> - if (HAVE_JOB(privflags) &&
>>> - qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) == 0)
>>> - domflags |= QEMU_DOMAIN_STATS_HAVE_JOB;
>>
>> Existing, but if BeginJob fails for some "other" reason, then one
>> wonders how many of the subsequent steps actually happen.
>
> Not sure what you mean. Any StatsWorker callback (from
> qemuDomainGetStats()) that needs to talk to the monitor checks whether
> grabbing the job succeeded. If it didn't, the callback returns
> immediately. You can see this live - just put sleep(60) right at the
> beginning of qemuAgentGetTime(), then from one console run 'virsh
> domtime $dom' and from another 'virsh domstats'.
>
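To make that concrete - a monitor-touching worker looks roughly like
this (just a sketch; the function name is made up and details are
simplified):

  static int
  qemuDomainGetStatsSomething(virQEMUDriverPtr driver,
                              virDomainObjPtr dom,
                              virDomainStatsRecordPtr record,
                              int *maxparams,
                              unsigned int privflags)
  {
      /* no job grabbed (or domain inactive) => skip monitor-based
       * stats and return success with a partial record */
      if (!HAVE_JOB(privflags) || !virDomainObjIsActive(dom))
          return 0;

      qemuDomainObjEnterMonitor(driver, dom);
      /* ... query the monitor, fill @record ... */
      ignore_value(qemuDomainObjExitMonitor(driver, dom));
      return 0;
  }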
>> Furthermore, we
>> don't clear the LastError.
>>
>>> + if (HAVE_JOB(privflags)) {
>>> + int rv;
>>> +
>>> + if (flags & VIR_CONNECT_GET_ALL_DOMAINS_STATS_BEST_EFFORT)
>>> + rv = qemuDomainObjBeginJobInstant(driver, vm, QEMU_JOB_QUERY);
>>
>> Let's assume rv = -1 for a moment and it's the last domain that caused
>> the failure - does that mean the caller then re...
>
> unfinished sen... :-)
>
> Darn those interruptions - I think in the end this was all related to
> the clearing of the error message. I know I was concerned that for
> this avoid-wait condition we wouldn't emit a message. For this case
> it's handled by ignoring the failure, but perhaps some future consumer
> would need to know it has to report that the call failed. It would
> then need to check whether the last error message was NULL and, if so,
> generate one.
Yup. But again, this is pre-existing and deserves a separate patch. I
can send it as a follow-up once these are in (to motivate others to
merge these) ;-)
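For that follow-up I have something as trivial as this sketch in mind
(whether to reset or report is the open question):

  if (qemuDomainObjBeginJobInstant(driver, vm, QEMU_JOB_QUERY) < 0) {
      /* we're deliberately ignoring the failure; don't leave a
       * stale error from BeginJob behind for the caller */
      virResetLastError();
  }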
> Of course then I got into the - hey wait, we don't clear the last
> error in the event that we're ignoring it - and naturally neglected to
> go back to my first train of thought to complete my "tence" (;-)).
:-D
Michal