On Tue, Jun 28, 2022 at 10:15:28PM +0530, Amneesh Singh wrote:
On Tue, Jun 28, 2022 at 05:23:11PM +0200, Michal Prívozník wrote:
> On 6/24/22 10:14, Amneesh Singh wrote:
> > Related:
https://gitlab.com/libvirt/libvirt/-/issues/276
> >
> > This patch uses qemuMonitorQueryStats to query "halt_poll_success_ns"
> > and "halt_poll_fail_ns" for every vCPU. The respective values for
each
> > vCPU are then added together.
> >
> > Signed-off-by: Amneesh Singh <natto(a)weirdnatto.in>
> > ---
> > src/qemu/qemu_driver.c | 48 ++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 44 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 3b5c3db6..0a2be6d3 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -18057,10 +18057,50 @@ qemuDomainGetStatsCpuHaltPollTime(virDomainObj *dom,
> > {
> > unsigned long long haltPollSuccess = 0;
> > unsigned long long haltPollFail = 0;
> > - pid_t pid = dom->pid;
> > + qemuDomainObjPrivate *priv = dom->privateData;
> > + bool canQueryStats = virQEMUCapsGet(priv->qemuCaps,
QEMU_CAPS_QUERY_STATS);
>
> Is this variable really needed? I mean, we can just:
>
> if (virQEMUCapsGet(...) {
>
> } else {
>
> }
>
> But if you want to avoid long lines, then perhaps rename the variable to
> queryStatsCap? This way it's more obvious what the variable reflects.
> Stats can be queried in both cases ;-)
Sure, that sounds doable :)
>
> >
> > - if (virHostCPUGetHaltPollTime(pid, &haltPollSuccess, &haltPollFail)
< 0)
> > - return 0;
> > + if (!canQueryStats) {
> > + pid_t pid = dom->pid;
> > +
> > + if (virHostCPUGetHaltPollTime(pid, &haltPollSuccess,
&haltPollFail) < 0)
> > + return 0;
> > + } else {
> > + size_t i;
> > + qemuMonitorQueryStatsTargetType target =
QEMU_MONITOR_QUERY_STATS_TARGET_VCPU;
> > + qemuMonitorQueryStatsProvider *provider = NULL;
> > + g_autoptr(GPtrArray) providers = NULL;
> > + g_autoptr(GPtrArray) queried_stats = NULL;
> > + const char *success_str = "halt_poll_success_ns";
> > + const char *fail_str = "halt_poll_fail_ns";
> > +
> > + provider =
qemuMonitorQueryStatsProviderNew(QEMU_MONITOR_QUERY_STATS_PROVIDER_KVM);
> > + provider->names = g_new0(char *, 3);
> > + provider->names[0] = g_strdup(success_str), provider->names[1] =
g_strdup(fail_str);
>
> I'm starting to wonder whether this is a good interface. These ->names[]
> array is never changed, so maybe we can have it as a NULL terminated
> list of constant strings? For instance:
>
> provider = qemuMonitorQueryStatsProviderNew();
> provider->names = {"halt_poll_success_ns",
"halt_poll_fail_ns", NULL};
Well, cannot really assign char ** with a scalar initialization, but
what might work is something like
const char *names[] = { success_str, fail_str, NULL };
provider = qemuMonitorQueryStatsProviderNew(QEMU_MONITOR_QUERY_STATS_PROVIDER_KVM);
provider->names = g_strdupv((char **) names);
I think what Michal was trying to say is that since you do not change
the array anywhere, there is no need for that to be a dynamically
allocated array that needs to be freed. I, however, am not 100% if you
are going to need this to be dynamic and whether you will be changing
these arrays.
Another thought was using GStrvBuilder but it is not avaiable as per
GLIB_VERSION_MAX.
I too think that the current approach is not very nice. A variadic
function similar to g_strv_builder_add_many that initializes a
char ** would be nice.
>
If that is something that helps, then we can write it ourselves and only
use our implementation if glib is too old.
> > +
> > + providers = g_ptr_array_new_full(1, (GDestroyNotify)
qemuMonitorQueryStatsProviderFree);
> > + g_ptr_array_add(providers, provider);
> > +
> > + queried_stats = qemuMonitorQueryStats(priv->mon, target, NULL,
providers);
>
> This issues monitor command without proper checks. Firstly, you need to
> check whether job acquiring (that s/false/true/ change done in the last
> hunk) was successful and the domain is still running:
>
> if (!HAVE_JOB(privflags) || !virDomainObjIsActive(dom)) {
> /* code here */
> }
>
> Then, you need to so called "enter the monitor" and "exit the
monitor".
> So what happens here is that @vm is when this function is called. Now,
> issuing a monitor command with the domain object lock held is
> potentially very dangerous because if QEMU takes long time to reply (or
> it doesn't reply at all) the domain object is locked an can't be
> destroyed (virsh destroy) - because the first thing that
> qemuDomainDestroyFlags() does is locking the domain object. Also, you
> want to make sure no other thread is currently talking on the monitor -
> so the monitor has to be locked too. We have two convenient functions
> for these operations:
>
> qemuDomainObjEnterMonitor()
> qemuDomainObjExitMonitor()
>
> This is all described in more detail in
> docs/kbase/internals/qemu-threads.rst.
>
> Having said all of this, I think we can fallback to the current behavior
> if job wasn't acquired successfully. Therefore the condition at the very
> beginning might look like this:
>
> if (queryStatsCap && HAVE_JOB() && virDomainObjIsActive()) {
> ...
> qemuDomainEnterMonitor();
> qemuMonitorQueryStats();
> qemuDomainExitMonitor();
> } else {
> virHostCPUGetHaltPollTime();
> }
>
I see that makes sense. I shall do the necessary changes. Thanks for the
detailed explanation.
> > +
> > + if (!queried_stats)
> > + return 0;
> > +
> > + for (i = 0; i < queried_stats->len; i++) {
> > + unsigned long long curHaltPollSuccess, curHaltPollFail;
> > + GHashTable *cur_table = queried_stats->pdata[i];
> > + virJSONValue *success, *fail;
> > +
> > + success = g_hash_table_lookup(cur_table, success_str);
> > + fail = g_hash_table_lookup(cur_table, fail_str);
> > +
> > + ignore_value(virJSONValueGetNumberUlong(success,
&curHaltPollSuccess));
> > + ignore_value(virJSONValueGetNumberUlong(fail,
&curHaltPollFail));
>
> I don't think this is a safe thing to do. What if either of @success or
> @fail is NULL? That can happen when QEMU didn't return requested member.
> True, at that point the 'query-stats' command should have failed, but I
> think it's more defensive if we checked these pointers. My worry is that
> virJSONValueGetNumberUlong() dereferences the pointer right away.
>
I do understand that this is not an unfounded worry. I shall put the
checks there.
> > +
> > + haltPollSuccess += curHaltPollSuccess;
> > + haltPollFail += curHaltPollFail;
> > + }
> > + }
> >
> > if (virTypedParamListAddULLong(params, haltPollSuccess,
"cpu.haltpoll.success.time") < 0 ||
> > virTypedParamListAddULLong(params, haltPollFail,
"cpu.haltpoll.fail.time") < 0)
> > @@ -18915,7 +18955,7 @@ static virQEMUCapsFlags queryDirtyRateRequired[] = {
> >
> > static struct qemuDomainGetStatsWorker qemuDomainGetStatsWorkers[] = {
> > { qemuDomainGetStatsState, VIR_DOMAIN_STATS_STATE, false, NULL },
> > - { qemuDomainGetStatsCpu, VIR_DOMAIN_STATS_CPU_TOTAL, false, NULL },
> > + { qemuDomainGetStatsCpu, VIR_DOMAIN_STATS_CPU_TOTAL, true, NULL },
> > { qemuDomainGetStatsBalloon, VIR_DOMAIN_STATS_BALLOON, true, NULL },
> > { qemuDomainGetStatsVcpu, VIR_DOMAIN_STATS_VCPU, true, NULL },
> > { qemuDomainGetStatsInterface, VIR_DOMAIN_STATS_INTERFACE, false, NULL },
>
> Please feel free to object to anything I wrote. Maybe you have more
> patches written on a local branch that justify your choices. I don't know.
>
> Michal
>
Thanks for taking the time to review the patches.
Amneesh