Re: [libvirt] [PATCHv3 2/8] qemu: bulk stats: implement CPU stats group

10 Sep 2014

----- Original Message -----
...
From: "Wang Rui" <moon.wangrui@huawei.com>
To: "Francesco Romani" <fromani@redhat.com>
Cc: libvir-list@redhat.com, pkrempa@redhat.com
Sent: Wednesday, September 10, 2014 10:56:47 AM
Subject: Re: [libvirt] [PATCHv3 2/8] qemu: bulk stats: implement CPU stats group
On 2014/9/8 21:05, Francesco Romani wrote:
...
This patch implements the VIR_DOMAIN_STATS_CPU_TOTAL
group of statistics.
Signed-off-by: Francesco Romani <fromani@redhat.com>
---
 include/libvirt/libvirt.h.in |  1 +
 src/libvirt.c                |  9 ++++++++
 src/qemu/qemu_driver.c       | 51
 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+)
[...]

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 2950a4b..cfc5941 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -96,6 +96,7 @@
 #include "storage/storage_driver.h"
 #include "virhostdev.h"
 #include "domain_capabilities.h"
+#include "vircgroup.h"
Hi, Francesco.
I see the file including relationship. 'qemu_driver.c' includes
'qemu_cgroup.h' which
includes 'vircgroup.h'. There are other virCgroupGet* functions called in
qemu_driver.c
now. So I think here "include vircgroup.h" is not necessary.
Thanks for the research, I'll remove
...
...
#define VIR_FROM_THIS VIR_FROM_QEMU
@@ -17338,6 +17339,55 @@ qemuDomainGetStatsState(virConnectPtr conn
ATTRIBUTE_UNUSED,
 }
+static int
+qemuDomainGetStatsCpu(virConnectPtr conn ATTRIBUTE_UNUSED,
+                      virDomainObjPtr dom,
+                      virDomainStatsRecordPtr record,
+                      int *maxparams,
+                      unsigned int privflags ATTRIBUTE_UNUSED)
+{
+    qemuDomainObjPrivatePtr priv = dom->privateData;
+    unsigned long long cpu_time = 0;
+    unsigned long long user_time = 0;
+    unsigned long long sys_time = 0;
+    int ncpus = 0;
+    int err;
+
+    ncpus = nodeGetCPUCount();
+    if (ncpus > 0 &&
+        virTypedParamsAddUInt(&record->params,
+                              &record->nparams,
+                              maxparams,
+                              "cpu.count",
+                              (unsigned int)ncpus) < 0)
+        return -1;
+
+    err = virCgroupGetCpuacctUsage(priv->cgroup, &cpu_time);
+    if (!err && virTypedParamsAddULLong(&record->params,
+                                        &record->nparams,
+                                        maxparams,
+                                        "cpu.time",
+                                        cpu_time) < 0)
+        return -1;
+    err = virCgroupGetCpuacctStat(priv->cgroup, &user_time, &sys_time);
+    if (!err && virTypedParamsAddULLong(&record->params,
+                                        &record->nparams,
+                                        maxparams,
+                                        "cpu.user",
+                                        user_time) < 0)
+        return -1;
+    if (!err && virTypedParamsAddULLong(&record->params,
+                                        &record->nparams,
+                                        maxparams,
+                                        "cpu.system",
+                                        sys_time) < 0)
+        return -1;
1. If any of the 'err's is not zero, the function may returns 0 as success.
   Is this the expected return? Or at least we can give a warning that we
   miss some parameters.
2. I think it's better to report an error or warning log before return -1.
The idea here (well, at least my idea :) ) is to gather as much as data as possible,
and to silently skip failures here. The lack of expected output is a good enough
indicator that a domain is unresponsive.
-1 is reported for really unrecoverable error, like memory (re)allocation failure.

Otherwise, how could the client code be meaningfully informed that some group
failed, maybe partially? It is possible that different groups fail for different
domains: how could we convey this information?

I have no problems to go this route if there is consensus this is the preferred
way to go, but then we'll need to discuss how to convey a meaningful error convention.

Bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani