On 01/18/2012 12:12 AM, Lai Jiangshan wrote:
> Signed-off-by: Lai Jiangshan <laijs(a)cn.fujitsu.com>
> ---
>  include/libvirt/libvirt.h.in |    5 ++++
>  python/generator.py          |    1 +
>  src/driver.h                 |    7 +++++
>  src/libvirt.c                |   51 ++++++++++++++++++++++++++++++++++++++++++
>  src/libvirt_public.syms      |    5 ++++
>  5 files changed, 69 insertions(+), 0 deletions(-)
> +
> +/**
> + * virDomainGetPcpusUsage:
> + * @dom: pointer to domain object
> + * @usages: returned physical cpu usages
> + * @nr_usages: length of @usages
> + * @flags: flags to control the operation
> + *
> + * Get the cpu usages for every physical cpu since the domain started (in nanoseconds).
> + *
> + * Returns 0 if success, -1 on error
> + */
> +int virDomainGetPcpusUsage(virDomainPtr dom,
> +                           unsigned long long *usages,
> +                           int *nr_usages,
> +                           unsigned int flags)

Overall, I think this patch series is headed in the right direction.
And the idea of using the cpuacct cgroup for per-cpu usage numbers is a
cool hack. However, I think this API is a bit too restricted, and that
a few tweaks can make the API more powerful and avoid the need for
adding new API in the future (instead, just adding new named parameter
types to this API).
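For reference, the named-parameter mechanism I mean is the existing
virTypedParameter carrier from include/libvirt/libvirt.h.in; new stats
become new field names rather than new entry points:

/* existing declaration, abbreviated */
# define VIR_TYPED_PARAM_FIELD_LENGTH 80

struct _virTypedParameter {
    char field[VIR_TYPED_PARAM_FIELD_LENGTH]; /* stat name, e.g. "user" */
    int type;                  /* one of the virTypedParameterType values */
    union {
        int i;                     /* VIR_TYPED_PARAM_INT */
        unsigned int ui;           /* VIR_TYPED_PARAM_UINT */
        long long int l;           /* VIR_TYPED_PARAM_LLONG */
        unsigned long long int ul; /* VIR_TYPED_PARAM_ULLONG */
        double d;                  /* VIR_TYPED_PARAM_DOUBLE */
        char b;                    /* VIR_TYPED_PARAM_BOOLEAN */
        char *s;                   /* VIR_TYPED_PARAM_STRING */
    } value;
};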
First, we don't use the term Pcpu anywhere else in libvirt. I'm
thinking it would be better to name this virDomainGetCPUStats, to match
our existing virNodeGetCPUStats. (Too bad that we weren't consistent
between CPU vs. Vcpu in capitalization.) Also, by naming it just
CPUStats, rather than PcpusUsage, I think we can open the door to
further stats - for example, I can envision that we might want to get
stats not only on how much cpu accounting has been attributed to the
host qemu process, but we can also communicate to a guest agent and get
the guest's own view of its cpu usage; the difference between the two
values would thus be the overhead consumed by the hypervisor in
presenting an environment to the guest. Of course, we don't have to
integrate with a guest agent now, but we should be designing the API
with that in mind.
Next, I notice that right now, cpuacct provides the following
properties: cpuacct.stat (gives elapsed user and system times for the
cgroup, similar to 2 of the 3 cumulative accounting values given by the
time(1) utility; the unit is USER_HZ ticks, typically 10 milliseconds),
cpuacct.usage_percpu (gives just nanosecond usage, without division
between user or system), and cpuacct.usage (the sum of cpuacct.usage_percpu).
And it is conceivable that future kernels will add even more stats under
the cpuacct umbrella. But your patch only uses cpuacct.usage_percpu.
Why should we be hard-coding things to just one stat?
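For concreteness, here is a minimal sketch of parsing
cpuacct.usage_percpu by hand; the cgroup path is a hypothetical
example, and real code would go through the virCgroup helpers in
src/util/cgroup.c rather than open()ing paths directly:

#include <stdio.h>

int main(void)
{
    /* hypothetical placement of a qemu guest's cpuacct cgroup */
    const char *path =
        "/sys/fs/cgroup/cpuacct/libvirt/qemu/guest/cpuacct.usage_percpu";
    char line[1024]; /* enough for a sketch; real code can't assume this */
    FILE *fp = fopen(path, "r");

    if (!fp)
        return 1;
    /* the file is a single line of space-separated nanosecond
     * counters, one per host cpu */
    if (fgets(line, sizeof(line), fp)) {
        unsigned long long usage;
        char *p = line;
        int cpu = 0, n;

        while (sscanf(p, "%llu%n", &usage, &n) == 1) {
            printf("cpu%d: %llu ns\n", cpu++, usage);
            p += n;
        }
    }
    fclose(fp);
    return 0;
}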
Additionally, I'm comparing this with existing APIs, for things we have
learned in the past about querying statistics. virNodeGetCPUStats is
rather limited in that it can only return the stats of one processor at
a time; but has the benefit that the magic VIR_NODE_CPU_STATS_ALL_CPUS
(aka -1) can provide the sum across all processors into that one stat.
So a machine with 8 processors requires 8 calls to the API, but you can
also get the summary in one blow without doing any additions yourself.
It may also be the case that some stats are available only on the entire
process, rather than per-cpu. virNodeGetCPUStats also has the drawback
that it can only return unsigned long long values, rather than using
virTypedParameter; but at least it has the benefit that it can return an
arbitrary number of name/value stat tuples. Meanwhile,
virNodeGetCellsFreeMemory allows the user to request a starting cell and
a maximum count, and thus grab multiple stats in one call, but doesn't allow
passing -1 to get a summary (that is, with 8 NUMA cells you can make
just 1 call, but have to do the addition yourself). It has a drawback
that it can only return one specific stat; it cannot be extended to
other statistics. And virDomainGetVcpuPinInfo is an example of a
function that returns a 2-dimensional array through a single output
pointer, by passing in two separate length parameters; but again with a
limited output of only one specific stat.
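For reference, the prototypes of the three APIs just compared:

int virNodeGetCPUStats(virConnectPtr conn,
                       int cpuNum,
                       virNodeCPUStatsPtr params,
                       int *nparams,
                       unsigned int flags);

int virNodeGetCellsFreeMemory(virConnectPtr conn,
                              unsigned long long *freeMems,
                              int startCell,
                              int maxCells);

int virDomainGetVcpuPinInfo(virDomainPtr domain,
                            int ncpumaps,
                            unsigned char *cpumaps,
                            int maplen,
                            unsigned int flags);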
Borrowing from these ideas, I think your API should return an array of
virTypedParameter per cpu, so that we can add new stats without needing
new API. It should also allow the ability to return a summary over all
cpus, instead of a 2-dimensional list of statistics per cpu. I'm
thinking something like the following:
/**
* virDomainGetCPUStats:
* @dom: domain to query
* @params: array to populate on output
* @nparams: number of parameters per cpu
* @start_cpu: which cpu to start with, or -1 for summary
* @ncpus: how many cpus to query
* @flags: unused for now
*
* Get statistics relating to CPU usage attributable to a single
* domain (in contrast to the statistics returned by
* virNodeGetCPUStats() for all processes on the host). @dom
* must be running (an inactive domain has no attributable cpu
* usage). On input, @params must contain at least @nparams * @ncpus
* entries, allocated by the caller.
*
* If @start_cpu is -1, then @ncpus must be 1, and the returned
* results reflect the statistics attributable to the entire
* domain (such as user and system time for the process as a
* whole). Otherwise, @start_cpu represents which cpu to start
* with, and @ncpus represents how many consecutive processors to
* query, with statistics attributable per processor (such as
* per-cpu usage).
*
* As a special case, if @params is NULL and @nparams is 0, then
* @ncpus must be 1, and the return value will be how many
* statistics are available for the given @start_cpu. This number
* may be different for @start_cpu of -1 than for any non-negative
* value, but will be the same for all non-negative @start_cpu.
* The combination of @start_cpu and @ncpus must not exceed the
* number of available cpus to query.
*
* For now, @flags is unused, and the statistics all relate to the
* usage from the host perspective; the number of cpus available to
* query can be determined by the cpus member of the result of
* virNodeGetInfo(), even if the domain has had vcpus pinned to only
* use a subset of overall host cpus. It is possible that a future
* version will support a flag that queries the cpu usage from the
* guest's perspective, using the number of vcpus available to the
* guest, found by virDomainGetVcpusFlags(). An individual guest
* vcpu cannot be reliably mapped back to a specific host cpu unless
* a single-processor vcpu pinning was used, but when @start_cpu is -1,
* any difference in usage between a host and guest perspective would
* serve as a measure of hypervisor overhead.
*
* Returns -1 on failure, or the number of statistics that were
* populated per cpu on success (this will be less than the total
* number of populated @params, unless @ncpus was 1; and may be
* less than @nparams). The populated parameters start at each
* stride of @nparams; any unpopulated parameters will be zeroed
* on success. The caller is responsible for freeing any returned
* string parameters.
*/
int virDomainGetCPUStats(virDomainPtr dom,
                         virTypedParameterPtr params,
                         int nparams,
                         int start_cpu,
                         int ncpus,
                         unsigned int flags);
where you might have the following behavior on a 2-cpu machine where the
guest has 1 vcpu allocated:
virDomainGetCPUStats(dom, NULL, 0, -1, 1, 0) => 2
virDomainGetCPUStats(dom, params, 2, -1, 1, 0) => 2, with params[0] being
  <"user",ull,6507945> and params[1] being <"system",ull,1817278>;
  that is, the times found in cpuacct.stat
virDomainGetCPUStats(dom, NULL, 0, 0, 1, 0) => 1
virDomainGetCPUStats(dom, params, 1, 0, 3, 0) => -1 (start_cpu + ncpus
  exceeds the host's 2 cpus)
virDomainGetCPUStats(dom, params, 2, 0, 2, 0) => 1, with params[0] being
<"usage",ull,45211932635804>, params[1] being zeroed, params[2] being
<"usage",ull,44711067947567>, and params[3] being zeroed
and in the future, if we add guest agent interaction via the flag,
virDomainGetCPUStats(dom, NULL, 0, 0, 1, GUEST) => 1
virDomainGetCPUStats(dom, params, 1, 0, 2, GUEST) => -1 (start_cpu + ncpus
  exceeds the guest's 1 vcpu)
virDomainGetCPUStats(dom, params, 1, 0, 1, GUEST) => 1, with params[0]
being <"guestusage",ull,35211932635804> from a query of the guest agent
One particular implementation point will be that the RPC call should not
transmit zeroed entries. That is, in my example where I passed in an
over-allocated params, the rpc call would compress things so that the
on-the-wire format should transmit just return_value*ncpus blobs, and
deserialization of the wire format would then expand them back into the
strides of the user's params array.

Another point is that you must provide reasonable bounds for both
nparams (16 is probably okay, compared to our other *_PARAMETERS_MAX
values in remote_protocol.x) and ncpus. Others in this thread suggested
allowing ncpus as large as 4096, but that adds up fast when each
virTypedParameter occupies 92 bytes in memory (the on-the-wire format
transmits a nul-terminated string rather than all 80 bytes of the
virTypedParameter.field member, but that still means the wire can
easily carry 32 bytes per parameter). I'd suggest 128 as the cap for
ncpus (16 * 32 * 128 is 64k, which is approaching the rpc bounds we
have for a single string anyway), along with documentation that a user
with more than 128 cpus would have to break the query into chunks to
avoid exceeding RPC limitations. Putting limits on both nparams and
ncpus also avoids any potential multiplication overflow producing what
looks like a small product after wraparound.
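The validation I have in mind is trivial once both caps exist; the
limit names here are hypothetical stand-ins for whatever constants we
end up defining in remote_protocol.x:

/* hypothetical names; actual values to be defined in remote_protocol.x */
#define REMOTE_DOMAIN_GET_CPU_STATS_MAX 16         /* params per cpu */
#define REMOTE_DOMAIN_GET_CPU_STATS_NCPUS_MAX 128  /* cpus per call */

static int
check_cpu_stats_bounds(int nparams, int ncpus)
{
    /* check each factor against its own cap before multiplying, so
     * nparams * ncpus can never overflow into a small-looking product */
    if (nparams < 0 || nparams > REMOTE_DOMAIN_GET_CPU_STATS_MAX)
        return -1;
    if (ncpus <= 0 || ncpus > REMOTE_DOMAIN_GET_CPU_STATS_NCPUS_MAX)
        return -1;
    return 0;
}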
If others like my thoughts, can you respin your patch series to adjust
your API accordingly?
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org