-----Original Message-----
From: John Ferlan [mailto:jferlan@redhat.com]
Sent: Wednesday, July 13, 2016 4:02 AM
To: Ren, Qiaowei <qiaowei.ren(a)intel.com>; libvir-list(a)redhat.com
Cc: Peter Krempa <pkrempa(a)redhat.com>
Subject: Re: [libvirt] [PATCH 1/1] perf: add more perf events support
On 06/29/2016 08:10 PM, Qiaowei Ren wrote:
> With current perf framework, this patch adds support to more perf
^^^^^^^ for more perf
> events, including cache missing, cache peference, cpu cycles,
A quick google search turns up "cache references" - there's just too many
peference or peferences references to comment on them all, but they all need
to be "references"
John, according perf code from linux kernel, 'cache peference' is from linux
kernel code. Certainly 'perf list' command show the 'references', and I
will change it to this.
> instrction, etc..
instructions
>
> Signed-off-by: Qiaowei Ren <qiaowei.ren(a)intel.com>
> ---
> docs/formatdomain.html.in | 24 +++++++++++
> docs/schemas/domaincommon.rng | 4 ++
> include/libvirt/libvirt-domain.h | 39 +++++++++++++++++
> src/libvirt-domain.c | 8 ++++
> src/qemu/qemu_driver.c | 23 +++++-----
> src/util/virperf.c | 65 ++++++++++++++++++++++++++++-
> src/util/virperf.h | 4 ++
> tests/genericxml2xmlindata/generic-perf.xml | 4 ++
> 8 files changed, 158 insertions(+), 13 deletions(-)
>
I see no changes for virsh.pod, see commit id '3110363d' for a recent change
Peter made in this space...
I think perhaps it may also be worthwhile to "in a separate patch" alter the
'domstats --perf' description to simply reference the 'perf'
description where each of the collect perf.* events can be listed and described.
Each of the collectible events could have some sort of tabular output - see how
'vol-wipe' describes the various supported algorithms. So much easier to read
than one long sentence.
Sure. I will add one patch about this.
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index f660aa6..7999e43 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -1839,6 +1839,10 @@
> <event name='cmt' enabled='yes'/>
> <event name='mbmt' enabled='no'/>
> <event name='mbml' enabled='yes'/>
> + <event name='cache_misses' enabled='no'/>
> + <event name='cache_peferences' enabled='no'/>
> + <event name='instructions' enabled='no'/>
> + <event name='cpu_cycles' enabled='no'/>
> </perf>
> ...
> </pre>
> @@ -1864,6 +1868,26 @@
> <td>bandwidth of memory traffic for a memory controller</td>
> <td><code>perf.mbml</code></td>
> </tr>
> + <tr>
> + <td><code>cache_misses</code></td>
> + <td>the amount of cache missing by applications running on the
> + platform</td>
is this the count of caches misses? amount implies perhaps other things.
Yes. I will change it.
> +
<td><code>perf.cache_misses</code></td>
> + </tr>
> + <tr>
> + <td><code>cache_peferences</code></td>
> + <td>the amount of cache hit by applications running on the
platform</td>
> + <td><code>perf.cache_peferences</code></td>
similar is this the count of cache hits, right?
Yes.
> + </tr>
> + <tr>
> + <td><code>instructions</code></td>
> + <td>the amount of instructions by applications running on the
> + platform</td>
the count of instructions executed...
> + <td><code>perf.instructions</code></td>
> + </tr>
> + <tr>
> + <td><code>cpu_cycles</code></td>
> + <td>the amount of cycles one instruction needs</td>
the number/count of cpu cycles
> + <td><code>perf.cpu_cycles</code></td>
> + </tr>
> </table>
>
> <h3><a
name="elementsDevices">Devices</a></h3>
> diff --git a/docs/schemas/domaincommon.rng
> b/docs/schemas/domaincommon.rng index 563cb3c..e41dc3a 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -414,6 +414,10 @@
> <value>cmt</value>
> <value>mbmt</value>
> <value>mbml</value>
> + <value>cache_misses</value>
> + <value>cache_peferences</value>
> + <value>instructions</value>
> + <value>cpu_cycles</value>
> </choice>
> </attribute>
> <attribute name="enabled">
> diff --git a/include/libvirt/libvirt-domain.h
> b/include/libvirt/libvirt-domain.h
> index 7ea93aa..b79cdb0 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -1947,6 +1947,45 @@ void
virDomainStatsRecordListFree(virDomainStatsRecordPtr *stats);
> */
> # define VIR_PERF_PARAM_MBML "mbml"
>
> +/**
> + * VIR_PERF_PARAM_CACHE_MISSES:
> + *
> + * Macro for typed parameter name that represents cache_misses perf
> + * event which can be used to measure the amount of cache missing by
s/amount/count
s/cache missing/cache misses/ (missing is a very different context!)
> + * applications running on the platform. It corresponds to the
> + * "perf.cache_misses" field in the *Stats APIs.
> + */
> +# define VIR_PERF_PARAM_CACHE_MISSES "cache_misses"
> +
> +/**
> + * VIR_PERF_PARAM_CACHE_REFERENCES:
> + *
> + * Macro for typed parameter name that represents cache_peferences
> + * perf event which can be used to measure the amount of cache hit
similar... amount/count ... hit/hits
> + * by applications running on the platform. It corresponds to the
> + * "perf.cache_peferences" field in the *Stats APIs.
> + */
> +# define VIR_PERF_PARAM_CACHE_REFERENCES "cache_peferences"
> +
> +/**
> + * VIR_PERF_PARAM_INSTRUCTIONS:
> + *
> + * Macro for typed parameter name that represents instructions perf
> + * event which can be used to measure the amount of instructions
similar amount/count
> + * by applications running on the platform. It corresponds to the
> + * "perf.instructions" field in the *Stats APIs.
> + */
> +# define VIR_PERF_PARAM_INSTRUCTIONS "instructions"
> +
> +/**
> + * VIR_PERF_PARAM_CPU_CYCLES:
> + *
> + * Macro for typed parameter name that represents cpu_cycles perf
> +event
> + * which can be used to measure how many cycles one instruction needs.
> + * It corresponds to the "perf.cpu_cycles" field in the *Stats APIs.
> + */
The cycles and instructions seem to me to be things that could be very large and
awkward to print
> +# define VIR_PERF_PARAM_CPU_CYCLES "cpu_cycles"
> +
> int virDomainGetPerfEvents(virDomainPtr dom,
> virTypedParameterPtr *params,
> int *nparams, diff --git
> a/src/libvirt-domain.c b/src/libvirt-domain.c index 4e71a94..b817e4b
> 100644
> --- a/src/libvirt-domain.c
> +++ b/src/libvirt-domain.c
> @@ -11452,6 +11452,14 @@ virConnectGetDomainCapabilities(virConnectPtr
conn,
> * "perf.mbml" - the amount of data (bytes/s) sent through the memory
controller
> * on the socket as unsigned long long. It is produced by mbml
> * perf event.
> + * "perf.cache_misses" - the amount of cache missing as unsigned long
long.
> + * It is produced by cache_misses perf event.
> + * "perf.cache_peferences" - the amount of cache hit as unsigned long
long.
> + * It is produced by cache_peferences perf event.
> + * "perf.instructions" - the amount of instructions as unsigned long
long.
> + * It is produced by instructions perf event.
> + * "perf.cpu_cycles" - the amount of cycles one instruction needs as
> + unsigned
> + * long long. It is produced by cpu_cycles perf event.
Similar "amount" vs. "count"
> *
> * Note that entire stats groups or individual stat fields may be missing from
> * the output in case they are not supported by the given hypervisor,
> are not diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 61d184b..bea753f 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -9613,6 +9613,10 @@ qemuDomainSetPerfEvents(virDomainPtr dom,
> VIR_PERF_PARAM_CMT, VIR_TYPED_PARAM_BOOLEAN,
> VIR_PERF_PARAM_MBMT, VIR_TYPED_PARAM_BOOLEAN,
> VIR_PERF_PARAM_MBML,
> VIR_TYPED_PARAM_BOOLEAN,
> + VIR_PERF_PARAM_CACHE_MISSES,
VIR_TYPED_PARAM_BOOLEAN,
> + VIR_PERF_PARAM_CACHE_REFERENCES,
VIR_TYPED_PARAM_BOOLEAN,
> + VIR_PERF_PARAM_INSTRUCTIONS,
VIR_TYPED_PARAM_BOOLEAN,
> + VIR_PERF_PARAM_CPU_CYCLES,
> + VIR_TYPED_PARAM_BOOLEAN,
> NULL) < 0)
> return -1;
>
> @@ -18941,10 +18945,10 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr
> driver, #undef QEMU_ADD_COUNT_PARAM
>
> static int
> -qemuDomainGetStatsPerfRdt(virPerfPtr perf,
> - virPerfEventType type,
> - virDomainStatsRecordPtr record,
> - int *maxparams)
> +qemuDomainGetStatsPerfOneEvent(virPerfPtr perf,
> + virPerfEventType type,
> + virDomainStatsRecordPtr record,
> + int *maxparams)
This change seems separable. That is it's own patch to change the name of the
function because it's going to be multiple/general purpose soon.
Ok. I will separate it from this patch.
> {
> char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
> uint64_t value = 0;
> @@ -18980,14 +18984,9 @@ qemuDomainGetStatsPerf(virQEMUDriverPtr
driver ATTRIBUTE_UNUSED,
> if (!virPerfEventIsEnabled(priv->perf, i))
> continue;
>
> - switch (i) {
> - case VIR_PERF_EVENT_CMT:
> - case VIR_PERF_EVENT_MBMT:
> - case VIR_PERF_EVENT_MBML:
And removing the need for a switch could be separate too... Trying to think if
it's still necessary though. virPerfEventIsEnabled will return NULL if i >=
VIR_PERF_EVENT_LAST, so it doesn't seem some future client could request an
event that some older driver doesn't support.
virPerfEventIsEnabled will return false (not NULL) if I >= VIR_PERF_EVENT_LAST.
Some future clients could not enable those events that older driver doesn't support.
> - if (qemuDomainGetStatsPerfRdt(priv->perf, i,
record, maxparams) < 0)
> - goto cleanup;
> - break;
> - }
> + if (qemuDomainGetStatsPerfOneEvent(priv->perf, i,
> + record, maxparams) < 0)
> + goto cleanup;
> }
>
> ret = 0;
> diff --git a/src/util/virperf.c b/src/util/virperf.c index
> 4661ba3..a3d2bc6 100644
> --- a/src/util/virperf.c
> +++ b/src/util/virperf.c
> @@ -38,7 +38,9 @@ VIR_LOG_INIT("util.perf"); #define VIR_FROM_THIS
> VIR_FROM_PERF
>
> VIR_ENUM_IMPL(virPerfEvent, VIR_PERF_EVENT_LAST,
> - "cmt", "mbmt", "mbml");
> + "cmt", "mbmt", "mbml",
> + "cache_misses", "cache_peferences",
> + "instructions", "cpu_cycles");
>
> struct virPerfEvent {
> int type;
> @@ -189,6 +191,60 @@ virPerfRdtEnable(virPerfEventPtr event,
> return -1;
> }
>
> +static int
> +virPerfGeneralEnable(virPerfEventPtr event,
> + pid_t pid)
Currently, these are less "General" and more "Hardware" events
Based on what I see in perf_event.h, I suspect the future could hold getting
software, tracepoint, hw_cache, raw, and breakpoint events too.
Perhaps in order to be more "General" the "type" and
"config" parameters could
be passed...
> +{
> + struct perf_event_attr attr;
> +
> + memset(&attr, 0, sizeof(attr));
> + attr.size = sizeof(attr);
> + attr.inherit = 1;
> + attr.disabled = 1;
> + attr.enable_on_exec = 0;
> +
> + switch (event->type) {
> + case VIR_PERF_EVENT_CACHE_MISSES:
> + attr.type = PERF_TYPE_HARDWARE;
^^^^^^^^^
this doesn't change for any of the cases
> + attr.config = PERF_COUNT_HW_CACHE_MISSES;
> + break;
> + case VIR_PERF_EVENT_CACHE_REFERENCES:
> + attr.type = PERF_TYPE_HARDWARE;
> + attr.config = PERF_COUNT_HW_CACHE_REFERENCES;
> + break;
> + case VIR_PERF_EVENT_INSTRUCTIONS:
> + attr.type = PERF_TYPE_HARDWARE;
> + attr.config = PERF_COUNT_HW_INSTRUCTIONS;
> + break;
> + case VIR_PERF_EVENT_CPU_CYCLES:
> + attr.type = PERF_TYPE_HARDWARE;
> + attr.config = PERF_COUNT_HW_CPU_CYCLES;
> + break;
> + }
...Seems like it would be possible to create some sort of static table/matrix that
would be able to convert the VIR_PERF_EVENT_* into their respective
"attr.type" and "attr.config", so that this function doesn't have
the switch and
the calling function passes by value the 'type' and 'config'. Assuming
of course
the future is to get other events.
Yes. This is a good idea. And I will add the static table.
> +
> + event->fd = syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
> + if (event->fd < 0) {
> + virReportSystemError(errno,
> + _("Unable to open perf event for %s"),
> + virPerfEventTypeToString(event->type));
> + goto error;
> + }
> +
> + if (ioctl(event->fd, PERF_EVENT_IOC_ENABLE) < 0) {
> + virReportSystemError(errno,
> + _("Unable to enable perf event for %s"),
> + virPerfEventTypeToString(event->type));
> + goto error;
> + }
> +
> + event->enabled = true;
> + return 0;
> +
> + error:
> + VIR_FORCE_CLOSE(event->fd);
> + return -1;
> +}
> +
> int
> virPerfEventEnable(virPerfPtr perf,
> virPerfEventType type, @@ -205,6 +261,13 @@
> virPerfEventEnable(virPerfPtr perf,
> if (virPerfRdtEnable(event, pid) < 0)
I see this PerfRdt reference here which led me to wonder why both aren't being
changed, but I think I understand now. Of course, now I wonder what the Rdt
acronym means (is it Resource Director Technology?). If it's an acronym, it's
nice
to see it spelled out at least once. Probably in some earlier commit which I didn't
chase. The lack of function comments is, well, frustrating.
The comments about RDT will be added.
> return -1;
> break;
> + case VIR_PERF_EVENT_CACHE_MISSES:
> + case VIR_PERF_EVENT_CACHE_REFERENCES:
> + case VIR_PERF_EVENT_INSTRUCTIONS:
> + case VIR_PERF_EVENT_CPU_CYCLES:
> + if (virPerfGeneralEnable(event, pid) < 0)
> + return -1;
> + break;
here is where there could be some sort of tabular/matrix reference to get the
'type' and 'config' values filled in to pass.
> case VIR_PERF_EVENT_LAST:
> virReportError(VIR_ERR_INTERNAL_ERROR,
> _("Unexpected perf event type=%d"), type);
> diff --git a/src/util/virperf.h b/src/util/virperf.h index
> 7163410..7129370 100644
> --- a/src/util/virperf.h
> +++ b/src/util/virperf.h
> @@ -28,6 +28,10 @@ typedef enum {
> VIR_PERF_EVENT_CMT,
> VIR_PERF_EVENT_MBMT,
> VIR_PERF_EVENT_MBML,
> + VIR_PERF_EVENT_CACHE_MISSES,
> + VIR_PERF_EVENT_CACHE_REFERENCES,
> + VIR_PERF_EVENT_INSTRUCTIONS,
> + VIR_PERF_EVENT_CPU_CYCLES,
Not that it matters too much since the numbers are different, but the order here
is different than perf_hw_id. "Sometimes" it's kinder to access memory
sequentially rather than somewhat randomly. Easier to see for future changers
what was already too...
Any reason to not go after the other 6 events in 'perf_hw_id'?
There are a lot of different perf events, but these 4 events will be used by
OpenStack, and so currently I only add them in this patch.
>
> VIR_PERF_EVENT_LAST
> } virPerfEventType;
> diff --git a/tests/genericxml2xmlindata/generic-perf.xml
> b/tests/genericxml2xmlindata/generic-perf.xml
> index 394d2a6..6428ebd 100644
> --- a/tests/genericxml2xmlindata/generic-perf.xml
> +++ b/tests/genericxml2xmlindata/generic-perf.xml
> @@ -16,6 +16,10 @@
> <event name='cmt' enabled='yes'/>
> <event name='mbmt' enabled='no'/>
> <event name='mbml' enabled='yes'/>
> + <event name='cache_misses' enabled='no'/>
> + <event name='cache_peferences' enabled='no'/>
> + <event name='instructions' enabled='no'/>
> + <event name='cpu_cycles' enabled='no'/>
All 4 are 'no', maybe make 1 or 2 'yes' (not that it matters, but they
are a
different table underneath the covers)
Yes.
Thanks,
Qiaowei