On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
> -----Original Message-----
> From: Daniel P. Berrange [mailto:berrange@redhat.com]
> Sent: Monday, July 20, 2015 5:32 PM
> To: Ren, Qiaowei
> Cc: libvir-list(a)redhat.com
> Subject: Re: [libvirt] [PATCH 2/3] Qemu: add CMT support
>
> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
> > One RFC in
> >
https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
> >
> > CMT (Cache Monitoring Technology) can be used to measure the usage of
> > cache by VM running on the host. This patch will extend the bulk stats
> > API (virDomainListGetStats) to add this field. Applications based on
> > libvirt can use this API to achieve cache usage of VM. Because CMT
> > implementation in Linux kernel is based on perf mechanism, this patch
> > will enable perf event for CMT when VM is created and disable it when
> > VM is destroyed.
> >
> > Signed-off-by: Qiaowei Ren <qiaowei.ren(a)intel.com>
>
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index
> > 4cfae03..8c678c9 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr
> > driver,
> >
> > #undef QEMU_ADD_COUNT_PARAM
> >
> > +static int
> > +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
> > + virDomainObjPtr dom,
> > + virDomainStatsRecordPtr record,
> > + int *maxparams,
> > + unsigned int privflags ATTRIBUTE_UNUSED)
>
> So this is a method that is used to collect per-domain information
>
> > +{
> > + qemuDomainObjPrivatePtr priv = dom->privateData;
> > + FILE *fd;
> > + unsigned long long cache = 0;
> > + int scaling_factor = 0;
> > +
> > + if (priv->cmt_fd <= 0)
> > + return -1;
> > +
> > + if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
> > + virReportSystemError(errno, "%s",
> > + _("Unable to read cache data"));
> > + return -1;
> > + }
> > +
> > + fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale",
"r");
> > + if (!fd) {
> > + virReportSystemError(errno, "%s",
> > + _("Unable to open CMT scale file"));
> > + return -1;
> > + }
> > + if (fscanf(fd, "%d", &scaling_factor) != 1) {
> > + virReportSystemError(errno, "%s",
> > + _("Unable to read CMT scale file"));
> > + VIR_FORCE_FCLOSE(fd);
> > + return -1;
> > + }
> > + VIR_FORCE_FCLOSE(fd);
>
> But this data you are reading is global to the entire host.
>
In fact this data is only for per-domain. When the perf syscall is
called to enable perf event for domain, the pid of that domain is
passed.
Ah, I see - you rely on the open file descriptor to be associated
with the VM pid.
> > -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr
driver,
> > virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
> > priv->nbdPort = 0;
> >
> > + /* Disable CMT */
> > + if (priv->cmt_fd > 0) {
>
> You can't rely on keeping an open file descriptor for the guest because
libvirtd
> may be restarted.
>
Sorry, I don't really get the meaning of this. You mean that when libvirtd
is restarted, those resource which the domain opened should be closed,
right?
No, when libvirtd is restarted, the domains must all continuing running
without loss of state. You open the FD when starting the guest, then
libvirtd is restarted, now someone wants to query the perf data. The
perf FD will not be open anymore because libvirtd was restarted. At
least you'd need to re-open the file descriptor when libvirtd starts
up again, for any running guest. I'm not really convinced we want to
keep the perf file descriptors open for all the domains for the entire
time they are running. Should really only open them when we actually
want to read the collected data.
> > + if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
> > + virReportSystemError(errno, "%s",
> > + _("Unable to disable perf event for
CMT"));
> > + }
> > + VIR_FORCE_CLOSE(priv->cmt_fd);
> > + }
> > +
> > if (priv->agent) {
> > qemuAgentClose(priv->agent);
> > priv->agent = NULL;
>
>
> Conceptually I think this approach to implementation is flawed. While you are
> turning on/off the perf events for each QEMU process, the data collection does
> not distinguish data from each QEMU process - the data reported is host wide.
> So this really doesn't make much sense IMHO.
>
As mentioned above, the data reported is only for domain.
> I'm also wondering whether this is really going to be sufficiently useful on
its
> own. CPUs have countless other performance counters that I would imagine
> apps/admins will want to read in order to analyse QEMU performance, beyond
> this new CMT feature. The domain stats API won't really scale up to dealing
with
> arbitrary perf event counter reporting so I'm not much of a fan of just special
> casing CMT in this way.
>
> IOW, if we want to support host performance analysis in libvirt, then we
> probably want to design an set of APIs specifically for this purpose, but I could
> well see us saying that this is out of scope for libvirt and apps shoud just use
the
> linux perf interfaces directly.
>
Yes. I can get what you mean. Maybe libvirt doesn't have to be responsible
for supporting host performance.
But I guess cache usage should be important for each domain, if those apps
based on libvirt can achieve this information they will be able to better
check and confirm the domain works normally, like the stats of
cpu/memory/block/... which have be supported in libvirt now. Do you think
so?
I'm not saying cache usage is unimportant. There are quite a lot of other
hardware event counters in modern CPUs though, so I'm asking why we should
add just this new special intel event, and not any of the other existing
performance counters that are useful in diagnosing performance issues.
Also, I'm thinking that QEMU has many different threads - VCPU threads,
I/O threads, emulator threads and so on. I could see users want to have
distinct profiling for these different functional areas of QEMU too
instead of just whole-QEMU granularity.
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|