Hi Martin,
Thanks for your comments; please see my replies inline below.
-----Original Message-----
From: Martin Kletzander [mailto:mkletzan@redhat.com]
Sent: Monday, June 11, 2018 4:30 PM
To: Wang, Huaqiang <huaqiang.wang(a)intel.com>
Cc: libvir-list(a)redhat.com; Feng, Shaohe <shaohe.feng(a)intel.com>; Niu, Bing
<bing.niu(a)intel.com>; Ding, Jian-feng <jian-feng.ding(a)intel.com>; Zang, Rui
<rui.zang(a)intel.com>
Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support
[It would be nice if you wrapped the long lines]
I'll pay attention to wrapping long lines. Thanks for the advice.
On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
>This is an RFC for supporting the CPU Cache Monitoring Technology (CMT)
>feature in libvirt. Since Memory Bandwidth Monitoring (MBM) is a feature very
>close to CMT, for simplicity we only discuss CMT here. MBM support is a
>follow-up that will be implemented after CMT.
>For details about CMT, please refer to section 17.18 of volume 3 of the Intel
>x86 SDM (https://software.intel.com/en-us/articles/intel-sdm).
>
Can you elaborate on how is this different to the CMT perf event that is already
in libvirt and can be monitored through domstats API?
Because the kernel has removed the perf events 'cmt', 'mbmt' and 'mbml', this
libvirt feature no longer works with the latest kernels. Please see the following
link for details:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/c...,
This series tries to provide similar functionality to replace the missing part,
reporting cmt, mbmt and mbml information. At first we only focus on cmt.
Compared with the CMT perf event already in libvirt, I am trying to implement
almost the same output as 'perf.cmt' in the output of 'domstats', but under a
different name, such as 'resctrl.cmt' or 'rdt.cmt' (or something else).
Another difference is that the underlying implementation is based on the kernel
resctrl fs.
This series also attempts to provide a command interface for enabling and
disabling the cmt feature for the whole domain, just as the original perf event
based cmt can be enabled or disabled by specifying '--enable cmt' or
'--disable cmt' when invoking 'virsh perf <domain>'.
Our version looks like 'virsh resctrl <domain> --enable', the difference being
that there is no 'cmt' suffix. 'cmt' is omitted because the CMT and MBM functions
are both enabled whenever a valid resctrl fs sub-folder is created. There is no
way to disable one while enabling the other, for example to enable CMT while
disabling MBML.
This series tries to stick to the interfaces exposed by perf event based CMT/MBM
and to provide a substitute for them. Note that perf based CMT only provides cache
occupancy information for the whole domain. We are also thinking about providing
cache occupancy information for groups of vcpus, which may be specified in the
XML file.
For example, with the following configuration:
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3-4'/>
  <vcpupin vcpu='2' cpuset='4-5'/>
  <vcpupin vcpu='3' cpuset='6-7'/>
  <cachetune vcpus='0'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <rdt-monitoring vcpu='0' enable='yes'/>
  <rdt-monitoring vcpu='1-2' enable='yes'/>
  <rdt-monitoring vcpu='3' enable='yes'/>
</cputune>
'domstats' would then output the following cmt information:
[root@dl-c200 libvirt]# virsh domstats vm1 --resctrl
Domain: 'vm1'
rdt.cmt.total=645562
rdt.cmt.vcpu0=104331
rdt.cmt.vcpu1_2=203200
rdt.cmt.vcpu3=340129
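The following minimal C sketch shows how the key naming in this example works. It derives a per-group key such as 'rdt.cmt.vcpu1_2' from a vcpus range such as '1-2'. The helper name rdt_cmt_stat_key() and the 'rdt.cmt.vcpu' prefix are hypothetical, since the final stat names are still open. Only the '-' separator from the example is handled; comma-separated vcpu lists would need a separate encoding that does not collide with ranges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix; the final stat name is still under discussion. */
#define RDT_CMT_KEY_PREFIX "rdt.cmt.vcpu"

/* Build a per-group domstats key such as "rdt.cmt.vcpu1_2" from a vcpus
 * range such as "1-2". Only the '-' range separator is rewritten to '_',
 * matching the example output. Returns 0 on success, -1 if the key does
 * not fit in buf. */
static int rdt_cmt_stat_key(const char *vcpus, char *buf, size_t len)
{
    int n;

    assert(vcpus != NULL && buf != NULL);

    n = snprintf(buf, len, RDT_CMT_KEY_PREFIX "%s", vcpus);
    if (n < 0 || (size_t)n >= len)
        return -1;

    /* Rewrite range separators, leaving the fixed prefix untouched. */
    for (char *p = buf + strlen(RDT_CMT_KEY_PREFIX); *p != '\0'; p++) {
        if (*p == '-')
            *p = '_';
    }
    return 0;
}
```

For example, rdt_cmt_stat_key("1-2", buf, sizeof(buf)) leaves "rdt.cmt.vcpu1_2" in buf.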
I hope this addresses your question about how this differs from the CMT perf
event that is already in libvirt and can be monitored through the domstats API.
Any input is welcome.
https://libvirt.org/formatdomain.html#elementsPerf
>## About '_virResctrlMon' interface
>
>The cache allocation technology (CAT) has already been implemented in
>util/virresctrl.*, which interacts with the Linux kernel resctrl file
>system. Very similar to CAT, the CMT object is represented by 'struct
>_virResctrlMon', which is defined as:
>
>```
>struct _virResctrlMon {
>    virObject parent;
>
>    /* pairedalloc: pointer to the resctrl allocation this mon group is
>     * paired with. NULL for a resctrl monitoring group that is not
>     * associated with any allocation. */
>    virResctrlAllocPtr pairedalloc;
>    /* The identifier (any unique string for now) */
>    char *id;
>    /* libvirt-generated path; may be identical to the allocation path
>     * if the group is paired with an allocation */
>    char *path;
>};
>```
>
>Following almost the same logic as '_virResctrlAlloc', which is mainly
>implemented in 'virresctrl.c', a group of APIs has been designed to
>manipulate '_virResctrlMon'. '_virResctrlMon' has a lot in common with
>'_virResctrlAlloc' except for the field 'pairedalloc'.
>'pairedalloc' stores a pointer to the paired resctrl allocation object. With the
>current libvirt resctrl implementation, when a '_virResctrlAlloc' object is
>created, the CMT hardware is enabled automatically and shares the same folder
>under the resctrl fs. I call a CMT '_virResctrlMon' object that shares its
>folder under the resctrl fs with an allocation a 'paired' _virResctrlMon; that
>is, one '_virResctrlMon' and one '_virResctrlAlloc' form a pair. In
>'_virResctrlMon' the paired '_virResctrlAlloc' is tracked through
>'pairedalloc'. A paired mon group cannot be dynamically enabled or disabled at
>runtime.
>'pairedalloc' may be NULL, which creates a non-paired mon group object. This is
>necessary because CMT can work independently to monitor the utilization of
>critical CPU resources (cache or memory bandwidth) without allocating any
>dedicated cache or memory bandwidth. A non-paired mon group object represents
>an independently working CMT. A non-paired mon group can be enabled or disabled
>at runtime.
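The following self-contained C sketch summarises the pairing rule above. It deliberately does not use libvirt's types. The names struct mon_group, mon_group_path() and mon_group_can_toggle() are hypothetical. The location chosen for non-paired groups is an assumption, not necessarily what the patches do: a monitor-only group under /sys/fs/resctrl/mon_groups/, which is where the kernel places MON groups of the default control group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified mirror of _virResctrlMon (not libvirt's type).
 * alloc_path stands in for 'pairedalloc': non-NULL means the group shares
 * the allocation's resctrl folder. */
struct mon_group {
    const char *id;          /* unique identifier */
    const char *alloc_path;  /* paired allocation folder, or NULL */
};

/* Paired groups reuse the allocation folder. Non-paired groups are ASSUMED
 * to live under /sys/fs/resctrl/mon_groups/<id>. Returns 0 on success,
 * -1 on an invalid id or if the path does not fit in buf. */
static int mon_group_path(const struct mon_group *mon, char *buf, size_t len)
{
    int n;

    assert(mon != NULL && mon->id != NULL && buf != NULL);

    /* An empty id or one containing '/' would not name a single folder. */
    if (mon->id[0] == '\0' || strchr(mon->id, '/') != NULL)
        return -1;

    if (mon->alloc_path != NULL)
        n = snprintf(buf, len, "%s", mon->alloc_path);
    else
        n = snprintf(buf, len, "/sys/fs/resctrl/mon_groups/%s", mon->id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Only non-paired groups may be enabled/disabled at runtime: disabling a
 * paired group would mean removing the allocation's folder. */
static bool mon_group_can_toggle(const struct mon_group *mon)
{
    assert(mon != NULL);
    return mon->alloc_path == NULL;
}
```

Keeping the paired/non-paired distinction in a single pointer, as 'pairedalloc' does, means the runtime toggle check needs no extra state.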
>
>## About virsh command 'resctrl'
>
>To set or get the status of the resctrl mon group (hardware CMT), a virsh
>command 'resctrl' has been created. Here are the common usages:
The command does make sense for people who know how the stuff works on the
inside or have seen the code in libvirt. For other users the name 'resctrl' is
going to feel very much arbitrary. We're trying to abstract the details for
users, so I don't see why it should be named 'resctrl' when it handles "RDT
Monitoring Status".
Agreed, 'resctrl' is confusing for end users.
The underlying kernel interface combines the CMT and MBM features. The files
'llc_occupancy', 'mbm_local_bytes' and 'mbm_total_bytes', which report cache
occupancy, local memory bandwidth and total memory bandwidth respectively, are
created automatically and simultaneously for each resctrl group, and there is no
way to enable one and disable another. So for a command that affects both cache
and memory bandwidth, I would like to use 'rdt' as the key word. Both cache
monitoring (CMT) and memory bandwidth monitoring (MBM) belong to the scope of RDT
monitoring.
To replace the confusing word 'resctrl', I'd like to use 'rdtmon' as the command
name, so 'virsh resctrl <domain>' would become 'virsh rdtmon <domain>'.
Suggestions from the community are welcome.
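For reference, here is a minimal C sketch of reading one of these counters. It assumes the Linux resctrl layout <group>/mon_data/mon_L3_<cache id>/<event>, where <event> is llc_occupancy, mbm_total_bytes or mbm_local_bytes. Which events exist depends on the CPU. The helper names resctrl_mon_path() and resctrl_read_u64() are hypothetical and not part of libvirt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<group_dir>/mon_data/mon_L3_<NN>/<event>", e.g.
 * "/sys/fs/resctrl/g1/mon_data/mon_L3_00/llc_occupancy".
 * Returns 0 on success, -1 on an invalid event name or if the result
 * does not fit in buf. */
static int resctrl_mon_path(const char *group_dir, unsigned int cache_id,
                            const char *event, char *buf, size_t len)
{
    int n;

    assert(group_dir != NULL && event != NULL && buf != NULL);

    /* Reject empty event names and names containing '/'. */
    if (event[0] == '\0' || strchr(event, '/') != NULL)
        return -1;

    n = snprintf(buf, len, "%s/mon_data/mon_L3_%02u/%s",
                 group_dir, cache_id, event);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single unsigned decimal counter from a resctrl file. Non-numeric
 * contents are treated as an error. Returns 0 on success, -1 otherwise. */
static int resctrl_read_u64(const char *path, unsigned long long *value)
{
    FILE *fp;
    int ret;

    assert(path != NULL && value != NULL);

    if ((fp = fopen(path, "r")) == NULL)
        return -1;
    ret = (fscanf(fp, "%llu", value) == 1) ? 0 : -1;
    fclose(fp);
    return ret;
}
```

All three event files sit side by side in the same mon_L3_<NN> folder of one group. This is why enabling the group enables CMT and MBM together.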
>```
>[root@dl-c200 david]# virsh list --all
> Id Name State
>----------------------------------------------------
> 1 vm3 running
> 3 vm2 running
> - vm1 shut off
>```
>
>### Test on a running domain vm3
>To get RDT monitoring status, type 'virsh resctrl <domain>'
>```
> [root@dl-c200 david]# virsh resctrl vm3
> RDT Monitoring Status: Enabled
>```
>
>To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
>```
> [root@dl-c200 david]# virsh resctrl vm3 --enable
> RDT Monitoring Status: Enabled
>```
>
>To disable RDT monitoring, type 'virsh resctrl <domain> --disable'
>```
> [root@dl-c200 david]# virsh resctrl vm3 --disable
> RDT Monitoring Status: Disabled
>
> [root@dl-c200 david]# virsh resctrl vm3
> RDT Monitoring Status: Disabled
>```
>
>### Test on domain vm1, which is not running
>If the domain is not active, setting the RDT monitoring status fails, and
>getting it reports 'Disabled'.
>```
> [root@dl-c200 david]# virsh resctrl vm1
> RDT Monitoring Status: Disabled
>
> [root@dl-c200 david]# virsh resctrl vm1 --enable
> error: Requested operation is not valid: domain is not running
>
> [root@dl-c200 david]# virsh resctrl vm1 --disable
> error: Requested operation is not valid: domain is not running
>```
>
Can't these commands enable it in the XML? It would be nice if the XML part was
shown here in the explanation.
The POC code of the first version has no XML changes, so monitoring cannot be
enabled or disabled through the XML file.
Let's discuss adding this function. How about this configuration:
<cputune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <rdt-monitoring vcpu='0' enable='no'/>
  <rdt-monitoring vcpu='1-2' enable='yes'/>
  <rdt-monitoring vcpu='3' enable='yes'/>
</cputune>
With the above settings:
- Two RDT monitoring groups will be created when the VM is launched.
- <rdt-monitoring vcpu='1-2' enable='yes'/> is created automatically because of
  the <cachetune> setting. Under the resctrl fs, resctrl allocations and RDT
  monitoring groups are represented as sub-folders, and we cannot create two
  sub-folders under the resctrl fs for one process, so a resctrl allocation
  creates an RDT monitoring group as well. This RDT monitoring group cannot be
  disabled at runtime because there is no way to disable a resctrl allocation
  (CAT) at runtime.
- <rdt-monitoring vcpu='3' enable='yes'/> creates another monitoring group that
  is enabled by default, and the task id (the pid associated with vcpu3) will be
  put into its 'tasks' file. RDT monitoring of vcpu 3 can be enabled or disabled
  at runtime with a command such as 'virsh rdtmon --enable vcpu3'. MBM will also
  be enabled or disabled with this command.
- <rdt-monitoring vcpu='0' enable='no'/> specifies the default RDT monitoring
  state for vcpu0 of the domain, which is disabled after launch and can be
  changed at runtime.
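The vcpu3 case boils down to writing the vcpu thread's id into the group's 'tasks' file. Below is a minimal C sketch of that step. It assumes the group folder already exists and that the caller knows the vcpu's thread id. resctrl_add_task() is a hypothetical name, not a libvirt function.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Add one thread (e.g. the TID of vcpu3) to a resctrl group by writing its
 * id to "<group_dir>/tasks". Returns 0 on success, -1 on error. */
static int resctrl_add_task(const char *group_dir, pid_t tid)
{
    char path[4096];
    FILE *fp;
    int n;

    assert(group_dir != NULL && tid > 0);

    n = snprintf(path, sizeof(path), "%s/tasks", group_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    if ((fp = fopen(path, "w")) == NULL)
        return -1;

    if (fprintf(fp, "%lld\n", (long long)tid) < 0) {
        fclose(fp);
        return -1;
    }
    /* stdio buffers the write; fclose() flushes it and reports any error
     * the kernel returns (e.g. for an invalid TID). */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Disabling monitoring for such a non-paired group could then amount to removing its folder. The kernel moves the tasks of a removed resctrl group back to the parent group.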
>### Test on domain vm2
>Domain vm2 is active and its CAT functionality is enabled through 'cachetune'
>(configured in the 'cputune/cachetune' section), so its resctrl mon group is a
>'paired' one. For a 'paired' mon group, RDT monitoring cannot be disabled:
>allowing that would require destroying the resctrl allocation folders, which
>the current cache allocation design does not support.
What if you have multiple cachetunes? What if the cachetune is only set for one
vcpu and you want to monitor the others as well? I guess I have to see the
patches to understand why you have so much information stored for something
that looks like a boolean (enable/disable).
At the time I raised this RFC, there was no design for reporting RDT monitoring
information at cachetune granularity; only cache/memory bandwidth information
for the whole domain was reported.
Now I'd like to discuss the design listed above, which reports RDT monitoring
information per rdt-monitoring (cachetune) group. I would appreciate your
comments.
>```
> [root@dl-c200 libvirt]# virsh resctrl vm2 --enable
> RDT Monitoring Status: Enabled (forced by cachetune)
>
> [root@dl-c200 libvirt]# virsh resctrl vm2 --disable
> RDT Monitoring Status: Enabled (forced by cachetune)
>
> [root@dl-c200 libvirt]# virsh resctrl vm2
> RDT Monitoring Status: Enabled (forced by cachetune)
>```
>
>## About showing the utilization information of RDT
>
>A domstats field has been created to show the utilization of RDT resources;
>the command looks like this:
>```
> [root@dl-c200 libvirt]# virsh domstats --resctrl
> Domain: 'vm1'
> resctrl.cmt=0
>
> Domain: 'vm3'
> resctrl.cmt=180224
>
> Domain: 'vm2'
> resctrl.cmt=2613248
>```
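The kernel reports llc_occupancy separately for each L3 cache instance, with one mon_L3_<id> folder per cache id. The output above, however, shows one number per domain. The minimal C sketch below assumes the per-domain value is the sum over all L3 instances of the domain's resctrl group. resctrl_sum_llc_occupancy() is a hypothetical name.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Sum llc_occupancy over every <group_dir>/mon_data/mon_L3_* folder.
 * On success stores the sum in *total and returns 0; returns -1 if the
 * folder cannot be read or any counter cannot be parsed. */
static int resctrl_sum_llc_occupancy(const char *group_dir,
                                     unsigned long long *total)
{
    char path[4096];
    DIR *dir;
    struct dirent *ent;
    int n, ret = 0;

    assert(group_dir != NULL && total != NULL);
    *total = 0;

    n = snprintf(path, sizeof(path), "%s/mon_data", group_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if ((dir = opendir(path)) == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        unsigned long long v;
        FILE *fp;

        /* Skip ".", ".." and any non-L3 monitoring domains. */
        if (strncmp(ent->d_name, "mon_L3_", 7) != 0)
            continue;

        n = snprintf(path, sizeof(path), "%s/mon_data/%s/llc_occupancy",
                     group_dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(path) ||
            (fp = fopen(path, "r")) == NULL) {
            ret = -1;
            break;
        }
        if (fscanf(fp, "%llu", &v) == 1)
            *total += v;
        else
            ret = -1;
        fclose(fp);
        if (ret < 0)
            break;
    }

    closedir(dir);
    return ret;
}
```

On a two-socket host such as the one in the cachetune example (cache ids 0 and 1), this would add the two per-socket occupancy values.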
>
>
>Wang Huaqiang (3):
> util: add Intel x86 RDT/CMT support
> tools: virsh: add command for controling/monitoring resctrl
> tools: virsh domstats: show RDT CMT resource utilization information
>
> include/libvirt/libvirt-domain.h | 10 ++
> src/conf/domain_conf.c | 28 ++++
> src/conf/domain_conf.h | 3 +
> src/driver-hypervisor.h | 8 +
> src/libvirt-domain.c | 92 +++++++++++
> src/libvirt_private.syms | 9 +
> src/libvirt_public.syms | 6 +
> src/qemu/qemu_driver.c | 189 +++++++++++++++++++++
> src/qemu/qemu_process.c | 65 +++++++-
> src/remote/remote_daemon_dispatch.c | 45 +++++
> src/remote/remote_driver.c | 2 +
> src/remote/remote_protocol.x | 28 +++-
> src/remote_protocol-structs | 12 ++
> src/util/virresctrl.c | 316 +++++++++++++++++++++++++++++++++++-
> src/util/virresctrl.h | 44 +++++
> tools/virsh-domain-monitor.c | 7 +
> tools/virsh-domain.c | 74 +++++++++
> 17 files changed, 933 insertions(+), 5 deletions(-)
>
>--
>2.7.4
>
>--
>libvir-list mailing list
>libvir-list(a)redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list