Supporting Intel x86 Cache Monitoring Technology (CMT) in libvirt has
already been discussed a couple of times, and I have raised two RFCs for it.
You can find the previous discussion at the links below; thanks to everyone
who participated in the discussion, online and offline.
RFCv2:
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html
RFCv1:
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html
Nearly one month has passed since the last online discussion. In that time,
the MBA (memory bandwidth allocation) feature has been introduced by Niu Bin,
and some of the fundamental code behind the CMT feature has changed. The
change is not that significant; I have rewritten my POC code for this RFC on
top of the new virresctrl framework and also made some changes to the CMT
feature itself.
I think it is better to summarize my thoughts on enabling CMT here before I
submit the code. I will try to keep it consistent with the previous
discussion.
1. What is the purpose of this RFC and later patches?
Short answer:
Introduce the kernel resctrlfs based x86 CMT feature to libvirt.
Detail explanation:
The latest kernel has removed the "cmt, mbmt, mbml" perf events, and libvirt
relied on these kernel perf events to report CMT/MBM results through the
virperf interface. Libvirt will therefore no longer support the CMT/MBM
features on Linux distributions shipping the latest kernel, even though these
CPU features are vital to software such as OpenStack Nova. For details on the
deprecation of the cmt/mbm perf events, see:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=c39a0e2c8850f08249383f2425dbd8dbe4baad69
The latest kernel exposes the RDT features, including CAT, MBA, CDP, CMT and
MBM, through the resource control file system (resctrl fs), and libvirt
already implements cache and memory bandwidth allocation through the kernel
resctrlfs interface. I would like to add cache and memory bandwidth
monitoring to libvirt through the same kernel resctrlfs interface.
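To make the underlying interface concrete, here is a rough sketch of how the
resctrl fs monitoring files are driven from the shell; the group name and PID
are purely illustrative, and the exact paths depend on the host cache
topology:

  mount -t resctrl resctrl /sys/fs/resctrl
  # create a monitoring group inside the default allocation group
  mkdir /sys/fs/resctrl/mon_groups/vcpus_2
  # move the thread(s) to be monitored into the group ($VCPU_PID is illustrative)
  echo $VCPU_PID > /sys/fs/resctrl/mon_groups/vcpus_2/tasks
  # per-cache-bank last level cache occupancy, in bytes
  cat /sys/fs/resctrl/mon_groups/vcpus_2/mon_data/mon_L3_00/llc_occupancy
  cat /sys/fs/resctrl/mon_groups/vcpus_2/mon_data/mon_L3_01/llc_occupancy
  # the MBM counters live next to it as mbm_total_bytes and mbm_local_bytes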
Since CMT and MBM share a very similar kernel interface, for simplicity only
CMT is discussed here. MBM is a follow-up that will be implemented after CMT
and should be very straightforward once CMT is ready.
The plan is to output the cache monitoring results through the 'domstats'
command, and to create or delete monitoring groups either through the
domain's static XML configuration file or through a new 'virsh' runtime
command.
2. How many stages are scheduled to implement the libvirt CMT (MBM) feature?
Short answer:
4 stages.
Detail explanation:
Stage 1: Static CMT feature.
Implement the libvirt interface for creating and deleting CMT groups
statically through the domain's XML configuration file; changes only take
effect after a reboot.
Stage 2: Static MBM feature.
Very similar to the stage 1 CMT work.
Stage 3: Dynamic CMT feature.
This stage aims to implement interfaces for changing the CMT groups
dynamically at runtime, with no reboot required.
My basic thought is to implement a new 'virsh' sub-command providing all the
interfaces (see the sketch after this list). I also hope to implement dynamic
cache allocation (CAT) with this command.
Stage 4: Dynamic MBM feature.
This depends on the stage 3 implementation and should share many of the
interfaces created there.
This RFC mainly covers the discussion of stage 1.
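Purely to illustrate the stage 3 idea, the runtime interface might look
something like the following; the sub-command name and options are
hypothetical placeholders, not a decided design:

  # hypothetical virsh sub-command, name and options are placeholders
  virsh cachemonitor <domain> --add --vcpus 0-1     # create a monitoring group at runtime
  virsh cachemonitor <domain> --delete --vcpus 0-1  # remove it again, no reboot needed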
3. What is the interface for the static CMT feature (the output and input)?
Short answer:
Output through the 'domstats' command; input through the XML file.
Detail explanation:
Output:
Both CMT and MBM are Intel x86 CPU features, so I put the statistics under
the output of the 'cpu-total' subcommand.
This is different from RFCv2, where a separate 'domstats' sub-command named
'cpu-resource' was created; Martin and I were not very satisfied with that
naming.
The output would be:
virsh domstats --cpu-total
Domain: 'ubuntu16.04-base'
cpu.time=3143918715755
cpu.user=15320000000
cpu.system=235280000000
cpu.cache.monitor.count=3
cpu.cache.0.name=vcpus_2
cpu.cache.0.vcpus=2
cpu.cache.0.bank.count=2
cpu.cache.0.bank.0.bytes=5488
cpu.cache.0.bank.1.bytes=4410000
cpu.cache.1.name=vcpus_1
cpu.cache.1.vcpus=1
cpu.cache.1.bank.count=2
cpu.cache.1.bank.0.bytes=7839744
cpu.cache.1.bank.1.bytes=0
cpu.cache.2.name=vcpu_0,3
cpu.cache.2.vcpus=0,3
cpu.cache.2.bank.count=2
cpu.cache.2.bank.0.bytes=53796864
cpu.cache.2.bank.1.bytes=0
Compared with RFCv2, the CMT information has changed slightly.
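As a usage example, and assuming the proposed cpu.cache.* fields are emitted
under '--cpu-total' as shown above, a management application could pull just
these counters with:

  virsh domstats ubuntu16.04-base --cpu-total | grep cpu.cache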
Input:
Compared with RFCv2, this part has changed. The resource monitoring group is
now modelled as a resource monitor attached to a specific allocation.
The main interface for creating monitoring groups is the XML file. The
proposed configuration looks like this:
  <cputune>
    <cachetune vcpus='0-1'>
      <cache id='0' level='3' type='code' size='7680' unit='KiB'/>
      <cache id='1' level='3' type='data' size='3840' unit='KiB'/>
+     <monitor vcpus='0-1'/>
+     <monitor vcpus='0'/>
    </cachetune>
    <cachetune vcpus='3'>
+     <monitor vcpus='3'/>
    </cachetune>
  </cputune>
The above XML creates 2 cache resctrl allocation groups and 3 resctrl
monitoring groups.
For the cache allocation group covering vcpu 0 and vcpu 1, there are two
monitors: one monitors the last level cache occupancy of both vcpus, while
the other only monitors the cache occupancy of vcpu 0.
The other cache allocation contains no explicit cache resource allocation, so
it uses the resources of the 'default' cache allocation group, and a cache
monitor is created inside it to monitor the cache occupancy.
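Roughly speaking, the XML above would map to a resctrl fs layout like the
following; the directory names are only placeholders, since the real naming
is an implementation detail of the virresctrl code:

  /sys/fs/resctrl/<allocation-group-for-vcpus-0-1>/
      schemata                                   # the L3 code/data allocation
      mon_groups/vcpus_0-1/mon_data/mon_L3_*/llc_occupancy
      mon_groups/vcpus_0/mon_data/mon_L3_*/llc_occupancy
  /sys/fs/resctrl/mon_groups/vcpus_3/            # monitor inside the 'default' group
      mon_data/mon_L3_*/llc_occupancy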
4. Have emulator and io threads been considered for CMT?
Short answer:
No. Allocating dedicated cache or memory bandwidth for emulator and I/O
threads is not supported.