[libvirt] [RFC v3] x86 RDT Cache Monitoring Technology (CMT)

Support for Intel x86 Cache Monitoring Technology (CMT) in libvirt has
already been discussed a couple of times, and I have raised two RFCs. You
can find the discussions at the following links; thanks to the people who
participated online and offline.

RFC v2:
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html

RFC v1:
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html

Nearly one month has passed since the last online discussion. In that month,
the MBA (Memory Bandwidth Allocation) feature has been introduced by Niu
Bin, and some of the fundamental code the CMT feature relies on has changed.
The change is not that significant; I have rewritten my POC code for this
RFC on top of the new virresctrl framework and also made some changes to the
CMT feature itself. I think it is better to summarize my thoughts on
enabling CMT here before I submit the code, staying consistent with the
previous discussion.

1. What is the purpose of this RFC and the later patches?

Short answer: introducing the kernel resctrl-fs-based x86 CMT feature to
libvirt.

Detailed explanation:

The latest kernel has removed the 'cmt', 'mbmt' and 'mbml' perf events,
which libvirt relied on to report the CMT/MBM results through the virperf
interface. Libvirt therefore no longer supports the CMT/MBM features on
Linux distributions shipping the latest kernel, while these CPU features are
vital to software such as OpenStack Nova. About the removal of the cmt/mbm
perf events, refer to the following link for details:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=c39a0e2c8850f08249383f2425dbd8dbe4baad69

The latest kernel instead exposes the RDT features, including CAT, MBA, CDP,
CMT and MBM, through the resource control file system (resctrl fs), and
libvirt has already implemented cache and memory bandwidth allocation on top
of the kernel resctrl fs interface. I would like to add cache and memory
bandwidth monitoring to libvirt through the same interface (a short sketch
of the kernel monitoring interface follows section 2 below). Since CMT and
MBM share a very similar kernel interface, for simplicity only CMT is
discussed here; MBM is the follow-up that will be implemented after CMT and
should be very straightforward once CMT is ready.

The plan is to output the cache monitoring results through the 'domstats'
command, and to create or delete monitoring groups either through the
domain's static XML configuration file or through a new 'virsh' runtime
command.

2. How many stages are scheduled to implement the libvirt CMT (MBM) feature?

Short answer: 4 stages.

Detailed explanation:

Stage 1: static CMT feature. Implement the libvirt interfaces for creating
and deleting CMT groups statically through the domain's XML configuration
file; changes only take effect after a reboot.

Stage 2: static MBM feature. Very similar to the stage 1 CMT work.

Stage 3: dynamic CMT feature. This stage aims to implement interfaces for
changing the CMT groups dynamically at runtime, with no reboot required. My
basic thought is to implement a new 'virsh' sub-command providing all of
these interfaces. I also hope to cover dynamic cache allocation (CAT) with
this command.

Stage 4: dynamic MBM feature. Depends on the implementation of stage 3 and
should share many of the interfaces created there.

This RFC mainly covers the discussion of stage 1.
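For reference, below is a rough sketch of the kernel resctrl monitoring
interface that stage 1 builds on, following the kernel's resctrl
documentation. The group name and thread ID are placeholders, and the exact
mon_L3_<id> paths depend on the host cache topology:

  # mount the resctrl filesystem (needs a CMT-capable CPU and kernel)
  mount -t resctrl resctrl /sys/fs/resctrl

  # create a monitoring group under the default allocation group
  mkdir /sys/fs/resctrl/mon_groups/example_group

  # move a vcpu thread into the group by writing its thread ID
  echo 12345 > /sys/fs/resctrl/mon_groups/example_group/tasks

  # read the last level cache occupancy in bytes; one mon_L3_<id>
  # directory exists per L3 cache (one "bank" in the domstats output
  # shown in section 3)
  cat /sys/fs/resctrl/mon_groups/example_group/mon_data/mon_L3_00/llc_occupancy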
3. What is the interface for the static CMT feature (the output and the
input)?

Short answer: output through the 'domstats' command; input through the XML
file.

Detailed explanation:

Output: both CMT and MBM are Intel x86 CPU features, so I put the statistics
under the result of the 'cpu-total' subcommand. This differs from RFC v2,
where a separate 'domstats' sub-command named 'cpu-resource' was created;
Martin and I were not very satisfied with that naming. The output would be:

virsh domstats --cpu-total
Domain: 'ubuntu16.04-base'
  cpu.time=3143918715755
  cpu.user=15320000000
  cpu.system=235280000000
  cpu.cache.monitor.count=3
  cpu.cache.0.name=vcpus_2
  cpu.cache.0.vcpus=2
  cpu.cache.0.bank.count=2
  cpu.cache.0.bank.0.bytes=5488
  cpu.cache.0.bank.1.bytes=4410000
  cpu.cache.1.name=vcpus_1
  cpu.cache.1.vcpus=1
  cpu.cache.1.bank.count=2
  cpu.cache.1.bank.0.bytes=7839744
  cpu.cache.1.bank.1.bytes=0
  cpu.cache.2.name=vcpu_0,3
  cpu.cache.2.vcpus=0,3
  cpu.cache.2.bank.count=2
  cpu.cache.2.bank.0.bytes=53796864
  cpu.cache.2.bank.1.bytes=0

Compared with RFC v2, the CMT information is slightly changed.

Input: compared with RFC v2, this part has changed. The resource monitoring
group is now considered a resource monitor attached to a specific
allocation. The main interface for creating monitoring groups is the XML
file. The proposed configuration is:

  <cputune>
    <cachetune vcpus='0-1'>
      <cache id='0' level='3' type='code' size='7680' unit='KiB'/>
      <cache id='1' level='3' type='data' size='3840' unit='KiB'/>
+     <monitor vcpus='0-1'/>
+     <monitor vcpus='0'/>
    </cachetune>
    <cachetune vcpus='3'>
+     <monitor vcpus='3'/>
    </cachetune>
  </cputune>

The above XML creates two cache resctrl allocation groups and three resctrl
monitoring groups. For the cache allocation group covering vcpu 0 and vcpu
1, there are two monitors: one monitors the last level cache occupancy of
both vcpus, while the other monitors the cache occupancy of vcpu 0 only.
The second cachetune group makes no explicit cache resource allocation, so
it uses the resources of the 'default' cache allocation group, and a cache
monitor is created to monitor its cache occupancy. (A sketch of how this
could map onto resctrl fs follows section 4 below.)

4. Have emulator and io threads been considered for CMT?

Short answer: no. Allocating dedicated cache or memory bandwidth to emulator
and io threads is not supported.
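To make the mapping concrete, here is my sketch of how the three <monitor>
elements above could translate into resctrl monitoring groups. The
allocation-group directory name is generated by libvirt, so '<alloc-group>'
is only a placeholder, and placing the third monitor under the default group
reflects this proposal rather than settled behavior:

  # the two monitors inside the allocation group from <cachetune vcpus='0-1'>
  cat /sys/fs/resctrl/<alloc-group>/mon_groups/vcpus_0-1/mon_data/mon_L3_00/llc_occupancy
  cat /sys/fs/resctrl/<alloc-group>/mon_groups/vcpus_0/mon_data/mon_L3_00/llc_occupancy

  # <cachetune vcpus='3'> allocates no cache of its own, so its monitor
  # would live under the default allocation group
  cat /sys/fs/resctrl/mon_groups/vcpus_3/mon_data/mon_L3_00/llc_occupancy

Each mon_L3_<id> directory corresponds to one cache 'bank' in the domstats
output, so a host with two L3 caches would typically also expose mon_L3_01.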