This series of patches introduces x86 Cache Monitoring Technology
(CMT) to libvirt by interacting with the kernel resource control
(resctrl) interface. CMT is one of the Intel(R) x86 CPU features that
belong to Resource Director Technology (RDT). CMT reports the occupancy
of the last level cache, which is shared by all CPU cores.
The v1 series set out to introduce CMT to libvirt, including reporting
the host capability and creating CMT groups. Since introducing the host
capability is a well self-contained step, this series covers only that
step. As an extension of v1, the MBM capability is also introduced.
These patches do not cover creating CMT groups, which will come in
subsequent patches.
We have had several discussions about enabling CMT; please refer to
the following links for the RFCs.
RFCv3
https://www.redhat.com/archives/libvir-list/2018-August/msg01213.html
RFCv2
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html
RFCv1
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html
1. Why is CMT necessary for libvirt?
The perf events 'CMT, MBML, MBMT' have been phased out since Linux
kernel commit c39a0e2c8850f08249383f2425dbd8dbe4baad69, so the
perf-based cmt and mbm events in libvirt will not work with the latest
Linux kernel. These patches add the CMT feature to libvirt through the
kernel resctrlfs interface.
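As a rough illustration of what the resctrlfs interface exposes, the
sketch below reads the three L3_MON info files that the test data in
this series also provides (num_rmids, max_threshold_occupancy and
mon_features). It is not the libvirt implementation; the function name
read_l3_mon_info is hypothetical, and the conventional mount point
/sys/fs/resctrl is an assumption about the host.

```python
from pathlib import Path


def read_l3_mon_info(resctrl_root):
    """Read the L3 monitoring info files below <resctrl_root>/info/L3_MON.

    Hypothetical helper for illustration only. Returns None when the
    directory does not exist, i.e. monitoring is not exposed.
    """
    mon_dir = Path(resctrl_root) / "info" / "L3_MON"
    if not mon_dir.is_dir():
        return None

    # mon_features holds one feature name per line, e.g. llc_occupancy.
    features = (mon_dir / "mon_features").read_text().split()

    return {
        "num_rmids": int((mon_dir / "num_rmids").read_text()),
        "max_threshold_occupancy": int(
            (mon_dir / "max_threshold_occupancy").read_text()
        ),
        "mon_features": features,
    }
```

On a host with resctrl mounted, read_l3_mon_info("/sys/fs/resctrl")
would return a dictionary of these three values, or None if the L3_MON
directory is absent.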
2. High-level interfaces for CMT.
CMT, CAT, MBM and MBA are orthogonal features; each can work
independently.
If 'CMT' is enabled on the host, a 'cache monitor' is introduced for
the cache; its role is to monitor the last level cache utilization of
a target system process. Cache monitor capabilities are shown under
the <cache> element.
For 'MBM', a monitor named 'memory bandwidth monitor' is introduced;
its role is to monitor memory bandwidth utilization. Its capability
information block is located under the <memory_bandwidth> element.
2.1 Query the host capability of CMT.
The 'monitor' element represents the host capabilities of CMT.
The attributes and features involved are explained below:
- 'maxMonitors': the maximum number of monitoring groups that can be
  created, which is limited by the number of hardware 'RMID's.
- 'reuseThreshold': an adjustable value that affects when the
  resources used by a monitor can be reused. After a monitor is
  removed, the kernel may not immediately release all hardware
  resources that the monitor used if the cache occupancy value
  associated with the 'removed' monitor is above this threshold. Once
  the cache occupancy falls below this threshold, the underlying
  hardware resource is reclaimed and put back into the resource pool
  for reuse.
- 'llc_occupancy': a feature of CMT, reporting the last level cache
  occupancy information.
- 'mbm_total_bytes': a feature of MBM, reporting total memory
  bandwidth utilization, in bytes, including local memory and
  remote memory on multi-node systems.
- 'mbm_local_bytes': a feature of MBM, reporting only local memory
  bandwidth utilization.
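The interplay of 'maxMonitors' and 'reuseThreshold' described above
can be sketched with a toy model. This is neither kernel nor libvirt
code: the class MonitorPool and its method names are hypothetical, and
treating an occupancy exactly equal to the threshold as reclaimable is
an assumption.

```python
class MonitorPool:
    """Toy model of monitor (RMID) reuse; for illustration only."""

    def __init__(self, max_monitors, reuse_threshold):
        self.reuse_threshold = reuse_threshold
        self.free = list(range(max_monitors))  # IDs ready to be handed out
        self.limbo = set()  # removed, but cache occupancy still too high

    def create(self):
        """Hand out a free ID, or fail when the pool is exhausted."""
        if not self.free:
            raise RuntimeError("no free monitor available")
        return self.free.pop(0)

    def remove(self, rmid, occupancy):
        """Remove a monitor; keep its ID in limbo while occupancy is high."""
        if occupancy > self.reuse_threshold:
            self.limbo.add(rmid)
        else:
            self.free.append(rmid)

    def recheck(self, occupancy_by_rmid):
        """Reclaim limbo IDs whose occupancy is no longer above the threshold."""
        for rmid in sorted(self.limbo):
            if rmid not in occupancy_by_rmid:
                continue  # no fresh reading, leave it in limbo
            if occupancy_by_rmid[rmid] <= self.reuse_threshold:
                self.limbo.discard(rmid)
                self.free.append(rmid)
```

In this model a removed monitor whose occupancy is still above the
threshold does not become available again until a later recheck sees
the occupancy drop, which is why fewer than 'maxMonitors' monitors may
be creatable at a given moment.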
# virsh capabilities
...
    <cache>
      <bank id='0' level='3' type='both' size='15'
            unit='MiB' cpus='0-5'>
        <control granularity='768' min='1536' unit='KiB'
                 type='both' maxAllocs='4'/>
      </bank>
      <bank id='1' level='3' type='both' size='15'
            unit='MiB' cpus='6-11'>
        <control granularity='768' min='1536' unit='KiB'
                 type='both' maxAllocs='4'/>
      </bank>
+     <monitor level='3' reuseThreshold='270336'
               maxMonitors='176'>
+       <feature name='llc_occupancy'/>
+     </monitor>
    </cache>
    <memory_bandwidth>
      <node id='0' cpus='0-5'>
        <control granularity='10' min='10'
                 maxAllocs='4'/>
      </node>
      <node id='1' cpus='6-11'>
        <control granularity='10' min='10'
                 maxAllocs='4'/>
      </node>
+     <monitor maxMonitors='176'>
+       <feature name='mbm_total_bytes'/>
+       <feature name='mbm_local_bytes'/>
+     </monitor>
    </memory_bandwidth>
...
  </host>
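For a management application, consuming the new <monitor> elements
could look roughly like the sketch below. It parses an abridged copy
of the output above with Python's standard xml.etree.ElementTree
module. The function name monitor_capabilities is hypothetical, and
the sample assumes the usual <capabilities> root element around
<host>.

```python
import xml.etree.ElementTree as ET

# Abridged sample modeled on the capabilities output in this letter.
SAMPLE_CAPS = """
<capabilities>
  <host>
    <cache>
      <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'/>
      <monitor level='3' reuseThreshold='270336' maxMonitors='176'>
        <feature name='llc_occupancy'/>
      </monitor>
    </cache>
    <memory_bandwidth>
      <node id='0' cpus='0-5'/>
      <monitor maxMonitors='176'>
        <feature name='mbm_total_bytes'/>
        <feature name='mbm_local_bytes'/>
      </monitor>
    </memory_bandwidth>
  </host>
</capabilities>
"""


def monitor_capabilities(caps_xml):
    """Collect <monitor> attributes and features per parent element."""
    root = ET.fromstring(caps_xml.strip())
    result = {}
    for parent in ("cache", "memory_bandwidth"):
        mon = root.find(f"./host/{parent}/monitor")
        if mon is None:
            continue  # this kind of monitor is not reported by the host
        entry = dict(mon.attrib)  # attribute values stay strings
        entry["features"] = [f.get("name") for f in mon.findall("feature")]
        result[parent] = entry
    return result


if __name__ == "__main__":
    for parent, entry in monitor_capabilities(SAMPLE_CAPS).items():
        print(parent, entry)
```

A host without CMT or MBM support would simply have no entry for the
corresponding parent element in the returned dictionary.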
Changes since v2:
- Addressed John Ferlan's review.
- Typo fixed.
- Removed VIR_ENUM_DECL(virMonitor);
Changes since v1:
- Introduced MBM capability.
- Capability layout changed
  * Moved <monitor> from cache <bank> to <cache>
  * Renamed <Threshold> to <reuseThreshold>
- Documentation for 'reuseThreshold' changed.
- Introduced API virResctrlInfoGetMonitorPrefix
- Added more tests, covering standalone CMT and a fake new
  feature.
- Creating the CMT resource control group is left for
  subsequent work.
Wang Huaqiang (4):
util: Introduce monitor capability interface
conf: Refactor cache bank capability structure
conf: Refactor memory bandwidth capability structure
conf: Introduce RDT monitor host capability
docs/schemas/capability.rng | 37 +++-
src/conf/capabilities.c | 122 ++++++++---
src/conf/capabilities.h | 24 ++-
src/libvirt_private.syms | 2 +
src/util/virresctrl.c | 236 +++++++++++++++++++++
src/util/virresctrl.h | 40 ++++
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../resctrl/info/L3_MON/mon_features | 1 +
.../resctrl/info/L3_MON/num_rmids | 1 +
.../linux-resctrl-cmt/resctrl/manualres/cpus | 1 +
.../linux-resctrl-cmt/resctrl/manualres/schemata | 1 +
.../linux-resctrl-cmt/resctrl/manualres/tasks | 0
.../linux-resctrl-cmt/resctrl/schemata | 1 +
tests/vircaps2xmldata/linux-resctrl-cmt/system | 1 +
.../resctrl/info/L3/cbm_mask | 1 +
.../resctrl/info/L3/min_cbm_bits | 1 +
.../resctrl/info/L3/num_closids | 1 +
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../resctrl/info/L3_MON/mon_features | 10 +
.../resctrl/info/L3_MON/num_rmids | 1 +
.../resctrl/info/MB/bandwidth_gran | 1 +
.../resctrl/info/MB/min_bandwidth | 1 +
.../resctrl/info/MB/num_closids | 1 +
.../resctrl/manualres/cpus | 1 +
.../resctrl/manualres/schemata | 1 +
.../resctrl/manualres/tasks | 0
.../linux-resctrl-fake-feature/resctrl/schemata | 1 +
.../linux-resctrl-fake-feature/system | 1 +
.../resctrl/info/L3_MON/max_threshold_occupancy | 1 +
.../linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +
.../linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 +
.../vircaps2xmldata/vircaps-x86_64-resctrl-cmt.xml | 53 +++++
.../vircaps-x86_64-resctrl-fake-feature.xml | 73 +++++++
tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 7 +
tests/vircaps2xmltest.c | 2 +
35 files changed, 594 insertions(+), 36 deletions(-)
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/cpus
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/schemata
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/manualres/tasks
create mode 100644 tests/vircaps2xmldata/linux-resctrl-cmt/resctrl/schemata
create mode 120000 tests/vircaps2xmldata/linux-resctrl-cmt/system
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/cbm_mask
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/min_cbm_bits
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3/num_closids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/bandwidth_gran
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/min_bandwidth
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/info/MB/num_closids
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/cpus
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/schemata
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/manualres/tasks
create mode 100644 tests/vircaps2xmldata/linux-resctrl-fake-feature/resctrl/schemata
create mode 120000 tests/vircaps2xmldata/linux-resctrl-fake-feature/system
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
create mode 100644 tests/vircaps2xmldata/vircaps-x86_64-resctrl-cmt.xml
create mode 100644 tests/vircaps2xmldata/vircaps-x86_64-resctrl-fake-feature.xml
--
2.7.4