From: Bing Niu <bing.niu(a)intel.com>
This series introduces RDT memory bandwidth allocation support by extending the
current virresctrl implementation.
The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
control over memory bandwidth available per-core. This feature provides a method to
control applications which may be over-utilizing bandwidth relative to their priority
in environments such as the data-center. The details can be found in Intel's SDM
17.19.7.
The kernel supports MBA through the resctrl file system, the same as CAT. Each resctrl group has an
MB parameter that controls how much memory bandwidth it can use, expressed as a percentage.
In this series, MBA is enabled by enhancing the existing virresctrl implementation. The
policy employed for MBA is similar to that of CAT: the sum of all MBA groups' bandwidth
does not exceed 100%.
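
To make the 100% policy concrete, here is a minimal Python sketch (not the C code from this series). It assumes each existing group is described by a mapping from node id to allocated percentage; the function names `free_bandwidth` and `fits` are hypothetical and only illustrate the arithmetic.

```python
def free_bandwidth(groups, node_id):
    """Percentage of a node's bandwidth not yet claimed by any group.

    `groups` is a list of {node_id: percent} dicts, one per resctrl group
    (a hypothetical in-memory model, not a libvirt structure).
    """
    used = sum(group.get(node_id, 0) for group in groups)
    return 100 - used


def fits(groups, request):
    """True if every node in `request` still has enough free bandwidth."""
    return all(
        percent <= free_bandwidth(groups, node_id)
        for node_id, percent in request.items()
    )


if __name__ == "__main__":
    existing = [{0: 20}, {0: 50}]
    print(free_bandwidth(existing, 0))
    print(fits(existing, {0: 30}), fits(existing, {0: 40}))
```

Under this model, a new request is accepted only while the per-node total stays at or below 100.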
The enhancement of virresctrl includes two main parts:
Part 1: Add two new structures, virResctrlInfoMemMB and virResctrlAllocMemBW, for collecting
        the host system MBA capability and the domain memory bandwidth allocation. These two
        structures extend the existing virResctrlInfo and virResctrlAlloc. With them, the
        virresctrl framework can support MBA and CAT concurrently. Each virResctrlAlloc
        represents a resource allocation including CAT, MBA, or both. The policy of MBA is
        that the total memory bandwidth of all resctrl groups created by virresctrl does not
        exceed 100%.
Part 2: On the XML side, add new elements to the host capabilities query and the domain
        allocation to support memory bandwidth allocation.
---------------------------------------------------------------------------------------------
For the host capabilities XML, the new format looks like the example below:
  <host>
    .....
    <memory_bandwidth>
      <node id='0' cpus='0-19'>
        <control granularity='10' min='10' maxAllocs='8'/>
      </node>
    </memory_bandwidth>
  </host>
granularity --- memory bandwidth granularity
min --- minimum memory bandwidth allowed
maxAllocs --- maximum concurrent memory bandwidth allocation allowed.
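
As an illustration of how a client might consume these attributes, the following Python sketch parses a trimmed copy of the example above and checks a requested percentage against it. The rules in `check_request` (at least `min`, a multiple of `granularity`, at most 100) are an assumption about how the values are meant to be used, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the host capabilities example from this cover letter.
CAPS_XML = """
<host>
  <memory_bandwidth>
    <node id='0' cpus='0-19'>
      <control granularity='10' min='10' maxAllocs='8'/>
    </node>
  </memory_bandwidth>
</host>
"""


def parse_mba_caps(xml_text):
    """Return {node_id: {...}} built from the <memory_bandwidth> element."""
    root = ET.fromstring(xml_text)
    caps = {}
    for node in root.findall("./memory_bandwidth/node"):
        control = node.find("control")
        caps[int(node.get("id"))] = {
            "cpus": node.get("cpus"),
            "granularity": int(control.get("granularity")),
            "min": int(control.get("min")),
            "max_allocs": int(control.get("maxAllocs")),
        }
    return caps


def check_request(caps, node_id, bandwidth):
    """Return a list of problems with a request; an empty list means OK.

    The three rules below are assumptions made for this sketch.
    """
    if node_id not in caps:
        return [f"node {node_id} has no memory bandwidth control"]
    info = caps[node_id]
    problems = []
    if bandwidth < info["min"]:
        problems.append(f"{bandwidth} is below the minimum {info['min']}")
    if bandwidth % info["granularity"] != 0:
        problems.append(f"{bandwidth} is not a multiple of {info['granularity']}")
    if bandwidth > 100:
        problems.append(f"{bandwidth} exceeds 100")
    return problems


if __name__ == "__main__":
    caps = parse_mba_caps(CAPS_XML)
    print(check_request(caps, 0, 20))
    print(check_request(caps, 0, 25))
```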
---------------------------------------------------------------------------------------------
For the domain XML, the new format looks like the example below:
  <domain type='kvm' id='2'>
    ......
    <cputune>
      ......
      <shares>1024</shares>
      <memorytune vcpus='0-1'>
        <node id='0' bandwidth='20'/>
      </memorytune>
    </cputune>
    ......
  </domain>
id --- node where memory bandwidth allocation will happen
bandwidth --- bandwidth allocated in percentage
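
For readers who want to experiment with the format, this Python sketch reads the `memorytune` elements from a trimmed copy of the example above. It is not the parser added by this series: `expand_cpu_list` handles only comma-separated numbers and ranges (a simplification), and all helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain XML example from this cover letter.
DOMAIN_XML = """
<domain type='kvm' id='2'>
  <cputune>
    <shares>1024</shares>
    <memorytune vcpus='0-1'>
      <node id='0' bandwidth='20'/>
    </memorytune>
  </cputune>
</domain>
"""


def expand_cpu_list(spec):
    """Expand a list such as '0-1,4' into [0, 1, 4] (simplified syntax)."""
    cpus = set()
    for part in spec.split(","):
        low, dash, high = part.strip().partition("-")
        if dash:
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(low))
    return sorted(cpus)


def parse_memorytune(xml_text):
    """Return a list of (vcpus, {node_id: bandwidth}) pairs."""
    root = ET.fromstring(xml_text)
    tunes = []
    for tune in root.findall("./cputune/memorytune"):
        vcpus = expand_cpu_list(tune.get("vcpus"))
        nodes = {
            int(node.get("id")): int(node.get("bandwidth"))
            for node in tune.findall("node")
        }
        tunes.append((vcpus, nodes))
    return tunes


def overlapping_vcpus(tunes):
    """Return the sorted vcpus that appear in more than one memorytune."""
    seen, clashes = set(), set()
    for vcpus, _nodes in tunes:
        clashes.update(seen.intersection(vcpus))
        seen.update(vcpus)
    return sorted(clashes)


if __name__ == "__main__":
    print(parse_memorytune(DOMAIN_XML))
```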
----------------------------------------------------------------------------------------------
With this extension of virresctrl, the overall workflow of CAT and MBA is described by the
picture below. The XML parser aggregates the MBA and CAT configuration and represents it in one
virresctrl object. The methods of the virresctrl class manipulate the resctrl interface to allocate
the corresponding resources.
                 <memorytune vcpus='0-3'>
           +---------+
           |     <cachetune vcpus='0-3'>
    XML    |         +
   parser  +-----------+
                       |
                       |
        +------------------------------+
                       |
                       |
 internal object +------v--------------+
 virResctrlAlloc |   backing object    |
                 +------+--------------+
                        |
                        |
        +------------------------------+
                        |
                     +--v-------+
                     |          |
                     | schemata |
  /sys/fs/resctrl    | tasks    |
                     |   .      |
                     |   .      |
                     |          |
                     +----------+
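
The last step of the picture, writing the `schemata` and `tasks` files of a resctrl group, can be sketched in Python as follows. This is only an illustration: a temporary directory stands in for /sys/fs/resctrl, the group name and PIDs are made up, and `format_mb_schemata` and `create_group` are hypothetical helpers rather than virresctrl functions. The `MB:<id>=<percent>;...` line follows the kernel's resctrl schemata syntax.

```python
import os
import tempfile


def format_mb_schemata(bandwidths):
    """Render {node_id: percent} as one resctrl 'MB:' schemata line."""
    items = ";".join(
        f"{node_id}={percent}" for node_id, percent in sorted(bandwidths.items())
    )
    return f"MB:{items}"


def create_group(resctrl_root, name, bandwidths, pids):
    """Create a group directory, then write its schemata and tasks files."""
    group = os.path.join(resctrl_root, name)
    os.mkdir(group)
    with open(os.path.join(group, "schemata"), "w") as schemata:
        schemata.write(format_mb_schemata(bandwidths) + "\n")
    # One PID per write, each on its own line.
    for pid in pids:
        with open(os.path.join(group, "tasks"), "a") as tasks:
            tasks.write(f"{pid}\n")
    return group


if __name__ == "__main__":
    # A temporary directory stands in for /sys/fs/resctrl in this sketch.
    with tempfile.TemporaryDirectory() as root:
        group = create_group(root, "demo-group", {0: 20}, [101, 102])
        with open(os.path.join(group, "schemata")) as schemata:
            print(schemata.read(), end="")
```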
---------------------------------------------------------------------
Previous versions and discussion can be found at:
v1:
https://www.redhat.com/archives/libvir-list/2018-July/msg01144.html
RFC v2:
https://www.redhat.com/archives/libvir-list/2018-June/msg01268.html
RFC v1:
https://www.redhat.com/archives/libvir-list/2018-May/msg02101.html
Changelog:
v1 -> this:
  John's comments:
    1. Split the calculation of the number of memory bandwidth controls into
       one patch.
    2. Split the virResctrlAllocMemBW-related methods into 5 patches, each
       providing one kind of function, e.g. schemata processing, memory
       bandwidth calculation.
    3. Use resctrl to replace cachetune in domain conf.
    4. Split the refactoring of virDomainCachetuneDefParse into 3 patches, and
       adjust some logic, e.g. use the %s format for error logs, rename
       functions.
    5. Complete the doc description, e.g. update the cachetune part about
       vcpus overlapping with memorytune, update the libvirt version info for
       memory bandwidth control availability.
    6. Some coding style fixes.
RFC v2 -> v1:
  John's comments:
    1. Use the name MemBW instead of MB for a clearer description.
    2. Split out the rename patch and put the function refactoring separately.
    3. Split virResctrlInfoMemMB and virResctrlAllocMemBW into different
       patches.
    4. Add docs/schemas/*.rng for XML-related patches.
    5. Some cleanup for coding conventions.
RFC v1 -> RFC v2:
  Jano's comments:
    1. Put the renaming parts into separate patches.
    2. Set the initial return value to -1.
    3. Use full names in structure definitions.
    4. Do not use VIR_CACHE_TYPE_LAST for memory bandwidth allocation
       formatting.
  Pavel's comments:
    1. Add host capabilities XML for memory bandwidth allocation.
    2. Do not reuse the cachetune section for memory bandwidth allocation in
       the domain XML; define a dedicated one for memory bandwidth allocation.
Bing Niu (17):
util: Rename some functions of virresctrl
util: Refactor virResctrlGetInfo in virresctrl
util: Refactor virResctrlAllocFormat of virresctrl
util: Add MBA capability information query to resctrl
util: Add MBA check to virResctrlInfoGetCache
util: Add MBA allocation to virresctrl
util: Add MBA schemata parse and format methods
util: Add support to calculate MBA utilization
util: Introduce virResctrlAllocForeachMemory
util: Introduce virResctrlAllocSetMemoryBandwidth
conf: Rename cachetune to resctrl
conf: Factor out vcpus parsing part from virDomainCachetuneDefParse
conf: Factor out vcpus overlapping from virDomainCachetuneDefParse
conf: Factor out virDomainResctrlDef update from virDomainCachetuneDefParse
conf: Add support for memorytune XML processing for resctrl MBA
conf: Add return value check to virResctrlAllocForeachCache
conf: Add memory bandwidth allocation capability of host
docs/formatdomain.html.in | 39 +-
docs/schemas/capability.rng | 33 ++
docs/schemas/domaincommon.rng | 17 +
src/conf/capabilities.c | 107 ++++
src/conf/capabilities.h | 11 +
src/conf/domain_conf.c | 428 ++++++++++++---
src/conf/domain_conf.h | 10 +-
src/libvirt_private.syms | 6 +-
src/qemu/qemu_domain.c | 2 +-
src/qemu/qemu_process.c | 18 +-
src/util/virresctrl.c | 611 +++++++++++++++++++--
src/util/virresctrl.h | 55 +-
.../memorytune-colliding-allocs.xml | 30 +
.../memorytune-colliding-cachetune.xml | 32 ++
tests/genericxml2xmlindata/memorytune.xml | 33 ++
tests/genericxml2xmltest.c | 5 +
.../linux-resctrl/resctrl/info/MB/bandwidth_gran | 1 +
.../linux-resctrl/resctrl/info/MB/min_bandwidth | 1 +
.../linux-resctrl/resctrl/info/MB/num_closids | 1 +
tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 8 +
tests/virresctrldata/resctrl.schemata | 1 +
21 files changed, 1280 insertions(+), 169 deletions(-)
create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-allocs.xml
create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml
create mode 100644 tests/genericxml2xmlindata/memorytune.xml
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth
create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids
Reviewed-by: John Ferlan <jferlan(a)redhat.com>
(series)
I'll push once the tree is open for 4.7.0 commits unless someone else
chimes in with other major issues that need to be addressed.
John