[libvirt] [PATCH 0/9] Introduce RDT memory bandwidth allocation support

From: Bing Niu <bing.niu@intel.com> This series is to introduce RDT memory bandwidth allocation support by extending current virresctrl implementation. The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate control over memory bandwidth available per-core. This feature provides a method to control applications which may be over-utilizing bandwidth relative to their priority in environments such as the data-center. The details can be found in Intel's SDM 17.19.7. Kernel supports MBA through resctrl file system same as CAT. Each resctrl group have a MB parameter to control how much memory bandwidth it can utilize in unit of percentage. In this series, MBA is enabled by enhancing existing virresctrl implementation. The policy employed for MBA is similar with CAT: The sum of each MBA group's bandwidth dose not exceed 100%. The enhancement of virresctrl include two main parts: Part 1: Add two new structures virResctrlInfoMemMB and virResctrlAllocMemBW for collecting host system MBA capability and domain memory bandwidth allocation. Those two structures are the extension of existing virResctrlInfo and virResctrlAlloc. With them, virresctrl framework can support MBA and CAT concurrently. Each virResctrlAlloc represent a resource allocation including CAT, or MBA, or CAT&MBA. The policy of MBA is that: total memory bandwidth of each resctrl group, created by virresctrl, does not exceed to 100%. Part 2: On XML part, add new elements to host capabilities query and domain allocation to support memory bandwidth allocation. --------------------------------------------------------------------------------------------- For host capabilities XML, new XML format like below example, <host> ..... <memory_bandwidth> <node id='0' cpus='0-19'> <control granularity='10' min ='10' maxAllocs='8'/> </node> </memory_bandwidth> </host> granularity --- memory bandwidth granularity min --- minimum memory bandwidth allowed maxAllocs --- maximum concurrent memory bandwidth allocation allowed. --------------------------------------------------------------------------------------------- For domain XML, new format as below example <domain type='kvm' id='2'> ...... <cputune> ...... <shares>1024</shares> <memorytune vcpus='0-1'> <node id='0' bandwidth='20'/> </memorytune> </cputune> ...... </domain> id --- node where memory bandwidth allocation will happen bandwidth --- bandwidth allocated in percentage ---------------------------------------------------------------------------------------------- With this extension of the virresctrl, the overall working follow of CAT and MBA is described by below picture. XML parser will aggregate MBA and CAT configuration and represents it in one virresctrl object. The methods of virresctrl class will manipulate resctrl interface to allocate corresponding resources. <memorytune cpus='0-3'> +---------+ | <cachetune vcpus='0-3'> XML | + parser +-----------+ | | +------------------------------+ | | internal object +------v--------------+ virResctrlAlloc | backing object | +------+--------------+ | | +------------------------------+ | +--v-------+ | | | schemata | /sys/fs/resctrl | tasks | | . | | . | | | +----------+ --------------------------------------------------------------------- previous RFC versions and discussion can be found at v2: https://www.redhat.com/archives/libvir-list/2018-June/msg01268.html v1: https://www.redhat.com/archives/libvir-list/2018-May/msg02101.html Changelog: RFC_v2->this: John's comment: 1. use name MemBW to replace MB for a more clear description. 2. split rename patch and put refactor function part separately. 3. split virResctrlInfoMemMB and virResctrlAllocMemBW to different patches. 4. add docs/schemas/*.rng for XML related patches. 5. some cleanup for coding conventions. RFC_ v1->RFC_v2: Jano's comment: 1. put renaming parts into separated patches. 2. set the initial return value as -1. 3. using full name in structure definition. 4. do not use VIR_CACHE_TYPE_LAST for memory bandwidth allocation formatting. Pavel's comment: 1. add host capabilities XML for memory bandwidth allocation. 2. do not mix use cachetune section in XML for memory bandwidth allocation in domain XML. define a dedicated one for memory bandwidth allocation. Bing Niu (9): util: Rename some functions of virresctrl util: Refactor virResctrlGetInfo in virresctrl util: Refactor virResctrlAllocFormat of virresctrl util: Add MBA capability information query to resctrl util: Add MBA allocation to virresctrl conf: Rename cachetune to restune conf: Refactor virDomainCachetuneDefParse conf: Introduce cputune/memorytune to support memory bandwidth allocation conf: Add memory bandwidth allocation capability of host docs/formatdomain.html.in | 35 ++ docs/schemas/capability.rng | 33 ++ docs/schemas/domaincommon.rng | 17 + src/conf/capabilities.c | 108 ++++ src/conf/capabilities.h | 11 + src/conf/domain_conf.c | 408 ++++++++++++--- src/conf/domain_conf.h | 10 +- src/libvirt_private.syms | 4 +- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_process.c | 18 +- src/util/virresctrl.c | 574 +++++++++++++++++++-- src/util/virresctrl.h | 54 +- .../memorytune-colliding-allocs.xml | 30 ++ .../memorytune-colliding-cachetune.xml | 32 ++ tests/genericxml2xmlindata/memorytune.xml | 33 ++ tests/genericxml2xmltest.c | 5 + .../linux-resctrl/resctrl/info/MB/bandwidth_gran | 1 + .../linux-resctrl/resctrl/info/MB/min_bandwidth | 1 + .../linux-resctrl/resctrl/info/MB/num_closids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 8 + tests/virresctrldata/resctrl.schemata | 1 + 21 files changed, 1233 insertions(+), 153 deletions(-) create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-allocs.xml create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml create mode 100644 tests/genericxml2xmlindata/memorytune.xml create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids -- 2.7.4

From: Bing Niu <bing.niu@intel.com> Some functions in virresctrl are for CAT only, while some of other functions are for resource allocation, not just CAT. So change their names to reflect the reality. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 8 ++++---- src/libvirt_private.syms | 4 ++-- src/util/virresctrl.c | 30 +++++++++++++++--------------- src/util/virresctrl.h | 26 +++++++++++++------------- 4 files changed, 34 insertions(+), 34 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 7396616..f259b4c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18946,7 +18946,7 @@ virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, ULLONG_MAX, true) < 0) goto cleanup; - if (virResctrlAllocSetSize(alloc, level, type, cache, size) < 0) + if (virResctrlAllocSetCacheSize(alloc, level, type, cache, size) < 0) goto cleanup; ret = 0; @@ -26921,9 +26921,9 @@ virDomainCachetuneDefFormat(virBufferPtr buf, int ret = -1; virBufferSetChildIndent(&childrenBuf, buf); - virResctrlAllocForeachSize(cachetune->alloc, - virDomainCachetuneDefFormatHelper, - &childrenBuf); + virResctrlAllocForeachCache(cachetune->alloc, + virDomainCachetuneDefFormatHelper, + &childrenBuf); if (virBufferCheckError(&childrenBuf) < 0) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1caecb9..47e1b18 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2646,14 +2646,14 @@ virCacheTypeToString; virResctrlAllocAddPID; virResctrlAllocCreate; virResctrlAllocDeterminePath; -virResctrlAllocForeachSize; +virResctrlAllocForeachCache; virResctrlAllocFormat; virResctrlAllocGetID; virResctrlAllocGetUnused; virResctrlAllocNew; virResctrlAllocRemove; +virResctrlAllocSetCacheSize; virResctrlAllocSetID; -virResctrlAllocSetSize; virResctrlInfoGetCache; virResctrlInfoNew; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index e492a63..6d69c8d 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -754,11 +754,11 @@ virResctrlAllocCheckCollision(virResctrlAllocPtr alloc, int -virResctrlAllocSetSize(virResctrlAllocPtr alloc, - unsigned int level, - virCacheType type, - unsigned int cache, - unsigned long long size) +virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, + unsigned int level, + virCacheType type, + unsigned int cache, + unsigned long long size) { if (virResctrlAllocCheckCollision(alloc, level, type, cache)) { virReportError(VIR_ERR_XML_ERROR, @@ -773,9 +773,9 @@ virResctrlAllocSetSize(virResctrlAllocPtr alloc, int -virResctrlAllocForeachSize(virResctrlAllocPtr alloc, - virResctrlAllocForeachSizeCallback cb, - void *opaque) +virResctrlAllocForeachCache(virResctrlAllocPtr alloc, + virResctrlAllocForeachCacheCallback cb, + void *opaque) { int ret = 0; unsigned int level = 0; @@ -939,9 +939,9 @@ virResctrlAllocParseProcessCache(virResctrlInfoPtr resctrl, static int -virResctrlAllocParseProcessLine(virResctrlInfoPtr resctrl, - virResctrlAllocPtr alloc, - char *line) +virResctrlAllocParseCacheLine(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *line) { char **caches = NULL; char *tmp = NULL; @@ -1009,7 +1009,7 @@ virResctrlAllocParse(virResctrlInfoPtr resctrl, lines = virStringSplitCount(schemata, "\n", 0, &nlines); for (i = 0; i < nlines; i++) { - if (virResctrlAllocParseProcessLine(resctrl, alloc, lines[i]) < 0) + if (virResctrlAllocParseCacheLine(resctrl, alloc, lines[i]) < 0) goto cleanup; } @@ -1401,8 +1401,8 @@ virResctrlAllocCopyMasks(virResctrlAllocPtr dst, * transforming `sizes` into `masks`. */ static int -virResctrlAllocMasksAssign(virResctrlInfoPtr resctrl, - virResctrlAllocPtr alloc) +virResctrlAllocAssign(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc) { int ret = -1; unsigned int level = 0; @@ -1526,7 +1526,7 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (lockfd < 0) goto cleanup; - if (virResctrlAllocMasksAssign(resctrl, alloc) < 0) + if (virResctrlAllocAssign(resctrl, alloc) < 0) goto cleanup; alloc_str = virResctrlAllocFormat(alloc); diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 9052a2b..d657c06 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -67,11 +67,11 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, typedef struct _virResctrlAlloc virResctrlAlloc; typedef virResctrlAlloc *virResctrlAllocPtr; -typedef int virResctrlAllocForeachSizeCallback(unsigned int level, - virCacheType type, - unsigned int cache, - unsigned long long size, - void *opaque); +typedef int virResctrlAllocForeachCacheCallback(unsigned int level, + virCacheType type, + unsigned int cache, + unsigned long long size, + void *opaque); virResctrlAllocPtr virResctrlAllocNew(void); @@ -80,16 +80,16 @@ bool virResctrlAllocIsEmpty(virResctrlAllocPtr alloc); int -virResctrlAllocSetSize(virResctrlAllocPtr alloc, - unsigned int level, - virCacheType type, - unsigned int cache, - unsigned long long size); +virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, + unsigned int level, + virCacheType type, + unsigned int cache, + unsigned long long size); int -virResctrlAllocForeachSize(virResctrlAllocPtr alloc, - virResctrlAllocForeachSizeCallback cb, - void *opaque); +virResctrlAllocForeachCache(virResctrlAllocPtr alloc, + virResctrlAllocForeachCacheCallback cb, + void *opaque); int virResctrlAllocSetID(virResctrlAllocPtr alloc, -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Some functions in virresctrl are for CAT only, while some of other functions are for resource allocation, not just CAT. So change their names to reflect the reality.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 8 ++++---- src/libvirt_private.syms | 4 ++-- src/util/virresctrl.c | 30 +++++++++++++++--------------- src/util/virresctrl.h | 26 +++++++++++++------------- 4 files changed, 34 insertions(+), 34 deletions(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John

From: Bing Niu <bing.niu@intel.com> Separate resctrl common information parts from CAT specific parts, so that common information parts can be reused among different resource allocation technologies. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 6d69c8d..98e7296 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -318,9 +318,8 @@ virResctrlUnlock(int fd) /* virResctrlInfo-related definitions */ static int -virResctrlGetInfo(virResctrlInfoPtr resctrl) +virResctrlGetCacheInfo(virResctrlInfoPtr resctrl, DIR *dirp) { - DIR *dirp = NULL; char *endptr = NULL; char *tmp_str = NULL; int ret = -1; @@ -332,12 +331,6 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; - rv = virDirOpenIfExists(&dirp, SYSFS_RESCTRL_PATH "/info"); - if (rv <= 0) { - ret = rv; - goto cleanup; - } - while ((rv = virDirRead(dirp, &ent, SYSFS_RESCTRL_PATH "/info")) > 0) { VIR_DEBUG("Parsing info type '%s'", ent->d_name); if (ent->d_name[0] != 'L') @@ -443,12 +436,32 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) ret = 0; cleanup: - VIR_DIR_CLOSE(dirp); VIR_FREE(i_type); return ret; } +static int +virResctrlGetInfo(virResctrlInfoPtr resctrl) +{ + DIR *dirp = NULL; + int ret = -1; + + ret = virDirOpenIfExists(&dirp, SYSFS_RESCTRL_PATH "/info"); + if (ret <= 0) + goto cleanup; + + ret = virResctrlGetCacheInfo(resctrl, dirp); + if (ret < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_DIR_CLOSE(dirp); + return ret; +} + + virResctrlInfoPtr virResctrlInfoNew(void) { -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Separate resctrl common information parts from CAT specific parts, so that common information parts can be reused among different resource allocation technologies.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-)
I'm assuming you are coordinating with the [RFC PATCHv2 00/10] x86 RDT Cache Monitoring Technology (CMT) that's on list as well. I don't have the cycles to do any compare/contrast between the two...
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 6d69c8d..98e7296 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -318,9 +318,8 @@ virResctrlUnlock(int fd)
/* virResctrlInfo-related definitions */ static int -virResctrlGetInfo(virResctrlInfoPtr resctrl) +virResctrlGetCacheInfo(virResctrlInfoPtr resctrl, DIR *dirp)
One argument per line... I can fix before pushing though. Reviewed-by: John Ferlan <jferlan@redhat.com> John

On 2018年07月25日 06:02, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Separate resctrl common information parts from CAT specific parts, so that common information parts can be reused among different resource allocation technologies.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-)
I'm assuming you are coordinating with the [RFC PATCHv2 00/10] x86 RDT Cache Monitoring Technology (CMT) that's on list as well. I don't have the cycles to do any compare/contrast between the two...
Yup~ Huaqiang in cc list is working on CMT part. We already sync off line. Since monitor is more complicated and a brand new implementing to replace old perf based monitor functions. Huaqiang will absorb Martin's comment and update his RFC patch. :)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 6d69c8d..98e7296 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -318,9 +318,8 @@ virResctrlUnlock(int fd)
/* virResctrlInfo-related definitions */ static int -virResctrlGetInfo(virResctrlInfoPtr resctrl) +virResctrlGetCacheInfo(virResctrlInfoPtr resctrl, DIR *dirp)
One argument per line... I can fix before pushing though.
My bad! I should remember one argument per line for function, instead of just splitting line only beyond 80 characters. :(
Reviewed-by: John Ferlan <jferlan@redhat.com>
John

From: Bing Niu <bing.niu@intel.com> Refactor virResctrlAllocFormat so that it is easy to support other resource allocation technologies. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 98e7296..1515de2 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -848,17 +848,13 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) } -char * -virResctrlAllocFormat(virResctrlAllocPtr alloc) +static int +virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf) { - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned int level = 0; unsigned int type = 0; unsigned int cache = 0; - if (!alloc) - return NULL; - for (level = 0; level < alloc->nlevels; level++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[level]; @@ -871,7 +867,7 @@ virResctrlAllocFormat(virResctrlAllocPtr alloc) if (!a_type) continue; - virBufferAsprintf(&buf, "L%u%s:", level, virResctrlTypeToString(type)); + virBufferAsprintf(buf, "L%u%s:", level, virResctrlTypeToString(type)); for (cache = 0; cache < a_type->nmasks; cache++) { virBitmapPtr mask = a_type->masks[cache]; @@ -881,21 +877,38 @@ virResctrlAllocFormat(virResctrlAllocPtr alloc) continue; mask_str = virBitmapToString(mask, false, true); - if (!mask_str) { - virBufferFreeAndReset(&buf); - return NULL; - } + if (!mask_str) + return -1; - virBufferAsprintf(&buf, "%u=%s;", cache, mask_str); + virBufferAsprintf(buf, "%u=%s;", cache, mask_str); VIR_FREE(mask_str); } - virBufferTrim(&buf, ";", 1); - virBufferAddChar(&buf, '\n'); + virBufferTrim(buf, ";", 1); + virBufferAddChar(buf, '\n'); } } - virBufferCheckError(&buf); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0; +} + + +char * +virResctrlAllocFormat(virResctrlAllocPtr alloc) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + if (!alloc) + return NULL; + + if (virResctrlAllocFormatCache(alloc, &buf) < 0) { + virBufferFreeAndReset(&buf); + return NULL; + } + return virBufferContentAndReset(&buf); } -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Refactor virResctrlAllocFormat so that it is easy to support other resource allocation technologies.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 98e7296..1515de2 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -848,17 +848,13 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) }
-char * -virResctrlAllocFormat(virResctrlAllocPtr alloc) +static int +virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf)
One argument per line. I'll adjust.
{ - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned int level = 0; unsigned int type = 0; unsigned int cache = 0;
[...]
- virBufferCheckError(&buf); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0;
Just return virBufferCheckError(buf); directly I'll adjust before pushing. Reviewed-by: John Ferlan <jferlan@redhat.com> John

On 2018年07月25日 06:03, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Refactor virResctrlAllocFormat so that it is easy to support other resource allocation technologies.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 98e7296..1515de2 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -848,17 +848,13 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) }
-char * -virResctrlAllocFormat(virResctrlAllocPtr alloc) +static int +virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf)
One argument per line. I'll adjust.
{ - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned int level = 0; unsigned int type = 0; unsigned int cache = 0;
[...]
- virBufferCheckError(&buf); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0;
Just return virBufferCheckError(buf); directly
I'll adjust before pushing.
Thanks for that. ;)
Reviewed-by: John Ferlan <jferlan@redhat.com>
John

From: Bing Niu <bing.niu@intel.com> Introducing virResctrlInfoMemBW for the information memory bandwidth allocation information. Those information is used for memory bandwidth allocating. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 1515de2..06e2702 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -80,6 +80,9 @@ typedef virResctrlInfoPerType *virResctrlInfoPerTypePtr; typedef struct _virResctrlInfoPerLevel virResctrlInfoPerLevel; typedef virResctrlInfoPerLevel *virResctrlInfoPerLevelPtr; +typedef struct _virResctrlInfoMemBW virResctrlInfoMemBW; +typedef virResctrlInfoMemBW *virResctrlInfoMemBWPtr; + typedef struct _virResctrlAllocPerType virResctrlAllocPerType; typedef virResctrlAllocPerType *virResctrlAllocPerTypePtr; @@ -116,11 +119,31 @@ struct _virResctrlInfoPerLevel { virResctrlInfoPerTypePtr *types; }; +/* Information about memory bandwidth allocation */ +struct _virResctrlInfoMemBW { + /* minimum memory bandwidth allowed */ + unsigned int min_bandwidth; + /* bandwidth granularity */ + unsigned int bandwidth_granularity; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation; + /* level number of last level cache */ + unsigned int last_level_cache; + /* max id of last level cache, this is used to track + * how many last level cache available in host system, + * the number of memory bandwidth allocation controller + * is identical with last level cache. + */ + unsigned int max_id; +}; + struct _virResctrlInfo { virObject parent; virResctrlInfoPerLevelPtr *levels; size_t nlevels; + + virResctrlInfoMemBWPtr membw_info; }; @@ -146,6 +169,7 @@ virResctrlInfoDispose(void *obj) VIR_FREE(level); } + VIR_FREE(resctrl->membw_info); VIR_FREE(resctrl->levels); } @@ -442,6 +466,65 @@ virResctrlGetCacheInfo(virResctrlInfoPtr resctrl, DIR *dirp) static int +virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl) +{ + int ret = -1; + int rv = -1; + virResctrlInfoMemBWPtr i_membw = NULL; + + /* query memory bandwidth allocation info */ + if (VIR_ALLOC(i_membw) < 0) + goto cleanup; + rv = virFileReadValueUint(&i_membw->bandwidth_granularity, + SYSFS_RESCTRL_PATH "/info/MB/bandwidth_gran"); + if (rv == -2) { + /* The file doesn't exist, so it's unusable for us, + * properly mba unsupported */ + VIR_INFO("The path '" SYSFS_RESCTRL_PATH "/info/MB/bandwidth_gran'" + "does not exist"); + ret = 0; + goto cleanup; + } else if (rv < 0) { + /* Other failures are fatal, so just quit */ + goto cleanup; + } + + rv = virFileReadValueUint(&i_membw->min_bandwidth, + SYSFS_RESCTRL_PATH "/info/MB/min_bandwidth"); + if (rv == -2) { + /* If the previous file exists, so should this one. Hence -2 is + * fatal in this case (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */ + + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get min bandwidth from resctrl memory info")); + } + if (rv < 0) + goto cleanup; + + rv = virFileReadValueUint(&i_membw->max_allocation, + SYSFS_RESCTRL_PATH "/info/MB/num_closids"); + if (rv == -2) { + /* If the previous file exists, so should this one. Hence -2 is + * fatal in this case as well (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get max allocation from resctrl memory info")); + } + if (rv < 0) + goto cleanup; + + VIR_STEAL_PTR(resctrl->membw_info, i_membw); + ret = 0; + cleanup: + VIR_FREE(i_membw); + return ret; +} + + +static int virResctrlGetInfo(virResctrlInfoPtr resctrl) { DIR *dirp = NULL; @@ -451,6 +534,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) if (ret <= 0) goto cleanup; + ret = virResctrlGetMemoryBandwidthInfo(resctrl); + if (ret < 0) + goto cleanup; + ret = virResctrlGetCacheInfo(resctrl, dirp); if (ret < 0) goto cleanup; @@ -492,6 +579,9 @@ virResctrlInfoIsEmpty(virResctrlInfoPtr resctrl) if (!resctrl) return true; + if (resctrl->membw_info) + return false; + for (i = 0; i < resctrl->nlevels; i++) { virResctrlInfoPerLevelPtr i_level = resctrl->levels[i]; @@ -517,12 +607,26 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, { virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; + virResctrlInfoMemBWPtr membw_info = NULL; size_t i = 0; int ret = -1; if (virResctrlInfoIsEmpty(resctrl)) return 0; + /* Let's take the opportunity to update the number of last level + * cache. This number of memory bandwidth controller is same with + * last level cache */ + if (resctrl->membw_info) { + membw_info = resctrl->membw_info; + if (level > membw_info->last_level_cache) { + membw_info->last_level_cache = level; + membw_info->max_id = 0; + } else if (membw_info->last_level_cache == level) { + membw_info->max_id++; + } + } + if (level >= resctrl->nlevels) return 0; -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Introducing virResctrlInfoMemBW for the information memory bandwidth allocation information. Those information is used for memory bandwidth allocating.
Last sentence is duplicitous.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 1515de2..06e2702 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -80,6 +80,9 @@ typedef virResctrlInfoPerType *virResctrlInfoPerTypePtr; typedef struct _virResctrlInfoPerLevel virResctrlInfoPerLevel; typedef virResctrlInfoPerLevel *virResctrlInfoPerLevelPtr;
+typedef struct _virResctrlInfoMemBW virResctrlInfoMemBW; +typedef virResctrlInfoMemBW *virResctrlInfoMemBWPtr; + typedef struct _virResctrlAllocPerType virResctrlAllocPerType; typedef virResctrlAllocPerType *virResctrlAllocPerTypePtr;
@@ -116,11 +119,31 @@ struct _virResctrlInfoPerLevel { virResctrlInfoPerTypePtr *types; };
+/* Information about memory bandwidth allocation */ +struct _virResctrlInfoMemBW { + /* minimum memory bandwidth allowed */ + unsigned int min_bandwidth; + /* bandwidth granularity */ + unsigned int bandwidth_granularity; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation; + /* level number of last level cache */ + unsigned int last_level_cache; + /* max id of last level cache, this is used to track + * how many last level cache available in host system, + * the number of memory bandwidth allocation controller + * is identical with last level cache. + */
The closing */ can be on the previous line.
+ unsigned int max_id; +}; + struct _virResctrlInfo { virObject parent;
virResctrlInfoPerLevelPtr *levels; size_t nlevels; + + virResctrlInfoMemBWPtr membw_info; };
@@ -146,6 +169,7 @@ virResctrlInfoDispose(void *obj) VIR_FREE(level); }
+ VIR_FREE(resctrl->membw_info); VIR_FREE(resctrl->levels); }
@@ -442,6 +466,65 @@ virResctrlGetCacheInfo(virResctrlInfoPtr resctrl, DIR *dirp)
static int +virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl) +{ + int ret = -1; + int rv = -1; + virResctrlInfoMemBWPtr i_membw = NULL; + + /* query memory bandwidth allocation info */ + if (VIR_ALLOC(i_membw) < 0) + goto cleanup; + rv = virFileReadValueUint(&i_membw->bandwidth_granularity, + SYSFS_RESCTRL_PATH "/info/MB/bandwidth_gran"); + if (rv == -2) { + /* The file doesn't exist, so it's unusable for us, + * properly mba unsupported */
s/ properly/probably/ ?? Is that what you meant? s/mba/memory bandwidth/ right?
+ VIR_INFO("The path '" SYSFS_RESCTRL_PATH "/info/MB/bandwidth_gran'" + "does not exist"); + ret = 0; + goto cleanup; + } else if (rv < 0) { + /* Other failures are fatal, so just quit */ + goto cleanup; + } + + rv = virFileReadValueUint(&i_membw->min_bandwidth, + SYSFS_RESCTRL_PATH "/info/MB/min_bandwidth"); + if (rv == -2) { + /* If the previous file exists, so should this one. Hence -2 is
s/ Hence/ Hence/ It's a raging debate at times on this list whether to go with 1 space or 2. I think 1 space usually wins, but I use 2 many times as well. I'll adjust this to 1 space though ;-)
+ * fatal in this case (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */
"kernel" can fit on previous line meaning "wrong." can fit on previous line. Also, no need for blank line between here and virReportError
+ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get min bandwidth from resctrl memory info")); + } + if (rv < 0) + goto cleanup; + + rv = virFileReadValueUint(&i_membw->max_allocation, + SYSFS_RESCTRL_PATH "/info/MB/num_closids"); + if (rv == -2) { + /* If the previous file exists, so should this one. Hence -2 is + * fatal in this case as well (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */
I'd just note /* Similar reasoning as min_bandwidth above */
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get max allocation from resctrl memory info")); + } + if (rv < 0) + goto cleanup; + + VIR_STEAL_PTR(resctrl->membw_info, i_membw); + ret = 0; + cleanup: + VIR_FREE(i_membw); + return ret; +} + + +static int virResctrlGetInfo(virResctrlInfoPtr resctrl) { DIR *dirp = NULL; @@ -451,6 +534,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) if (ret <= 0) goto cleanup;
+ ret = virResctrlGetMemoryBandwidthInfo(resctrl); + if (ret < 0) + goto cleanup; + ret = virResctrlGetCacheInfo(resctrl, dirp); if (ret < 0) goto cleanup; @@ -492,6 +579,9 @@ virResctrlInfoIsEmpty(virResctrlInfoPtr resctrl) if (!resctrl) return true;
+ if (resctrl->membw_info) + return false; + for (i = 0; i < resctrl->nlevels; i++) { virResctrlInfoPerLevelPtr i_level = resctrl->levels[i];
@@ -517,12 +607,26 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, { virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; + virResctrlInfoMemBWPtr membw_info = NULL;
This can be local to the if...
size_t i = 0; int ret = -1;
if (virResctrlInfoIsEmpty(resctrl)) return 0;
+ /* Let's take the opportunity to update the number of last level + * cache. This number of memory bandwidth controller is same with + * last level cache */ + if (resctrl->membw_info) {
virResctrlInfoMemBWPtr membw_info = resctrl->membw_info;
+ membw_info = resctrl->membw_info; + if (level > membw_info->last_level_cache) { + membw_info->last_level_cache = level; + membw_info->max_id = 0; + } else if (membw_info->last_level_cache == level) { + membw_info->max_id++; + } + } +
This last hunk should be it's own patch. I can split it out for you. The rest of the patch introduces the concept, does the query, and saves the data in the "right place". This hunk says, hey we have some membw_info data that can change our perception of reality, so we need to adjust. Although nothing yet has set last_level_cache or max_id... I'll assume that's comeing. I'll also make membw_info "local" to the if {}. The only hmm... I have is this last hunk, I've already forgotten what we discussed the previous series. But my hmm is related to why is it here and what impact can it (eventually) have if the values are changed in this method while perhaps also being changed in a different method. I'm sure I'll learn more of that as I move forward. Reviewed-by: John Ferlan <jferlan@redhat.com> John BTW: I'm stopping here for the evening, I'll work through the rest hopefully tomorrow depending on interruptions.
if (level >= resctrl->nlevels) return 0;

On 2018年07月25日 06:08, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Introducing virResctrlInfoMemBW for the information memory bandwidth allocation information. Those information is used for memory bandwidth allocating.
Last sentence is duplicitous.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 1515de2..06e2702 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -80,6 +80,9 @@ typedef virResctrlInfoPerType *virResctrlInfoPerTypePtr; typedef struct _virResctrlInfoPerLevel virResctrlInfoPerLevel; typedef virResctrlInfoPerLevel *virResctrlInfoPerLevelPtr;
+typedef struct _virResctrlInfoMemBW virResctrlInfoMemBW; +typedef virResctrlInfoMemBW *virResctrlInfoMemBWPtr; + typedef struct _virResctrlAllocPerType virResctrlAllocPerType; typedef virResctrlAllocPerType *virResctrlAllocPerTypePtr;
@@ -116,11 +119,31 @@ struct _virResctrlInfoPerLevel { virResctrlInfoPerTypePtr *types; };
+/* Information about memory bandwidth allocation */ +struct _virResctrlInfoMemBW { + /* minimum memory bandwidth allowed */ + unsigned int min_bandwidth; + /* bandwidth granularity */ + unsigned int bandwidth_granularity; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation; + /* level number of last level cache */ + unsigned int last_level_cache; + /* max id of last level cache, this is used to track + * how many last level cache available in host system, + * the number of memory bandwidth allocation controller + * is identical with last level cache. + */
The closing */ can be on the previous line.
+ unsigned int max_id; +}; + struct _virResctrlInfo { virObject parent;
virResctrlInfoPerLevelPtr *levels; size_t nlevels; + + virResctrlInfoMemBWPtr membw_info; };
@@ -146,6 +169,7 @@ virResctrlInfoDispose(void *obj) VIR_FREE(level); }
+ VIR_FREE(resctrl->membw_info); VIR_FREE(resctrl->levels); }
@@ -442,6 +466,65 @@ virResctrlGetCacheInfo(virResctrlInfoPtr resctrl, DIR *dirp)
static int +virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl) +{ + int ret = -1; + int rv = -1; + virResctrlInfoMemBWPtr i_membw = NULL; + + /* query memory bandwidth allocation info */ + if (VIR_ALLOC(i_membw) < 0) + goto cleanup; + rv = virFileReadValueUint(&i_membw->bandwidth_granularity, + SYSFS_RESCTRL_PATH "/info/MB/bandwidth_gran"); + if (rv == -2) { + /* The file doesn't exist, so it's unusable for us, + * properly mba unsupported */
s/ properly/probably/
?? Is that what you meant?
Yes! probably. :)
s/mba/memory bandwidth/
s/mba/memory bandwidth allocation/
right?
+ VIR_INFO("The path '" SYSFS_RESCTRL_PATH "/info/MB/bandwidth_gran'" + "does not exist"); + ret = 0; + goto cleanup; + } else if (rv < 0) { + /* Other failures are fatal, so just quit */ + goto cleanup; + } + + rv = virFileReadValueUint(&i_membw->min_bandwidth, + SYSFS_RESCTRL_PATH "/info/MB/min_bandwidth"); + if (rv == -2) { + /* If the previous file exists, so should this one. Hence -2 is
s/ Hence/ Hence/
It's a raging debate at times on this list whether to go with 1 space or 2. I think 1 space usually wins, but I use 2 many times as well. I'll adjust this to 1 space though ;-)
Good to know this history :). Above 2 space is just following the existing virresctrl code.
+ * fatal in this case (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */
"kernel" can fit on previous line meaning "wrong." can fit on previous line. Also, no need for blank line between here and virReportError
Sorry I am a bit lost. Do you mean put "kernel" in previous line? right? I have a try in Thunderbird. Looks like put “kernel” in previous line beyond max length of line and Thunderbird give me automatically split. :(
+ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get min bandwidth from resctrl memory info")); + } + if (rv < 0) + goto cleanup; + + rv = virFileReadValueUint(&i_membw->max_allocation, + SYSFS_RESCTRL_PATH "/info/MB/num_closids"); + if (rv == -2) { + /* If the previous file exists, so should this one. Hence -2 is + * fatal in this case as well (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */
I'd just note /* Similar reasoning as min_bandwidth above */
Short and clear!
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get max allocation from resctrl memory info")); + } + if (rv < 0) + goto cleanup; + + VIR_STEAL_PTR(resctrl->membw_info, i_membw); + ret = 0; + cleanup: + VIR_FREE(i_membw); + return ret; +} + + +static int virResctrlGetInfo(virResctrlInfoPtr resctrl) { DIR *dirp = NULL; @@ -451,6 +534,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) if (ret <= 0) goto cleanup;
+ ret = virResctrlGetMemoryBandwidthInfo(resctrl); + if (ret < 0) + goto cleanup; + ret = virResctrlGetCacheInfo(resctrl, dirp); if (ret < 0) goto cleanup; @@ -492,6 +579,9 @@ virResctrlInfoIsEmpty(virResctrlInfoPtr resctrl) if (!resctrl) return true;
+ if (resctrl->membw_info) + return false; + for (i = 0; i < resctrl->nlevels; i++) { virResctrlInfoPerLevelPtr i_level = resctrl->levels[i];
@@ -517,12 +607,26 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, { virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; + virResctrlInfoMemBWPtr membw_info = NULL;
This can be local to the if...
size_t i = 0; int ret = -1;
if (virResctrlInfoIsEmpty(resctrl)) return 0;
+ /* Let's take the opportunity to update the number of last level + * cache. This number of memory bandwidth controller is same with + * last level cache */ + if (resctrl->membw_info) {
virResctrlInfoMemBWPtr membw_info = resctrl->membw_info;
Yes. Should put to here. :). I remember some compile flags will complain if local variable not at the begin of a function.
+ membw_info = resctrl->membw_info; + if (level > membw_info->last_level_cache) { + membw_info->last_level_cache = level; + membw_info->max_id = 0; + } else if (membw_info->last_level_cache == level) { + membw_info->max_id++; + } + } +
This last hunk should be it's own patch. I can split it out for you. The rest of the patch introduces the concept, does the query, and saves the data in the "right place".
This hunk says, hey we have some membw_info data that can change our perception of reality, so we need to adjust. Although nothing yet has set last_level_cache or max_id... I'll assume that's comeing.
I'll also make membw_info "local" to the if {}.
The only hmm... I have is this last hunk, I've already forgotten what we discussed the previous series. But my hmm is related to why is it here and what impact can it (eventually) have if the values are changed in this method while perhaps also being changed in a different method. I'm sure I'll learn more of that as I move forward.
Thanks for that! Let me help to recall the discuss. :) RDT kernel module will report some parameters for MBA. They are : "min_bandwidth": The minimum memory bandwidth percentage which user can request. "bandwidth_gran": The granularity in which the memory bandwidth percentage is allocated. The allocated b/w percentage is rounded off to the next control step available on the hardware. The available bandwidth control steps are: min_bandwidth + N * bandwidth_gran. "num_closids": The number mba group can exist simultaneous. Those information is query from kernel interface /sys/fs/resctrl/info/MB Above hunk is used to calculate the number of memory bandwidth allocation controllers in system. The number of memory bandwidth allocation controllers is same as last level cache. This number is calculate by traversing cache hierarchy of host(/sys/bus/cpu/devices/cpuX/cache/). So above hunk is used to collect that information, to sanitize domain XML about memory bandwidth allocation part. And the number of memory bandwidth allocation controllers will not change, it just cannot query directly from RDT kernel module.
Reviewed-by: John Ferlan <jferlan@redhat.com>
John
BTW: I'm stopping here for the evening, I'll work through the rest hopefully tomorrow depending on interruptions.
Good evening :). The rest are about 1. allocate memory bandwidth 2. domain XML and 3. host capability XML.
if (level >= resctrl->nlevels) return 0;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 07/25/2018 12:28 AM, bing.niu wrote:
On 2018年07月25日 06:08, John Ferlan wrote:
[...]
+ * fatal in this case (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */
"kernel" can fit on previous line meaning "wrong." can fit on previous line. Also, no need for blank line between here and virReportError
Sorry I am a bit lost. Do you mean put "kernel" in previous line? right? I have a try in Thunderbird. Looks like put “kernel” in previous line beyond max length of line and Thunderbird give me automatically split. :(
Yes moving kernel up a line is what I meant. It's taken care of in my branch already.
[...]
+ membw_info = resctrl->membw_info; + if (level > membw_info->last_level_cache) { + membw_info->last_level_cache = level; + membw_info->max_id = 0; + } else if (membw_info->last_level_cache == level) { + membw_info->max_id++; + } + } +
This last hunk should be it's own patch. I can split it out for you. The rest of the patch introduces the concept, does the query, and saves the data in the "right place".
This hunk says, hey we have some membw_info data that can change our perception of reality, so we need to adjust. Although nothing yet has set last_level_cache or max_id... I'll assume that's comeing.
I'll also make membw_info "local" to the if {}.
The only hmm... I have is this last hunk, I've already forgotten what we discussed the previous series. But my hmm is related to why is it here and what impact can it (eventually) have if the values are changed in this method while perhaps also being changed in a different method. I'm sure I'll learn more of that as I move forward.
Thanks for that! Let me help to recall the discuss. :) RDT kernel module will report some parameters for MBA. They are : "min_bandwidth": The minimum memory bandwidth percentage which user can request.
"bandwidth_gran": The granularity in which the memory bandwidth percentage is allocated. The allocated b/w percentage is rounded off to the next control step available on the hardware. The available bandwidth control steps are: min_bandwidth + N * bandwidth_gran.
"num_closids": The number mba group can exist simultaneous.
Those information is query from kernel interface /sys/fs/resctrl/info/MB
Right this I understand the other fields are fetch-able. Setting last_level_cache and max_id is only ever done in virResctrlInfoGetCache. I have to remind myself what ends up calling into here and the order of processing for all this code.
Above hunk is used to calculate the number of memory bandwidth allocation controllers in system. The number of memory bandwidth allocation controllers is same as last level cache. This number is calculate by traversing cache hierarchy of host(/sys/bus/cpu/devices/cpuX/cache/). So above hunk is used to collect that information, to sanitize domain XML about memory bandwidth allocation part. And the number of memory bandwidth allocation controllers will not change, it just cannot query directly from RDT kernel module.
So does the following seem like a good summary for the split out patch?: util: Add MBA check to virResctrlInfoGetCache If we have some membw_info data, then we need to calculate the number of MBA controllers on the system. The value cannot be obtained from a direct query to the RDT kernel module, but it is the same as the last level cache value which is calculated by traversing the cache hierarchy of host(/sys/bus/cpu/devices/cpuX/cache/). John
Reviewed-by: John Ferlan <jferlan@redhat.com>
John
BTW: I'm stopping here for the evening, I'll work through the rest hopefully tomorrow depending on interruptions.
Good evening :). The rest are about 1. allocate memory bandwidth 2. domain XML and 3. host capability XML.
if (level >= resctrl->nlevels) return 0;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 2018年07月26日 06:31, John Ferlan wrote:
On 07/25/2018 12:28 AM, bing.niu wrote:
On 2018年07月25日 06:08, John Ferlan wrote:
[...]
+ * fatal in this case (errors out in next condition) - the + * kernel interface might've changed too much or something else is + * wrong. */
"kernel" can fit on previous line meaning "wrong." can fit on previous line. Also, no need for blank line between here and virReportError
Sorry I am a bit lost. Do you mean put "kernel" in previous line? right? I have a try in Thunderbird. Looks like put “kernel” in previous line beyond max length of line and Thunderbird give me automatically split. :(
Yes moving kernel up a line is what I meant. It's taken care of in my branch already.
Thanks for this!
[...]
+ membw_info = resctrl->membw_info; + if (level > membw_info->last_level_cache) { + membw_info->last_level_cache = level; + membw_info->max_id = 0; + } else if (membw_info->last_level_cache == level) { + membw_info->max_id++; + } + } +
This last hunk should be it's own patch. I can split it out for you. The rest of the patch introduces the concept, does the query, and saves the data in the "right place".
This hunk says, hey we have some membw_info data that can change our perception of reality, so we need to adjust. Although nothing yet has set last_level_cache or max_id... I'll assume that's comeing.
I'll also make membw_info "local" to the if {}.
The only hmm... I have is this last hunk, I've already forgotten what we discussed the previous series. But my hmm is related to why is it here and what impact can it (eventually) have if the values are changed in this method while perhaps also being changed in a different method. I'm sure I'll learn more of that as I move forward.
Thanks for that! Let me help to recall the discuss. :) RDT kernel module will report some parameters for MBA. They are : "min_bandwidth": The minimum memory bandwidth percentage which user can request.
"bandwidth_gran": The granularity in which the memory bandwidth percentage is allocated. The allocated b/w percentage is rounded off to the next control step available on the hardware. The available bandwidth control steps are: min_bandwidth + N * bandwidth_gran.
"num_closids": The number mba group can exist simultaneous.
Those information is query from kernel interface /sys/fs/resctrl/info/MB
Right this I understand the other fields are fetch-able. Setting last_level_cache and max_id is only ever done in virResctrlInfoGetCache. I have to remind myself what ends up calling into here and the order of processing for all this code.
Above hunk is used to calculate the number of memory bandwidth allocation controllers in system. The number of memory bandwidth allocation controllers is same as last level cache. This number is calculate by traversing cache hierarchy of host(/sys/bus/cpu/devices/cpuX/cache/). So above hunk is used to collect that information, to sanitize domain XML about memory bandwidth allocation part. And the number of memory bandwidth allocation controllers will not change, it just cannot query directly from RDT kernel module.
So does the following seem like a good summary for the split out patch?:
util: Add MBA check to virResctrlInfoGetCache
If we have some membw_info data, then we need to calculate the number of MBA controllers on the system. The value cannot be obtained from a direct query to the RDT kernel module, but it is the same as the last level cache value which is calculated by traversing the cache hierarchy of host(/sys/bus/cpu/devices/cpuX/cache/).
Above summary is clear and informative. :)
John
Reviewed-by: John Ferlan <jferlan@redhat.com>
John
BTW: I'm stopping here for the evening, I'll work through the rest hopefully tomorrow depending on interruptions.
Good evening :). The rest are about 1. allocate memory bandwidth 2. domain XML and 3. host capability XML.
if (level >= resctrl->nlevels) return 0;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

From: Bing Niu <bing.niu@intel.com> Add memory bandwidth allocation support to virresctrl class. Introducing virResctrlAllocMemBW which is used for allocating memory bandwidth. Following virResctrlAllocPerType, it also employs a nested sparse array to indicate whether allocation is available for particular last level cache. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 346 ++++++++++++++++++++++++++++++++++++++++++++++++-- src/util/virresctrl.h | 13 ++ 2 files changed, 346 insertions(+), 13 deletions(-) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 06e2702..bec2afd 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -36,9 +36,9 @@ VIR_LOG_INIT("util.virresctrl") /* Resctrl is short for Resource Control. It might be implemented for various - * resources, but at the time of this writing this is only supported for cache - * allocation technology (aka CAT). Hence the reson for leaving 'Cache' out of - * all the structure and function names for now (can be added later if needed. + * resources. Currently this supports cache allocation technology (aka CAT) and + * memory bandwidth allocation (aka MBA). More resources technologies may be + * added in feature. */ @@ -89,6 +89,8 @@ typedef virResctrlAllocPerType *virResctrlAllocPerTypePtr; typedef struct _virResctrlAllocPerLevel virResctrlAllocPerLevel; typedef virResctrlAllocPerLevel *virResctrlAllocPerLevelPtr; +typedef struct _virResctrlAllocMemBW virResctrlAllocMemBW; +typedef virResctrlAllocMemBW *virResctrlAllocMemBWPtr; /* Class definitions and initializations */ static virClassPtr virResctrlInfoClass; @@ -181,7 +183,10 @@ virResctrlInfoDispose(void *obj) * consequently a directory under /sys/fs/resctrl). Since it can have multiple * parts of multiple caches allocated it is represented as bunch of nested * sparse arrays (by sparse I mean array of pointers so that each might be NULL - * in case there is no allocation for that particular one (level, cache, ...)). + * in case there is no allocation for that particular cache allocation (level, + * cache, ...) or memory allocation for particular node). + * + * =====Cache allocation technology (CAT)===== * * Since one allocation can be made for caches on different levels, the first * nested sparse array is of types virResctrlAllocPerLevel. For example if you @@ -206,6 +211,16 @@ virResctrlInfoDispose(void *obj) * all of them. While doing that we store the bitmask in a sparse array of * virBitmaps named `masks` indexed the same way as `sizes`. The upper bounds * of the sparse arrays are stored in nmasks or nsizes, respectively. + * + * =====Memory Bandwidth allocation technology (MBA)===== + * + * The memory bandwidth allocation support in virResctrlAlloc works in the same + * fashion as CAT. However, memory bandwidth controller doesn't have a hierarchy + * organization as cache, each node have one memory bandwidth controller to + * memory bandwidth distribution. The number of memory bandwidth controller is + * identical with number of last level cache. So MBA also employs a sparse array + * to represent whether a memory bandwidth allocation happens on corresponding node. + * The available memory controller number is collected in 'virResctrlInfo'. */ struct _virResctrlAllocPerType { /* There could be bool saying whether this is set or not, but since everything @@ -226,12 +241,24 @@ struct _virResctrlAllocPerLevel { * VIR_CACHE_TYPE_LAST number of items */ }; +/* + * virResctrlAllocMemBW represents one memory bandwidth allocation. Since it can have + * several last level caches in a NUMA system, it is also represented as a nested + * sparse arrays as virRestrlAllocPerLevel. + */ +struct _virResctrlAllocMemBW { + unsigned int **bandwidths; + size_t nbandwidths; +}; + struct _virResctrlAlloc { virObject parent; virResctrlAllocPerLevelPtr *levels; size_t nlevels; + virResctrlAllocMemBWPtr mem_bw; + /* The identifier (any unique string for now) */ char *id; /* libvirt-generated path in /sys/fs/resctrl for this particular @@ -275,6 +302,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(level); } + if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + VIR_FREE(mem_bw->bandwidths[i]); + } + + VIR_FREE(alloc->mem_bw); VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -697,6 +731,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (!alloc) return true; + if (alloc->mem_bw) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i]; @@ -890,6 +927,27 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, int +virResctrlAllocForeachMemory(virResctrlAllocPtr alloc, + virResctrlAllocForeachMemoryCallback cb, + void *opaque) +{ + size_t i = 0; + + if (!alloc) + return 0; + + if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + if (mem_bw->bandwidths[i]) + cb(i, *mem_bw->bandwidths[i], opaque); + } + + return 0; +} + + +int virResctrlAllocForeachCache(virResctrlAllocPtr alloc, virResctrlAllocForeachCacheCallback cb, void *opaque) @@ -952,6 +1010,240 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) } +static void +virResctrlMemoryBandwidthSubtract(virResctrlAllocPtr free, + virResctrlAllocPtr used) +{ + size_t i; + + if (!used->mem_bw) + return; + + for (i = 0; i < used->mem_bw->nbandwidths; i++) { + if (used->mem_bw->bandwidths[i]) + *(free->mem_bw->bandwidths[i]) -= *(used->mem_bw->bandwidths[i]); + } +} + + +int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth) +{ + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + + if (!mem_bw) { + if (VIR_ALLOC(mem_bw) < 0) + return -1; + alloc->mem_bw = mem_bw; + } + + if (mem_bw->nbandwidths <= id && + VIR_EXPAND_N(mem_bw->bandwidths, mem_bw->nbandwidths, + id - mem_bw->nbandwidths + 1) < 0) + return -1; + + if (mem_bw->bandwidths[id]) { + virReportError(VIR_ERR_XML_ERROR, + _("Memory Bandwidth already defined for node %u"), + id); + return -1; + } + + if (VIR_ALLOC(mem_bw->bandwidths[id]) < 0) + return -1; + + *(mem_bw->bandwidths[id]) = memory_bandwidth; + return 0; +} + + +static int +virResctrlAllocMemoryBandwidthFormat(virResctrlAllocPtr alloc, + virBufferPtr buf) +{ + size_t i; + + if (!alloc->mem_bw) + return 0; + + virBufferAddLit(buf, "MB:"); + + for (i = 0; i < alloc->mem_bw->nbandwidths; i++) { + if (alloc->mem_bw->bandwidths[i]) { + virBufferAsprintf(buf, "%zd=%u;", i, + *(alloc->mem_bw->bandwidths[i])); + } + } + + virBufferTrim(buf, ";", 1); + virBufferAddChar(buf, '\n'); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0; +} + + +static int +virResctrlAllocMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + virResctrlAllocPtr free) +{ + size_t i; + virResctrlAllocMemBWPtr mem_bw_alloc = alloc->mem_bw; + virResctrlAllocMemBWPtr mem_bw_free = free->mem_bw; + virResctrlInfoMemBWPtr mem_bw_info = resctrl->membw_info; + + if (!mem_bw_alloc) + return 0; + + if (mem_bw_alloc && !mem_bw_info) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("RDT Memory Bandwidth allocation " + "unsupported")); + return -1; + } + + for (i = 0; i < mem_bw_alloc->nbandwidths; i++) { + if (!mem_bw_alloc->bandwidths[i]) + continue; + + if (*(mem_bw_alloc->bandwidths[i]) % mem_bw_info->bandwidth_granularity) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is not divisible by granularity %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->bandwidth_granularity); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) < mem_bw_info->min_bandwidth) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is smaller than the minimum " + "allowed allocation %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->min_bandwidth); + return -1; + } + if (i > mem_bw_info->max_id) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("bandwidth controller %zd not exist, " + "max controller id %u"), + i, mem_bw_info->max_id); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) > *(mem_bw_free->bandwidths[i])) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Not enough room for allocation of %u%% " + "bandwidth on node %zd, available bandwidth %u%%"), + *(mem_bw_alloc->bandwidths[i]), i, + *(mem_bw_free->bandwidths[i])); + return -1; + } + } + return 0; +} + + +static int +virResctrlAllocParseProcessMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *mem_bw) +{ + unsigned int bandwidth; + unsigned int id; + char *tmp = NULL; + + tmp = strchr(mem_bw, '='); + if (!tmp) + return 0; + *tmp = '\0'; + tmp++; + + if (virStrToLong_uip(mem_bw, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid node id %u "), id); + return -1; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid bandwidth %u"), bandwidth); + return -1; + } + if (bandwidth < resctrl->membw_info->min_bandwidth || + id > resctrl->membw_info->max_id) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Missing or inconsistent resctrl info for " + "memory bandwidth node '%u'"), id); + return -1; + } + if (alloc->mem_bw->nbandwidths <= id && + VIR_EXPAND_N(alloc->mem_bw->bandwidths, alloc->mem_bw->nbandwidths, + id - alloc->mem_bw->nbandwidths + 1) < 0) { + return -1; + } + if (!alloc->mem_bw->bandwidths[id]) { + if (VIR_ALLOC(alloc->mem_bw->bandwidths[id]) < 0) + return -1; + } + + *(alloc->mem_bw->bandwidths[id]) = bandwidth; + return 0; +} + + +static int +virResctrlAllocParseMemoryBandwidthLine(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *line) +{ + char **mbs = NULL; + char *tmp = NULL; + size_t nmbs = 0; + size_t i; + int ret = -1; + + /* For no reason there can be spaces */ + virSkipSpaces((const char **) &line); + + if (STRNEQLEN(line, "MB", 2)) + return 0; + + if (!resctrl || !resctrl->membw_info || + !resctrl->membw_info->min_bandwidth || + !resctrl->membw_info->bandwidth_granularity) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing or inconsistent resctrl info for " + "memory bandwidth allocation")); + } + + if (!alloc->mem_bw) { + if (VIR_ALLOC(alloc->mem_bw) < 0) + return -1; + } + + tmp = strchr(line, ':'); + if (!tmp) + return 0; + tmp++; + + mbs = virStringSplitCount(tmp, ";", 0, &nmbs); + if (nmbs == 0) + return 0; + + for (i = 0; i < nmbs; i++) { + if (virResctrlAllocParseProcessMemoryBandwidth(resctrl, alloc, mbs[i]) < 0) + goto cleanup; + } + ret = 0; + cleanup: + virStringListFree(mbs); + return ret; +} + + static int virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf) { @@ -1013,6 +1305,11 @@ virResctrlAllocFormat(virResctrlAllocPtr alloc) return NULL; } + if (virResctrlAllocMemoryBandwidthFormat(alloc, &buf) < 0) { + virBufferFreeAndReset(&buf); + return NULL; + } + return virBufferContentAndReset(&buf); } @@ -1139,6 +1436,8 @@ virResctrlAllocParse(virResctrlInfoPtr resctrl, lines = virStringSplitCount(schemata, "\n", 0, &nlines); for (i = 0; i < nlines; i++) { + if (virResctrlAllocParseMemoryBandwidthLine(resctrl, alloc, lines[i]) < 0) + goto cleanup; if (virResctrlAllocParseCacheLine(resctrl, alloc, lines[i]) < 0) goto cleanup; } @@ -1273,6 +1572,22 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info) } } + /* set default free memory bandwidth to 100%*/ + if (info->membw_info) { + if (VIR_ALLOC(ret->mem_bw) < 0) + goto error; + + if (VIR_EXPAND_N(ret->mem_bw->bandwidths, ret->mem_bw->nbandwidths, + info->membw_info->max_id + 1) < 0) + goto error; + + for (i = 0; i < ret->mem_bw->nbandwidths; i++) { + if (VIR_ALLOC(ret->mem_bw->bandwidths[i]) < 0) + goto error; + *(ret->mem_bw->bandwidths[i]) = 100; + } + } + cleanup: virBitmapFree(mask); return ret; @@ -1284,13 +1599,14 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info) /* * This function creates an allocation that represents all unused parts of all - * caches in the system. It uses virResctrlInfo for creating a new full - * allocation with all bits set (using virResctrlAllocNewFromInfo()) and then - * scans for all allocations under /sys/fs/resctrl and subtracts each one of - * them from it. That way it can then return an allocation with only bit set - * being those that are not mentioned in any other allocation. It is used for - * two things, a) calculating the masks when creating allocations and b) from - * tests. + * caches and memory bandwidth in the system. It uses virResctrlInfo for + * creating a new full allocation with all bits set (using + * virResctrlAllocNewFromInfo()), memory bandwidth 100% and then scans + * for all allocations under /sys/fs/resctrl and subtracts each one of them + * from it. That way it can then return an allocation with only bit set + * being those that are not mentioned in any other allocation for CAT and + * available memory bandwidth for MBA. It is used for two things, a) calculating + * the masks and bandwidth available when creating allocations and b) from tests. */ virResctrlAllocPtr virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) @@ -1336,6 +1652,7 @@ virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) goto error; } + virResctrlMemoryBandwidthSubtract(ret, alloc); virResctrlAllocSubtract(ret, alloc); virObjectUnref(alloc); alloc = NULL; @@ -1526,8 +1843,8 @@ virResctrlAllocCopyMasks(virResctrlAllocPtr dst, /* * This function is called when creating an allocation in the system. What it - * does is that it gets all the unused bits using virResctrlAllocGetUnused() and - * then tries to find a proper space for every requested allocation effectively + * does is that it gets all the unused resources using virResctrlAllocGetUnused() + * and then tries to find a proper space for every requested allocation effectively * transforming `sizes` into `masks`. */ static int @@ -1547,6 +1864,9 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl, if (!alloc_default) goto cleanup; + if (virResctrlAllocMemoryBandwidth(resctrl, alloc, alloc_free) < 0) + goto cleanup; + if (virResctrlAllocCopyMasks(alloc, alloc_default) < 0) goto cleanup; diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d657c06..d43fd31 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -73,6 +73,10 @@ typedef int virResctrlAllocForeachCacheCallback(unsigned int level, unsigned long long size, void *opaque); +typedef int virResctrlAllocForeachMemoryCallback(unsigned int id, + unsigned int size, + void *opaque); + virResctrlAllocPtr virResctrlAllocNew(void); @@ -85,6 +89,15 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, virCacheType type, unsigned int cache, unsigned long long size); +int +virResctrlAllocForeachMemory(virResctrlAllocPtr resctrl, + virResctrlAllocForeachMemoryCallback cb, + void *opaque); + +int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth); int virResctrlAllocForeachCache(virResctrlAllocPtr alloc, -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Add memory bandwidth allocation support to virresctrl class. Introducing virResctrlAllocMemBW which is used for allocating memory bandwidth. Following virResctrlAllocPerType, it also employs a nested sparse array to indicate whether allocation is available for particular last level cache.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 346 ++++++++++++++++++++++++++++++++++++++++++++++++-- src/util/virresctrl.h | 13 ++ 2 files changed, 346 insertions(+), 13 deletions(-)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 06e2702..bec2afd 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -36,9 +36,9 @@ VIR_LOG_INIT("util.virresctrl")
/* Resctrl is short for Resource Control. It might be implemented for various - * resources, but at the time of this writing this is only supported for cache - * allocation technology (aka CAT). Hence the reson for leaving 'Cache' out of - * all the structure and function names for now (can be added later if needed. + * resources. Currently this supports cache allocation technology (aka CAT) and + * memory bandwidth allocation (aka MBA). More resources technologies may be + * added in feature.
s/feature/the future/
*/
@@ -89,6 +89,8 @@ typedef virResctrlAllocPerType *virResctrlAllocPerTypePtr; typedef struct _virResctrlAllocPerLevel virResctrlAllocPerLevel; typedef virResctrlAllocPerLevel *virResctrlAllocPerLevelPtr;
+typedef struct _virResctrlAllocMemBW virResctrlAllocMemBW; +typedef virResctrlAllocMemBW *virResctrlAllocMemBWPtr;
/* Class definitions and initializations */ static virClassPtr virResctrlInfoClass; @@ -181,7 +183,10 @@ virResctrlInfoDispose(void *obj) * consequently a directory under /sys/fs/resctrl). Since it can have multiple * parts of multiple caches allocated it is represented as bunch of nested * sparse arrays (by sparse I mean array of pointers so that each might be NULL - * in case there is no allocation for that particular one (level, cache, ...)). + * in case there is no allocation for that particular cache allocation (level, + * cache, ...) or memory allocation for particular node). + * + * =====Cache allocation technology (CAT)===== * * Since one allocation can be made for caches on different levels, the first * nested sparse array is of types virResctrlAllocPerLevel. For example if you @@ -206,6 +211,16 @@ virResctrlInfoDispose(void *obj) * all of them. While doing that we store the bitmask in a sparse array of * virBitmaps named `masks` indexed the same way as `sizes`. The upper bounds * of the sparse arrays are stored in nmasks or nsizes, respectively. + * + * =====Memory Bandwidth allocation technology (MBA)===== + * + * The memory bandwidth allocation support in virResctrlAlloc works in the same + * fashion as CAT. However, memory bandwidth controller doesn't have a hierarchy + * organization as cache, each node have one memory bandwidth controller to + * memory bandwidth distribution. The number of memory bandwidth controller is + * identical with number of last level cache. So MBA also employs a sparse array + * to represent whether a memory bandwidth allocation happens on corresponding node. + * The available memory controller number is collected in 'virResctrlInfo'. */ struct _virResctrlAllocPerType { /* There could be bool saying whether this is set or not, but since everything @@ -226,12 +241,24 @@ struct _virResctrlAllocPerLevel { * VIR_CACHE_TYPE_LAST number of items */ };
+/* + * virResctrlAllocMemBW represents one memory bandwidth allocation. Since it can have + * several last level caches in a NUMA system, it is also represented as a nested + * sparse arrays as virRestrlAllocPerLevel. + */ +struct _virResctrlAllocMemBW { + unsigned int **bandwidths; + size_t nbandwidths; +}; + struct _virResctrlAlloc { virObject parent;
virResctrlAllocPerLevelPtr *levels; size_t nlevels;
+ virResctrlAllocMemBWPtr mem_bw; + /* The identifier (any unique string for now) */ char *id; /* libvirt-generated path in /sys/fs/resctrl for this particular @@ -275,6 +302,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(level); }
+ if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + VIR_FREE(mem_bw->bandwidths[i]); + } + + VIR_FREE(alloc->mem_bw);
NIT: Could be inside the if condition
VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -697,6 +731,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (!alloc) return true;
+ if (alloc->mem_bw) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i];
@@ -890,6 +927,27 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc,
int +virResctrlAllocForeachMemory(virResctrlAllocPtr alloc, + virResctrlAllocForeachMemoryCallback cb, + void *opaque) +{ + size_t i = 0; + + if (!alloc) + return 0; + + if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + if (mem_bw->bandwidths[i]) + cb(i, *mem_bw->bandwidths[i], opaque);
@cb is an int function, but only ever used in what amounts to a void function in patch8 (virDomainMemorytuneDefFormatHelper - although defined as int, it only ever returns 0, so it could be void too). So either this changes to handle that or we change the *Callback to be void. Do you have a preference?
+ } + + return 0; +} + + +int virResctrlAllocForeachCache(virResctrlAllocPtr alloc, virResctrlAllocForeachCacheCallback cb, void *opaque) @@ -952,6 +1010,240 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) }
The next hunk of changes has perhaps more to do with the functionality of the code vs. the pure alloc/free of the structures. I think that alloc/free needs to be split from Parse/Format - it'll make things easier. Then there's Parse of XML vs Parse of the schemata for the MB: lines which just jumbles things (in my mind ;-)) This particular change may even be a 3rd patch... Keep reading ;-)
+static void +virResctrlMemoryBandwidthSubtract(virResctrlAllocPtr free, + virResctrlAllocPtr used) +{ + size_t i; + + if (!used->mem_bw) + return; + + for (i = 0; i < used->mem_bw->nbandwidths; i++) { + if (used->mem_bw->bandwidths[i]) + *(free->mem_bw->bandwidths[i]) -= *(used->mem_bw->bandwidths[i]); + } +} + +
Might be nice to put some comments here. Since this is a backend of sorts for the XML Parse API from patch8, let's wait to introduce it there. That is it moves to patch8.
+int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth) +{ + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + + if (!mem_bw) { + if (VIR_ALLOC(mem_bw) < 0) + return -1; + alloc->mem_bw = mem_bw; + } + + if (mem_bw->nbandwidths <= id && + VIR_EXPAND_N(mem_bw->bandwidths, mem_bw->nbandwidths, + id - mem_bw->nbandwidths + 1) < 0) + return -1; + + if (mem_bw->bandwidths[id]) { + virReportError(VIR_ERR_XML_ERROR, + _("Memory Bandwidth already defined for node %u"), + id); + return -1; + } + + if (VIR_ALLOC(mem_bw->bandwidths[id]) < 0) + return -1; + + *(mem_bw->bandwidths[id]) = memory_bandwidth; + return 0; +} + +
Seeing Format and Parse API's was confusing until I remembered that these are for the schemata files. I think perhaps this section could have been commented a bit more to indicate what it's for.
+static int +virResctrlAllocMemoryBandwidthFormat(virResctrlAllocPtr alloc, + virBufferPtr buf) +{ + size_t i; + + if (!alloc->mem_bw) + return 0; + + virBufferAddLit(buf, "MB:"); + + for (i = 0; i < alloc->mem_bw->nbandwidths; i++) { + if (alloc->mem_bw->bandwidths[i]) { + virBufferAsprintf(buf, "%zd=%u;", i, + *(alloc->mem_bw->bandwidths[i])); + } + } + + virBufferTrim(buf, ";", 1); + virBufferAddChar(buf, '\n'); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0;
return virBufferCheckError(buf);
+} + +
This API is the backend of the virResctrlAllocAssign called during the qemuProcessResctrlCreate processing - so it may be a third patch that can be extracted from this one. I need to look a bit harder though. Similarly the *Subtract API may fall into this category.
+static int +virResctrlAllocMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + virResctrlAllocPtr free) +{ + size_t i; + virResctrlAllocMemBWPtr mem_bw_alloc = alloc->mem_bw; + virResctrlAllocMemBWPtr mem_bw_free = free->mem_bw; + virResctrlInfoMemBWPtr mem_bw_info = resctrl->membw_info; + + if (!mem_bw_alloc) + return 0;
So can we even have one of these if resctrl->membw_info == NULL? The ->mem_bw is allocated in virResctrlSetMemoryBandwidth, which is only defined, but not called. It will be called later in patch8 from a Parse API I see.
+ + if (mem_bw_alloc && !mem_bw_info) {
In virResctrlGetMemoryBandwidthInfo it's "OK" if bandwidth_gran is not present - a VIR_INFO is generated, the resctrl->membw_info is left as NULL, and we return 0. So failing here would seemingly be quite unexpected, wouldn't it?
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("RDT Memory Bandwidth allocation " + "unsupported")); + return -1; + } + + for (i = 0; i < mem_bw_alloc->nbandwidths; i++) { + if (!mem_bw_alloc->bandwidths[i]) + continue; + + if (*(mem_bw_alloc->bandwidths[i]) % mem_bw_info->bandwidth_granularity) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is not divisible by granularity %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->bandwidth_granularity); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) < mem_bw_info->min_bandwidth) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is smaller than the minimum " + "allowed allocation %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->min_bandwidth); + return -1; + } + if (i > mem_bw_info->max_id) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("bandwidth controller %zd not exist, "
s/%zd/id %zd/ s/not/does not/
+ "max controller id %u"), + i, mem_bw_info->max_id); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) > *(mem_bw_free->bandwidths[i])) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Not enough room for allocation of %u%% " + "bandwidth on node %zd, available bandwidth %u%%"), + *(mem_bw_alloc->bandwidths[i]), i, + *(mem_bw_free->bandwidths[i])); + return -1; + } + }
So this is just a Check or Validate type function? Should it be renamed or should it actually do something.
+ return 0; +} + +
More stuff for the schemata parse/format
+static int +virResctrlAllocParseProcessMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *mem_bw) +{ + unsigned int bandwidth; + unsigned int id; + char *tmp = NULL; + + tmp = strchr(mem_bw, '='); + if (!tmp) + return 0; + *tmp = '\0'; + tmp++; + + if (virStrToLong_uip(mem_bw, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid node id %u "), id); + return -1; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid bandwidth %u"), bandwidth); + return -1; + } + if (bandwidth < resctrl->membw_info->min_bandwidth || + id > resctrl->membw_info->max_id) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Missing or inconsistent resctrl info for " + "memory bandwidth node '%u'"), id); + return -1; + } + if (alloc->mem_bw->nbandwidths <= id && + VIR_EXPAND_N(alloc->mem_bw->bandwidths, alloc->mem_bw->nbandwidths, + id - alloc->mem_bw->nbandwidths + 1) < 0) { + return -1; + } + if (!alloc->mem_bw->bandwidths[id]) { + if (VIR_ALLOC(alloc->mem_bw->bandwidths[id]) < 0) + return -1; + } + + *(alloc->mem_bw->bandwidths[id]) = bandwidth; + return 0; +} + +
Ditto for schemata parse/format...
+static int +virResctrlAllocParseMemoryBandwidthLine(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *line) +{ + char **mbs = NULL; + char *tmp = NULL; + size_t nmbs = 0; + size_t i; + int ret = -1; + + /* For no reason there can be spaces */ + virSkipSpaces((const char **) &line); + + if (STRNEQLEN(line, "MB", 2)) + return 0; + + if (!resctrl || !resctrl->membw_info || + !resctrl->membw_info->min_bandwidth || + !resctrl->membw_info->bandwidth_granularity) {
These checks seem out of place. If we get this far without !resctrl? Wow... And erroring out if !resctrl->membw_info even though virResctrlGetMemoryBandwidthInfo can succeed without it. As for the min_bandwidth and bandwidth_granularity checks - those would seem to be more appropriate in patch4 wouldn't they? when the values are read... If they're read as zero, then I assume that's bad ;-).
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing or inconsistent resctrl info for " + "memory bandwidth allocation")); + } + + if (!alloc->mem_bw) { + if (VIR_ALLOC(alloc->mem_bw) < 0) + return -1; + } + + tmp = strchr(line, ':'); + if (!tmp) + return 0; + tmp++; + + mbs = virStringSplitCount(tmp, ";", 0, &nmbs); + if (nmbs == 0) + return 0; + + for (i = 0; i < nmbs; i++) { + if (virResctrlAllocParseProcessMemoryBandwidth(resctrl, alloc, mbs[i]) < 0) + goto cleanup; + } + ret = 0; + cleanup: + virStringListFree(mbs); + return ret; +} + + static int virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf) { @@ -1013,6 +1305,11 @@ virResctrlAllocFormat(virResctrlAllocPtr alloc) return NULL; }
For schemata parse/format
+ if (virResctrlAllocMemoryBandwidthFormat(alloc, &buf) < 0) { + virBufferFreeAndReset(&buf); + return NULL; + } + return virBufferContentAndReset(&buf); }
@@ -1139,6 +1436,8 @@ virResctrlAllocParse(virResctrlInfoPtr resctrl,
lines = virStringSplitCount(schemata, "\n", 0, &nlines); for (i = 0; i < nlines; i++) {
For schemata parse/format
+ if (virResctrlAllocParseMemoryBandwidthLine(resctrl, alloc, lines[i]) < 0) + goto cleanup; if (virResctrlAllocParseCacheLine(resctrl, alloc, lines[i]) < 0) goto cleanup; } @@ -1273,6 +1572,22 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info) } }
+ /* set default free memory bandwidth to 100%*/ + if (info->membw_info) { + if (VIR_ALLOC(ret->mem_bw) < 0) + goto error; + + if (VIR_EXPAND_N(ret->mem_bw->bandwidths, ret->mem_bw->nbandwidths, + info->membw_info->max_id + 1) < 0) + goto error; + + for (i = 0; i < ret->mem_bw->nbandwidths; i++) { + if (VIR_ALLOC(ret->mem_bw->bandwidths[i]) < 0) + goto error; + *(ret->mem_bw->bandwidths[i]) = 100; + } + } + cleanup: virBitmapFree(mask); return ret; @@ -1284,13 +1599,14 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info)
/* * This function creates an allocation that represents all unused parts of all - * caches in the system. It uses virResctrlInfo for creating a new full - * allocation with all bits set (using virResctrlAllocNewFromInfo()) and then - * scans for all allocations under /sys/fs/resctrl and subtracts each one of - * them from it. That way it can then return an allocation with only bit set - * being those that are not mentioned in any other allocation. It is used for - * two things, a) calculating the masks when creating allocations and b) from - * tests. + * caches and memory bandwidth in the system. It uses virResctrlInfo for + * creating a new full allocation with all bits set (using + * virResctrlAllocNewFromInfo()), memory bandwidth 100% and then scans + * for all allocations under /sys/fs/resctrl and subtracts each one of them + * from it. That way it can then return an allocation with only bit set + * being those that are not mentioned in any other allocation for CAT and + * available memory bandwidth for MBA. It is used for two things, a) calculating + * the masks and bandwidth available when creating allocations and b) from tests. */ virResctrlAllocPtr virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) @@ -1336,6 +1652,7 @@ virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) goto error; }
Need to think about this, but perhaps would be appropriate for schemata format/parse, but could be something for that 3rd patch out of this one.
+ virResctrlMemoryBandwidthSubtract(ret, alloc); virResctrlAllocSubtract(ret, alloc); virObjectUnref(alloc); alloc = NULL; @@ -1526,8 +1843,8 @@ virResctrlAllocCopyMasks(virResctrlAllocPtr dst,
/* * This function is called when creating an allocation in the system. What it - * does is that it gets all the unused bits using virResctrlAllocGetUnused() and - * then tries to find a proper space for every requested allocation effectively + * does is that it gets all the unused resources using virResctrlAllocGetUnused() + * and then tries to find a proper space for every requested allocation effectively * transforming `sizes` into `masks`. */ static int @@ -1547,6 +1864,9 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl, if (!alloc_default) goto cleanup;
Still need to come to grips with the 'best place' for this - perhaps a 3rd patch.
+ if (virResctrlAllocMemoryBandwidth(resctrl, alloc, alloc_free) < 0) + goto cleanup; + if (virResctrlAllocCopyMasks(alloc, alloc_default) < 0) goto cleanup;
Let's see whether I can help you split these up a bit. There's a few things to answer from what I looked at. So far nothing is necessarily wrong, it's just the splitting of things that makes it easier to think about. The split you did has certainly helped thus far though. John I'll keep plugging away tomorrow.
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d657c06..d43fd31 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -73,6 +73,10 @@ typedef int virResctrlAllocForeachCacheCallback(unsigned int level, unsigned long long size, void *opaque);
+typedef int virResctrlAllocForeachMemoryCallback(unsigned int id, + unsigned int size, + void *opaque); + virResctrlAllocPtr virResctrlAllocNew(void);
@@ -85,6 +89,15 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, virCacheType type, unsigned int cache, unsigned long long size); +int +virResctrlAllocForeachMemory(virResctrlAllocPtr resctrl, + virResctrlAllocForeachMemoryCallback cb, + void *opaque); + +int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth);
int virResctrlAllocForeachCache(virResctrlAllocPtr alloc,

On 2018年07月26日 06:37, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Add memory bandwidth allocation support to virresctrl class. Introducing virResctrlAllocMemBW which is used for allocating memory bandwidth. Following virResctrlAllocPerType, it also employs a nested sparse array to indicate whether allocation is available for particular last level cache.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 346 ++++++++++++++++++++++++++++++++++++++++++++++++-- src/util/virresctrl.h | 13 ++ 2 files changed, 346 insertions(+), 13 deletions(-)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 06e2702..bec2afd 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -36,9 +36,9 @@ VIR_LOG_INIT("util.virresctrl")
/* Resctrl is short for Resource Control. It might be implemented for various - * resources, but at the time of this writing this is only supported for cache - * allocation technology (aka CAT). Hence the reson for leaving 'Cache' out of - * all the structure and function names for now (can be added later if needed. + * resources. Currently this supports cache allocation technology (aka CAT) and + * memory bandwidth allocation (aka MBA). More resources technologies may be + * added in feature.
s/feature/the future/
Will change.
*/
[....]
virObject parent;
virResctrlAllocPerLevelPtr *levels; size_t nlevels;
+ virResctrlAllocMemBWPtr mem_bw; + /* The identifier (any unique string for now) */ char *id; /* libvirt-generated path in /sys/fs/resctrl for this particular @@ -275,6 +302,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(level); }
+ if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + VIR_FREE(mem_bw->bandwidths[i]); + } + + VIR_FREE(alloc->mem_bw);
NIT: Could be inside the if condition.
OK~
VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -697,6 +731,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (!alloc) return true;
+ if (alloc->mem_bw) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i];
@@ -890,6 +927,27 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc,
int +virResctrlAllocForeachMemory(virResctrlAllocPtr alloc, + virResctrlAllocForeachMemoryCallback cb, + void *opaque) +{ + size_t i = 0; + + if (!alloc) + return 0; + + if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + if (mem_bw->bandwidths[i]) + cb(i, *mem_bw->bandwidths[i], opaque);
@cb is an int function, but only ever used in what amounts to a void function in patch8 (virDomainMemorytuneDefFormatHelper - although defined as int, it only ever returns 0, so it could be void too).
So either this changes to handle that or we change the *Callback to be void. Do you have a preference?
Let's add a return value testing in patch8. I think we should change cachetune virDomainCachetuneDefFormatHelper part also. Since it also doesn't check return value of virResctrlAllocForeachCache. IMO, Giving a return value of function is always a good practice. ;) It's also give some spaces for the future extension. If logic inside @cb became complex. :)
+ } + + return 0; +} + + +int virResctrlAllocForeachCache(virResctrlAllocPtr alloc, virResctrlAllocForeachCacheCallback cb, void *opaque) @@ -952,6 +1010,240 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) }
The next hunk of changes has perhaps more to do with the functionality of the code vs. the pure alloc/free of the structures. I think that alloc/free needs to be split from Parse/Format - it'll make things easier. Then there's Parse of XML vs Parse of the schemata for the MB: lines which just jumbles things (in my mind ;-))
This particular change may even be a 3rd patch... Keep reading ;-)
Yes. I just realize put everything relating with ResctrlAlloc into one patch will give difficulties to the reviewer. I should split this patch. So how about we split this patch into: 1. introduce virResctrlAllocMemBW and its alloc and free part 2. introduce schemata file parsing part for MBA. 3. introduce functions used to calculate how much memory bandwidth is used (that is find all groups already create) 4. introduce interface function used to set memory bandwidth (virResctrlSetMemoryBandwidth). 5. introduce logic create resctrl group with memory bandwidth.
+static void +virResctrlMemoryBandwidthSubtract(virResctrlAllocPtr free, + virResctrlAllocPtr used) +{ + size_t i; + + if (!used->mem_bw) + return; + + for (i = 0; i < used->mem_bw->nbandwidths; i++) { + if (used->mem_bw->bandwidths[i]) + *(free->mem_bw->bandwidths[i]) -= *(used->mem_bw->bandwidths[i]); + } +} + +
Might be nice to put some comments here. Since this is a backend of sorts for the XML Parse API from patch8, let's wait to introduce it there. That is it moves to patch8.
I will add some comments here. And put into a separated patch. Since patch8 is used to parse XML and create resctrl with MBA information purely. So it's better we don't introduce this interface function together with XML parse. We can introduce interface first and wait our consumer patch8. :) How do you think?
+int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth) +{ + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + + if (!mem_bw) { + if (VIR_ALLOC(mem_bw) < 0) + return -1; + alloc->mem_bw = mem_bw; + } + + if (mem_bw->nbandwidths <= id && + VIR_EXPAND_N(mem_bw->bandwidths, mem_bw->nbandwidths, + id - mem_bw->nbandwidths + 1) < 0) + return -1; + + if (mem_bw->bandwidths[id]) { + virReportError(VIR_ERR_XML_ERROR, + _("Memory Bandwidth already defined for node %u"), + id); + return -1; + } + + if (VIR_ALLOC(mem_bw->bandwidths[id]) < 0) + return -1; + + *(mem_bw->bandwidths[id]) = memory_bandwidth; + return 0; +} + +
Seeing Format and Parse API's was confusing until I remembered that these are for the schemata files. I think perhaps this section could have been commented a bit more to indicate what it's for.
Will add comments and put this into a dedicated patch with other schemata file parse and formating functions.
+static int +virResctrlAllocMemoryBandwidthFormat(virResctrlAllocPtr alloc, + virBufferPtr buf) +{ + size_t i; + + if (!alloc->mem_bw) + return 0; + + virBufferAddLit(buf, "MB:"); + + for (i = 0; i < alloc->mem_bw->nbandwidths; i++) { + if (alloc->mem_bw->bandwidths[i]) { + virBufferAsprintf(buf, "%zd=%u;", i, + *(alloc->mem_bw->bandwidths[i])); + } + } + + virBufferTrim(buf, ";", 1); + virBufferAddChar(buf, '\n'); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0;
return virBufferCheckError(buf);
OK.
+} + +
This API is the backend of the virResctrlAllocAssign called during the qemuProcessResctrlCreate processing - so it may be a third patch that can be extracted from this one. I need to look a bit harder though. Similarly the *Subtract API may fall into this category.
Yes~
+static int +virResctrlAllocMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + virResctrlAllocPtr free) +{ + size_t i; + virResctrlAllocMemBWPtr mem_bw_alloc = alloc->mem_bw; + virResctrlAllocMemBWPtr mem_bw_free = free->mem_bw; + virResctrlInfoMemBWPtr mem_bw_info = resctrl->membw_info; + + if (!mem_bw_alloc) + return 0;
So can we even have one of these if resctrl->membw_info == NULL? The ->mem_bw is allocated in virResctrlSetMemoryBandwidth, which is only defined, but not called. It will be called later in patch8 from a Parse API I see.
Yes. If host doesn't have MBA feature, however someone try to allocate memory bandwidth for a VM by mistake. In such a case, we will have resctrl->membw_info == NULL && ->mem_bw != NULL Then below check will throw error.
+ + if (mem_bw_alloc && !mem_bw_info) {
In virResctrlGetMemoryBandwidthInfo it's "OK" if bandwidth_gran is not present - a VIR_INFO is generated, the resctrl->membw_info is left as NULL, and we return 0.
So failing here would seemingly be quite unexpected, wouldn't it?
Yes! you are right. Maybe someone try to allocate memory bandwidth on a host without MBA support.
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("RDT Memory Bandwidth allocation " + "unsupported")); + return -1; + } + + for (i = 0; i < mem_bw_alloc->nbandwidths; i++) { + if (!mem_bw_alloc->bandwidths[i]) + continue; + + if (*(mem_bw_alloc->bandwidths[i]) % mem_bw_info->bandwidth_granularity) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is not divisible by granularity %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->bandwidth_granularity); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) < mem_bw_info->min_bandwidth) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is smaller than the minimum " + "allowed allocation %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->min_bandwidth); + return -1; + } + if (i > mem_bw_info->max_id) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("bandwidth controller %zd not exist, "
s/%zd/id %zd/
s/not/does not/
Will update.
+ "max controller id %u"), + i, mem_bw_info->max_id); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) > *(mem_bw_free->bandwidths[i])) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Not enough room for allocation of %u%% " + "bandwidth on node %zd, available bandwidth %u%%"), + *(mem_bw_alloc->bandwidths[i]), i, + *(mem_bw_free->bandwidths[i])); + return -1; + } + }
So this is just a Check or Validate type function? Should it be renamed or should it actually do something.
er.... from code, it seems just a validate function to grantee enough memory bandwidth available. We can think this function is allocating memory bandwidth from free_memory_bandwidth. But the free_memory_bandwidth is calculated by 100 - sum(memory bandwidth of existing resctrl group). So there is no any management structure used to track free memory bandwidth as other "allocate" or "free" functions, eg memory manange code. Maybe function name need to be adjust. I am not good at naming :(. Do you have a suggestion?
+ return 0; +} + +
More stuff for the schemata parse/format
+static int +virResctrlAllocParseProcessMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *mem_bw) +{ + unsigned int bandwidth; + unsigned int id; + char *tmp = NULL; + + tmp = strchr(mem_bw, '='); + if (!tmp) + return 0; + *tmp = '\0'; + tmp++; + + if (virStrToLong_uip(mem_bw, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid node id %u "), id); + return -1; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid bandwidth %u"), bandwidth); + return -1; + } + if (bandwidth < resctrl->membw_info->min_bandwidth || + id > resctrl->membw_info->max_id) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Missing or inconsistent resctrl info for " + "memory bandwidth node '%u'"), id); + return -1; + } + if (alloc->mem_bw->nbandwidths <= id && + VIR_EXPAND_N(alloc->mem_bw->bandwidths, alloc->mem_bw->nbandwidths, + id - alloc->mem_bw->nbandwidths + 1) < 0) { + return -1; + } + if (!alloc->mem_bw->bandwidths[id]) { + if (VIR_ALLOC(alloc->mem_bw->bandwidths[id]) < 0) + return -1; + } + + *(alloc->mem_bw->bandwidths[id]) = bandwidth; + return 0; +} + +
Ditto for schemata parse/format...
+static int +virResctrlAllocParseMemoryBandwidthLine(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *line) +{ + char **mbs = NULL; + char *tmp = NULL; + size_t nmbs = 0; + size_t i; + int ret = -1; + + /* For no reason there can be spaces */ + virSkipSpaces((const char **) &line); + + if (STRNEQLEN(line, "MB", 2)) + return 0; + + if (!resctrl || !resctrl->membw_info || + !resctrl->membw_info->min_bandwidth || + !resctrl->membw_info->bandwidth_granularity) {
These checks seem out of place. If we get this far without !resctrl? Wow... And erroring out if !resctrl->membw_info even though virResctrlGetMemoryBandwidthInfo can succeed without it.
As for the min_bandwidth and bandwidth_granularity checks - those would seem to be more appropriate in patch4 wouldn't they? when the values are read... If they're read as zero, then I assume that's bad ;-).
Yes. Such thing should not happen. Above check means find a resctrl group with memory bandwidth information on a host without MBA support. Although it is harmless, we can remove this if you like. min_bandwidth and bandwidth_granularity checks should be part of this patch I think. Those parameters are used for sanity check for memory bandwidth allocation, though they are read in patch 4. Do you agree? :)
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing or inconsistent resctrl info for " + "memory bandwidth allocation")); + } + + if (!alloc->mem_bw) { + if (VIR_ALLOC(alloc->mem_bw) < 0) + return -1; + } + + tmp = strchr(line, ':'); + if (!tmp) + return 0; + tmp++; + + mbs = virStringSplitCount(tmp, ";", 0, &nmbs); + if (nmbs == 0) + return 0; + + for (i = 0; i < nmbs; i++) { + if (virResctrlAllocParseProcessMemoryBandwidth(resctrl, alloc, mbs[i]) < 0) + goto cleanup; + } + ret = 0; + cleanup: + virStringListFree(mbs); + return ret; +} + + static int virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf) { @@ -1013,6 +1305,11 @@ virResctrlAllocFormat(virResctrlAllocPtr alloc) return NULL; }
For schemata parse/format
+ if (virResctrlAllocMemoryBandwidthFormat(alloc, &buf) < 0) { + virBufferFreeAndReset(&buf); + return NULL; + } + return virBufferContentAndReset(&buf); }
@@ -1139,6 +1436,8 @@ virResctrlAllocParse(virResctrlInfoPtr resctrl,
lines = virStringSplitCount(schemata, "\n", 0, &nlines); for (i = 0; i < nlines; i++) {
For schemata parse/format
+ if (virResctrlAllocParseMemoryBandwidthLine(resctrl, alloc, lines[i]) < 0) + goto cleanup; if (virResctrlAllocParseCacheLine(resctrl, alloc, lines[i]) < 0) goto cleanup; } @@ -1273,6 +1572,22 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info) } }
+ /* set default free memory bandwidth to 100%*/ + if (info->membw_info) { + if (VIR_ALLOC(ret->mem_bw) < 0) + goto error; + + if (VIR_EXPAND_N(ret->mem_bw->bandwidths, ret->mem_bw->nbandwidths, + info->membw_info->max_id + 1) < 0) + goto error; + + for (i = 0; i < ret->mem_bw->nbandwidths; i++) { + if (VIR_ALLOC(ret->mem_bw->bandwidths[i]) < 0) + goto error; + *(ret->mem_bw->bandwidths[i]) = 100; + } + } + cleanup: virBitmapFree(mask); return ret; @@ -1284,13 +1599,14 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info)
/* * This function creates an allocation that represents all unused parts of all - * caches in the system. It uses virResctrlInfo for creating a new full - * allocation with all bits set (using virResctrlAllocNewFromInfo()) and then - * scans for all allocations under /sys/fs/resctrl and subtracts each one of - * them from it. That way it can then return an allocation with only bit set - * being those that are not mentioned in any other allocation. It is used for - * two things, a) calculating the masks when creating allocations and b) from - * tests. + * caches and memory bandwidth in the system. It uses virResctrlInfo for + * creating a new full allocation with all bits set (using + * virResctrlAllocNewFromInfo()), memory bandwidth 100% and then scans + * for all allocations under /sys/fs/resctrl and subtracts each one of them + * from it. That way it can then return an allocation with only bit set + * being those that are not mentioned in any other allocation for CAT and + * available memory bandwidth for MBA. It is used for two things, a) calculating + * the masks and bandwidth available when creating allocations and b) from tests. */ virResctrlAllocPtr virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) @@ -1336,6 +1652,7 @@ virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) goto error; }
Need to think about this, but perhaps would be appropriate for schemata format/parse, but could be something for that 3rd patch out of this one.
+ virResctrlMemoryBandwidthSubtract(ret, alloc); virResctrlAllocSubtract(ret, alloc); virObjectUnref(alloc); alloc = NULL; @@ -1526,8 +1843,8 @@ virResctrlAllocCopyMasks(virResctrlAllocPtr dst,
/* * This function is called when creating an allocation in the system. What it - * does is that it gets all the unused bits using virResctrlAllocGetUnused() and - * then tries to find a proper space for every requested allocation effectively + * does is that it gets all the unused resources using virResctrlAllocGetUnused() + * and then tries to find a proper space for every requested allocation effectively * transforming `sizes` into `masks`. */ static int @@ -1547,6 +1864,9 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl, if (!alloc_default) goto cleanup;
Still need to come to grips with the 'best place' for this - perhaps a 3rd patch.
+ if (virResctrlAllocMemoryBandwidth(resctrl, alloc, alloc_free) < 0) + goto cleanup; + if (virResctrlAllocCopyMasks(alloc, alloc_default) < 0) goto cleanup;
Let's see whether I can help you split these up a bit. There's a few things to answer from what I looked at. So far nothing is necessarily wrong, it's just the splitting of things that makes it easier to think about. The split you did has certainly helped thus far though.
John
I'll keep plugging away tomorrow.
Thanks for your review and help! :) I just want to echo again. We will split this patch as: 1. introduce virResctrlAllocMemBW and its alloc and free part 2. introduce schemata file parsing part for MBA. 3. introduce functions used to calculate how much memory bandwidth is used (that is find all groups already create) 4. introduce interface function used to set memory bandwidth (virResctrlSetMemoryBandwidth). 5. introduce logic create resctrl group with memory bandwidth. I may do some adjustment(merge or split above 5 patch) when try to split this patch. Anyway, above 5 sub-patches are what we try to achive. How do you think? Bing
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d657c06..d43fd31 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -73,6 +73,10 @@ typedef int virResctrlAllocForeachCacheCallback(unsigned int level, unsigned long long size, void *opaque);
+typedef int virResctrlAllocForeachMemoryCallback(unsigned int id, + unsigned int size, + void *opaque); + virResctrlAllocPtr virResctrlAllocNew(void);
@@ -85,6 +89,15 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, virCacheType type, unsigned int cache, unsigned long long size); +int +virResctrlAllocForeachMemory(virResctrlAllocPtr resctrl, + virResctrlAllocForeachMemoryCallback cb, + void *opaque); + +int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth);
int virResctrlAllocForeachCache(virResctrlAllocPtr alloc,

On 07/26/2018 02:00 AM, bing.niu wrote:
On 2018年07月26日 06:37, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Add memory bandwidth allocation support to virresctrl class. Introducing virResctrlAllocMemBW which is used for allocating memory bandwidth. Following virResctrlAllocPerType, it also employs a nested sparse array to indicate whether allocation is available for particular last level cache.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/util/virresctrl.c | 346 ++++++++++++++++++++++++++++++++++++++++++++++++-- src/util/virresctrl.h | 13 ++ 2 files changed, 346 insertions(+), 13 deletions(-)
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 06e2702..bec2afd 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -36,9 +36,9 @@ VIR_LOG_INIT("util.virresctrl") /* Resctrl is short for Resource Control. It might be implemented for various - * resources, but at the time of this writing this is only supported for cache - * allocation technology (aka CAT). Hence the reson for leaving 'Cache' out of - * all the structure and function names for now (can be added later if needed. + * resources. Currently this supports cache allocation technology (aka CAT) and + * memory bandwidth allocation (aka MBA). More resources technologies may be + * added in feature.
s/feature/the future/
Will change.
*/
[....]
virObject parent; virResctrlAllocPerLevelPtr *levels; size_t nlevels; + virResctrlAllocMemBWPtr mem_bw; + /* The identifier (any unique string for now) */ char *id; /* libvirt-generated path in /sys/fs/resctrl for this particular @@ -275,6 +302,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(level); } + if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + VIR_FREE(mem_bw->bandwidths[i]); + } + + VIR_FREE(alloc->mem_bw);
NIT: Could be inside the if condition.
OK~
VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -697,6 +731,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (!alloc) return true; + if (alloc->mem_bw) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i]; @@ -890,6 +927,27 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, int +virResctrlAllocForeachMemory(virResctrlAllocPtr alloc, + virResctrlAllocForeachMemoryCallback cb, + void *opaque) +{ + size_t i = 0; + + if (!alloc) + return 0; + + if (alloc->mem_bw) { + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + for (i = 0; i < mem_bw->nbandwidths; i++) + if (mem_bw->bandwidths[i]) + cb(i, *mem_bw->bandwidths[i], opaque);
@cb is an int function, but only ever used in what amounts to a void function in patch8 (virDomainMemorytuneDefFormatHelper - although defined as int, it only ever returns 0, so it could be void too).
So either this changes to handle that or we change the *Callback to be void. Do you have a preference?
Let's add a return value testing in patch8. I think we should change cachetune virDomainCachetuneDefFormatHelper part also. Since it also doesn't check return value of virResctrlAllocForeachCache.
IMO, Giving a return value of function is always a good practice. ;) It's also give some spaces for the future extension. If logic inside @cb became complex. :)
Yeah, typically those Foreach type function want to return something such that we stop processing in the event of some failure, no need to keep going and pile on the errors or have some unexpected success!
+ } + + return 0; +} + + +int virResctrlAllocForeachCache(virResctrlAllocPtr alloc, virResctrlAllocForeachCacheCallback cb, void *opaque) @@ -952,6 +1010,240 @@ virResctrlAllocGetID(virResctrlAllocPtr alloc) }
The next hunk of changes has perhaps more to do with the functionality of the code vs. the pure alloc/free of the structures. I think that alloc/free needs to be split from Parse/Format - it'll make things easier. Then there's Parse of XML vs Parse of the schemata for the MB: lines which just jumbles things (in my mind ;-))
This particular change may even be a 3rd patch... Keep reading ;-)
Yes. I just realize put everything relating with ResctrlAlloc into one patch will give difficulties to the reviewer. I should split this patch. So how about we split this patch into:
1. introduce virResctrlAllocMemBW and its alloc and free part 2. introduce schemata file parsing part for MBA. 3. introduce functions used to calculate how much memory bandwidth is used (that is find all groups already create) 4. introduce interface function used to set memory bandwidth (virResctrlSetMemoryBandwidth). 5. introduce logic create resctrl group with memory bandwidth.
So I've done some of this already before reading your response - I'll send what I have directly to you though. "So far" what I came up with from just this patch: util: Add MBA allocation to virresctrl -> Just the allocation and dispose changes util: Add MBA schemata parse and format methods -> Split out virResctrlAllocMemoryBandwidthFormat and virResctrlAllocParseMemoryBandwidthLine -> Changed the order of virResctrlAllocParse to L* first and MB second. It shouldn't matter, but it is the order it's formatted util: Add support to calculate MBA utilization -> Split out virResctrlMemoryBandwidthSubtract and virResctrlAllocMemoryBandwidth -> NB, for these two I moved them "closer" to their usage rather than mixed up in the Parse/Format methods above. -> I have a question for virResctrlMemoryBandwidthSubtract though. I noted that virResctrlAllocSubtract gets called twice - once before the while loop and once during the while loop. So should the same occur for the BandwidthSubtract? util: Introduce virResctrlAllocForeachMemory -> Split out just the one API -> Modify the code slightly to return 0 earlier if !alloc->mem_bw -> Modify the for loop to return -1 if @cb fails -> Add the API to src/libvirt_private.syms util: Introduce virResctrlAllocSetMemoryBandwidth -> All that was left over -> Rename from virResctrlSetMemoryBandwidth -> Move code to be "similar" to the virResctrlAllocSetCacheSize such that it's before the AllocForeach API -> Add the API to src/libvirt_private.syms
+static void +virResctrlMemoryBandwidthSubtract(virResctrlAllocPtr free, + virResctrlAllocPtr used) +{ + size_t i; + + if (!used->mem_bw) + return; + + for (i = 0; i < used->mem_bw->nbandwidths; i++) { + if (used->mem_bw->bandwidths[i]) + *(free->mem_bw->bandwidths[i]) -= *(used->mem_bw->bandwidths[i]); + } +} + +
Might be nice to put some comments here. Since this is a backend of sorts for the XML Parse API from patch8, let's wait to introduce it there. That is it moves to patch8.
I will add some comments here. And put into a separated patch. Since patch8 is used to parse XML and create resctrl with MBA information purely. So it's better we don't introduce this interface function together with XML parse. We can introduce interface first and wait our consumer patch8. :) How do you think?
Upon further reflection, introducing before is fine - you'll see in what I send to you.
+int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth) +{ + virResctrlAllocMemBWPtr mem_bw = alloc->mem_bw; + + if (!mem_bw) { + if (VIR_ALLOC(mem_bw) < 0) + return -1; + alloc->mem_bw = mem_bw; + } + + if (mem_bw->nbandwidths <= id && + VIR_EXPAND_N(mem_bw->bandwidths, mem_bw->nbandwidths, + id - mem_bw->nbandwidths + 1) < 0) + return -1; + + if (mem_bw->bandwidths[id]) { + virReportError(VIR_ERR_XML_ERROR, + _("Memory Bandwidth already defined for node %u"), + id); + return -1; + } + + if (VIR_ALLOC(mem_bw->bandwidths[id]) < 0) + return -1; + + *(mem_bw->bandwidths[id]) = memory_bandwidth; + return 0; +} + +
Seeing Format and Parse API's was confusing until I remembered that these are for the schemata files. I think perhaps this section could have been commented a bit more to indicate what it's for.
Will add comments and put this into a dedicated patch with other schemata file parse and formating functions.
+static int +virResctrlAllocMemoryBandwidthFormat(virResctrlAllocPtr alloc, + virBufferPtr buf) +{ + size_t i; + + if (!alloc->mem_bw) + return 0; + + virBufferAddLit(buf, "MB:"); + + for (i = 0; i < alloc->mem_bw->nbandwidths; i++) { + if (alloc->mem_bw->bandwidths[i]) { + virBufferAsprintf(buf, "%zd=%u;", i, + *(alloc->mem_bw->bandwidths[i])); + } + } + + virBufferTrim(buf, ";", 1); + virBufferAddChar(buf, '\n'); + if (virBufferCheckError(buf) < 0) + return -1; + else + return 0;
return virBufferCheckError(buf);
OK.
+} + +
This API is the backend of the virResctrlAllocAssign called during the qemuProcessResctrlCreate processing - so it may be a third patch that can be extracted from this one. I need to look a bit harder though. Similarly the *Subtract API may fall into this category.
Yes~
+static int +virResctrlAllocMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + virResctrlAllocPtr free) +{ + size_t i; + virResctrlAllocMemBWPtr mem_bw_alloc = alloc->mem_bw; + virResctrlAllocMemBWPtr mem_bw_free = free->mem_bw; + virResctrlInfoMemBWPtr mem_bw_info = resctrl->membw_info; + + if (!mem_bw_alloc) + return 0;
So can we even have one of these if resctrl->membw_info == NULL? The ->mem_bw is allocated in virResctrlSetMemoryBandwidth, which is only defined, but not called. It will be called later in patch8 from a Parse API I see.
Yes. If host doesn't have MBA feature, however someone try to allocate memory bandwidth for a VM by mistake. In such a case, we will have resctrl->membw_info == NULL && ->mem_bw != NULL Then below check will throw error.
+ + if (mem_bw_alloc && !mem_bw_info) {
In virResctrlGetMemoryBandwidthInfo it's "OK" if bandwidth_gran is not present - a VIR_INFO is generated, the resctrl->membw_info is left as NULL, and we return 0.
So failing here would seemingly be quite unexpected, wouldn't it?
Yes! you are right. Maybe someone try to allocate memory bandwidth on a host without MBA support.
I didn't change any of the error scenarios in what I'm sending to you... Just be sure to add a few more comments about what's being checked. Since everything was packed into one patch as I'm reviewing I'm thinking, is this the internal or external caller.
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("RDT Memory Bandwidth allocation " + "unsupported")); + return -1; + } + + for (i = 0; i < mem_bw_alloc->nbandwidths; i++) { + if (!mem_bw_alloc->bandwidths[i]) + continue; + + if (*(mem_bw_alloc->bandwidths[i]) % mem_bw_info->bandwidth_granularity) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is not divisible by granularity %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->bandwidth_granularity); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) < mem_bw_info->min_bandwidth) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Memory Bandwidth allocation of size " + "%u is smaller than the minimum " + "allowed allocation %u"), + *(mem_bw_alloc->bandwidths[i]), + mem_bw_info->min_bandwidth); + return -1; + } + if (i > mem_bw_info->max_id) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("bandwidth controller %zd not exist, "
s/%zd/id %zd/
s/not/does not/
Will update.
+ "max controller id %u"), + i, mem_bw_info->max_id); + return -1; + } + if (*(mem_bw_alloc->bandwidths[i]) > *(mem_bw_free->bandwidths[i])) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Not enough room for allocation of %u%% " + "bandwidth on node %zd, available bandwidth %u%%"), + *(mem_bw_alloc->bandwidths[i]), i, + *(mem_bw_free->bandwidths[i])); + return -1; + } + }
So this is just a Check or Validate type function? Should it be renamed or should it actually do something.
er.... from code, it seems just a validate function to grantee enough memory bandwidth available. We can think this function is allocating memory bandwidth from free_memory_bandwidth. But the free_memory_bandwidth is calculated by 100 - sum(memory bandwidth of existing resctrl group). So there is no any management structure used to track free memory bandwidth as other "allocate" or "free" functions, eg memory manange code. Maybe function name need to be adjust. I am not good at naming :(. Do you have a suggestion?
As another Red Hat libvirt engineer says, "Naming is hard". Problem is there's a lot of opinions about the perfect name. If this is a Check or Validation function, then let's use that in the name. As it is now as virResctrlAllocMemoryBandwidth is just too generic, using the virResctrlAlloc 'prefix', perhaps an adjustment to virResctrlAllocCheckMemoryBandwidth is sufficient with of course a few extra comments about it's purpose.
+ return 0; +} + +
More stuff for the schemata parse/format
+static int +virResctrlAllocParseProcessMemoryBandwidth(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *mem_bw) +{ + unsigned int bandwidth; + unsigned int id; + char *tmp = NULL; + + tmp = strchr(mem_bw, '='); + if (!tmp) + return 0; + *tmp = '\0'; + tmp++; + + if (virStrToLong_uip(mem_bw, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid node id %u "), id); + return -1; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid bandwidth %u"), bandwidth); + return -1; + } + if (bandwidth < resctrl->membw_info->min_bandwidth || + id > resctrl->membw_info->max_id) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Missing or inconsistent resctrl info for " + "memory bandwidth node '%u'"), id); + return -1; + } + if (alloc->mem_bw->nbandwidths <= id && + VIR_EXPAND_N(alloc->mem_bw->bandwidths, alloc->mem_bw->nbandwidths, + id - alloc->mem_bw->nbandwidths + 1) < 0) { + return -1; + } + if (!alloc->mem_bw->bandwidths[id]) { + if (VIR_ALLOC(alloc->mem_bw->bandwidths[id]) < 0) + return -1; + } + + *(alloc->mem_bw->bandwidths[id]) = bandwidth; + return 0; +} + +
Ditto for schemata parse/format...
+static int +virResctrlAllocParseMemoryBandwidthLine(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + char *line) +{ + char **mbs = NULL; + char *tmp = NULL; + size_t nmbs = 0; + size_t i; + int ret = -1; + + /* For no reason there can be spaces */ + virSkipSpaces((const char **) &line); + + if (STRNEQLEN(line, "MB", 2)) + return 0; + + if (!resctrl || !resctrl->membw_info || + !resctrl->membw_info->min_bandwidth || + !resctrl->membw_info->bandwidth_granularity) {
These checks seem out of place. If we get this far without !resctrl? Wow... And erroring out if !resctrl->membw_info even though virResctrlGetMemoryBandwidthInfo can succeed without it.
As for the min_bandwidth and bandwidth_granularity checks - those would seem to be more appropriate in patch4 wouldn't they? when the values are read... If they're read as zero, then I assume that's bad ;-).
Yes. Such thing should not happen. Above check means find a resctrl group with memory bandwidth information on a host without MBA support. Although it is harmless, we can remove this if you like. min_bandwidth and bandwidth_granularity checks should be part of this patch I think. Those parameters are used for sanity check for memory bandwidth allocation, though they are read in patch 4. Do you agree? :)
Yeah, I think parameter value validation is fine at read time. I know there's other code that does that. I didn't check if the CAT code did that though. I think in the long term you need to decide - if I found an MB: line in the schemata file, then can it even be possible that membw_info is NULL. I suppose it is an error - again with all the code being added together in one patch, it was "confusing" to consider the Parse/Format of the schemata file. I suppose in the back of my mind I had he parse/format of the XML file...
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing or inconsistent resctrl info for " + "memory bandwidth allocation")); + } + + if (!alloc->mem_bw) { + if (VIR_ALLOC(alloc->mem_bw) < 0) + return -1; + } + + tmp = strchr(line, ':'); + if (!tmp) + return 0; + tmp++; + + mbs = virStringSplitCount(tmp, ";", 0, &nmbs); + if (nmbs == 0) + return 0; + + for (i = 0; i < nmbs; i++) { + if (virResctrlAllocParseProcessMemoryBandwidth(resctrl, alloc, mbs[i]) < 0) + goto cleanup; + } + ret = 0; + cleanup: + virStringListFree(mbs); + return ret; +} + + static int virResctrlAllocFormatCache(virResctrlAllocPtr alloc, virBufferPtr buf) { @@ -1013,6 +1305,11 @@ virResctrlAllocFormat(virResctrlAllocPtr alloc) return NULL; }
For schemata parse/format
+ if (virResctrlAllocMemoryBandwidthFormat(alloc, &buf) < 0) { + virBufferFreeAndReset(&buf); + return NULL; + } + return virBufferContentAndReset(&buf); } @@ -1139,6 +1436,8 @@ virResctrlAllocParse(virResctrlInfoPtr resctrl, lines = virStringSplitCount(schemata, "\n", 0, &nlines); for (i = 0; i < nlines; i++) {
For schemata parse/format
+ if (virResctrlAllocParseMemoryBandwidthLine(resctrl, alloc, lines[i]) < 0) + goto cleanup; if (virResctrlAllocParseCacheLine(resctrl, alloc, lines[i]) < 0) goto cleanup; } @@ -1273,6 +1572,22 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info) } } + /* set default free memory bandwidth to 100%*/ + if (info->membw_info) { + if (VIR_ALLOC(ret->mem_bw) < 0) + goto error; + + if (VIR_EXPAND_N(ret->mem_bw->bandwidths, ret->mem_bw->nbandwidths, + info->membw_info->max_id + 1) < 0) + goto error; + + for (i = 0; i < ret->mem_bw->nbandwidths; i++) { + if (VIR_ALLOC(ret->mem_bw->bandwidths[i]) < 0) + goto error; + *(ret->mem_bw->bandwidths[i]) = 100; + } + } + cleanup: virBitmapFree(mask); return ret; @@ -1284,13 +1599,14 @@ virResctrlAllocNewFromInfo(virResctrlInfoPtr info) /* * This function creates an allocation that represents all unused parts of all - * caches in the system. It uses virResctrlInfo for creating a new full - * allocation with all bits set (using virResctrlAllocNewFromInfo()) and then - * scans for all allocations under /sys/fs/resctrl and subtracts each one of - * them from it. That way it can then return an allocation with only bit set - * being those that are not mentioned in any other allocation. It is used for - * two things, a) calculating the masks when creating allocations and b) from - * tests. + * caches and memory bandwidth in the system. It uses virResctrlInfo for + * creating a new full allocation with all bits set (using + * virResctrlAllocNewFromInfo()), memory bandwidth 100% and then scans + * for all allocations under /sys/fs/resctrl and subtracts each one of them + * from it. That way it can then return an allocation with only bit set + * being those that are not mentioned in any other allocation for CAT and + * available memory bandwidth for MBA. It is used for two things, a) calculating + * the masks and bandwidth available when creating allocations and b) from tests. */ virResctrlAllocPtr virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) @@ -1336,6 +1652,7 @@ virResctrlAllocGetUnused(virResctrlInfoPtr resctrl) goto error; }
Need to think about this, but perhaps would be appropriate for schemata format/parse, but could be something for that 3rd patch out of this one.
+ virResctrlMemoryBandwidthSubtract(ret, alloc); virResctrlAllocSubtract(ret, alloc); virObjectUnref(alloc); alloc = NULL; @@ -1526,8 +1843,8 @@ virResctrlAllocCopyMasks(virResctrlAllocPtr dst, /* * This function is called when creating an allocation in the system. What it - * does is that it gets all the unused bits using virResctrlAllocGetUnused() and - * then tries to find a proper space for every requested allocation effectively + * does is that it gets all the unused resources using virResctrlAllocGetUnused() + * and then tries to find a proper space for every requested allocation effectively * transforming `sizes` into `masks`. */ static int @@ -1547,6 +1864,9 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl, if (!alloc_default) goto cleanup;
Still need to come to grips with the 'best place' for this - perhaps a 3rd patch.
+ if (virResctrlAllocMemoryBandwidth(resctrl, alloc, alloc_free) < 0) + goto cleanup; + if (virResctrlAllocCopyMasks(alloc, alloc_default) < 0) goto cleanup;
Let's see whether I can help you split these up a bit. There's a few things to answer from what I looked at. So far nothing is necessarily wrong, it's just the splitting of things that makes it easier to think about. The split you did has certainly helped thus far though.
John
I'll keep plugging away tomorrow.
Thanks for your review and help! :) I just want to echo again. We will split this patch as:
1. introduce virResctrlAllocMemBW and its alloc and free part 2. introduce schemata file parsing part for MBA. 3. introduce functions used to calculate how much memory bandwidth is used (that is find all groups already create) 4. introduce interface function used to set memory bandwidth (virResctrlSetMemoryBandwidth). 5. introduce logic create resctrl group with memory bandwidth.
I may do some adjustment(merge or split above 5 patch) when try to split this patch. Anyway, above 5 sub-patches are what we try to achive. How do you think?
Bing
I'll send you what I have "so far" shortly. There are some formatting adjustments included, but for the most part it's taking this one patch and splitting it before the subsequent rename patch. The only other adjustment was to use the "new" name for the SetMemoryBandwidth John
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d657c06..d43fd31 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -73,6 +73,10 @@ typedef int virResctrlAllocForeachCacheCallback(unsigned int level, unsigned long long size, void *opaque); +typedef int virResctrlAllocForeachMemoryCallback(unsigned int id, + unsigned int size, + void *opaque); + virResctrlAllocPtr virResctrlAllocNew(void); @@ -85,6 +89,15 @@ virResctrlAllocSetCacheSize(virResctrlAllocPtr alloc, virCacheType type, unsigned int cache, unsigned long long size); +int +virResctrlAllocForeachMemory(virResctrlAllocPtr resctrl, + virResctrlAllocForeachMemoryCallback cb, + void *opaque); + +int +virResctrlSetMemoryBandwidth(virResctrlAllocPtr alloc, + unsigned int id, + unsigned int memory_bandwidth); int virResctrlAllocForeachCache(virResctrlAllocPtr alloc,

From: Bing Niu <bing.niu@intel.com> Resctrl not only supports cache tuning, but also memory bandwidth tuning. Renaming cachetune to restune(resource tuning) to reflect that. With restune, all allocation for different resources (cache, memory bandwidth) are aggregated and represented by a virResctrlAllocPtr inside virDomainRestuneDef. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 44 ++++++++++++++++++++++---------------------- src/conf/domain_conf.h | 10 +++++----- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_process.c | 18 +++++++++--------- 4 files changed, 37 insertions(+), 37 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index f259b4c..24fefd1 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2963,14 +2963,14 @@ virDomainLoaderDefFree(virDomainLoaderDefPtr loader) static void -virDomainCachetuneDefFree(virDomainCachetuneDefPtr cachetune) +virDomainRestuneDefFree(virDomainRestuneDefPtr restune) { - if (!cachetune) + if (!restune) return; - virObjectUnref(cachetune->alloc); - virBitmapFree(cachetune->vcpus); - VIR_FREE(cachetune); + virObjectUnref(restune->alloc); + virBitmapFree(restune->vcpus); + VIR_FREE(restune); } @@ -3160,9 +3160,9 @@ void virDomainDefFree(virDomainDefPtr def) virDomainShmemDefFree(def->shmems[i]); VIR_FREE(def->shmems); - for (i = 0; i < def->ncachetunes; i++) - virDomainCachetuneDefFree(def->cachetunes[i]); - VIR_FREE(def->cachetunes); + for (i = 0; i < def->nrestunes; i++) + virDomainRestuneDefFree(def->restunes[i]); + VIR_FREE(def->restunes); VIR_FREE(def->keywrap); @@ -18967,7 +18967,7 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = virResctrlAllocNew(); - virDomainCachetuneDefPtr tmp_cachetune = NULL; + virDomainRestuneDefPtr tmp_restune = NULL; char *tmp = NULL; char *vcpus_str = NULL; char *alloc_id = NULL; @@ -18980,7 +18980,7 @@ virDomainCachetuneDefParse(virDomainDefPtr def, if (!alloc) goto cleanup; - if (VIR_ALLOC(tmp_cachetune) < 0) + if (VIR_ALLOC(tmp_restune) < 0) goto cleanup; vcpus_str = virXMLPropString(node, "vcpus"); @@ -19021,8 +19021,8 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; } - for (i = 0; i < def->ncachetunes; i++) { - if (virBitmapOverlaps(def->cachetunes[i]->vcpus, vcpus)) { + for (i = 0; i < def->nrestunes; i++) { + if (virBitmapOverlaps(def->restunes[i]->vcpus, vcpus)) { virReportError(VIR_ERR_XML_ERROR, "%s", _("Overlapping vcpus in cachetunes")); goto cleanup; @@ -19052,16 +19052,16 @@ virDomainCachetuneDefParse(virDomainDefPtr def, if (virResctrlAllocSetID(alloc, alloc_id) < 0) goto cleanup; - VIR_STEAL_PTR(tmp_cachetune->vcpus, vcpus); - VIR_STEAL_PTR(tmp_cachetune->alloc, alloc); + VIR_STEAL_PTR(tmp_restune->vcpus, vcpus); + VIR_STEAL_PTR(tmp_restune->alloc, alloc); - if (VIR_APPEND_ELEMENT(def->cachetunes, def->ncachetunes, tmp_cachetune) < 0) + if (VIR_APPEND_ELEMENT(def->restunes, def->nrestunes, tmp_restune) < 0) goto cleanup; ret = 0; cleanup: ctxt->node = oldnode; - virDomainCachetuneDefFree(tmp_cachetune); + virDomainRestuneDefFree(tmp_restune); virObjectUnref(alloc); virBitmapFree(vcpus); VIR_FREE(alloc_id); @@ -26913,7 +26913,7 @@ virDomainCachetuneDefFormatHelper(unsigned int level, static int virDomainCachetuneDefFormat(virBufferPtr buf, - virDomainCachetuneDefPtr cachetune, + virDomainRestuneDefPtr restune, unsigned int flags) { virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; @@ -26921,7 +26921,7 @@ virDomainCachetuneDefFormat(virBufferPtr buf, int ret = -1; virBufferSetChildIndent(&childrenBuf, buf); - virResctrlAllocForeachCache(cachetune->alloc, + virResctrlAllocForeachCache(restune->alloc, virDomainCachetuneDefFormatHelper, &childrenBuf); @@ -26934,14 +26934,14 @@ virDomainCachetuneDefFormat(virBufferPtr buf, goto cleanup; } - vcpus = virBitmapFormat(cachetune->vcpus); + vcpus = virBitmapFormat(restune->vcpus); if (!vcpus) goto cleanup; virBufferAsprintf(buf, "<cachetune vcpus='%s'", vcpus); if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { - const char *alloc_id = virResctrlAllocGetID(cachetune->alloc); + const char *alloc_id = virResctrlAllocGetID(restune->alloc); if (!alloc_id) goto cleanup; @@ -27062,8 +27062,8 @@ virDomainCputuneDefFormat(virBufferPtr buf, def->iothreadids[i]->iothread_id); } - for (i = 0; i < def->ncachetunes; i++) - virDomainCachetuneDefFormat(&childrenBuf, def->cachetunes[i], flags); + for (i = 0; i < def->nrestunes; i++) + virDomainCachetuneDefFormat(&childrenBuf, def->restunes[i], flags); if (virBufferCheckError(&childrenBuf) < 0) return -1; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0f10e24..64920fa 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2229,10 +2229,10 @@ struct _virDomainCputune { }; -typedef struct _virDomainCachetuneDef virDomainCachetuneDef; -typedef virDomainCachetuneDef *virDomainCachetuneDefPtr; +typedef struct _virDomainRestuneDef virDomainRestuneDef; +typedef virDomainRestuneDef *virDomainRestuneDefPtr; -struct _virDomainCachetuneDef { +struct _virDomainRestuneDef { virBitmapPtr vcpus; virResctrlAllocPtr alloc; }; @@ -2410,8 +2410,8 @@ struct _virDomainDef { virDomainCputune cputune; - virDomainCachetuneDefPtr *cachetunes; - size_t ncachetunes; + virDomainRestuneDefPtr *restunes; + size_t nrestunes; virDomainNumaPtr numa; virDomainResourceDefPtr resource; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ed76495..a9979ba 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3999,7 +3999,7 @@ qemuDomainDefValidate(const virDomainDef *def, } } - if (def->ncachetunes && + if (def->nrestunes && def->virtType != VIR_DOMAIN_VIRT_KVM) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("cachetune is only supported for KVM domains")); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index c903a8e..233ae94 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2559,7 +2559,7 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, virCapsPtr caps = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; - if (!vm->def->ncachetunes) + if (!vm->def->nrestunes) return 0; /* Force capability refresh since resctrl info can change @@ -2568,9 +2568,9 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, if (!caps) return -1; - for (i = 0; i < vm->def->ncachetunes; i++) { + for (i = 0; i < vm->def->nrestunes; i++) { if (virResctrlAllocCreate(caps->host.resctrl, - vm->def->cachetunes[i]->alloc, + vm->def->restunes[i]->alloc, priv->machineName) < 0) goto cleanup; } @@ -5376,8 +5376,8 @@ qemuProcessSetupVcpu(virDomainObjPtr vm, &vcpu->sched) < 0) return -1; - for (i = 0; i < vm->def->ncachetunes; i++) { - virDomainCachetuneDefPtr ct = vm->def->cachetunes[i]; + for (i = 0; i < vm->def->nrestunes; i++) { + virDomainRestuneDefPtr ct = vm->def->restunes[i]; if (virBitmapIsBitSet(ct->vcpus, vcpuid)) { if (virResctrlAllocAddPID(ct->alloc, vcpupid) < 0) @@ -7077,8 +7077,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, /* Remove resctrl allocation after cgroups are cleaned up which makes it * kind of safer (although removing the allocation should work even with * pids in tasks file */ - for (i = 0; i < vm->def->ncachetunes; i++) - virResctrlAllocRemove(vm->def->cachetunes[i]->alloc); + for (i = 0; i < vm->def->nrestunes; i++) + virResctrlAllocRemove(vm->def->restunes[i]->alloc); qemuProcessRemoveDomainStatus(driver, vm); @@ -7801,8 +7801,8 @@ qemuProcessReconnect(void *opaque) if (qemuConnectAgent(driver, obj) < 0) goto error; - for (i = 0; i < obj->def->ncachetunes; i++) { - if (virResctrlAllocDeterminePath(obj->def->cachetunes[i]->alloc, + for (i = 0; i < obj->def->nrestunes; i++) { + if (virResctrlAllocDeterminePath(obj->def->restunes[i]->alloc, priv->machineName) < 0) goto error; } -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Resctrl not only supports cache tuning, but also memory bandwidth tuning. Renaming cachetune to restune(resource tuning) to reflect that. With restune, all allocation for different resources (cache, memory bandwidth) are aggregated and represented by a virResctrlAllocPtr inside virDomainRestuneDef.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 44 ++++++++++++++++++++++---------------------- src/conf/domain_conf.h | 10 +++++----- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_process.c | 18 +++++++++--------- 4 files changed, 37 insertions(+), 37 deletions(-)
As I noted previously, not much a fan of Restune instead of Cachetune, but I understand the logic why you went that way. I wonder if "virDomainResAllocDef" is any better (resallocs, nresallocs)? or if that clashes with any other namespace so far? or is too close to virResctrlAllocPtr. Or perhaps "virDomainResCtrlDef" w/ resctrls and nresctrls to mimic the virresctrl.{c,h} naming scheme. As previously stated, "Naming is hard"... Wish there was more feedback than just me on this, but in the long run, I'll go with whatever the Intel team agrees upon as it's not that big a deal. If someone else has agita after things are pushed and wants to change the name, then they know how to send patches. John [...]

> -----Original Message----- > From: John Ferlan [mailto:jferlan@redhat.com] > Sent: Friday, July 27, 2018 12:33 AM > To: Niu, Bing <bing.niu@intel.com>; libvir-list@redhat.com > Cc: Feng, Shaohe <shaohe.feng@intel.com>; Wang, Huaqiang > <huaqiang.wang@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; > rui.zang@yandex.com > Subject: Re: [libvirt] [PATCH 6/9] conf: Rename cachetune to restune > > > > On 07/18/2018 03:57 AM, bing.niu@intel.com wrote: > > From: Bing Niu <bing.niu@intel.com> > > > > Resctrl not only supports cache tuning, but also memory bandwidth > > tuning. Renaming cachetune to restune(resource tuning) to reflect > > that. With restune, all allocation for different resources (cache, > > memory bandwidth) are aggregated and represented by a > > virResctrlAllocPtr inside virDomainRestuneDef. > > > > Signed-off-by: Bing Niu <bing.niu@intel.com> > > --- > > src/conf/domain_conf.c | 44 > > ++++++++++++++++++++++---------------------- > > src/conf/domain_conf.h | 10 +++++----- src/qemu/qemu_domain.c | 2 > > +- src/qemu/qemu_process.c | 18 +++++++++--------- > > 4 files changed, 37 insertions(+), 37 deletions(-) > > > > As I noted previously, not much a fan of Restune instead of Cachetune, but I > understand the logic why you went that way. > > I wonder if "virDomainResAllocDef" is any better (resallocs, nresallocs)? or if > that clashes with any other namespace so far? or is too close to > virResctrlAllocPtr. >From Huaqiang: Hi Bing and John, this is Huaqiang and I am working on preparing libvirt patches to support cache monitoring (CMT) feature and memory bandwidth monitoring feature(MBM). I am wondering if we can consider making a further step of renaming of 'virDomainResAllocDef' in the future to accommodate RDT monitoring feature (CMT and MBM): Using 'virDomainResDef' or 'virDomainCPUResDef' to substitute 'virDomainResAllocDef'? My considerations are below: - 'cache tune', 'memory bw tune' and the corresponding monitoring technologies are integrated in same underlying CPU resource control group (aka: resctrlfs group). They are sharing interfaces exposed interfaces of same resctrlfs subdirectory. - If we make a rename to 'virDomainResDef'/'virDomainCPUResDef', this data struct could be used for CMT/MBM feature in a very straightforward way. - And I prefer 'virDomainCPUResDef' than the other one which make it clear that it is dealing with CPU resources other than network or disk resources. I am also in thinking about raising a refactor patch in the future to rename 'virResctrlAllocPtr' to ' virResctrlPtr'(or virResctrlGroupPtr, still in figuring out an appropriate name...). Which will make it possible to reuse some of code existing in file virresctrl.c/h now with similar reasons I talked above. > > Or perhaps "virDomainResCtrlDef" w/ resctrls and nresctrls to mimic the > virresctrl.{c,h} naming scheme. > > As previously stated, "Naming is hard"... Wish there was more feedback than > just me on this, but in the long run, I'll go with whatever the Intel team agrees > upon as it's not that big a deal. If someone else has agita after things are pushed > and wants to change the name, then they know how to send patches. > > John > > [...]

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Friday, July 27, 2018 12:33 AM To: Niu, Bing <bing.niu@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Wang, Huaqiang <huaqiang.wang@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; rui.zang@yandex.com Subject: Re: [libvirt] [PATCH 6/9] conf: Rename cachetune to restune
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Resctrl not only supports cache tuning, but also memory bandwidth tuning. Renaming cachetune to restune(resource tuning) to reflect that. With restune, all allocation for different resources (cache, memory bandwidth) are aggregated and represented by a virResctrlAllocPtr inside virDomainRestuneDef.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 44 ++++++++++++++++++++++---------------------- src/conf/domain_conf.h | 10 +++++----- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_process.c | 18 +++++++++--------- 4 files changed, 37 insertions(+), 37 deletions(-)
As I noted previously, not much a fan of Restune instead of Cachetune, but I understand the logic why you went that way.
I wonder if "virDomainResAllocDef" is any better (resallocs, nresallocs)? or if that clashes with any other namespace so far? or is too close to virResctrlAllocPtr.
Or perhaps "virDomainResCtrlDef" w/ resctrls and nresctrls to mimic the virresctrl.{c,h} naming scheme.
Hi John, I have made a comment without reading above line... Yes! This is what I want to change for the naming. 'virDomainResCtrlDef' looks Ok for me.
As previously stated, "Naming is hard"... Wish there was more feedback than just me on this, but in the long run, I'll go with whatever the Intel team agrees upon as it's not that big a deal. If someone else has agita after things are pushed and wants to change the name, then they know how to send patches.
John
[...]

On 2018年07月27日 00:32, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Resctrl not only supports cache tuning, but also memory bandwidth tuning. Renaming cachetune to restune(resource tuning) to reflect that. With restune, all allocation for different resources (cache, memory bandwidth) are aggregated and represented by a virResctrlAllocPtr inside virDomainRestuneDef.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 44 ++++++++++++++++++++++---------------------- src/conf/domain_conf.h | 10 +++++----- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_process.c | 18 +++++++++--------- 4 files changed, 37 insertions(+), 37 deletions(-)
As I noted previously, not much a fan of Restune instead of Cachetune, but I understand the logic why you went that way.
I wonder if "virDomainResAllocDef" is any better (resallocs, nresallocs)? or if that clashes with any other namespace so far? or is too close to virResctrlAllocPtr.
Or perhaps "virDomainResCtrlDef" w/ resctrls and nresctrls to mimic the virresctrl.{c,h} naming scheme.
virDomainResCtrlDef is better. How about we did one puny adjustment. virDomainResctrlDef w/ resctrls and nresctrls? Use little 'c' can align with virresctrl.c function naming. ;)
As previously stated, "Naming is hard"... Wish there was more feedback than just me on this, but in the long run, I'll go with whatever the Intel team agrees upon as it's not that big a deal. If someone else has agita after things are pushed and wants to change the name, then they know how to send patches.
John
[...]

From: Bing Niu <bing.niu@intel.com> Refactoring virDomainCachetuneDefParse, extracting vcpus extracting, overlapping detecting and new resctrl allocation creating logic. Those two logic is common among different resource allocation technologies. Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 195 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 131 insertions(+), 64 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 24fefd1..695981c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18884,6 +18884,115 @@ virDomainDefParseBootOptions(virDomainDefPtr def, static int +virDomainRestuneParseVcpus(virDomainDefPtr def, + xmlNodePtr node, + virBitmapPtr *vcpus) +{ + char *vcpus_str = NULL; + int ret = -1; + + vcpus_str = virXMLPropString(node, "vcpus"); + if (!vcpus_str) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing cachetune attribute 'vcpus'")); + goto cleanup; + } + if (virBitmapParse(vcpus_str, vcpus, VIR_DOMAIN_CPUMASK_LEN) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid cachetune attribute 'vcpus' value '%s'"), + vcpus_str); + goto cleanup; + } + + /* We need to limit the bitmap to number of vCPUs. If there's nothing left, + * then we can just clean up and return 0 immediately */ + virBitmapShrink(*vcpus, def->maxvcpus); + + ret = 0; + cleanup: + VIR_FREE(vcpus_str); + return ret; +} + + +static int +virDomainFindResctrlAlloc(virDomainDefPtr def, + virBitmapPtr vcpus, + virResctrlAllocPtr *alloc) +{ + ssize_t i = 0; + + for (i = 0; i < def->nrestunes; i++) { + /* vcpus group has been created, directly use the existing one. + * Just updating memory allocation information of that group + */ + if (virBitmapEqual(def->restunes[i]->vcpus, vcpus)) { + *alloc = def->restunes[i]->alloc; + break; + } + if (virBitmapOverlaps(def->restunes[i]->vcpus, vcpus)) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Overlapping vcpus in restunes")); + return -1; + } + } + return 0; +} + + +static int +virDomainUpdateRestune(virDomainDefPtr def, + xmlNodePtr node, + virResctrlAllocPtr alloc, + virBitmapPtr vcpus, + unsigned int flags) +{ + char *vcpus_str = NULL; + char *alloc_id = NULL; + virDomainRestuneDefPtr tmp_restune = NULL; + int ret = -1; + + if (VIR_ALLOC(tmp_restune) < 0) + goto cleanup; + + /* We need to format it back because we need to be consistent in the naming + * even when users specify some "sub-optimal" string there. */ + vcpus_str = virBitmapFormat(vcpus); + if (!vcpus_str) + goto cleanup; + + if (!(flags & VIR_DOMAIN_DEF_PARSE_INACTIVE)) + alloc_id = virXMLPropString(node, "id"); + + if (!alloc_id) { + /* The number of allocations is limited and the directory structure is flat, + * not hierarchical, so we need to have all same allocations in one + * directory, so it's nice to have it named appropriately. For now it's + * 'vcpus_...' but it's designed in order for it to be changeable in the + * future (it's part of the status XML). */ + if (virAsprintf(&alloc_id, "vcpus_%s", vcpus_str) < 0) + goto cleanup; + } + + if (virResctrlAllocSetID(alloc, alloc_id) < 0) + goto cleanup; + + tmp_restune->vcpus = vcpus; + tmp_restune->alloc = alloc; + + if (VIR_APPEND_ELEMENT(def->restunes, def->nrestunes, tmp_restune) < 0) + goto cleanup; + + ret = 0; + cleanup: + virDomainRestuneDefFree(tmp_restune); + VIR_FREE(alloc_id); + VIR_FREE(vcpus_str); + return ret; +} + + +static int virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, xmlNodePtr node, virResctrlAllocPtr alloc) @@ -18966,39 +19075,16 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr oldnode = ctxt->node; xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; - virResctrlAllocPtr alloc = virResctrlAllocNew(); - virDomainRestuneDefPtr tmp_restune = NULL; - char *tmp = NULL; - char *vcpus_str = NULL; - char *alloc_id = NULL; + virResctrlAllocPtr alloc = NULL; ssize_t i = 0; int n; int ret = -1; + bool new_alloc = false; ctxt->node = node; - if (!alloc) - goto cleanup; - - if (VIR_ALLOC(tmp_restune) < 0) - goto cleanup; - - vcpus_str = virXMLPropString(node, "vcpus"); - if (!vcpus_str) { - virReportError(VIR_ERR_XML_ERROR, "%s", - _("Missing cachetune attribute 'vcpus'")); - goto cleanup; - } - if (virBitmapParse(vcpus_str, &vcpus, VIR_DOMAIN_CPUMASK_LEN) < 0) { - virReportError(VIR_ERR_XML_ERROR, - _("Invalid cachetune attribute 'vcpus' value '%s'"), - vcpus_str); + if (virDomainRestuneParseVcpus(def, node, &vcpus) < 0) goto cleanup; - } - - /* We need to limit the bitmap to number of vCPUs. If there's nothing left, - * then we can just clean up and return 0 immediately */ - virBitmapShrink(vcpus, def->maxvcpus); if (virBitmapIsAllClear(vcpus)) { ret = 0; @@ -19011,63 +19097,44 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; } - for (i = 0; i < n; i++) { - if (virDomainCachetuneDefParseCache(ctxt, nodes[i], alloc) < 0) + if (virDomainFindResctrlAlloc(def, vcpus, &alloc) < 0) + goto cleanup; + + if (!alloc) { + alloc = virResctrlAllocNew(); + if (!alloc) goto cleanup; - } + new_alloc = true; + } else { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Identical vcpus in cachetunes found")); - if (virResctrlAllocIsEmpty(alloc)) { - ret = 0; goto cleanup; } - for (i = 0; i < def->nrestunes; i++) { - if (virBitmapOverlaps(def->restunes[i]->vcpus, vcpus)) { - virReportError(VIR_ERR_XML_ERROR, "%s", - _("Overlapping vcpus in cachetunes")); + for (i = 0; i < n; i++) { + if (virDomainCachetuneDefParseCache(ctxt, nodes[i], alloc) < 0) goto cleanup; - } } - /* We need to format it back because we need to be consistent in the naming - * even when users specify some "sub-optimal" string there. */ - VIR_FREE(vcpus_str); - vcpus_str = virBitmapFormat(vcpus); - if (!vcpus_str) + if (virResctrlAllocIsEmpty(alloc)) { + ret = 0; goto cleanup; + } - if (!(flags & VIR_DOMAIN_DEF_PARSE_INACTIVE)) - alloc_id = virXMLPropString(node, "id"); - - if (!alloc_id) { - /* The number of allocations is limited and the directory structure is flat, - * not hierarchical, so we need to have all same allocations in one - * directory, so it's nice to have it named appropriately. For now it's - * 'vcpus_...' but it's designed in order for it to be changeable in the - * future (it's part of the status XML). */ - if (virAsprintf(&alloc_id, "vcpus_%s", vcpus_str) < 0) + if (new_alloc) { + if (virDomainUpdateRestune(def, node, alloc, vcpus, flags) < 0) goto cleanup; + vcpus = NULL; + alloc = NULL; } - if (virResctrlAllocSetID(alloc, alloc_id) < 0) - goto cleanup; - - VIR_STEAL_PTR(tmp_restune->vcpus, vcpus); - VIR_STEAL_PTR(tmp_restune->alloc, alloc); - - if (VIR_APPEND_ELEMENT(def->restunes, def->nrestunes, tmp_restune) < 0) - goto cleanup; - ret = 0; cleanup: ctxt->node = oldnode; - virDomainRestuneDefFree(tmp_restune); virObjectUnref(alloc); virBitmapFree(vcpus); - VIR_FREE(alloc_id); - VIR_FREE(vcpus_str); VIR_FREE(nodes); - VIR_FREE(tmp); return ret; } -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Refactoring virDomainCachetuneDefParse, extracting vcpus extracting, overlapping detecting and new resctrl allocation creating logic. Those two logic is common among different resource allocation technologies.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 195 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 131 insertions(+), 64 deletions(-)
You could make 3 patches out of this - one for each reduction of virDomainCachetuneDefParse... The virDomainFindResctrlAlloc should have used Restune instead of ResctrlAlloc, right? Of course that changes based on how the naming patch shakes out. Actually I think perhaps virDomainRestuneVcpuMatch would be even better (where Restune ends up being whatever shakes out of the previous patch naming decision). In this area, I'm wondering what purpose does @new_alloc serve? If @alloc was already defined, then you added an error message. So the reality is - @new_alloc is always true. Secondary to that you've added a new error related to identical vcpus in the parsing logic. Unfortunately that could make a domain disappear on a libvirtd restart if for some reason it was already defined in that manner. If there's a parsing error because two entries had identical cpus, then there will be an error. Errors would cause a previously OK/found domain to not be defined. So that type of check (now that the code has been around for more than a release) would need to go in some sort of Validation API. This is one of those libvirt define and libvirtd restart quirks. Finally, virDomainUpdateRestune should be virDomainRestuneUpdate since you started with virDomainRestuneParseVcpus and it should go after virDomainCachetuneDefParseCache Hope this all makes sense! John
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 24fefd1..695981c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18884,6 +18884,115 @@ virDomainDefParseBootOptions(virDomainDefPtr def,
static int +virDomainRestuneParseVcpus(virDomainDefPtr def, + xmlNodePtr node, + virBitmapPtr *vcpus) +{ + char *vcpus_str = NULL; + int ret = -1; + + vcpus_str = virXMLPropString(node, "vcpus"); + if (!vcpus_str) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing cachetune attribute 'vcpus'")); + goto cleanup; + } + if (virBitmapParse(vcpus_str, vcpus, VIR_DOMAIN_CPUMASK_LEN) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid cachetune attribute 'vcpus' value '%s'"), + vcpus_str); + goto cleanup; + } + + /* We need to limit the bitmap to number of vCPUs. If there's nothing left, + * then we can just clean up and return 0 immediately */ + virBitmapShrink(*vcpus, def->maxvcpus); + + ret = 0; + cleanup: + VIR_FREE(vcpus_str); + return ret; +} + + +static int +virDomainFindResctrlAlloc(virDomainDefPtr def, + virBitmapPtr vcpus, + virResctrlAllocPtr *alloc) +{ + ssize_t i = 0; + + for (i = 0; i < def->nrestunes; i++) { + /* vcpus group has been created, directly use the existing one. + * Just updating memory allocation information of that group + */ + if (virBitmapEqual(def->restunes[i]->vcpus, vcpus)) { + *alloc = def->restunes[i]->alloc; + break; + } + if (virBitmapOverlaps(def->restunes[i]->vcpus, vcpus)) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Overlapping vcpus in restunes")); + return -1; + } + } + return 0; +} + + +static int +virDomainUpdateRestune(virDomainDefPtr def, + xmlNodePtr node, + virResctrlAllocPtr alloc, + virBitmapPtr vcpus, + unsigned int flags) +{ + char *vcpus_str = NULL; + char *alloc_id = NULL; + virDomainRestuneDefPtr tmp_restune = NULL; + int ret = -1; + + if (VIR_ALLOC(tmp_restune) < 0) + goto cleanup; + + /* We need to format it back because we need to be consistent in the naming + * even when users specify some "sub-optimal" string there. */ + vcpus_str = virBitmapFormat(vcpus); + if (!vcpus_str) + goto cleanup; + + if (!(flags & VIR_DOMAIN_DEF_PARSE_INACTIVE)) + alloc_id = virXMLPropString(node, "id"); + + if (!alloc_id) { + /* The number of allocations is limited and the directory structure is flat, + * not hierarchical, so we need to have all same allocations in one + * directory, so it's nice to have it named appropriately. For now it's + * 'vcpus_...' but it's designed in order for it to be changeable in the + * future (it's part of the status XML). */ + if (virAsprintf(&alloc_id, "vcpus_%s", vcpus_str) < 0) + goto cleanup; + } + + if (virResctrlAllocSetID(alloc, alloc_id) < 0) + goto cleanup; + + tmp_restune->vcpus = vcpus; + tmp_restune->alloc = alloc; + + if (VIR_APPEND_ELEMENT(def->restunes, def->nrestunes, tmp_restune) < 0) + goto cleanup; + + ret = 0; + cleanup: + virDomainRestuneDefFree(tmp_restune); + VIR_FREE(alloc_id); + VIR_FREE(vcpus_str); + return ret; +} + + +static int virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, xmlNodePtr node, virResctrlAllocPtr alloc) @@ -18966,39 +19075,16 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr oldnode = ctxt->node; xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; - virResctrlAllocPtr alloc = virResctrlAllocNew(); - virDomainRestuneDefPtr tmp_restune = NULL; - char *tmp = NULL; - char *vcpus_str = NULL; - char *alloc_id = NULL; + virResctrlAllocPtr alloc = NULL; ssize_t i = 0; int n; int ret = -1; + bool new_alloc = false;
ctxt->node = node;
- if (!alloc) - goto cleanup; - - if (VIR_ALLOC(tmp_restune) < 0) - goto cleanup; - - vcpus_str = virXMLPropString(node, "vcpus"); - if (!vcpus_str) { - virReportError(VIR_ERR_XML_ERROR, "%s", - _("Missing cachetune attribute 'vcpus'")); - goto cleanup; - } - if (virBitmapParse(vcpus_str, &vcpus, VIR_DOMAIN_CPUMASK_LEN) < 0) { - virReportError(VIR_ERR_XML_ERROR, - _("Invalid cachetune attribute 'vcpus' value '%s'"), - vcpus_str); + if (virDomainRestuneParseVcpus(def, node, &vcpus) < 0) goto cleanup; - } - - /* We need to limit the bitmap to number of vCPUs. If there's nothing left, - * then we can just clean up and return 0 immediately */ - virBitmapShrink(vcpus, def->maxvcpus);
if (virBitmapIsAllClear(vcpus)) { ret = 0; @@ -19011,63 +19097,44 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; }
- for (i = 0; i < n; i++) { - if (virDomainCachetuneDefParseCache(ctxt, nodes[i], alloc) < 0) + if (virDomainFindResctrlAlloc(def, vcpus, &alloc) < 0) + goto cleanup; + + if (!alloc) { + alloc = virResctrlAllocNew(); + if (!alloc) goto cleanup; - } + new_alloc = true; + } else { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Identical vcpus in cachetunes found"));
- if (virResctrlAllocIsEmpty(alloc)) { - ret = 0; goto cleanup; }
- for (i = 0; i < def->nrestunes; i++) { - if (virBitmapOverlaps(def->restunes[i]->vcpus, vcpus)) { - virReportError(VIR_ERR_XML_ERROR, "%s", - _("Overlapping vcpus in cachetunes")); + for (i = 0; i < n; i++) { + if (virDomainCachetuneDefParseCache(ctxt, nodes[i], alloc) < 0) goto cleanup; - } }
- /* We need to format it back because we need to be consistent in the naming - * even when users specify some "sub-optimal" string there. */ - VIR_FREE(vcpus_str); - vcpus_str = virBitmapFormat(vcpus); - if (!vcpus_str) + if (virResctrlAllocIsEmpty(alloc)) { + ret = 0; goto cleanup; + }
- if (!(flags & VIR_DOMAIN_DEF_PARSE_INACTIVE)) - alloc_id = virXMLPropString(node, "id"); - - if (!alloc_id) { - /* The number of allocations is limited and the directory structure is flat, - * not hierarchical, so we need to have all same allocations in one - * directory, so it's nice to have it named appropriately. For now it's - * 'vcpus_...' but it's designed in order for it to be changeable in the - * future (it's part of the status XML). */ - if (virAsprintf(&alloc_id, "vcpus_%s", vcpus_str) < 0) + if (new_alloc) { + if (virDomainUpdateRestune(def, node, alloc, vcpus, flags) < 0) goto cleanup; + vcpus = NULL; + alloc = NULL; }
- if (virResctrlAllocSetID(alloc, alloc_id) < 0) - goto cleanup; - - VIR_STEAL_PTR(tmp_restune->vcpus, vcpus); - VIR_STEAL_PTR(tmp_restune->alloc, alloc); - - if (VIR_APPEND_ELEMENT(def->restunes, def->nrestunes, tmp_restune) < 0) - goto cleanup; - ret = 0; cleanup: ctxt->node = oldnode; - virDomainRestuneDefFree(tmp_restune); virObjectUnref(alloc); virBitmapFree(vcpus); - VIR_FREE(alloc_id); - VIR_FREE(vcpus_str); VIR_FREE(nodes); - VIR_FREE(tmp); return ret; }

On 2018年07月27日 01:48, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Refactoring virDomainCachetuneDefParse, extracting vcpus extracting, overlapping detecting and new resctrl allocation creating logic. Those two logic is common among different resource allocation technologies.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- src/conf/domain_conf.c | 195 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 131 insertions(+), 64 deletions(-)
You could make 3 patches out of this - one for each reduction of virDomainCachetuneDefParse...
Will split that.
The virDomainFindResctrlAlloc should have used Restune instead of ResctrlAlloc, right? Of course that changes based on how the naming patch shakes out. Actually I think perhaps virDomainRestuneVcpuMatch would be even better (where Restune ends up being whatever shakes out of the previous patch naming decision).
Let's use virDomainResctrlVcpuMatch. :)
In this area, I'm wondering what purpose does @new_alloc serve? If @alloc was already defined, then you added an error message. So the reality is - @new_alloc is always true.
As you already note in patch 8. This is used by memory bandwidth to find a resctrl group defined by CAT already.
Secondary to that you've added a new error related to identical vcpus in the parsing logic. Unfortunately that could make a domain disappear on a libvirtd restart if for some reason it was already defined in that manner. If there's a parsing error because two entries had identical cpus, then there will be an error. Errors would cause a previously OK/found domain to not be defined. So that type of check (now that the
Sorry, I am not sure if I understand correctly. Are you talking about two domains(or VMs) with the same vcpus setting? If so, above vpu match is not for different domain but for the memory bandwidth allocation and the cache line allocation of one domain. And above vcpu match function doesn't change any logic of existing virDomainCachetuneDefParseCache. It just extract from virDomainCachetuneDefParseCache and packing into one function. So that memory bandwidth parsing logic in patch 8 can reuse this vcpu matching part. :)
code has been around for more than a release) would need to go in some sort of Validation API. This is one of those libvirt define and libvirtd restart quirks.
If my understanding is incorrect. Could you please clarify here or give me some example? :)
Finally, virDomainUpdateRestune should be virDomainRestuneUpdate since you started with virDomainRestuneParseVcpus and it should go after virDomainCachetuneDefParseCache
Let's use virDomainResctrlUpdate
Hope this all makes sense!
John
[....]

[...]
Secondary to that you've added a new error related to identical vcpus in the parsing logic. Unfortunately that could make a domain disappear on a libvirtd restart if for some reason it was already defined in that manner. If there's a parsing error because two entries had identical cpus, then there will be an error. Errors would cause a previously OK/found domain to not be defined. So that type of check (now that the
Sorry, I am not sure if I understand correctly. Are you talking about two domains(or VMs) with the same vcpus setting?
It's the: + } else { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Identical vcpus in cachetunes found")); + goto cleanup; + } that was new... Consider what would happen before any of these changes were merged if that condition was true. So I found tests/genericxml2xmlindata/cachetune-colliding-tunes.xml and did a little test... Before your changes, the test fails with "error: XML error: Overlapping vcpus in cachetunes", but with your changes the test fails with "error: XML error: Identical vcpus in cachetunes found". So since both fail, that's good, no issue then. It was different which is what drew my attention. In any case, just mention it in the commit message that part of the change will "clarify" whether it's an overlap or a redefinition. John [...]

On 2018年07月28日 00:58, John Ferlan wrote:
[...]
Secondary to that you've added a new error related to identical vcpus in the parsing logic. Unfortunately that could make a domain disappear on a libvirtd restart if for some reason it was already defined in that manner. If there's a parsing error because two entries had identical cpus, then there will be an error. Errors would cause a previously OK/found domain to not be defined. So that type of check (now that the
Sorry, I am not sure if I understand correctly. Are you talking about two domains(or VMs) with the same vcpus setting?
It's the:
+ } else { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Identical vcpus in cachetunes found")); + goto cleanup; + }
that was new... Consider what would happen before any of these changes were merged if that condition was true.
So I found tests/genericxml2xmlindata/cachetune-colliding-tunes.xml and did a little test... Before your changes, the test fails with "error: XML error: Overlapping vcpus in cachetunes", but with your changes the test fails with "error: XML error: Identical vcpus in cachetunes found".
Yes! You are right. The error message is different.
So since both fail, that's good, no issue then. It was different which is what drew my attention. In any case, just mention it in the commit message that part of the change will "clarify" whether it's an overlap or a redefinition.
Okay~, will update commit message in v2.
John
[...]

From: Bing Niu <bing.niu@intel.com> Introduce a new section memorytune to support memory bandwidth allocation. This is consistent with existing cachetune. As the example: below: <cputune> ...... <memorytune vcpus='0'> <node id='0' bandwidth='30'/> </memorytune> </cputune> vpus --- vpus subjected to this memory bandwidth. id --- on which node memory bandwidth to be set. bandwidth --- the memory bandwidth percent to set. Signed-off-by: Bing Niu <bing.niu@intel.com> --- docs/formatdomain.html.in | 35 ++++ docs/schemas/domaincommon.rng | 17 ++ src/conf/domain_conf.c | 199 +++++++++++++++++++++ .../memorytune-colliding-allocs.xml | 30 ++++ .../memorytune-colliding-cachetune.xml | 32 ++++ tests/genericxml2xmlindata/memorytune.xml | 33 ++++ tests/genericxml2xmltest.c | 5 + 7 files changed, 351 insertions(+) create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-allocs.xml create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml create mode 100644 tests/genericxml2xmlindata/memorytune.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a3afe13..4e38446 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -757,6 +757,10 @@ <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> </cachetune> + <memorytune vcpus='0-3'> + <node id='0' bandwidth='60'/> + </memorytune> + </cputune> ... </domain> @@ -950,7 +954,38 @@ </dl> </dd> </dl> + </dd> + <dt><code>memorytune</code></dt> + <dd> + Optional <code>memorytune</code> element can control allocations for + memory bandwidth using the resctrl on the host. Whether or not is this + supported can be gathered from capabilities where some limitations like + minimum bandwidth and required granularity are reported as well. The + required attribute <code>vcpus</code> specifies to which vCPUs this + allocation applies. A vCPU can only be member of one + <code>memorytune</code> element allocations. <code>vcpus</code> specified + by <code>memorytune</code> can be identical to those specified by + <code>cachetune</code>. However they are not allowed to overlap each other. + Supported subelements are: + <dl> + <dt><code>node</code></dt> + <dd> + This element controls the allocation of CPU memory bandwidth and has the + following attributes: + <dl> + <dt><code>id</code></dt> + <dd> + Host node id from which to allocate memory bandwidth. + </dd> + <dt><code>bandwidth</code></dt> + <dd> + The memory bandwidth to allocate from this node. The value by default + is in percentage. + </dd> + </dl> + </dd> + </dl> </dd> </dl> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f24a563..b4cd96b 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -983,6 +983,23 @@ </oneOrMore> </element> </zeroOrMore> + <zeroOrMore> + <element name="memorytune"> + <attribute name="vcpus"> + <ref name='cpuset'/> + </attribute> + <oneOrMore> + <element name="node"> + <attribute name="id"> + <ref name='unsignedInt'/> + </attribute> + <attribute name="bandwidth"> + <ref name='unsignedInt'/> + </attribute> + </element> + </oneOrMore> + </element> + </zeroOrMore> </interleave> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 695981c..ea9276f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19139,6 +19139,128 @@ virDomainCachetuneDefParse(virDomainDefPtr def, } +static int +virDomainMemorytuneDefParseMemory(xmlXPathContextPtr ctxt, + xmlNodePtr node, + virResctrlAllocPtr alloc) +{ + xmlNodePtr oldnode = ctxt->node; + unsigned int id; + unsigned int bandwidth; + char *tmp = NULL; + int ret = -1; + + ctxt->node = node; + + tmp = virXMLPropString(node, "id"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing memorytune attribute 'id'")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid memorytune attribute 'id' value '%s'"), + tmp); + goto cleanup; + } + VIR_FREE(tmp); + + tmp = virXMLPropString(node, "bandwidth"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing memorytune attribute 'bandwidth'")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid memorytune attribute 'bandwidth' value '%s'"), + tmp); + goto cleanup; + } + VIR_FREE(tmp); + if (virResctrlSetMemoryBandwidth(alloc, id, bandwidth) < 0) + goto cleanup; + + ret = 0; + cleanup: + ctxt->node = oldnode; + VIR_FREE(tmp); + return ret; +} + + +static int +virDomainMemorytuneDefParse(virDomainDefPtr def, + xmlXPathContextPtr ctxt, + xmlNodePtr node, + unsigned int flags) +{ + xmlNodePtr oldnode = ctxt->node; + xmlNodePtr *nodes = NULL; + virBitmapPtr vcpus = NULL; + virResctrlAllocPtr alloc = NULL; + ssize_t i = 0; + int n; + int ret = -1; + bool new_alloc = false; + + ctxt->node = node; + + if (virDomainRestuneParseVcpus(def, node, &vcpus) < 0) + goto cleanup; + + if (virBitmapIsAllClear(vcpus)) { + ret = 0; + goto cleanup; + } + + if ((n = virXPathNodeSet("./node", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot extract memory nodes under memorytune")); + goto cleanup; + } + + if (virDomainFindResctrlAlloc(def, vcpus, &alloc) < 0) + goto cleanup; + + if (!alloc) { + alloc = virResctrlAllocNew(); + if (!alloc) + goto cleanup; + new_alloc = true; + } else { + alloc = virObjectRef(alloc); + } + + for (i = 0; i < n; i++) { + if (virDomainMemorytuneDefParseMemory(ctxt, nodes[i], alloc) < 0) + goto cleanup; + } + if (virResctrlAllocIsEmpty(alloc)) { + ret = 0; + goto cleanup; + } + /* + * If this is a new allocation, format ID and append to restune, otherwise + * just update the existing alloc information */ + if (new_alloc) { + if (virDomainUpdateRestune(def, node, alloc, vcpus, flags) < 0) + goto cleanup; + vcpus = NULL; + alloc = NULL; + } + + ret = 0; + cleanup: + ctxt->node = oldnode; + virObjectUnref(alloc); + virBitmapFree(vcpus); + VIR_FREE(nodes); + return ret; +} + + static virDomainDefPtr virDomainDefParseXML(xmlDocPtr xml, xmlNodePtr root, @@ -19734,6 +19856,18 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/memorytune", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("cannot extract memorytune nodes")); + goto error; + } + + for (i = 0; i < n; i++) { + if (virDomainMemorytuneDefParse(def, ctxt, nodes[i], flags) < 0) + goto error; + } + VIR_FREE(nodes); + if (virCPUDefParseXML(ctxt, "./cpu[1]", VIR_CPU_TYPE_GUEST, &def->cpu) < 0) goto error; @@ -27028,6 +27162,68 @@ virDomainCachetuneDefFormat(virBufferPtr buf, static int +virDomainMemorytuneDefFormatHelper(unsigned int id, + unsigned int bandwidth, + void *opaque) +{ + virBufferPtr buf = opaque; + + virBufferAsprintf(buf, + "<node id='%u' bandwidth='%u'/>\n", + id, bandwidth); + return 0; +} + + +static int +virDomainMemorytuneDefFormat(virBufferPtr buf, + virDomainRestuneDefPtr restune, + unsigned int flags) +{ + virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; + char *vcpus = NULL; + int ret = -1; + + virBufferSetChildIndent(&childrenBuf, buf); + virResctrlAllocForeachMemory(restune->alloc, + virDomainMemorytuneDefFormatHelper, + &childrenBuf); + + + if (virBufferCheckError(&childrenBuf) < 0) + goto cleanup; + + if (!virBufferUse(&childrenBuf)) { + ret = 0; + goto cleanup; + } + + vcpus = virBitmapFormat(restune->vcpus); + if (!vcpus) + goto cleanup; + + virBufferAsprintf(buf, "<memorytune vcpus='%s'", vcpus); + + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + const char *alloc_id = virResctrlAllocGetID(restune->alloc); + if (!alloc_id) + goto cleanup; + + virBufferAsprintf(buf, " id='%s'", alloc_id); + } + virBufferAddLit(buf, ">\n"); + + virBufferAddBuffer(buf, &childrenBuf); + virBufferAddLit(buf, "</memorytune>\n"); + + ret = 0; + cleanup: + virBufferFreeAndReset(&childrenBuf); + VIR_FREE(vcpus); + return ret; +} + +static int virDomainCputuneDefFormat(virBufferPtr buf, virDomainDefPtr def, unsigned int flags) @@ -27132,6 +27328,9 @@ virDomainCputuneDefFormat(virBufferPtr buf, for (i = 0; i < def->nrestunes; i++) virDomainCachetuneDefFormat(&childrenBuf, def->restunes[i], flags); + for (i = 0; i < def->nrestunes; i++) + virDomainMemorytuneDefFormat(&childrenBuf, def->restunes[i], flags); + if (virBufferCheckError(&childrenBuf) < 0) return -1; diff --git a/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml b/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml new file mode 100644 index 0000000..9b8ebaa --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml @@ -0,0 +1,30 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <memorytune vcpus='0'> + <node id='0' bandwidth='50'/> + <node id='0' bandwidth='50'/> + </memorytune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml b/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml new file mode 100644 index 0000000..5416870 --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml @@ -0,0 +1,32 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='code' size='12' unit='KiB'/> + </cachetune> + <memorytune vcpus='0'> + <node id='0' bandwidth='50'/> + </memorytune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/memorytune.xml b/tests/genericxml2xmlindata/memorytune.xml new file mode 100644 index 0000000..ea03e22 --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune.xml @@ -0,0 +1,33 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <memorytune vcpus='0-1'> + <node id='0' bandwidth='20'/> + <node id='1' bandwidth='30'/> + </memorytune> + <memorytune vcpus='3'> + <node id='0' bandwidth='50'/> + </memorytune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmltest.c b/tests/genericxml2xmltest.c index 7a4fc1e..e6d4ef2 100644 --- a/tests/genericxml2xmltest.c +++ b/tests/genericxml2xmltest.c @@ -140,6 +140,11 @@ mymain(void) TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("cachetune-colliding-types", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST("memorytune"); + DO_TEST_FULL("memorytune-colliding-allocs", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST_FULL("memorytune-colliding-cachetune", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST("tseg"); -- 2.7.4

subj: Add support for memorytune XML processing for resctrl MBA On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Introduce a new section memorytune to support memory bandwidth allocation. This is consistent with existing cachetune. As the example: below: <cputune> ...... <memorytune vcpus='0'> <node id='0' bandwidth='30'/> </memorytune> </cputune>
vpus --- vpus subjected to this memory bandwidth. id --- on which node memory bandwidth to be set. bandwidth --- the memory bandwidth percent to set.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- docs/formatdomain.html.in | 35 ++++ docs/schemas/domaincommon.rng | 17 ++ src/conf/domain_conf.c | 199 +++++++++++++++++++++ .../memorytune-colliding-allocs.xml | 30 ++++ .../memorytune-colliding-cachetune.xml | 32 ++++ tests/genericxml2xmlindata/memorytune.xml | 33 ++++ tests/genericxml2xmltest.c | 5 + 7 files changed, 351 insertions(+) create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-allocs.xml create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml create mode 100644 tests/genericxml2xmlindata/memorytune.xml
Strangely I expected to actually have a merge conflict here with previous changes, but I didn't!
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a3afe13..4e38446 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -757,6 +757,10 @@ <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> </cachetune> + <memorytune vcpus='0-3'> + <node id='0' bandwidth='60'/> + </memorytune> + </cputune> ... </domain> @@ -950,7 +954,38 @@ </dl> </dd> </dl> + </dd>
+ <dt><code>memorytune</code></dt> + <dd> + Optional <code>memorytune</code> element can control allocations for
s/Optional/<span class="since">Since 4.7.0</span>, the optional/ e.g. Since 4.7.0, the optional...
+ memory bandwidth using the resctrl on the host. Whether or not is this + supported can be gathered from capabilities where some limitations like + minimum bandwidth and required granularity are reported as well. The + required attribute <code>vcpus</code> specifies to which vCPUs this + allocation applies. A vCPU can only be member of one + <code>memorytune</code> element allocations. <code>vcpus</code> specified
s/./. The/
+ by <code>memorytune</code> can be identical to those specified by + <code>cachetune</code>. However they are not allowed to overlap each other.
The last 2 sentences are "confusing". We didn't add similar wording to cachetune... It's one of those visual things I suppose, but it seems like there's a relationship between cachetune and memtune, but only if they intersect vCPU values. Thus if cachetune "joins" vcpus='0-3', then memtune must also join them - using other combinations of 0-3 isn't allowed. I have to only imagine that would get more complicated with RDT
+ Supported subelements are: + <dl> + <dt><code>node</code></dt> + <dd> + This element controls the allocation of CPU memory bandwidth and has the + following attributes: + <dl> + <dt><code>id</code></dt> + <dd> + Host node id from which to allocate memory bandwidth. + </dd> + <dt><code>bandwidth</code></dt> + <dd> + The memory bandwidth to allocate from this node. The value by default + is in percentage.
Should we indicate that it must be between 1 and 100 inclusive?
+ </dd> + </dl> + </dd> + </dl> </dd> </dl>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f24a563..b4cd96b 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -983,6 +983,23 @@ </oneOrMore> </element> </zeroOrMore> + <zeroOrMore> + <element nit's trying to clear up in my mind whether the vcpus used byame="memorytune"> + <attribute name="vcpus"> + <ref name='cpuset'/> + </attribute> + <oneOrMore> + <element name="node"> + <attribute name="id"> + <ref name='unsignedInt'/> + </attribute> + <attribute name="bandwidth"> + <ref name='unsignedInt'/> + </attribute> + </element> + </oneOrMore> + </element> + </zeroOrMore> </interleave> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 695981c..ea9276f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19139,6 +19139,128 @@ virDomainCachetuneDefParse(virDomainDefPtr def, }
+static int +virDomainMemorytuneDefParseMemory(xmlXPathContextPtr ctxt, + xmlNodePtr node, + virResctrlAllocPtr alloc) +{ + xmlNodePtr oldnode = ctxt->node; + unsigned int id; + unsigned int bandwidth; + char *tmp = NULL; + int ret = -1; + + ctxt->node = node; + + tmp = virXMLPropString(node, "id"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing memorytune attribute 'id'")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid memorytune attribute 'id' value '%s'"), + tmp); + goto cleanup; + } + VIR_FREE(tmp); + + tmp = virXMLPropString(node, "bandwidth"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing memorytune attribute 'bandwidth'")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid memorytune attribute 'bandwidth' value '%s'"), + tmp); + goto cleanup; + }
Should we check that bandwidth is between 1 and 100 inclusive? eg < 1 || > 100, then error. Or leave that to SetMemoryBandwidth (IDC).
+ VIR_FREE(tmp); + if (virResctrlSetMemoryBandwidth(alloc, id, bandwidth) < 0) + goto cleanup; + + ret = 0; + cleanup: + ctxt->node = oldnode; + VIR_FREE(tmp); + return ret; +} + + +static int +virDomainMemorytuneDefParse(virDomainDefPtr def, + xmlXPathContextPtr ctxt, + xmlNodePtr node, + unsigned int flags) +{ + xmlNodePtr oldnode = ctxt->node; + xmlNodePtr *nodes = NULL; + virBitmapPtr vcpus = NULL; + virResctrlAllocPtr alloc = NULL; + ssize_t i = 0; + int n; + int ret = -1; + bool new_alloc = false; + + ctxt->node = node; + + if (virDomainRestuneParseVcpus(def, node, &vcpus) < 0) + goto cleanup; + + if (virBitmapIsAllClear(vcpus)) { + ret = 0; + goto cleanup; + }
Just realized that for here and for cachetune we never parse the subelements if BitmapIsAllClear. That would mean anything provided is lost - of course it seems silly to have an empty bitmap too. Do we even test for that? Is it even feasible?
+ + if ((n = virXPathNodeSet("./node", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot extract memory nodes under memorytune")); + goto cleanup; + } + + if (virDomainFindResctrlAlloc(def, vcpus, &alloc) < 0) + goto cleanup; + + if (!alloc) { + alloc = virResctrlAllocNew(); + if (!alloc) + goto cleanup; + new_alloc = true; + } else { + alloc = virObjectRef(alloc);
Ahh... so this is where I had some confusion initially, but makes more sense now given one of the failure xml examples. Still, does this cause issues: <memtune vcpus='3-5'> <node id='0' bandwidth='60'/> </memtune <memtune vcpus='3-5'> <node id='0' bandwidth='60'/> </memtune> or would the second one be <node id='1' bandwidth='60'/> Yes, they should put <node id='1' with the first group, but QE likes to stretch the boundaries of imagination once they read the code.
+ } + + for (i = 0; i < n; i++) { + if (virDomainMemorytuneDefParseMemory(ctxt, nodes[i], alloc) < 0) + goto cleanup; + } + if (virResctrlAllocIsEmpty(alloc)) { + ret = 0; + goto cleanup; + } + /* + * If this is a new allocation, format ID and append to restune, otherwise + * just update the existing alloc information */
The following code doesn't do what the "otherwise" comment suggests - although I perhaps still confused over the @alloc usage above.
+ if (new_alloc) { + if (virDomainUpdateRestune(def, node, alloc, vcpus, flags) < 0) + goto cleanup; + vcpus = NULL; + alloc = NULL; + } + + ret = 0; + cleanup: + ctxt->node = oldnode; + virObjectUnref(alloc); + virBitmapFree(vcpus); + VIR_FREE(nodes); + return ret; +} + + static virDomainDefPtr virDomainDefParseXML(xmlDocPtr xml, xmlNodePtr root, @@ -19734,6 +19856,18 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes);
+ if ((n = virXPathNodeSet("./cputune/memorytune", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("cannot extract memorytune nodes")); + goto error; + } + + for (i = 0; i < n; i++) { + if (virDomainMemorytuneDefParse(def, ctxt, nodes[i], flags) < 0) + goto error; + } + VIR_FREE(nodes); + if (virCPUDefParseXML(ctxt, "./cpu[1]", VIR_CPU_TYPE_GUEST, &def->cpu) < 0) goto error;
@@ -27028,6 +27162,68 @@ virDomainCachetuneDefFormat(virBufferPtr buf,
static int +virDomainMemorytuneDefFormatHelper(unsigned int id, + unsigned int bandwidth, + void *opaque) +{ + virBufferPtr buf = opaque; + + virBufferAsprintf(buf, + "<node id='%u' bandwidth='%u'/>\n", + id, bandwidth); + return 0; +} + + +static int +virDomainMemorytuneDefFormat(virBufferPtr buf, + virDomainRestuneDefPtr restune, + unsigned int flags) +{ + virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; + char *vcpus = NULL; + int ret = -1; + + virBufferSetChildIndent(&childrenBuf, buf); + virResctrlAllocForeachMemory(restune->alloc, + virDomainMemorytuneDefFormatHelper, + &childrenBuf);
probably could wrap this in a ignore_value(); since FormatHelper only ever returns 0. The error checking is next:
+ +
You can removed 1 of the 2 empty lines above here.
+ if (virBufferCheckError(&childrenBuf) < 0) + goto cleanup; + + if (!virBufferUse(&childrenBuf)) { + ret = 0; + goto cleanup; + } + + vcpus = virBitmapFormat(restune->vcpus); + if (!vcpus) + goto cleanup; + + virBufferAsprintf(buf, "<memorytune vcpus='%s'", vcpus); + + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + const char *alloc_id = virResctrlAllocGetID(restune->alloc); + if (!alloc_id) + goto cleanup; + + virBufferAsprintf(buf, " id='%s'", alloc_id); + } + virBufferAddLit(buf, ">\n"); + + virBufferAddBuffer(buf, &childrenBuf); + virBufferAddLit(buf, "</memorytune>\n"); + + ret = 0; + cleanup: + virBufferFreeAndReset(&childrenBuf); + VIR_FREE(vcpus); + return ret; +} + +static int virDomainCputuneDefFormat(virBufferPtr buf, virDomainDefPtr def, unsigned int flags) @@ -27132,6 +27328,9 @@ virDomainCputuneDefFormat(virBufferPtr buf, for (i = 0; i < def->nrestunes; i++) virDomainCachetuneDefFormat(&childrenBuf, def->restunes[i], flags);
+ for (i = 0; i < def->nrestunes; i++) + virDomainMemorytuneDefFormat(&childrenBuf, def->restunes[i], flags); + if (virBufferCheckError(&childrenBuf) < 0) return -1;
diff --git a/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml b/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml new file mode 100644 index 0000000..9b8ebaa --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml @@ -0,0 +1,30 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <memorytune vcpus='0'> + <node id='0' bandwidth='50'/> + <node id='0' bandwidth='50'/>
This is illegal because node id='#' uses the same number, right?
+ </memorytune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml b/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml new file mode 100644 index 0000000..5416870 --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml @@ -0,0 +1,32 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='code' size='12' unit='KiB'/> + </cachetune> + <memorytune vcpus='0'> + <node id='0' bandwidth='50'/> + </memorytune>
This is illegal because memory tune needs to be '0-1', right? A few fixups seem necessary. Mostly minor stuff. John
+ </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/memorytune.xml b/tests/genericxml2xmlindata/memorytune.xml new file mode 100644 index 0000000..ea03e22 --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune.xml @@ -0,0 +1,33 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <memorytune vcpus='0-1'> + <node id='0' bandwidth='20'/> + <node id='1' bandwidth='30'/> + </memorytune> + <memorytune vcpus='3'> + <node id='0' bandwidth='50'/> + </memorytune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmltest.c b/tests/genericxml2xmltest.c index 7a4fc1e..e6d4ef2 100644 --- a/tests/genericxml2xmltest.c +++ b/tests/genericxml2xmltest.c @@ -140,6 +140,11 @@ mymain(void) TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("cachetune-colliding-types", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST("memorytune"); + DO_TEST_FULL("memorytune-colliding-allocs", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST_FULL("memorytune-colliding-cachetune", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE);
DO_TEST("tseg");

On 2018年07月27日 04:36, John Ferlan wrote:
subj:
Add support for memorytune XML processing for resctrl MBA
OK!
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Introduce a new section memorytune to support memory bandwidth allocation. This is consistent with existing cachetune. As the example: below: <cputune> ...... <memorytune vcpus='0'> <node id='0' bandwidth='30'/> </memorytune> </cputune>
vpus --- vpus subjected to this memory bandwidth. id --- on which node memory bandwidth to be set. bandwidth --- the memory bandwidth percent to set.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- docs/formatdomain.html.in | 35 ++++ docs/schemas/domaincommon.rng | 17 ++ src/conf/domain_conf.c | 199 +++++++++++++++++++++ .../memorytune-colliding-allocs.xml | 30 ++++ .../memorytune-colliding-cachetune.xml | 32 ++++ tests/genericxml2xmlindata/memorytune.xml | 33 ++++ tests/genericxml2xmltest.c | 5 + 7 files changed, 351 insertions(+) create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-allocs.xml create mode 100644 tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml create mode 100644 tests/genericxml2xmlindata/memorytune.xml
Strangely I expected to actually have a merge conflict here with previous changes, but I didn't!
Lucky~
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a3afe13..4e38446 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -757,6 +757,10 @@ <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> </cachetune> + <memorytune vcpus='0-3'> + <node id='0' bandwidth='60'/> + </memorytune> + </cputune> ... </domain> @@ -950,7 +954,38 @@ </dl> </dd> </dl> + </dd>
+ <dt><code>memorytune</code></dt> + <dd> + Optional <code>memorytune</code> element can control allocations for
s/Optional/<span class="since">Since 4.7.0</span>, the optional/
e.g. Since 4.7.0, the optional...
OK! I added that before, but deleted since I am not sure which release this series can be accepted
+ memory bandwidth using the resctrl on the host. Whether or not is this + supported can be gathered from capabilities where some limitations like + minimum bandwidth and required granularity are reported as well. The + required attribute <code>vcpus</code> specifies to which vCPUs this + allocation applies. A vCPU can only be member of one + <code>memorytune</code> element allocations. <code>vcpus</code> specified
s/./. The/
OK!
+ by <code>memorytune</code> can be identical to those specified by + <code>cachetune</code>. However they are not allowed to overlap each other.
The last 2 sentences are "confusing". We didn't add similar wording to cachetune... It's one of those visual things I suppose, but it seems like there's a relationship between cachetune and memtune, but only if they intersect vCPU values. Thus if cachetune "joins" vcpus='0-3', then memtune must also join them - using other combinations of 0-3 isn't allowed.
Yes! Let's update cachetune part also.
I have to only imagine that would get more complicated with RDT
+ Supported subelements are: + <dl> + <dt><code>node</code></dt> + <dd> + This element controls the allocation of CPU memory bandwidth and has the + following attributes: + <dl> + <dt><code>id</code></dt> + <dd> + Host node id from which to allocate memory bandwidth. + </dd> + <dt><code>bandwidth</code></dt> + <dd> + The memory bandwidth to allocate from this node. The value by default + is in percentage.
Should we indicate that it must be between 1 and 100 inclusive?
Yes. However the minimum allowed memory bandwidth is defined by min_bandwidth of RDT. This is reported in host capability XML.
+ </dd> + </dl> + </dd> + </dl> </dd> </dl>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f24a563..b4cd96b 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -983,6 +983,23 @@ </oneOrMore> </element> </zeroOrMore> + <zeroOrMore> + <element nit's trying to clear up in my mind whether the vcpus used byame="memorytune"> + <attribute name="vcpus"> + <ref name='cpuset'/> + </attribute> + <oneOrMore> + <element name="node"> + <attribute name="id"> + <ref name='unsignedInt'/> + </attribute> + <attribute name="bandwidth"> + <ref name='unsignedInt'/> + </attribute> + </element> + </oneOrMore> + </element> + </zeroOrMore> </interleave> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 695981c..ea9276f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19139,6 +19139,128 @@ virDomainCachetuneDefParse(virDomainDefPtr def, }
+static int +virDomainMemorytuneDefParseMemory(xmlXPathContextPtr ctxt, + xmlNodePtr node, + virResctrlAllocPtr alloc) +{ + xmlNodePtr oldnode = ctxt->node; + unsigned int id; + unsigned int bandwidth; + char *tmp = NULL; + int ret = -1; + + ctxt->node = node; + + tmp = virXMLPropString(node, "id"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing memorytune attribute 'id'")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &id) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid memorytune attribute 'id' value '%s'"), + tmp); + goto cleanup; + } + VIR_FREE(tmp); + + tmp = virXMLPropString(node, "bandwidth"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing memorytune attribute 'bandwidth'")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &bandwidth) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid memorytune attribute 'bandwidth' value '%s'"), + tmp); + goto cleanup; + }
Should we check that bandwidth is between 1 and 100 inclusive? eg < 1 || > 100, then error. Or leave that to SetMemoryBandwidth (IDC).
Let's do it in SetMemoryBandwidth. Make XML parse parameters only. :)
+ VIR_FREE(tmp); + if (virResctrlSetMemoryBandwidth(alloc, id, bandwidth) < 0) + goto cleanup; + + ret = 0; + cleanup: + ctxt->node = oldnode; + VIR_FREE(tmp); + return ret; +} + + +static int +virDomainMemorytuneDefParse(virDomainDefPtr def, + xmlXPathContextPtr ctxt, + xmlNodePtr node, + unsigned int flags) +{ + xmlNodePtr oldnode = ctxt->node; + xmlNodePtr *nodes = NULL; + virBitmapPtr vcpus = NULL; + virResctrlAllocPtr alloc = NULL; + ssize_t i = 0; + int n; + int ret = -1; + bool new_alloc = false; + + ctxt->node = node; + + if (virDomainRestuneParseVcpus(def, node, &vcpus) < 0) + goto cleanup; + + if (virBitmapIsAllClear(vcpus)) { + ret = 0; + goto cleanup; + }
Just realized that for here and for cachetune we never parse the subelements if BitmapIsAllClear.
That would mean anything provided is lost - of course it seems silly to have an empty bitmap too. Do we even test for that? Is it even feasible?
Above checking is actually for XML define a vcpus that is not exist. If domain define vcpu number as 2. However XML describe cachetune/ memtune <cachetune vcpus='10'> ...... <cachetune> or <memtune vcpus='10'> ...... <memtune> Domain can still be created, but cachetune or memtune is ignored. Since vcpus element is controlling which vcpu thread will be added to corresponding resctrl group. This is existing cachetune behavior. memtune just follow.
+ + if ((n = virXPathNodeSet("./node", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot extract memory nodes under memorytune")); + goto cleanup; + } + + if (virDomainFindResctrlAlloc(def, vcpus, &alloc) < 0) + goto cleanup; + + if (!alloc) { + alloc = virResctrlAllocNew(); + if (!alloc) + goto cleanup; + new_alloc = true; + } else { + alloc = virObjectRef(alloc);
Ahh... so this is where I had some confusion initially, but makes more sense now given one of the failure xml examples.
Still, does this cause issues:
<memtune vcpus='3-5'> <node id='0' bandwidth='60'/> </memtune <memtune vcpus='3-5'> <node id='0' bandwidth='60'/> </memtune>
This is valid in virResctrlSetMemoryBandwidth and throw an error. Since you already set memory bandwidth at node 0.
or would the second one be <node id='1' bandwidth='60'/>
This is good, since you are setting memory bandwidth on different node.
Yes, they should put <node id='1' with the first group, but QE likes to stretch the boundaries of imagination once they read the code.
+ } + + for (i = 0; i < n; i++) { + if (virDomainMemorytuneDefParseMemory(ctxt, nodes[i], alloc) < 0) + goto cleanup; + } + if (virResctrlAllocIsEmpty(alloc)) { + ret = 0; + goto cleanup; + } + /* + * If this is a new allocation, format ID and append to restune, otherwise + * just update the existing alloc information */
The following code doesn't do what the "otherwise" comment suggests - although I perhaps still confused over the @alloc usage above.
If alloc is existing already. Memory bandwidth information is updated in virDomainMemorytuneDefParseMemory and alloc structure is already stored in domain->restunes(will be resallocs in next version :) ). virDomainFindResctrlAlloc is used to find a existing alloc and return its pointer. If alloc is new, we need append new alloc structure to domain->restunes.
+ if (new_alloc) { + if (virDomainUpdateRestune(def, node, alloc, vcpus, flags) < 0) + goto cleanup; + vcpus = NULL; + alloc = NULL; + } + + ret = 0; + cleanup: + ctxt->node = oldnode; + virObjectUnref(alloc); + virBitmapFree(vcpus); + VIR_FREE(nodes); + return ret; +} + + static virDomainDefPtr virDomainDefParseXML(xmlDocPtr xml, xmlNodePtr root, @@ -19734,6 +19856,18 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes);
+ if ((n = virXPathNodeSet("./cputune/memorytune", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("cannot extract memorytune nodes")); + goto error; + } + + for (i = 0; i < n; i++) { + if (virDomainMemorytuneDefParse(def, ctxt, nodes[i], flags) < 0) + goto error; + } + VIR_FREE(nodes); + if (virCPUDefParseXML(ctxt, "./cpu[1]", VIR_CPU_TYPE_GUEST, &def->cpu) < 0) goto error;
@@ -27028,6 +27162,68 @@ virDomainCachetuneDefFormat(virBufferPtr buf,
static int +virDomainMemorytuneDefFormatHelper(unsigned int id, + unsigned int bandwidth, + void *opaque) +{ + virBufferPtr buf = opaque; + + virBufferAsprintf(buf, + "<node id='%u' bandwidth='%u'/>\n", + id, bandwidth); + return 0; +} + + +static int +virDomainMemorytuneDefFormat(virBufferPtr buf, + virDomainRestuneDefPtr restune, + unsigned int flags) +{ + virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; + char *vcpus = NULL; + int ret = -1; + + virBufferSetChildIndent(&childrenBuf, buf); + virResctrlAllocForeachMemory(restune->alloc, + virDomainMemorytuneDefFormatHelper, + &childrenBuf);
probably could wrap this in a ignore_value(); since FormatHelper only ever returns 0. The error checking is next:
Let's add a return value check, and also for cachetune part. ;)
+ +
You can removed 1 of the 2 empty lines above here.
OK
+ if (virBufferCheckError(&childrenBuf) < 0) + goto cleanup; + + if (!virBufferUse(&childrenBuf)) { + ret = 0; + goto cleanup; + } + + vcpus = virBitmapFormat(restune->vcpus); + if (!vcpus) + goto cleanup; + + virBufferAsprintf(buf, "<memorytune vcpus='%s'", vcpus); + + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + const char *alloc_id = virResctrlAllocGetID(restune->alloc); + if (!alloc_id) + goto cleanup; + + virBufferAsprintf(buf, " id='%s'", alloc_id); + } + virBufferAddLit(buf, ">\n"); + + virBufferAddBuffer(buf, &childrenBuf); + virBufferAddLit(buf, "</memorytune>\n"); + + ret = 0; + cleanup: + virBufferFreeAndReset(&childrenBuf); + VIR_FREE(vcpus); + return ret; +} + +static int virDomainCputuneDefFormat(virBufferPtr buf, virDomainDefPtr def, unsigned int flags) @@ -27132,6 +27328,9 @@ virDomainCputuneDefFormat(virBufferPtr buf, for (i = 0; i < def->nrestunes; i++) virDomainCachetuneDefFormat(&childrenBuf, def->restunes[i], flags);
+ for (i = 0; i < def->nrestunes; i++) + virDomainMemorytuneDefFormat(&childrenBuf, def->restunes[i], flags); + if (virBufferCheckError(&childrenBuf) < 0) return -1;
diff --git a/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml b/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml new file mode 100644 index 0000000..9b8ebaa --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune-colliding-allocs.xml @@ -0,0 +1,30 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <memorytune vcpus='0'> + <node id='0' bandwidth='50'/> + <node id='0' bandwidth='50'/>
This is illegal because node id='#' uses the same number, right?
Yes, illegal
+ </memorytune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml b/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml new file mode 100644 index 0000000..5416870 --- /dev/null +++ b/tests/genericxml2xmlindata/memorytune-colliding-cachetune.xml @@ -0,0 +1,32 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='code' size='12' unit='KiB'/> + </cachetune> + <memorytune vcpus='0'> + <node id='0' bandwidth='50'/> + </memorytune>
This is illegal because memory tune needs to be '0-1', right?
Yes, you are right. And thanks a lot~
A few fixups seem necessary. Mostly minor stuff.
John
[....]

From: Bing Niu <bing.niu@intel.com> Add new XML section to report host's memory bandwidth allocation capability. The format as below example: <host> ..... <memory_bandwidth> <node id='0' cpus='0-19'> <control granularity='10' min ='10' maxAllocs='8'/> </node> </memory_bandwidth> </host> granularity ---- granularity of memory bandwidth, unit percentage. min ---- minimum memory bandwidth allowed, unit percentage. maxAllocs ---- maximum memory bandwidth allocation group supported. Signed-off-by: Bing Niu <bing.niu@intel.com> --- docs/schemas/capability.rng | 33 +++++++ src/conf/capabilities.c | 108 +++++++++++++++++++++ src/conf/capabilities.h | 11 +++ src/util/virresctrl.c | 20 ++++ src/util/virresctrl.h | 15 +++ .../linux-resctrl/resctrl/info/MB/bandwidth_gran | 1 + .../linux-resctrl/resctrl/info/MB/min_bandwidth | 1 + .../linux-resctrl/resctrl/info/MB/num_closids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 8 ++ tests/virresctrldata/resctrl.schemata | 1 + 10 files changed, 199 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index 52164d5..d61515c 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -51,6 +51,9 @@ <optional> <ref name='cache'/> </optional> + <optional> + <ref name='memory_bandwidth'/> + </optional> <zeroOrMore> <ref name='secmodel'/> </zeroOrMore> @@ -326,6 +329,36 @@ </attribute> </define> + <define name='memory_bandwidth'> + <element name='memory_bandwidth'> + <oneOrMore> + <element name='node'> + <attribute name='id'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='cpus'> + <ref name='cpuset'/> + </attribute> + <zeroOrMore> + <element name='control'> + <attribute name='granularity'> + <ref name='unsignedInt'/> + </attribute> + <optional> + <attribute name='min'> + <ref name='unsignedInt'/> + </attribute> + </optional> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + </element> + </zeroOrMore> + </element> + </oneOrMore> + </element> + </define> + <define name='guestcaps'> <element name='guest'> <ref name='ostype'/> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 7a810ef..3f52296 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,16 @@ virCapabilitiesFreeNUMAInfo(virCapsPtr caps) } static void +virCapsHostMemBWNodeFree(virCapsHostMemBWNodePtr ptr) +{ + if (!ptr) + return; + + virBitmapFree(ptr->cpus); + VIR_FREE(ptr); +} + +static void virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) { size_t i; @@ -239,6 +249,11 @@ virCapsDispose(void *object) virCapsHostCacheBankFree(caps->host.caches[i]); VIR_FREE(caps->host.caches); + for (i = 0; i < caps->host.nnodes; i++) + virCapsHostMemBWNodeFree(caps->host.nodes[i]); + VIR_FREE(caps->host.nodes); + + VIR_FREE(caps->host.netprefix); VIR_FREE(caps->host.pagesSize); virCPUDefFree(caps->host.cpu); @@ -957,6 +972,58 @@ virCapabilitiesFormatCaches(virBufferPtr buf, return 0; } +static int +virCapabilitiesFormatMemoryBandwidth(virBufferPtr buf, + size_t nnodes, + virCapsHostMemBWNodePtr *nodes) +{ + size_t i = 0; + virBuffer controlBuf = VIR_BUFFER_INITIALIZER; + + if (!nnodes) + return 0; + + virBufferAddLit(buf, "<memory_bandwidth>\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < nnodes; i++) { + virCapsHostMemBWNodePtr node = nodes[i]; + virResctrlInfoMemBWPerNodePtr control = &node->control; + char *cpus_str = virBitmapFormat(node->cpus); + + if (!cpus_str) + return -1; + + virBufferAsprintf(buf, + "<node id='%u' cpus='%s'", + node->id, cpus_str); + VIR_FREE(cpus_str); + + virBufferSetChildIndent(&controlBuf, buf); + virBufferAsprintf(&controlBuf, + "<control granularity='%u' min ='%u' " + "maxAllocs='%u'/>\n", + control->granularity, control->min, + control->max_allocation); + + if (virBufferCheckError(&controlBuf) < 0) + return -1; + + if (virBufferUse(&controlBuf)) { + virBufferAddLit(buf, ">\n"); + virBufferAddBuffer(buf, &controlBuf); + virBufferAddLit(buf, "</node>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } + } + + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</memory_bandwidth>\n"); + + return 0; +} + /** * virCapabilitiesFormatXML: * @caps: capabilities to format @@ -1060,6 +1127,10 @@ virCapabilitiesFormatXML(virCapsPtr caps) caps->host.caches) < 0) goto error; + if (virCapabilitiesFormatMemoryBandwidth(&buf, caps->host.nnodes, + caps->host.nodes) < 0) + goto error; + for (i = 0; i < caps->host.nsecModels; i++) { virBufferAddLit(&buf, "<secmodel>\n"); virBufferAdjustIndent(&buf, 2); @@ -1602,6 +1673,40 @@ virCapabilitiesInitResctrl(virCapsPtr caps) } +static int +virCapabilitiesInitResctrlMemory(virCapsPtr caps) +{ + virCapsHostMemBWNodePtr node = NULL; + size_t i = 0; + int ret = -1; + + for (i = 0; i < caps->host.ncaches; i++) { + virCapsHostCacheBankPtr bank = caps->host.caches[i]; + if (VIR_ALLOC(node) < 0) + goto cleanup; + + if (virResctrlInfoGetMemoryBandwidth(caps->host.resctrl, + bank->level, &node->control) > 0) { + node->id = bank->id; + if (!(node->cpus = virBitmapNewCopy(bank->cpus))) + goto cleanup; + + if (VIR_APPEND_ELEMENT(caps->host.nodes, + caps->host.nnodes, node) < 0) { + goto cleanup; + } + } + virCapsHostMemBWNodeFree(node); + node = NULL; + } + + ret = 0; + cleanup: + virCapsHostMemBWNodeFree(node); + return ret; +} + + int virCapabilitiesInitCaches(virCapsPtr caps) { @@ -1731,6 +1836,9 @@ virCapabilitiesInitCaches(virCapsPtr caps) qsort(caps->host.caches, caps->host.ncaches, sizeof(*caps->host.caches), virCapsHostCacheBankSorter); + if (virCapabilitiesInitResctrlMemory(caps) < 0) + goto cleanup; + ret = 0; cleanup: VIR_FREE(type); diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index fe1b9ea..046e275 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -151,6 +151,14 @@ struct _virCapsHostCacheBank { virResctrlInfoPerCachePtr *controls; }; +typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode; +typedef virCapsHostMemBWNode *virCapsHostMemBWNodePtr; +struct _virCapsHostMemBWNode { + unsigned int id; + virBitmapPtr cpus; /* All CPUs that belong to this node*/ + virResctrlInfoMemBWPerNode control; +}; + typedef struct _virCapsHost virCapsHost; typedef virCapsHost *virCapsHostPtr; struct _virCapsHost { @@ -175,6 +183,9 @@ struct _virCapsHost { size_t ncaches; virCapsHostCacheBankPtr *caches; + size_t nnodes; + virCapsHostMemBWNodePtr *nodes; + size_t nsecModels; virCapsHostSecModelPtr secModels; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index bec2afd..c4ccebc 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -633,6 +633,26 @@ virResctrlInfoIsEmpty(virResctrlInfoPtr resctrl) int +virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl, + unsigned int level, + virResctrlInfoMemBWPerNodePtr control) +{ + virResctrlInfoMemBWPtr membw_info = resctrl->membw_info; + + if (!membw_info) + return 0; + + if (membw_info->last_level_cache != level) + return 0; + + control->granularity = membw_info->bandwidth_granularity; + control->min = membw_info->min_bandwidth; + control->max_allocation = membw_info->max_allocation; + return 1; +} + + +int virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d43fd31..4333218 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -50,6 +50,17 @@ struct _virResctrlInfoPerCache { unsigned int max_allocation; }; +typedef struct _virResctrlInfoMemBWPerNode virResctrlInfoMemBWPerNode; +typedef virResctrlInfoMemBWPerNode *virResctrlInfoMemBWPerNodePtr; +struct _virResctrlInfoMemBWPerNode { + /* Smallest possible increase of the allocation bandwidth in percentage */ + unsigned int granularity; + /* Minimal allocatable bandwidth in percentage */ + unsigned int min; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation; +}; + typedef struct _virResctrlInfo virResctrlInfo; typedef virResctrlInfo *virResctrlInfoPtr; @@ -63,6 +74,10 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, size_t *ncontrols, virResctrlInfoPerCachePtr **controls); +int +virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl, + unsigned int level, + virResctrlInfoMemBWPerNodePtr control); /* Alloc-related things */ typedef struct _virResctrlAlloc virResctrlAlloc; typedef virResctrlAlloc *virResctrlAllocPtr; diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran new file mode 100644 index 0000000..f599e28 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran @@ -0,0 +1 @@ +10 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth new file mode 100644 index 0000000..f599e28 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth @@ -0,0 +1 @@ +10 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids new file mode 100644 index 0000000..b8626c4 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids @@ -0,0 +1 @@ +4 diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml index 4840614..9b00cf0 100644 --- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml +++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml @@ -49,6 +49,14 @@ <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> </bank> </cache> + <memory_bandwidth> + <node id='0' cpus='0-5'> + <control granularity='10' min ='10' maxAllocs='4'/> + </node> + <node id='1' cpus='6-11'> + <control granularity='10' min ='10' maxAllocs='4'/> + </node> + </memory_bandwidth> </host> </capabilities> diff --git a/tests/virresctrldata/resctrl.schemata b/tests/virresctrldata/resctrl.schemata index fa980e5..2578822 100644 --- a/tests/virresctrldata/resctrl.schemata +++ b/tests/virresctrldata/resctrl.schemata @@ -1 +1,2 @@ L3:0=000ff;1=000f0 +MB:0=100;1=100 -- 2.7.4

On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Add new XML section to report host's memory bandwidth allocation capability. The format as below example:
<host> ..... <memory_bandwidth> <node id='0' cpus='0-19'> <control granularity='10' min ='10' maxAllocs='8'/> </node> </memory_bandwidth> </host>
granularity ---- granularity of memory bandwidth, unit percentage. min ---- minimum memory bandwidth allowed, unit percentage. maxAllocs ---- maximum memory bandwidth allocation group supported.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- docs/schemas/capability.rng | 33 +++++++ src/conf/capabilities.c | 108 +++++++++++++++++++++ src/conf/capabilities.h | 11 +++ src/util/virresctrl.c | 20 ++++ src/util/virresctrl.h | 15 +++ .../linux-resctrl/resctrl/info/MB/bandwidth_gran | 1 + .../linux-resctrl/resctrl/info/MB/min_bandwidth | 1 + .../linux-resctrl/resctrl/info/MB/num_closids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 8 ++ tests/virresctrldata/resctrl.schemata | 1 + 10 files changed, 199 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids
What about virsh available views? And similar to the RDT series what about domstats? I think you can get some good ideas from the RDT CMT RFC that's posted. Not even sure if it's already done internally - but pointing it out... It doesn't have to be done as part of the series, but eventually it may be nice. I'll give the following a cursory look as I have other tasks needing some attention. I'll leave it in the back of my mind that I have to be more thorough on the next pass once I get here.
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index 52164d5..d61515c 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -51,6 +51,9 @@ <optional> <ref name='cache'/> </optional> + <optional> + <ref name='memory_bandwidth'/> + </optional> <zeroOrMore> <ref name='secmodel'/> </zeroOrMore> @@ -326,6 +329,36 @@ </attribute> </define>
+ <define name='memory_bandwidth'> + <element name='memory_bandwidth'> + <oneOrMore> + <element name='node'> + <attribute name='id'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='cpus'> + <ref name='cpuset'/> + </attribute> + <zeroOrMore> + <element name='control'> + <attribute name='granularity'> + <ref name='unsignedInt'/> + </attribute> + <optional> + <attribute name='min'> + <ref name='unsignedInt'/> + </attribute> + </optional> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + </element> + </zeroOrMore> + </element> + </oneOrMore> + </element> + </define> + <define name='guestcaps'> <element name='guest'> <ref name='ostype'/> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 7a810ef..3f52296 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,16 @@ virCapabilitiesFreeNUMAInfo(virCapsPtr caps) }
static void +virCapsHostMemBWNodeFree(virCapsHostMemBWNodePtr ptr) +{ + if (!ptr) + return; + + virBitmapFree(ptr->cpus); + VIR_FREE(ptr); +} + +static void virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) { size_t i; @@ -239,6 +249,11 @@ virCapsDispose(void *object) virCapsHostCacheBankFree(caps->host.caches[i]); VIR_FREE(caps->host.caches);
+ for (i = 0; i < caps->host.nnodes; i++) + virCapsHostMemBWNodeFree(caps->host.nodes[i]); + VIR_FREE(caps->host.nodes); + +
Remove one of the blank lines. This was the only issue I saw in my quick glance - rest seemed OK. John
VIR_FREE(caps->host.netprefix); VIR_FREE(caps->host.pagesSize); virCPUDefFree(caps->host.cpu); @@ -957,6 +972,58 @@ virCapabilitiesFormatCaches(virBufferPtr buf, return 0; }
+static int +virCapabilitiesFormatMemoryBandwidth(virBufferPtr buf, + size_t nnodes, + virCapsHostMemBWNodePtr *nodes) +{ + size_t i = 0; + virBuffer controlBuf = VIR_BUFFER_INITIALIZER; + + if (!nnodes) + return 0; + + virBufferAddLit(buf, "<memory_bandwidth>\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < nnodes; i++) { + virCapsHostMemBWNodePtr node = nodes[i]; + virResctrlInfoMemBWPerNodePtr control = &node->control; + char *cpus_str = virBitmapFormat(node->cpus); + + if (!cpus_str) + return -1; + + virBufferAsprintf(buf, + "<node id='%u' cpus='%s'", + node->id, cpus_str); + VIR_FREE(cpus_str); + + virBufferSetChildIndent(&controlBuf, buf); + virBufferAsprintf(&controlBuf, + "<control granularity='%u' min ='%u' " + "maxAllocs='%u'/>\n", + control->granularity, control->min, + control->max_allocation); + + if (virBufferCheckError(&controlBuf) < 0) + return -1; + + if (virBufferUse(&controlBuf)) { + virBufferAddLit(buf, ">\n"); + virBufferAddBuffer(buf, &controlBuf); + virBufferAddLit(buf, "</node>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } + } + + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</memory_bandwidth>\n"); + + return 0; +} + /** * virCapabilitiesFormatXML: * @caps: capabilities to format @@ -1060,6 +1127,10 @@ virCapabilitiesFormatXML(virCapsPtr caps) caps->host.caches) < 0) goto error;
+ if (virCapabilitiesFormatMemoryBandwidth(&buf, caps->host.nnodes, + caps->host.nodes) < 0) + goto error; + for (i = 0; i < caps->host.nsecModels; i++) { virBufferAddLit(&buf, "<secmodel>\n"); virBufferAdjustIndent(&buf, 2); @@ -1602,6 +1673,40 @@ virCapabilitiesInitResctrl(virCapsPtr caps) }
+static int +virCapabilitiesInitResctrlMemory(virCapsPtr caps) +{ + virCapsHostMemBWNodePtr node = NULL; + size_t i = 0; + int ret = -1; + + for (i = 0; i < caps->host.ncaches; i++) { + virCapsHostCacheBankPtr bank = caps->host.caches[i]; + if (VIR_ALLOC(node) < 0) + goto cleanup; + + if (virResctrlInfoGetMemoryBandwidth(caps->host.resctrl, + bank->level, &node->control) > 0) { + node->id = bank->id; + if (!(node->cpus = virBitmapNewCopy(bank->cpus))) + goto cleanup; + + if (VIR_APPEND_ELEMENT(caps->host.nodes, + caps->host.nnodes, node) < 0) { + goto cleanup; + } + } + virCapsHostMemBWNodeFree(node); + node = NULL; + } + + ret = 0; + cleanup: + virCapsHostMemBWNodeFree(node); + return ret; +} + + int virCapabilitiesInitCaches(virCapsPtr caps) { @@ -1731,6 +1836,9 @@ virCapabilitiesInitCaches(virCapsPtr caps) qsort(caps->host.caches, caps->host.ncaches, sizeof(*caps->host.caches), virCapsHostCacheBankSorter);
+ if (virCapabilitiesInitResctrlMemory(caps) < 0) + goto cleanup; + ret = 0; cleanup: VIR_FREE(type); diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index fe1b9ea..046e275 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -151,6 +151,14 @@ struct _virCapsHostCacheBank { virResctrlInfoPerCachePtr *controls; };
+typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode; +typedef virCapsHostMemBWNode *virCapsHostMemBWNodePtr; +struct _virCapsHostMemBWNode { + unsigned int id; + virBitmapPtr cpus; /* All CPUs that belong to this node*/ + virResctrlInfoMemBWPerNode control; +}; + typedef struct _virCapsHost virCapsHost; typedef virCapsHost *virCapsHostPtr; struct _virCapsHost { @@ -175,6 +183,9 @@ struct _virCapsHost { size_t ncaches; virCapsHostCacheBankPtr *caches;
+ size_t nnodes; + virCapsHostMemBWNodePtr *nodes; + size_t nsecModels; virCapsHostSecModelPtr secModels;
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index bec2afd..c4ccebc 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -633,6 +633,26 @@ virResctrlInfoIsEmpty(virResctrlInfoPtr resctrl)
int +virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl, + unsigned int level, + virResctrlInfoMemBWPerNodePtr control) +{ + virResctrlInfoMemBWPtr membw_info = resctrl->membw_info; + + if (!membw_info) + return 0; + + if (membw_info->last_level_cache != level) + return 0; + + control->granularity = membw_info->bandwidth_granularity; + control->min = membw_info->min_bandwidth; + control->max_allocation = membw_info->max_allocation; + return 1; +} + + +int virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d43fd31..4333218 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -50,6 +50,17 @@ struct _virResctrlInfoPerCache { unsigned int max_allocation; };
+typedef struct _virResctrlInfoMemBWPerNode virResctrlInfoMemBWPerNode; +typedef virResctrlInfoMemBWPerNode *virResctrlInfoMemBWPerNodePtr; +struct _virResctrlInfoMemBWPerNode { + /* Smallest possible increase of the allocation bandwidth in percentage */ + unsigned int granularity; + /* Minimal allocatable bandwidth in percentage */ + unsigned int min; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation; +}; + typedef struct _virResctrlInfo virResctrlInfo; typedef virResctrlInfo *virResctrlInfoPtr;
@@ -63,6 +74,10 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, size_t *ncontrols, virResctrlInfoPerCachePtr **controls);
+int +virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl, + unsigned int level, + virResctrlInfoMemBWPerNodePtr control); /* Alloc-related things */ typedef struct _virResctrlAlloc virResctrlAlloc; typedef virResctrlAlloc *virResctrlAllocPtr; diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran new file mode 100644 index 0000000..f599e28 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran @@ -0,0 +1 @@ +10 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth new file mode 100644 index 0000000..f599e28 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth @@ -0,0 +1 @@ +10 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids new file mode 100644 index 0000000..b8626c4 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids @@ -0,0 +1 @@ +4 diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml index 4840614..9b00cf0 100644 --- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml +++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml @@ -49,6 +49,14 @@ <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> </bank> </cache> + <memory_bandwidth> + <node id='0' cpus='0-5'> + <control granularity='10' min ='10' maxAllocs='4'/> + </node> + <node id='1' cpus='6-11'> + <control granularity='10' min ='10' maxAllocs='4'/> + </node> + </memory_bandwidth> </host>
</capabilities> diff --git a/tests/virresctrldata/resctrl.schemata b/tests/virresctrldata/resctrl.schemata index fa980e5..2578822 100644 --- a/tests/virresctrldata/resctrl.schemata +++ b/tests/virresctrldata/resctrl.schemata @@ -1 +1,2 @@ L3:0=000ff;1=000f0 +MB:0=100;1=100

On 2018年07月27日 05:04, John Ferlan wrote:
On 07/18/2018 03:57 AM, bing.niu@intel.com wrote:
From: Bing Niu <bing.niu@intel.com>
Add new XML section to report host's memory bandwidth allocation capability. The format as below example:
<host> ..... <memory_bandwidth> <node id='0' cpus='0-19'> <control granularity='10' min ='10' maxAllocs='8'/> </node> </memory_bandwidth> </host>
granularity ---- granularity of memory bandwidth, unit percentage. min ---- minimum memory bandwidth allowed, unit percentage. maxAllocs ---- maximum memory bandwidth allocation group supported.
Signed-off-by: Bing Niu <bing.niu@intel.com> --- docs/schemas/capability.rng | 33 +++++++ src/conf/capabilities.c | 108 +++++++++++++++++++++ src/conf/capabilities.h | 11 +++ src/util/virresctrl.c | 20 ++++ src/util/virresctrl.h | 15 +++ .../linux-resctrl/resctrl/info/MB/bandwidth_gran | 1 + .../linux-resctrl/resctrl/info/MB/min_bandwidth | 1 + .../linux-resctrl/resctrl/info/MB/num_closids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 8 ++ tests/virresctrldata/resctrl.schemata | 1 + 10 files changed, 199 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/bandwidth_gran create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/min_bandwidth create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/MB/num_closids
What about virsh available views? And similar to the RDT series what about domstats? I think you can get some good ideas from the RDT CMT RFC that's posted. Not even sure if it's already done internally - but pointing it out... It doesn't have to be done as part of the series, but eventually it may be nice.
Yes. RDT CMT will follow this memory_bandwidth host capability XML. Huaqiang will handle that.
I'll give the following a cursory look as I have other tasks needing some attention. I'll leave it in the back of my mind that I have to be more thorough on the next pass once I get here.
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index 52164d5..d61515c 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -51,6 +51,9 @@ <optional> <ref name='cache'/> </optional> + <optional> + <ref name='memory_bandwidth'/> + </optional> <zeroOrMore> <ref name='secmodel'/> </zeroOrMore> @@ -326,6 +329,36 @@ </attribute> </define>
+ <define name='memory_bandwidth'> + <element name='memory_bandwidth'> + <oneOrMore> + <element name='node'> + <attribute name='id'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='cpus'> + <ref name='cpuset'/> + </attribute> + <zeroOrMore> + <element name='control'> + <attribute name='granularity'> + <ref name='unsignedInt'/> + </attribute> + <optional> + <attribute name='min'> + <ref name='unsignedInt'/> + </attribute> + </optional> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + </element> + </zeroOrMore> + </element> + </oneOrMore> + </element> + </define> + <define name='guestcaps'> <element name='guest'> <ref name='ostype'/> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 7a810ef..3f52296 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,16 @@ virCapabilitiesFreeNUMAInfo(virCapsPtr caps) }
static void +virCapsHostMemBWNodeFree(virCapsHostMemBWNodePtr ptr) +{ + if (!ptr) + return; + + virBitmapFree(ptr->cpus); + VIR_FREE(ptr); +} + +static void virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) { size_t i; @@ -239,6 +249,11 @@ virCapsDispose(void *object) virCapsHostCacheBankFree(caps->host.caches[i]); VIR_FREE(caps->host.caches);
+ for (i = 0; i < caps->host.nnodes; i++) + virCapsHostMemBWNodeFree(caps->host.nodes[i]); + VIR_FREE(caps->host.nodes); + +
Remove one of the blank lines.
OK. Thanks a lot~
This was the only issue I saw in my quick glance - rest seemed OK.
John
[.....]
participants (4)
-
bing.niu
-
bing.niu@intel.com
-
John Ferlan
-
Wang, Huaqiang