[libvirt] [PATCH 00/10] Introduce x86 Cache Monitoring Technology (CMT)

This patch series introduces x86 Cache Monitoring Technology (CMT) to
libvirt through the kernel resource control (resctrl) interface. CMT is
one of the Intel(R) x86 CPU features belonging to Resource Director
Technology (RDT). It reports the occupancy of the last level cache,
which is shared by all CPU cores.

We have had several discussions about enabling CMT; please refer to the
following links for the RFCs.

RFCv3
https://www.redhat.com/archives/libvir-list/2018-August/msg01213.html
RFCv2
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html
RFCv1
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html

1. Why is CMT necessary in libvirt?

The perf events 'CMT, MBML, MBMT' have been phased out since Linux
kernel commit c39a0e2c8850f08249383f2425dbd8dbe4baad69, so the perf based
cmt and mbm support in libvirt does not work with the latest Linux kernel.
These patches add the CMT feature to libvirt through the kernel resctrlfs
interface.

2. Interfaces for CMT from the high level.

2.1 Query the host capability of CMT.

The element 'monitor' represents the host capabilities of CMT. The CMT
attributes involved are:

- 'maxAllocs' denotes the maximum number of monitoring groups that can
  be created, which is limited by the number of hardware 'RMID's.
- 'threshold' denotes the upper bound of cache occupancy for the current
  group, in bytes, used to determine whether an RMID can be reused.
- The element 'feature' denotes a supported monitoring feature.
- 'llc_occupancy' is the feature that reports last level cache
  occupancy information.

    # virsh capabilities
    ...
    <cache>
      <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
        <control granularity='768' unit='KiB' type='code' maxAllocs='8'/>
        <control granularity='768' unit='KiB' type='data' maxAllocs='8'/>
 +      <monitor threshold='540672' unit='B' maxAllocs='176'>
 +        <feature name='llc_occupancy'/>
 +      </monitor>
      </bank>
      <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
        <control granularity='768' unit='KiB' type='code' maxAllocs='8'/>
        <control granularity='768' unit='KiB' type='data' maxAllocs='8'/>
 +      <monitor threshold='540672' unit='B' maxAllocs='176'>
 +        <feature name='llc_occupancy'/>
 +      </monitor>
      </bank>
    </cache>
    ...

2.2 Create a cache monitoring group (cache monitor).

The main interface for creating a monitoring group is an XML file. The
proposed configuration looks like:

    <cputune>
      <cachetune vcpus='1'>
        <cache id='0' level='3' type='code' size='7680' unit='KiB'/>
        <cache id='1' level='3' type='data' size='3840' unit='KiB'/>
 +      <monitor vcpus='1'/>
      </cachetune>
      <cachetune vcpus='4-7'>
 +      <monitor vcpus='4-6'/>
      </cachetune>
    </cputune>

The XML above creates 2 cache resctrl allocation groups and 2 resctrl
monitoring groups. Changes to cache monitors take effect the next time
the VM boots.

2.3 Show the CMT result through the command 'domstats'.

This adds an interface in qemu to report this information for resource
monitoring groups through the command 'virsh domstats --cpu-total'.
Below is typical output:

    # virsh domstats 1 --cpu-total
    Domain: 'ubuntu16.04-base'
    ...
      cpu.cache.monitor.count=2
      cpu.cache.0.name=vcpus_1
      cpu.cache.0.vcpus=1
      cpu.cache.0.bank.count=2
      cpu.cache.0.bank.0.id=0
      cpu.cache.0.bank.0.bytes=4505600
      cpu.cache.0.bank.1.id=1
      cpu.cache.0.bank.1.bytes=5586944
      cpu.cache.1.name=vcpus_4-6
      cpu.cache.1.vcpus=4,5,6
      cpu.cache.1.bank.count=2
      cpu.cache.1.bank.0.id=0
      cpu.cache.1.bank.0.bytes=17571840
      cpu.cache.1.bank.1.id=1
      cpu.cache.1.bank.1.bytes=29106176

**Changes Since RFCv3**

In the output of 'domstats', added
'cpu.cache.<cmt_group_index>.bank.<bank_index>.id' to report the
OS-assigned cache bank id of the current cache. Changes are prefixed
with a '+':

    # virsh domstats 1 --cpu-total
    Domain: 'ubuntu16.04-base'
    ...
      cpu.cache.monitor.count=2
      cpu.cache.0.name=vcpus_1
      cpu.cache.0.vcpus=1
      cpu.cache.0.bank.count=2
 +    cpu.cache.0.bank.0.id=0
      cpu.cache.0.bank.0.bytes=4505600
 +    cpu.cache.0.bank.1.id=1
      cpu.cache.0.bank.1.bytes=5586944
      cpu.cache.1.name=vcpus_4-6
      cpu.cache.1.vcpus=4,5,6
      cpu.cache.1.bank.count=2
 +    cpu.cache.1.bank.0.id=0
      cpu.cache.1.bank.0.bytes=17571840
 +    cpu.cache.1.bank.1.id=1
      cpu.cache.1.bank.1.bytes=29106176

Wang Huaqiang (10):
  conf: Renamed 'controlBuf' to 'childrenBuf'
  util: add interface retrieving CMT capability
  conf: Add CMT capability to host
  test: add test case for resctrl monitor
  util: resctrl: refactoring some functions
  util: Introduce resctrl monitor for CMT
  conf: refactor virDomainResctrlAppend
  conf: introduce resctrl monitor group in domain
  qemu: Introduce resctrl monitoring group
  qemu: Report cache occupancy (CMT) with domstats

 .gnulib                       |   1 -
 docs/formatdomain.html.in     |  14 +-
 docs/schemas/capability.rng   |  28 +
 docs/schemas/domaincommon.rng |  11 +-
 src/conf/capabilities.c       |  51 +-
 src/conf/capabilities.h       |   1 +
 src/conf/domain_conf.c        | 159 +++++-
 src/conf/domain_conf.h        |  20 +
 src/libvirt-domain.c          |   9 +
 src/libvirt_private.syms      |   6 +
 src/qemu/qemu_driver.c        | 265 ++++++++-
 src/qemu/qemu_process.c       |  40 +-
 src/util/virresctrl.c         | 597 +++++++++++++++++++--
 src/util/virresctrl.h         |  48 +-
tests/genericxml2xmlindata/cachetune-cdp.xml | 2 + .../cachetune-colliding-monitors.xml | 36 ++ tests/genericxml2xmlindata/cachetune-small.xml | 1 + tests/genericxml2xmlindata/cachetune.xml | 3 + tests/genericxml2xmltest.c | 4 + .../resctrl/info/L3_MON/max_threshold_occupancy | 1 + .../linux-resctrl/resctrl/info/L3_MON/mon_features | 3 + .../linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 6 + 23 files changed, 1208 insertions(+), 99 deletions(-) delete mode 160000 .gnulib create mode 100644 tests/genericxml2xmlindata/cachetune-colliding-monitors.xml create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids -- 2.7.4

To add the CMT/MBM feature and keep the code consistent in later patches,
rename the variable 'controlBuf' to 'childrenBuf' in the functions
'virCapabilitiesFormatCaches' and 'virCapabilitiesFormatMemoryBandwidth'.

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 .gnulib                 |  1 -
 src/conf/capabilities.c | 28 ++++++++++++++--------------
 2 files changed, 14 insertions(+), 15 deletions(-)
 delete mode 160000 .gnulib

diff --git a/.gnulib b/.gnulib
deleted file mode 160000
index 68df637..0000000
--- a/.gnulib
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 68df637b5f1b5c10370f6981d2a43a5cf74368df
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 6b60fbc..326bd15 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -873,7 +873,7 @@ virCapabilitiesFormatCaches(virBufferPtr buf,
 {
     size_t i = 0;
     size_t j = 0;
-    virBuffer controlBuf = VIR_BUFFER_INITIALIZER;
+    virBuffer childrenBuf = VIR_BUFFER_INITIALIZER;

     if (!ncaches)
         return 0;
@@ -902,7 +902,7 @@ virCapabilitiesFormatCaches(virBufferPtr buf,
                           short_size, unit, cpus_str);
         VIR_FREE(cpus_str);

-        virBufferSetChildIndent(&controlBuf, buf);
+        virBufferSetChildIndent(&childrenBuf, buf);
         for (j = 0; j < bank->ncontrols; j++) {
             const char *min_unit;
             virResctrlInfoPerCachePtr controls = bank->controls[j];
@@ -928,26 +928,26 @@ virCapabilitiesFormatCaches(virBufferPtr buf,
                 }
             }

-            virBufferAsprintf(&controlBuf,
+            virBufferAsprintf(&childrenBuf,
                               "<control granularity='%llu'",
                               gran_short_size);

             if (min_short_size)
-                virBufferAsprintf(&controlBuf, " min='%llu'", min_short_size);
+                virBufferAsprintf(&childrenBuf, " min='%llu'", min_short_size);

-            virBufferAsprintf(&controlBuf,
+            virBufferAsprintf(&childrenBuf,
                               " unit='%s' type='%s' maxAllocs='%u'/>\n",
                               unit,
                               virCacheTypeToString(controls->scope),
                               controls->max_allocation);
         }

-        if (virBufferCheckError(&controlBuf) < 0)
+        if (virBufferCheckError(&childrenBuf) < 0)
             return -1;

-        if (virBufferUse(&controlBuf)) {
+        if (virBufferUse(&childrenBuf)) {
             virBufferAddLit(buf, ">\n");
-            virBufferAddBuffer(buf, &controlBuf);
+            virBufferAddBuffer(buf, &childrenBuf);
             virBufferAddLit(buf, "</bank>\n");
         } else {
             virBufferAddLit(buf, "/>\n");
@@ -966,7 +966,7 @@ virCapabilitiesFormatMemoryBandwidth(virBufferPtr buf,
                                      virCapsHostMemBWNodePtr *nodes)
 {
     size_t i = 0;
-    virBuffer controlBuf = VIR_BUFFER_INITIALIZER;
+    virBuffer childrenBuf = VIR_BUFFER_INITIALIZER;

     if (!nnodes)
         return 0;
@@ -987,19 +987,19 @@ virCapabilitiesFormatMemoryBandwidth(virBufferPtr buf,
                           node->id, cpus_str);
         VIR_FREE(cpus_str);

-        virBufferSetChildIndent(&controlBuf, buf);
-        virBufferAsprintf(&controlBuf,
+        virBufferSetChildIndent(&childrenBuf, buf);
+        virBufferAsprintf(&childrenBuf,
                           "<control granularity='%u' min ='%u' "
                           "maxAllocs='%u'/>\n",
                           control->granularity, control->min,
                           control->max_allocation);

-        if (virBufferCheckError(&controlBuf) < 0)
+        if (virBufferCheckError(&childrenBuf) < 0)
             return -1;

-        if (virBufferUse(&controlBuf)) {
+        if (virBufferUse(&childrenBuf)) {
             virBufferAddLit(buf, ">\n");
-            virBufferAddBuffer(buf, &controlBuf);
+            virBufferAddBuffer(buf, &childrenBuf);
             virBufferAddLit(buf, "</node>\n");
         } else {
             virBufferAddLit(buf, "/>\n");
--
2.7.4

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
To add the CMT/MBM feature and keep the code consistent in later patches, rename the variable 'controlBuf' to 'childrenBuf' in the functions 'virCapabilitiesFormatCaches' and 'virCapabilitiesFormatMemoryBandwidth'.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- .gnulib | 1 -
Gaah!!! Don't do that!
src/conf/capabilities.c | 28 ++++++++++++++-------------- 2 files changed, 14 insertions(+), 15 deletions(-) delete mode 160000 .gnulib
diff --git a/.gnulib b/.gnulib deleted file mode 160000 index 68df637..0000000 --- a/.gnulib +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 68df637b5f1b5c10370f6981d2a43a5cf74368df
Luckily I can delete this hunk out of my .eml file before git am'ing the series. The rest is fine by me, allows childrenBuf to catch up with childBuf variables. At least in this case there's multiple elements within for loops being added as opposed to some other uses where there's just one. Reviewed-by: John Ferlan <jferlan@redhat.com> John

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:58 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 01/10] conf: Renamed 'controlBuf' to 'childrenBuf'
Hi John, Thanks for the review. I will address your comments in each separate email. BR Huaqiang
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
To add the CMT/MBM feature and keep the code consistent in later patches, rename the variable 'controlBuf' to 'childrenBuf' in the functions 'virCapabilitiesFormatCaches' and 'virCapabilitiesFormatMemoryBandwidth'.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- .gnulib | 1 -
Gaah!!!
Don't do that!
Will be removed.
src/conf/capabilities.c | 28 ++++++++++++++-------------- 2 files changed, 14 insertions(+), 15 deletions(-) delete mode 160000 .gnulib
diff --git a/.gnulib b/.gnulib deleted file mode 160000 index 68df637..0000000 --- a/.gnulib +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 68df637b5f1b5c10370f6981d2a43a5cf74368df
Luckily I can delete this hunk out of my .eml file before git am'ing the series.
The rest is fine by me, allows childrenBuf to catch up with childBuf variables. At least in this case there's multiple elements within for loops being added as opposed to some other uses where there's just one.
Thanks.
Reviewed-by: John Ferlan <jferlan@redhat.com>
John

Introduce a function that reports the CMT capability by reading the
files under /sys/fs/resctrl/info/L3_MON. This patch works together with
later patches to report this information to the domain.

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/conf/capabilities.c |   6 ++-
 src/conf/capabilities.h |   1 +
 src/util/virresctrl.c   | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
 src/util/virresctrl.h   |  17 ++++++-
 4 files changed, 137 insertions(+), 7 deletions(-)

diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 326bd15..5280348 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -1626,6 +1626,9 @@ virCapsHostCacheBankFree(virCapsHostCacheBankPtr ptr)
     virBitmapFree(ptr->cpus);
     for (i = 0; i < ptr->ncontrols; i++)
         VIR_FREE(ptr->controls[i]);
+    if (ptr->monitor && ptr->monitor->features)
+        virStringListFree(ptr->monitor->features);
+    VIR_FREE(ptr->monitor);
     VIR_FREE(ptr->controls);
     VIR_FREE(ptr);
 }
@@ -1801,7 +1804,8 @@ virCapabilitiesInitCaches(virCapsPtr caps)
                                            bank->level,
                                            bank->size,
                                            &bank->ncontrols,
-                                           &bank->controls) < 0)
+                                           &bank->controls,
+                                           &bank->monitor) < 0)
                     goto cleanup;

                 if (VIR_APPEND_ELEMENT(caps->host.caches,
diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h
index 046e275..3ed2523 100644
--- a/src/conf/capabilities.h
+++ b/src/conf/capabilities.h
@@ -149,6 +149,7 @@ struct _virCapsHostCacheBank {
     virBitmapPtr cpus;  /* All CPUs that share this bank */
     size_t ncontrols;
     virResctrlInfoPerCachePtr *controls;
+    virResctrlInfoMonPtr monitor;
 };

 typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode;
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c
index 4b5442f..2f6923a 100644
--- a/src/util/virresctrl.c
+++ b/src/util/virresctrl.c
@@ -146,6 +146,8 @@ struct _virResctrlInfo {
     size_t nlevels;

     virResctrlInfoMemBWPtr membw_info;
+
+    virResctrlInfoMonPtr monitor_info;
 };


@@ -171,6 +173,9 @@ virResctrlInfoDispose(void *obj)
         VIR_FREE(level);
     }

+    if (resctrl->monitor_info)
+        virStringListFree(resctrl->monitor_info->features);
+    VIR_FREE(resctrl->monitor_info);
     VIR_FREE(resctrl->membw_info);
     VIR_FREE(resctrl->levels);
 }
@@ -556,6 +561,81 @@ virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl)


 static int
+virResctrlGetMonitorInfo(virResctrlInfoPtr resctrl)
+{
+    int rv = -1;
+    char *featurestr = NULL;
+    char **lines = NULL;
+    size_t nlines = 0;
+    size_t i = 0;
+    int ret = -1;
+    virResctrlInfoMonPtr info = NULL;
+
+    if (VIR_ALLOC(info) < 0)
+        return -1;
+
+    rv = virFileReadValueUint(&info->max_allocation,
+                              SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids");
+    if (rv == -2) {
+        /* The file doesn't exist, so it's unusable for us,
+         * probably resource monitoring feature unsupported */
+        VIR_WARN("The path '" SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids' "
+                 "does not exist");
+
+        ret = 0;
+        goto cleanup;
+    } else if (rv < 0) {
+        /* Other failures are fatal, so just quit */
+        goto cleanup;
+    }
+
+    rv = virFileReadValueUint(&info->cache_threshold,
+                              SYSFS_RESCTRL_PATH
+                              "/info/L3_MON/max_threshold_occupancy");
+
+    if (rv == -2) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get max_threshold_occupancy from resctrl"
+                         " info"));
+    }
+    if (rv < 0)
+        goto cleanup;
+
+    rv = virFileReadValueString(&featurestr,
+                                SYSFS_RESCTRL_PATH
+                                "/info/L3_MON/mon_features");
+    if (rv == -2)
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get mon_features from resctrl info"));
+    if (rv < 0)
+        goto cleanup;
+
+    lines = virStringSplitCount(featurestr, "\n", 0, &nlines);
+
+    for (i = 0; i < nlines; i++) {
+        if (STREQLEN(lines[i], "llc_", strlen("llc_")) ||
+            STREQLEN(lines[i], "mbm_", strlen("mbm_"))) {
+            if (virStringListAdd(&info->features, lines[i]) < 0)
+                goto cleanup;
+            info->nfeatures++;
+        }
+    }
+
+    VIR_FREE(featurestr);
+    virStringListFree(lines);
+    resctrl->monitor_info = info;
+    return 0;
+
+ cleanup:
+    VIR_FREE(featurestr);
+    virStringListFree(lines);
+    virStringListFree(info->features);
+    VIR_FREE(info);
+    return ret;
+}
+
+
+static int
 virResctrlGetInfo(virResctrlInfoPtr resctrl)
 {
     DIR *dirp = NULL;
@@ -569,6 +649,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl)
     if (ret < 0)
         goto cleanup;

+    ret = virResctrlGetMonitorInfo(resctrl);
+    if (ret < 0)
+        goto cleanup;
+
     ret = virResctrlGetCacheInfo(resctrl, dirp);
     if (ret < 0)
         goto cleanup;
@@ -654,16 +738,21 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
                        unsigned int level,
                        unsigned long long size,
                        size_t *ncontrols,
-                       virResctrlInfoPerCachePtr **controls)
+                       virResctrlInfoPerCachePtr **controls,
+                       virResctrlInfoMonPtr *monitor)
 {
     virResctrlInfoPerLevelPtr i_level = NULL;
     virResctrlInfoPerTypePtr i_type = NULL;
+    virResctrlInfoMonPtr cachemon = NULL;
     size_t i = 0;
     int ret = -1;

     if (virResctrlInfoIsEmpty(resctrl))
         return 0;

+    if (VIR_ALLOC(cachemon) < 0)
+        return -1;
+
     /* Let's take the opportunity to update the number of last level
      * cache. This number of memory bandwidth controller is same with
      * last level cache */
@@ -716,14 +805,35 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
         memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control));
     }

-    ret = 0;
- cleanup:
-    return ret;
+    cachemon->max_allocation = 0;
+
+    if (resctrl->monitor_info) {
+        virResctrlInfoMonPtr info = resctrl->monitor_info;
+
+        cachemon->max_allocation = info->max_allocation;
+        cachemon->cache_threshold = info->cache_threshold;
+        for (i = 0; i < info->nfeatures; i++) {
+            /* Only cares about last level cache */
+            if (STREQLEN(info->features[i], "llc_", strlen("llc_"))) {
+                if (virStringListAdd(&cachemon->features,
+                                     info->features[i]) < 0)
+                    goto error;
+                cachemon->nfeatures++;
+            }
+        }
+    }
+
+    if (cachemon->features)
+        *monitor = cachemon;
+
+    return 0;
  error:
     while (*ncontrols)
         VIR_FREE((*controls)[--*ncontrols]);
     VIR_FREE(*controls);
-    goto cleanup;
+    virStringListFree(cachemon->features);
+    VIR_FREE(cachemon);
+    return ret;
 }


diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h
index cfd56dd..51bb68b 100644
--- a/src/util/virresctrl.h
+++ b/src/util/virresctrl.h
@@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode {
     unsigned int max_allocation;
 };

+typedef struct _virResctrlInfoMon virResctrlInfoMon;
+typedef virResctrlInfoMon *virResctrlInfoMonPtr;
+/* Information about resource monitoring group */
+struct _virResctrlInfoMon {
+    /* null-terminal string list for hw supported monitor feature */
+    char **features;
+    size_t nfeatures;
+    /* Maximum number of simultaneous allocations */
+    unsigned int max_allocation;
+    /* determines the occupancy at which an RMID can be freed */
+    unsigned int cache_threshold;
+};
+
 typedef struct _virResctrlInfo virResctrlInfo;
 typedef virResctrlInfo *virResctrlInfoPtr;

@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
                        unsigned int level,
                        unsigned long long size,
                        size_t *ncontrols,
-                       virResctrlInfoPerCachePtr **controls);
+                       virResctrlInfoPerCachePtr **controls,
+                       virResctrlInfoMonPtr *monitor);
+

 int
 virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,
--
2.7.4
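For readers unfamiliar with the resctrl info files, the lookup this patch performs can be approximated outside libvirt with plain C stdio. This is only a sketch under assumptions: the helper names `read_uint_file` and `read_l3_mon_info` and the struct `l3_mon_info` are hypothetical and not part of libvirt, the directory is passed in rather than hard-coded, and the -2 "file is missing" return value simply mirrors the way the patch interprets the result of virFileReadValueUint.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for two of the values found under info/L3_MON. */
struct l3_mon_info {
    unsigned int num_rmids;               /* number of hardware RMIDs */
    unsigned int max_threshold_occupancy; /* RMID reuse threshold, in bytes */
};

/* Hypothetical helper: read one unsigned integer from a file.
 * Returns 0 on success, -2 if the file does not exist, -1 on other errors. */
int read_uint_file(const char *path, unsigned int *value)
{
    char buf[64];
    char *end = NULL;
    unsigned long parsed;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return errno == ENOENT ? -2 : -1;

    if (!fgets(buf, sizeof(buf), fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    errno = 0;
    parsed = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf || parsed > UINT_MAX)
        return -1;

    *value = (unsigned int)parsed;
    return 0;
}

/* Returns 1 if monitoring data was read, 0 if num_rmids is missing
 * (monitoring treated as unsupported, not as an error), -1 on failure. */
int read_l3_mon_info(const char *info_dir, struct l3_mon_info *info)
{
    char path[512];
    int n;
    int rv;

    n = snprintf(path, sizeof(path), "%s/num_rmids", info_dir);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    rv = read_uint_file(path, &info->num_rmids);
    if (rv == -2)
        return 0;
    if (rv < 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/max_threshold_occupancy", info_dir);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    /* Once num_rmids exists, a missing threshold file is a real error. */
    if (read_uint_file(path, &info->max_threshold_occupancy) < 0)
        return -1;

    return 1;
}
```

On a host with resctrl mounted, the caller would presumably pass "/sys/fs/resctrl/info/L3_MON" as `info_dir`; the three-way return value keeps "monitoring unsupported" distinct from a genuine read failure, as the patch does.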

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Introduce a function that reports the CMT capability by reading the files under /sys/fs/resctrl/info/L3_MON. This patch works together with later patches to report this information to the domain.
Do you mean you're setting the basis for future patches to provide the capability data for monitor info?
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/conf/capabilities.c | 6 ++- src/conf/capabilities.h | 1 + src/util/virresctrl.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++-- src/util/virresctrl.h | 17 ++++++- 4 files changed, 137 insertions(+), 7 deletions(-)
Caveat - I didn't go back and read all the previous history on this. Sorry, there's just too much. I hope that mkletzan will also take a look at the series since he was involved previously.

There are two things going on in this patch:

1. The actual fetch of the data into resctrl structures
2. The movement/copy of some of that data into @bank

I'd suggest splitting the patch so that item 1 is separate and item 2 is combined with patches 3 and 4, along with some doc adjustments to describe the output.

Of course I just peeked looking for "cache" and "bank" in docs/*.in and found nothing /-|... Looks like docs/formatcaps.html.in needs some love to describe <cache> and <memory_bandwidth> (how come this stuff comes to me afterwards...)
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 326bd15..5280348 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -1626,6 +1626,9 @@ virCapsHostCacheBankFree(virCapsHostCacheBankPtr ptr) virBitmapFree(ptr->cpus); for (i = 0; i < ptr->ncontrols; i++) VIR_FREE(ptr->controls[i]); + if (ptr->monitor && ptr->monitor->features) + virStringListFree(ptr->monitor->features); + VIR_FREE(ptr->monitor); VIR_FREE(ptr->controls); VIR_FREE(ptr); } @@ -1801,7 +1804,8 @@ virCapabilitiesInitCaches(virCapsPtr caps) bank->level, bank->size, &bank->ncontrols, - &bank->controls) < 0) + &bank->controls, + &bank->monitor) < 0)
I wonder if it'd be better to have virResctrlInfoGetCache just take @bank as a parameter instead of continually adding more... I'm also not convinced @bank is the best place considering it's the same data that repeatedly gets fetched/stored.
goto cleanup;
if (VIR_APPEND_ELEMENT(caps->host.caches, diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index 046e275..3ed2523 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -149,6 +149,7 @@ struct _virCapsHostCacheBank { virBitmapPtr cpus; /* All CPUs that share this bank */ size_t ncontrols; virResctrlInfoPerCachePtr *controls; + virResctrlInfoMonPtr monitor;
This structure notes usage for a specific @level; however, from how I read the path to the data, the data is only provided for L3. Since on output <bank> can specify a specific level, I assume that L1 or L2 would be possible for it; however, given how the code is written, wouldn't that mean @monitor data is included as well?

Additionally, from how I read things it seems the same data is repeated for each bank id='#' found. So is the data unique to a bank by id, or is it unique to the level regardless of the bank? If the former, then the data needs to be properly split so it can be shown to be different for each id. If the latter, then <monitor> wouldn't seem to need to be a child of <bank>. It's not clear whether it's then a child or peer of <cache>.
};
typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 4b5442f..2f6923a 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -146,6 +146,8 @@ struct _virResctrlInfo { size_t nlevels;
virResctrlInfoMemBWPtr membw_info; + + virResctrlInfoMonPtr monitor_info; };
@@ -171,6 +173,9 @@ virResctrlInfoDispose(void *obj) VIR_FREE(level); }
+ if (resctrl->monitor_info) + virStringListFree(resctrl->monitor_info->features); + VIR_FREE(resctrl->monitor_info); VIR_FREE(resctrl->membw_info); VIR_FREE(resctrl->levels); } @@ -556,6 +561,81 @@ virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl)
Could add a few comments here to describe what's being provided here and how it fits (regardless of where in the schema of things it ends up).
static int
+virResctrlGetMonitorInfo(virResctrlInfoPtr resctrl)
+{
+    int rv = -1;
+    char *featurestr = NULL;
+    char **lines = NULL;
+    size_t nlines = 0;
+    size_t i = 0;
+    int ret = -1;
+    virResctrlInfoMonPtr info = NULL;
+
+    if (VIR_ALLOC(info) < 0)
+        return -1;
+
+    rv = virFileReadValueUint(&info->max_allocation,
+                              SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids");
+    if (rv == -2) {
+        /* The file doesn't exist, so it's unusable for us,
+         * probably resource monitoring feature unsupported */
+        VIR_WARN("The path '" SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids' "
+                 "does not exist");
+
+        ret = 0;
+        goto cleanup;
+    } else if (rv < 0) {
+        /* Other failures are fatal, so just quit */
+        goto cleanup;
+    }
+
+    rv = virFileReadValueUint(&info->cache_threshold,
+                              SYSFS_RESCTRL_PATH
+                              "/info/L3_MON/max_threshold_occupancy");
+
+    if (rv == -2) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get max_threshold_occupancy from resctrl"
+                         " info"));
+    }
+    if (rv < 0)
+        goto cleanup;
+
+    rv = virFileReadValueString(&featurestr,
+                                SYSFS_RESCTRL_PATH
+                                "/info/L3_MON/mon_features");
+    if (rv == -2)
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get mon_features from resctrl info"));
+    if (rv < 0)
+        goto cleanup;
+
+    lines = virStringSplitCount(featurestr, "\n", 0, &nlines);
+
+    for (i = 0; i < nlines; i++) {
+        if (STREQLEN(lines[i], "llc_", strlen("llc_")) ||
+            STREQLEN(lines[i], "mbm_", strlen("mbm_"))) {
Consider using STRPREFIX.
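To make the STRPREFIX suggestion concrete, here is a small standalone sketch. The two macros are local stand-ins written to behave like libvirt's STREQLEN and STRPREFIX (both built on strncmp), and the function names `is_mon_feature_verbose` and `is_mon_feature` are hypothetical; the point is only that the shorter form expresses the same test without repeating the literal.

```c
#include <string.h>

/* Local stand-ins mirroring the meaning of libvirt's string macros. */
#define STREQLEN(a, b, n) (strncmp((a), (b), (n)) == 0)
#define STRPREFIX(a, b)   (strncmp((a), (b), strlen(b)) == 0)

/* The form used in the patch: each prefix literal is written twice. */
int is_mon_feature_verbose(const char *line)
{
    return STREQLEN(line, "llc_", strlen("llc_")) ||
           STREQLEN(line, "mbm_", strlen("mbm_"));
}

/* The suggested form: same result, the prefix appears once per test. */
int is_mon_feature(const char *line)
{
    return STRPREFIX(line, "llc_") || STRPREFIX(line, "mbm_");
}
```

Both functions accept names such as "llc_occupancy" or "mbm_total_bytes" and reject anything that does not start with one of the two prefixes.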
+ if (virStringListAdd(&info->features, lines[i]) < 0) + goto cleanup; + info->nfeatures++; + }
So we get a list filtered by the prefixes "llc_" and "mbm_". Eventually this list gets pared down again to just "llc_". None of the subsequent patches do anything with "mbm_" other than list it in capabilities XML output. Sure, "mbm_" could be used in the future, but the question that comes to mind is: why are we filtering initially at all? That is, why not just replace lines/nlines with info->features and info->nfeatures? That then provides "everything" supported in the tree, right? I would figure this code would just be mirroring what's available. It's then up to the upper layers or other code to decide what to do with the list.
+ } + + VIR_FREE(featurestr); + virStringListFree(lines); + resctrl->monitor_info = info; + return 0;
Consider using cleanup as part of the normal processing, which removes the need for multiple VIR_FREE(featurestr) and virStringListFree(lines) calls. If you add a

    char **features = NULL;

which you use to perform the virStringListAdd calls, then on success you can do

    VIR_STEAL_PTR(info->features, features);
    VIR_STEAL_PTR(resctrl->monitor_info, info);
    ret = 0;

and fall through, letting featurestr, lines, and features be freed below. Of course that all assumes it's really necessary to do the filtering. There's also the VIR_AUTOPTR stuff added which I'm not exactly sure how to use yet as I wasn't part of that effort.
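The single-exit pattern described above can be sketched in isolation as follows. Everything here is illustrative: `STEAL_PTR` is a local macro assumed to behave like libvirt's VIR_STEAL_PTR (assign, then clear the source), and `struct mon_info`, `struct holder`, `load_mon_info` and the `fail_late` switch are hypothetical names used only to exercise both paths.

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in for VIR_STEAL_PTR: move ownership from b to a. */
#define STEAL_PTR(a, b) do { (a) = (b); (b) = NULL; } while (0)

struct mon_info {
    char *feature;              /* one owned string keeps the sketch short */
};

struct holder {
    struct mon_info *monitor_info;
};

/* Returns 0 on success and -1 on failure. fail_late simulates an error
 * that happens after both allocations have already been made. */
int load_mon_info(struct holder *h, const char *feature, int fail_late)
{
    int ret = -1;
    struct mon_info *info = NULL;
    char *copy = NULL;
    size_t len = strlen(feature) + 1;

    if (!(info = calloc(1, sizeof(*info))))
        goto cleanup;
    if (!(copy = malloc(len)))
        goto cleanup;
    memcpy(copy, feature, len);

    if (fail_late)
        goto cleanup;           /* nothing has been handed over yet */

    /* Success: transfer ownership, which also NULLs the locals. */
    STEAL_PTR(info->feature, copy);
    STEAL_PTR(h->monitor_info, info);
    ret = 0;

 cleanup:
    /* After a successful steal both locals are NULL, so these are no-ops;
     * on failure they release whatever was still owned locally. */
    free(copy);
    free(info);
    return ret;
}
```

Because the success path falls through the same label as the failure paths, each resource is freed in exactly one place and there is no second copy of the cleanup calls to keep in sync.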
+ + cleanup: + VIR_FREE(featurestr); + virStringListFree(lines); + virStringListFree(info->features); + VIR_FREE(info); + return ret; +} + + +static int virResctrlGetInfo(virResctrlInfoPtr resctrl) { DIR *dirp = NULL; @@ -569,6 +649,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) if (ret < 0) goto cleanup;
+ ret = virResctrlGetMonitorInfo(resctrl); + if (ret < 0) + goto cleanup; + ret = virResctrlGetCacheInfo(resctrl, dirp); if (ret < 0) goto cleanup; @@ -654,16 +738,21 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls) + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor) { virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; + virResctrlInfoMonPtr cachemon = NULL; size_t i = 0; int ret = -1;
if (virResctrlInfoIsEmpty(resctrl)) return 0;
+ if (VIR_ALLOC(cachemon) < 0) + return -1; + /* Let's take the opportunity to update the number of last level * cache. This number of memory bandwidth controller is same with * last level cache */ @@ -716,14 +805,35 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control)); }
- ret = 0; - cleanup: - return ret; + cachemon->max_allocation = 0; + + if (resctrl->monitor_info) { + virResctrlInfoMonPtr info = resctrl->monitor_info; + + cachemon->max_allocation = info->max_allocation; + cachemon->cache_threshold = info->cache_threshold; + for (i = 0; i < info->nfeatures; i++) { + /* Only cares about last level cache */ + if (STREQLEN(info->features[i], "llc_", strlen("llc_"))) {
Again use STRPREFIX instead.
+ if (virStringListAdd(&cachemon->features, + info->features[i]) < 0) + goto error; + cachemon->nfeatures++; + } + }
This code further filters our "llc_" and "mbm_" list down to just "llc_". I'm not sure I understand why we only "care about last level cache" values just yet. Of course I see that is all that gets added to the capabilities output in the subsequent patch, but is that really what's wanted? Are we going to start seeing patches that expand this list? Why limit/filter now?
+ } + + if (cachemon->features) + *monitor = cachemon;
If !cachemon->features, then @cachemon is leaked. Consider using:

    VIR_STEAL_PTR(*monitor, cachemon);

in the if condition, then VIR_FREE(cachemon); or just the VIR_FREE(cachemon); as an else. IDC either way.

Of course, it's still not quite clear what's going on. Perhaps you should have an API that gets all the names of the values prefixed by some string, IOW:

    virResctrlInfoGetMonitorPrefix(resctrl, filter)

where filter is a "const char *filter". It would return that cachemon list whether it's NULL, empty, or full of anything. Let the caller decide what to do with it. I haven't looked beyond the first 4 patches, so how things are used later on may determine what APIs you could need. The relationship to <cache> and <bank> isn't clear.
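One possible shape for a prefix-based getter, sketched as standalone C rather than as the libvirt API itself: `filter_by_prefix` is a hypothetical function, `STRPREFIX` is a local stand-in for libvirt's macro, and the sample feature names are the ones resctrl typically lists in mon_features. The function always returns a list (possibly empty), so there is no conditional hand-off that could leak, and the caller decides what an empty result means.

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in for libvirt's STRPREFIX. */
#define STRPREFIX(a, b) (strncmp((a), (b), strlen(b)) == 0)

/* Hypothetical getter: return a newly allocated, NULL-terminated array of
 * the entries in 'features' that start with 'prefix' (NULL matches all).
 * The array borrows the strings; the caller frees only the array itself.
 * Returns NULL only on allocation failure. */
const char **filter_by_prefix(const char *const *features,
                              const char *prefix,
                              size_t *nmatched)
{
    size_t total = 0;
    size_t n = 0;
    size_t i;
    const char **out;

    while (features[total])
        total++;

    /* calloc zero-fills, so the array stays NULL-terminated. */
    out = calloc(total + 1, sizeof(*out));
    if (!out)
        return NULL;

    for (i = 0; i < total; i++) {
        if (!prefix || STRPREFIX(features[i], prefix))
            out[n++] = features[i];
    }

    *nmatched = n;
    return out;
}
```

With this shape, a cache-capabilities caller could ask for "llc_" and a memory-bandwidth caller for "mbm_", while the code that reads mon_features keeps the full, unfiltered list.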
+ + return 0; error: while (*ncontrols) VIR_FREE((*controls)[--*ncontrols]); VIR_FREE(*controls); - goto cleanup; + virStringListFree(cachemon->features); + VIR_FREE(cachemon); + return ret; }
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index cfd56dd..51bb68b 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode { unsigned int max_allocation; };
+typedef struct _virResctrlInfoMon virResctrlInfoMon; +typedef virResctrlInfoMon *virResctrlInfoMonPtr; +/* Information about resource monitoring group */ +struct _virResctrlInfoMon { + /* null-terminal string list for hw supported monitor feature */ + char **features; + size_t nfeatures; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation;
What kind of allocations? From your cover letter you state it's the maximum number of monitoring groups that could be created, but it's impossible to know how this value is expected to be used from what's provided as a comment here. The code shows this is the value from /info/L3_MON/num_rmids - I don't see the correlation between the FS name and the structure name. Since you later print a unit of 'B' I assume it's bytes of something, but the comment seems to imply a maximum simultaneous number of something. Perhaps if documentation had been added I would have had my answer without needing to go research it.
+ /* determines the occupancy at which an RMID can be freed */
Again, alone this comment is difficult to decipher as it relates to the structure field name. The code shows the value read is "/info/L3_MON/max_threshold_occupancy". It's not clear what "occupancy" means. Is there something related to this number that some consumer could change that would improve some performance? John
+ unsigned int cache_threshold; +}; + typedef struct _virResctrlInfo virResctrlInfo; typedef virResctrlInfo *virResctrlInfoPtr;
@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls); + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor); +
int virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:58 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 02/10] util: add interface retrieving CMT capability
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Introduce a function that reports the CMT capability by reading the files under /sys/fs/resctrl/info/L3_MON. This patch works together with later patches to report this information to the domain.
Do you mean you're setting the basis for future patches to provide the capability data for monitor info?
Yes, patches 1 to 4 add the host capability for the cache monitor. This patch adds a 'virResctrlInfoMonPtr' in two places: one in @resctrl (type 'virResctrlInfoPtr') to keep the resctrl monitoring group capabilities, and one in each cache @bank (type 'virCapsHostCacheBankPtr') to store the capabilities of the cache monitor. The subsequent patch 3 reports the cache monitor capability information in response to a host capability query.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/conf/capabilities.c |   6 ++-
 src/conf/capabilities.h |   1 +
 src/util/virresctrl.c   | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
 src/util/virresctrl.h   |  17 ++++++-
 4 files changed, 137 insertions(+), 7 deletions(-)
Caveat - I didn't go back and read all the previous history on this. Sorry there's just too much. I hope that mkletzan will also take a look at the series since he was involved previously.
There's two things going on in this patch:
1. The actual fetch of the data into resctrl structures
This is accomplished by filling information to @virResctrlInfo->monitor_info.
2. The movement/copy of some of that data into @bank
This is accomplished by filling information to @virCapsHostCacheBankPtr->monitor
I'd suggest splitting the patches such that item 1 is separate and then item 2 is combined with patches 3 and 4 along with some doc adjustments to describe the output.
Makes sense. I will split this patch as you suggested and add doc content. But patch 4 is a test patch for the newly introduced cache monitor capability; do you think it should be merged with patch 3 and the second half of patch 2?
Of course I just peeked looking for "cache" and "bank" in docs/*.in and found nothing /-|... Looks like docs/formatcaps.html.in needs some love to describe <cache> and <memory_bandwidth> (how come this stuff comes to me afterwards...)
I'll provide the descriptions for cache monitor.
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 326bd15..5280348 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -1626,6 +1626,9 @@ virCapsHostCacheBankFree(virCapsHostCacheBankPtr ptr) virBitmapFree(ptr->cpus); for (i = 0; i < ptr->ncontrols; i++) VIR_FREE(ptr->controls[i]); + if (ptr->monitor && ptr->monitor->features) + virStringListFree(ptr->monitor->features); + VIR_FREE(ptr->monitor); VIR_FREE(ptr->controls); VIR_FREE(ptr); } @@ -1801,7 +1804,8 @@ virCapabilitiesInitCaches(virCapsPtr caps) bank->level, bank->size, &bank->ncontrols, - &bank->controls) < 0) + &bank->controls, + &bank->monitor) < 0)
I wonder if perhaps it'd be better to have virResctrlInfoGetCache just take @bank as a parameter instead of continually adding more... I'm also not convinced @bank is the best place considering it's the same data that repeatedly gets fetched/stored.
This should be OK, since 'virCapsHostCacheBankPtr' structure is fully exposed in 'capabilities.h' file. The suggestion will be followed.
goto cleanup;
if (VIR_APPEND_ELEMENT(caps->host.caches, diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index 046e275..3ed2523 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -149,6 +149,7 @@ struct _virCapsHostCacheBank { virBitmapPtr cpus; /* All CPUs that share this bank */ size_t ncontrols; virResctrlInfoPerCachePtr *controls; + virResctrlInfoMonPtr monitor;
This structure notes usage for specific @level; however, from how I read the path to the data, the data is only provided in L3. Since on output <bank> can specify a specific level, I assume that L1 or L2 would be possible for it; however, given how the code is written wouldn't that mean @monitor data is included as well?
@level could be 1, 2 or 3 for a 3-level cache system. @monitor is included in the structure for each cache level, and it is set to NULL if the hardware does not support monitoring. For example, no L1D cache supports resource monitoring, so the @monitor of the corresponding 'virCapsHostCacheBank' is set to NULL.
Additionally from how I read things it seems the same data is repeated for each bank id='#' found. So is the data unique to a bank by id or is unique to the level regardless of the bank? If the former, then the data needs to be properly split so it can be shown to be different for each id. If the latter, then <monitor> wouldn't seem to need to be a child of <bank>. It's not clear whether it's then a child or peer of <cache>.
Data is only kept in a cache 'bank' that supports the cache monitor feature; for a cache that does not support monitoring, @monitor is set to NULL. And yes, it might be duplicated several times on a multi-node system, because multiple cache banks that support cache monitoring are found.

It seems you have a big concern about my putting @monitor inside the 'virCapsHostCacheBankPtr' structure. Before explaining my considerations for this arrangement, let me clarify the definition of 'cache bank' and the cache topology. I may make mistakes here; if so, please correct me.

A 'cache bank' has a group of attributes, including 'id', 'level', 'type', 'size' and 'cpus'; these attributes are defined by the Linux kernel. You can find the values for a specific CPU cache block under this directory:

/sys/devices/system/cpu/cpu*/cache/*

My understanding of the libvirt cache 'bank' (or 'cache bank'): 'virCapsHostCacheBankPtr' represents the 'cache bank'. The function 'virCapsHostCacheBankEquals', listed below, tells us that two cache banks with exactly the same attribute values are the same 'cache bank'; otherwise they are different cache banks.

bool
virCapsHostCacheBankEquals(virCapsHostCacheBankPtr a,
                           virCapsHostCacheBankPtr b)
{
    return (a->id == b->id &&
            a->level == b->level &&
            a->type == b->type &&
            a->size == b->size &&
            virBitmapEqual(a->cpus, b->cpus));
}

How many cache banks are there in a system, and what is the relationship between a 'cache bank' and a hardware cache block? For convenience, take a 2-socket E5-2699v4 system as an example: there are two CPU nodes in the system, each CPU (or node) has 22 hardware CPU cores, each core is equipped with two private L1 caches and one private L2 cache, and each node has an L3 (or LLC) cache shared among its cores.
Let's use the concept of 'cache bank' to describe the caches of this 2-socket system. There are two L3 cache banks (assuming CDP is disabled), one for each socket and shared by the associated CPU cores. There are 44 CPU cores in total, and each core has three private cache banks: the private L1D 'cache bank', the L1I 'cache bank' and the L2 'cache bank'. In total, 44x3+2=134 cache banks exist in the system.

Here are my considerations for adding @monitor to each 'cache bank':

1. This leverages the design of @controls/@ncontrols.
   a.) @controls/@ncontrols was designed to introduce the cache allocation (CAT) feature. Only the last level cache of a particular CPU supports cache allocation; a private 'cache bank' does not.
   b.) @monitor follows the same policy as @controls/@ncontrols. For a 'cache bank' that supports cache monitoring, the appropriate information is populated through 'virCapsHostCacheBankPtr'->monitor; for a 'cache bank' that does not, @monitor is left empty.
2. Cache 'monitor' is designed as a companion concept to cache 'control': cache 'control' mostly covers CAT, and cache 'monitor' now refers to CMT.

You also mentioned "it seems the same data is repeated". Yes, it is on some systems, for example a 2-socket E5-2600v4 system. This also confused me a lot when I began to write the POC code. As I said, the cache monitor leverages the cache allocation design: the capability of cache allocation is also stored in each 'cache bank', through @ncontrols and @controls. If @controls is empty and @ncontrols is 0, there is no cache allocation capability for this 'cache bank'; otherwise @controls/@ncontrols describe an array of 'virResctrlInfoPerCachePtr' pointers that indicate the capabilities of the cache allocation (CAT) feature. The content of @controls/@ncontrols may also be duplicated.
So libvirt is able (or was designed) to keep information unique to each 'cache bank', but resctrl cannot provide such per-bank information, even on a multi-socket system where each socket holds a different CPU. In reality resctrl only reports system-wide CAT, CMT, MBA and MBM capabilities; it does not report them per 'cache bank'. I think there should be an explanation of why the cache allocation capability is kept per 'cache bank', despite the fact that resctrl only provides a single copy of the cache allocation capability data. Maybe it is not wise to leverage the design of @bank/@controls here, or maybe I totally misunderstood the design of @bank/@controls.
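The bank count in the 2-socket E5-2699v4 example can be checked with a small sketch. The structure and function names below are hypothetical and exist only for this illustration; the sketch assumes CDP is disabled, one shared L3 bank per socket, and three private banks (L1D, L1I, L2) per core, as described in the reply.

```c
#include <stddef.h>

/* Hypothetical host description, used only for this illustration. */
struct host_topology {
    size_t sockets;                 /* each socket has one shared L3 bank */
    size_t cores_per_socket;
    size_t private_banks_per_core;  /* L1D + L1I + L2 = 3 in the example */
};

/* Count cache banks: private banks for every core plus one L3 per socket. */
size_t count_cache_banks(const struct host_topology *t)
{
    size_t cores = t->sockets * t->cores_per_socket;

    return cores * t->private_banks_per_core + t->sockets;
}
```

With 2 sockets, 22 cores per socket and 3 private banks per core, this gives 44 x 3 + 2 = 134, while resctrl still exposes only one set of L3_MON values for the whole host.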
};
typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 4b5442f..2f6923a 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -146,6 +146,8 @@ struct _virResctrlInfo { size_t nlevels;
virResctrlInfoMemBWPtr membw_info; + + virResctrlInfoMonPtr monitor_info; };
@@ -171,6 +173,9 @@ virResctrlInfoDispose(void *obj) VIR_FREE(level); }
+ if (resctrl->monitor_info) + virStringListFree(resctrl->monitor_info->features); + VIR_FREE(resctrl->monitor_info); VIR_FREE(resctrl->membw_info); VIR_FREE(resctrl->levels); } @@ -556,6 +561,81 @@ virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl)
Could add a few comments here to describe what's being provided here and how it fits (regardless of where in the schema of things it ends up).
How about adding these as function comments:

/*
 * Retrieve the capabilities of the resource monitoring group by checking
 * the kernel resource control interface.
 * The capability information includes:
 * max_allocation: the maximum number of monitoring groups that can be
 *   created.
 * mon_features: the monitoring features supported, which could be
 *   'llc_occupancy' for reporting how much last level cache is in use,
 *   'mbm_total_bytes' for reporting total memory bandwidth usage,
 *   'mbm_local_bytes' for reporting local memory bandwidth usage.
 * cache_threshold: this affects the actual destruction of a resource
 *   monitoring group, mainly the reclaiming of the hardware resource.
 *   A greater value, in bytes, makes a request to create a new resource
 *   monitoring group more likely to fail if the existing number of
 *   monitoring groups has reached 'max_allocation'.
 */
static int +virResctrlGetMonitorInfo(virResctrlInfoPtr resctrl) { + int rv = -1; + char *featurestr = NULL; + char **lines = NULL; + size_t nlines = 0; + size_t i = 0; + int ret = -1; + virResctrlInfoMonPtr info = NULL; + + if (VIR_ALLOC(info) < 0) + return -1; + + rv = virFileReadValueUint(&info->max_allocation, + SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids"); + if (rv == -2) { + /* The file doesn't exist, so it's unusable for us, + * probably resource monitoring feature unsupported */ + VIR_WARN("The path '" SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids' " + "does not exist"); + + ret = 0; + goto cleanup; + } else if (rv < 0) { + /* Other failures are fatal, so just quit */ + goto cleanup; + } + + rv = virFileReadValueUint(&info->cache_threshold, + SYSFS_RESCTRL_PATH + + "/info/L3_MON/max_threshold_occupancy"); + + if (rv == -2) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get max_threshold_occupancy from resctrl" + " info")); + } + if (rv < 0) + goto cleanup; + + rv = virFileReadValueString(&featurestr, + SYSFS_RESCTRL_PATH + "/info/L3_MON/mon_features"); + if (rv == -2) + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot get mon_features from resctrl info")); + if (rv < 0) + goto cleanup; + + lines = virStringSplitCount(featurestr, "\n", 0, &nlines); + + for (i = 0; i < nlines; i++) { + if (STREQLEN(lines[i], "llc_", strlen("llc_")) || + STREQLEN(lines[i], "mbm_", strlen("mbm_"))) {
Consider using STRPREFIX.
OK. To be fixed.
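For readers outside libvirt, the check being discussed is a plain prefix comparison. A minimal standalone sketch follows; has_prefix() and is_monitor_feature() are hypothetical names for this illustration, and has_prefix() is only assumed to behave like the STRPREFIX macro mentioned above, it is not that macro.

```c
#include <stdbool.h>
#include <string.h>

/* Local stand-in for a prefix test (assumed equivalent to STRPREFIX). */
static bool has_prefix(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* True for the feature names the patch keeps: "llc_*" and "mbm_*". */
bool is_monitor_feature(const char *name)
{
    return has_prefix(name, "llc_") || has_prefix(name, "mbm_");
}
```

Taking the length from the prefix itself avoids repeating the literal, which is what the STREQLEN(..., strlen("llc_")) form in the patch has to do.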
+ if (virStringListAdd(&info->features, lines[i]) < 0) + goto cleanup; + info->nfeatures++; + }
So we get a list filtered by prefixes "llc_" and "mbm_"
Eventually this list gets pared down again to just "llc_". None of the subsequent patches do anything with "mbm_" other than list it in capabilities XML output.
Sure "mbm_" could be used in the future, but the question that comes to mind is why are we initially filtering at all? That is, why not just replace lines/nlines with info->features and info->nfeatures? That then provides "everything" supported in the tree, right?
Agree. Will remove the filter, just report the content from resctrl.
I would figure this code would be just mirroring what's available. It's then up to the upper layers or other code to decide what to do with the list.
Agree. Let upper layer make decision.
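A minimal sketch of the unfiltered approach agreed on here, in plain C rather than with libvirt helpers: every non-empty line of a mon_features-style text becomes one list entry, and the caller decides what to do with the list. split_feature_lines() is a hypothetical name for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split newline-separated text into a NULL-terminated list, one entry per
 * non-empty line, without any filtering. The caller frees each entry and
 * the list. Returns NULL on allocation failure.
 */
char **split_feature_lines(const char *text, size_t *count)
{
    size_t cap = 1;   /* upper bound on the number of lines */
    size_t n = 0;
    const char *p;
    char **out;

    for (p = text; *p; p++)
        if (*p == '\n')
            cap++;

    out = calloc(cap + 1, sizeof(*out));
    if (!out)
        return NULL;

    p = text;
    while (*p) {
        size_t len = strcspn(p, "\n");

        if (len > 0) {
            out[n] = malloc(len + 1);
            if (!out[n]) {
                while (n > 0)
                    free(out[--n]);
                free(out);
                return NULL;
            }
            memcpy(out[n], p, len);
            out[n][len] = '\0';
            n++;
        }
        p += len;
        if (*p == '\n')
            p++;
    }

    *count = n;
    return out;
}
```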
+ } + + VIR_FREE(featurestr); + virStringListFree(lines); + resctrl->monitor_info = info; + return 0;
consider using cleanup as part of the processing and thus removing the need to have multiple VIR_FREE(featurestr) and virStringListFree(lines).
If you add a char **features = NULL; which you use to perform the virStringListAdd calls and then
VIR_STEAL_PTR(info->features, features); VIR_STEAL_PTR(resctrl->monitor_info, info); ret = 0;
and fall through allowing the featurestr, lines, and features to be used below.
Of course that all assumes it's really necessary to do the filtering.
There's also the VIR_AUTOPTR stuff added which I'm not exactly sure how to use yet as I wasn't part of that effort.
Thanks for the advice, really helpful. I will pay attention to the 'cleanup'/'error' labels and the logic behind them.
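The single-exit pattern suggested above can be sketched in standalone C. STEAL_PTR below is a local macro written for this illustration and only assumed to be similar in spirit to VIR_STEAL_PTR; struct mon_info, struct holder and load_monitor_info() are hypothetical, and the 'fail' argument merely simulates a later fatal error.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the monitor info and its owner. */
struct mon_info {
    char *feature;              /* a single feature name, for brevity */
    unsigned int max_allocation;
};

struct holder {
    struct mon_info *monitor_info;
};

/* Move src into dst and clear src, so cleanup will not free it. */
#define STEAL_PTR(dst, src) do { (dst) = (src); (src) = NULL; } while (0)

int load_monitor_info(struct holder *h, const char *feature_name, int fail)
{
    int ret = -1;
    char *feature = NULL;
    struct mon_info *info = calloc(1, sizeof(*info));

    if (!info)
        return -1;

    feature = malloc(strlen(feature_name) + 1);
    if (!feature)
        goto cleanup;
    strcpy(feature, feature_name);

    if (fail)                   /* simulate a fatal error after allocation */
        goto cleanup;

    /* Success: hand ownership over, leaving the locals NULL. */
    STEAL_PTR(info->feature, feature);
    STEAL_PTR(h->monitor_info, info);
    ret = 0;

 cleanup:
    /* Stolen pointers are NULL here, so these calls are harmless. */
    free(feature);
    if (info)
        free(info->feature);
    free(info);
    return ret;
}
```

Both the success and the failure path fall through the same cleanup label, so there is only one place where temporaries are released.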
+ + cleanup: + VIR_FREE(featurestr); + virStringListFree(lines); + virStringListFree(info->features); + VIR_FREE(info); + return ret; +} + + +static int virResctrlGetInfo(virResctrlInfoPtr resctrl) { DIR *dirp = NULL; @@ -569,6 +649,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) if (ret < 0) goto cleanup;
+ ret = virResctrlGetMonitorInfo(resctrl); + if (ret < 0) + goto cleanup; + ret = virResctrlGetCacheInfo(resctrl, dirp); if (ret < 0) goto cleanup; @@ -654,16 +738,21 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls) + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor) { virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; + virResctrlInfoMonPtr cachemon = NULL; size_t i = 0; int ret = -1;
if (virResctrlInfoIsEmpty(resctrl)) return 0;
+ if (VIR_ALLOC(cachemon) < 0) + return -1; + /* Let's take the opportunity to update the number of last level * cache. This number of memory bandwidth controller is same with * last level cache */ @@ -716,14 +805,35 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control)); }
- ret = 0; - cleanup: - return ret; + cachemon->max_allocation = 0; + + if (resctrl->monitor_info) { + virResctrlInfoMonPtr info = resctrl->monitor_info; + + cachemon->max_allocation = info->max_allocation; + cachemon->cache_threshold = info->cache_threshold; + for (i = 0; i < info->nfeatures; i++) { + /* Only cares about last level cache */ + if (STREQLEN(info->features[i], "llc_", strlen("llc_"))) + {
Again use STRPREFIX instead.
Will be fixed.
+ if (virStringListAdd(&cachemon->features, + info->features[i]) < 0) + goto error; + cachemon->nfeatures++; + } + }
This code further filters our "llc_" and "mbm_" list into just "llc_". Not sure I understand why we only "care about last level cache" values just yet. Of course I see that is all that gets added the subsequent patch capabilities output, but is that really what's wanted. Are we going to start seeing patches that start expanding this list? Why limit/filter now?
I hope this addresses your concerns. In resctrl there are three resource monitoring features: 'llc_occupancy' for the cache monitor, and 'mbm_total_bytes' and 'mbm_local_bytes' for the memory bandwidth monitor. In this series I only introduce the cache monitor; the memory bandwidth monitor is planned for a later series.

All the resctrl features listed above (the 'llc_' and 'mbm_' ones) are kept in @resctrl->monitor_info. When libvirt wants the feature list for the cache monitor only, virResctrlInfoGetCache is invoked, and only the 'llc_' feature names are of interest. That is the reason the filter is applied here. (I will filter on the full name instead of a form such as 'llc_*'.)

virResctrlInfoGetCache will be called for any 'cache bank' that supports the 'llc_occupancy' feature, and the result is stored in the 'cache bank' private data area (data type virCapsHostCacheBankPtr). Of course, as you mentioned, the data may be duplicated.

The monitor feature list is expanded when formatting the cache monitor capabilities string while reporting host capabilities, as illustrated by the following code from virCapabilitiesFormatCaches in capabilities.c:

    ...
    if (bank->monitor && bank->monitor->nfeatures) {
        virBufferAsprintf(&childrenBuf,
                          "<monitor threshold='%u' unit='B' "
                          "maxAllocs='%u'>\n",
                          bank->monitor->cache_threshold,
                          bank->monitor->max_allocation);
        /* expanding cache monitor feature list */
        for (j = 0; j < bank->monitor->nfeatures; j++) {
            virBufferAdjustIndent(&childrenBuf, 2);
            virBufferAsprintf(&childrenBuf,
                              "<feature name='%s'/>\n",
                              bank->monitor->features[j]);
            virBufferAdjustIndent(&childrenBuf, -2);
        }
        virBufferAddLit(&childrenBuf, "</monitor>\n");
    }
    ...
+ } + + if (cachemon->features) + *monitor = cachemon;
if (!cachemon->features), then @cachemon is leaked, consider using:
VIR_STEAL_PTR(*monitor, cachemon);
You caught my bug. Thanks.
in the if condition, then
VIR_FREE(cachemon);
or just the VIR_FREE(cachemon); as an else. IDC either way. Of course, it's still not quite clear what's going on.
Perhaps, you should have an API that gets all the names of the values prefixed by some string, IOW:
virResctrlInfoGetMonitorPrefix(resctrl, filter)
where filter is a "const char *filter"
it would return that cachemon list whether it's NULL, empty, or full anything. Let the caller decide what to do with it.
Looks reasonable and makes the code more concise, especially when adding the memory bandwidth monitor. It will be implemented in the next version of the patch.
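One possible shape for such a prefix-based getter, sketched in standalone C. The function names and signature below are hypothetical and are not the API that was eventually written; the sketch only shows the idea of returning whatever matches and letting the caller decide.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated, NULL-terminated list holding copies of the
 * entries in features[0..nfeatures) that start with prefix. An empty match
 * yields a list whose first element is NULL. Returns NULL only on
 * allocation failure.
 */
char **filter_features_by_prefix(char *const *features, size_t nfeatures,
                                 const char *prefix, size_t *nmatched)
{
    size_t plen = strlen(prefix);
    size_t n = 0;
    size_t i;
    char **out = calloc(nfeatures + 1, sizeof(*out));

    if (!out)
        return NULL;

    for (i = 0; i < nfeatures; i++) {
        if (strncmp(features[i], prefix, plen) != 0)
            continue;
        out[n] = malloc(strlen(features[i]) + 1);
        if (!out[n]) {
            while (n > 0)
                free(out[--n]);
            free(out);
            return NULL;
        }
        strcpy(out[n], features[i]);
        n++;
    }

    *nmatched = n;
    return out;
}

/* Free a list returned by filter_features_by_prefix(). */
void free_feature_list(char **list)
{
    size_t i;

    if (!list)
        return;
    for (i = 0; list[i]; i++)
        free(list[i]);
    free(list);
}
```

A cache monitor caller would pass "llc_" and a later memory bandwidth monitor caller "mbm_", so neither needs its own filtering loop.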
I haven't looked beyond the first 4 patches, so how things are used later on may determine what API's you could need. The relationship to <cache> and <bank> isn't clear.
"The relationship to <cache> and <bank> isn't clear." -- I'm not sure I follow. Do you mean that the relationship between the 'physical CPU cache' and the software-level 'cache bank' is not clear? If so, please refer to my replies above, in particular the paragraph clarifying 'cache bank' with the example of a 2S E5-2699v4 system.
+ + return 0; error: while (*ncontrols) VIR_FREE((*controls)[--*ncontrols]); VIR_FREE(*controls); - goto cleanup; + virStringListFree(cachemon->features); + VIR_FREE(cachemon); + return ret; }
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index cfd56dd..51bb68b 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode { unsigned int max_allocation; };
+typedef struct _virResctrlInfoMon virResctrlInfoMon; typedef +virResctrlInfoMon *virResctrlInfoMonPtr; +/* Information about resource monitoring group */ struct +_virResctrlInfoMon { + /* null-terminal string list for hw supported monitor feature */ + char **features; + size_t nfeatures; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation;
What kind of allocations? From your cover you state maximum number of monitoring groups that could be created, but it's impossible to know how this value is expected to be used by what's provided as a comment here.
Changing the comment from /* Maximum number of simultaneous allocations */ to /* Maximum number of monitoring groups that could be created */
The code shows this is the value from /info/L3_MON/num_rmids - I don't see the correlation in FS name to structure name. Since you later print a unit of 'B' I assume it's bytes of something, but the comment seems to imply a maximum simultaneous number of something.
It looks like the capability string confused you. The XML output of the cache monitor capability looks like this (leveraging the format of the 'cache control' capability):

<monitor threshold='270336' unit='B' maxAllocs='176'>
  <feature name='llc_occupancy'/>
</monitor>

'B' is the unit for 'threshold'; 'maxAllocs' is max_allocation (parsed from /info/L3_MON/num_rmids). The 'unit' is not limited to 'B'; it could be 'KiB' and so on. This shows that docs are necessary for interpreting these settings:

"monitor": describes the cache monitor capability.
"threshold": the cache occupancy threshold value used in the kernel resource control system. It affects the actual release of the hardware resource, the RMID (resource monitoring ID). A greater value makes a request to create a new resource monitoring group more likely to fail if the existing number of monitoring groups has reached 'maxAllocs'.
"unit": the unit of "threshold"; could be 'B', 'KiB', 'MiB' or 'GiB'.
"maxAllocs": the maximum number of monitoring groups that can be created.
"feature": the name of a feature supported by 'monitor'.

I hope this documentation clears up the confusion.
Perhaps if documentation was added I would have had my answer without needing to go research that is.
See docs above.
+ /* determines the occupancy at which an RMID can be freed */
Again, alone this comment is difficult to decipher as it relates to the structure field name. The code shows the value read is "/info/L3_MON/max_threshold_occupancy". It's not clear what "occupancy" means. Is there something related to this number that some consumer could change that would improve some performance?
"max_threshold_occupancy" is a concept introduced by the kernel resctrl interface. It is a cache occupancy value, in bytes, that affects the release of a hardware RMID and thus the maximum number of monitoring groups that can be created. More information is in the description of the cache monitor attribute 'threshold'. Thanks for your efforts on the review.
John
+ unsigned int cache_threshold; +}; + typedef struct _virResctrlInfo virResctrlInfo; typedef virResctrlInfo *virResctrlInfoPtr;
@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls); + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor); +
int virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,

On 09/07/2018 03:56 AM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:58 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 02/10] util: add interface retrieving CMT capability
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Introduce a function that reports CMT capability by reading the files under /sys/fs/resctrl/info/L3_MON. This patch works together with later patches to report this information to the domain.
Do you mean you're setting the basis for future patches to provide the capability data for monitor info?
Yes, patches 1 to 4 add the host capability for the cache monitor. This patch introduces two things: it adds a 'virResctrlInfoMonPtr' to @resctrl (type 'virResctrlInfoPtr') to keep the resctrl monitoring group capabilities, and it adds a 'virResctrlInfoMonPtr' to each cache @bank (type 'virCapsHostCacheBankPtr') to store the capabilities of the cache monitor. Patch 3 then reports the cache monitor capability information in response to a host capability query.
I pushed the src/conf/capabilities.c in patch 1, so let's take it off the table. Suggestion... When you're done with patches 2 -> 4 a/k/a the host piece of the cache monitor, post it. Once we are "good" with that it can be pushed. If at the same time you want to introduce the refactorings that don't include cache monitor, then do so in a separate series. Having everything in one series makes an impending review of 10-20 patches less desirable to start. If you get lucky, sometimes when there's a few 1-5 patch series you get multiple reviewers rather than waiting for 1 reviewer to complete a long series.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/conf/capabilities.c |   6 ++-
 src/conf/capabilities.h |   1 +
 src/util/virresctrl.c   | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
 src/util/virresctrl.h   |  17 ++++++-
 4 files changed, 137 insertions(+), 7 deletions(-)
Caveat - I didn't go back and read all the previous history on this. Sorry there's just too much. I hope that mkletzan will also take a look at the series since he was involved previously.
There's two things going on in this patch:
1. The actual fetch of the data into resctrl structures
This is accomplished by filling information to @virResctrlInfo->monitor_info.
2. The movement/copy of some of that data into @bank
This is accomplished by filling information to @virCapsHostCacheBankPtr->monitor
I'd suggest splitting the patches such that item 1 is separate and then item 2 is combined with patches 3 and 4 along with some doc adjustments to describe the output.
Makes sense. I will split this patch as you suggested and add doc content. But patch 4 is a test patch for the newly introduced cache monitor capability; do you think it should be merged with patch 3 and the second half of patch 2?
Why not? You'd be testing what you introduced in patch3.
Of course I just peeked looking for "cache" and "bank" in docs/*.in and found nothing /-|... Looks like docs/formatcaps.html.in needs some love to describe <cache> and <memory_bandwidth> (how come this stuff comes to me afterwards...)
I'll provide the descriptions for cache monitor.
Catching up with "existing" can be a separate patch/series...
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 326bd15..5280348 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -1626,6 +1626,9 @@ virCapsHostCacheBankFree(virCapsHostCacheBankPtr ptr) virBitmapFree(ptr->cpus); for (i = 0; i < ptr->ncontrols; i++) VIR_FREE(ptr->controls[i]); + if (ptr->monitor && ptr->monitor->features) + virStringListFree(ptr->monitor->features); + VIR_FREE(ptr->monitor); VIR_FREE(ptr->controls); VIR_FREE(ptr); } @@ -1801,7 +1804,8 @@ virCapabilitiesInitCaches(virCapsPtr caps) bank->level, bank->size, &bank->ncontrols, - &bank->controls) < 0) + &bank->controls, + &bank->monitor) < 0)
I wonder if perhaps it'd be better to have virResctrlInfoGetCache just take @bank as a parameter instead of continually adding more... I'm also not convinced @bank is the best place considering it's the same data that repeatedly gets fetched/stored.
This should be OK, since 'virCapsHostCacheBankPtr' structure is fully exposed in 'capabilities.h' file. The suggestion will be followed.
The point wasn't that @bank is exposed; it's whether it's the right place. I became less convinced as I went on.
goto cleanup;
if (VIR_APPEND_ELEMENT(caps->host.caches, diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index 046e275..3ed2523 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -149,6 +149,7 @@ struct _virCapsHostCacheBank { virBitmapPtr cpus; /* All CPUs that share this bank */ size_t ncontrols; virResctrlInfoPerCachePtr *controls; + virResctrlInfoMonPtr monitor;
This structure notes usage for specific @level; however, from how I read the path to the data, the data is only provided in L3. Since on output <bank> can specify a specific level, I assume that L1 or L2 would be possible for it; however, given how the code is written wouldn't that mean @monitor data is included as well?
@level could be 1, 2 or 3 for a 3-level cache system. @monitor is included in the structure for each cache level, and it is set to NULL if the hardware does not support monitoring. For example, no L1D cache supports resource monitoring, so the @monitor of the corresponding 'virCapsHostCacheBank' is set to NULL.
OK, but the path used in the subsequent patch is info/L3_MON/num_rmids - IOW: L3 only - hence the query. If there can be multiple levels of monitor cache, then it further makes me believe that it shouldn't be a child of @bank.
Additionally from how I read things it seems the same data is repeated for each bank id='#' found. So is the data unique to a bank by id or is unique to the level regardless of the bank? If the former, then the data needs to be properly split so it can be shown to be different for each id. If the latter, then <monitor> wouldn't seem to need to be a child of <bank>. It's not clear whether it's then a child or peer of <cache>.
Data is only kept in a cache 'bank' that supports the cache monitor feature; for a cache that does not support monitoring, @monitor is set to NULL. And yes, it might be duplicated several times on a multi-node system, because multiple cache banks that support cache monitoring are found.
Maybe I asked the wrong question, but if I look at the data presented in patch 4 - I didn't see anything that said to me that it should be a child of each <bank id='#'...>.... If it is a child of @bank, then the @monitor would need to reference a bank by id, but instead it's disjoint from that AFAICT.
It seems you have a big concern about my putting @monitor inside the 'virCapsHostCacheBankPtr' structure.
Yep, but it may just be a topological concern with how things are printed.
Before explaining my considerations for this arrangement, let me clarify the definition of 'cache bank' and the cache topology. I may make mistakes here; if so, please correct me.
I'm hoping you know it better than I do! The on disk structure I see starts at:

/sys/fs/resctrl (SYSFS_RESCTRL_PATH)

and then for each "subsystem" the subdirectory structure is:

GetCacheInfo => info/%s/{num_closids|cbm_mask|min_cbm_bits}
    where "%s" is "L1", "L2", "L3", etc.
GetMemoryBandwidthInfo => info/MB/{bandwidth_gran|min_bandwidth|num_closids}
GetMonitorInfo => info/L3_MON/{num_rmids|mon_features}

This maps to _virCapsHost:

...
    size_t ncaches;
    virCapsHostCacheBankPtr *caches;
    size_t nnodes;
    virCapsHostMemBWNodePtr *nodes;
...

Where _virCapsHostCacheBank has:

...
    unsigned int id;
    unsigned int level; /* 1=L1, 2=L2, 3=L3, etc. */
...

The code places virResctrlInfoMonPtr as a child of each caches entry. However, when loading the data for each cache level the same "L3_MON" entry is read regardless of level and regardless of id. Thus, in my mind there's *a lot* of seemingly needless duplication. That is "bank=1" gets the same entry as "bank=2" and so on.
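Each of the info files listed above holds a single value, so reading one comes down to a few lines. A minimal standalone sketch, assuming the return convention suggested by the patch's 'rv == -2' check (file does not exist) and using a hypothetical helper name rather than libvirt's virFileReadValueUint:

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read one unsigned integer from the file at path.
 * Returns 0 on success, -2 if the file does not exist, and -1 on any
 * other error (including unparsable content).
 */
int read_uint_file(const char *path, unsigned int *value)
{
    FILE *fp = fopen(path, "r");
    int ret = -1;

    if (!fp)
        return errno == ENOENT ? -2 : -1;

    if (fscanf(fp, "%u", value) == 1)
        ret = 0;

    fclose(fp);
    return ret;
}
```

Called with a path such as /sys/fs/resctrl/info/L3_MON/num_rmids, the result is the same no matter which bank id or cache level the caller is currently processing, which is the duplication being pointed out.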
A 'cache bank' has a group of attributes, including 'id', 'level', 'type', 'size' and 'cpus'; these attributes are defined by the Linux kernel. You can find the values for a specific CPU cache block under this directory: /sys/devices/system/cpu/cpu*/cache/*
My understanding of the libvirt cache 'bank' (or 'cache bank'): 'virCapsHostCacheBankPtr' represents the 'cache bank'. The function 'virCapsHostCacheBankEquals', listed below, tells us that two cache banks with exactly the same attribute values are the same 'cache bank'; otherwise they are different cache banks.
bool
virCapsHostCacheBankEquals(virCapsHostCacheBankPtr a,
                           virCapsHostCacheBankPtr b)
{
    return (a->id == b->id &&
            a->level == b->level &&
            a->type == b->type &&
            a->size == b->size &&
            virBitmapEqual(a->cpus, b->cpus));
}
How many cache banks are there in a system, and what is the relationship between a 'cache bank' and a hardware cache block? For convenience, take a 2-socket E5-2699v4 system as an example: there are two CPU nodes in the system, each CPU (or node) has 22 hardware CPU cores, each core is equipped with two private L1 caches and one private L2 cache, and each node has an L3 (or LLC) cache shared among its cores.
So now you're mixing in the /sys/devices/system/cpu/cpu*/cache/* with the /sys/fs/resctrl/info/*. This is going to get confusing real fast. So far "banks" have been associated with some cpumask map as has the memory bandwidth via node id. Still nothing in the monitor code leads me to "see" how things would be different for each range of cpus by bank id.
Let's use the concept of 'cache bank' to describe the cache of this 2-socket system, there are two L3 'cache bank's (assuming CDP is disabled), one for each socket and shared by associated CPU cores; There are totally 44 CPU cores, and each CPU core has three private 'cache bank's, the private L1D 'cache bank', L1I 'cache bank' and L2 'cache bank'. In total, 44x3+2=134 'cache bank's exist in the system.
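The bank count above can be double-checked with a few lines of C. This is only a sketch: the helper name is hypothetical, and it assumes one shared L3 bank per socket with CDP disabled, as in the E5-2699v4 example.

```c
/* Hypothetical helper: number of cache banks on a host where every core has
 * `private_per_core` private banks (L1D, L1I, L2 in the example) and every
 * socket has a single shared L3 bank (CDP disabled). */
static unsigned int
count_cache_banks(unsigned int sockets,
                  unsigned int cores_per_socket,
                  unsigned int private_per_core)
{
    unsigned int cores = sockets * cores_per_socket;

    return cores * private_per_core + sockets;
}
```

With 2 sockets, 22 cores per socket and 3 private banks per core, the helper gives 44 * 3 + 2 banks.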
OK, but the TLA's and "knowledge" of levels and private or other variously named 'cache bank' just isn't knowledge I keep in my STM (short term memory) let alone my LTM (long).
Here are my considerations for add @monitor for each 'cache bank': 1. This leverages the design of @controls/@ncontrols. a.) @controls/@ncontrols is designed for introducing cache allocation(CAT) feature, only last level cache of particular CPU supports cache allocation, private 'cache bank' does not support cache allocation. b.) @monitor follows the same policy of @controls/@ncontrols does. For 'cache bank' supports cache monitor, populates appropriate cache information through 'virCapsHostCacheBankPtr'->monitor; for 'cache bank' does not support cache monitor feature, just leave @monitor to empty.
I think you've lost me, but I do see the <controls> to some degree equates to some level of calculated values based on the GetCacheInfo data. Again, not code I keep fresh in mind.

I don't see the same for monitor - it's just raw data not related to anything bank related.

I see "maxAlloc" a/k/a _virResctrlInfoMon->max_allocation a/k/a the read "/info/L3_MON/num_rmids" value.

I see "threshold" a/k/a _virResctrlInfoMon->cache_threshold a/k/a the read "/info/L3_MON/max_threshold_occupancy" value.

Nothing to do performing calculations like the control data has. Although not shown in the sample output - it's not clear whether the values could be different between banks. If they cannot, then it's too bad we have so much duplication.
2. Cache ‘monitor’ is designed as an accompany concept to cache ‘control’: cache ‘control’ mostly covers CAT technology, and cache ‘monitor’ now refers to CMT technology.
You also mentioned "it seems the same data is repeated". Yes, it does for some system, for example, a 2-socket E5-2600v4 system.
It's the same because the same file is read. There's nothing that I see that shows any differently. The *only* files read are in the path: /sys/fs/resctrl/info/L3_MON/ Not if they were in something like: /sys/fs/resctrl/info/%s/L3_MON/ where %s related to some bank id number and further if the L3 was L%d where %d = 1, 2, 3, etc. - then I can see the topology you've created. But the fact remains, it's one file, it's not different, so there's no need for the same data to be replicated.
This also confused me a lot when I began to write the POC code. As I said, the 'cache monitor' leverages the 'cache allocation' design: the capability of cache allocation is stored in each 'cache bank' through @ncontrols and @controls. If @controls is empty and @ncontrols is 0, there is no cache allocation capability for this 'cache bank'; otherwise, @controls/@ncontrols points to an array of 'virResctrlInfoPerCachePtr' pointers that describes the capabilities of the cache allocation (CAT) feature. The content of @controls/@ncontrols may also be duplicated. So libvirt is able (or designed) to keep information that is unique to each 'cache bank', but resctrl cannot provide such per-bank information, even on a multi-socket system where each socket holds a different CPU. In reality resctrl only reports system-wide CAT, CMT, MBA and MBM capabilities; it does not report them per 'cache bank'.
I think there should have an explanation for the necessity of keeping the cache allocation capability based on 'cache bank', despite the fact that resctrl only provide a copy of cache allocation capability data.
Maybe, it is not wise to leverage the design of @bank/@controls here. or maybe I totally misunderstood the design of @bank/@controls.
Sorry - I'm no help here. Martin did the original review and perhaps understands the model best.
};
typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode;
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c
index 4b5442f..2f6923a 100644
--- a/src/util/virresctrl.c
+++ b/src/util/virresctrl.c
@@ -146,6 +146,8 @@ struct _virResctrlInfo {
     size_t nlevels;

     virResctrlInfoMemBWPtr membw_info;
+
+    virResctrlInfoMonPtr monitor_info;
 };

@@ -171,6 +173,9 @@ virResctrlInfoDispose(void *obj)
         VIR_FREE(level);
     }

+    if (resctrl->monitor_info)
+        virStringListFree(resctrl->monitor_info->features);
+    VIR_FREE(resctrl->monitor_info);
     VIR_FREE(resctrl->membw_info);
     VIR_FREE(resctrl->levels);
 }
@@ -556,6 +561,81 @@ virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl)
Could add a few comments here to describe what's being provided here and how it fits (regardless of where in the schema of things it ends up).
how about adding these as function comments: /* * Retrieve the capability of resource monitoring group by checking the kernel * resource control interface. * the capability information includes:
s/the/The/ (and it can be on the preceding line) be sure to leave a blank line before the next though - makes it easier to read (at least for me)
* max_allocation: the maximum number of monitoring groups could be created.
Has monitoring groups been introduced? "could be created"? blank line for readability
* mon_features: the monitoring features supported, which could be * 'llc_occupancy' for the feature of reporting how much last level cache using. * 'mbm_total_bytes' for the feature of reporting total memory bandwidth using. * 'mbm_local_bytes' for the feature of reporting local memory bandwidth using.
These are essentially text strings for monitoring capabilities - how they're used is something for later on. IOW: This is what's allowed blank line for readability
* cache_threshold: this affects the actual destroy of resource monitoring * group, mainly affects the hardware resource reclaim, a greater value of this, in * bytes, will make the request of creating new resource monitoring group more * likely to fail if the existing number of monitoring groups reaches up to * 'max_allocation'.
ah, what, OK - Doesn't help me. It's a *capability* - changing it would seem to be the chore of someone sufficiently blessed in understanding in how all of this works.
*/
+static int
+virResctrlGetMonitorInfo(virResctrlInfoPtr resctrl)
+{
+    int rv = -1;
+    char *featurestr = NULL;
+    char **lines = NULL;
+    size_t nlines = 0;
+    size_t i = 0;
+    int ret = -1;
+    virResctrlInfoMonPtr info = NULL;
+
+    if (VIR_ALLOC(info) < 0)
+        return -1;
+
+    rv = virFileReadValueUint(&info->max_allocation,
+                              SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids");
+    if (rv == -2) {
+        /* The file doesn't exist, so it's unusable for us,
+         * probably resource monitoring feature unsupported */
+        VIR_WARN("The path '" SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids' "
+                 "does not exist");
+
+        ret = 0;
+        goto cleanup;
+    } else if (rv < 0) {
+        /* Other failures are fatal, so just quit */
+        goto cleanup;
+    }
+
+    rv = virFileReadValueUint(&info->cache_threshold,
+                              SYSFS_RESCTRL_PATH
+                              "/info/L3_MON/max_threshold_occupancy");
+
+    if (rv == -2) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get max_threshold_occupancy from resctrl"
+                         " info"));
+    }
+    if (rv < 0)
+        goto cleanup;
+
+    rv = virFileReadValueString(&featurestr,
+                                SYSFS_RESCTRL_PATH
+                                "/info/L3_MON/mon_features");
+    if (rv == -2)
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get mon_features from resctrl info"));
+    if (rv < 0)
+        goto cleanup;
+
+    lines = virStringSplitCount(featurestr, "\n", 0, &nlines);
+
+    for (i = 0; i < nlines; i++) {
+        if (STREQLEN(lines[i], "llc_", strlen("llc_")) ||
+            STREQLEN(lines[i], "mbm_", strlen("mbm_"))) {
Consider using STRPREFIX.
OK. To be fixed.
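For illustration, the STRPREFIX suggestion amounts to the following. This is a standalone sketch: the two macros are local stand-ins written to mirror how libvirt's internal header is understood to define them (an assumption; check the tree you build against), and is_monitor_feature() is a hypothetical helper, not part of the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Local stand-ins for the libvirt macros (assumed definitions). */
#define STREQLEN(a, b, n) (strncmp(a, b, n) == 0)
#define STRPREFIX(a, b) (strncmp(a, b, strlen(b)) == 0)

/* Hypothetical helper: true if @name looks like a cache or memory
 * bandwidth monitoring feature, which is what the patch's filter tests. */
static bool
is_monitor_feature(const char *name)
{
    /* Same result as STREQLEN(name, "llc_", strlen("llc_")), without
     * repeating the prefix literal. */
    return STRPREFIX(name, "llc_") || STRPREFIX(name, "mbm_");
}
```

The behaviour is unchanged; the gain is only that the prefix is written once per test.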
+ if (virStringListAdd(&info->features, lines[i]) < 0) + goto cleanup; + info->nfeatures++; + }
So we get a list filtered by prefixes "llc_" and "mbm_"
Eventually this list gets pared down again to just "llc_". None of the subsequent patches do anything with "mbm_" other than list it in capabilities XML output.
Sure "mbm_" could be used in the future, but the question that comes to mind is why are initially filtering at all? That is, why not just replace lines/nlines with info->features and info->nfeatures? That then provides "everything" supported in the tree, right?
Agree. Will remove the filter, just report the content from resctrl.
I would figure this code would be just mirroring what's available. It's then up to the upper layers or other code to decide what to do with the list.
Agree. Let upper layer make decision.
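A rough, standalone sketch of the unfiltered variant agreed on above: every non-empty line of mon_features becomes one list entry, and the caller decides what to do with it. The helper names are hypothetical and plain libc is used instead of the libvirt string helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated list produced by split_features(). */
static void
free_features(char **list)
{
    size_t i;

    if (!list)
        return;
    for (i = 0; list[i]; i++)
        free(list[i]);
    free(list);
}

/* Hypothetical helper: split @text on '\n' and return a NULL-terminated
 * list holding a copy of every non-empty line, with no filtering.
 * Returns NULL on allocation failure; *count receives the entry count. */
static char **
split_features(const char *text, size_t *count)
{
    char **list = calloc(1, sizeof(*list)); /* just the terminator */
    size_t n = 0;

    if (!list)
        return NULL;

    while (*text) {
        size_t len = strcspn(text, "\n");

        if (len > 0) {
            char **tmp = realloc(list, (n + 2) * sizeof(*list));

            if (!tmp) {
                free_features(list);
                return NULL;
            }
            list = tmp;
            list[n] = malloc(len + 1);
            if (!list[n]) {
                /* list[n] is NULL here, so it terminates the list */
                free_features(list);
                return NULL;
            }
            memcpy(list[n], text, len);
            list[n][len] = '\0';
            list[++n] = NULL;
        }
        text += len;
        if (*text == '\n')
            text++;
    }

    *count = n;
    return list;
}
```

Feeding it the three feature names mentioned in this thread, one per line, would yield a three-entry list that upper layers can then filter as they see fit.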
+ } + + VIR_FREE(featurestr); + virStringListFree(lines); + resctrl->monitor_info = info; + return 0;
consider using cleanup as part of the processing and thus removing the need to have multiple VIR_FREE(featurestr) and virStringListFree(lines).
If you add a char **features = NULL; which you use to perform the virStringListAdd calls and then
VIR_STEAL_PTR(info->features, features); VIR_STEAL_PTR(resctrl->monitor_info, info); ret = 0;
and fall through allowing the featurestr, lines, and features to be used below.
Of course that all assumes it's really necessary to do the filtering.
There's also the VIR_AUTOPTR stuff added which I'm not exactly sure how to use yet as I wasn't part of that effort.
Thanks for advice, really helpful. Will pay attention to 'cleanup'/'error' label and its backend logic.
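The single-exit cleanup pattern described above can be sketched outside libvirt as follows. Everything here is a simplified stand-in: STEAL_PTR imitates what VIR_STEAL_PTR is described as doing, the two structs are hypothetical reductions of the real ones, and the `fail` argument exists only to simulate an error path.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for VIR_STEAL_PTR (assumed semantics): move ownership from
 * @src to @dst and clear @src. */
#define STEAL_PTR(dst, src) do { (dst) = (src); (src) = NULL; } while (0)

struct mon_info {        /* hypothetical, simplified virResctrlInfoMon */
    char *features;
    unsigned int max_allocation;
};

struct resctrl {         /* hypothetical, simplified virResctrlInfo */
    struct mon_info *monitor_info;
};

static int
get_monitor_info(struct resctrl *r, const char *featurestr, int fail)
{
    struct mon_info *info = calloc(1, sizeof(*info));
    char *features = NULL;
    int ret = -1;

    if (!info)
        return -1;

    features = malloc(strlen(featurestr) + 1);
    if (!features || fail)
        goto cleanup;
    strcpy(features, featurestr);

    /* Success: hand ownership over, then fall through to cleanup. */
    STEAL_PTR(info->features, features);
    STEAL_PTR(r->monitor_info, info);
    ret = 0;

 cleanup:
    /* After a successful steal both pointers are NULL, so these calls
     * do nothing; on error they release whatever was allocated. */
    free(features);
    if (info)
        free(info->features);
    free(info);
    return ret;
}
```

The point of the pattern is that the success and error paths share one set of free calls, so nothing has to be freed twice in the source.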
+
+ cleanup:
+    VIR_FREE(featurestr);
+    virStringListFree(lines);
+    virStringListFree(info->features);
+    VIR_FREE(info);
+    return ret;
+}
+
+
+static int
 virResctrlGetInfo(virResctrlInfoPtr resctrl)
 {
     DIR *dirp = NULL;
@@ -569,6 +649,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl)
     if (ret < 0)
         goto cleanup;

+    ret = virResctrlGetMonitorInfo(resctrl);
+    if (ret < 0)
+        goto cleanup;
+
     ret = virResctrlGetCacheInfo(resctrl, dirp);
     if (ret < 0)
         goto cleanup;
@@ -654,16 +738,21 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
                        unsigned int level,
                        unsigned long long size,
                        size_t *ncontrols,
-                       virResctrlInfoPerCachePtr **controls)
+                       virResctrlInfoPerCachePtr **controls,
+                       virResctrlInfoMonPtr *monitor)
 {
     virResctrlInfoPerLevelPtr i_level = NULL;
     virResctrlInfoPerTypePtr i_type = NULL;
+    virResctrlInfoMonPtr cachemon = NULL;
     size_t i = 0;
     int ret = -1;

     if (virResctrlInfoIsEmpty(resctrl))
         return 0;

+    if (VIR_ALLOC(cachemon) < 0)
+        return -1;
+
     /* Let's take the opportunity to update the number of last level
      * cache. This number of memory bandwidth controller is same with
      * last level cache */
@@ -716,14 +805,35 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
         memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control));
     }

-    ret = 0;
- cleanup:
-    return ret;
+    cachemon->max_allocation = 0;
+
+    if (resctrl->monitor_info) {
+        virResctrlInfoMonPtr info = resctrl->monitor_info;
+
+        cachemon->max_allocation = info->max_allocation;
+        cachemon->cache_threshold = info->cache_threshold;
+        for (i = 0; i < info->nfeatures; i++) {
+            /* Only cares about last level cache */
+            if (STREQLEN(info->features[i], "llc_", strlen("llc_")))
+            {
Again use STRPREFIX instead.
Will be fixed.
+ if (virStringListAdd(&cachemon->features, + info->features[i]) < 0) + goto error; + cachemon->nfeatures++; + } + }
This code further filters our "llc_" and "mbm_" list into just "llc_". Not sure I understand why we only "care about last level cache" values just yet. Of course I see that is all that gets added the subsequent patch capabilities output, but is that really what's wanted. Are we going to start seeing patches that start expanding this list? Why limit/filter now?
Hope I addressed your concerns:
Not sure until I see the next version of course. I think we're perhaps far enough apart to make me concerned.
In resctrl, there are three resource monitoring features: 'llc_occupancy' for cache monitor.
Hence why you placed it where you did...
'mbm_total_bytes' and 'mbm_local_bytes' for memory bandwidth monitor.
does this mean these would be placed somehow under <memory_bandwidth> in a similar manner?
In this series patches I only introduced the cache monitor, and memory bandwidth monitor is planned to be introduced in a series of patches afterwards.
I think if we can come to an agreement on how the capability format should look it will be better. You have my feedback, now it's up to you to take the next step. I'm not sure I have the cycles to engineer this.
All resctrl features list above ('llc_'s and 'mbm_'s) are kept in @resctrl->monitor_info, then, when libvirt wants to get the feature list for, just for, cache monitor, function virResctrlGetCacheInfo will be invoked, only 'llc_' are the interested feature name. this is the reason why limit/filter applied here. (will using full name filter instead of the form such as 'llc_*')
The virResctrlInfoGetCache will be called for any 'cache bank' that supported 'llc_occupancy' feature, the result will be stored in 'cache bank' private data area (with data type virCapsHostCacheBankPtr). Of course, as you mentioned, the data may be duplicated.
The monitor feature list will be expanded when formatting cache monitor capabilities string in task of reporting host capabilities, as illustrated in following code:
function virCapabilitiesFormatCaches@capabilities.c
...
    if (bank->monitor && bank->monitor->nfeatures) {
        virBufferAsprintf(&childrenBuf,
                          "<monitor threshold='%u' unit='B' "
                          "maxAllocs='%u'>\n",
                          bank->monitor->cache_threshold,
                          bank->monitor->max_allocation);
        /* expanding cache monitor feature list */
        for (j = 0; j < bank->monitor->nfeatures; j++) {
            virBufferAdjustIndent(&childrenBuf, 2);
            virBufferAsprintf(&childrenBuf,
                              "<feature name='%s'/>\n",
                              bank->monitor->features[j]);
            virBufferAdjustIndent(&childrenBuf, -2);
        }
        virBufferAddLit(&childrenBuf, "</monitor>\n");
    }
...
+ } + + if (cachemon->features) + *monitor = cachemon;
if (!cachemon->features), then @cachemon is leaked, consider using:
VIR_STEAL_PTR(*monitor, cachemon);
You caught my bug. Thanks.
in the if condition, then
VIR_FREE(cachemon);
or just the VIR_FREE(cachemon); as an else. IDC either way. Of course, it's still not quite clear what's going on.
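The leak fix discussed above can be sketched in isolation like this. The struct is a hypothetical reduction of virResctrlInfoMon, STEAL_PTR is a local stand-in for VIR_STEAL_PTR (assumed semantics), and publish_monitor() is an illustrative name only.

```c
#include <stdlib.h>

/* Stand-in for VIR_STEAL_PTR (assumed semantics). */
#define STEAL_PTR(dst, src) do { (dst) = (src); (src) = NULL; } while (0)

struct cache_mon {       /* hypothetical, simplified virResctrlInfoMon */
    char **features;
    size_t nfeatures;
};

/* Hand @cachemon to the caller only if it carries features; otherwise
 * free it so that the allocation does not leak. */
static void
publish_monitor(struct cache_mon **monitor, struct cache_mon **cachemon)
{
    if ((*cachemon)->features)
        STEAL_PTR(*monitor, *cachemon);

    free(*cachemon);     /* does nothing after a steal */
    *cachemon = NULL;
}
```

Either form mentioned in the review (free after the steal, or free in an else branch) ends with the same ownership: the caller's pointer or nobody.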
Perhaps, you should have an API that gets all the names of the values prefixed by some string, IOW:
virResctrlInfoGetMonitorPrefix(resctrl, filter)
where filter is a "const char *filter"
it would return that cachemon list whether it's NULL, empty, or full anything. Let the caller decide what to do with it.
Looks reasonable and make code more concise, especially when adding memory bandwidth monitor. Will be implemented in next version patch.
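One possible shape for the suggested prefix-filter API, written as a standalone sketch. The function name and signature below are hypothetical (the thread only proposes virResctrlInfoGetMonitorPrefix(resctrl, filter) without fixing the details), and the feature list is passed in directly rather than taken from a virResctrlInfoPtr.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a newly allocated, NULL-terminated array of
 * the entries in @features that start with @filter. The entries are
 * borrowed from @features, not copied, so the caller frees only the array.
 * Returns NULL on allocation failure; an array with no entries is a valid
 * result and is left for the caller to interpret. */
static const char **
monitor_features_by_prefix(const char *const *features,
                           size_t nfeatures,
                           const char *filter,
                           size_t *nmatches)
{
    const char **out = calloc(nfeatures + 1, sizeof(*out));
    size_t flen = strlen(filter);
    size_t i;
    size_t n = 0;

    if (!out)
        return NULL;

    for (i = 0; i < nfeatures; i++) {
        if (strncmp(features[i], filter, flen) == 0)
            out[n++] = features[i];
    }

    *nmatches = n;
    return out;
}
```

A cache caller would pass "llc_" and a later memory bandwidth caller "mbm_", so the filtering policy lives in the callers instead of the code that reads resctrl.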
I haven't looked beyond the first 4 patches, so how things are used later on may determine what API's you could need. The relationship to <cache> and <bank> isn't clear.
"The relationship to <cache> and <bank> isn't clear." -- Not catching your idea too much, do you mean the relationship to 'physical CPU cache' and the software scope 'cache bank' is not clear? If yes, please refer to my upper replies, the clarification paragraph of 'cache bank' with an example of 2S E5-2699v4 system.
If it's not clear by this response, I'm afraid we'll just be too far apart going forward. Maybe the capability output should:

<monitor level='3' threshold='%u' maxAlloc='%u'>
  <feature name='%s'/>
  <feature name='%s'/>
  ...
</monitor>

Where the '3' is because you read from "L3_MON" and only important if you feel 1 or 2 or something else would be generated eventually. And then you correlate however you have to "later". You "know" that <cache> would be related to "llc_occupancy" and take it from there. I see no way for each feature to have a different num_rmids or max_threshold_occupancy value, so that's why I'm putting them as attributes of <monitor>.

What is of concern is how someone knows <monitor> relates to both <cache> and <memory_bandwidth> - I guess that has to be left for the documentation portion. If you wanted to name it something different than <monitor> that's fine - naming is hard (TM).
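To make the proposed layout concrete, here is a standalone sketch that formats a level-tagged monitor element with plain snprintf. It is not libvirt code: format_monitor() is a hypothetical name, the real formatter would use the virBuffer helpers, and feature names are written without XML escaping on the assumption that they are simple identifiers.

```c
#include <stdio.h>

/* Hypothetical formatter for the suggested <monitor> layout.
 * Returns 0 on success, -1 if @buf is too small or formatting fails. */
static int
format_monitor(char *buf, size_t buflen, unsigned int level,
               unsigned int threshold, unsigned int max_alloc,
               const char *const *features, size_t nfeatures)
{
    size_t off = 0;
    size_t i;
    int rc;

    rc = snprintf(buf, buflen,
                  "<monitor level='%u' threshold='%u' maxAlloc='%u'>\n",
                  level, threshold, max_alloc);
    if (rc < 0 || (size_t)rc >= buflen)
        return -1;
    off = (size_t)rc;

    /* One child element per feature, shared attributes stay on <monitor>. */
    for (i = 0; i < nfeatures; i++) {
        rc = snprintf(buf + off, buflen - off,
                      "  <feature name='%s'/>\n", features[i]);
        if (rc < 0 || (size_t)rc >= buflen - off)
            return -1;
        off += (size_t)rc;
    }

    rc = snprintf(buf + off, buflen - off, "</monitor>\n");
    if (rc < 0 || (size_t)rc >= buflen - off)
        return -1;

    return 0;
}
```

Because num_rmids and max_threshold_occupancy are read once from L3_MON, they appear once as attributes, and only the feature names repeat.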
+
+    return 0;
  error:
     while (*ncontrols)
         VIR_FREE((*controls)[--*ncontrols]);
     VIR_FREE(*controls);
-    goto cleanup;
+    virStringListFree(cachemon->features);
+    VIR_FREE(cachemon);
+    return ret;
 }
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h
index cfd56dd..51bb68b 100644
--- a/src/util/virresctrl.h
+++ b/src/util/virresctrl.h
@@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode {
     unsigned int max_allocation;
 };

+typedef struct _virResctrlInfoMon virResctrlInfoMon;
+typedef virResctrlInfoMon *virResctrlInfoMonPtr;
+/* Information about resource monitoring group */
+struct _virResctrlInfoMon {
+    /* null-terminal string list for hw supported monitor feature */
+    char **features;
+    size_t nfeatures;
+    /* Maximum number of simultaneous allocations */
+    unsigned int max_allocation;
What kind of allocations? From your cover you state maximum number of monitoring groups that could be created, but it's impossible to know how this value is expected to be used by what's provided as a comment here.
Changing the comment from /* Maximum number of simultaneous allocations */ to /* Maximum number of monitoring groups that could be created */
The code shows this is the value from /info/L3_MON/num_rmids - I don't see the correlation in FS name to structure name. Since you later print a unit of 'B' I assume it's bytes of something, but the comment seems to imply a maximum simultaneous number of something.
Looks the capability string make you confused. The XML output of cache monitor capability looks like: (leveraging the format of 'cache control' capability)
<monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor>
'B' is the unit for 'threshold', 'maxAllocs' is for max_allocation (parsed through /info/L3_MON/num_rmids). The 'unit' are not limited to 'B', could be 'KiB' ...
True, but you read a 'B' and never change that anywhere - it's also not clear threshold is a 'B' value. At least when I see 'size' followed by 'unit' - I'm certain the size is a related to unit. To me "threshold" is just a "value" not a necessarily byte related. Could be a count of something. Of course the longer wording "max_threshold_occupancy" doesn't help me much either. A maximum threshold occupancy of what? It's not unique to each feature name, it's unique to the L3_MON.
This tells docs are necessary for interpreting these settings.
clearly!
"monitor": describes cache monitor capability. "threshold": This is cache occupancy threshold value used in kernel resource control system, and affects the actual release of hardware resource, the RMID (resource monitoring ID). A greater value of this will make the request of creating a new resource monitoring group more likely to fail if the existing number of monitoring groups reaches up to 'maxAlloc'.
Again it's not something a "normal consumer" would probably change...
"unit": This is the unit of "threshold", could be 'B', 'KiB', 'MiB' or 'GiB'.
Unless you do the logic to present it as calculated value, then what's the purpose. I know there's code out there that will "prettify" output such that for the example from patch 4 a value of 270336 bytes is printed as '264 KiB'. If you're not going to do that, then just present as a value and note that it's a byte value. /me now wonders if you should be sure to store this in "unsigned long long" field since you mention 'GiB' an 'unsigned int' only gets you so far.
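A "prettify" step of the kind mentioned above could look roughly like this. It is a sketch with a hypothetical helper name, not the existing libvirt code; it uses unsigned long long throughout and scales only while the value divides exactly, so no precision is lost.

```c
#include <stddef.h>

/* Hypothetical helper: scale a byte count to the largest binary unit
 * (up to GiB) that divides it exactly. The scaled value is returned and
 * *unit points at a static unit string. */
static unsigned long long
scale_bytes(unsigned long long bytes, const char **unit)
{
    static const char *const units[] = { "B", "KiB", "MiB", "GiB" };
    size_t i = 0;

    while (bytes != 0 && bytes % 1024 == 0 && i < 3) {
        bytes /= 1024;
        i++;
    }

    *unit = units[i];
    return bytes;
}
```

For the patch 4 example, scale_bytes(270336, &unit) returns 264 with unit set to "KiB", since 270336 = 264 * 1024.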
"maxAllocs": This is a number that maximum number of monitoring groups could be created. "feature": describes the feature name supported by 'monitor'.
Hope this documentation clears your confusion.
Perhaps if documentation was added I would have had my answer without needing to go research that is.
See docs above.
+ /* determines the occupancy at which an RMID can be freed */
Again, alone this comment is difficult to decipher as it relates to the structure field name. The code shows the value read is "/info/L3_MON/max_threshold_occupancy". It's not clear what "occupancy" means. Is there something related to this number that some consumer could change that would improve some performance?
"max_threshold_occupancy" is a concept involved by kernel resctrl. It is a cache value, in bytes, affects the release of hardware 'RMID', thus, affects the maximum number of monitor group could be created. Get more information from the cache monitor attribute 'threshold''s description.
Thanks for your efforts of the review.
Thanks for your return of details - I'm still not sure I really understand the maxAllocs and threshold values. I see them purely as display values at this point. I cannot imagine providing an interface or description that would help some consumer adjust the value to fix some problem on their host. There are patch series in the virtual bit bucket that have tried to do that in other areas. John
+ unsigned int cache_threshold; +}; + typedef struct _virResctrlInfo virResctrlInfo; typedef virResctrlInfo *virResctrlInfoPtr;
@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls); + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor); +
int virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Saturday, September 8, 2018 12:49 AM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 02/10] util: add interface retrieving CMT capability
On 09/07/2018 03:56 AM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:58 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 02/10] util: add interface retrieving CMT capability
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Introduce a function for reporting the CMT capability by going through the files under /sys/fs/resctrl/info/L3_MON. This patch works together with later patches that report this information to the domain.
Do you mean you're setting the basis for future patches to provide the capability data for monitor info?
Yes, patches 1 to 4 are adding host capability for cache monitor. This patch adds 'virResctrlInfoMonPtr' in two places: one in @resctrl (type 'virResctrlInfoPtr') to keep the resctrl monitoring group capabilities, and another in each cache @bank (type 'virCapsHostCacheBankPtr') to store the capabilities of the cache monitor.
A subsequent patch 3 posts cache monitor capability information to host capability query request.
I pushed the src/conf/capabilities.c in patch 1, so let's take it off the table.
Got. Thanks.
Suggestion... When you're done with patches 2 -> 4 a/k/a the host piece of the cache monitor, post it. Once we are "good" with that it can be pushed.
I'll submit the new patches covering patches 2->4 as a new series for next review.
If at the same time you want to introduce the refactorings that don't include cache monitor, then do so in a separate series.
Keep in mind for me: the refactorings need a new separate patch.
Having everything in one series makes an impending review of 10-20 patches less desirable to start. If you get lucky, sometimes when there's a few 1-5 patch series you get multiple reviewers rather than waiting for 1 reviewer to complete a long series.
Thanks for all your suggestions.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/conf/capabilities.c |   6 ++-
 src/conf/capabilities.h |   1 +
 src/util/virresctrl.c   | 120 ++++++++++++++++++++++++++++++++++++++++++++++--
 src/util/virresctrl.h   |  17 ++++++-
 4 files changed, 137 insertions(+), 7 deletions(-)
Caveat - I didn't go back and read all the previous history on this. Sorry there's just too much. I hope that mkletzan will also take a look at the series since he was involved previously.
There's two things going on in this patch:
1. The actual fetch of the data into resctrl structures
This is accomplished by filling information to @virResctrlInfo->monitor_info.
2. The movement/copy of some of that data into @bank
This is accomplished by filling information to @virCapsHostCacheBankPtr->monitor
Splitting the patches such that item 1 is separate and then item 2 is combined with patches 3 and 4 along with some doc adjustments to describe the output.
Makes sense. Will split this patch as you suggested, and add doc content. But patch 4 is a test patch for the newly introduced cache monitor capability; do you think it should be merged with patch 3 and the second half of patch 2?
Why not? You'd be testing what you introduced in patch3.
No problem. Will combine patch3 and 4 with part of patch2 as well as some document into a new patch.
Of course I just peeked looking for "cache" and "bank" in docs/*.in and found nothing /-|... Looks like docs/formatcaps.html.in needs some love to describe <cache> and <memory_bandwidth> (how come this stuff comes to me afterwards...)
I'll provide the descriptions for cache monitor.
Catching up with "existing" can be a separate patch/series...
Got.
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 326bd15..5280348 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -1626,6 +1626,9 @@ virCapsHostCacheBankFree(virCapsHostCacheBankPtr ptr)
     virBitmapFree(ptr->cpus);
     for (i = 0; i < ptr->ncontrols; i++)
         VIR_FREE(ptr->controls[i]);
+    if (ptr->monitor && ptr->monitor->features)
+        virStringListFree(ptr->monitor->features);
+    VIR_FREE(ptr->monitor);
     VIR_FREE(ptr->controls);
     VIR_FREE(ptr);
 }
@@ -1801,7 +1804,8 @@ virCapabilitiesInitCaches(virCapsPtr caps)
                                        bank->level,
                                        bank->size,
                                        &bank->ncontrols,
-                                       &bank->controls) < 0)
+                                       &bank->controls,
+                                       &bank->monitor) < 0)
I wonder if perhaps if it'd be better to have virResctrlInfoGetCache just take @bank as a parameter instead of continually adding more... I'm also not convinced @bank is the best place considering it's the same data that repeated gets fetched/stored.
This should be OK, since 'virCapsHostCacheBankPtr' structure is fully exposed in 'capabilities.h' file. The suggestion will be followed.
The point wasn't that @bank is exposed, it's the is it the right place. I became less convinced as I went on.
Understand. Depending on if @monitor is a private of @bank. I proposed new layout for @monitor, hope it aligns with you. Please move on for next replies.
goto cleanup;
             if (VIR_APPEND_ELEMENT(caps->host.caches,
diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h
index 046e275..3ed2523 100644
--- a/src/conf/capabilities.h
+++ b/src/conf/capabilities.h
@@ -149,6 +149,7 @@ struct _virCapsHostCacheBank {
     virBitmapPtr cpus; /* All CPUs that share this bank */
     size_t ncontrols;
     virResctrlInfoPerCachePtr *controls;
+    virResctrlInfoMonPtr monitor;
This structure notes usage for specific @level; however, from how I read the path to the data, the data is only provided in L3. Since on output <bank> can specify a specific level, I assume that L1 or L2 would be possible for it; however, given how the code is written wouldn't that mean @monitor data is included as well?
@level could be 1,2 or 3 for a 3-level cache system. @monitor will be involved in structure of each level cache. @monitor will be set to NULL if hardware does not support monitoring. For example any L1D cache does not support resource monitoring, then the corresponding 'virCapsHostCacheBank' 's @monitor will be set to NULL.
OK, but the path used in the subsequent patch is info/L3_MON/num_rmids - IOW: L3 only - hence the query. If there can be multiple levels of monitor cache, then it further makes me believe that it shouldn't be a child of @bank.
Additionally from how I read things it seems the same data is repeated for each bank id='#' found. So is the data unique to a bank by id or is unique to the level regardless of the bank? If the former, then the data needs to be properly split so it can be shown to be different for each id. If the latter, then <monitor> wouldn't seem to need to be a child of <bank>. It's not clear whether it's then a child or peer of <cache>.
Data is only kept in a cache 'bank' that supports the cache monitor feature; for a cache that does not support monitoring, @monitor will be set to 'NULL'. And, yes, it might be duplicated several times on a multi-node system, because multiple cache 'bank's that have the cache monitoring feature were found.
Maybe I asked the wrong question, but if I look at the data presented in patch 4 - I didn't see anything that said to me that it should be a child of each <bank id='#'...>.... If it is a child of @bank, then the the @monitor would need to reference a bank by id, but instead it disjoint to that AFAICT.
Seems you have a big concern for my arrangement of putting @monitor inside 'virCapsHostCacheBankPtr' structure.
Yep, but it may just be a topological concern with how things are printed.
Before introducing my considerations for this arrangement, let me clarify the definition of 'cache bank' and the cache topology. I may make mistakes here; if I do, please correct me.
I'm hoping you know it better than I do!
The on disk structure I see starts at:
/sys/fs/resctrl (SYSFS_RESCTRL_PATH)
and then for each "subsystem" the subdirectory structure is:
GetCacheInfo => info/%s/{num_closids|cbm_mask|min_cbm_bits}
where "%s" is "L1", "L2", "L3", etc.
GetMemoryBandwidthInfo => info/MB/{bandwidth_gran|min_bandwidth|num_closids}
GetMonitorInfo => info/L3_MON/{num_rmids|mon_features}
This maps to _virCapsHost:
... size_t ncaches; virCapsHostCacheBankPtr *caches;
size_t nnodes; virCapsHostMemBWNodePtr *nodes; ...
Where _virCapsHostCacheBank has: ... unsigned int id; unsigned int level; /* 1=L1, 2=L2, 3=L3, etc. */ ...
The code places virResctrlInfoMonPtr as a child of each caches entry. However, when loading the data for each cache level the same "L3_MON" entry is read regardless of level and regardless of id. Thus, in my mind there's *a lot* of seemingly needless duplication. That is "bank=1" gets the same entry as "bank=2" and so on.
You tell the truth of my code. Accept all your description. Let's figure out a way to place the capabilities of cache monitor and remove the duplication.
'cache bank' has a group of attributes, including 'id', 'level', 'type', 'size' and 'cpus'; these attributes are defined by the Linux kernel. You can find the values for a specific CPU cache block in this directory: /sys/devices/system/cpu/cpu*/cache/*
My understanding to libvirt cache 'bank' (or 'cache bank'): 'virCapsHostCacheBankPtr' represents the 'cache bank'. The 'virCapsHostCacheBankEquals', listed as below, tell us: for two 'cache bank', if they have exactly the same attributes values, means, they are the same 'cache bank', otherwise, they are different 'cache bank's.
bool virCapsHostCacheBankEquals(virCapsHostCacheBankPtr a, virCapsHostCacheBankPtr b) { return (a->id == b->id && a->level == b->level && a->type == b->type && a->size == b->size && virBitmapEqual(a->cpus, b->cpus)); }
How many 'cache bank's in system? and what is the relationship for 'cache bank' and hardware cache block? For convenience, take 2-socket E5-2699v4 system for example, there are two CPU nodes in system, each CPU(or node) has 22 hardware CPU cores, each core is equipped with two private L1 caches and one private L2 data cache, each node has a L3(or LLC) cache shared among cores.
So now you're mixing in the /sys/devices/system/cpu/cpu*/cache/* with the /sys/fs/resctrl/info/*. This is going to get confusing real fast. So far "banks" have been associated with some cpumask map as has the memory bandwidth via node id.
Still nothing in the monitor code leads me to "see" how things would be different for each range of cpus by bank id.
Let's use the concept of 'cache bank' to describe the caches of this 2-socket system: there are two L3 'cache bank's (assuming CDP is disabled), one per socket, each shared by the associated CPU cores. There are 44 CPU cores in total, and each core has three private 'cache bank's: the L1D 'cache bank', the L1I 'cache bank' and the L2 'cache bank'. In total, 44x3+2=134 'cache bank's exist in the system.
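The count above can be checked with a few lines of C. This is only a sketch of the arithmetic; the helper name and its parameters are hypothetical and not part of libvirt, and it assumes one shared L3 bank per socket (CDP disabled) as in the example.

```c
#include <stddef.h>

/* Hypothetical helper (not libvirt code): count 'cache bank's in a system
 * where every core has `private_per_core` private banks (L1D, L1I, L2 in
 * the example) and every socket has exactly one shared L3 bank. */
static size_t
count_cache_banks(size_t sockets, size_t cores_per_socket,
                  size_t private_per_core)
{
    size_t cores = sockets * cores_per_socket;

    return cores * private_per_core + sockets;
}
```

For the 2-socket E5-2699v4 example, count_cache_banks(2, 22, 3) gives 44x3+2=134.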
OK, but the TLA's and "knowledge" of levels and private or other variously named 'cache bank' just isn't knowledge I keep in my STM (short term memory) let alone my LTM (long).
Here are my considerations for adding @monitor to each 'cache bank':
1. This leverages the design of @controls/@ncontrols.
   a.) @controls/@ncontrols was designed for introducing the cache allocation (CAT) feature; only the last level cache of a particular CPU supports cache allocation, a private 'cache bank' does not.
   b.) @monitor follows the same policy as @controls/@ncontrols. For a 'cache bank' that supports cache monitoring, the appropriate cache information is populated through 'virCapsHostCacheBankPtr'->monitor; for a 'cache bank' that does not support the cache monitor feature, @monitor is simply left empty.
I think you've lost me, but I do see the <controls> to some degree equates to some level of calculated values based on the GetCacheInfo data. Again, not code I keep fresh in mind.
I don't see the same for monitor - it's just raw data not related to anything bank related.
I see "maxAlloc" a/k/a _virResctrlInfoMon->max_allocation a/k/a the read "/info/L3_MON/num_rmids" value.
I see "threshold" a/k/a _virResctrlInfoMon->cache_threshold a/k/a the read "/info/L3_MON/max_threshold_occupancy" value.
Nothing to do performing calculations like the control data has. Although not shown in the sample output - it's not clear whether the values could be different between banks. If they cannot, then it's too bad we have so much duplication.
2. Cache ‘monitor’ is designed as a companion concept to cache ‘control’: cache ‘control’ mostly covers CAT technology, and cache ‘monitor’ now refers to CMT technology.
You also mentioned "it seems the same data is repeated". Yes, it is on some systems, for example, a 2-socket E5-2600v4 system.
It's the same because the same file is read. There's nothing that I see that shows any differently.
The *only* files read are in the path:
/sys/fs/resctrl/info/L3_MON/
Not if they were in something like:
/sys/fs/resctrl/info/%s/L3_MON/
where %s related to some bank id number and further if the L3 was L%d where %d = 1, 2, 3, etc. - then I can see the topology you've created.
But the fact remains, it's one file, it's not different, so there's no need for the same data to be replicated.
This also confused me a lot when I began to write the POC code. As I said, the 'cache monitor' leverages the 'cache allocation' design: the capability of cache allocation is stored in each 'cache bank' through @ncontrols and @controls. If @controls is empty and @ncontrols is 0, there is no cache allocation capability for this 'cache bank'; otherwise @controls/@ncontrols refers to an array of 'virResctrlInfoPerCachePtr' pointers that describes the capabilities of the cache allocation (CAT) feature. The content of @controls/@ncontrols may also be duplicated. So libvirt has the capability (or was designed) to keep information unique to each 'cache bank', but resctrl cannot provide that kind of per-'cache bank' information, even on a multi-socket system where each socket is populated with a different CPU. The reality is that resctrl only reports system-wide CAT, CMT, MBA and MBM capabilities; it does not report them per 'cache bank'.
I think there should be an explanation of why the cache allocation capability needs to be kept per 'cache bank', despite the fact that resctrl only provides one copy of the cache allocation capability data.
Maybe it is not wise to leverage the design of @bank/@controls here, or maybe I totally misunderstood the design of @bank/@controls.
Sorry - I'm no help here. Martin did the original review and perhaps understands the model best.
};
typedef struct _virCapsHostMemBWNode virCapsHostMemBWNode; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 4b5442f..2f6923a 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -146,6 +146,8 @@ struct _virResctrlInfo { size_t nlevels;
virResctrlInfoMemBWPtr membw_info; + + virResctrlInfoMonPtr monitor_info; };
@@ -171,6 +173,9 @@ virResctrlInfoDispose(void *obj) VIR_FREE(level); }
+ if (resctrl->monitor_info) + virStringListFree(resctrl->monitor_info->features); + VIR_FREE(resctrl->monitor_info); VIR_FREE(resctrl->membw_info); VIR_FREE(resctrl->levels); } @@ -556,6 +561,81 @@ virResctrlGetMemoryBandwidthInfo(virResctrlInfoPtr resctrl)
Could add a few comments here to describe what's being provided here and how it fits (regardless of where in the schema of things it ends up).
how about adding these as function comments: /* * Retrieve the capability of resource monitoring group by checking the kernel * resource control interface. * the capability information includes:
s/the/The/ (and it can be on the preceding line)
Confirmed. Thanks.
be sure to leave a blank line before the next though - makes it easier to read (at least for me)
Thanks for bearing with me! Will try my best to make the text readable.
* max_allocation: the maximum number of monitoring groups could be created.
Has monitoring groups been introduced? "could be created"?
'Monitoring groups' is a concept from the kernel resctrl document. I misused it. Will align it with phrases such as libvirt resource/cache monitor. * max_allocation: the maximum number of resource monitors that could be created.
blank line for readability
No problem.
* mon_features: the monitoring features supported, which could be
*   'llc_occupancy' for the feature of reporting how much last level cache is in use.
*   'mbm_total_bytes' for the feature of reporting total memory bandwidth in use.
*   'mbm_local_bytes' for the feature of reporting local memory bandwidth in use.
These are essentially text strings for monitoring capabilities - how they're used is something for later on. IOW: This is what's allowed
I'll try to add some content such as their usage.
blank line for readability
No problem.
* cache_threshold: this affects the actual destruction of a resource monitoring
* group, mainly the reclaim of the hardware resource. A smaller value of this, in
* bytes, makes a request to create a new resource monitoring group more
* likely to fail once the number of existing monitoring groups reaches
* 'max_allocation'.
ah, what, OK - Doesn't help me. It's a *capability* - changing it would seem to be the chore of someone sufficiently blessed in understanding in how all of this works.
I am bad at describing this from a high level. I'll try to explain it from a low-level hardware view near the end of these replies; if possible, please help with describing the role of this phrase.
*/
static int
+virResctrlGetMonitorInfo(virResctrlInfoPtr resctrl)
{
+    int rv = -1;
+    char *featurestr = NULL;
+    char **lines = NULL;
+    size_t nlines = 0;
+    size_t i = 0;
+    int ret = -1;
+    virResctrlInfoMonPtr info = NULL;
+
+    if (VIR_ALLOC(info) < 0)
+        return -1;
+
+    rv = virFileReadValueUint(&info->max_allocation,
+                              SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids");
+    if (rv == -2) {
+        /* The file doesn't exist, so it's unusable for us,
+         * probably resource monitoring feature unsupported */
+        VIR_WARN("The path '" SYSFS_RESCTRL_PATH "/info/L3_MON/num_rmids' "
+                 "does not exist");
+
+        ret = 0;
+        goto cleanup;
+    } else if (rv < 0) {
+        /* Other failures are fatal, so just quit */
+        goto cleanup;
+    }
+
+    rv = virFileReadValueUint(&info->cache_threshold,
+                              SYSFS_RESCTRL_PATH
+                              "/info/L3_MON/max_threshold_occupancy");
+
+    if (rv == -2) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get max_threshold_occupancy from resctrl"
+                         " info"));
+    }
+    if (rv < 0)
+        goto cleanup;
+
+    rv = virFileReadValueString(&featurestr,
+                                SYSFS_RESCTRL_PATH
+                                "/info/L3_MON/mon_features");
+    if (rv == -2)
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot get mon_features from resctrl info"));
+    if (rv < 0)
+        goto cleanup;
+
+    lines = virStringSplitCount(featurestr, "\n", 0, &nlines);
+
+    for (i = 0; i < nlines; i++) {
+        if (STREQLEN(lines[i], "llc_", strlen("llc_")) ||
+            STREQLEN(lines[i], "mbm_", strlen("mbm_"))) {
Consider using STRPREFIX.
OK. To be fixed.
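For readers unfamiliar with the suggestion: a prefix test hides the repeated strlen() that the STREQLEN calls above spell out by hand. A minimal standalone sketch follows; str_has_prefix is a hypothetical stand-in written for illustration, and it is only assumed to behave like libvirt's STRPREFIX.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a STRPREFIX-style check (not the libvirt macro
 * itself): true when `str` begins with `prefix`. */
static bool
str_has_prefix(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

With such a helper the condition in the loop would read str_has_prefix(lines[i], "llc_") || str_has_prefix(lines[i], "mbm_").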
+ if (virStringListAdd(&info->features, lines[i]) < 0) + goto cleanup; + info->nfeatures++; + }
So we get a list filtered by prefixes "llc_" and "mbm_"
Eventually this list gets pared down again to just "llc_". None of the subsequent patches do anything with "mbm_" other than list it in capabilities XML output.
Sure "mbm_" could be used in the future, but the question that comes to mind is why are we initially filtering at all? That is, why not just replace lines/nlines with info->features and info->nfeatures? That then provides "everything" info->supported in the tree, right?
Agree. Will remove the filter, just report the content from resctrl.
I would figure this code would be just mirroring what's available. It's then up to the upper layers or other code to decide what to do with the list.
Agree. Let upper layer make decision.
+ } + + VIR_FREE(featurestr); + virStringListFree(lines); + resctrl->monitor_info = info; + return 0;
consider using cleanup as part of the processing and thus removing the need to have multiple VIR_FREE(featurestr) and virStringListFree(lines).
If you add a char **features = NULL; which you use to perform the virStringListAdd calls and then
VIR_STEAL_PTR(info->features, features); VIR_STEAL_PTR(resctrl->monitor_info, info); ret = 0;
and fall through allowing the featurestr, lines, and features to be used below.
Of course that all assumes it's really necessary to do the filtering.
There's also the VIR_AUTOPTR stuff added which I'm not exactly sure how to use yet as I wasn't part of that effort.
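The single-exit pattern described above can be sketched outside libvirt as follows. Everything here is an assumption made for illustration: STEAL_PTR is a hypothetical stand-in for VIR_STEAL_PTR, and struct mon_info, free_string_list and build_mon_info are invented names, with plain malloc/free in place of the libvirt allocation helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for VIR_STEAL_PTR: move ownership, clear the source. */
#define STEAL_PTR(dst, src) do { (dst) = (src); (src) = NULL; } while (0)

struct mon_info {
    char **features;   /* NULL-terminated list, owned by the struct */
    size_t nfeatures;
};

static void
free_string_list(char **list)
{
    if (!list)
        return;
    for (size_t i = 0; list[i]; i++)
        free(list[i]);
    free(list);
}

/* Copy `n` names into a freshly allocated mon_info. On success *out owns
 * the result and 0 is returned; on failure -1 is returned, nothing leaks. */
static int
build_mon_info(const char *const *names, size_t n, struct mon_info **out)
{
    int ret = -1;
    struct mon_info *info = calloc(1, sizeof(*info));
    char **features = calloc(n + 1, sizeof(*features));

    if (!info || !features)
        goto cleanup;

    for (size_t i = 0; i < n; i++) {
        features[i] = malloc(strlen(names[i]) + 1);
        if (!features[i])
            goto cleanup;
        strcpy(features[i], names[i]);
    }

    info->nfeatures = n;
    STEAL_PTR(info->features, features);
    STEAL_PTR(*out, info);
    ret = 0;

 cleanup:
    /* After a successful steal both are NULL, so these frees are no-ops. */
    free_string_list(features);
    free(info);
    return ret;
}
```

The point of the design is that success and failure share one cleanup path, so no free call has to be repeated before an early return.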
Thanks for advice, really helpful. Will pay attention to 'cleanup'/'error' label and its backend logic.
+ + cleanup: + VIR_FREE(featurestr); + virStringListFree(lines); + virStringListFree(info->features); + VIR_FREE(info); + return ret; +} + + +static int virResctrlGetInfo(virResctrlInfoPtr resctrl) { DIR *dirp = NULL; @@ -569,6 +649,10 @@ virResctrlGetInfo(virResctrlInfoPtr resctrl) if (ret < 0) goto cleanup;
+ ret = virResctrlGetMonitorInfo(resctrl); + if (ret < 0) + goto cleanup; + ret = virResctrlGetCacheInfo(resctrl, dirp); if (ret < 0) goto cleanup; @@ -654,16 +738,21 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls) + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor) { virResctrlInfoPerLevelPtr i_level = NULL; virResctrlInfoPerTypePtr i_type = NULL; + virResctrlInfoMonPtr cachemon = NULL; size_t i = 0; int ret = -1;
if (virResctrlInfoIsEmpty(resctrl)) return 0;
+    if (VIR_ALLOC(cachemon) < 0)
+        return -1;
+
    /* Let's take the opportunity to update the number of last level
     * cache. This number of memory bandwidth controller is same with
     * last level cache */
@@ -716,14 +805,35 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
        memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control));
    }
- ret = 0; - cleanup: - return ret; + cachemon->max_allocation = 0; + + if (resctrl->monitor_info) { + virResctrlInfoMonPtr info = resctrl->monitor_info; + + cachemon->max_allocation = info->max_allocation; + cachemon->cache_threshold = info->cache_threshold; + for (i = 0; i < info->nfeatures; i++) { + /* Only cares about last level cache */ + if (STREQLEN(info->features[i], "llc_", + strlen("llc_"))) {
Again use STRPREFIX instead.
Will be fixed.
+ if (virStringListAdd(&cachemon->features, + info->features[i]) < 0) + goto error; + cachemon->nfeatures++; + } + }
This code further filters our "llc_" and "mbm_" list into just "llc_". Not sure I understand why we only "care about last level cache" values just yet.
Of course I see that is all that gets added to the subsequent patch's capabilities output, but is that really what's wanted? Are we going to start seeing patches that start expanding this list? Why limit/filter now?
Hope I addressed your concerns:
Not sure until I see the next version of course. I think we're perhaps far enough apart to make me concerned.
In resctrl, there are three resource monitoring features: 'llc_occupancy' for cache monitor.
Hence why you placed it where you did...
'mbm_total_bytes' and 'mbm_local_bytes' for memory bandwidth monitor.
does this mean these would be placed somehow under <memory_bandwidth> in a similar manner?
In this patch series I only introduce the cache monitor; the memory bandwidth monitor is planned for a later series of patches.
I think if we can come to an agreement on how the capability format should look it will be better. You have my feedback, now it's up to you to take the next step. I'm not sure I have the cycles to engineer this.
All the resctrl features listed above (the 'llc_'s and 'mbm_'s) are kept in @resctrl->monitor_info. When libvirt wants the feature list for the cache monitor only, virResctrlInfoGetCache is invoked, and only the 'llc_' feature names are of interest. That is why the limit/filter is applied here. (I will use a full-name filter instead of a form such as 'llc_*'.)
virResctrlInfoGetCache will be called for any 'cache bank' that supports the 'llc_occupancy' feature, and the result will be stored in the 'cache bank' private data area (with data type virCapsHostCacheBankPtr). Of course, as you mentioned, the data may be duplicated.
The monitor feature list is expanded when formatting the cache monitor capabilities string while reporting host capabilities, as illustrated in the following code:
function virCapabilitiesFormatCaches@capabilities.c
...
    if (bank->monitor && bank->monitor->nfeatures) {
        virBufferAsprintf(&childrenBuf,
                          "<monitor threshold='%u' unit='B' "
                          "maxAllocs='%u'>\n",
                          bank->monitor->cache_threshold,
                          bank->monitor->max_allocation);
        /* expanding cache monitor feature list */
        for (j = 0; j < bank->monitor->nfeatures; j++) {
            virBufferAdjustIndent(&childrenBuf, 2);
            virBufferAsprintf(&childrenBuf,
                              "<feature name='%s'/>\n",
                              bank->monitor->features[j]);
            virBufferAdjustIndent(&childrenBuf, -2);
        }
        virBufferAddLit(&childrenBuf, "</monitor>\n");
    }
...
+ } + + if (cachemon->features) + *monitor = cachemon;
if (!cachemon->features), then @cachemon is leaked, consider using:
VIR_STEAL_PTR(*monitor, cachemon);
You caught my bug. Thanks.
in the if condition, then
VIR_FREE(cachemon);
or just the VIR_FREE(cachemon); as an else. IDC either way. Of course, it's still not quite clear what's going on.
Perhaps, you should have an API that gets all the names of the values prefixed by some string, IOW:
virResctrlInfoGetMonitorPrefix(resctrl, filter)
where filter is a "const char *filter"
it would return that cachemon list whether it's NULL, empty, or full anything. Let the caller decide what to do with it.
Looks reasonable and makes the code more concise, especially when adding the memory bandwidth monitor. Will be implemented in the next version of the patch.
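One possible shape for such a prefix-based getter, written as a standalone sketch: filter_features_by_prefix is a hypothetical name, it works on a plain string array instead of virResctrlInfoPtr, and it returns borrowed pointers so that the caller decides what to do with a NULL, empty, or full result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of the suggested virResctrlInfoGetMonitorPrefix():
 * return a NULL-terminated array of the entries in `features` that start
 * with `prefix`. The strings are borrowed; only the array is allocated and
 * must be released with free(). Returns NULL on allocation failure.
 * *nmatches receives the number of matching entries. */
static const char **
filter_features_by_prefix(const char *const *features, size_t nfeatures,
                          const char *prefix, size_t *nmatches)
{
    size_t plen = strlen(prefix);
    const char **out = calloc(nfeatures + 1, sizeof(*out));

    *nmatches = 0;
    if (!out)
        return NULL;

    for (size_t i = 0; i < nfeatures; i++) {
        if (strncmp(features[i], prefix, plen) == 0)
            out[(*nmatches)++] = features[i];
    }
    return out;
}
```

The same function would serve the cache monitor with "llc_" and a later memory bandwidth monitor with "mbm_", which is the reuse the review asks for.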
I haven't looked beyond the first 4 patches, so how things are used later on may determine what API's you could need. The relationship to <cache> and <bank> isn't clear.
"The relationship to <cache> and <bank> isn't clear." -- I don't quite follow; do you mean the relationship between the 'physical CPU cache' and the software-scope 'cache bank' is not clear? If yes, please refer to my replies above, in particular the paragraph clarifying 'cache bank' with the example of a 2S E5-2699v4 system.
If it's not clear by this response, I'm afraid we'll just be too far apart going forward.
Maybe the capability output should:
<monitor level='3' threshold='%u' maxAlloc='%u' > <feature name='%s'/> <feature name='%s'/> ... </monitor>
Where the '3' is because you read from "L3_MON" and only important if you feel 1 or 2 or something else would be generated eventually.
Adding the 'level' attribute makes it clearer which cache level supports cache monitoring technology, great.
I have read the comments you newly added in the email, but I haven't replied one by one to those about the relationship between @monitor and @bank. Let's move on here ...
As I said, I was also confused when I began my first design; I also noticed that putting @monitor under @bank was not quite right.
How about placing the XML element 'monitor' under 'cache' in the following style:
<cache>
  <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'>
    <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>
  </bank>
  <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'>
    <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>
  </bank>
+ <monitor level='@cache level' maxAllocs='@maxAlloc'>
+   <feature name='@feature names'/>
+ </monitor>
</cache>
The 'unit' is removed. For cache, the monitor only supports the feature 'llc_occupancy'.
And then you correlate however you have to "later". You "know" that <cache> would be related to "llc_occupancy" and take it from there.
I see no way for each feature to have a different num_rmids or max_threshold_occupancy value, so that's why I'm putting them as attributes of <monitor>. What is of concern is how someone knows <monitor> relates to both <cache> and <memory_bandwidth> - I guess that has to be left for the documentation portion. If you wanted to name it something different than <monitor> that's fine - naming is hard (TM).
num_rmids is common to all kinds of monitors, both the cache monitor and the memory bandwidth monitor.
But max_threshold_occupancy is specific to the cache monitor, more precisely, to the feature 'llc_occupancy'. Maybe the following configuration is more reasonable for the cache monitor:
+ <monitor level='3' maxAllocs='176'>
+   <feature name='llc_occupancy' threshold='270336' />
+ </monitor>
And for memory bandwidth it would be:
+ <monitor level='3' maxAllocs='176'>
+   <feature name='mbm_total_bytes'/>
+   <feature name='mbm_local_bytes'/>
+ </monitor>
I don't find a better name than 'monitor' for naming CMT technology in libvirt. I am even worse at naming.
+ + return 0; error: while (*ncontrols) VIR_FREE((*controls)[--*ncontrols]); VIR_FREE(*controls); - goto cleanup; + virStringListFree(cachemon->features); + VIR_FREE(cachemon); + return ret; }
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index cfd56dd..51bb68b 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode { unsigned int max_allocation; };
+typedef struct _virResctrlInfoMon virResctrlInfoMon;
+typedef virResctrlInfoMon *virResctrlInfoMonPtr;
+/* Information about resource monitoring group */
+struct _virResctrlInfoMon {
+    /* null-terminal string list for hw supported monitor feature */
+    char **features;
+    size_t nfeatures;
+    /* Maximum number of simultaneous allocations */
+    unsigned int max_allocation;
What kind of allocations? From your cover you state maximum number of monitoring groups that could be created, but it's impossible to know how this value is expected to be used by what's provided as a comment here.
Changing the comment from /* Maximum number of simultaneous allocations */ to /* Maximum number of monitoring groups that could be created */
The code shows this is the value from /info/L3_MON/num_rmids - I don't see the correlation in FS name to structure name. Since you later print a unit of 'B' I assume it's bytes of something, but the comment seems to imply a maximum simultaneous number of something.
It looks like the capability string confused you. The XML output of the cache monitor capability looks like this (leveraging the format of the 'cache control' capability):
<monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor>
'B' is the unit for 'threshold'; 'maxAllocs' is for max_allocation (parsed from /info/L3_MON/num_rmids). The 'unit' is not limited to 'B'; it could be 'KiB' ...
True, but you read a 'B' and never change that anywhere - it's also not clear threshold is a 'B' value. At least when I see 'size' followed by 'unit' - I'm certain the size is a related to unit. To me "threshold" is just a "value" not a necessarily byte related. Could be a count of something.
You have convinced me, let's remove attribute 'unit'.
Of course the longer wording "max_threshold_occupancy" doesn't help me much either. A maximum threshold occupancy of what? It's not unique to each feature name, it's unique to the L3_MON.
Still trying to recap its behavior better...
This tells me docs are necessary for interpreting these settings.
clearly!
"monitor": describes the cache monitor capability. "threshold": the cache occupancy threshold value used by the kernel resource control system; it affects the actual release of the hardware resource, the RMID (resource monitoring ID). A smaller value makes a request to create a new resource monitoring group more likely to fail once the number of existing monitoring groups reaches 'maxAllocs'.
Again it's not something a "normal consumer" would probably change...
"unit": This is the unit of "threshold", could be 'B', 'KiB', 'MiB' or 'GiB'.
Remove it. The threshold will be treated as bytes, and I'll make the corresponding changes in the documentation.
Unless you do the logic to present it as a calculated value, what's the purpose? I know there's code out there that will "prettify" output such that for the example from patch 4 a value of 270336 bytes is printed as '264 KiB'. If you're not going to do that, then just present it as a value and note that it's a byte value. /me now wonders if you should be sure to store this in an "unsigned long long" field; since you mention 'GiB', an 'unsigned int' only gets you so far.
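To make the "prettify" remark concrete, here is a small sketch that scales a byte count to the largest binary unit dividing it exactly. format_bytes is a hypothetical helper written for this illustration and is not an existing libvirt function; it takes unsigned long long so that values of 4 GiB and above, which do not fit a 32-bit unsigned int, are still representable.

```c
#include <stdio.h>

/* Hypothetical helper: scale `bytes` to the largest unit among B, KiB, MiB
 * and GiB that divides it exactly, so that no precision is lost, and write
 * the result (for example "264 KiB") into `buf`. */
static void
format_bytes(unsigned long long bytes, char *buf, size_t buflen)
{
    static const char *units[] = { "B", "KiB", "MiB", "GiB" };
    size_t u = 0;

    while (u < 3 && bytes != 0 && bytes % 1024 == 0) {
        bytes /= 1024;
        u++;
    }
    snprintf(buf, buflen, "%llu %s", bytes, units[u]);
}
```

For the threshold in the example, 270336 = 264 x 1024, so the helper yields "264 KiB".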
"maxAllocs": This is a number that maximum number of monitoring groups
could be created. "feature": describes the feature name supported by 'monitor'.
Hope this documentation clears your confusion.
Perhaps if documentation was added I would have had my answer without needing to go research that is.
See docs above.
+ /* determines the occupancy at which an RMID can be freed */
Again, alone this comment is difficult to decipher as it relates to the structure field name. The code shows the value read is "/info/L3_MON/max_threshold_occupancy". It's not clear what "occupancy" means. Is there something related to this number that some consumer could change that would improve some performance?
"max_threshold_occupancy" is a concept introduced by kernel resctrl. It is a cache value, in bytes, that affects the release of hardware 'RMID's and thus the maximum number of monitor groups that can be created. See the description of the cache monitor attribute 'threshold' for more information.
Thanks for your efforts of the review.
Thanks for your return of details - I'm still not sure I really understand the maxAllocs and threshold values. I see them purely as display values at this point. I cannot imagine providing an interface or description that would help some consumer adjust the value to fix some problem on their host. There are patch series in the virtual bit bucket that have tried to do that in other areas.
'maxAllocs' should be easy to understand, while 'threshold' is very obscure in the description in the kernel document (intel_rdt_ui.txt).
The maxAllocs number is the hardware RMID count, which is a CPU chip-level resource; each resource monitor costs one RMID. So maxAllocs determines the maximum number of monitors that can be created. If the hardware RMIDs are used up, the next creation of a resource monitor returns an error.
The threshold, or max_threshold_occupancy, is harder to understand. It is related to the cache monitor, that is, the llc_occupancy feature. The following is my understanding of it; I hope it helps.
Suppose our goal is to learn the last level cache consumption of a particular Linux process over some period. With the resctrl file system, we create a monitor group, put the target process's PID into the monitor's 'tasks' file, and later remove the monitor to save resources. The underlying details are roughly as follows.
A hardware RMID is allocated from the hardware resource pool. The RMID is assigned to a particular hardware CPU thread (or threads; an RMID can be assigned to multiple threads at the same time) during context switch. At the same time, the last level cache lines being filled are tagged with the RMID.
When the resctrl monitor is removed, the RMID cannot be reclaimed immediately, because cache lines are still tagged with it and those lines stay in the cache for some while. If we let another monitor use this RMID immediately, the cache occupancy/consumption data would be inaccurate, because the RMID still tags cache lines used by the previous Linux process.
So 'max_threshold_occupancy' is provided by the RMID manager, actually the kernel CPU driver, to help judge whether the number of cache lines associated with the RMID is small enough. If the cache occupancy is less than this safety threshold, the RMID is released for reuse.
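The lifecycle described above can be modelled in a few lines. This is a deliberately simplified sketch of the explanation in this mail, not the kernel's implementation; all names (struct rmid, rmid_release, rmid_recheck, rmid_alloc) are hypothetical, and the occupancy field stands in for what the hardware would report.

```c
#include <stddef.h>

enum rmid_state { RMID_FREE, RMID_IN_USE, RMID_LIMBO };

struct rmid {
    enum rmid_state state;
    unsigned int occupancy;   /* bytes of LLC still tagged with this RMID */
};

/* Removing a monitor does not free its RMID right away: the RMID waits
 * until the cache lines it tags have mostly been evicted. */
static void
rmid_release(struct rmid *r)
{
    if (r->state == RMID_IN_USE)
        r->state = RMID_LIMBO;
}

/* Re-check a waiting RMID: it becomes reusable once its remaining
 * occupancy is below the threshold (max_threshold_occupancy). */
static void
rmid_recheck(struct rmid *r, unsigned int threshold)
{
    if (r->state == RMID_LIMBO && r->occupancy < threshold)
        r->state = RMID_FREE;
}

/* Hand out the first free RMID, or return -1 when none is available,
 * which corresponds to a failed monitor creation. */
static int
rmid_alloc(struct rmid *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].state == RMID_FREE) {
            pool[i].state = RMID_IN_USE;
            return (int)i;
        }
    }
    return -1;
}
```

In this model a larger threshold lets waiting RMIDs return to the pool sooner, at the cost of a less accurate occupancy reading for the next user of that RMID.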
To me, it is hard to summarize this information from a high level for a libvirt user. By the way, do you think it is really necessary to expose this 'threshold' to libvirt users? I doubt that anyone would change it. Thanks for the review. Huaqiang
John
+ unsigned int cache_threshold; +}; + typedef struct _virResctrlInfo virResctrlInfo; typedef virResctrlInfo *virResctrlInfoPtr;
@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, unsigned int level, unsigned long long size, size_t *ncontrols, - virResctrlInfoPerCachePtr **controls); + virResctrlInfoPerCachePtr **controls, + virResctrlInfoMonPtr *monitor); +
int virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,

[...]
Maybe the capability output should:
<monitor level='3' threshold='%u' maxAlloc='%u' > <feature name='%s'/> <feature name='%s'/> ... </monitor>
Where the '3' is because you read from "L3_MON" and only important if you feel 1 or 2 or something else would be generated eventually.
Adding the 'level' attribute makes it clearer which cache level supports cache monitoring technology, great.
I have read the comments you newly added in the email, but I haven't replied one by one to those about the relationship between @monitor and @bank. Let's move on here ...
As I said, I was also confused when I began my first design; I also noticed that putting @monitor under @bank was not quite right.
How about placing the XML element 'monitor' under 'cache' in the following style:
<cache>
  <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'>
    <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>
  </bank>
  <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'>
    <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>
  </bank>
+ <monitor level='@cache level' maxAllocs='@maxAlloc'>
+   <feature name='@feature names'/>
+ </monitor>
</cache>
Seems reasonable.
The 'unit' is removed. For cache, the monitor only supports the feature 'llc_occupancy'.
Shall I assume then that the <monitor...> would also appear in the <memory_bandwidth> and have entries for "mbm_"? IDC what the answer is, all I'm trying to point out is be sure that whatever API's get created will be able to reuse the same code, but just change the prefix.
And then you correlate however you have to "later". You "know" that <cache> would be related to "llc_occupancy" and take it from there.
I see no way for each feature to have a different num_rmids or max_threshold_occupancy value, so that's why I'm putting them as attributes of <monitor>. What is of concern is how someone knows <monitor> relates to both <cache> and <memory_bandwidth> - I guess that has to be left for the documentation portion. If you wanted to name it something different than <monitor> that's fine - naming is hard (TM).
num_rmids is common to all kinds of monitors, both the cache monitor and the memory bandwidth monitor.
But max_threshold_occupancy is specific to the cache monitor, more precisely, to the feature 'llc_occupancy'. Maybe the following configuration is more reasonable for the cache monitor:
+ <monitor level='3' maxAllocs='176'> + <feature name='llc_occupancy' threshold='270336' />
This is a strange one... Part of me would say that there's then some file llc_occupancy that has in it the @threshold value. Going into the future if a new feature name was created, one wonders how or if its attribute would/could be similarly named 'threshold' or would it "assume" the same threshold as the other uses. Hard to know without seeing in the future.... Hint, buy some lottery tickets too when you're there.
+ </monitor>
And for memory bandwidth it would be:
+ <monitor level='3' maxAllocs='176'> + <feature name='mbm_total_bytes'/> + <feature name='mbm_local_bytes'/> + </monitor>
I don't find a better name than 'monitor' for naming CMT technology in libvirt. I am even worse at naming.
"Naming is hard" (TM, Andrea Bolognani)... Not many are good at naming, but everyone is good at complaining about someone else's chosen name for something. I think monitor is fine - I see it as "Cache Monitoring..." or a "Memory Bandwidth Monitoring...". Still for this one, because threshold is listed as a property, one wonders then is there a similar "threshold" that describes the total or local bytes value that could/should be printed.
+ + return 0; error: while (*ncontrols) VIR_FREE((*controls)[--*ncontrols]); VIR_FREE(*controls); - goto cleanup; + virStringListFree(cachemon->features); + VIR_FREE(cachemon); + return ret; }
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index cfd56dd..51bb68b 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode { unsigned int max_allocation; };
+typedef struct _virResctrlInfoMon virResctrlInfoMon; typedef +virResctrlInfoMon *virResctrlInfoMonPtr; +/* Information about resource monitoring group */ struct +_virResctrlInfoMon { + /* null-terminal string list for hw supported monitor feature */ + char **features; + size_t nfeatures; + /* Maximum number of simultaneous allocations */ + unsigned int max_allocation;
What kind of allocations? From your cover you state maximum number of monitoring groups that could be created, but it's impossible to know how this value is expected to be used by what's provided as a comment here.
Changing the comment from /* Maximum number of simultaneous allocations */ to /* Maximum number of monitoring groups that could be created */
The code shows this is the value from /info/L3_MON/num_rmids - I don't see the correlation in FS name to structure name. Since you later print a unit of 'B' I assume it's bytes of something, but the comment seems to imply a maximum simultaneous number of something.
Looks the capability string make you confused. The XML output of cache monitor capability looks like: (leveraging the format of 'cache control' capability)
<monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor>
'B' is the unit for 'threshold', 'maxAllocs' is for max_allocation (parsed through /info/L3_MON/num_rmids). The 'unit' are not limited to 'B', could be 'KiB' ...
True, but you read a 'B' and never change that anywhere - it's also not clear threshold is a 'B' value. At least when I see 'size' followed by 'unit' - I'm certain the size is a related to unit. To me "threshold" is just a "value" not a necessarily byte related. Could be a count of something.
You have convinced me, let's remove attribute 'unit'.
Of course the longer wording "max_threshold_occupancy" doesn't help me much either. A maximum threshold occupancy of what? It's not unique to each feature name, it's unique to the L3_MON.
Still trying to recap its behavior better...
This tells docs are necessary for interpreting these settings.
clearly!
"monitor": describes cache monitor capability. "threshold": This is cache occupancy threshold value used in kernel resource control system, and affects the actual release of hardware resource, the RMID (resource monitoring ID). A greater value of this will make the request of creating a new resource monitoring group more likely to fail if the existing number of monitoring groups reaches up to 'maxAlloc'.
Again it's not something a "normal consumer" would probably change...
"unit": This is the unit of "threshold", could be 'B', 'KiB', 'MiB' or 'GiB'.
Remove it. The threshold would be treated as byte, and make corresponding changes in document.
Unless you do the logic to present it as a calculated value, then what's the purpose? I know there's code out there that will "prettify" output such that for the example from patch 4 a value of 270336 bytes is printed as '264 KiB'. If you're not going to do that, then just present it as a value and note that it's a byte value. /me now wonders if you should be sure to store this in an "unsigned long long" field; since you mention 'GiB', an 'unsigned int' only gets you so far.
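For illustration only, a minimal sketch of the kind of "prettify" logic mentioned above. The helper name `format_bytes` is hypothetical and is not part of the patch or of libvirt; it assumes the value is held in an `unsigned long long` and only scales when the value divides evenly by 1024.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a libvirt API): scale a byte count to the
 * largest binary unit that divides it evenly and write "<value> <unit>"
 * into buf. Returns what snprintf() returns. */
static int
format_bytes(unsigned long long bytes, char *buf, size_t buflen)
{
    static const char *const units[] = { "B", "KiB", "MiB", "GiB" };
    size_t i = 0;

    assert(buf != NULL && buflen > 0);

    /* Stop at GiB (index 3), at zero, or when scaling would lose precision. */
    while (i < 3 && bytes != 0 && bytes % 1024 == 0) {
        bytes /= 1024;
        i++;
    }

    return snprintf(buf, buflen, "%llu %s", bytes, units[i]);
}
```

With this sketch, 270336 is formatted as "264 KiB" (264 * 1024 = 270336), while a value such as 1000 stays in bytes.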
"maxAllocs": The maximum number of monitoring groups that could be created.
"feature": describes the feature name supported by 'monitor'.
Hope this documentation clears your confusion.
Perhaps if documentation was added I would have had my answer without needing to go research that is.
See docs above.
+ /* determines the occupancy at which an RMID can be freed */
Again, alone this comment is difficult to decipher as it relates to the structure field name. The code shows the value read is "/info/L3_MON/max_threshold_occupancy". It's not clear what "occupancy" means. Is there something related to this number that some consumer could change that would improve some performance?
"max_threshold_occupancy" is a concept introduced by kernel resctrl. It is a cache value, in bytes, that affects the release of the hardware 'RMID' and thus affects the maximum number of monitor groups that could be created. See the description of the cache monitor attribute 'threshold' for more information.
Thanks for your efforts of the review.
Thanks for your return of details - I'm still not sure I really understand the maxAllocs and threshold values. I see them purely as display values at this point. I cannot imagine providing an interface or description that would help some consumer adjust the value to fix some problem on their host. There are patch series in the virtual bit bucket that have tried to do that in other areas.
The 'maxAllocs' should be easily understood, while 'threshold' is very obscure from the description of kernel document(intel_rdt_ui.txt).
The maxAllocs number is the hardware RMID number, which is CPU chip level resource, each resource monitor will cost one RMID. So the maxAllocs number determines the maximum number monitor could be created. If hardware RMID is used up, the next creation of resource monitor will return an error.
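As a rough sketch of where this number comes from: the value can be read as a plain unsigned integer from the resctrl info file. The helper name `read_uint_file` is hypothetical, and the mount point /sys/fs/resctrl in the comment is an assumption about the host; the thread itself only names the relative path /info/L3_MON/num_rmids.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not the libvirt implementation): read a single
 * unsigned integer from a resctrl info file, for example
 * /sys/fs/resctrl/info/L3_MON/num_rmids (mount point assumed).
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int
read_uint_file(const char *path, unsigned int *value)
{
    FILE *fp;
    int rc = -1;

    assert(path != NULL && value != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fscanf(fp, "%u", value) == 1)
        rc = 0;

    fclose(fp);
    return rc;
}
```

The same helper would work for max_threshold_occupancy, since that file also holds one integer.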
So somewhat similar to vHBA's where there's a limited number of vport_ops available. Still shouldn't maxAllocs get decremented for each <monitor> created? Thus if <monitor>[1] has maxAllocs=176, then <monitor>[2] would display maxAllocs=175... See: https://wiki.libvirt.org/page/NPIV_in_libvirt and search on vports - you'll see "vports" and "max_vports", but those are tied to the "parent" scsi_host. As more vHBA's are created the vports count changes, but max_vports stays the same.
The threshold, or the max_threshold_occupancy, is harder to understand. It is related to the cache monitor, or the llc_occupancy feature. The following paragraph is my understanding of it, hope it helps you.
If we have the goal to get to know the last level cache consumption for special Linux process for some time. With the help of resctrl file system, we need to create a monitor group and then remove this monitor in order to save the resource. We also need to put the target process's PID into the monitor's 'tasks' file.
The underlying details is something like:
A hardware RMID is allocated from hardware resource pool.
The RMID is assigned to particular hardware CPU thread (or threads, RMID could be assigned to multiple threads at same time) during context switch.
At the same time, the RMID is tagged the last level cache lines.
When resctrl monitor removing operation is performed. The RMID could not be immediately reclaimed, because it is still tagged the cache lines, while these cache lines will be 'maintained' for some while.
If we let this RMID to be used by another monitor immediately, the cache occupancy/consumption data will be inaccurate because it still tagged with cache lines which are used by previous Linux process.
So this 'max_threshold_occupancy' is provided by RMID manager, actually the kernel CPU driver, to help make the judgment that if the number of cache line associated with the RMID is small enough. If the cache occupancy is less than this safety threshold, the RMID will be released for next reuse.
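The steps above can be sketched as a small simulation. This is only a model of the behavior as I understand it, not kernel code; the type and function names (`rmid_alloc`, `rmid_release`, `rmid_reclaim`) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only; names and layout are hypothetical. */
enum rmid_state { RMID_FREE, RMID_BUSY, RMID_LIMBO };

struct rmid {
    enum rmid_state state;
    unsigned int occupancy;   /* bytes of LLC still tagged with this RMID */
};

/* Take the first free RMID; return its index, or -1 if none is free. */
static int
rmid_alloc(struct rmid *pool, size_t n)
{
    size_t i;

    assert(pool != NULL);

    for (i = 0; i < n; i++) {
        if (pool[i].state == RMID_FREE) {
            pool[i].state = RMID_BUSY;
            return (int)i;
        }
    }
    return -1;
}

/* Removing a monitor does not free the RMID at once: it waits in limbo. */
static void
rmid_release(struct rmid *pool, int id)
{
    assert(pool != NULL && id >= 0);
    pool[id].state = RMID_LIMBO;
}

/* Return limbo RMIDs to the pool once their occupancy is below the
 * threshold. Returns how many RMIDs were reclaimed. */
static size_t
rmid_reclaim(struct rmid *pool, size_t n, unsigned int threshold)
{
    size_t i, reclaimed = 0;

    assert(pool != NULL);

    for (i = 0; i < n; i++) {
        if (pool[i].state == RMID_LIMBO && pool[i].occupancy < threshold) {
            pool[i].state = RMID_FREE;
            pool[i].occupancy = 0;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

In this model a larger threshold lets RMIDs leave limbo sooner, at the cost of less accurate occupancy data for the next user of that RMID.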
To me, it hard to recap these information from a high level for a libvirt user. By the way, do you think if we really necessary to expose this 'threshold' to Libvirt user? I doubt that anyone would change it.
If something is hard to recap, then the question becomes is it worthwhile to expose. To some degree having the data available is nice, but knowing how to interpret or use the data is even better. At other times though exposing the data leads to the inevitable is the value good or bad. If bad, then what can we do to make things better. I think in the long run it's a question of interpretation. Exposing performance type data is both good and bad. It's a double edged sword of truth. Still for something like libvirt - only providing the raw numbers is perfectly fine. Some other tool can be developed to help interpret those values, just make it so that tool has half a chance to figure out what the data represents. Changing the name "too much" from the what is represented makes for harder interpretation. That may just be the case with "rmid" and "maxAlloc". It may be that "Allocs" should be replaced by "Monitors", but then does that mean each child element is "one" monitor object. Guess I still don't have a great picture in mind regarding how this is used from a client perspective. Right now it's just a lot of data. John
Thanks for review. Huaqiang
+    unsigned int cache_threshold;
+};
+
 typedef struct _virResctrlInfo virResctrlInfo;
 typedef virResctrlInfo *virResctrlInfoPtr;
@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
                        unsigned int level,
                        unsigned long long size,
                        size_t *ncontrols,
-                       virResctrlInfoPerCachePtr **controls);
+                       virResctrlInfoPerCachePtr **controls,
+                       virResctrlInfoMonPtr *monitor);
+
int virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,

Hi John,

Sorry for the late reply. I did not dare to promise more answers before they had been verified by POC code. Sometimes I have to change the design even after we have reached agreement in an earlier discussion, if it turns out to be too difficult to implement. Anyway, I am trying to stick to our consensus.

Later today I'll submit the patch series for the monitor capability that we discussed intensively in these email threads. The final output of 'capabilities' would look like this:

If 'CMT' is enabled on the host, a 'cache monitor' is introduced for the cache, whose role is monitoring the last level cache utilization of a target system process. Cache monitor capabilities are shown under the element <cache>.

<cache>
  <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
    <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
  </bank>
  <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
    <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
  </bank>
  <monitor level='3' reuseThreshold='270336' maxMonitors='176'>
    <feature name='llc_occupancy'/>
  </monitor>
</cache>

Note: The cache monitor works even if cache allocation is not supported on the host.
'maxAllocs' is replaced by 'maxMonitors'.
'threshold' is replaced by 'reuseThreshold'. My explanation for it is:

/* This Adjustable value affects the final reuse of resources used by
 * monitor. After the action of removing a monitor, the kernel may not
 * release all hardware resources that monitor used immediately if the
 * cache occupancy value associated with 'removed' monitor is above this
 * threshold. Once the cache occupancy is below this threshold, the
 * underlying hardware resource will be reclaimed and be put into the
 * resource pool for next reusing.*/
unsigned int cache_reuse_threshold;

Then for 'MBM', a monitor named memory bandwidth monitor is introduced in the patches, whose role is monitoring memory bandwidth utilization. The capability information block is located under the <memory_bandwidth> element.

<memory_bandwidth>
  <node id='0' cpus='0-5'>
    <control granularity='10' min='10' maxAllocs='4'/>
  </node>
  <node id='1' cpus='6-11'>
    <control granularity='10' min='10' maxAllocs='4'/>
  </node>
  <monitor maxMonitors='176'>
    <feature name='mbm_total_bytes'/>
    <feature name='mbm_local_bytes'/>
  </monitor>
</memory_bandwidth>

There is only one copy of this information for each capability.

Three test cases are performed on the POC functionality:
1. vircaps-x86_64-resctrl.xml: Case for CAT, MBA, CMT and MBM features are supported.
2. vircaps-x86_64-resctrl-cmt.xml: Case for CAT is only supported feature.
3. vircaps-x86_64-resctrl-fake-feature.xml: Case for involving some future and fake feature set.

I have also replied to the questions/comments you raised. See my answers inline.

Thanks
Huaqiang
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 12, 2018 1:24 AM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 02/10] util: add interface retrieving CMT capability
[...]
Maybe the capability output should:
<monitor level='3' threshold='%u' maxAlloc='%u' > <feature name='%s'/> <feature name='%s'/> ... </monitor>
Where the '3' is because you read from "L3_MON" and only important if you feel 1 or 2 or something else would be generated eventually.
The adding of 'level' attribute make it more clear for indicating the cache level that supports cache monitoring technology, great.
I have read your comments that newly added in the email, but I haven't reply them one by one for those related to the relationship for @monitor and @bank. Let's move on here ...
As I said, I was also confused at the time I began my first design, I also noticed something is not that right for putting the @monitor under the @bank.
how about placing the XML element 'monitor' under 'cache' in following style:
<cache> <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> </bank> <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> </bank> + <monitor level='@cache level' maxAllocs='@maxAlloc' + <feature name='@feature names'/> + </monitor> </cache>
Seems reasonable.
The 'unit' is removed. for cache, the monitor only support feature 'llc_occupancy'.
Shall I assume then that the <monitor...> would also appear in the <memory_bandwidth> and have entries for "mbm_"? IDC what the answer is, all I'm trying to point out is be sure that whatever API's get created will be able to reuse the same code, but just change the prefix.
Yes. <memory_bandwidth> has a sub-element of <monitor>. Code/API reuse is considered in POC design.
And then you correlate however you have to "later". You "know" that <cache> would be related to "llc_occupancy" and take it from there.
I see no way for each feature to have a different num_rmids or max__threshold_occupancy value, so that's why I'm putting them as attributes of <monitor>. What is of concern is how someone knows <monitor> relates to both <cache> and <memory_bandwidth> - I guess that has to be left for the documentation portion. If you wanted to name it something different than <monitor> that's fine - naming is hard (TM).
num_rmids is common for all kind of monitors, both cache monitor and memory bandwidth monitor.
But max_threshold_occupancy is special for cache monitor, more precisely, special for feature 'llc_occupancy', Maybe following configuration is more reasonable for cache monitor:
+ <monitor level='3' maxAllocs='176'> + <feature name='llc_occupancy' threshold='270336' />
This is a strange one... Part of me would say that there's then some file llc_occupancy that has in it the @threshold value. Going into the future if a new feature name was created, one wonders how or if it's attribute would/could be similarly named 'threshold' or would it "assume" the same threshold as the other uses.
Hard to know without seeing in the future.... Hint, buy some lottery tickets too when you're there.
Removed from <feature> attribute, added it to cache <monitor>.
+ </monitor>
And for memory bandwidth it would be:
+ <monitor level='3' maxAllocs='176'> + <feature name='mbm_total_bytes'/> + <feature name='mbm_local_bytes'/> + </monitor>
I don't find a better name than 'monitor' for naming CMT technology in libvirt. I am even worse at naming.
"Naming is hard" (TM, Andrea Bolognani)... Not many are good at naming, but everyone is good at complaining about someone else's chosen name for something. I think monitor is fine - I see it as "Cache Monitoring..." or a "Memory Bandwidth Monitoring...".
Still for this one, because threshold is listed as a property, one wonders then is there a similar "threshold" that describes the total or local bytes value that could/should be printed.
Moved 'threshold' (now renamed to 'reuseThreshold' ) to monitor's attribute field. See my new capability information layout I listed in the header of email. I implemented two kinds of monitor, there is no 'threshold' attribute for memory bandwidth monitor but cache monitor has this attribute, this wouldn't introduce confusion.
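To make the difference between the two monitor kinds concrete, here is a minimal formatting sketch. It uses plain snprintf() rather than libvirt's virBuffer API, and `struct monitor_caps` and `format_monitor` are hypothetical names for illustration, not the code in the series. Feature names are assumed to need no XML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the monitor capability data; not a libvirt type. */
struct monitor_caps {
    int is_cache;                  /* 1: cache monitor, 0: memory bandwidth monitor */
    unsigned int level;            /* cache level, used only for cache monitors */
    unsigned int reuse_threshold;  /* bytes, used only for cache monitors */
    unsigned int max_monitors;     /* value read from num_rmids */
    const char *const *features;   /* e.g. "llc_occupancy" */
    size_t nfeatures;
};

/* Write the <monitor> element into buf. Returns 0 on success, -1 if the
 * output does not fit. Only the cache monitor carries reuseThreshold. */
static int
format_monitor(const struct monitor_caps *m, char *buf, size_t buflen)
{
    size_t off, i;
    int n;

    assert(m != NULL && buf != NULL && buflen > 0);

    if (m->is_cache)
        n = snprintf(buf, buflen,
                     "<monitor level='%u' reuseThreshold='%u' maxMonitors='%u'>\n",
                     m->level, m->reuse_threshold, m->max_monitors);
    else
        n = snprintf(buf, buflen, "<monitor maxMonitors='%u'>\n",
                     m->max_monitors);
    if (n < 0 || (size_t)n >= buflen)
        return -1;

    for (i = 0; i < m->nfeatures; i++) {
        off = strlen(buf);
        n = snprintf(buf + off, buflen - off,
                     "  <feature name='%s'/>\n", m->features[i]);
        if (n < 0 || (size_t)n >= buflen - off)
            return -1;
    }

    off = strlen(buf);
    n = snprintf(buf + off, buflen - off, "</monitor>\n");
    if (n < 0 || (size_t)n >= buflen - off)
        return -1;

    return 0;
}
```

Keeping one formatter for both kinds is in line with the code reuse John asked about; only the attribute list differs.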
+
+    return 0;
 error:
     while (*ncontrols)
         VIR_FREE((*controls)[--*ncontrols]);
     VIR_FREE(*controls);
-    goto cleanup;
+    virStringListFree(cachemon->features);
+    VIR_FREE(cachemon);
+    return ret;
 }
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h
index cfd56dd..51bb68b 100644
--- a/src/util/virresctrl.h
+++ b/src/util/virresctrl.h
@@ -61,6 +61,19 @@ struct _virResctrlInfoMemBWPerNode {
     unsigned int max_allocation;
 };
+typedef struct _virResctrlInfoMon virResctrlInfoMon;
+typedef virResctrlInfoMon *virResctrlInfoMonPtr;
+/* Information about resource monitoring group */
+struct _virResctrlInfoMon {
+    /* null-terminal string list for hw supported monitor feature */
+    char **features;
+    size_t nfeatures;
+    /* Maximum number of simultaneous allocations */
+    unsigned int max_allocation;
What kind of allocations? From your cover you state maximum number of monitoring groups that could be created, but it's impossible to know how this value is expected to be used by what's provided as a comment here.
Changing the comment from /* Maximum number of simultaneous allocations */ to /* Maximum number of monitoring groups that could be created */
The code shows this is the value from /info/L3_MON/num_rmids - I don't see the correlation in FS name to structure name. Since you later print a unit of 'B' I assume it's bytes of something, but the comment seems to imply a maximum simultaneous number of something.
Looks the capability string make you confused. The XML output of cache monitor capability looks like: (leveraging the format of 'cache control' capability)
<monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor>
'B' is the unit for 'threshold', 'maxAllocs' is for max_allocation (parsed through /info/L3_MON/num_rmids). The 'unit' are not limited to 'B', could be 'KiB' ...
True, but you read a 'B' and never change that anywhere - it's also not clear threshold is a 'B' value. At least when I see 'size' followed by 'unit' - I'm certain the size is a related to unit. To me "threshold" is just a "value" not a necessarily byte related. Could be a count of something.
You have convinced me, let's remove attribute 'unit'.
Of course the longer wording "max_threshold_occupancy" doesn't help me much either. A maximum threshold occupancy of what? It's not unique to each feature name, it's unique to the L3_MON.
Still trying to recap its behavior better...
This tells docs are necessary for interpreting these settings.
clearly!
"monitor": describes cache monitor capability. "threshold": This is cache occupancy threshold value used in kernel resource control system, and affects the actual release of hardware resource, the RMID (resource monitoring ID). A greater value of this will make the request of creating a new resource monitoring group more likely to fail if the existing number of monitoring groups reaches up to 'maxAlloc'.
Again it's not something a "normal consumer" would probably change...
"unit": This is the unit of "threshold", could be 'B', 'KiB', 'MiB' or 'GiB'.
Remove it. The threshold would be treated as byte, and make corresponding changes in document.
Unless you do the logic to present it as a calculated value, then what's the purpose? I know there's code out there that will "prettify" output such that for the example from patch 4 a value of 270336 bytes is printed as '264 KiB'. If you're not going to do that, then just present it as a value and note that it's a byte value. /me now wonders if you should be sure to store this in an "unsigned long long" field; since you mention 'GiB', an 'unsigned int' only gets you so far.
"maxAllocs": The maximum number of monitoring groups that could be created.
"feature": describes the feature name supported by 'monitor'.
Hope this documentation clears your confusion.
Perhaps if documentation was added I would have had my answer without needing to go research that is.
See docs above.
+ /* determines the occupancy at which an RMID can be freed */
Again, alone this comment is difficult to decipher as it relates to the structure field name. The code shows the value read is "/info/L3_MON/max_threshold_occupancy". It's not clear what "occupancy" means. Is there something related to this number that some consumer could change that would improve some performance?
"max_threshold_occupancy" is a concept involved by kernel resctrl. It is a cache value, in bytes, affects the release of hardware 'RMID', thus, affects the maximum number of monitor group could be created. Get more information from the cache monitor attribute 'threshold''s description.
Thanks for your efforts of the review.
Thanks for your return of details - I'm still not sure I really understand the maxAllocs and threshold values. I see them purely as display values at this point. I cannot imagine providing an interface or description that would help some consumer adjust the value to fix some problem on their host. There are patch series in the virtual bit bucket that have tried to do that in other areas.
The 'maxAllocs' should be easily understood, while 'threshold' is very obscure from the description of kernel document(intel_rdt_ui.txt).
The maxAllocs number is the hardware RMID number, which is CPU chip level resource, each resource monitor will cost one RMID. So the maxAllocs number determines the maximum number monitor could be created. If hardware RMID is used up, the next creation of resource monitor will return an error.
So somewhat similar to vHBA's where there's a limited number of vport_ops available. Still shouldn't maxAllocs get decremented for each <monitor> created? Thus if <monitor>[1] has maxAllocs=176, then <monitor>[2] would display maxAllocs=175...
See: https://wiki.libvirt.org/page/NPIV_in_libvirt and search on vports - you'll see "vports" and "max_vports", but those are tied to the "parent" scsi_host. As more vHBA's are created the vports count changes, but max_vports stays the same.
I carefully read the vHBA/vports document. The vports count reflects the resources currently available. But because destroying a monitor is subject to the cache occupancy 'threshold', it is not possible to get an accurate number of monitors that could be created later. When a monitor is destroyed, its RMID is not reclaimed immediately; that depends on the cache lines still tagged with the RMID. We had better not adopt this kind of design, because we cannot get the number of monitors that the user could still create.
The threshold, or the max_threshold_occupancy, is harder to understand. It is related to the cache monitor, or the llc_occupancy feature. The following paragraph is my understanding of it, hope it helps you.
If we have the goal to get to know the last level cache consumption for special Linux process for some time. With the help of resctrl file system, we need to create a monitor group and then remove this monitor in order to save the resource. We also need to put the target process's PID into the monitor's 'tasks' file.
The underlying details is something like:
A hardware RMID is allocated from hardware resource pool.
The RMID is assigned to particular hardware CPU thread (or threads, RMID could be assigned to multiple threads at same time) during context switch.
At the same time, the RMID is tagged the last level cache lines.
When resctrl monitor removing operation is performed. The RMID could not be immediately reclaimed, because it is still tagged the cache lines, while these cache lines will be 'maintained' for some while.
If we let this RMID to be used by another monitor immediately, the cache occupancy/consumption data will be inaccurate because it still tagged with cache lines which are used by previous Linux process.
So this 'max_threshold_occupancy' is provided by RMID manager, actually the kernel CPU driver, to help make the judgment that if the number of cache line associated with the RMID is small enough. If the cache occupancy is less than this safety threshold, the RMID will be released for next reuse.
To me, it hard to recap these information from a high level for a libvirt user. By the way, do you think if we really necessary to expose this 'threshold' to Libvirt user? I doubt that anyone would change it.
If something is hard to recap, then the question becomes is it worthwhile to expose. To some degree having the data available is nice, but knowing how to interpret or use the data is even better. At other times though exposing the data leads to the inevitable is the value good or bad. If bad, then what can we do to make things better. I think in the long run it's a question of interpretation.
Let's keep it. My new recap for it is: /* This Adjustable value affects the final reuse of resources used by * monitor. After the action of removing a monitor, the kernel may not * release all hardware resources that monitor used immediately if the * cache occupancy value associated with 'removed' monitor is above this * threshold. Once the cache occupancy is below this threshold, the * underlying hardware resource will be reclaimed and be put into the * resource pool for next reusing.*/
Exposing performance type data is both good and bad. It's a double edged sword of truth. Still for something like libvirt - only providing the raw numbers is perfectly fine. Some other tool can be developed to help interpret those values, just make it so that tool has half a chance to figure out what the data represents. Changing the name "too much" from the what is represented makes for harder interpretation. That may just be the case with "rmid" and "maxAlloc". It may be that "Allocs" should be replaced by "Monitors", but then does that mean each child element is "one" monitor object. Guess I still don't have a great picture in mind regarding how this is used from a client perspective. Right now it's just a lot of data.
Replaced 'maxAllocs' with 'maxMonitors'.
John
Thanks for review. Huaqiang
+    unsigned int cache_threshold;
+};
+
 typedef struct _virResctrlInfo virResctrlInfo;
 typedef virResctrlInfo *virResctrlInfoPtr;
@@ -72,7 +85,9 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl,
                        unsigned int level,
                        unsigned long long size,
                        size_t *ncontrols,
-                       virResctrlInfoPerCachePtr **controls);
+                       virResctrlInfoPerCachePtr **controls,
+                       virResctrlInfoMonPtr *monitor);
+
int virResctrlInfoGetMemoryBandwidth(virResctrlInfoPtr resctrl,

CMT capability for each cache bank includes:
-. Maximum number of CMT monitoring groups (shared with MBM) that could be created, which reflects the maximum hardware RMID count.
-. 'cache threshold'.
-. Statistical information of the last level cache, the actual cache occupancy.

The cache is split into 'bank's, and each bank MAY have a different cache configuration, so the cache monitoring capability is reported per cache bank.

The cache monitor capability is shown as below:

<cache>
  <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
    <control granularity='768' unit='KiB' type='code' maxAllocs='8'/>
    <control granularity='768' unit='KiB' type='data' maxAllocs='8'/>
+   <monitor threshold='540672' unit='B' maxAllocs='176'>
+     <feature name='llc_occupancy'/>
+   </monitor>
  </bank>
  <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
    <control granularity='768' unit='KiB' type='code' maxAllocs='8'/>
    <control granularity='768' unit='KiB' type='data' maxAllocs='8'/>
+   <monitor threshold='540672' unit='B' maxAllocs='176'>
+     <feature name='llc_occupancy'/>
+   </monitor>
  </bank>
</cache>

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 docs/schemas/capability.rng | 28 ++++++++++++++++++++++++++++
 src/conf/capabilities.c     | 17 +++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng
index d61515c..67498f1 100644
--- a/docs/schemas/capability.rng
+++ b/docs/schemas/capability.rng
@@ -314,6 +314,24 @@
               </attribute>
             </element>
           </zeroOrMore>
+          <zeroOrMore>
+            <element name='monitor'>
+              <attribute name='threshold'>
+                <ref name='unsignedInt'/>
+              </attribute>
+              <attribute name='unit'>
+                <ref name='unit'/>
+              </attribute>
+              <attribute name='maxAllocs'>
+                <ref name='unsignedInt'/>
+              </attribute>
+              <zeroOrMore>
+                <element name='feature'>
+                  <ref name='monitorFeature'/>
+                </element>
+              </zeroOrMore>
+            </element>
+          </zeroOrMore>
         </element>
       </oneOrMore>
     </element>
@@ -329,6 +347,16 @@
     </attribute>
   </define>
+  <define name='monitorFeature'>
+    <attribute name='name'>
+      <choice>
+        <value>llc_occupancy</value>
+        <value>mbm_total_bytes</value>
+        <value>mbm_local_bytes</value>
+      </choice>
+    </attribute>
+  </define>
+
   <define name='memory_bandwidth'>
     <element name='memory_bandwidth'>
       <oneOrMore>
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 5280348..7932088 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -942,6 +942,23 @@ virCapabilitiesFormatCaches(virBufferPtr buf,
                               controls->max_allocation);
         }
+        if (bank->monitor &&
+            bank->monitor->nfeatures) {
+            virBufferAsprintf(&childrenBuf,
+                              "<monitor threshold='%u' unit='B' "
+                              "maxAllocs='%u'>\n",
+                              bank->monitor->cache_threshold,
+                              bank->monitor->max_allocation);
+            for (j = 0; j < bank->monitor->nfeatures; j++) {
+                virBufferAdjustIndent(&childrenBuf, 2);
+                virBufferAsprintf(&childrenBuf,
+                                  "<feature name='%s'/>\n",
+                                  bank->monitor->features[j]);
+                virBufferAdjustIndent(&childrenBuf, -2);
+            }
+            virBufferAddLit(&childrenBuf, "</monitor>\n");
+        }
+
         if (virBufferCheckError(&childrenBuf) < 0)
             return -1;
--
2.7.4

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
CMT capability for each cache bank, includes -. Maximum CMT monitoring groups(sharing with MBM) could be created, which reflects the maximum hardware RMID count. -. 'cache threshold'. -. Statistical information of last level cache, the actual cache occupancy.
cache is splitted into 'bank's, each bank MAY have different cache configuration, report cache monitoring capability in unit of cache bank.
cache monitor capability is shown as below:
<cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'/> + <feature name=llc_occupancy/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'/> + <feature name=llc_occupancy/> + </monitor>
</bank> </cache>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/schemas/capability.rng | 28 ++++++++++++++++++++++++++++ src/conf/capabilities.c | 17 +++++++++++++++++ 2 files changed, 45 insertions(+)
This output would be combined with part of existing patch2.
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index d61515c..67498f1 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -314,6 +314,24 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name='monitor'> + <attribute name='threshold'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='unit'> + <ref name='unit'/> + </attribute> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + <zeroOrMore> + <element name='feature'> + <ref name='monitorFeature'/> + </element> + </zeroOrMore> + </element> + </zeroOrMore> </element> </oneOrMore> </element> @@ -329,6 +347,16 @@ </attribute> </define>
+ <define name='monitorFeature'> + <attribute name='name'> + <choice> + <value>llc_occupancy</value> + <value>mbm_total_bytes</value> + <value>mbm_local_bytes</value>
So these are the only 3 values you'll ever expect? Probably not a good idea to list them like this or the current algorithm is overkill looking for prefixed "llc_" and "mbm_" values. If "llc_somethingnew" shows up some day, then the schema is invalidated. If all you're supporting or care about is the 3 values, then each should be fetched separately. Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches.
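To illustrate the mismatch described above: a prefix-based classifier, sketched below, would accept any future "llc_" or "mbm_" name, while the fixed <choice> list in the schema would reject it. The enum and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of monitor feature names by prefix. */
enum monitor_kind { MONITOR_UNKNOWN, MONITOR_CACHE, MONITOR_MEMBW };

static enum monitor_kind
classify_feature(const char *name)
{
    assert(name != NULL);

    if (strncmp(name, "llc_", 4) == 0)
        return MONITOR_CACHE;   /* e.g. llc_occupancy */
    if (strncmp(name, "mbm_", 4) == 0)
        return MONITOR_MEMBW;   /* e.g. mbm_total_bytes, mbm_local_bytes */
    return MONITOR_UNKNOWN;
}
```

With this sketch a name like "llc_somethingnew" is classified as a cache feature, so either the schema has to allow arbitrary names with those prefixes or the code should look up the three known names explicitly.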
+ </choice> + </attribute> + </define> + <define name='memory_bandwidth'> <element name='memory_bandwidth'> <oneOrMore> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 5280348..7932088 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -942,6 +942,23 @@ virCapabilitiesFormatCaches(virBufferPtr buf, controls->max_allocation); }
+ if (bank->monitor && + bank->monitor->nfeatures) { + virBufferAsprintf(&childrenBuf, + "<monitor threshold='%u' unit='B' "
Why is "unit='B' " - does it really matter and is it technically right? If it's only ever going to be 'B', then easy enough to document that way.
+ "maxAllocs='%u'>\n", + bank->monitor->cache_threshold, + bank->monitor->max_allocation); + for (j = 0; j < bank->monitor->nfeatures; j++) { + virBufferAdjustIndent(&childrenBuf, 2); + virBufferAsprintf(&childrenBuf, + "<feature name='%s'/>\n", + bank->monitor->features[j]); + virBufferAdjustIndent(&childrenBuf, -2); + } + virBufferAddLit(&childrenBuf, "</monitor>\n"); + } +
Not clear this data is at the right level, still. John
if (virBufferCheckError(&childrenBuf) < 0) return -1;

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:59 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 03/10] conf: Add CMT capability to host
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
CMT capability for each cache bank, includes -. Maximum CMT monitoring groups(sharing with MBM) could be created, which reflects the maximum hardware RMID count. -. 'cache threshold'. -. Statistical information of last level cache, the actual cache occupancy.
The cache is split into 'bank's, and each bank MAY have a different cache configuration, so the cache monitoring capability is reported per cache bank.
cache monitor capability is shown as below:
<cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor>
</bank> </cache>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/schemas/capability.rng | 28 ++++++++++++++++++++++++++++ src/conf/capabilities.c | 17 +++++++++++++++++ 2 files changed, 45 insertions(+)
This output would be combined with part of existing patch2.
Will be combined in next version patch.
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index d61515c..67498f1 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -314,6 +314,24 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name='monitor'> + <attribute name='threshold'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='unit'> + <ref name='unit'/> + </attribute> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + <zeroOrMore> + <element name='feature'> + <ref name='monitorFeature'/> + </element> + </zeroOrMore> + </element> + </zeroOrMore> </element> </oneOrMore> </element> @@ -329,6 +347,16 @@ </attribute> </define>
+ <define name='monitorFeature'> + <attribute name='name'> + <choice> + <value>llc_occupancy</value> + <value>mbm_total_bytes</value> + <value>mbm_local_bytes</value>
So these are the only 3 values you'll ever expect? Probably not a good idea to list them like this or the current algorithm is overkill looking for prefixed "llc_" and "mbm_" values.
If "llc_somethingnew" shows up some day, then the schema is invalidated.
Disagree. I don't think the schema will be invalidated when a new "llc_somethingnew" appears. Libvirt only recognizes these three feature names. If a new hardware feature name appears, to take your example 'llc_somethingnew', should we show it here before the corresponding functionality is enabled? I think probably not. It makes sense to say libvirt only reports the hardware features it has enabled, not all hardware features. To get the new 'llc_somethingnew' supported here, you need to make changes here and submit a patch.
If all you're supporting or care about is the 3 values, then each should be fetched separately. Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches.
We only care about these 3 values. I will apply a stricter naming rule to them in the source code in the next version of the patch. "Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches." -- I do not understand this part.
+ </choice> + </attribute> + </define> + <define name='memory_bandwidth'> <element name='memory_bandwidth'> <oneOrMore> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 5280348..7932088 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -942,6 +942,23 @@ virCapabilitiesFormatCaches(virBufferPtr buf, controls->max_allocation); }
+ if (bank->monitor && + bank->monitor->nfeatures) { + virBufferAsprintf(&childrenBuf, + "<monitor threshold='%u' unit='B' "
Why is "unit='B' " - does it really matter and is it technically right?
'unit' is the unit of 'threshold'. 'threshold' and 'unit' reflect the value reported through resctrl/'max_threshold_occupancy'.
If it's only ever going to be 'B', then easy enough to document that way.
I realized that 'unit' shouldn't be hard-coded to 'B'; it could be 'KiB', 'MiB', and so on, and should change with the value. It will be formatted with 'virFormatIntPretty'.
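The scaling proposed here can be sketched as follows. This is a minimal Python stand-in, assuming the helper picks the largest binary unit that divides the byte count exactly; the function name and unit table are hypothetical and are not libvirt's code.

```python
# Hypothetical stand-in for the kind of scaling 'virFormatIntPretty' is
# expected to do: choose the largest binary unit that divides the value
# exactly, so no precision is lost.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_int_pretty(value):
    """Return (scaled_value, unit) for a non-negative byte count."""
    index = 0
    while value != 0 and value % 1024 == 0 and index < len(UNITS) - 1:
        value //= 1024
        index += 1
    return value, UNITS[index]


if __name__ == "__main__":
    # Thresholds that appear in this thread.
    for threshold in (540672, 270336):
        print(threshold, "->", format_int_pretty(threshold))
```

With this rule the threshold would no longer always be reported with unit='B', which is the point raised in the review.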
+ "maxAllocs='%u'>\n", + bank->monitor->cache_threshold, + bank->monitor->max_allocation); + for (j = 0; j < bank->monitor->nfeatures; j++) { + virBufferAdjustIndent(&childrenBuf, 2); + virBufferAsprintf(&childrenBuf, + "<feature name='%s'/>\n", + bank->monitor->features[j]); + virBufferAdjustIndent(&childrenBuf, -2); + } + virBufferAddLit(&childrenBuf, "</monitor>\n"); + } +
Not clear this data is at the right level, still.
I outlined my considerations for putting cache monitor capability under the data structure of 'cache bank' in my reply of patch 2. It is a bit long, please go to that email for details. Again welcome suggestions. Thanks for review! Huaqiang
John
if (virBufferCheckError(&childrenBuf) < 0) return -1;
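For readers who do not want to trace the virBuffer calls, the formatting loop in the hunk above can be sketched in Python. This is only an illustration under the assumption that the element is emitted when at least one feature is present; the function name and the indent parameter are hypothetical.

```python
from xml.sax.saxutils import escape


def format_monitor(threshold, max_allocs, features, indent=0):
    """Render a <monitor> element with one <feature> child per name.

    Mirrors the shape of the C hunk: nothing is emitted when the
    feature list is empty.
    """
    if not features:
        return ""
    pad = " " * indent
    lines = [
        f"{pad}<monitor threshold='{threshold}' unit='B' "
        f"maxAllocs='{max_allocs}'>"
    ]
    for name in features:
        # Escape XML special characters, including the single quote used
        # as the attribute delimiter.
        safe = escape(name, {"'": "&apos;"})
        lines.append(f"{pad}  <feature name='{safe}'/>")
    lines.append(f"{pad}</monitor>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_monitor(540672, 176, ["llc_occupancy"]), end="")
```

Note that the opening <monitor> tag is not self-closing, because the <feature> children and the closing </monitor> follow it.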

On 09/07/2018 04:37 AM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:59 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 03/10] conf: Add CMT capability to host
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
CMT capability for each cache bank, includes -. Maximum CMT monitoring groups(sharing with MBM) could be created, which reflects the maximum hardware RMID count. -. 'cache threshold'. -. Statistical information of last level cache, the actual cache occupancy.
The cache is split into 'bank's, and each bank MAY have a different cache configuration, so the cache monitoring capability is reported per cache bank.
cache monitor capability is shown as below:
<cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor>
</bank> </cache>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/schemas/capability.rng | 28 ++++++++++++++++++++++++++++ src/conf/capabilities.c | 17 +++++++++++++++++ 2 files changed, 45 insertions(+)
This output would be combined with part of existing patch2.
Will be combined in next version patch.
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index d61515c..67498f1 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -314,6 +314,24 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name='monitor'> + <attribute name='threshold'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='unit'> + <ref name='unit'/> + </attribute> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + <zeroOrMore> + <element name='feature'> + <ref name='monitorFeature'/> + </element> + </zeroOrMore> + </element> + </zeroOrMore> </element> </oneOrMore> </element> @@ -329,6 +347,16 @@ </attribute> </define>
+ <define name='monitorFeature'> + <attribute name='name'> + <choice> + <value>llc_occupancy</value> + <value>mbm_total_bytes</value> + <value>mbm_local_bytes</value>
So these are the only 3 values you'll ever expect? Probably not a good idea to list them like this or the current algorithm is overkill looking for prefixed "llc_" and "mbm_" values.
If "llc_somethingnew" shows up some day, then the schema is invalidated.
Disagree. I don't think the schema will be invalidated when a new "llc_somethingnew" appears. Libvirt only recognizes these three feature names.
Hmm... maybe not clear enough - see, virt-xml-validate. Using <choice> limits the choices or allowed values to that known list. So if some kernel some day adds llc_somethingnew, but the libvirt rng file for the customer isn't updated to include that, then the XML doesn't validate. Trying to think of something else similar that exists today, but nothing springs to mind.
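The validation concern can be illustrated with a small sketch. It is not how virt-xml-validate is implemented; it only models a closed enumeration like the <choice> in the proposed 'monitorFeature' definition, and the function name is hypothetical.

```python
# The three names listed in the proposed <choice>; anything else is
# rejected, exactly as a closed enumeration in a schema would do.
KNOWN_FEATURES = {"llc_occupancy", "mbm_total_bytes", "mbm_local_bytes"}


def rejected_by_choice(feature_names):
    """Return the names that a closed list of allowed values rejects."""
    return [name for name in feature_names if name not in KNOWN_FEATURES]


if __name__ == "__main__":
    # A newer kernel reports an extra feature; an older schema file that
    # still lists only three values no longer accepts the XML.
    reported = ["llc_occupancy", "llc_somethingnew"]
    print(rejected_by_choice(reported))
```

The failure only appears when the reported names and the installed schema file get out of step, which is the scenario described above.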
If a new hardware feature name appears, to take your example 'llc_somethingnew', should we show it here before the corresponding functionality is enabled? I think probably not. It makes sense to say libvirt only reports the hardware features it has enabled, not all hardware features. To get the new 'llc_somethingnew' supported here, you need to make changes here and submit a patch.
If all you're supporting or care about is the 3 values, then each should be fetched separately. Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches.
We only care about these 3 values. I will apply a stricter naming rule to them in the source code in the next version of the patch.
"Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches." -- I do not understand this part.
+ </choice> + </attribute> + </define> + <define name='memory_bandwidth'> <element name='memory_bandwidth'> <oneOrMore> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 5280348..7932088 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -942,6 +942,23 @@ virCapabilitiesFormatCaches(virBufferPtr buf, controls->max_allocation); }
+ if (bank->monitor && + bank->monitor->nfeatures) { + virBufferAsprintf(&childrenBuf, + "<monitor threshold='%u' unit='B' "
Why is "unit='B' " - does it really matter and is it technically right?
'unit' is the unit of 'threshold'. 'threshold' and 'unit' reflect the value reported through resctrl/'max_threshold_occupancy'.
If it's only ever going to be 'B', then easy enough to document that way.
I realized that 'unit' shouldn't be hard-coded to 'B'; it could be 'KiB', 'MiB', and so on, and should change with the value. It will be formatted with 'virFormatIntPretty'.
Face-palm on my previous response - KiB not MiB <sigh>, it must be Friday. Oh it is!!! yay! John
+ "maxAllocs='%u'>\n", + bank->monitor->cache_threshold, + bank->monitor->max_allocation); + for (j = 0; j < bank->monitor->nfeatures; j++) { + virBufferAdjustIndent(&childrenBuf, 2); + virBufferAsprintf(&childrenBuf, + "<feature name='%s'/>\n", + bank->monitor->features[j]); + virBufferAdjustIndent(&childrenBuf, -2); + } + virBufferAddLit(&childrenBuf, "</monitor>\n"); + } +
Not clear this data is at the right level, still.
I outlined my considerations for putting cache monitor capability under the data structure of 'cache bank' in my reply of patch 2. It is a bit long, please go to that email for details. Again welcome suggestions.
Thanks for review! Huaqiang
John
if (virBufferCheckError(&childrenBuf) < 0) return -1;

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Saturday, September 8, 2018 1:11 AM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 03/10] conf: Add CMT capability to host
On 09/07/2018 04:37 AM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:59 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 03/10] conf: Add CMT capability to host
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
CMT capability for each cache bank, includes -. Maximum CMT monitoring groups(sharing with MBM) could be created, which reflects the maximum hardware RMID count. -. 'cache threshold'. -. Statistical information of last level cache, the actual cache occupancy.
The cache is split into 'bank's, and each bank MAY have a different cache configuration, so the cache monitoring capability is reported per cache bank.
cache monitor capability is shown as below:
<cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' unit='KiB' type='code' maxAllocs='8'/> <control granularity='768' unit='KiB' type='data' maxAllocs='8'/> + <monitor threshold='540672' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor>
</bank> </cache>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/schemas/capability.rng | 28 ++++++++++++++++++++++++++++ src/conf/capabilities.c | 17 +++++++++++++++++ 2 files changed, 45 insertions(+)
This output would be combined with part of existing patch2.
Will be combined in next version patch.
diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index d61515c..67498f1 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -314,6 +314,24 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name='monitor'> + <attribute name='threshold'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='unit'> + <ref name='unit'/> + </attribute> + <attribute name='maxAllocs'> + <ref name='unsignedInt'/> + </attribute> + <zeroOrMore> + <element name='feature'> + <ref name='monitorFeature'/> + </element> + </zeroOrMore> + </element> + </zeroOrMore> </element> </oneOrMore> </element> @@ -329,6 +347,16 @@ </attribute> </define>
+ <define name='monitorFeature'> + <attribute name='name'> + <choice> + <value>llc_occupancy</value> + <value>mbm_total_bytes</value> + <value>mbm_local_bytes</value>
So these are the only 3 values you'll ever expect? Probably not a good idea to list them like this or the current algorithm is overkill looking for prefixed "llc_" and "mbm_" values.
If "llc_somethingnew" shows up some day, then the schema is invalidated.
Disagree. I don't think the schema will be invalidated when a new "llc_somethingnew" appears. Libvirt only recognizes these three feature names.
Hmm... maybe not clear enough - see, virt-xml-validate.
Using <choice> limits the choices or allowed values to that known list. So if some kernel some day adds llc_somethingnew, but the libvirt rng file for the customer isn't updated to include that, then the XML doesn't validate.
Trying to think of something else similar that exists today, but nothing springs to mind.
Agreed. Let's pass through every feature parsed from 'info/L3_MON/mon_features', now and in the future. The only constraint on a feature name is then that it consists of printable characters. Do you agree?
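The pass-through proposal could look roughly like the sketch below. It assumes one feature name per line in the file content and treats "printable" as Python's str.isprintable(); both the function name and that interpretation are assumptions made for illustration.

```python
def parse_mon_features(text):
    """Return every feature name listed in the file content, in order.

    No name is filtered against a known list; the only check is that a
    name consists of printable characters.
    """
    features = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        if not name.isprintable():
            raise ValueError(f"unprintable feature name: {name!r}")
        features.append(name)
    return features


if __name__ == "__main__":
    content = "llc_occupancy\nmbm_total_bytes\nmbm_local_bytes\nllc_somethingnew\n"
    print(parse_mon_features(content))
```

Under this approach a name such as 'llc_somethingnew' is reported without any change to the parser, so the schema must not restrict the names to a fixed list either.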
If a new hardware feature name appears, to take your example 'llc_somethingnew', should we show it here before the corresponding functionality is enabled? I think probably not. It makes sense to say libvirt only reports the hardware features it has enabled, not all hardware features. To get the new 'llc_somethingnew' supported here, you need to make changes here and submit a patch.
If all you're supporting or care about is the 3 values, then each should be fetched separately. Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches.
We only care about these 3 values. I will apply a stricter naming rule to them in the source code in the next version of the patch.
"Just the names make me wonder if they come with some associated value that would be in some file. Perhaps answered in later patches." -- I do not understand this part.
+ </choice> + </attribute> + </define> + <define name='memory_bandwidth'> <element name='memory_bandwidth'> <oneOrMore> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 5280348..7932088 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -942,6 +942,23 @@ virCapabilitiesFormatCaches(virBufferPtr buf, controls->max_allocation); }
+ if (bank->monitor && + bank->monitor->nfeatures) { + virBufferAsprintf(&childrenBuf, + "<monitor threshold='%u' unit='B' "
Why is "unit='B' " - does it really matter and is it technically right?
'unit' is the unit of 'threshold'. 'threshold' and 'unit' reflect the value reported through resctrl/'max_threshold_occupancy'.
If it's only ever going to be 'B', then easy enough to document that way.
I realized that 'unit' shouldn't be hard-coded to 'B'; it could be 'KiB', 'MiB', and so on, and should change with the value. It will be formatted with 'virFormatIntPretty'.
Face-palm on my previous response - KiB not MiB <sigh>, it must be Friday. Oh it is!!! yay!
'unit' will be removed, so KiB versus MiB no longer matters :) Thanks for the review. Huaqiang
John
+ "maxAllocs='%u'>\n", + bank->monitor->cache_threshold, + bank->monitor->max_allocation); + for (j = 0; j < bank->monitor->nfeatures; j++) { + virBufferAdjustIndent(&childrenBuf, 2); + virBufferAsprintf(&childrenBuf, + "<feature name='%s'/>\n", + bank->monitor->features[j]); + virBufferAdjustIndent(&childrenBuf, -2); + } + virBufferAddLit(&childrenBuf, "</monitor>\n"); + } +
Not clear this data is at the right level, still.
I outlined my considerations for putting cache monitor capability under the data structure of 'cache bank' in my reply of patch 2. It is a bit long, please go to that email for details. Again welcome suggestions.
Thanks for review! Huaqiang
John
if (virBufferCheckError(&childrenBuf) < 0) return -1;

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- .../linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy | 1 + .../vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +++ tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 6 ++++++ 4 files changed, 11 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy new file mode 100644 index 0000000..77f05e2 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy @@ -0,0 +1 @@ +270336 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features new file mode 100644 index 0000000..0c57b8d --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features @@ -0,0 +1,3 @@ +llc_occupancy +mbm_total_bytes +mbm_local_bytes diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids new file mode 100644 index 0000000..1057e9a --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids @@ -0,0 +1 @@ +176 diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml index 9b00cf0..678fdc9 100644 --- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml +++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml @@ -44,9 +44,15 @@ <cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control 
granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> + <monitor threshold='270336' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> + <monitor threshold='270336' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> </cache> <memory_bandwidth> -- 2.7.4
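How the three test files relate to the expected XML can be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, and the "llc_" prefix filter is inferred from the review comments and from the fact that the expected XML lists only 'llc_occupancy' although 'mon_features' contains three names.

```python
import os
import tempfile


def read_l3_mon(info_dir, prefix="llc_"):
    """Read the three L3_MON files and keep features matching `prefix`."""

    def read(name):
        with open(os.path.join(info_dir, name)) as handle:
            return handle.read()

    names = [line.strip() for line in read("mon_features").splitlines()]
    return {
        "threshold": int(read("max_threshold_occupancy")),
        "maxAllocs": int(read("num_rmids")),
        "features": [n for n in names if n.startswith(prefix)],
    }


def write_fixture(info_dir):
    """Recreate the test data added by the patch (values copied from it)."""
    files = {
        "max_threshold_occupancy": "270336\n",
        "mon_features": "llc_occupancy\nmbm_total_bytes\nmbm_local_bytes\n",
        "num_rmids": "176\n",
    }
    for name, content in files.items():
        with open(os.path.join(info_dir, name), "w") as handle:
            handle.write(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_fixture(tmp)
        print(read_l3_mon(tmp))
```

The result maps directly onto the attributes of the <monitor> element in the expected XML: 'threshold' from max_threshold_occupancy, 'maxAllocs' from num_rmids, and one <feature> per retained name.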

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- .../linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy | 1 + .../vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +++ tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 6 ++++++ 4 files changed, 11 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
And this would be combined with part of patch2 and patch3
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy new file mode 100644 index 0000000..77f05e2 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy @@ -0,0 +1 @@ +270336 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features new file mode 100644 index 0000000..0c57b8d --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features @@ -0,0 +1,3 @@ +llc_occupancy +mbm_total_bytes +mbm_local_bytes
Could/should this list values that aren't prefixed by "llc_" and "mbm_" to validate your code? There's only 1 set of data but it's printed twice - that's the reason for my comment in patch2 about duplication of the same data that is unnecessary. What if there were 10 bank id's, 100? 1000? - lots of waste. Only 2, no big deal. John
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids new file mode 100644 index 0000000..1057e9a --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids @@ -0,0 +1 @@ +176 diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml index 9b00cf0..678fdc9 100644 --- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml +++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml @@ -44,9 +44,15 @@ <cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> + <monitor threshold='270336' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> + <monitor threshold='270336' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> </cache> <memory_bandwidth>

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:59 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 04/10] test: add test case for resctrl monitor
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- .../linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy | 1 + .../vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +++ tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 6 ++++++ 4 files changed, 11 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
And this would be combined with part of patch2 and patch3
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy new file mode 100644 index 0000000..77f05e2 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy @@ -0,0 +1 @@ +270336 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features new file mode 100644 index 0000000..0c57b8d --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features @@ -0,0 +1,3 @@ +llc_occupancy +mbm_total_bytes +mbm_local_bytes
Could/should this list values that aren't prefixed by "llc_" and "mbm_" to validate your code?
I will add some 'fake' features to the list and run more tests. This will be done in the next version of the patch.
There's only 1 set of data but it's printed twice - that's the reason for my comment in patch2 about duplication of the same data that is unnecessary. What if there were 10 bank id's, 100? 1000? - lots of waste. Only 2, no big deal.
<cache> <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> <monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor> </bank> <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> <monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor> </bank> </cache>
Above is the cache capabilities section, dumped from my system with the 'virsh capabilities' command. This is a 2-socket E5-2699v4 system (22-core CPUs, with CAT/CMT enabled and CDP disabled). As you said, the cache monitor capability is printed more than once.
I also need to point out that the following cache 'control' element is printed twice as well; it is in the same situation you mentioned for the cache monitor. "<control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>"
After you have read my considerations (in the reply to your review of patch 2) for why I used the current design, if you still think the duplication is unwise, let's find a proper place to keep the data and eliminate it. We can make any changes needed to make the design more reasonable while we are still in the design stage.
Thanks for your kind review!
BR Huaqiang
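The duplication being discussed can be checked mechanically. The sketch below parses a capabilities fragment like the one dumped above and counts how many distinct <monitor> settings appear across banks; the function names are hypothetical and the embedded XML is copied from this message.

```python
import xml.etree.ElementTree as ET

# Cache section as dumped in this message (2-socket E5-2699v4 system).
CAPS_XML = """
<cache>
  <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'>
    <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>
    <monitor threshold='270336' unit='B' maxAllocs='176'>
      <feature name='llc_occupancy'/>
    </monitor>
  </bank>
  <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'>
    <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>
    <monitor threshold='270336' unit='B' maxAllocs='176'>
      <feature name='llc_occupancy'/>
    </monitor>
  </bank>
</cache>
"""


def monitor_signature(bank):
    """Return a hashable summary of a bank's <monitor>, or None."""
    monitor = bank.find("monitor")
    if monitor is None:
        return None
    return (
        monitor.get("threshold"),
        monitor.get("unit"),
        monitor.get("maxAllocs"),
        tuple(f.get("name") for f in monitor.findall("feature")),
    )


def distinct_monitor_settings(xml_text):
    """Return the set of distinct monitor settings across all banks."""
    root = ET.fromstring(xml_text.strip())
    return {monitor_signature(bank) for bank in root.findall("bank")}


if __name__ == "__main__":
    print(distinct_monitor_settings(CAPS_XML))
```

A single entry in the resulting set means every bank carries the same monitor data, which is the redundancy questioned in the review; more than one entry would support keeping the data per bank.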
John
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids new file mode 100644 index 0000000..1057e9a --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids @@ -0,0 +1 @@ +176 diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml index 9b00cf0..678fdc9 100644 --- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml +++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml @@ -44,9 +44,15 @@ <cache> <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'> <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> + <monitor threshold='270336' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'> <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/> + <monitor threshold='270336' unit='B' maxAllocs='176'> + <feature name='llc_occupancy'/> + </monitor> </bank> </cache> <memory_bandwidth>

On 09/07/2018 05:12 AM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:59 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 04/10] test: add test case for resctrl monitor
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- .../linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy | 1 + .../vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +++ tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids | 1 + tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml | 6 ++++++ 4 files changed, 11 insertions(+) create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
And this would be combined with part of patch2 and patch3
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy new file mode 100644 index 0000000..77f05e2 --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy @@ -0,0 +1 @@ +270336 diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features new file mode 100644 index 0000000..0c57b8d --- /dev/null +++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features @@ -0,0 +1,3 @@ +llc_occupancy +mbm_total_bytes +mbm_local_bytes
Could/should this list values that aren't prefixed by "llc_" and "mbm_" to validate your code?
I will add some 'fake' features to the list and run more tests. This will be done in the next version of the patch.
There's only 1 set of data but it's printed twice - that's the reason for my comment in patch2 about duplication of the same data that is unnecessary. What if there were 10 bank id's, 100? 1000? - lots of waste. Only 2, no big deal.
<cache> <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> <monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor> </bank> <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> <monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor> </bank> </cache>
Above is the cache capabilities section, dumped from my system with the 'virsh capabilities' command. This is a 2-socket E5-2699v4 system (22-core CPUs, with CAT/CMT enabled and CDP disabled). As you said, the cache monitor capability is printed more than once.
I also need to point out that the following cache 'control' element is printed twice as well; it is in the same situation you mentioned for the cache monitor. "<control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>"
I think we covered this earlier... I'm still not really clear on what <control> per <bank> really provides other than it's a calculated value based on a few different numbers and honestly I had a hard time following that logic all over the place. John
After you have read my considerations (in the reply to your review of patch 2) for why I used the current design, if you still think the duplication is unwise, let's find a proper place to keep the data and eliminate it. We can make any changes needed to make the design more reasonable while we are still in the design stage.
Thanks for your kind review!
BR Huaqiang
John
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
new file mode 100644
index 0000000..1057e9a
--- /dev/null
+++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
@@ -0,0 +1 @@
+176
diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml
index 9b00cf0..678fdc9 100644
--- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml
+++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml
@@ -44,9 +44,15 @@
     <cache>
       <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
         <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
+        <monitor threshold='270336' unit='B' maxAllocs='176'>
+          <feature name='llc_occupancy'/>
+        </monitor>
       </bank>
       <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
         <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
+        <monitor threshold='270336' unit='B' maxAllocs='176'>
+          <feature name='llc_occupancy'/>
+        </monitor>
       </bank>
     </cache>
     <memory_bandwidth>

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Saturday, September 8, 2018 1:14 AM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 04/10] test: add test case for resctrl monitor
On 09/07/2018 05:12 AM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 7:59 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 04/10] test: add test case for resctrl monitor
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 .../linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy      | 1 +
 .../vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features | 3 +++
 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids  | 1 +
 tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml                   | 6 ++++++
 4 files changed, 11 insertions(+)
 create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
 create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
 create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
And this would be combined with part of patch2 and patch3
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
new file mode 100644
index 0000000..77f05e2
--- /dev/null
+++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
@@ -0,0 +1 @@
+270336
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
new file mode 100644
index 0000000..0c57b8d
--- /dev/null
+++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
@@ -0,0 +1,3 @@
+llc_occupancy
+mbm_total_bytes
+mbm_local_bytes
Could/should this list values that aren't prefixed by "llc_" and "mbm_" to validate your code?
Will add some 'fake' features to the list, and make more tests. To be done in next version patch.
There's only 1 set of data but it's printed twice - that's the reason for my comment in patch2 about duplication of the same data that is unnecessary.
What if there were 10 bank id's, 100? 1000? - lots of waste. Only 2, no big deal.
<cache> <bank id='0' level='3' type='both' size='55' unit='MiB' cpus='0-21,44-65'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> <monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor> </bank> <bank id='1' level='3' type='both' size='55' unit='MiB' cpus='22-43,66-87'> <control granularity='2816' unit='KiB' type='both' maxAllocs='16'/> <monitor threshold='270336' unit='B' maxAllocs='176'> <feature name='llc_occupancy'/> </monitor> </bank> </cache>
Above is the cache capabilities section, dumped from my system with the 'virsh capabilities' command. This is a 2-socket E5-2699v4 CPU (22 cores, with CAT/CMT enabled and CDP disabled) system. As you said, the cache monitor capability is printed more than once.
I also need to point out that the following cache 'control' element is printed twice as well; it is in the same situation you described for the cache monitor.
"<control granularity='2816' unit='KiB' type='both' maxAllocs='16'/>"
I think we covered this earlier... I'm still not really clear on what <control> per <bank> really provides other than it's a calculated value based on a few different numbers and honestly I had a hard time following that logic all over the place.
Understood. The cache <control> information might be altered by the cache size or something else. I have proposed a new cache monitor layout in the update in my previous email. I hope it looks better. Thanks for the review. Huaqiang
John
After you have read my considerations (in the reply to your review of patch 2) for why I used the current design, if you still think the duplication is unwise, let's find a proper place to keep the data and eliminate this kind of duplication. We can make any changes needed to make the design more reasonable at the design stage.
Thanks for your kind review!
BR Huaqiang
John
diff --git a/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
new file mode 100644
index 0000000..1057e9a
--- /dev/null
+++ b/tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids
@@ -0,0 +1 @@
+176
diff --git a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml
index 9b00cf0..678fdc9 100644
--- a/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml
+++ b/tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml
@@ -44,9 +44,15 @@
     <cache>
       <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
         <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
+        <monitor threshold='270336' unit='B' maxAllocs='176'>
+          <feature name='llc_occupancy'/>
+        </monitor>
       </bank>
       <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
         <control granularity='768' min='1536' unit='KiB' type='both' maxAllocs='4'/>
+        <monitor threshold='270336' unit='B' maxAllocs='176'>
+          <feature name='llc_occupancy'/>
+        </monitor>
       </bank>
     </cache>
     <memory_bandwidth>

Some code in virresctrl.c that manipulates the file objects of resctrlfs could be reused for the cache monitor interfaces. This patch refactors these functions so that the code can be reused in later patches:

    virResctrlAllocDeterminePath
    virResctrlAllocCreate
    virResctrlAddPID

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/util/virresctrl.c | 126 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 93 insertions(+), 33 deletions(-)

diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c
index 2f6923a..b3bae6e 100644
--- a/src/util/virresctrl.c
+++ b/src/util/virresctrl.c
@@ -2082,25 +2082,94 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl,
 }


-int
-virResctrlAllocDeterminePath(virResctrlAllocPtr alloc,
-                             const char *machinename)
+static int
+virResctrlDeterminePath(const char *id,
+                        const char *root,
+                        const char *parentpath,
+                        const char *prefix,
+                        char **path)
 {
-    if (!alloc->id) {
+    if (!id) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("Resctrl Allocation ID must be set before creation"));
+                       _("Resctrl resource ID must be set before creation"));
         return -1;
     }

-    if (!alloc->path &&
-        virAsprintf(&alloc->path, "%s/%s-%s",
-                    SYSFS_RESCTRL_PATH, machinename, alloc->id) < 0)
+    if (*path) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Resctrl group (%s) already created, path=%s."),
+                      id, *path);
         return -1;
+    }
+
+    if (!parentpath && !root) {
+        if (virAsprintf(path, "%s/%s-%s",
+                        SYSFS_RESCTRL_PATH, prefix, id) < 0)
+            return -1;
+    } else if (!parentpath) {
+        if (virAsprintf(path, "%s/%s/%s-%s",
+                        SYSFS_RESCTRL_PATH, parentpath, prefix, id) < 0)
+            return -1;
+    } else {
+        if (virAsprintf(path, "%s/%s/%s-%s",
+                        root, parentpath, prefix, id) < 0)
+            return -1;
+    }

     return 0;
 }


+int
+virResctrlAllocDeterminePath(virResctrlAllocPtr alloc,
+                             const char *machinename)
+{
+    return virResctrlDeterminePath(alloc->id, NULL, NULL,
+                                   machinename, &alloc->path);
+}
+
+static int
+virResctrlCreateGroup(virResctrlInfoPtr resctrl,
+                      char *path)
+{
+    int ret = -1;
+    int lockfd = -1;
+
+    if (!path)
+        return -1;
+
+    if (virResctrlInfoIsEmpty(resctrl)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("Resource control is not supported on this host"));
+        return -1;
+    }
+
+    if (STREQ(path, SYSFS_RESCTRL_PATH))
+        return 0;
+
+    if (virFileExists(path)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Path '%s' for resctrl resource group exists"), path);
+        goto cleanup;
+    }
+
+    lockfd = virResctrlLockWrite();
+    if (lockfd < 0)
+        goto cleanup;
+
+    if (virFileMakePath(path) < 0) {
+        virReportSystemError(errno,
+                             _("Cannot create resctrl directory '%s'"), path);
+        goto cleanup;
+    }
+
+    ret = 0;
+ cleanup:
+    virResctrlUnlock(lockfd);
+    return ret;
+}
+
+
 /* This checks if the directory for the alloc exists. If not it tries to create
  * it and apply appropriate alloc settings. */
 int
@@ -2116,21 +2185,11 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl,
     if (!alloc)
         return 0;

-    if (virResctrlInfoIsEmpty(resctrl)) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("Resource control is not supported on this host"));
-        return -1;
-    }
-
     if (virResctrlAllocDeterminePath(alloc, machinename) < 0)
         return -1;

-    if (virFileExists(alloc->path)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Path '%s' for resctrl allocation exists"),
-                       alloc->path);
-        goto cleanup;
-    }
+    if (virResctrlCreateGroup(resctrl, alloc->path) < 0)
+        return -1;

     lockfd = virResctrlLockWrite();
     if (lockfd < 0)
@@ -2146,13 +2205,6 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl,
     if (virAsprintf(&schemata_path, "%s/schemata", alloc->path) < 0)
         goto cleanup;

-    if (virFileMakePath(alloc->path) < 0) {
-        virReportSystemError(errno,
-                             _("Cannot create resctrl directory '%s'"),
-                             alloc->path);
-        goto cleanup;
-    }
-
     VIR_DEBUG("Writing resctrl schemata '%s' into '%s'", alloc_str, schemata_path);
     if (virFileWriteStr(schemata_path, alloc_str, 0) < 0) {
         rmdir(alloc->path);
@@ -2171,21 +2223,21 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl,
 }


-int
-virResctrlAllocAddPID(virResctrlAllocPtr alloc,
-                      pid_t pid)
+static int
+virResctrlAddPID(char *path,
+                 pid_t pid)
 {
     char *tasks = NULL;
     char *pidstr = NULL;
     int ret = 0;

-    if (!alloc->path) {
+    if (!path) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("Cannot add pid to non-existing resctrl allocation"));
+                       _("Cannot add pid to non-existing resctrl group"));
         return -1;
     }

-    if (virAsprintf(&tasks, "%s/tasks", alloc->path) < 0)
+    if (virAsprintf(&tasks, "%s/tasks", path) < 0)
         return -1;

     if (virAsprintf(&pidstr, "%lld", (long long int) pid) < 0)
@@ -2207,6 +2259,14 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc,


 int
+virResctrlAllocAddPID(virResctrlAllocPtr alloc,
+                      pid_t pid)
+{
+    return virResctrlAddPID(alloc->path, pid);
+}
+
+
+int
 virResctrlAllocRemove(virResctrlAllocPtr alloc)
 {
     int ret = 0;
--
2.7.4

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Some code in virresctrl.c that manipulates the file objects of resctrlfs could be reused for the cache monitor interfaces. This patch refactors these functions so that the code can be reused in later patches:
virResctrlAllocDeterminePath virResctrlAllocCreate virResctrlAddPID
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/util/virresctrl.c | 126 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 93 insertions(+), 33 deletions(-)
Yikes, 3 or more patches in one.
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 2f6923a..b3bae6e 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -2082,25 +2082,94 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl, }
-int -virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, - const char *machinename) +static int +virResctrlDeterminePath(const char *id, + const char *root,
Let's use @rootpath instead of @root
+ const char *parentpath, + const char *prefix, + char **path)
Take it slowly - round 1, convert virResctrlAllocDeterminePath into using virResctrlDeterminePath, but don't add the @parentpath yet since it's not "introduced" until later. I'd prefer to see the same argument order as being printed too...
{ - if (!alloc->id) { + if (!id) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Resctrl Allocation ID must be set before creation")); + _("Resctrl resource ID must be set before creation")); return -1; }
- if (!alloc->path && - virAsprintf(&alloc->path, "%s/%s-%s", - SYSFS_RESCTRL_PATH, machinename, alloc->id) < 0) + if (*path) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Resctrl group (%s) already created, path=%s."), + id, *path);
The indent is off here w/ @id - need another space. Still is this a programming error or something else? Tough to tell since you're adding multiple things at one time.
return -1; + } + + if (!parentpath && !root) { + if (virAsprintf(path, "%s/%s-%s", + SYSFS_RESCTRL_PATH, prefix, id) < 0) + return -1;
and this is just the initial case...
+ } else if (!parentpath) { + if (virAsprintf(path, "%s/%s/%s-%s", + SYSFS_RESCTRL_PATH, parentpath, prefix, id) < 0) + return -1; + } else { + if (virAsprintf(path, "%s/%s/%s-%s", + root, parentpath, prefix, id) < 0) + return -1; + }
These are additional cases added later on, but used in this patch, so they need to "wait" to be added until we see "where" they come from.
return 0;
Seems to me rather than passing &alloc->path, this function could return @path and the caller then be able to "handle" that. For the "first pass" before @root and @parentpath are added, using: ignore_value(virAsprintf(&path, "%s/%s-%s", rootpath, prefix, id)); return path;
}
+int +virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, + const char *machinename) +{ + return virResctrlDeterminePath(alloc->id, NULL, NULL, + machinename, &alloc->path);
Thus this becomes: if (!(alloc->path = virResctrlDeterminePath(SYSFS_RESCTRL_PATH, machinename, alloc->id))) return -1; return 0;
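A minimal sketch of the "return the path" shape suggested above, written against plain libc so it stands alone. The function name determinePath is hypothetical, snprintf/malloc stand in for libvirt's virAsprintf, and the value of RESCTRL_ROOT is an assumption standing in for SYSFS_RESCTRL_PATH:

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed stand-in for libvirt's SYSFS_RESCTRL_PATH. */
#define RESCTRL_ROOT "/sys/fs/resctrl"

/* Hypothetical helper: build "<rootpath>/<prefix>-<id>" and hand the newly
 * allocated string back to the caller, who owns it and must free() it.
 * Returns NULL on missing arguments or allocation failure. */
static char *
determinePath(const char *rootpath, const char *prefix, const char *id)
{
    int len;
    char *path;

    if (!rootpath || !prefix || !id)
        return NULL;

    /* First pass only measures the formatted length. */
    len = snprintf(NULL, 0, "%s/%s-%s", rootpath, prefix, id);
    if (len < 0)
        return NULL;

    path = malloc((size_t)len + 1);
    if (!path)
        return NULL;

    snprintf(path, (size_t)len + 1, "%s/%s-%s", rootpath, prefix, id);
    return path;
}
```

The caller then decides what a NULL result means, which keeps the "already has a path" policy out of the helper.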
+} +
should be two blank lines between and this could use a comment describing what it's doing and what its assumptions are.
+static int +virResctrlCreateGroup(virResctrlInfoPtr resctrl, + char *path)
s/char/const char/ should be: virResctrlCreateGroupPath
+{ + int ret = -1; + int lockfd = -1; + + if (!path) + return -1;
This would cause some sort of unknown error, but it's a caller bug isn't it? That is if @path is empty before calling in here, then we've missed some other condition, so in this instance it doesn't quite make sense.
+ + if (virResctrlInfoIsEmpty(resctrl)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Resource control is not supported on this host")); + return -1; + }
Not quite sure what this has to do with creating the GroupPath. Feels like something that should be in the caller, but I guess that depends on future usage....

I see this helper is called in the next patch by virResctrlAllocCreateMonitor which isn't used until patch9 and only called once/if virResctrlAllocCreate is successful. So it doesn't seem that calling it once for each time virResctrlAllocCreateMonitor is called is really necessary since @resctrl doesn't change.

In fact, going back to qemuProcessResctrlCreate it would seem that calling virResctrlAllocCreate once for each vm->def->nresctrls would also be somewhat inefficient since caps->host.resctrl (a/k/a @resctrl) doesn't change. But moving it back there may mean needing to check if vm->def->resctrls[i]->alloc is NULL...

I think perhaps some more thought needs to be placed on "efficient" code paths before adding the monitor code paths.
+ + if (STREQ(path, SYSFS_RESCTRL_PATH)) + return 0;
This concept doesn't appear until the next patch, so we cannot introduce it yet.
+ + if (virFileExists(path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Path '%s' for resctrl resource group exists"), path); + goto cleanup; + } + + lockfd = virResctrlLockWrite(); + if (lockfd < 0) + goto cleanup;
This Lock/Unlock sequence should be in the caller... and the fact that the lock should be taken documented as "expected" in the caller.
+ + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Cannot create resctrl directory '%s'"), path); + goto cleanup; + } + + ret = 0; + cleanup: + virResctrlUnlock(lockfd); + return ret;
In the short term, @ret probably isn't needed - return 0 or -1 directly.
+} + + /* This checks if the directory for the alloc exists. If not it tries to create * it and apply appropriate alloc settings. */ int @@ -2116,21 +2185,11 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (!alloc) return 0;
- if (virResctrlInfoIsEmpty(resctrl)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("Resource control is not supported on this host")); - return -1; - } - if (virResctrlAllocDeterminePath(alloc, machinename) < 0) return -1;
If we return from this and alloc->path == NULL, there's a coding error, so I see no reason in virResctrlCreateGroupPath that we'd need to validate that (at least yet). It's a static helper and should be called only when your expected conditions are right.
- if (virFileExists(alloc->path)) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Path '%s' for resctrl allocation exists"), - alloc->path); - goto cleanup; - } + if (virResctrlCreateGroup(resctrl, alloc->path) < 0) + return -1;
lockfd = virResctrlLockWrite(); if (lockfd < 0)
The call to virResctrlCreateGroupPath should come after this rather than Lock/Unlock when creating the directory and then Lock/Unlock again when writing to the file. I think it's all one autonomous operation.
@@ -2146,13 +2205,6 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (virAsprintf(&schemata_path, "%s/schemata", alloc->path) < 0) goto cleanup;
- if (virFileMakePath(alloc->path) < 0) { - virReportSystemError(errno, - _("Cannot create resctrl directory '%s'"), - alloc->path); - goto cleanup; - } - VIR_DEBUG("Writing resctrl schemata '%s' into '%s'", alloc_str, schemata_path); if (virFileWriteStr(schemata_path, alloc_str, 0) < 0) { rmdir(alloc->path); @@ -2171,21 +2223,21 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, }
The next hunk is fine as long as it is a single patch. John
-int -virResctrlAllocAddPID(virResctrlAllocPtr alloc, - pid_t pid) +static int +virResctrlAddPID(char *path, + pid_t pid) { char *tasks = NULL; char *pidstr = NULL; int ret = 0;
- if (!alloc->path) { + if (!path) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Cannot add pid to non-existing resctrl allocation")); + _("Cannot add pid to non-existing resctrl group")); return -1; }
- if (virAsprintf(&tasks, "%s/tasks", alloc->path) < 0) + if (virAsprintf(&tasks, "%s/tasks", path) < 0) return -1;
if (virAsprintf(&pidstr, "%lld", (long long int) pid) < 0) @@ -2207,6 +2259,14 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc,
int +virResctrlAllocAddPID(virResctrlAllocPtr alloc, + pid_t pid) +{ + return virResctrlAddPID(alloc->path, pid); +} + + +int virResctrlAllocRemove(virResctrlAllocPtr alloc) { int ret = 0;

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 10:49 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 05/10] util: resctrl: refactoring some functions
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Some code in virresctrl.c that manipulates the file objects of resctrlfs could be reused for the cache monitor interfaces. This patch refactors these functions so that the code can be reused in later patches:
virResctrlAllocDeterminePath virResctrlAllocCreate virResctrlAddPID
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/util/virresctrl.c | 126 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 93 insertions(+), 33 deletions(-)
Yikes, 3 or more patches in one.
Will be split according to your suggestions.
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 2f6923a..b3bae6e 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -2082,25 +2082,94 @@ virResctrlAllocAssign(virResctrlInfoPtr resctrl, }
-int -virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, - const char *machinename) +static int +virResctrlDeterminePath(const char *id, + const char *root,
Let's use @rootpath instead of @root
Will be fixed.
+ const char *parentpath, + const char *prefix, + char **path)
Take it slowly - round 1, convert virResctrlAllocDeterminePath into using virResctrlDeterminePath, but don't add the @parentpath yet since it's not "introduced" until later.
OK. Split into two steps/patches.
I'd prefer to see the same argument order as being printed too...
OK. Will pay attention to the order.
{ - if (!alloc->id) { + if (!id) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Resctrl Allocation ID must be set before creation")); + _("Resctrl resource ID must be set before creation")); return -1; }
- if (!alloc->path && - virAsprintf(&alloc->path, "%s/%s-%s", - SYSFS_RESCTRL_PATH, machinename, alloc->id) < 0) + if (*path) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Resctrl group (%s) already created, path=%s."), + id, *path);
The indent is off here w/ @id - need another space. Still is this a programming error or something else? Tough to tell since you're adding multiple things at one time.
Yes. Will add a space before @id. I will also follow your suggestion below: this function will return @path as its return value, so there will be no @path parameter and no error message.
return -1; + } + + if (!parentpath && !root) { + if (virAsprintf(path, "%s/%s-%s", + SYSFS_RESCTRL_PATH, prefix, id) < 0) + return -1;
and this is just the initial case...
+ } else if (!parentpath) { + if (virAsprintf(path, "%s/%s/%s-%s", + SYSFS_RESCTRL_PATH, parentpath, prefix, id) < 0) + return -1; + } else { + if (virAsprintf(path, "%s/%s/%s-%s", + root, parentpath, prefix, id) < 0) + return -1; + }
These are additional cases added later on, but used in this patch, so they need to "wait" to be added until we see "where" they come from.
Will be fixed.
return 0;
Seems to me rather than passing &alloc->path, this function could return @path and the caller then be able to "handle" that.
OK. I will follow the suggestion and return @path as the return value.
For the "first pass" before @root and @parentpath are added, using:
ignore_value(virAsprintf(&path, "%s/%s-%s", rootpath, prefix, id));
return path;
OK. I will take your suggestion and make the change.
}
+int +virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, + const char *machinename) +{ + return virResctrlDeterminePath(alloc->id, NULL, NULL, + machinename, &alloc->path);
Thus this becomes:
if (!(alloc->path = virResctrlDeterminePath(SYSFS_RESCTRL_PATH, machinename, alloc->id))) return -1;
return 0;
Understood. Will follow this.
+} +
should be two blank lines between and this could use a comment describing what it's doing and what its assumptions are.
Two blank lines here for coding style consistency, OK. I will also add the following comment to describe the functionality:
/*
 * This helper creates the resctrl group by making the real directory in
 * resctrl file system. @path is the directory path.
 */
+static int +virResctrlCreateGroup(virResctrlInfoPtr resctrl, + char *path)
s/char/const char/
Will be fixed.
should be:
virResctrlCreateGroupPath
I prefer the original name 'virResctrlCreateGroup' to 'virResctrlCreateGroupPath'. The main role of this function is to make a directory, and that directory is called a 'resource group' in the kernel documentation; see https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt. If you still think 'virResctrlCreateGroupPath' is better, OK, let's change it.
+{ + int ret = -1; + int lockfd = -1; + + if (!path) + return -1;
This would cause some sort of unknown error, but it's a caller bug isn't it? That is if @path is empty before calling in here, then we've missed some other condition, so in this instance it doesn't quite make sense.
OK. I need to pay more attention to the places that could cause an 'unknown error'.
+ + if (virResctrlInfoIsEmpty(resctrl)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Resource control is not supported on this host")); + return -1; + }
Not quite sure what this has to do with creating the GroupPath.
Does 'this' mean the call to 'virResctrlInfoIsEmpty'? Checking that virResctrlInfoIsEmpty does not return true ensures that the 'resctrl fs' is supported here.
Feels like something that should be in the caller, but I guess that depends on future usage.... I see this helper is called in the next patch by virResctrlAllocCreateMonitor which isn't used until patch9 and only called once/if virResctrlAllocCreate is successful.
Awesome, your feeling is right. My design is that virResctrlAllocCreate creates a resource 'allocation', and virResctrlAllocCreateMonitor creates a resource 'monitor'. A 'monitor' belongs to a specific 'allocation': to create a 'monitor', the 'allocation' must already have been created, and the 'monitor' is then linked to that 'allocation'. So by the time virResctrlAllocCreateMonitor is called, virResctrlAllocCreate must already have succeeded, which means 'virResctrlInfoIsEmpty' is checked more than once. I will move the call of virResctrlInfoIsEmpty into virResctrlAllocCreate.
So it doesn't seem that calling it once for each time virResctrlAllocCreateMonitor is called is really necessary since @resctrl doesn't change.
In fact, going back to qemuProcessResctrlCreate it would seem that calling virResctrlAllocCreate once for each vm->def->nresctrls would also be somewhat inefficient since caps->host.resctrl (a/k/a @resctrl) doesn't change. But moving it back there may mean needing to check if vm->def->resctrls[i]->alloc is NULL...
I think perhaps some more thought needs to be placed on "efficient" code paths before adding the monitor code paths.
Confused. Are you still talking about the multiple calls to virResctrlInfoIsEmpty?
+ + if (STREQ(path, SYSFS_RESCTRL_PATH)) + return 0;
This concept doesn't appear until the next patch, so we cannot introduce it yet.
OK. Will split the patch and make change accordingly.
+ + if (virFileExists(path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Path '%s' for resctrl resource group exists"), path); + goto cleanup; + } + + lockfd = virResctrlLockWrite(); + if (lockfd < 0) + goto cleanup;
This Lock/Unlock sequence should be in the caller... and the fact that the lock should be taken documented as "expected" in the caller.
Will remove the lock here and take it in the caller.
+ + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Cannot create resctrl directory '%s'"), path); + goto cleanup; + } + + ret = 0; + cleanup: + virResctrlUnlock(lockfd); + return ret;
In the short term, @ret probably isn't needed - return 0 or -1 directly.
Will be fixed.
+} + + /* This checks if the directory for the alloc exists. If not it tries to create * it and apply appropriate alloc settings. */ int @@ -2116,21 +2185,11 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (!alloc) return 0;
- if (virResctrlInfoIsEmpty(resctrl)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("Resource control is not supported on this host")); - return -1; - } - if (virResctrlAllocDeterminePath(alloc, machinename) < 0) return -1;
If we return from this and alloc->path == NULL, there's a coding error, so I see no reason in virResctrlCreateGroupPath that we'd need to validate that (at least yet). It's a static helper and should be called only when your expected conditions are right.
Got it. Will pay attention to places that could generate 'unknown error's.
- if (virFileExists(alloc->path)) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Path '%s' for resctrl allocation exists"), - alloc->path); - goto cleanup; - } + if (virResctrlCreateGroup(resctrl, alloc->path) < 0) + return -1;
lockfd = virResctrlLockWrite(); if (lockfd < 0)
The call to virResctrlCreateGroupPath should come after this rather than Lock/Unlock when creating the directory and then Lock/Unlock again when writing to the file. I think it's all one autonomous operation.
OK.
@@ -2146,13 +2205,6 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (virAsprintf(&schemata_path, "%s/schemata", alloc->path) < 0) goto cleanup;
- if (virFileMakePath(alloc->path) < 0) { - virReportSystemError(errno, - _("Cannot create resctrl directory '%s'"), - alloc->path); - goto cleanup; - } - VIR_DEBUG("Writing resctrl schemata '%s' into '%s'", alloc_str, schemata_path); if (virFileWriteStr(schemata_path, alloc_str, 0) < 0) { rmdir(alloc->path); @@ -2171,21 +2223,21 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, }
The next hunk is fine as long as it is a single patch.
John
Thanks for review. Huaqiang
-int -virResctrlAllocAddPID(virResctrlAllocPtr alloc, - pid_t pid) +static int +virResctrlAddPID(char *path, + pid_t pid) { char *tasks = NULL; char *pidstr = NULL; int ret = 0;
- if (!alloc->path) { + if (!path) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Cannot add pid to non-existing resctrl allocation")); + _("Cannot add pid to non-existing resctrl group")); return -1; }
- if (virAsprintf(&tasks, "%s/tasks", alloc->path) < 0) + if (virAsprintf(&tasks, "%s/tasks", path) < 0) return -1;
if (virAsprintf(&pidstr, "%lld", (long long int) pid) < 0) @@ -2207,6 +2259,14 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc,
int +virResctrlAllocAddPID(virResctrlAllocPtr alloc, + pid_t pid) +{ + return virResctrlAddPID(alloc->path, pid); +} + + +int virResctrlAllocRemove(virResctrlAllocPtr alloc) { int ret = 0;

On 09/07/2018 06:52 AM, Wang, Huaqiang wrote:
[...]
+static int +virResctrlCreateGroup(virResctrlInfoPtr resctrl, + char *path)
s/char/const char/
Will be fixed.
should be:
virResctrlCreateGroupPath
I prefer the original name 'virResctrlCreateGroup' to 'virResctrlCreateGroupPath'. The main role of this function is to make a directory, and that directory is called a 'resource group' in the kernel documentation; see https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt. If you still think 'virResctrlCreateGroupPath' is better, OK, let's change it.
I don't really care... I also don't follow the kernel naming rules.
+{ + int ret = -1; + int lockfd = -1; + + if (!path) + return -1;
This would cause some sort of unknown error, but it's a caller bug isn't it? That is if @path is empty before calling in here, then we've missed some other condition, so in this instance it doesn't quite make sense.
OK. I need to pay more attention to the places that could cause an 'unknown error'.
+ + if (virResctrlInfoIsEmpty(resctrl)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Resource control is not supported on this host")); + return -1; + }
Not quite sure what this has to do with creating the GroupPath.
Does 'this' mean the call to 'virResctrlInfoIsEmpty'? Checking that virResctrlInfoIsEmpty does not return true ensures that the 'resctrl fs' is supported here.
Yes and I know why it was called, but the rest of the text explained why I thought with overkill thrown in for good measure.
Feels like something that should be in the caller, but I guess that depends on future usage.... I see this helper is called in the next patch by virResctrlAllocCreateMonitor which isn't used until patch9 and only called once/if virResctrlAllocCreate is successful.
Awesome, your feeling is right. My design is that virResctrlAllocCreate creates a resource 'allocation', and virResctrlAllocCreateMonitor creates a resource 'monitor'. A 'monitor' belongs to a specific 'allocation': to create a 'monitor', the 'allocation' must already have been created, and the 'monitor' is then linked to that 'allocation'. So by the time virResctrlAllocCreateMonitor is called, virResctrlAllocCreate must already have succeeded, which means 'virResctrlInfoIsEmpty' is checked more than once. I will move the call of virResctrlInfoIsEmpty into virResctrlAllocCreate.
So it doesn't seem that calling it once for each time virResctrlAllocCreateMonitor is called is really necessary since @resctrl doesn't change.
In fact, going back to qemuProcessResctrlCreate it would seem that calling virResctrlAllocCreate once for each vm->def->nresctrls would also be somewhat inefficient since caps->host.resctrl (a/k/a @resctrl) doesn't change. But moving it back there may mean needing to check if vm->def->resctrls[i]->alloc is NULL...
I think perhaps some more thought needs to be placed on "efficient" code paths before adding the monitor code paths.
Confused. Are you still talking about the multiple calls to virResctrlInfoIsEmpty?
In the long run it's perhaps a "cheap call", but using a "static int" means you can control who calls it and whether they have checked virResctrlInfoIsEmpty prior to calling thus this can assume it has. Having multiple functions calling IsEmpty is overkill. John [...]
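The point about a "static int" helper can be illustrated with a minimal sketch, assuming simplified stand-in types. Every name below (`demo_info`, `demo_create_all`, and so on) is hypothetical and is not libvirt API: the exported entry point performs the emptiness check once, and the file-local helper simply assumes the check has been done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for virResctrlInfo; not the libvirt type. */
struct demo_info {
    size_t nlevels;   /* cache levels that carry resctrl data */
    bool has_membw;   /* memory bandwidth information present */
};

/* Counts how often the emptiness check runs (for illustration only). */
static int demo_empty_checks;

/* Stand-in for virResctrlInfoIsEmpty(). */
static bool
demo_info_is_empty(const struct demo_info *info)
{
    demo_empty_checks++;
    return !info || (info->nlevels == 0 && !info->has_membw);
}

/* File-local helper: because it is static, every caller lives in this
 * file and is known to have checked the info already. A NULL path is a
 * caller bug, so it is asserted instead of reported as an error. */
static int
demo_create_group_path(const char *path)
{
    assert(path != NULL);
    /* a real helper would lock resctrl and mkdir() the path here */
    return 0;
}

/* Exported entry point: checks support once, then creates all groups. */
int
demo_create_all(const struct demo_info *info,
                const char *const *paths,
                size_t npaths)
{
    size_t i;

    if (npaths == 0)
        return 0;

    if (demo_info_is_empty(info))
        return -1;

    for (i = 0; i < npaths; i++) {
        if (demo_create_group_path(paths[i]) < 0)
            return -1;
    }
    return 0;
}
```

With three paths, `demo_create_all()` runs the emptiness check a single time, regardless of how many groups it creates.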

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Saturday, September 8, 2018 1:41 AM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 05/10] util: resctrl: refactoring some functions
On 09/07/2018 06:52 AM, Wang, Huaqiang wrote:
[...]
+static int +virResctrlCreateGroup(virResctrlInfoPtr resctrl, + char *path)
s/char/const char/
Will be fixed.
should be:
virResctrlCreateGroupPath
I prefer the original name 'virResctrlCreateGroup' to 'virResctrlCreateGroupPath'. The main role of this function is to make a directory, and that directory is called a 'resource group' in the kernel's documentation. See https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt If you still think 'virResctrlCreateGroupPath' is better, OK, let's change it.
I don't really care... I also don't follow the kernel naming rules.
Let's use the name virResctrlCreateGroupPath.
+{ + int ret = -1; + int lockfd = -1; + + if (!path) + return -1;
This would cause some sort of unknown error, but it's a caller bug isn't it? That is if @path is empty before calling in here, then we've missed some other condition, so in this instance it doesn't quite make sense.
OK. I need to pay more attention to these places that could cause an 'unknown error'.
+ + if (virResctrlInfoIsEmpty(resctrl)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Resource control is not supported on this host")); + return -1; + }
Not quite sure what this has to do with creating the GroupPath.
Does 'this' mean the call to 'virResctrlInfoIsEmpty'? Checking that virResctrlInfoIsEmpty does not return true ensures that the 'resctrl fs' is supported here.
Yes, and I know why it was called; the rest of my text explains why I thought it was overkill.
Feels like something that should be in the caller, but I guess that depends on future usage.... I see this helper is called in the next patch by virResctrlAllocCreateMonitor which isn't used until patch9 and only called once/if virResctrlAllocCreate is successful.
Awesome, your feeling is right. My design is that virResctrlAllocCreate creates a resource 'allocation', and virResctrlAllocCreateMonitor creates a resource 'monitor'. A 'monitor' belongs to a specific 'allocation': to create a 'monitor', an 'allocation' must already exist, and the 'monitor' is linked to that 'allocation'. So by the time virResctrlAllocCreateMonitor is called, virResctrlAllocCreate must already have succeeded, and 'virResctrlInfoIsEmpty' ends up being checked more than once. I will move the call to virResctrlInfoIsEmpty into virResctrlAllocCreate.
So it doesn't seem that calling it once for each time virResctrlAllocCreateMonitor is called is really necessary since @resctrl doesn't change.
In fact, going back to qemuProcessResctrlCreate it would seem that calling virResctrlAllocCreate once for each vm->def->nresctrls would also be somewhat inefficient since caps->host.resctrl (a/k/a @resctrl) doesn't change. But moving it back there may mean needing to check if vm->def->resctrls[i]->alloc is NULL...
I think perhaps some more thought needs to be placed on "efficient" code paths before adding the monitor code paths.
Confused. Are you still talking about the multiple calls to virResctrlInfoIsEmpty?
In the long run it's perhaps a "cheap call", but using a "static int" means you can control who calls it and whether they have checked virResctrlInfoIsEmpty prior to calling thus this can assume it has. Having multiple functions calling IsEmpty is overkill.
Thanks for the clarification. I'll check the code and make sure virResctrlInfoIsEmpty is not called unnecessarily in the next version of the patches. Thanks for the review. Huaqiang
John
[...]
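The ownership rule described in this exchange (a 'monitor' always belongs to an 'allocation', and the allocation must exist first) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch's code: the `demo_*` types, the `created` flag, and the choice to treat a repeated id as a no-op are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the patch's structures. */
struct demo_monitor {
    char *id;
};

struct demo_alloc {
    bool created;                    /* set once the allocation exists */
    struct demo_monitor **monitors;  /* monitors owned by this allocation */
    size_t nmonitors;
};

/* Look a monitor up by id; returns NULL when it is not present. */
struct demo_monitor *
demo_alloc_find_monitor(const struct demo_alloc *alloc, const char *id)
{
    size_t i;

    if (!alloc || !id)
        return NULL;

    for (i = 0; i < alloc->nmonitors; i++) {
        if (strcmp(alloc->monitors[i]->id, id) == 0)
            return alloc->monitors[i];
    }
    return NULL;
}

/* Link a new monitor to an allocation. Fails unless the allocation has
 * been created; asking for an id that already exists changes nothing. */
int
demo_alloc_create_monitor(struct demo_alloc *alloc, const char *id)
{
    struct demo_monitor *mon;
    struct demo_monitor **tmp;
    size_t idlen;

    if (!alloc || !id || !alloc->created)
        return -1;

    if (demo_alloc_find_monitor(alloc, id))
        return 0;

    mon = malloc(sizeof(*mon));
    if (!mon)
        return -1;

    idlen = strlen(id) + 1;
    mon->id = malloc(idlen);
    if (!mon->id) {
        free(mon);
        return -1;
    }
    memcpy(mon->id, id, idlen);

    tmp = realloc(alloc->monitors, (alloc->nmonitors + 1) * sizeof(*tmp));
    if (!tmp) {
        free(mon->id);
        free(mon);
        return -1;
    }
    alloc->monitors = tmp;
    alloc->monitors[alloc->nmonitors++] = mon;
    return 0;
}

/* Release every monitor owned by the allocation. */
void
demo_alloc_clear(struct demo_alloc *alloc)
{
    size_t i;

    assert(alloc != NULL);
    for (i = 0; i < alloc->nmonitors; i++) {
        free(alloc->monitors[i]->id);
        free(alloc->monitors[i]);
    }
    free(alloc->monitors);
    alloc->monitors = NULL;
    alloc->nmonitors = 0;
}
```

Each failure path frees what it allocated, so an unsuccessful call leaves the allocation's monitor list unchanged.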

'virResctrlAllocMon' denotes a resctrl monitor reporting the resource consumption information. This patch introduced the interfaces for resctrl monitor. Relationship of 'resctrl allocation' and 'resctrl monitor': 1. resctrl monitor monitors resources (cache or memory bandwidth) of particular allocation. 2. resctrl allocation may refer to the 'default' allocation if no dedicated resource 'control' applied to it. The 'default' allocation enjoys remaining resource that not allocated. 3. resctrl monitor belongs to 'default' allocation if no 'cachetune' specified in XML file. 4. one resctrl allocation may have several monitors. It is also permitted that there is no resctrl monitor associated with an allocation. Key data structures: + struct _virResctrlAllocMon { + char *id; + char *path; + }; struct _virResctrlAlloc { virObject parent; @@ -276,6 +289,12 @@ struct _virResctrlAlloc { virResctrlAllocMemBWPtr mem_bw; + virResctrlAllocMonPtr *monitors; + size_t nmonitors; } Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/libvirt_private.syms | 6 + src/util/virresctrl.c | 361 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virresctrl.h | 31 ++++ 3 files changed, 394 insertions(+), 4 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 47ea35f..1439327 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2645,12 +2645,17 @@ virCacheKernelTypeFromString; virCacheKernelTypeToString; virCacheTypeFromString; virCacheTypeToString; +virResctrlAllocAddMonitorPID; virResctrlAllocAddPID; virResctrlAllocCreate; +virResctrlAllocCreateMonitor; +virResctrlAllocDeleteMonitor; +virResctrlAllocDetermineMonitorPath; virResctrlAllocDeterminePath; virResctrlAllocForeachCache; virResctrlAllocForeachMemory; virResctrlAllocFormat; +virResctrlAllocGetCacheOccupancy; virResctrlAllocGetID; virResctrlAllocGetUnused; virResctrlAllocNew; @@ -2658,6 +2663,7 @@ virResctrlAllocRemove; virResctrlAllocSetCacheSize; 
virResctrlAllocSetID; virResctrlAllocSetMemoryBandwidth; +virResctrlAllocSetMonitor; virResctrlInfoGetCache; virResctrlInfoNew; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index b3bae6e..7215a47 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -257,6 +257,19 @@ struct _virResctrlAllocMemBW { size_t nbandwidths; }; + +typedef struct _virResctrlAllocMon virResctrlAllocMon; +typedef virResctrlAllocMon *virResctrlAllocMonPtr; +/* virResctrlAllocMon denotes a resctrl monitoring group reporting the resource + * consumption information for resource of either cache or memory + * bandwidth. */ +struct _virResctrlAllocMon { + /* monitoring group identifier, should be unique in scope of allocation */ + char *id; + /* directory path under /sys/fs/resctrl*/ + char *path; +}; + struct _virResctrlAlloc { virObject parent; @@ -265,11 +278,21 @@ struct _virResctrlAlloc { virResctrlAllocMemBWPtr mem_bw; + /* monintoring groups associated with current resource allocation + * it might report resource consumption information at a finer + * granularity */ + virResctrlAllocMonPtr *monitors; + size_t nmonitors; + /* The identifier (any unique string for now) */ char *id; /* libvirt-generated path in /sys/fs/resctrl for this particular * allocation */ char *path; + /* is this a default resctrl group? 
+ * true : default group, directory path equals '/sys/fs/resctrl' + * false: non-default group */ + bool default_group; }; @@ -315,6 +338,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(alloc->mem_bw); } + for (i = 0; i < alloc->nmonitors; i++) { + virResctrlAllocMonPtr monitor = alloc->monitors[i]; + VIR_FREE(monitor->id); + VIR_FREE(monitor->path); + VIR_FREE(monitor); + } + VIR_FREE(alloc->monitors); VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -805,6 +835,7 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control)); } + cachemon->nfeatures = 0; cachemon->max_allocation = 0; if (resctrl->monitor_info) { @@ -817,7 +848,7 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, if (STREQLEN(info->features[i], "llc_", strlen("llc_"))) { if (virStringListAdd(&cachemon->features, info->features[i]) < 0) - goto error; + goto error; cachemon->nfeatures++; } } @@ -841,10 +872,19 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, virResctrlAllocPtr virResctrlAllocNew(void) { + virResctrlAllocPtr ret = NULL; + if (virResctrlInitialize() < 0) return NULL; - return virObjectNew(virResctrlAllocClass); + ret = virObjectNew(virResctrlAllocClass); + if (!ret) + return NULL; + + /* By default, a resource group is a default group */ + ret->default_group = true; + + return ret; } @@ -861,6 +901,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (alloc->mem_bw) return false; + if (alloc->nmonitors) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i]; @@ -2124,10 +2167,18 @@ int virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, const char *machinename) { - return virResctrlDeterminePath(alloc->id, NULL, NULL, - machinename, &alloc->path); + if (alloc->default_group) { + if (VIR_STRDUP(alloc->path, SYSFS_RESCTRL_PATH) < 0) + return -1; + return 0; + } else { + + return virResctrlDeterminePath(alloc->id, NULL, NULL, + 
machinename, &alloc->path); + } } + static int virResctrlCreateGroup(virResctrlInfoPtr resctrl, char *path) @@ -2169,6 +2220,27 @@ virResctrlCreateGroup(virResctrlInfoPtr resctrl, return ret; } + /* In case of no explicit requirement for allocating cache and memory + * bandwidth, set 'alloc->default' to 'true', then the monitoring + * group will be created under '/sys/fs/resctrl/mon_groups' in later + * invocation of virResctrlAllocCreate. + * Otherwise, set 'alloc->default' to false, create a new directory + * under '/sys/fs/resctrl/'. This is will cost a hardware 'COSID'.*/ +static int +virResctrlAllocCheckDefault(virResctrlAllocPtr alloc) +{ + bool default_group = true; + if (!alloc) + return -1; + + if (alloc->nlevels) + default_group = false; + if (alloc->mem_bw && alloc->mem_bw->nbandwidths) + default_group = false; + + alloc->default_group = default_group; + return 0; +} /* This checks if the directory for the alloc exists. If not it tries to create * it and apply appropriate alloc settings. 
*/ @@ -2185,6 +2257,8 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (!alloc) return 0; + virResctrlAllocCheckDefault(alloc); + if (virResctrlAllocDeterminePath(alloc, machinename) < 0) return -1; @@ -2275,10 +2349,289 @@ virResctrlAllocRemove(virResctrlAllocPtr alloc) return 0; VIR_DEBUG("Removing resctrl allocation %s", alloc->path); + + while (alloc->nmonitors > 0) { + ret = virResctrlAllocDeleteMonitor(alloc, alloc->monitors[0]->id); + if (ret < 0) + goto cleanup; + } + if (rmdir(alloc->path) != 0 && errno != ENOENT) { ret = -errno; VIR_ERROR(_("Unable to remove %s (%d)"), alloc->path, errno); } + ret = 0; + cleanup: + return ret; +} + + +static int +virResctrlAllocGetMonitor(virResctrlAllocPtr alloc, + const char *id, + virResctrlAllocMonPtr *monitor, + size_t *pos) +{ + size_t i = 0; + + if (!alloc || !id) + return -1; + + for (i = 0; i < alloc->nmonitors; i++) { + if (alloc->monitors[i]->id && + STREQ(id, (alloc->monitors[i])->id)) { + if (monitor) + *monitor = alloc->monitors[i]; + if (pos) + *pos = i; + return 0; + } + } + + return -1; +} + + +int +virResctrlAllocDetermineMonitorPath(virResctrlAllocPtr alloc, + const char *id, + const char *machinename) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return -1; + + + return virResctrlDeterminePath(monitor->id, + alloc->path, + "mon_groups", + machinename, + &monitor->path); +} + + +int +virResctrlAllocAddMonitorPID(virResctrlAllocPtr alloc, + const char *id, + pid_t pid) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return -1; + + return virResctrlAddPID(monitor->path, pid); +} + + +int +virResctrlAllocSetMonitor(virResctrlAllocPtr alloc, + const char *id) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (!alloc || !id) + return - 1; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) { + if (VIR_ALLOC(monitor) < 0) + return -1; + } + + if 
(VIR_STRDUP(monitor->id, (char*)id) < 0) + return -1; + + if (VIR_APPEND_ELEMENT(alloc->monitors, alloc->nmonitors, monitor) < 0) + return -1; + + return 0; +} + +int +virResctrlAllocCreateMonitor(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + const char *machinename, + const char *id) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return - 1; + + if (virResctrlAllocDetermineMonitorPath(alloc, id, machinename) < 0) + return -1; + + VIR_DEBUG("Creating resctrl monitor %s", monitor->path); + if (virResctrlCreateGroup(resctrl, monitor->path) < 0) + return -1; + + return 0; +} + + +int +virResctrlAllocDeleteMonitor(virResctrlAllocPtr alloc, + const char *id) +{ + int ret = 0; + + virResctrlAllocMonPtr monitor = NULL; + size_t pos = 0; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, &pos) < 0) + return -1; + + VIR_DELETE_ELEMENT(alloc->monitors, pos, alloc->nmonitors); + + VIR_DEBUG("Deleting resctrl monitor %s ", monitor->path); + if (rmdir(monitor->path) != 0 && errno != ENOENT) { + ret = -errno; + VIR_ERROR(_("Unable to remove %s (%d)"), monitor->path, errno); + } + + VIR_FREE(monitor->id); + VIR_FREE(monitor->path); + VIR_FREE(monitor); + return ret; +} + + +static int +virResctrlAllocGetStatistic(virResctrlAllocPtr alloc, + const char *id, + const char *resfile, + unsigned int *nnodes, + unsigned int **nodeids, + unsigned int **nodevals) +{ + DIR *dirp = NULL; + int ret = -1; + int rv = -1; + struct dirent *ent = NULL; + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *mondatapath = NULL; + size_t ntmpid = 0; + size_t ntmpval = 0; + virResctrlAllocMonPtr monitor = NULL; + + if (!nnodes || !nodeids || !nodevals) + return -1; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + goto cleanup; + + if (!monitor || !monitor->path) + goto cleanup; + + rv = virDirOpenIfExists(&dirp, monitor->path); + if (rv <= 0) + goto cleanup; + + *nnodes = 0; + + virBufferAsprintf(&buf, 
"%s/mon_data", monitor->path); + + mondatapath = virBufferContentAndReset(&buf); + if (!mondatapath) + goto cleanup; + + if (virDirOpen(&dirp, mondatapath) < 0) + goto cleanup; + + while ((rv = virDirRead(dirp, &ent, mondatapath)) > 0) { + char *pstrid = NULL; + size_t i = 0; + unsigned int len = 0; + unsigned int counter = 0; + unsigned int cacheid = 0; + unsigned int cur_cacheid = 0; + unsigned int val = 0; + int tmpnodeid = 0; + int tmpnodeval = 0; + + if (ent->d_type != DT_DIR) + continue; + + /* mon_L3(|CODE|DATA)_xx, xx is cache id */ + if (STRNEQLEN(ent->d_name, "mon_L", 5)) + continue; + + len = strlen(ent->d_name); + pstrid = ent->d_name; + /* locating the cache id string: 'xx' */ + for (i = 0; i < len; i++) { + if (*(pstrid + i) == '_') + counter ++; + if (counter == 2) + break; + } + i++; + + if (i >= len) + goto cleanup; + + if (virStrToLong_uip(pstrid + i, NULL, 0, &cacheid) < 0) { + VIR_DEBUG("Cannot parse id from folder '%s'", ent->d_name); + goto cleanup; + } + + rv = virFileReadValueUint(&val, + "%s/%s/%s", + mondatapath, ent->d_name, resfile); + + if (rv == -2) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("file %s/%s/%s does not exist"), + mondatapath, ent->d_name, resfile); + goto cleanup; + } else { + if (rv < 0) + goto cleanup; + } + + /* The ultimate caller will be responiblefor free memory of + * 'nodeids' an 'nodevals' */ + if (VIR_APPEND_ELEMENT(*nodeids, ntmpid, cacheid) < 0) + goto cleanup; + if (VIR_APPEND_ELEMENT(*nodevals, ntmpval, val) < 0) + goto cleanup; + + cur_cacheid = ntmpval - 1; + /* sort the cache information in caach bank id's ascending order */ + for (i = 0; i < cur_cacheid; i++) { + if ((*nodeids)[cur_cacheid] < (*nodeids)[i]) { + tmpnodeid = (*nodeids)[cur_cacheid]; + tmpnodeval = (*nodevals)[cur_cacheid]; + (*nodeids)[cur_cacheid] = (*nodeids)[i]; + (*nodevals)[cur_cacheid] = (*nodevals)[i]; + (*nodeids)[i] = tmpnodeid; + (*nodevals)[i] = tmpnodeval; + } + } + } + + (*nnodes) = ntmpval; + ret = 0; + cleanup: + 
VIR_FREE(mondatapath); + VIR_DIR_CLOSE(dirp); + return ret; +} + + +int +virResctrlAllocGetCacheOccupancy(virResctrlAllocPtr alloc, + const char *id, + unsigned int *nbank, + unsigned int **bankids, + unsigned int **bankcaches) +{ + int ret = - 1; + + ret = virResctrlAllocGetStatistic(alloc, id, "llc_occupancy", + nbank, bankids, bankcaches); + return ret; } diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 51bb68b..0f63997 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -160,4 +160,35 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc, int virResctrlAllocRemove(virResctrlAllocPtr alloc); +int +virResctrlAllocDetermineMonitorPath(virResctrlAllocPtr alloc, + const char *id, + const char *machinename); + +int +virResctrlAllocAddMonitorPID(virResctrlAllocPtr alloc, + const char *id, + pid_t pid); + +int +virResctrlAllocSetMonitor(virResctrlAllocPtr alloc, + const char *id); + +int +virResctrlAllocCreateMonitor(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + const char *machinename, + const char *id); + +int +virResctrlAllocDeleteMonitor(virResctrlAllocPtr alloc, + const char *id); + +int +virResctrlAllocGetCacheOccupancy(virResctrlAllocPtr alloc, + const char *id, + unsigned int *nbank, + unsigned int **bankids, + unsigned int **bankcaches); + #endif /* __VIR_RESCTRL_H__ */ -- 2.7.4
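The directory-name parsing in virResctrlAllocGetStatistic can be sketched in isolation. The function below is a hypothetical stand-alone version using only the C standard library, not libvirt helpers. It takes the digits after the last underscore rather than counting two underscores, and it parses them as base 10. The base is an assumption worth noting: the patch passes base 0, and with strtoul-style base detection a zero-padded name such as 'mon_L3_08' would be treated as octal and rejected, if the ids are zero-padded as the 'xx' in the patch comment suggests.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse the trailing cache id from a mon_data entry name such as
 * "mon_L3_00" or "mon_L3CODE_01".
 * Returns 0 and stores the id on success, -1 on any malformed name. */
int
demo_parse_cache_id(const char *name, unsigned int *id)
{
    const char *sep;
    char *end = NULL;
    unsigned long val;

    if (!name || !id || strncmp(name, "mon_L", 5) != 0)
        return -1;

    sep = strrchr(name, '_');
    assert(sep != NULL);  /* guaranteed by the "mon_L" prefix check */

    /* Only the underscore of "mon_" was found: no id part at all. */
    if (sep == name + 3)
        return -1;

    /* Require a digit so strtoul cannot accept a sign or whitespace. */
    if (sep[1] < '0' || sep[1] > '9')
        return -1;

    errno = 0;
    val = strtoul(sep + 1, &end, 10);
    if (errno != 0 || *end != '\0' || val > UINT_MAX)
        return -1;

    *id = (unsigned int) val;
    return 0;
}
```

For example, `demo_parse_cache_id("mon_L3_08", &id)` stores 8 in `id`, while a name without an id part, such as "mon_L3", is rejected.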

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
'virResctrlAllocMon' denotes a resctrl monitor reporting the resource consumption information.
This patch introduces the interfaces for the resctrl monitor.
Relationship of 'resctrl allocation' and 'resctrl monitor':
1. A resctrl monitor monitors the resources (cache or memory bandwidth) of a particular allocation.
2. A resctrl allocation may refer to the 'default' allocation if no dedicated resource 'control' is applied to it. The 'default' allocation uses the remaining resources that are not allocated.
3. A resctrl monitor belongs to the 'default' allocation if no 'cachetune' is specified in the XML file.
4. One resctrl allocation may have several monitors. An allocation may also have no resctrl monitor associated with it.
Key data structures:
+ struct _virResctrlAllocMon { + char *id; + char *path; + };
struct _virResctrlAlloc { virObject parent;
@@ -276,6 +289,12 @@ struct _virResctrlAlloc { virResctrlAllocMemBWPtr mem_bw; + virResctrlAllocMonPtr *monitors; + size_t nmonitors; }
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/libvirt_private.syms | 6 + src/util/virresctrl.c | 361 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virresctrl.h | 31 ++++ 3 files changed, 394 insertions(+), 4 deletions(-)
Similar to the previous patch - there's a bit too much going on for just one patch here. Introducing "default_group" and "monitors". This needs some separation. I am not going to look at this patch. I think you really need to describe this default_group concept quite a bit more as you'd be totally changing the meaning of alloc->path by "removing" everything after the SYSFS_RESCTRL_PATH. What's the purpose of alloc->path then in this environment? John
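To make the default_group question concrete, here is a minimal sketch of the path layout the patch appears to intend. It follows what the diff shows (a default allocation uses '/sys/fs/resctrl' itself, and monitors live in a 'mon_groups' directory under the allocation path). The '<machinename>-<id>' directory naming is an assumption, since virResctrlDeterminePath is not part of this patch, and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_RESCTRL_ROOT "/sys/fs/resctrl"

/* An allocation is "default" when it requests neither cache nor memory
 * bandwidth, mirroring the test in virResctrlAllocCheckDefault. */
bool
demo_is_default_group(size_t ncache_levels, size_t nbandwidths)
{
    return ncache_levels == 0 && nbandwidths == 0;
}

/* Build the allocation path. A default allocation maps to the resctrl
 * root; otherwise a "<machine>-<id>" directory is used (assumed naming).
 * Returns 0 on success, -1 if the buffer is too small. */
int
demo_alloc_path(char *buf, size_t len, bool is_default,
                const char *machine, const char *id)
{
    int n;

    assert(buf != NULL && len > 0);
    if (is_default)
        n = snprintf(buf, len, "%s", DEMO_RESCTRL_ROOT);
    else
        n = snprintf(buf, len, "%s/%s-%s", DEMO_RESCTRL_ROOT, machine, id);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Build a monitor path below the owning allocation's path. */
int
demo_monitor_path(char *buf, size_t len, const char *alloc_path,
                  const char *machine, const char *id)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%s/mon_groups/%s-%s", alloc_path, machine, id);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* True when a path is the resctrl root, i.e. it names the default group. */
bool
demo_path_is_root(const char *path)
{
    return path && strcmp(path, DEMO_RESCTRL_ROOT) == 0;
}
```

Under these assumptions, a monitor of a default allocation ends up below '/sys/fs/resctrl/mon_groups', while alloc->path for that allocation no longer identifies a per-VM directory, which is the change in meaning the review asks about.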
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 47ea35f..1439327 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2645,12 +2645,17 @@ virCacheKernelTypeFromString; virCacheKernelTypeToString; virCacheTypeFromString; virCacheTypeToString; +virResctrlAllocAddMonitorPID; virResctrlAllocAddPID; virResctrlAllocCreate; +virResctrlAllocCreateMonitor; +virResctrlAllocDeleteMonitor; +virResctrlAllocDetermineMonitorPath; virResctrlAllocDeterminePath; virResctrlAllocForeachCache; virResctrlAllocForeachMemory; virResctrlAllocFormat; +virResctrlAllocGetCacheOccupancy; virResctrlAllocGetID; virResctrlAllocGetUnused; virResctrlAllocNew; @@ -2658,6 +2663,7 @@ virResctrlAllocRemove; virResctrlAllocSetCacheSize; virResctrlAllocSetID; virResctrlAllocSetMemoryBandwidth; +virResctrlAllocSetMonitor; virResctrlInfoGetCache; virResctrlInfoNew;
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index b3bae6e..7215a47 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -257,6 +257,19 @@ struct _virResctrlAllocMemBW { size_t nbandwidths; };
+ +typedef struct _virResctrlAllocMon virResctrlAllocMon; +typedef virResctrlAllocMon *virResctrlAllocMonPtr; +/* virResctrlAllocMon denotes a resctrl monitoring group reporting the resource + * consumption information for resource of either cache or memory + * bandwidth. */ +struct _virResctrlAllocMon { + /* monitoring group identifier, should be unique in scope of allocation */ + char *id; + /* directory path under /sys/fs/resctrl*/ + char *path; +}; + struct _virResctrlAlloc { virObject parent;
@@ -265,11 +278,21 @@ struct _virResctrlAlloc {
virResctrlAllocMemBWPtr mem_bw;
+ /* monintoring groups associated with current resource allocation + * it might report resource consumption information at a finer + * granularity */ + virResctrlAllocMonPtr *monitors; + size_t nmonitors; + /* The identifier (any unique string for now) */ char *id; /* libvirt-generated path in /sys/fs/resctrl for this particular * allocation */ char *path; + /* is this a default resctrl group? + * true : default group, directory path equals '/sys/fs/resctrl' + * false: non-default group */ + bool default_group; };
@@ -315,6 +338,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(alloc->mem_bw); }
+ for (i = 0; i < alloc->nmonitors; i++) { + virResctrlAllocMonPtr monitor = alloc->monitors[i]; + VIR_FREE(monitor->id); + VIR_FREE(monitor->path); + VIR_FREE(monitor); + } + VIR_FREE(alloc->monitors); VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -805,6 +835,7 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type->control)); }
+ cachemon->nfeatures = 0; cachemon->max_allocation = 0;
if (resctrl->monitor_info) { @@ -817,7 +848,7 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, if (STREQLEN(info->features[i], "llc_", strlen("llc_"))) { if (virStringListAdd(&cachemon->features, info->features[i]) < 0) - goto error; + goto error; cachemon->nfeatures++; } } @@ -841,10 +872,19 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, virResctrlAllocPtr virResctrlAllocNew(void) { + virResctrlAllocPtr ret = NULL; + if (virResctrlInitialize() < 0) return NULL;
- return virObjectNew(virResctrlAllocClass); + ret = virObjectNew(virResctrlAllocClass); + if (!ret) + return NULL; + + /* By default, a resource group is a default group */ + ret->default_group = true; + + return ret; }
@@ -861,6 +901,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (alloc->mem_bw) return false;
+ if (alloc->nmonitors) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i];
@@ -2124,10 +2167,18 @@ int virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, const char *machinename) { - return virResctrlDeterminePath(alloc->id, NULL, NULL, - machinename, &alloc->path); + if (alloc->default_group) { + if (VIR_STRDUP(alloc->path, SYSFS_RESCTRL_PATH) < 0) + return -1; + return 0; + } else { + + return virResctrlDeterminePath(alloc->id, NULL, NULL, + machinename, &alloc->path); + } }
+ static int virResctrlCreateGroup(virResctrlInfoPtr resctrl, char *path) @@ -2169,6 +2220,27 @@ virResctrlCreateGroup(virResctrlInfoPtr resctrl, return ret; }
+ /* In case of no explicit requirement for allocating cache and memory + * bandwidth, set 'alloc->default' to 'true', then the monitoring + * group will be created under '/sys/fs/resctrl/mon_groups' in later + * invocation of virResctrlAllocCreate. + * Otherwise, set 'alloc->default' to false, create a new directory + * under '/sys/fs/resctrl/'. This is will cost a hardware 'COSID'.*/ +static int +virResctrlAllocCheckDefault(virResctrlAllocPtr alloc) +{ + bool default_group = true; + if (!alloc) + return -1; + + if (alloc->nlevels) + default_group = false; + if (alloc->mem_bw && alloc->mem_bw->nbandwidths) + default_group = false; + + alloc->default_group = default_group; + return 0; +}
/* This checks if the directory for the alloc exists. If not it tries to create * it and apply appropriate alloc settings. */ @@ -2185,6 +2257,8 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (!alloc) return 0;
+ virResctrlAllocCheckDefault(alloc); + if (virResctrlAllocDeterminePath(alloc, machinename) < 0) return -1;
@@ -2275,10 +2349,289 @@ virResctrlAllocRemove(virResctrlAllocPtr alloc) return 0;
VIR_DEBUG("Removing resctrl allocation %s", alloc->path); + + while (alloc->nmonitors > 0) { + ret = virResctrlAllocDeleteMonitor(alloc, alloc->monitors[0]->id); + if (ret < 0) + goto cleanup; + } + if (rmdir(alloc->path) != 0 && errno != ENOENT) { ret = -errno; VIR_ERROR(_("Unable to remove %s (%d)"), alloc->path, errno); }
+ ret = 0; + cleanup: + return ret; +} + + +static int +virResctrlAllocGetMonitor(virResctrlAllocPtr alloc, + const char *id, + virResctrlAllocMonPtr *monitor, + size_t *pos) +{ + size_t i = 0; + + if (!alloc || !id) + return -1; + + for (i = 0; i < alloc->nmonitors; i++) { + if (alloc->monitors[i]->id && + STREQ(id, (alloc->monitors[i])->id)) { + if (monitor) + *monitor = alloc->monitors[i]; + if (pos) + *pos = i; + return 0; + } + } + + return -1; +} + + +int +virResctrlAllocDetermineMonitorPath(virResctrlAllocPtr alloc, + const char *id, + const char *machinename) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return -1; + + + return virResctrlDeterminePath(monitor->id, + alloc->path, + "mon_groups", + machinename, + &monitor->path); +} + + +int +virResctrlAllocAddMonitorPID(virResctrlAllocPtr alloc, + const char *id, + pid_t pid) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return -1; + + return virResctrlAddPID(monitor->path, pid); +} + + +int +virResctrlAllocSetMonitor(virResctrlAllocPtr alloc, + const char *id) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (!alloc || !id) + return - 1; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) { + if (VIR_ALLOC(monitor) < 0) + return -1; + } + + if (VIR_STRDUP(monitor->id, (char*)id) < 0) + return -1; + + if (VIR_APPEND_ELEMENT(alloc->monitors, alloc->nmonitors, monitor) < 0) + return -1; + + return 0; +} + +int +virResctrlAllocCreateMonitor(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + const char *machinename, + const char *id) +{ + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return - 1; + + if (virResctrlAllocDetermineMonitorPath(alloc, id, machinename) < 0) + return -1; + + VIR_DEBUG("Creating resctrl monitor %s", monitor->path); + if (virResctrlCreateGroup(resctrl, monitor->path) < 0) + return 
-1; + + return 0; +} + + +int +virResctrlAllocDeleteMonitor(virResctrlAllocPtr alloc, + const char *id) +{ + int ret = 0; + + virResctrlAllocMonPtr monitor = NULL; + size_t pos = 0; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, &pos) < 0) + return -1; + + VIR_DELETE_ELEMENT(alloc->monitors, pos, alloc->nmonitors); + + VIR_DEBUG("Deleting resctrl monitor %s ", monitor->path); + if (rmdir(monitor->path) != 0 && errno != ENOENT) { + ret = -errno; + VIR_ERROR(_("Unable to remove %s (%d)"), monitor->path, errno); + } + + VIR_FREE(monitor->id); + VIR_FREE(monitor->path); + VIR_FREE(monitor); + return ret; +} + + +static int +virResctrlAllocGetStatistic(virResctrlAllocPtr alloc, + const char *id, + const char *resfile, + unsigned int *nnodes, + unsigned int **nodeids, + unsigned int **nodevals) +{ + DIR *dirp = NULL; + int ret = -1; + int rv = -1; + struct dirent *ent = NULL; + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *mondatapath = NULL; + size_t ntmpid = 0; + size_t ntmpval = 0; + virResctrlAllocMonPtr monitor = NULL; + + if (!nnodes || !nodeids || !nodevals) + return -1; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + goto cleanup; + + if (!monitor || !monitor->path) + goto cleanup; + + rv = virDirOpenIfExists(&dirp, monitor->path); + if (rv <= 0) + goto cleanup; + + *nnodes = 0; + + virBufferAsprintf(&buf, "%s/mon_data", monitor->path); + + mondatapath = virBufferContentAndReset(&buf); + if (!mondatapath) + goto cleanup; + + if (virDirOpen(&dirp, mondatapath) < 0) + goto cleanup; + + while ((rv = virDirRead(dirp, &ent, mondatapath)) > 0) { + char *pstrid = NULL; + size_t i = 0; + unsigned int len = 0; + unsigned int counter = 0; + unsigned int cacheid = 0; + unsigned int cur_cacheid = 0; + unsigned int val = 0; + int tmpnodeid = 0; + int tmpnodeval = 0; + + if (ent->d_type != DT_DIR) + continue; + + /* mon_L3(|CODE|DATA)_xx, xx is cache id */ + if (STRNEQLEN(ent->d_name, "mon_L", 5)) + continue; + + len = strlen(ent->d_name); + 
pstrid = ent->d_name; + /* locating the cache id string: 'xx' */ + for (i = 0; i < len; i++) { + if (*(pstrid + i) == '_') + counter ++; + if (counter == 2) + break; + } + i++; + + if (i >= len) + goto cleanup; + + if (virStrToLong_uip(pstrid + i, NULL, 0, &cacheid) < 0) { + VIR_DEBUG("Cannot parse id from folder '%s'", ent->d_name); + goto cleanup; + } + + rv = virFileReadValueUint(&val, + "%s/%s/%s", + mondatapath, ent->d_name, resfile); + + if (rv == -2) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("file %s/%s/%s does not exist"), + mondatapath, ent->d_name, resfile); + goto cleanup; + } else { + if (rv < 0) + goto cleanup; + } + + /* The ultimate caller will be responiblefor free memory of + * 'nodeids' an 'nodevals' */ + if (VIR_APPEND_ELEMENT(*nodeids, ntmpid, cacheid) < 0) + goto cleanup; + if (VIR_APPEND_ELEMENT(*nodevals, ntmpval, val) < 0) + goto cleanup; + + cur_cacheid = ntmpval - 1; + /* sort the cache information in caach bank id's ascending order */ + for (i = 0; i < cur_cacheid; i++) { + if ((*nodeids)[cur_cacheid] < (*nodeids)[i]) { + tmpnodeid = (*nodeids)[cur_cacheid]; + tmpnodeval = (*nodevals)[cur_cacheid]; + (*nodeids)[cur_cacheid] = (*nodeids)[i]; + (*nodevals)[cur_cacheid] = (*nodevals)[i]; + (*nodeids)[i] = tmpnodeid; + (*nodevals)[i] = tmpnodeval; + } + } + } + + (*nnodes) = ntmpval; + ret = 0; + cleanup: + VIR_FREE(mondatapath); + VIR_DIR_CLOSE(dirp); + return ret; +} + + +int +virResctrlAllocGetCacheOccupancy(virResctrlAllocPtr alloc, + const char *id, + unsigned int *nbank, + unsigned int **bankids, + unsigned int **bankcaches) +{ + int ret = - 1; + + ret = virResctrlAllocGetStatistic(alloc, id, "llc_occupancy", + nbank, bankids, bankcaches); + return ret; } diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 51bb68b..0f63997 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -160,4 +160,35 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc, int virResctrlAllocRemove(virResctrlAllocPtr alloc);
+int +virResctrlAllocDetermineMonitorPath(virResctrlAllocPtr alloc, + const char *id, + const char *machinename); + +int +virResctrlAllocAddMonitorPID(virResctrlAllocPtr alloc, + const char *id, + pid_t pid); + +int +virResctrlAllocSetMonitor(virResctrlAllocPtr alloc, + const char *id); + +int +virResctrlAllocCreateMonitor(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + const char *machinename, + const char *id); + +int +virResctrlAllocDeleteMonitor(virResctrlAllocPtr alloc, + const char *id); + +int +virResctrlAllocGetCacheOccupancy(virResctrlAllocPtr alloc, + const char *id, + unsigned int *nbank, + unsigned int **bankids, + unsigned int **bankcaches); + #endif /* __VIR_RESCTRL_H__ */
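A note on the directory-name parsing in virResctrlAllocGetStatistic above: the patch counts underscores by hand and then calls virStrToLong_uip with base 0. If that helper follows strtoul semantics for base 0, a zero-padded name such as mon_L3_08 would be read as octal and rejected. The following is only a minimal sketch of an alternative, assuming the cache id is the decimal number after the last underscore; parse_mon_dir_cache_id is a hypothetical name and not part of the patch or of libvirt.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): extract the cache id from a
 * resctrl mon_data directory name such as "mon_L3_01" or "mon_L3CODE_10".
 * Returns 0 and stores the id in *id on success, -1 on malformed input. */
static int
parse_mon_dir_cache_id(const char *name, unsigned int *id)
{
    const char *sep;
    char *end = NULL;
    unsigned long val;

    if (!name || !id || strncmp(name, "mon_L", 5) != 0)
        return -1;

    /* Assumption: the id is whatever follows the last underscore. */
    sep = strrchr(name, '_');
    if (!sep || sep[1] < '0' || sep[1] > '9')
        return -1;

    errno = 0;
    /* Base 10 on purpose: base 0 would treat a leading '0' as octal. */
    val = strtoul(sep + 1, &end, 10);
    if (errno != 0 || *end != '\0' || val > UINT_MAX)
        return -1;

    *id = (unsigned int)val;
    return 0;
}
```

Using strrchr also removes the need for the separate "i >= len" check, because a name without digits after the last underscore is rejected before any conversion is attempted.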

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 11:00 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 06/10] util: Introduce resctrl monitor for CMT
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
'virResctrlAllocMon' denotes a resctrl monitor reporting the resource consumption information.
This patch introduces the interfaces for the resctrl monitor.
Relationship of 'resctrl allocation' and 'resctrl monitor':
1. A resctrl monitor monitors the resources (cache or memory bandwidth) of a particular allocation.
2. A resctrl allocation may refer to the 'default' allocation if no dedicated resource 'control' is applied to it. The 'default' allocation gets the remaining resources that have not been allocated.
3. A resctrl monitor belongs to the 'default' allocation if no 'cachetune' is specified in the XML file.
4. One resctrl allocation may have several monitors. An allocation may also have no resctrl monitor associated with it.
Key data structures:
+ struct _virResctrlAllocMon {
+     char *id;
+     char *path;
+ };

  struct _virResctrlAlloc {
      virObject parent;

@@ -276,6 +289,12 @@ struct _virResctrlAlloc {
      virResctrlAllocMemBWPtr mem_bw;
+     virResctrlAllocMonPtr *monitors;
+     size_t nmonitors;
  }
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/libvirt_private.syms |   6 +
 src/util/virresctrl.c    | 361 ++++++++++++++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h    |  31 ++++
 3 files changed, 394 insertions(+), 4 deletions(-)
Similar to the previous patch - there's a bit too much going on for just one patch here.
Introducing "default_group" and "monitors". This needs some separation.
Will split this patch into several patches. The 'default_group' will be a separate patch.
I am not going to look at this patch. I think you really need to describe this default_group concept quite a bit more as you'd be totally changing the meaning of alloc->path by "removing" everything after the SYSFS_RESCTRL_PATH. What's the purpose of alloc->path then in this environment?
This is a bug. The 'default_group' must not be removed by libvirt. It is created by the resource control file system at mount time and can only be removed by the system when the file system is unmounted. In the next version, virResctrlAllocRemove will check whether it is about to remove the root directory /sys/fs/resctrl and, if so, report an error.

The following paragraphs explain the 'default_group' and some relevant concepts. I'll also add this content to the cover letter of the patch series.

The resctrl default group is described in the kernel document 'intel_rdt_ui.txt', available at "https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt".

The default group is the root directory of the resource control file system, that is, the directory '/sys/fs/resctrl/'. The default group is created as soon as the resctrl file system is mounted; it owns all tasks and makes full use of all resources.

In these patches, the libvirt virresctrl 'default_group' is introduced by extending the scope of the current 'resource allocation'. virResctrlAlloc represents a 'resource allocation'. The 'default_group' is also called the 'default allocation'.

The 'default allocation' represents the root directory of the resource control file system. The main difference from a non-default allocation is that the 'default group' is created when the file system is mounted and owns all system tasks, while a non-default allocation has to be created through 'mkdir' and populated by explicitly writing PIDs into its tasks file.

The main purpose of introducing the 'default allocation' is to allow monitors to be created without a dedicated non-default allocation, which saves hardware resources.

Consider this case: a KVM virtual machine has 1 working vcpu, and we want to monitor the cache utilization of the vcpu without applying any resource limitation to the vcpu process. We have two possible solutions:

1. Create a new "resource allocation" by making a directory under /sys/fs/resctrl/, say /sys/fs/resctrl/p0, and put the host process PID of the vcpu into the file /sys/fs/resctrl/p0/tasks. Then get the cache utilization information through the per-cache llc_occupancy files under /sys/fs/resctrl/p0/mon_data/.

2. Do not create an extra allocation, but create a resource monitor under the default allocation by creating a sub-directory under '/sys/fs/resctrl/mon_groups', then put the vcpu PID into the tasks file in the monitor's sub-directory. Get the cache utilization information through this monitor's llc_occupancy file.

Compared with solution 1, solution 2 uses fewer hardware resources by saving one CLOSID. Normally there are many more RMIDs than CLOSIDs; for the E5-2699v4 CPU, the number of CLOSIDs is 16, while the number of RMIDs is 176.

To support 'default_group' the domain's XML configuration file needs to be changed:

The 'default allocation' behaves similarly to the original libvirt-defined 'resource allocation' when creating a monitor group, getting resource consumption data from a monitor, and assigning tasks. The 'default allocation' can be regarded as a special 'resource allocation'.

Libvirt treats virResctrlAlloc (sometimes called a resource allocation) as the representation of a resctrl allocation. Each resctrl allocation is able to report the resource consumption of its tasks through the files 'llc_occupancy', 'mbm_total_bytes' and 'mbm_local_bytes' under the allocation's directory. virResctrlAllocMon (sometimes called a resource monitor) is introduced to represent virResctrlAlloc's role in resource consumption monitoring. One or more resource monitors can be created to monitor the resource utilization of a subset of the tasks of the current allocation. This explains why 'monitor' is placed under the 'cachetune' element in the domain's XML configuration file.

  <cputune>
    <cachetune vcpus='0-1'>
      <cache id='0' level='3' type='code' size='7680' unit='KiB'/>
      <cache id='1' level='3' type='data' size='3840' unit='KiB'/>
+     <monitor vcpus='0-1'/>
+     <monitor vcpus='0'/>
    </cachetune>
  </cputune>

The 'default_group' is the resource allocation shared by all system tasks that have no specific resource limitation. Under existing libvirt policy, no resource limitation may be placed on it, so I need to generate a configuration such as

  <cachetune vcpus='3'>
+   <monitor vcpus='3'/>
  </cachetune>

In the implementation, this monitor for the domain's vcpu 3 is created under the 'default allocation'. The default allocation is the directory /sys/fs/resctrl, which is set up at mount time.

This is the reason why the 'cache' element is changed to be optional in the RNG file.

  <element name="cachetune">
    <attribute name="vcpus">
      <ref name='cpuset'/>
    </attribute>
-   <oneOrMore>
+   <zeroOrMore>
      <element name="cache">
        <attribute name="id">
          <ref name='unsignedInt'/>
        </attribute>
        <attribute name="level">
          <ref name='unsignedInt'/>
        </attribute>
        <attribute name="type">
          <choice>
            <value>both</value>
            <value>code</value>
            <value>data</value>
          </choice>
        </attribute>
        <attribute name="size">
          <ref name='unsignedLong'/>
        </attribute>
        <optional>
          <attribute name='unit'>
            <ref name='unit'/>
          </attribute>
        </optional>
      </element>
-   </oneOrMore>
+   </zeroOrMore>
    <zeroOrMore>
      <element name="monitor">
        <attribute name="vcpus">
          <ref name='cpuset'/>
        </attribute>
      </element>
    </zeroOrMore>
  </element>

Thanks for the review.
Huaqiang
John
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 47ea35f..1439327 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2645,12 +2645,17 @@ virCacheKernelTypeFromString; virCacheKernelTypeToString; virCacheTypeFromString; virCacheTypeToString; +virResctrlAllocAddMonitorPID; virResctrlAllocAddPID; virResctrlAllocCreate; +virResctrlAllocCreateMonitor; +virResctrlAllocDeleteMonitor; +virResctrlAllocDetermineMonitorPath; virResctrlAllocDeterminePath; virResctrlAllocForeachCache; virResctrlAllocForeachMemory; virResctrlAllocFormat; +virResctrlAllocGetCacheOccupancy; virResctrlAllocGetID; virResctrlAllocGetUnused; virResctrlAllocNew; @@ -2658,6 +2663,7 @@ virResctrlAllocRemove; virResctrlAllocSetCacheSize; virResctrlAllocSetID; virResctrlAllocSetMemoryBandwidth; +virResctrlAllocSetMonitor; virResctrlInfoGetCache; virResctrlInfoNew;
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c
index b3bae6e..7215a47 100644
--- a/src/util/virresctrl.c
+++ b/src/util/virresctrl.c
@@ -257,6 +257,19 @@ struct _virResctrlAllocMemBW {
     size_t nbandwidths;
 };

+
+typedef struct _virResctrlAllocMon virResctrlAllocMon;
+typedef virResctrlAllocMon *virResctrlAllocMonPtr;
+/* virResctrlAllocMon denotes a resctrl monitoring group reporting the resource
+ * consumption information for resource of either cache or memory
+ * bandwidth. */
+struct _virResctrlAllocMon {
+    /* monitoring group identifier, should be unique in scope of allocation */
+    char *id;
+    /* directory path under /sys/fs/resctrl */
+    char *path;
+};
+
 struct _virResctrlAlloc {
     virObject parent;

@@ -265,11 +278,21 @@ struct _virResctrlAlloc {

     virResctrlAllocMemBWPtr mem_bw;

+    /* monitoring groups associated with current resource allocation
+     * it might report resource consumption information at a finer
+     * granularity */
+    virResctrlAllocMonPtr *monitors;
+    size_t nmonitors;
+
     /* The identifier (any unique string for now) */
     char *id;
     /* libvirt-generated path in /sys/fs/resctrl for this particular
      * allocation */
     char *path;
+    /* is this a default resctrl group?
+     * true : default group, directory path equals '/sys/fs/resctrl'
+     * false: non-default group */
+    bool default_group;
 };
@@ -315,6 +338,13 @@ virResctrlAllocDispose(void *obj) VIR_FREE(alloc->mem_bw); }
+ for (i = 0; i < alloc->nmonitors; i++) { + virResctrlAllocMonPtr monitor = alloc->monitors[i]; + VIR_FREE(monitor->id); + VIR_FREE(monitor->path); + VIR_FREE(monitor); + } + VIR_FREE(alloc->monitors); VIR_FREE(alloc->id); VIR_FREE(alloc->path); VIR_FREE(alloc->levels); @@ -805,6 +835,7 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, memcpy((*controls)[*ncontrols - 1], &i_type->control, sizeof(i_type- control)); }
+ cachemon->nfeatures = 0; cachemon->max_allocation = 0;
if (resctrl->monitor_info) { @@ -817,7 +848,7 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, if (STREQLEN(info->features[i], "llc_", strlen("llc_"))) { if (virStringListAdd(&cachemon->features, info->features[i]) < 0) - goto error; + goto error; cachemon->nfeatures++; } } @@ -841,10 +872,19 @@ virResctrlInfoGetCache(virResctrlInfoPtr resctrl, virResctrlAllocPtr virResctrlAllocNew(void) { + virResctrlAllocPtr ret = NULL; + if (virResctrlInitialize() < 0) return NULL;
- return virObjectNew(virResctrlAllocClass); + ret = virObjectNew(virResctrlAllocClass); + if (!ret) + return NULL; + + /* By default, a resource group is a default group */ + ret->default_group = true; + + return ret; }
@@ -861,6 +901,9 @@ virResctrlAllocIsEmpty(virResctrlAllocPtr alloc) if (alloc->mem_bw) return false;
+ if (alloc->nmonitors) + return false; + for (i = 0; i < alloc->nlevels; i++) { virResctrlAllocPerLevelPtr a_level = alloc->levels[i];
@@ -2124,10 +2167,18 @@ int virResctrlAllocDeterminePath(virResctrlAllocPtr alloc, const char *machinename) { - return virResctrlDeterminePath(alloc->id, NULL, NULL, - machinename, &alloc->path); + if (alloc->default_group) { + if (VIR_STRDUP(alloc->path, SYSFS_RESCTRL_PATH) < 0) + return -1; + return 0; + } else { + + return virResctrlDeterminePath(alloc->id, NULL, NULL, + machinename, &alloc->path); + } }
+ static int virResctrlCreateGroup(virResctrlInfoPtr resctrl, char *path) @@ -2169,6 +2220,27 @@ virResctrlCreateGroup(virResctrlInfoPtr resctrl, return ret; }
+
+/* In case of no explicit requirement for allocating cache and memory
+ * bandwidth, set 'alloc->default_group' to 'true', then the monitoring
+ * group will be created under '/sys/fs/resctrl/mon_groups' in later
+ * invocation of virResctrlAllocCreate.
+ * Otherwise, set 'alloc->default_group' to false, create a new directory
+ * under '/sys/fs/resctrl/'. This will cost a hardware 'CLOSID'. */
+static int
+virResctrlAllocCheckDefault(virResctrlAllocPtr alloc)
+{
+    bool default_group = true;
+
+    if (!alloc)
+        return -1;
+
+    if (alloc->nlevels)
+        default_group = false;
+    if (alloc->mem_bw && alloc->mem_bw->nbandwidths)
+        default_group = false;
+
+    alloc->default_group = default_group;
+    return 0;
+}
/* This checks if the directory for the alloc exists. If not it tries to create * it and apply appropriate alloc settings. */ @@ -2185,6 +2257,8 @@ virResctrlAllocCreate(virResctrlInfoPtr resctrl, if (!alloc) return 0;
+ virResctrlAllocCheckDefault(alloc); + if (virResctrlAllocDeterminePath(alloc, machinename) < 0) return -1;
@@ -2275,10 +2349,289 @@ virResctrlAllocRemove(virResctrlAllocPtr alloc) return 0;
VIR_DEBUG("Removing resctrl allocation %s", alloc->path); + + while (alloc->nmonitors > 0) { + ret = virResctrlAllocDeleteMonitor(alloc, alloc->monitors[0]->id); + if (ret < 0) + goto cleanup; + } + if (rmdir(alloc->path) != 0 && errno != ENOENT) { ret = -errno; VIR_ERROR(_("Unable to remove %s (%d)"), alloc->path, errno); }
+ ret = 0; + cleanup: + return ret; +} + + +static int +virResctrlAllocGetMonitor(virResctrlAllocPtr alloc, + const char *id, + virResctrlAllocMonPtr *monitor, + size_t *pos) { + size_t i = 0; + + if (!alloc || !id) + return -1; + + for (i = 0; i < alloc->nmonitors; i++) { + if (alloc->monitors[i]->id && + STREQ(id, (alloc->monitors[i])->id)) { + if (monitor) + *monitor = alloc->monitors[i]; + if (pos) + *pos = i; + return 0; + } + } + + return -1; +} + + +int +virResctrlAllocDetermineMonitorPath(virResctrlAllocPtr alloc, + const char *id, + const char *machinename) { + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return -1; + + + return virResctrlDeterminePath(monitor->id, + alloc->path, + "mon_groups", + machinename, + &monitor->path); } + + +int +virResctrlAllocAddMonitorPID(virResctrlAllocPtr alloc, + const char *id, + pid_t pid) { + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return -1; + + return virResctrlAddPID(monitor->path, pid); } + + +int +virResctrlAllocSetMonitor(virResctrlAllocPtr alloc, + const char *id) { + virResctrlAllocMonPtr monitor = NULL; + + if (!alloc || !id) + return - 1; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) { + if (VIR_ALLOC(monitor) < 0) + return -1; + } + + if (VIR_STRDUP(monitor->id, (char*)id) < 0) + return -1; + + if (VIR_APPEND_ELEMENT(alloc->monitors, alloc->nmonitors, monitor) < 0) + return -1; + + return 0; +} + +int +virResctrlAllocCreateMonitor(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + const char *machinename, + const char *id) { + virResctrlAllocMonPtr monitor = NULL; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + return - 1; + + if (virResctrlAllocDetermineMonitorPath(alloc, id, machinename) < 0) + return -1; + + VIR_DEBUG("Creating resctrl monitor %s", monitor->path); + if (virResctrlCreateGroup(resctrl, monitor->path) < 0) + return -1; + + 
return 0; +} + + +int +virResctrlAllocDeleteMonitor(virResctrlAllocPtr alloc, + const char *id) { + int ret = 0; + + virResctrlAllocMonPtr monitor = NULL; + size_t pos = 0; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, &pos) < 0) + return -1; + + VIR_DELETE_ELEMENT(alloc->monitors, pos, alloc->nmonitors); + + VIR_DEBUG("Deleting resctrl monitor %s ", monitor->path); + if (rmdir(monitor->path) != 0 && errno != ENOENT) { + ret = -errno; + VIR_ERROR(_("Unable to remove %s (%d)"), monitor->path, errno); + } + + VIR_FREE(monitor->id); + VIR_FREE(monitor->path); + VIR_FREE(monitor); + return ret; +} + + +static int +virResctrlAllocGetStatistic(virResctrlAllocPtr alloc, + const char *id, + const char *resfile, + unsigned int *nnodes, + unsigned int **nodeids, + unsigned int **nodevals) { + DIR *dirp = NULL; + int ret = -1; + int rv = -1; + struct dirent *ent = NULL; + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *mondatapath = NULL; + size_t ntmpid = 0; + size_t ntmpval = 0; + virResctrlAllocMonPtr monitor = NULL; + + if (!nnodes || !nodeids || !nodevals) + return -1; + + if (virResctrlAllocGetMonitor(alloc, id, &monitor, NULL) < 0) + goto cleanup; + + if (!monitor || !monitor->path) + goto cleanup; + + rv = virDirOpenIfExists(&dirp, monitor->path); + if (rv <= 0) + goto cleanup; + + *nnodes = 0; + + virBufferAsprintf(&buf, "%s/mon_data", monitor->path); + + mondatapath = virBufferContentAndReset(&buf); + if (!mondatapath) + goto cleanup; + + if (virDirOpen(&dirp, mondatapath) < 0) + goto cleanup; + + while ((rv = virDirRead(dirp, &ent, mondatapath)) > 0) { + char *pstrid = NULL; + size_t i = 0; + unsigned int len = 0; + unsigned int counter = 0; + unsigned int cacheid = 0; + unsigned int cur_cacheid = 0; + unsigned int val = 0; + int tmpnodeid = 0; + int tmpnodeval = 0; + + if (ent->d_type != DT_DIR) + continue; + + /* mon_L3(|CODE|DATA)_xx, xx is cache id */ + if (STRNEQLEN(ent->d_name, "mon_L", 5)) + continue; + + len = strlen(ent->d_name); + pstrid = 
ent->d_name; + /* locating the cache id string: 'xx' */ + for (i = 0; i < len; i++) { + if (*(pstrid + i) == '_') + counter ++; + if (counter == 2) + break; + } + i++; + + if (i >= len) + goto cleanup; + + if (virStrToLong_uip(pstrid + i, NULL, 0, &cacheid) < 0) { + VIR_DEBUG("Cannot parse id from folder '%s'", ent->d_name); + goto cleanup; + } + + rv = virFileReadValueUint(&val, + "%s/%s/%s", + mondatapath, ent->d_name, resfile); + + if (rv == -2) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("file %s/%s/%s does not exist"), + mondatapath, ent->d_name, resfile); + goto cleanup; + } else { + if (rv < 0) + goto cleanup; + } + + /* The ultimate caller will be responiblefor free memory of + * 'nodeids' an 'nodevals' */ + if (VIR_APPEND_ELEMENT(*nodeids, ntmpid, cacheid) < 0) + goto cleanup; + if (VIR_APPEND_ELEMENT(*nodevals, ntmpval, val) < 0) + goto cleanup; + + cur_cacheid = ntmpval - 1; + /* sort the cache information in caach bank id's ascending order */ + for (i = 0; i < cur_cacheid; i++) { + if ((*nodeids)[cur_cacheid] < (*nodeids)[i]) { + tmpnodeid = (*nodeids)[cur_cacheid]; + tmpnodeval = (*nodevals)[cur_cacheid]; + (*nodeids)[cur_cacheid] = (*nodeids)[i]; + (*nodevals)[cur_cacheid] = (*nodevals)[i]; + (*nodeids)[i] = tmpnodeid; + (*nodevals)[i] = tmpnodeval; + } + } + } + + (*nnodes) = ntmpval; + ret = 0; + cleanup: + VIR_FREE(mondatapath); + VIR_DIR_CLOSE(dirp); + return ret; +} + + +int +virResctrlAllocGetCacheOccupancy(virResctrlAllocPtr alloc, + const char *id, + unsigned int *nbank, + unsigned int **bankids, + unsigned int **bankcaches) { + int ret = - 1; + + ret = virResctrlAllocGetStatistic(alloc, id, "llc_occupancy", + nbank, bankids, bankcaches); + return ret; } diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 51bb68b..0f63997 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -160,4 +160,35 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc, int virResctrlAllocRemove(virResctrlAllocPtr alloc);
+int +virResctrlAllocDetermineMonitorPath(virResctrlAllocPtr alloc, + const char *id, + const char *machinename); + +int +virResctrlAllocAddMonitorPID(virResctrlAllocPtr alloc, + const char *id, + pid_t pid); + +int +virResctrlAllocSetMonitor(virResctrlAllocPtr alloc, + const char *id); + +int +virResctrlAllocCreateMonitor(virResctrlInfoPtr resctrl, + virResctrlAllocPtr alloc, + const char *machinename, + const char *id); + +int +virResctrlAllocDeleteMonitor(virResctrlAllocPtr alloc, + const char *id); + +int +virResctrlAllocGetCacheOccupancy(virResctrlAllocPtr alloc, + const char *id, + unsigned int *nbank, + unsigned int **bankids, + unsigned int **bankcaches); + #endif /* __VIR_RESCTRL_H__ */
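On the ordering step in virResctrlAllocGetStatistic above: the patch keeps two parallel unsigned int arrays and swaps entries through int temporaries after every append. A possible alternative, shown here only as a sketch, is to collect id/value pairs and sort once with qsort after the directory walk. The struct bank_stat type and both function names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pairing of a cache bank id with the value read for it. */
struct bank_stat {
    unsigned int id;
    unsigned int val;
};

/* qsort comparator: ascending by bank id. */
static int
bank_stat_cmp(const void *a, const void *b)
{
    const struct bank_stat *x = a;
    const struct bank_stat *y = b;

    /* Compare instead of subtracting: unsigned subtraction would wrap. */
    return (x->id > y->id) - (x->id < y->id);
}

/* Sort all collected readings once, after the directory walk is done. */
static void
bank_stats_sort(struct bank_stat *stats, size_t n)
{
    if (stats && n > 1)
        qsort(stats, n, sizeof(*stats), bank_stat_cmp);
}
```

Keeping each id next to its value means the two can never get out of step, and the temporaries no longer need a different signedness from the data they hold.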

On 09/10/2018 02:10 PM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 11:00 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 06/10] util: Introduce resctrl monitor for CMT
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
'virResctrlAllocMon' denotes a resctrl monitor reporting the resource consumption information.
This patch introduces the interfaces for the resctrl monitor.
Relationship of 'resctrl allocation' and 'resctrl monitor':
1. A resctrl monitor monitors the resources (cache or memory bandwidth) of a particular allocation.
2. A resctrl allocation may refer to the 'default' allocation if no dedicated resource 'control' is applied to it. The 'default' allocation gets the remaining resources that have not been allocated.
3. A resctrl monitor belongs to the 'default' allocation if no 'cachetune' is specified in the XML file.
4. One resctrl allocation may have several monitors. An allocation may also have no resctrl monitor associated with it.
Key data structures:
+ struct _virResctrlAllocMon {
+     char *id;
+     char *path;
+ };

  struct _virResctrlAlloc {
      virObject parent;

@@ -276,6 +289,12 @@ struct _virResctrlAlloc {
      virResctrlAllocMemBWPtr mem_bw;
+     virResctrlAllocMonPtr *monitors;
+     size_t nmonitors;
  }
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/libvirt_private.syms |   6 +
 src/util/virresctrl.c    | 361 ++++++++++++++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h    |  31 ++++
 3 files changed, 394 insertions(+), 4 deletions(-)
Similar to the previous patch - there's a bit too much going on for just one patch here.
Introducing "default_group" and "monitors". This needs some separation.
Will split this patch into several patches. The 'default_group' will be a separate patch.
I am not going to look at this patch. I think you really need to describe this default_group concept quite a bit more as you'd be totally changing the meaning of alloc->path by "removing" everything after the SYSFS_RESCTRL_PATH. What's the purpose of alloc->path then in this environment?
This is a bug. The 'default_group' must not be removed by libvirt. It is created by the resource control file system at mount time and can only be removed by the system when the file system is unmounted. In the next version, virResctrlAllocRemove will check whether it is about to remove the root directory /sys/fs/resctrl and, if so, report an error.
Like I've said before - just be sure to properly separate things. Fixing existing issues in the middle of new code changes is just one of those areas that makes reviewing difficult.
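The root-directory check proposed for virResctrlAllocRemove could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: resctrl_path_is_root and RESCTRL_ROOT are hypothetical names, and the comparison only normalizes trailing slashes (it does not resolve symlinks or relative paths).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mount point named in the thread; the macro name itself is hypothetical. */
#define RESCTRL_ROOT "/sys/fs/resctrl"

/* Hypothetical helper: true if 'path' names the resctrl root itself,
 * ignoring trailing slashes, so that a caller can refuse to rmdir() it. */
static bool
resctrl_path_is_root(const char *path)
{
    size_t len;

    if (!path)
        return false;

    /* Drop trailing slashes so "/sys/fs/resctrl/" matches as well. */
    len = strlen(path);
    while (len > 1 && path[len - 1] == '/')
        len--;

    return len == strlen(RESCTRL_ROOT) &&
           strncmp(path, RESCTRL_ROOT, len) == 0;
}
```

A removal routine would call this before rmdir() and report an error when it returns true; as requested in the review, such a fix would go into its own patch, separate from the monitor changes.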
The following paragraphs explain the 'default_group' and some relevant concepts. I'll also add this content to the cover letter of the patch series.
In much less detail I hope... Not sure I can properly page in the rest. I read it, but I certainly cannot internalize it. John
The resctrl default group is described in the kernel document 'intel_rdt_ui.txt', available at "https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt".
The default group is the root directory of the resource control file system, that is, the directory '/sys/fs/resctrl/'. The default group is created as soon as the resctrl file system is mounted; it owns all tasks and makes full use of all resources.
In these patches, the libvirt virresctrl 'default_group' is introduced by extending the scope of the current 'resource allocation'. virResctrlAlloc represents a 'resource allocation'. The 'default_group' is also called the 'default allocation'.
The 'default allocation' represents the root directory of the resource control file system. The main difference from a non-default allocation is that the 'default group' is created when the file system is mounted and owns all system tasks, while a non-default allocation has to be created through 'mkdir' and populated by explicitly writing PIDs into its tasks file.
The main purpose of introducing the 'default allocation' is to allow monitors to be created without a dedicated non-default allocation, which saves hardware resources.
Consider this case: a KVM virtual machine has 1 working vcpu, and we want to monitor the cache utilization of the vcpu without applying any resource limitation to the vcpu process. We have two possible solutions:
1. Create a new "resource allocation" by making a directory under /sys/fs/resctrl/, say /sys/fs/resctrl/p0, and put the host process PID of the vcpu into the file /sys/fs/resctrl/p0/tasks. Then get the cache utilization information through the per-cache llc_occupancy files under /sys/fs/resctrl/p0/mon_data/.
2. Do not create an extra allocation, but create a resource monitor under the default allocation by creating a sub-directory under '/sys/fs/resctrl/mon_groups', then put the vcpu PID into the tasks file in the monitor's sub-directory. Get the cache utilization information through this monitor's llc_occupancy file.
Compared with solution 1, solution 2 uses fewer hardware resources by saving one CLOSID. Normally there are many more RMIDs than CLOSIDs; for the E5-2699v4 CPU, the number of CLOSIDs is 16, while the number of RMIDs is 176.
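Solution 2 boils down to one mkdir and one write. The sketch below illustrates those two steps outside libvirt; mon_group_add_task is a hypothetical helper, and the root directory is a parameter so that the code can be exercised against a scratch directory instead of a real /sys/fs/resctrl mount.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create <root>/mon_groups/<name> and write 'pid' to
 * its tasks file. With root = "/sys/fs/resctrl" this mirrors solution 2;
 * any other root (for example a scratch directory) can be used for testing.
 * Returns 0 on success, -1 on failure. */
static int
mon_group_add_task(const char *root, const char *name, pid_t pid)
{
    char path[4096];
    FILE *fp;
    int n;

    n = snprintf(path, sizeof(path), "%s/mon_groups/%s", root, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* An already existing monitoring group is not treated as an error. */
    if (mkdir(path, 0755) != 0 && errno != EEXIST)
        return -1;

    n = snprintf(path, sizeof(path), "%s/mon_groups/%s/tasks", root, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fp = fopen(path, "w");
    if (!fp)
        return -1;

    /* One PID per write, terminated by a newline. */
    if (fprintf(fp, "%ld\n", (long)pid) < 0) {
        fclose(fp);
        return -1;
    }

    return fclose(fp) == 0 ? 0 : -1;
}
```

No directory is created directly under the root here, which is the point of solution 2: the task stays in the default group and, per the comparison above, no extra CLOSID is consumed.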
To support 'default_group' the domain's XML configuration file needs to be changed:
The 'default allocation' behaves similarly to the original libvirt-defined 'resource allocation' when creating a monitor group, getting resource consumption data from a monitor, and assigning tasks. The 'default allocation' can be regarded as a special 'resource allocation'.
Libvirt treats virResctrlAlloc (sometimes called a resource allocation) as the representation of a resctrl allocation. Each resctrl allocation is able to report the resource consumption of its tasks through the files 'llc_occupancy', 'mbm_total_bytes' and 'mbm_local_bytes' under the allocation's directory. virResctrlAllocMon (sometimes called a resource monitor) is introduced to represent virResctrlAlloc's role in resource consumption monitoring. One or more resource monitors can be created to monitor the resource utilization of a subset of the tasks of the current allocation. This explains why 'monitor' is placed under the 'cachetune' element in the domain's XML configuration file.
  <cputune>
    <cachetune vcpus='0-1'>
      <cache id='0' level='3' type='code' size='7680' unit='KiB'/>
      <cache id='1' level='3' type='data' size='3840' unit='KiB'/>
+     <monitor vcpus='0-1'/>
+     <monitor vcpus='0'/>
    </cachetune>
  </cputune>
The 'default_group' is the resource allocation shared by all system tasks that have no specific resource limitation. Under existing libvirt policy, no resource limitation may be placed on it, so I need to generate a configuration such as
  <cachetune vcpus='3'>
+   <monitor vcpus='3'/>
  </cachetune>
In the implementation, this monitor for the domain's vcpu 3 is created under the 'default allocation'. The default allocation is the directory /sys/fs/resctrl, which is set up at mount time.
This is the reason why the 'cache' element is changed to be optional in the RNG file.
  <element name="cachetune">
    <attribute name="vcpus">
      <ref name='cpuset'/>
    </attribute>
-   <oneOrMore>
+   <zeroOrMore>
      <element name="cache">
        <attribute name="id">
          <ref name='unsignedInt'/>
        </attribute>
        <attribute name="level">
          <ref name='unsignedInt'/>
        </attribute>
        <attribute name="type">
          <choice>
            <value>both</value>
            <value>code</value>
            <value>data</value>
          </choice>
        </attribute>
        <attribute name="size">
          <ref name='unsignedLong'/>
        </attribute>
        <optional>
          <attribute name='unit'>
            <ref name='unit'/>
          </attribute>
        </optional>
      </element>
-   </oneOrMore>
+   </zeroOrMore>
    <zeroOrMore>
      <element name="monitor">
        <attribute name="vcpus">
          <ref name='cpuset'/>
        </attribute>
      </element>
    </zeroOrMore>
  </element>
Thanks for the review.
Huaqiang
[...]

Changed the interface from

    virDomainResctrlAppend(virDomainDefPtr def,
                           xmlNodePtr node,
                           virResctrlAllocPtr alloc,
                           virBitmapPtr vcpus,
                           unsigned int flags);

to

    virDomainResctrlAppend(virDomainDefPtr def,
                           xmlNodePtr node,
                           virDomainResctrlDefPtr resctrl,
                           unsigned int flags);

This change lets virDomainResctrlAppend pass more information through virDomainResctrlDefPtr, such as the monitoring groups associated with the allocation.

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/conf/domain_conf.c | 48 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bde9fef..9a65655 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19247,17 +19247,21 @@ virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, static int virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, - virResctrlAllocPtr alloc, - virBitmapPtr vcpus, + virDomainResctrlDefPtr resctrl, unsigned int flags) { char *vcpus_str = NULL; char *alloc_id = NULL; - virDomainResctrlDefPtr tmp_resctrl = NULL; + virResctrlAllocPtr alloc = NULL; + virBitmapPtr vcpus = NULL; + int ret = -1; - if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + if (!resctrl) + return -1; + + alloc = virObjectRef(resctrl->alloc); + vcpus = resctrl->vcpus; /* We need to format it back because we need to be consistent in the naming * even when users specify some "sub-optimal" string there. 
*/ @@ -19281,15 +19285,12 @@ virDomainResctrlAppend(virDomainDefPtr def, if (virResctrlAllocSetID(alloc, alloc_id) < 0) goto cleanup; - tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = alloc; - - if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, tmp_resctrl) < 0) + if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, resctrl) < 0) goto cleanup; ret = 0; cleanup: - virDomainResctrlDefFree(tmp_resctrl); + virObjectUnref(alloc); VIR_FREE(alloc_id); VIR_FREE(vcpus_str); return ret; @@ -19306,6 +19307,8 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; + ssize_t i = 0; int n; int ret = -1; @@ -19349,15 +19352,24 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; } - if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) goto cleanup; - vcpus = NULL; + + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc); + + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) + goto cleanup; + alloc = NULL; + vcpus = NULL; + tmp_resctrl = NULL; ret = 0; cleanup: ctxt->node = oldnode; virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); virBitmapFree(vcpus); VIR_FREE(nodes); return ret; @@ -19514,6 +19526,7 @@ virDomainMemorytuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; ssize_t i = 0; int n; int ret = -1; @@ -19560,17 +19573,24 @@ virDomainMemorytuneDefParse(virDomainDefPtr def, * just update the existing alloc information, which is done in above * virDomainMemorytuneDefParseMemory */ if (new_alloc) { - if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) + goto cleanup; + + tmp_resctrl->alloc = virObjectRef(alloc); + tmp_resctrl->vcpus = vcpus; + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) 
goto cleanup; vcpus = NULL; alloc = NULL; + tmp_resctrl = NULL; } ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); virBitmapFree(vcpus); + virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); VIR_FREE(nodes); return ret; } -- 2.7.4

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Changed the interface from virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, virResctrlAllocPtr alloc, virBitmapPtr vcpus, unsigned int flags); to virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, virDomainResctrlDefPtr resctrl, unsigned int flags);
This change lets virDomainResctrlAppend pass through more information via virDomainResctrlDefPtr, such as the monitoring groups associated with the allocation.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/conf/domain_conf.c | 48 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bde9fef..9a65655 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19247,17 +19247,21 @@ virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, static int virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, - virResctrlAllocPtr alloc, - virBitmapPtr vcpus, + virDomainResctrlDefPtr resctrl, unsigned int flags) { char *vcpus_str = NULL; char *alloc_id = NULL; - virDomainResctrlDefPtr tmp_resctrl = NULL; + virResctrlAllocPtr alloc = NULL; + virBitmapPtr vcpus = NULL;
No need for locals here - just change to resctrl->{alloc|vcpus}
+ int ret = -1;
- if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + if (!resctrl) + return -1;
Again, here we have a programming error without an error message, which results in the generic libvirt "an error occurred" message. Either create a specific error message or "assume" that your caller has done the right thing.
+ + alloc = virObjectRef(resctrl->alloc);
Yikes, how many Ref's are we taking on this? [1] I don't think this is necessary since we Unref later and currently both callers do the Ref
+ vcpus = resctrl->vcpus;
/* We need to format it back because we need to be consistent in the naming * even when users specify some "sub-optimal" string there. */ @@ -19281,15 +19285,12 @@ virDomainResctrlAppend(virDomainDefPtr def, if (virResctrlAllocSetID(alloc, alloc_id) < 0) goto cleanup;
- tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = alloc; - - if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, tmp_resctrl) < 0) + if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, resctrl) < 0) goto cleanup;
ret = 0; cleanup: - virDomainResctrlDefFree(tmp_resctrl); + virObjectUnref(alloc); VIR_FREE(alloc_id); VIR_FREE(vcpus_str); return ret; @@ -19306,6 +19307,8 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; + ssize_t i = 0; int n; int ret = -1; @@ -19349,15 +19352,24 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; }
- if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) goto cleanup; - vcpus = NULL; + + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc);
[1] Seems the called function also takes a Ref?
+ + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) + goto cleanup; + alloc = NULL; + vcpus = NULL; + tmp_resctrl = NULL;
This sequence is quite similar to [2] ...
ret = 0; cleanup: ctxt->node = oldnode; virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); virBitmapFree(vcpus); VIR_FREE(nodes); return ret; @@ -19514,6 +19526,7 @@ virDomainMemorytuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; ssize_t i = 0; int n; int ret = -1; @@ -19560,17 +19573,24 @@ virDomainMemorytuneDefParse(virDomainDefPtr def, * just update the existing alloc information, which is done in above * virDomainMemorytuneDefParseMemory */ if (new_alloc) { - if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) + goto cleanup; + + tmp_resctrl->alloc = virObjectRef(alloc);
[1] Seems the called function also takes a Ref?
+ tmp_resctrl->vcpus = vcpus; + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) goto cleanup; vcpus = NULL; alloc = NULL; + tmp_resctrl = NULL;
[2] ... this sequence

It seems to me you could create a helper:

    virDomainResctrlCreate(def, node, alloc, vcpus, flags)

which could:

    if (VIR_ALLOC(resctrl) < 0)
        return -1;

    resctrl->alloc = alloc;
    resctrl->vcpus = vcpus;
    if (virDomainResctrlAppend(def, node, resctrl, flags) < 0) {
        VIR_FREE(resctrl);
        return -1;
    }

    virObjectRef(alloc);
    return 0;

with the current callers just changing from Append to Create, keeping their alloc = NULL and vcpus = NULL on success.

John
}
ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); virBitmapFree(vcpus); + virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); VIR_FREE(nodes); return ret; }
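
The helper suggested in [2] can be modelled outside libvirt roughly as follows. This is only a sketch under stated assumptions: Alloc, Entry, List, list_append and entry_create are hypothetical stand-ins for virResctrlAlloc, virDomainResctrlDef, def->resctrls, virDomainResctrlAppend and the proposed virDomainResctrlCreate, and the fixed-capacity list exists only so that a failed append is easy to provoke. It is not libvirt code. The point it illustrates is the ordering: the wrapper alone is freed on failure, and the extra reference is taken only after the append has succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted allocation, standing in for virResctrlAlloc. */
typedef struct {
    int refs;
} Alloc;

/* Hypothetical wrapper, standing in for virDomainResctrlDef. */
typedef struct {
    Alloc *alloc;
    int *vcpus;              /* stand-in for a vcpu bitmap */
} Entry;

/* Hypothetical fixed-capacity list, standing in for def->resctrls. */
#define LIST_CAP 2
typedef struct {
    Entry *items[LIST_CAP];
    size_t n;
} List;

/* Append fails when the list is full; the entry is untouched on failure. */
int list_append(List *list, Entry *entry)
{
    if (list->n == LIST_CAP)
        return -1;
    list->items[list->n++] = entry;
    return 0;
}

/*
 * Wrap alloc and vcpus in a new Entry and append it to the list.
 * On failure only the wrapper is freed, so the caller still owns both
 * arguments. On success the list holds vcpus and its own reference
 * to alloc.
 */
int entry_create(List *list, Alloc *alloc, int *vcpus)
{
    Entry *entry;

    assert(list && alloc);

    entry = calloc(1, sizeof(*entry));
    if (!entry)
        return -1;

    entry->alloc = alloc;
    entry->vcpus = vcpus;

    if (list_append(list, entry) < 0) {
        free(entry);         /* free the wrapper only, not its members */
        return -1;
    }

    alloc->refs++;           /* reference taken only after success */
    return 0;
}
```

With this shape a caller needs no temporary wrapper of its own, and its cleanup path stays the same whether or not the helper failed.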

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 11:49 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 07/10] conf: refactor virDomainResctrlAppend
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Changed the interface from virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, virResctrlAllocPtr alloc, virBitmapPtr vcpus, unsigned int flags); to virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, virDomainResctrlDefPtr resctrl, unsigned int flags);
This change lets virDomainResctrlAppend pass through more information via virDomainResctrlDefPtr, such as the monitoring groups associated with the allocation.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/conf/domain_conf.c | 48 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bde9fef..9a65655 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19247,17 +19247,21 @@ virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, static int virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, - virResctrlAllocPtr alloc, - virBitmapPtr vcpus, + virDomainResctrlDefPtr resctrl, unsigned int flags) { char *vcpus_str = NULL; char *alloc_id = NULL; - virDomainResctrlDefPtr tmp_resctrl = NULL; + virResctrlAllocPtr alloc = NULL; + virBitmapPtr vcpus = NULL;
No need for locals here - just change to resctrl->{alloc|vcpus}
The local variables 'alloc' and 'vcpus' will be removed.
+ int ret = -1;
- if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + if (!resctrl) + return -1;
Again, here we have a programming error without an error message, which results in the generic libvirt "an error occurred" message. Either create a specific error message or "assume" that your caller has done the right thing.
This is a static function, the caller will ensure its safety. How about removing these two lines?
+ + alloc = virObjectRef(resctrl->alloc);
Yikes, how many Ref's are we taking on this? [1]
I don't think this is necessary since we Unref later and currently both callers do the Ref
I will remove this local variable along with this Ref, but the callers' Ref/Unref will remain. Thanks.
+ vcpus = resctrl->vcpus;
/* We need to format it back because we need to be consistent in the naming * even when users specify some "sub-optimal" string there. */ @@ -19281,15 +19285,12 @@ virDomainResctrlAppend(virDomainDefPtr def, if (virResctrlAllocSetID(alloc, alloc_id) < 0) goto cleanup;
- tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = alloc; - - if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, tmp_resctrl) < 0) + if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, resctrl) < 0) goto cleanup;
ret = 0; cleanup: - virDomainResctrlDefFree(tmp_resctrl); + virObjectUnref(alloc); VIR_FREE(alloc_id); VIR_FREE(vcpus_str); return ret; @@ -19306,6 +19307,8 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; + ssize_t i = 0; int n; int ret = -1; @@ -19349,15 +19352,24 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; }
- if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) goto cleanup; - vcpus = NULL; + + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc);
[1] Seems the called function also takes a Ref?
Yes. virDomainResctrlAppend takes a Ref and the caller takes another one; it is only necessary to take a Ref at the caller level, right? An active Ref on an object ensures that the object's memory will not be released, right? Anyway, the local 'alloc' will be removed along with its Ref, but the callers' Ref/Unref will be kept.
+ + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) + goto cleanup; + alloc = NULL; + vcpus = NULL; + tmp_resctrl = NULL;
This sequence is quite similar to [2] ...
ret = 0; cleanup: ctxt->node = oldnode; virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); virBitmapFree(vcpus); VIR_FREE(nodes); return ret; @@ -19514,6 +19526,7 @@
virDomainMemorytuneDefParse(virDomainDefPtr def,
xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; ssize_t i = 0; int n; int ret = -1; @@ -19560,17 +19573,24 @@
virDomainMemorytuneDefParse(virDomainDefPtr def,
* just update the existing alloc information, which is done in above * virDomainMemorytuneDefParseMemory */ if (new_alloc) { - if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) + goto cleanup; + + tmp_resctrl->alloc = virObjectRef(alloc);
[1] Seems the called function also takes a Ref?
+ tmp_resctrl->vcpus = vcpus; + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) goto cleanup; vcpus = NULL; alloc = NULL; + tmp_resctrl = NULL;
[2] ... this sequence
It seems to me you could create helper :
virDomainResctrlCreate(def, node, alloc, vcpus, flags)
which could :
if (VIR_ALLOC(resctrl) < 0) return -1;
resctrl->alloc = alloc; resctrl->vcpus = vcpus; if (virDomainResctrlAppend(def, node, resctrl, flags) < 0) { VIR_FREE(resctrl); return -1; }
virObjectRef(alloc); return 0;
with the current callers just changing from Append to Create keeping their alloc = NULL and vcpus = NULL on success.
Agree. Thanks for the sample code. In a later patch, some code for parsing the monitor configuration is added; I will also move that code into this new helper. I will create virDomainResctrlCreate to remove the code duplication. Thanks for the review. Huaqiang
John
}
ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); virBitmapFree(vcpus); + virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); VIR_FREE(nodes); return ret; }

On 09/10/2018 02:13 PM, Wang, Huaqiang wrote:
-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Wednesday, September 5, 2018 11:49 PM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 07/10] conf: refactor virDomainResctrlAppend
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Changed the interface from virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, virResctrlAllocPtr alloc, virBitmapPtr vcpus, unsigned int flags); to virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, virDomainResctrlDefPtr resctrl, unsigned int flags);
This change lets virDomainResctrlAppend pass through more information via virDomainResctrlDefPtr, such as the monitoring groups associated with the allocation.
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/conf/domain_conf.c | 48 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bde9fef..9a65655 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19247,17 +19247,21 @@ virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, static int virDomainResctrlAppend(virDomainDefPtr def, xmlNodePtr node, - virResctrlAllocPtr alloc, - virBitmapPtr vcpus, + virDomainResctrlDefPtr resctrl, unsigned int flags) { char *vcpus_str = NULL; char *alloc_id = NULL; - virDomainResctrlDefPtr tmp_resctrl = NULL; + virResctrlAllocPtr alloc = NULL; + virBitmapPtr vcpus = NULL;
No need for locals here - just change to resctrl->{alloc|vcpus}
The local variables 'alloc' and 'vcpus' will be removed.
+ int ret = -1;
- if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + if (!resctrl) + return -1;
Again, here we have a programming error without an error message, which results in the generic libvirt "an error occurred" message. Either create a specific error message or "assume" that your caller has done the right thing.
This is a static function, the caller will ensure its safety. How about removing these two lines?
Seems reasonable...
+ + alloc = virObjectRef(resctrl->alloc);
Yikes, how many Ref's are we taking on this? [1]
I don't think this is necessary since we Unref later and currently both callers do the Ref
I will remove this local variable along with this Ref, but the callers' Ref/Unref will remain. Thanks.
+ vcpus = resctrl->vcpus;
/* We need to format it back because we need to be consistent in the naming * even when users specify some "sub-optimal" string there. */ @@ -19281,15 +19285,12 @@ virDomainResctrlAppend(virDomainDefPtr def, if (virResctrlAllocSetID(alloc, alloc_id) < 0) goto cleanup;
- tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = alloc; - - if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, tmp_resctrl) < 0) + if (VIR_APPEND_ELEMENT(def->resctrls, def->nresctrls, resctrl) < 0) goto cleanup;
ret = 0; cleanup: - virDomainResctrlDefFree(tmp_resctrl); + virObjectUnref(alloc); VIR_FREE(alloc_id); VIR_FREE(vcpus_str); return ret; @@ -19306,6 +19307,8 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; + ssize_t i = 0; int n; int ret = -1; @@ -19349,15 +19352,24 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; }
- if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) goto cleanup; - vcpus = NULL; + + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc);
[1] Seems the called function also takes a Ref?
Yes. virDomainResctrlAppend takes a Ref and the caller takes another one; it is only necessary to take a Ref at the caller level, right? An active Ref on an object ensures that the object's memory will not be released, right? Anyway, the local 'alloc' will be removed along with its Ref, but the callers' Ref/Unref will be kept.
When you place some object into a second structure and that structure is successfully placed into a list that would be Free'd at a different point in time than its initial "parent", then increase the refcnt. Each Free routine would then Unref the object, indicating it is done using it. John
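
To make the rule above concrete, here is a minimal, self-contained sketch of it in plain C. Obj and Holder are hypothetical stand-ins (not libvirt types) for a refcounted object such as the resctrl alloc and for a second structure that stores it; obj_ref and obj_unref only imitate what virObjectRef and virObjectUnref do. The holder takes its own reference when it stores the pointer, and its Free routine drops exactly that reference, so the object is released only when the last user is done.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object (stand-in, not a libvirt type). */
typedef struct {
    int refs;
} Obj;

/* Hypothetical second structure that keeps a pointer to the object. */
typedef struct {
    Obj *obj;
} Holder;

/* The creator owns the first reference. */
Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));

    if (o)
        o->refs = 1;
    return o;
}

Obj *obj_ref(Obj *o)
{
    o->refs++;
    return o;
}

/* Drop one reference, free at zero, return the remaining count. */
int obj_unref(Obj *o)
{
    int left;

    if (!o)
        return 0;
    assert(o->refs > 0);
    left = --o->refs;
    if (left == 0)
        free(o);
    return left;
}

/* Storing the object in a second structure takes a second reference. */
Holder *holder_new(Obj *o)
{
    Holder *h = calloc(1, sizeof(*h));

    if (h)
        h->obj = obj_ref(o);
    return h;
}

/* The holder's Free routine drops only the reference it took. */
int holder_free(Holder *h)
{
    int left;

    if (!h)
        return 0;
    left = obj_unref(h->obj);
    free(h);
    return left;
}
```

Here the original owner and the holder can be freed in either order without the object disappearing underneath the other.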
+ + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) + goto cleanup; + alloc = NULL; + vcpus = NULL; + tmp_resctrl = NULL;
This sequence is quite similar to [2] ...
ret = 0; cleanup: ctxt->node = oldnode; virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); virBitmapFree(vcpus); VIR_FREE(nodes); return ret; @@ -19514,6 +19526,7 @@
virDomainMemorytuneDefParse(virDomainDefPtr def,
xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = NULL; + virDomainResctrlDefPtr tmp_resctrl = NULL; ssize_t i = 0; int n; int ret = -1; @@ -19560,17 +19573,24 @@
virDomainMemorytuneDefParse(virDomainDefPtr def,
* just update the existing alloc information, which is done in above * virDomainMemorytuneDefParseMemory */ if (new_alloc) { - if (virDomainResctrlAppend(def, node, alloc, vcpus, flags) < 0) + if (VIR_ALLOC(tmp_resctrl) < 0) + goto cleanup; + + tmp_resctrl->alloc = virObjectRef(alloc);
[1] Seems the called function also takes a Ref?
+ tmp_resctrl->vcpus = vcpus; + if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) goto cleanup; vcpus = NULL; alloc = NULL; + tmp_resctrl = NULL;
[2] ... this sequence
It seems to me you could create helper :
virDomainResctrlCreate(def, node, alloc, vcpus, flags)
which could :
if (VIR_ALLOC(resctrl) < 0) return -1;
resctrl->alloc = alloc; resctrl->vcpus = vcpus; if (virDomainResctrlAppend(def, node, resctrl, flags) < 0) { VIR_FREE(resctrl); return -1; }
virObjectRef(alloc); return 0;
with the current callers just changing from Append to Create keeping their alloc = NULL and vcpus = NULL on success.
Agree. Thanks for the sample code. In a later patch, some code for parsing the monitor configuration is added; I will also move that code into this new helper. I will create virDomainResctrlCreate to remove the code duplication.
Thanks for review. Huaqiang
John
}
ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); virBitmapFree(vcpus); + virObjectUnref(alloc); + VIR_FREE(tmp_resctrl); VIR_FREE(nodes); return ret; }

Introduce resource monitoring group in domain configuration file to support CPU cache monitoring technology (CMT). Domain rng file changes, supporting following types of resource monitoring group regarding the allocation regin it belongs to: 1. monitoring group that working for partial working thread of current allocation: e.g. "<monitor vcpus='0'/>" creates monitoring group special for vcpu '0' while an allocation group is created for vcpus of '0' *and* '1'. 2. monitoring group for whole vcpu set of current allocation: e.g. "<monitor vcpus='0-1'/>" creates monitoring group for all vcpus belonging to current allocation. 3. monitoring group for vcpu(s) that does not have dedicated allocation group: e.g. "<monitor vcpus='3'/>" creates a monitoring group but no resource control applied to it. <cputune> <cachetune vcpus='0-1'> <cache id='0' level='3' type='code' size='7680' unit='KiB'/> <cache id='1' level='3' type='data' size='3840' unit='KiB'/> + <monitor vcpus='0-1'/> + <monitor vcpus='0'/> </cachetune> <cachetune vcpus='2'> <cache id='1' level='3' type='code' size='6' unit='MiB'/> + <monitor vcpus='2'/> </cachetune> <cachetune vcpus='3'> + <monitor vcpus='3'/> </cachetune> </cputune> Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/formatdomain.html.in | 14 ++- docs/schemas/domaincommon.rng | 11 +- src/conf/domain_conf.c | 131 ++++++++++++++++++--- src/conf/domain_conf.h | 20 ++++ tests/genericxml2xmlindata/cachetune-cdp.xml | 2 + .../cachetune-colliding-monitors.xml | 36 ++++++ tests/genericxml2xmlindata/cachetune-small.xml | 1 + tests/genericxml2xmlindata/cachetune.xml | 3 + tests/genericxml2xmltest.c | 4 + 9 files changed, 204 insertions(+), 18 deletions(-) create mode 100644 tests/genericxml2xmlindata/cachetune-colliding-monitors.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 0cbf570..33d2890 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -758,6 +758,7 @@ <cachetune vcpus='0-3'> <cache 
id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-1'/> </cachetune> <memorytune vcpus='0-3'> <node id='0' bandwidth='60'/> @@ -942,8 +943,8 @@ <dl> <dt><code>cache</code></dt> <dd> - This element controls the allocation of CPU cache and has the - following attributes: + This optional element controls the allocation of CPU cache and has + the following attributes: <dl> <dt><code>level</code></dt> <dd> @@ -977,6 +978,15 @@ </dd> </dl> </dd> + <dt><code>monitor</code></dt> + <dd> + The optional element <code>monitor</code> creates the cahce + monitoring group(s) for current cache allocation group. The required + attribute <code>vcpus</code> specifies to which vCPUs this + monitoring group applies. A vCPU can only be member of one + <code>cachetune</code> element allocation. And no overlap is + permitted. + </dd> </dl> </dd> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f176538..83fb9b7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -956,7 +956,7 @@ <attribute name="vcpus"> <ref name='cpuset'/> </attribute> - <oneOrMore> + <zeroOrMore> <element name="cache"> <attribute name="id"> <ref name='unsignedInt'/> @@ -980,7 +980,14 @@ </attribute> </optional> </element> - </oneOrMore> + </zeroOrMore> + <zeroOrMore> + <element name="monitor"> + <attribute name="vcpus"> + <ref name='cpuset'/> + </attribute> + </element> + </zeroOrMore> </element> </zeroOrMore> <zeroOrMore> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9a65655..304a94e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2969,13 +2969,30 @@ virDomainLoaderDefFree(virDomainLoaderDefPtr loader) static void +virDomainResctrlMonFree(virDomainResctrlMonitorPtr monitor) +{ + if (!monitor) + return; + + VIR_FREE(monitor->id); + virBitmapFree(monitor->vcpus); + VIR_FREE(monitor); +} + + +static void 
virDomainResctrlDefFree(virDomainResctrlDefPtr resctrl) { + size_t i = 0; + if (!resctrl) return; virObjectUnref(resctrl->alloc); virBitmapFree(resctrl->vcpus); + for (i = 0; i < resctrl->nmonitors; i++) + virDomainResctrlMonFree(resctrl->monitors[i]); + VIR_FREE(resctrl->monitors); VIR_FREE(resctrl); } @@ -19298,6 +19315,71 @@ virDomainResctrlAppend(virDomainDefPtr def, static int +virDomainResctrlParseMonitor(virDomainDefPtr def, + xmlXPathContextPtr ctxt, + xmlNodePtr node, + virDomainResctrlDefPtr resctrl) +{ + xmlNodePtr oldnode = ctxt->node; + virBitmapPtr vcpus = NULL; + char *id = NULL; + int vcpu = -1; + char *vcpus_str = NULL; + virDomainResctrlMonitorPtr tmp_domresmon = NULL; + int ret = -1; + + if (!resctrl || !resctrl->vcpus || !resctrl->alloc) + return -1; + + ctxt->node = node; + + if (VIR_ALLOC(tmp_domresmon) < 0) + goto cleanup; + + if (virDomainResctrlParseVcpus(def, node, &vcpus) < 0) + goto cleanup; + + /* empty monitoring group is not allowed */ + if (virBitmapIsAllClear(vcpus)) + goto cleanup; + + while ((vcpu = virBitmapNextSetBit(vcpus, vcpu)) >= 0) { + if (!virBitmapIsBitSet(resctrl->vcpus, vcpu)) + goto cleanup; + } + + vcpus_str = virBitmapFormat(vcpus); + if (!vcpus_str) + goto cleanup; + + if (virAsprintf(&id, "vcpus_%s", vcpus_str) < 0) + goto cleanup; + + if (VIR_STRDUP(tmp_domresmon->id, id) < 0) + goto cleanup; + + tmp_domresmon->vcpus = vcpus; + + if (VIR_APPEND_ELEMENT(resctrl->monitors, + resctrl->nmonitors, + tmp_domresmon) < 0) + goto cleanup; + + if (virResctrlAllocSetMonitor(resctrl->alloc, id) < 0) + goto cleanup; + + tmp_domresmon = NULL; + ret = 0; + cleanup: + ctxt->node = oldnode; + VIR_FREE(id); + VIR_FREE(vcpus_str); + virDomainResctrlMonFree(tmp_domresmon); + return ret; +} + + +static int virDomainCachetuneDefParse(virDomainDefPtr def, xmlXPathContextPtr ctxt, xmlNodePtr node, @@ -19313,6 +19395,9 @@ virDomainCachetuneDefParse(virDomainDefPtr def, int n; int ret = -1; + if (VIR_ALLOC(tmp_resctrl) < 0) + return -1; + 
ctxt->node = node; if (virDomainResctrlParseVcpus(def, node, &vcpus) < 0) @@ -19347,30 +19432,40 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; } - if (virResctrlAllocIsEmpty(alloc)) { - ret = 0; + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc); + + VIR_FREE(nodes); + ctxt->node = node; + + if ((n = virXPathNodeSet("./monitor", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot extract monitor nodes under cachetune")); goto cleanup; } - if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + for (i = 0; i < n; i++) { + if (virDomainResctrlParseMonitor(def, ctxt, + nodes[i], tmp_resctrl) < 0) - tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = virObjectRef(alloc); + goto cleanup; + } + + if (virResctrlAllocIsEmpty(alloc)) { + VIR_WARN("cachetune: resctrl alloc is empty"); + ret = 0; + goto cleanup; + } if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) goto cleanup; - alloc = NULL; - vcpus = NULL; tmp_resctrl = NULL; ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); - VIR_FREE(tmp_resctrl); - virBitmapFree(vcpus); + virDomainResctrlDefFree(tmp_resctrl); VIR_FREE(nodes); return ret; } @@ -19588,10 +19683,8 @@ virDomainMemorytuneDefParse(virDomainDefPtr def, ret = 0; cleanup: ctxt->node = oldnode; - virBitmapFree(vcpus); - virObjectUnref(alloc); - VIR_FREE(tmp_resctrl); VIR_FREE(nodes); + virDomainResctrlDefFree(tmp_resctrl); return ret; } @@ -27394,6 +27487,7 @@ virDomainCachetuneDefFormat(virBufferPtr buf, { virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; char *vcpus = NULL; + size_t i = 0; int ret = -1; virBufferSetChildIndent(&childrenBuf, buf); @@ -27405,6 +27499,15 @@ virDomainCachetuneDefFormat(virBufferPtr buf, if (virBufferCheckError(&childrenBuf) < 0) goto cleanup; + for (i = 0; i < resctrl->nmonitors; i++) { + vcpus = virBitmapFormat(resctrl->monitors[i]->vcpus); + if (!vcpus) + goto cleanup; + + virBufferAsprintf(&childrenBuf, "<monitor vcpus='%s'/>\n", vcpus); + 
VIR_FREE(vcpus); + } + if (!virBufferUse(&childrenBuf)) { ret = 0; goto cleanup; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index c0ad072..797b4bd 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2235,12 +2235,31 @@ struct _virDomainCputune { }; +typedef enum { + VIR_DOMAIN_RESCTRL_MONITOR_CACHE, + VIR_DOMAIN_RESCTRL_MONITOR_MEMBW, + VIR_DOMAIN_RESCTRL_MONITOR_CACHE_MEMBW, + + VIR_DOMAIN_RESCTRL_MONITOR_LAST +} virDomainResctrlMonType; + +typedef struct _virDomainResctrlMonitor virDomainResctrlMonitor; +typedef virDomainResctrlMonitor *virDomainResctrlMonitorPtr; +struct _virDomainResctrlMonitor { + int type; /* virDomainResctrlMonType*/ + char *id; + virBitmapPtr vcpus; +}; + + typedef struct _virDomainResctrlDef virDomainResctrlDef; typedef virDomainResctrlDef *virDomainResctrlDefPtr; struct _virDomainResctrlDef { virBitmapPtr vcpus; virResctrlAllocPtr alloc; + virDomainResctrlMonitorPtr *monitors; + size_t nmonitors; }; @@ -3455,6 +3474,7 @@ VIR_ENUM_DECL(virDomainIOMMUModel) VIR_ENUM_DECL(virDomainVsockModel) VIR_ENUM_DECL(virDomainShmemModel) VIR_ENUM_DECL(virDomainLaunchSecurity) +VIR_ENUM_DECL(virDomainResctrlMonType) /* from libvirt.h */ VIR_ENUM_DECL(virDomainState) VIR_ENUM_DECL(virDomainNostateReason) diff --git a/tests/genericxml2xmlindata/cachetune-cdp.xml b/tests/genericxml2xmlindata/cachetune-cdp.xml index 9718f06..b257fd5 100644 --- a/tests/genericxml2xmlindata/cachetune-cdp.xml +++ b/tests/genericxml2xmlindata/cachetune-cdp.xml @@ -8,9 +8,11 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' type='code' size='7680' unit='KiB'/> <cache id='1' level='3' type='data' size='3840' unit='KiB'/> + <monitor vcpus='0-1'/> </cachetune> <cachetune vcpus='2'> <cache id='1' level='3' type='code' size='6' unit='MiB'/> + <monitor vcpus='2'/> </cachetune> <cachetune vcpus='3'> <cache id='1' level='3' type='data' size='6912' unit='KiB'/> diff --git a/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml 
b/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml new file mode 100644 index 0000000..7526070 --- /dev/null +++ b/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml @@ -0,0 +1,36 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-2'/> + <monitor vcpus='0'/> + </cachetune> + <cachetune vcpus='3'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='3'/> + </cachetune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/cachetune-small.xml b/tests/genericxml2xmlindata/cachetune-small.xml index ab2d9cf..aa7b2c3 100644 --- a/tests/genericxml2xmlindata/cachetune-small.xml +++ b/tests/genericxml2xmlindata/cachetune-small.xml @@ -7,6 +7,7 @@ <cputune> <cachetune vcpus='0-1'> <cache id='0' level='3' type='both' size='768' unit='KiB'/> + <monitor vcpus='0-1'/> </cachetune> </cputune> <os> diff --git a/tests/genericxml2xmlindata/cachetune.xml b/tests/genericxml2xmlindata/cachetune.xml index 645cab7..52e95bc 100644 --- a/tests/genericxml2xmlindata/cachetune.xml +++ b/tests/genericxml2xmlindata/cachetune.xml @@ -8,9 +8,12 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' 
type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-1'/> + <monitor vcpus='0'/> </cachetune> <cachetune vcpus='3'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='3'/> </cachetune> </cputune> <os> diff --git a/tests/genericxml2xmltest.c b/tests/genericxml2xmltest.c index e6d4ef2..bc2fc50 100644 --- a/tests/genericxml2xmltest.c +++ b/tests/genericxml2xmltest.c @@ -140,11 +140,15 @@ mymain(void) TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("cachetune-colliding-types", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST_FULL("cachetune-colliding-monitors", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST("memorytune"); DO_TEST_FULL("memorytune-colliding-allocs", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("memorytune-colliding-cachetune", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST_FULL("cachetune-colliding-monitors", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST("tseg"); -- 2.7.4

On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Introduce resource monitoring group in domain configuration file to support CPU cache monitoring technology (CMT).
The domain RNG schema changes to support the following types of resource monitoring group, depending on the allocation region each belongs to:
1. A monitoring group covering only some of the vCPUs of the current allocation: e.g. "<monitor vcpus='0'/>" creates a monitoring group specifically for vcpu '0' while the allocation group is created for vcpus '0' *and* '1'.
2. A monitoring group for the whole vcpu set of the current allocation: e.g. "<monitor vcpus='0-1'/>" creates a monitoring group for all vcpus belonging to the current allocation.
3. A monitoring group for vcpu(s) that do not have a dedicated allocation group: e.g. "<monitor vcpus='3'/>" creates a monitoring group with no resource control applied to it.
<cputune> <cachetune vcpus='0-1'> <cache id='0' level='3' type='code' size='7680' unit='KiB'/> <cache id='1' level='3' type='data' size='3840' unit='KiB'/> + <monitor vcpus='0-1'/> + <monitor vcpus='0'/> </cachetune> <cachetune vcpus='2'> <cache id='1' level='3' type='code' size='6' unit='MiB'/> + <monitor vcpus='2'/> </cachetune> <cachetune vcpus='3'> + <monitor vcpus='3'/> </cachetune> </cputune>
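
The membership rule behind these three cases (a monitor must be non-empty and may only name vCPUs of its enclosing <cachetune>) can be sketched in a few lines of C. This is an illustration under assumptions, not the patch's code: a 64-bit mask stands in for virBitmap, and vcpu_range and monitor_vcpus_valid are hypothetical names. The patch itself performs the check by iterating over the set bits of the monitor bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with bits first..last set (hypothetical bitmap stand-in). */
uint64_t vcpu_range(unsigned int first, unsigned int last)
{
    uint64_t mask = 0;
    unsigned int i;

    assert(first <= last && last < 64);
    for (i = first; i <= last; i++)
        mask |= (uint64_t)1 << i;
    return mask;
}

/*
 * A monitor is acceptable when it names at least one vCPU and every
 * vCPU it names also belongs to the enclosing allocation's vCPU set.
 */
bool monitor_vcpus_valid(uint64_t alloc_vcpus, uint64_t monitor_vcpus)
{
    if (monitor_vcpus == 0)
        return false;                      /* empty group is not allowed */
    return (monitor_vcpus & ~alloc_vcpus) == 0;
}
```

Under this rule "<monitor vcpus='0-2'/>" inside "<cachetune vcpus='0-1'>" is rejected, which matches the cachetune-colliding-monitors.xml test case being expected to fail parsing.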
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/formatdomain.html.in | 14 ++- docs/schemas/domaincommon.rng | 11 +- src/conf/domain_conf.c | 131 ++++++++++++++++++--- src/conf/domain_conf.h | 20 ++++ tests/genericxml2xmlindata/cachetune-cdp.xml | 2 + .../cachetune-colliding-monitors.xml | 36 ++++++ tests/genericxml2xmlindata/cachetune-small.xml | 1 + tests/genericxml2xmlindata/cachetune.xml | 3 + tests/genericxml2xmltest.c | 4 + 9 files changed, 204 insertions(+), 18 deletions(-) create mode 100644 tests/genericxml2xmlindata/cachetune-colliding-monitors.xml
Getting more difficult to keep these changes and my suggested alterations in the same context.
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 0cbf570..33d2890 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -758,6 +758,7 @@ <cachetune vcpus='0-3'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-1'/>
Interesting that for domain <monitor> is at the same level (or child relationship) as <cache>, but for the capabilities it was a child of the <bank> which honestly is confusing.
</cachetune> <memorytune vcpus='0-3'> <node id='0' bandwidth='60'/> @@ -942,8 +943,8 @@ <dl> <dt><code>cache</code></dt> <dd> - This element controls the allocation of CPU cache and has the - following attributes: + This optional element controls the allocation of CPU cache and has + the following attributes:
So <cache> is optional now?! That needs to be separate.
<dl> <dt><code>level</code></dt> <dd> @@ -977,6 +978,15 @@ </dd> </dl> </dd> + <dt><code>monitor</code></dt> + <dd> + The optional element <code>monitor</code> creates the cahce
cache
+ monitoring group(s) for current cache allocation group. The required + attribute <code>vcpus</code> specifies to which vCPUs this + monitoring group applies. A vCPU can only be member of one + <code>cachetune</code> element allocation. And no overlap is + permitted.
And it only works for L3 <cache>'s right?
+ </dd> </dl> </dd>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f176538..83fb9b7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -956,7 +956,7 @@ <attribute name="vcpus"> <ref name='cpuset'/> </attribute> - <oneOrMore> + <zeroOrMore>
!! Needs to be separate
<element name="cache"> <attribute name="id"> <ref name='unsignedInt'/> @@ -980,7 +980,14 @@ </attribute> </optional> </element> - </oneOrMore> + </zeroOrMore> + <zeroOrMore> + <element name="monitor"> + <attribute name="vcpus"> + <ref name='cpuset'/> + </attribute> + </element> + </zeroOrMore> </element> </zeroOrMore> <zeroOrMore> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9a65655..304a94e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2969,13 +2969,30 @@ virDomainLoaderDefFree(virDomainLoaderDefPtr loader)
static void +virDomainResctrlMonFree(virDomainResctrlMonitorPtr monitor) +{ + if (!monitor) + return; + + VIR_FREE(monitor->id); + virBitmapFree(monitor->vcpus); + VIR_FREE(monitor); +} + + +static void virDomainResctrlDefFree(virDomainResctrlDefPtr resctrl) { + size_t i = 0; + if (!resctrl) return;
virObjectUnref(resctrl->alloc); virBitmapFree(resctrl->vcpus); + for (i = 0; i < resctrl->nmonitors; i++) + virDomainResctrlMonFree(resctrl->monitors[i]); + VIR_FREE(resctrl->monitors); VIR_FREE(resctrl); }
@@ -19298,6 +19315,71 @@ virDomainResctrlAppend(virDomainDefPtr def,
static int +virDomainResctrlParseMonitor(virDomainDefPtr def, + xmlXPathContextPtr ctxt, + xmlNodePtr node, + virDomainResctrlDefPtr resctrl) +{ + xmlNodePtr oldnode = ctxt->node; + virBitmapPtr vcpus = NULL; + char *id = NULL; + int vcpu = -1; + char *vcpus_str = NULL; + virDomainResctrlMonitorPtr tmp_domresmon = NULL;
The "tmp_" prefix doesn't seem necessary...
+ int ret = -1; + + if (!resctrl || !resctrl->vcpus || !resctrl->alloc) + return -1; + + ctxt->node = node; + + if (VIR_ALLOC(tmp_domresmon) < 0) + goto cleanup;
We don't need/use this until ... [1]
+ + if (virDomainResctrlParseVcpus(def, node, &vcpus) < 0) + goto cleanup; + + /* empty monitoring group is not allowed */ + if (virBitmapIsAllClear(vcpus))
So we'll fail without an error? How is the consumer supposed to know that providing the empty set isn't valid?
+ goto cleanup; + + while ((vcpu = virBitmapNextSetBit(vcpus, vcpu)) >= 0) { + if (!virBitmapIsBitSet(resctrl->vcpus, vcpu))
Again fail without an error? How would someone know that what they've provided doesn't 'work' properly because the resctrl->vcpus doesn't have that vcpu in its list?
+ goto cleanup; + } + + vcpus_str = virBitmapFormat(vcpus); + if (!vcpus_str) + goto cleanup; +
[1] right about here
+ if (virAsprintf(&id, "vcpus_%s", vcpus_str) < 0) + goto cleanup; + + if (VIR_STRDUP(tmp_domresmon->id, id) < 0) + goto cleanup;
The two steps are unnecessary since @id is VIR_FREE'd anyway. Let's just: if (virAsprintf(&domresmon->id, "vcpus_%s", vcpus_str) < 0) goto cleanup;
+ + tmp_domresmon->vcpus = vcpus; + + if (VIR_APPEND_ELEMENT(resctrl->monitors, + resctrl->nmonitors, + tmp_domresmon) < 0) + goto cleanup; + + if (virResctrlAllocSetMonitor(resctrl->alloc, id) < 0) + goto cleanup; + + tmp_domresmon = NULL;
Shouldn't this go after VIR_APPEND_ELEMENT? otherwise we could end up in cleanup with it on resctrl->monitors *and* virDomainResctrlMonFree is called.
+ ret = 0; + cleanup: + ctxt->node = oldnode; + VIR_FREE(id); + VIR_FREE(vcpus_str); + virDomainResctrlMonFree(tmp_domresmon); + return ret; +} + + +static int virDomainCachetuneDefParse(virDomainDefPtr def, xmlXPathContextPtr ctxt, xmlNodePtr node, @@ -19313,6 +19395,9 @@ virDomainCachetuneDefParse(virDomainDefPtr def, int n; int ret = -1;
+ if (VIR_ALLOC(tmp_resctrl) < 0) + return -1; + ctxt->node = node;
if (virDomainResctrlParseVcpus(def, node, &vcpus) < 0) @@ -19347,30 +19432,40 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; }
- if (virResctrlAllocIsEmpty(alloc)) { - ret = 0; + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc); + + VIR_FREE(nodes); + ctxt->node = node; + + if ((n = virXPathNodeSet("./monitor", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot extract monitor nodes under cachetune")); goto cleanup; }
- if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + for (i = 0; i < n; i++) { + if (virDomainResctrlParseMonitor(def, ctxt, + nodes[i], tmp_resctrl) < 0)
Hmmm - something slightly different with this ordering which makes my previous patch comments not work as well.
- tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = virObjectRef(alloc); + goto cleanup; + } + + if (virResctrlAllocIsEmpty(alloc)) { + VIR_WARN("cachetune: resctrl alloc is empty"); + ret = 0; + goto cleanup; + }
So if I reconsider slightly my previous patch because now we need a trip through virDomainResctrlParseMonitor, we could have:

virDomainResctrlDefNew(alloc, vcpus):

    if (VIR_ALLOC(resctrl) < 0)
        return NULL;

    resctrl->alloc = virObjectRef(alloc);
    resctrl->vcpus = vcpus;
    return resctrl;

Back in the caller we have:

    if (!(resctrl = virDomainResctrlDefNew(alloc, vcpus)))
        goto cleanup;
    alloc = NULL;
    vcpus = NULL;

Then calling virDomainResctrlAppend using @resctrl:

    if (virDomainResctrlAppend(def, node, resctrl, flags) < 0)
        goto cleanup;
    resctrl = NULL;

    ...

 cleanup:
    ...
    virDomainResctrlDefFree(resctrl);

I think doing this gives the flexibility to this code to make that virDomainResctrlParseMonitor call before appending the new resctrl

There's so much changing now - I'm just going to stop here and see how things shake out in the next series.

One other note first though - in patch 10 in qemuDomainGetStatsCpuResource the "unsigned int nmonitor = NULL;" failed the compiler rather spectacularly...

John
if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) goto cleanup;
- alloc = NULL; - vcpus = NULL; tmp_resctrl = NULL;
ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); - VIR_FREE(tmp_resctrl); - virBitmapFree(vcpus); + virDomainResctrlDefFree(tmp_resctrl); VIR_FREE(nodes); return ret; } @@ -19588,10 +19683,8 @@ virDomainMemorytuneDefParse(virDomainDefPtr def, ret = 0; cleanup: ctxt->node = oldnode; - virBitmapFree(vcpus); - virObjectUnref(alloc); - VIR_FREE(tmp_resctrl); VIR_FREE(nodes); + virDomainResctrlDefFree(tmp_resctrl); return ret; }
@@ -27394,6 +27487,7 @@ virDomainCachetuneDefFormat(virBufferPtr buf, { virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; char *vcpus = NULL; + size_t i = 0; int ret = -1;
virBufferSetChildIndent(&childrenBuf, buf); @@ -27405,6 +27499,15 @@ virDomainCachetuneDefFormat(virBufferPtr buf, if (virBufferCheckError(&childrenBuf) < 0) goto cleanup;
+ for (i = 0; i < resctrl->nmonitors; i++) { + vcpus = virBitmapFormat(resctrl->monitors[i]->vcpus); + if (!vcpus) + goto cleanup; + + virBufferAsprintf(&childrenBuf, "<monitor vcpus='%s'/>\n", vcpus); + VIR_FREE(vcpus); + } + if (!virBufferUse(&childrenBuf)) { ret = 0; goto cleanup; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index c0ad072..797b4bd 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2235,12 +2235,31 @@ struct _virDomainCputune { };
+typedef enum { + VIR_DOMAIN_RESCTRL_MONITOR_CACHE, + VIR_DOMAIN_RESCTRL_MONITOR_MEMBW, + VIR_DOMAIN_RESCTRL_MONITOR_CACHE_MEMBW, + + VIR_DOMAIN_RESCTRL_MONITOR_LAST +} virDomainResctrlMonType; + +typedef struct _virDomainResctrlMonitor virDomainResctrlMonitor; +typedef virDomainResctrlMonitor *virDomainResctrlMonitorPtr; +struct _virDomainResctrlMonitor { + int type; /* virDomainResctrlMonType*/ + char *id; + virBitmapPtr vcpus; +}; + + typedef struct _virDomainResctrlDef virDomainResctrlDef; typedef virDomainResctrlDef *virDomainResctrlDefPtr;
struct _virDomainResctrlDef { virBitmapPtr vcpus; virResctrlAllocPtr alloc; + virDomainResctrlMonitorPtr *monitors; + size_t nmonitors; };
@@ -3455,6 +3474,7 @@ VIR_ENUM_DECL(virDomainIOMMUModel) VIR_ENUM_DECL(virDomainVsockModel) VIR_ENUM_DECL(virDomainShmemModel) VIR_ENUM_DECL(virDomainLaunchSecurity) +VIR_ENUM_DECL(virDomainResctrlMonType) /* from libvirt.h */ VIR_ENUM_DECL(virDomainState) VIR_ENUM_DECL(virDomainNostateReason) diff --git a/tests/genericxml2xmlindata/cachetune-cdp.xml b/tests/genericxml2xmlindata/cachetune-cdp.xml index 9718f06..b257fd5 100644 --- a/tests/genericxml2xmlindata/cachetune-cdp.xml +++ b/tests/genericxml2xmlindata/cachetune-cdp.xml @@ -8,9 +8,11 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' type='code' size='7680' unit='KiB'/> <cache id='1' level='3' type='data' size='3840' unit='KiB'/> + <monitor vcpus='0-1'/> </cachetune> <cachetune vcpus='2'> <cache id='1' level='3' type='code' size='6' unit='MiB'/> + <monitor vcpus='2'/> </cachetune> <cachetune vcpus='3'> <cache id='1' level='3' type='data' size='6912' unit='KiB'/> diff --git a/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml b/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml new file mode 100644 index 0000000..7526070 --- /dev/null +++ b/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml @@ -0,0 +1,36 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-2'/> + <monitor vcpus='0'/> + </cachetune> + <cachetune vcpus='3'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='3'/> + </cachetune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + 
<devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/cachetune-small.xml b/tests/genericxml2xmlindata/cachetune-small.xml index ab2d9cf..aa7b2c3 100644 --- a/tests/genericxml2xmlindata/cachetune-small.xml +++ b/tests/genericxml2xmlindata/cachetune-small.xml @@ -7,6 +7,7 @@ <cputune> <cachetune vcpus='0-1'> <cache id='0' level='3' type='both' size='768' unit='KiB'/> + <monitor vcpus='0-1'/> </cachetune> </cputune> <os> diff --git a/tests/genericxml2xmlindata/cachetune.xml b/tests/genericxml2xmlindata/cachetune.xml index 645cab7..52e95bc 100644 --- a/tests/genericxml2xmlindata/cachetune.xml +++ b/tests/genericxml2xmlindata/cachetune.xml @@ -8,9 +8,12 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-1'/> + <monitor vcpus='0'/> </cachetune> <cachetune vcpus='3'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='3'/> </cachetune> </cputune> <os> diff --git a/tests/genericxml2xmltest.c b/tests/genericxml2xmltest.c index e6d4ef2..bc2fc50 100644 --- a/tests/genericxml2xmltest.c +++ b/tests/genericxml2xmltest.c @@ -140,11 +140,15 @@ mymain(void) TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("cachetune-colliding-types", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST_FULL("cachetune-colliding-monitors", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST("memorytune"); DO_TEST_FULL("memorytune-colliding-allocs", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("memorytune-colliding-cachetune", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + 
DO_TEST_FULL("cachetune-colliding-monitors", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE);
DO_TEST("tseg");

-----Original Message----- From: John Ferlan [mailto:jferlan@redhat.com] Sent: Thursday, September 6, 2018 12:39 AM To: Wang, Huaqiang <huaqiang.wang@intel.com>; libvir-list@redhat.com Cc: Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 08/10] conf: introduce resctrl monitor group in domain
On 08/27/2018 07:23 AM, Wang Huaqiang wrote:
Introduce resource monitoring group in domain configuration file to support CPU cache monitoring technology (CMT).
The domain RNG schema changes support the following types of resource monitoring group, classified by the allocation region each belongs to:
1. A monitoring group covering a subset of the vcpus of the current allocation: e.g. "<monitor vcpus='0'/>" creates a monitoring group specifically for vcpu '0' while the allocation group is created for vcpus '0' *and* '1'.
2. A monitoring group covering the whole vcpu set of the current allocation: e.g. "<monitor vcpus='0-1'/>" creates a monitoring group for all vcpus belonging to the current allocation.
3. A monitoring group for vcpu(s) that have no dedicated allocation group: e.g. "<monitor vcpus='3'/>" creates a monitoring group with no resource control applied to it.
<cputune> <cachetune vcpus='0-1'> <cache id='0' level='3' type='code' size='7680' unit='KiB'/> <cache id='1' level='3' type='data' size='3840' unit='KiB'/> + <monitor vcpus='0-1'/> + <monitor vcpus='0'/> </cachetune> <cachetune vcpus='2'> <cache id='1' level='3' type='code' size='6' unit='MiB'/> + <monitor vcpus='2'/> </cachetune> <cachetune vcpus='3'> + <monitor vcpus='3'/> </cachetune> </cputune>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- docs/formatdomain.html.in | 14 ++- docs/schemas/domaincommon.rng | 11 +- src/conf/domain_conf.c | 131 ++++++++++++++++++--- src/conf/domain_conf.h | 20 ++++ tests/genericxml2xmlindata/cachetune-cdp.xml | 2 + .../cachetune-colliding-monitors.xml | 36 ++++++ tests/genericxml2xmlindata/cachetune-small.xml | 1 + tests/genericxml2xmlindata/cachetune.xml | 3 + tests/genericxml2xmltest.c | 4 + 9 files changed, 204 insertions(+), 18 deletions(-) create mode 100644 tests/genericxml2xmlindata/cachetune-colliding-monitors.xml
Getting more difficult to keep these changes and my suggested alterations in the same context.
OK, let's patiently work through the gap ...
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 0cbf570..33d2890 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -758,6 +758,7 @@ <cachetune vcpus='0-3'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-1'/>
Interesting that for domain <monitor> is at the same level (or child relationship) as <cache>, but for the capabilities it was a child of the <bank> which honestly is confusing.
Have read your second batch of comments. Now I understand your points about showing the monitor's capability, and basically agree with you. Let's discuss it based on your latest comments in that email.
</cachetune> <memorytune vcpus='0-3'> <node id='0' bandwidth='60'/> @@ -942,8 +943,8 @@ <dl> <dt><code>cache</code></dt> <dd> - This element controls the allocation of CPU cache and has the - following attributes: + This optional element controls the allocation of CPU cache and has + the following attributes:
So <cache> is optional now?! That needs to be separate.
Will create a separate patch, for your review, elaborating on the reason why 'cache' becomes optional. I outlined some of my considerations for making it optional in the discussion of patch 7. Hope you understand my logic.
<dl> <dt><code>level</code></dt> <dd> @@ -977,6 +978,15 @@ </dd> </dl> </dd> + <dt><code>monitor</code></dt> + <dd> + The optional element <code>monitor</code> creates the + cahce
cache
OK. cahce -> cache
+ monitoring group(s) for current cache allocation group. The required + attribute <code>vcpus</code> specifies to which vCPUs this + monitoring group applies. A vCPU can only be member of one + <code>cachetune</code> element allocation. And no overlap is + permitted.
And it only works for L3 <cache>'s right?
+ </dd> </dl> </dd>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f176538..83fb9b7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -956,7 +956,7 @@ <attribute name="vcpus"> <ref name='cpuset'/> </attribute> - <oneOrMore> + <zeroOrMore>
!! Needs to be separate
This part and the changes in formatdomain.html.in will be split out into another patch.
<element name="cache"> <attribute name="id"> <ref name='unsignedInt'/> @@ -980,7 +980,14 @@ </attribute> </optional> </element> - </oneOrMore> + </zeroOrMore> + <zeroOrMore> + <element name="monitor"> + <attribute name="vcpus"> + <ref name='cpuset'/> + </attribute> + </element> + </zeroOrMore> </element> </zeroOrMore> <zeroOrMore> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9a65655..304a94e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2969,13 +2969,30 @@ virDomainLoaderDefFree(virDomainLoaderDefPtr loader)
static void +virDomainResctrlMonFree(virDomainResctrlMonitorPtr monitor) { + if (!monitor) + return; + + VIR_FREE(monitor->id); + virBitmapFree(monitor->vcpus); + VIR_FREE(monitor); +} + + +static void virDomainResctrlDefFree(virDomainResctrlDefPtr resctrl) { + size_t i = 0; + if (!resctrl) return;
virObjectUnref(resctrl->alloc); virBitmapFree(resctrl->vcpus); + for (i = 0; i < resctrl->nmonitors; i++) + virDomainResctrlMonFree(resctrl->monitors[i]); + VIR_FREE(resctrl->monitors); VIR_FREE(resctrl); }
@@ -19298,6 +19315,71 @@ virDomainResctrlAppend(virDomainDefPtr def,
static int +virDomainResctrlParseMonitor(virDomainDefPtr def, + xmlXPathContextPtr ctxt, + xmlNodePtr node, + virDomainResctrlDefPtr resctrl) { + xmlNodePtr oldnode = ctxt->node; + virBitmapPtr vcpus = NULL; + char *id = NULL; + int vcpu = -1; + char *vcpus_str = NULL; + virDomainResctrlMonitorPtr tmp_domresmon = NULL;
The "tmp_" prefix doesn't seem necessary...
Prefix will be removed.
+ int ret = -1; + + if (!resctrl || !resctrl->vcpus || !resctrl->alloc) + return -1; + + ctxt->node = node; + + if (VIR_ALLOC(tmp_domresmon) < 0) + goto cleanup;
We don't need/use this until ... [1]
+ + if (virDomainResctrlParseVcpus(def, node, &vcpus) < 0) + goto cleanup; + + /* empty monitoring group is not allowed */ + if (virBitmapIsAllClear(vcpus))
So we'll fail without an error? How is the consumer supposed to know that providing the empty set isn't valid?
+ goto cleanup; + + while ((vcpu = virBitmapNextSetBit(vcpus, vcpu)) >= 0) { + if (!virBitmapIsBitSet(resctrl->vcpus, vcpu))
Again fail without an error? How would someone know that what they've provided doesn't 'work' properly because the resctrl->vcpus doesn't have that vcpu in its list?
Should report an error message. Will be fixed.
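As a rough illustration of the two checks discussed above (empty set, and a vcpu outside the enclosing cachetune), here is a minimal standalone sketch that fails with a message in both cases. It is not libvirt code: `vcpu_mask`, `check_monitor_vcpus` and the error buffer are hypothetical stand-ins for virBitmap and virReportError(), and the message wording is only an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for virBitmap: bit N set means vcpu N is a member. */
typedef uint64_t vcpu_mask;

/*
 * Returns 0 when @monitor is a non-empty subset of @alloc, -1 otherwise.
 * On failure a human-readable reason is written to @err; this buffer is an
 * illustrative replacement for a real error-reporting call.
 */
static int
check_monitor_vcpus(vcpu_mask monitor, vcpu_mask alloc,
                    char *err, size_t errlen)
{
    assert(err != NULL && errlen > 0);
    err[0] = '\0';

    /* An empty monitoring group is rejected, and the caller is told why. */
    if (monitor == 0) {
        snprintf(err, errlen, "monitor 'vcpus' must not be empty");
        return -1;
    }

    /* Any bit set in @monitor but not in @alloc is a stray vcpu. */
    vcpu_mask stray = monitor & ~alloc;
    if (stray != 0) {
        unsigned int vcpu = 0;

        /* Find the lowest stray vcpu so the message can name it. */
        while ((stray & 1) == 0) {
            stray >>= 1;
            vcpu++;
        }
        snprintf(err, errlen,
                 "monitor vcpu %u is not in the enclosing cachetune 'vcpus'",
                 vcpu);
        return -1;
    }

    return 0;
}
```

With an allocation mask of 0x3 (vcpus 0-1), a monitor mask of 0x1 passes, while 0x0 and 0x7 both fail and leave a non-empty reason in the buffer.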
+ goto cleanup; + } + + vcpus_str = virBitmapFormat(vcpus); + if (!vcpus_str) + goto cleanup; +
[1] right about here
+ if (virAsprintf(&id, "vcpus_%s", vcpus_str) < 0) + goto cleanup; + + if (VIR_STRDUP(tmp_domresmon->id, id) < 0) + goto cleanup;
The two steps are unnecessary since @id is VIR_FREE'd anyway. Let's just:
if (virAsprintf(&domresmon->id, "vcpus_%s", vcpus_str) < 0) goto cleanup;
Will be fixed. thanks.
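The single-step variant suggested above can be sketched outside libvirt as follows. This is only an illustration under stated assumptions: `struct monitor` and `monitor_set_id` are hypothetical names, and plain snprintf()/malloc() stand in for virAsprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the monitor structure; only the id matters here. */
struct monitor {
    char *id;
};

/*
 * Format "vcpus_<list>" straight into mon->id, with no intermediate string
 * that would need a second copy and a separate free.
 * @mon->id must be NULL or heap-allocated on entry.
 * Returns 0 on success, -1 on failure (mon->id is left unchanged).
 */
static int
monitor_set_id(struct monitor *mon, const char *vcpus_str)
{
    /* First pass only measures the formatted length. */
    int len = snprintf(NULL, 0, "vcpus_%s", vcpus_str);
    if (len < 0)
        return -1;

    char *buf = malloc((size_t)len + 1);
    if (!buf)
        return -1;

    /* Second pass writes the string, including the trailing NUL. */
    snprintf(buf, (size_t)len + 1, "vcpus_%s", vcpus_str);

    free(mon->id);
    mon->id = buf;
    return 0;
}
```

The point of the design is that the structure owns the string from the moment it is created, so the cleanup path has one fewer temporary to release.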
+ + tmp_domresmon->vcpus = vcpus; + + if (VIR_APPEND_ELEMENT(resctrl->monitors, + resctrl->nmonitors, + tmp_domresmon) < 0) + goto cleanup; + + if (virResctrlAllocSetMonitor(resctrl->alloc, id) < 0) + goto cleanup; + + tmp_domresmon = NULL;
Shouldn't this go after VIR_APPEND_ELEMENT? otherwise we could end up in cleanup with it on resctrl->monitors *and* virDomainResctrlMonFree is called.
Yes, the resctrl->monitors array, as well as the newly appended element, should be cleaned up by virDomainResctrlDefFree when virResctrlAllocSetMonitor returns an error. Will be changed to:

    if (VIR_APPEND_ELEMENT(resctrl->monitors,
                           resctrl->nmonitors,
                           domresmon) < 0)
        goto cleanup;

    domresmon = NULL;

    if (virResctrlAllocSetMonitor(resctrl->alloc, id) < 0)
        goto cleanup;

Thanks for catching this bug.
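The ownership rule behind this reordering can be shown with a small standalone sketch. It is not libvirt code: `struct group`, `group_append` and `group_add_monitor` are hypothetical, and the sketch assumes an append helper that leaves the caller's pointer untouched, so the caller has to drop its pointer itself right after a successful append.

```c
#include <assert.h>
#include <stdlib.h>

struct monitor {
    int id;
};

struct group {
    struct monitor **monitors;
    size_t nmonitors;
};

/* Hypothetical append helper: on success the array owns @mon.
 * It does not modify the caller's pointer. */
static int
group_append(struct group *grp, struct monitor *mon)
{
    struct monitor **tmp = realloc(grp->monitors,
                                   (grp->nmonitors + 1) * sizeof(*tmp));
    if (!tmp)
        return -1;

    tmp[grp->nmonitors++] = mon;
    grp->monitors = tmp;
    return 0;
}

/* Frees every element exactly once, then the array itself. */
static void
group_clear(struct group *grp)
{
    for (size_t i = 0; i < grp->nmonitors; i++)
        free(grp->monitors[i]);
    free(grp->monitors);
    grp->monitors = NULL;
    grp->nmonitors = 0;
}

/* @fail_late simulates a failure in a step that runs after the append. */
static int
group_add_monitor(struct group *grp, int id, int fail_late)
{
    struct monitor *mon = calloc(1, sizeof(*mon));
    int ret = -1;

    if (!mon)
        goto cleanup;
    mon->id = id;

    if (group_append(grp, mon) < 0)
        goto cleanup;
    mon = NULL;             /* ownership moved to grp->monitors */

    if (fail_late)
        goto cleanup;       /* cleanup must not free the appended element */

    ret = 0;
 cleanup:
    free(mon);              /* no-op once ownership has moved */
    return ret;
}
```

If the pointer were cleared only at the very end, the late failure would free the element in cleanup while it is still referenced from the array, and the later group_clear() call would free it a second time.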
+ ret = 0; + cleanup: + ctxt->node = oldnode; + VIR_FREE(id); + VIR_FREE(vcpus_str); + virDomainResctrlMonFree(tmp_domresmon); + return ret; +} + + +static int virDomainCachetuneDefParse(virDomainDefPtr def, xmlXPathContextPtr ctxt, xmlNodePtr node, @@ -19313,6 +19395,9 @@ virDomainCachetuneDefParse(virDomainDefPtr def, int n; int ret = -1;
+ if (VIR_ALLOC(tmp_resctrl) < 0) + return -1; + ctxt->node = node;
if (virDomainResctrlParseVcpus(def, node, &vcpus) < 0) @@ -19347,30 +19432,40 @@ virDomainCachetuneDefParse(virDomainDefPtr def, goto cleanup; }
- if (virResctrlAllocIsEmpty(alloc)) { - ret = 0; + tmp_resctrl->vcpus = vcpus; + tmp_resctrl->alloc = virObjectRef(alloc); + + VIR_FREE(nodes); + ctxt->node = node; + + if ((n = virXPathNodeSet("./monitor", ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot extract monitor nodes under cachetune")); goto cleanup; }
- if (VIR_ALLOC(tmp_resctrl) < 0) - goto cleanup; + for (i = 0; i < n; i++) { + if (virDomainResctrlParseMonitor(def, ctxt, + nodes[i], tmp_resctrl) < 0)
Hmmm - something slightly different with this ordering which makes my previous patch comments not work as well.
- tmp_resctrl->vcpus = vcpus; - tmp_resctrl->alloc = virObjectRef(alloc); + goto cleanup; + } + + if (virResctrlAllocIsEmpty(alloc)) { + VIR_WARN("cachetune: resctrl alloc is empty"); + ret = 0; + goto cleanup; + }
So if I reconsider slightly my previous patch because now we need a trip through virDomainResctrlParseMonitor, we could have:
virDomainResctrlDefNew(alloc, vcpus):
if (VIR_ALLOC(resctrl) < 0) return NULL;
resctrl->alloc = virObjectRef(alloc); resctrl->vcpus = vcpus; return resctrl;
Back in the caller we have:
if (!(resctrl = virDomainResctrlDefNew(alloc, vcpus))) goto cleanup; alloc = NULL; vcpus = NULL;
Then calling virDomainResctrlAppend using @resctrl:
if (virDomainResctrlAppend(def, node, resctrl, flags) < 0) goto cleanup; resctrl = NULL;
...
cleanup: ... virDomainResctrlDefFree(resctrl);
I think doing this gives the flexibility to this code to make that virDomainResctrlParseMonitor call before appending the new resctrl
Thanks for the suggestion, I will evaluate both of your helpers. Currently it seems virDomainResctrlParseMonitor works for the existing two callers, so we may put it into virDomainResctrlCreate, the helper you suggested in your comments on the last patch. Anyway, I'll rework this piece of code to make it clearer.
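For reference while evaluating the helpers, the constructor/destructor pairing proposed above can be sketched in standalone C. Everything here is a hypothetical stand-in: `struct alloc` with a manual reference count replaces the virObject-based allocation, `struct bitmap` replaces virBitmap, and the function names are illustrative. The sketch also makes one choice of its own: the caller keeps its reference to the allocation and drops it separately, rather than clearing its pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted allocation object. */
struct alloc {
    int refs;
};

/* Hypothetical vcpu set. */
struct bitmap {
    unsigned long bits;
};

struct resctrl_def {
    struct alloc *alloc;    /* holds its own reference */
    struct bitmap *vcpus;   /* owned by the definition */
};

static struct alloc *
alloc_ref(struct alloc *a)
{
    a->refs++;
    return a;
}

static void
alloc_unref(struct alloc *a)
{
    if (a && --a->refs == 0)
        free(a);
}

/* Takes a new reference on @alloc and takes ownership of @vcpus.
 * On failure neither argument is touched. */
static struct resctrl_def *
resctrl_def_new(struct alloc *alloc, struct bitmap *vcpus)
{
    struct resctrl_def *def = calloc(1, sizeof(*def));

    if (!def)
        return NULL;

    def->alloc = alloc_ref(alloc);
    def->vcpus = vcpus;
    return def;
}

/* Single cleanup entry point, safe to call with NULL. */
static void
resctrl_def_free(struct resctrl_def *def)
{
    if (!def)
        return;

    alloc_unref(def->alloc);
    free(def->vcpus);
    free(def);
}
```

Because the definition is fully formed before anything is appended to the domain, further per-definition parsing can run in between, and a failure at any point is handled by the one free function.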
There's so much changing now - I'm just going to stop here and see how things shake out in the next series.
One other note first though - in patch 10 in qemuDomainGetStatsCpuResource the "unsigned int nmonitor = NULL;" failed the compiler rather spectacularly...
It is a copy-paste error! Will be fixed:

    unsigned int nmonitor = 0;

Many thanks for your suggestions.

Huaqiang
John
if (virDomainResctrlAppend(def, node, tmp_resctrl, flags) < 0) goto cleanup;
- alloc = NULL; - vcpus = NULL; tmp_resctrl = NULL;
ret = 0; cleanup: ctxt->node = oldnode; - virObjectUnref(alloc); - VIR_FREE(tmp_resctrl); - virBitmapFree(vcpus); + virDomainResctrlDefFree(tmp_resctrl); VIR_FREE(nodes); return ret; } @@ -19588,10 +19683,8 @@
virDomainMemorytuneDefParse(virDomainDefPtr def,
ret = 0; cleanup: ctxt->node = oldnode; - virBitmapFree(vcpus); - virObjectUnref(alloc); - VIR_FREE(tmp_resctrl); VIR_FREE(nodes); + virDomainResctrlDefFree(tmp_resctrl); return ret; }
@@ -27394,6 +27487,7 @@ virDomainCachetuneDefFormat(virBufferPtr buf, { virBuffer childrenBuf = VIR_BUFFER_INITIALIZER; char *vcpus = NULL; + size_t i = 0; int ret = -1;
virBufferSetChildIndent(&childrenBuf, buf); @@ -27405,6 +27499,15 @@ virDomainCachetuneDefFormat(virBufferPtr buf, if (virBufferCheckError(&childrenBuf) < 0) goto cleanup;
+ for (i = 0; i < resctrl->nmonitors; i++) { + vcpus = virBitmapFormat(resctrl->monitors[i]->vcpus); + if (!vcpus) + goto cleanup; + + virBufferAsprintf(&childrenBuf, "<monitor vcpus='%s'/>\n", vcpus); + VIR_FREE(vcpus); + } + if (!virBufferUse(&childrenBuf)) { ret = 0; goto cleanup; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index c0ad072..797b4bd 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2235,12 +2235,31 @@ struct _virDomainCputune { };
+typedef enum { + VIR_DOMAIN_RESCTRL_MONITOR_CACHE, + VIR_DOMAIN_RESCTRL_MONITOR_MEMBW, + VIR_DOMAIN_RESCTRL_MONITOR_CACHE_MEMBW, + + VIR_DOMAIN_RESCTRL_MONITOR_LAST +} virDomainResctrlMonType; + +typedef struct _virDomainResctrlMonitor virDomainResctrlMonitor; +typedef virDomainResctrlMonitor *virDomainResctrlMonitorPtr; +struct _virDomainResctrlMonitor { + int type; /* virDomainResctrlMonType*/ + char *id; + virBitmapPtr vcpus; +}; + + typedef struct _virDomainResctrlDef virDomainResctrlDef; typedef virDomainResctrlDef *virDomainResctrlDefPtr;
struct _virDomainResctrlDef { virBitmapPtr vcpus; virResctrlAllocPtr alloc; + virDomainResctrlMonitorPtr *monitors; + size_t nmonitors; };
@@ -3455,6 +3474,7 @@ VIR_ENUM_DECL(virDomainIOMMUModel) VIR_ENUM_DECL(virDomainVsockModel) VIR_ENUM_DECL(virDomainShmemModel) VIR_ENUM_DECL(virDomainLaunchSecurity) +VIR_ENUM_DECL(virDomainResctrlMonType) /* from libvirt.h */ VIR_ENUM_DECL(virDomainState) VIR_ENUM_DECL(virDomainNostateReason) diff --git a/tests/genericxml2xmlindata/cachetune-cdp.xml b/tests/genericxml2xmlindata/cachetune-cdp.xml index 9718f06..b257fd5 100644 --- a/tests/genericxml2xmlindata/cachetune-cdp.xml +++ b/tests/genericxml2xmlindata/cachetune-cdp.xml @@ -8,9 +8,11 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' type='code' size='7680' unit='KiB'/> <cache id='1' level='3' type='data' size='3840' unit='KiB'/> + <monitor vcpus='0-1'/> </cachetune> <cachetune vcpus='2'> <cache id='1' level='3' type='code' size='6' unit='MiB'/> + <monitor vcpus='2'/> </cachetune> <cachetune vcpus='3'> <cache id='1' level='3' type='data' size='6912' unit='KiB'/> diff --git a/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml b/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml new file mode 100644 index 0000000..7526070 --- /dev/null +++ b/tests/genericxml2xmlindata/cachetune-colliding-monitors.xml @@ -0,0 +1,36 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-2'/> + <monitor vcpus='0'/> + </cachetune> + <cachetune vcpus='3'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='3'/> + </cachetune> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + 
<devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmlindata/cachetune-small.xml b/tests/genericxml2xmlindata/cachetune-small.xml index ab2d9cf..aa7b2c3 100644 --- a/tests/genericxml2xmlindata/cachetune-small.xml +++ b/tests/genericxml2xmlindata/cachetune-small.xml @@ -7,6 +7,7 @@ <cputune> <cachetune vcpus='0-1'> <cache id='0' level='3' type='both' size='768' unit='KiB'/> + <monitor vcpus='0-1'/> </cachetune> </cputune> <os> diff --git a/tests/genericxml2xmlindata/cachetune.xml b/tests/genericxml2xmlindata/cachetune.xml index 645cab7..52e95bc 100644 --- a/tests/genericxml2xmlindata/cachetune.xml +++ b/tests/genericxml2xmlindata/cachetune.xml @@ -8,9 +8,12 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> <cache id='1' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='0-1'/> + <monitor vcpus='0'/> </cachetune> <cachetune vcpus='3'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <monitor vcpus='3'/> </cachetune> </cputune> <os> diff --git a/tests/genericxml2xmltest.c b/tests/genericxml2xmltest.c index e6d4ef2..bc2fc50 100644 --- a/tests/genericxml2xmltest.c +++ b/tests/genericxml2xmltest.c @@ -140,11 +140,15 @@ mymain(void) TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("cachetune-colliding-types", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + DO_TEST_FULL("cachetune-colliding-monitors", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST("memorytune"); DO_TEST_FULL("memorytune-colliding-allocs", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); DO_TEST_FULL("memorytune-colliding-cachetune", false, true, TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + 
DO_TEST_FULL("cachetune-colliding-monitors", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE);
DO_TEST("tseg");

Resource monitoring group monitors the resource consumption, cache and memory bandwidth, of particular resctrl allocation. Introduce the resctrl monitoring group. Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com> --- src/qemu/qemu_process.c | 40 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 960c3ed..e70aa5e 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2593,6 +2593,7 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, { int ret = -1; size_t i = 0; + size_t j = 0; virCapsPtr caps = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; @@ -2610,6 +2611,20 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, vm->def->resctrls[i]->alloc, priv->machineName) < 0) goto cleanup; + + /* Create resctrl monitoring groups associated with allocation */ + for (j = 0; j < vm->def->resctrls[i]->nmonitors; j++) { + virDomainResctrlMonitorPtr monitor = NULL; + monitor = vm->def->resctrls[i]->monitors[j]; + + if (virResctrlAllocCreateMonitor(caps->host.resctrl, + vm->def->resctrls[i]->alloc, + priv->machineName, + monitor->id) < 0) + + goto cleanup; + + } } ret = 0; @@ -5419,7 +5434,9 @@ qemuProcessSetupVcpu(virDomainObjPtr vm, { pid_t vcpupid = qemuDomainGetVcpuPid(vm, vcpuid); virDomainVcpuDefPtr vcpu = virDomainDefGetVcpu(vm->def, vcpuid); + virDomainResctrlMonitorPtr mon = NULL; size_t i = 0; + size_t j = 0; if (qemuProcessSetupPid(vm, vcpupid, VIR_CGROUP_THREAD_VCPU, vcpuid, vcpu->cpumask, @@ -5434,7 +5451,15 @@ qemuProcessSetupVcpu(virDomainObjPtr vm, if (virBitmapIsBitSet(ct->vcpus, vcpuid)) { if (virResctrlAllocAddPID(ct->alloc, vcpupid) < 0) return -1; - break; + } + + for (j = 0; j < vm->def->resctrls[i]->nmonitors; j++) { + mon = vm->def->resctrls[i]->monitors[j]; + if (virBitmapIsBitSet(mon->vcpus, vcpuid)) { + if (virResctrlAllocAddMonitorPID(ct->alloc, + mon->id, vcpupid) < 0) + return -1; + } } } @@ -7747,10 +7772,12 @@ 
qemuProcessReconnect(void *opaque) int reason; virQEMUDriverConfigPtr cfg; size_t i; + size_t j; unsigned int stopFlags = 0; bool jobStarted = false; virCapsPtr caps = NULL; bool retry = true; + virDomainResctrlDefPtr resctrl = NULL; VIR_FREE(data); @@ -7934,9 +7961,18 @@ qemuProcessReconnect(void *opaque) goto error; for (i = 0; i < obj->def->nresctrls; i++) { - if (virResctrlAllocDeterminePath(obj->def->resctrls[i]->alloc, + resctrl = obj->def->resctrls[i]; + + if (virResctrlAllocDeterminePath(resctrl->alloc, priv->machineName) < 0) goto error; + + for (j = 0; j < resctrl->nmonitors; j++) { + if (virResctrlAllocDetermineMonitorPath(resctrl->alloc, + resctrl->monitors[j]->id, + priv->machineName) < 0) + goto error; + } } /* update domain state XML with possibly updated state in virDomainObj */ -- 2.7.4
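For orientation: in the kernel's resctrl filesystem, a monitoring group is a directory under the `mon_groups/` directory of its allocation (control) group, and per-cache occupancy is read from `mon_data/mon_L3_XX/llc_occupancy` inside it. The sketch below only formats such a path. The helper name `build_llc_occupancy_path` and the group names in the usage note are hypothetical; how `virResctrlAllocDetermineMonitorPath` actually names its directories is defined elsewhere in the series and may differ.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of libvirt: formats the path of the
 * llc_occupancy file for one L3 cache instance of a monitoring group.
 * Returns 0 on success, -1 if the buffer is too small. */
static int
build_llc_occupancy_path(char *buf, size_t buflen,
                         const char *alloc_group, /* control group directory name */
                         const char *mon_group,   /* monitoring group directory name */
                         unsigned int cache_id)   /* L3 cache instance id */
{
    int n = snprintf(buf, buflen,
                     "/sys/fs/resctrl/%s/mon_groups/%s/mon_data/mon_L3_%02u/llc_occupancy",
                     alloc_group, mon_group, cache_id);

    /* snprintf reports the length it wanted to write; treat truncation as failure */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

Calling it with the assumed names "vm1-vcpus_4-7" and "vm1-vcpus_4-6" and cache id 1 would yield a path ending in `mon_data/mon_L3_01/llc_occupancy`.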

Intel x86 RDT CMT reports last level cache occupancy information. Add an interface in qemu to report this information for resource monitor groups through the command 'virsh domstats --cpu-total'.

Below is a typical output:

# virsh domstats 1 --cpu-total
Domain: 'ubuntu16.04-base'
  ...
  cpu.cache.monitor.count=2
  cpu.cache.0.name=vcpus_1
  cpu.cache.0.vcpus=1
  cpu.cache.0.bank.count=2
  cpu.cache.0.bank.0.id=0
  cpu.cache.0.bank.0.bytes=4505600
  cpu.cache.0.bank.1.id=1
  cpu.cache.0.bank.1.bytes=5586944
  cpu.cache.1.name=vcpus_4-6
  cpu.cache.1.vcpus=4,5,6
  cpu.cache.1.bank.count=2
  cpu.cache.1.bank.0.id=0
  cpu.cache.1.bank.0.bytes=17571840
  cpu.cache.1.bank.1.id=1
  cpu.cache.1.bank.1.bytes=29106176

Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
---
 src/libvirt-domain.c   |   9 ++
 src/qemu/qemu_driver.c | 265 +++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 253 insertions(+), 21 deletions(-)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index ef46027..a88e94a 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -11337,6 +11337,15 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
  *     "cpu.user" - user cpu time spent in nanoseconds as unsigned long long.
  *     "cpu.system" - system cpu time spent in nanoseconds as unsigned long
  *                    long.
+ *     "cpu.cache.monitor.count" - total cache monitoring groups
+ *     "cpu.cache.M.name" - name for cache monitoring group 'M'
+ *     "cpu.cache.M.vcpus" - vcpus for cache monitoring group 'M'
+ *     "cpu.cache.M.bank.count" - total bank number for cache monitoring
+ *                    group 'M'
+ *     "cpu.cache.M.bank.N.id" - OS assigned cache bank id for cache 'N' in
+ *                    cache monitoring group 'M'
+ *     "cpu.cache.M.bank.N.bytes" - cache occupancy of cache bank 'N' in
+ *                    cache monitoring group 'M'
  *
  * VIR_DOMAIN_STATS_BALLOON:
  *     Return memory balloon device information.
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index da8c4e8..193f606 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -107,6 +107,7 @@ #include "virnuma.h" #include "dirname.h" #include "netdev_bandwidth_conf.h" +#include "c-ctype.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -19660,6 +19661,225 @@ typedef enum { #define HAVE_JOB(flags) ((flags) & QEMU_DOMAIN_STATS_HAVE_JOB) +/* + * qemuDomainVcpuFormatHelper + * For vcpu string, both '1-3' and '1,3' are valid format and + * representing different vcpu set, but it is not easy to + * differentiate them at first galance, to avoid this case + * substituting all '-' with ',', e.g. substitute string '1-3' + * with '1,2,3'. + */ +static int +qemuDomainVcpuFormatHelper(char **vcpus) +{ + const char *cur = NULL; + size_t i = 0; + char *tmp = NULL; + int start, last; + virBuffer buf = VIR_BUFFER_INITIALIZER; + bool firstnum = 1; + + if (!*vcpus) + goto error; + + cur = *vcpus; + + virSkipSpaces(&cur); + + if (*cur == '\0') + goto error; + + while (*cur != 0) { + if (!c_isdigit(*cur)) + goto error; + + if (virStrToLong_i(cur, &tmp, 10, &start) < 0) + goto error; + if (start < 0) + goto error; + + cur = tmp; + + virSkipSpaces(&cur); + + if (*cur == ',' || *cur == 0) { + if (!firstnum) + virBufferAddChar(&buf, ','); + virBufferAsprintf(&buf, "%d", start); + firstnum = 0; + } else if (*cur == '-') { + cur++; + virSkipSpaces(&cur); + + if (virStrToLong_i(cur, &tmp, 10, &last) < 0) + + goto error; + if (last < start) + goto error; + cur = tmp; + + for (i = start; i <= last; i++) { + if (!firstnum) + + virBufferAddChar(&buf, ','); + virBufferAsprintf(&buf, "%ld", i); + firstnum = 0; + } + + virSkipSpaces(&cur); + } + + if (*cur == ',') { + cur++; + virSkipSpaces(&cur); + } else if (*cur == 0) { + break; + } else { + goto error; + } + } + VIR_FREE(*vcpus); + *vcpus = virBufferContentAndReset(&buf); + return 0; + error: + virBufferFreeAndReset(&buf); + return -1; +} + +static int 
+qemuDomainGetStatsCpuResource(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr dom, + virDomainStatsRecordPtr record, + int *maxparams, + unsigned int privflags ATTRIBUTE_UNUSED) +{ + size_t i = 0; + size_t j = 0; + char param_name[VIR_TYPED_PARAM_FIELD_LENGTH]; + virDomainResctrlDefPtr resctrl = NULL; + virDomainResctrlMonitorPtr monitor = NULL; + unsigned int nvals = 0; + unsigned int *ids = NULL; + unsigned int *vals = NULL; + unsigned int nmonitor = NULL; + char *vcpustr = NULL; + int ret = -1; + + for (i = 0; i < dom->def->nresctrls; i++) { + resctrl = dom->def->resctrls[i]; + + for (j = 0; j < resctrl->nmonitors; j++) { + monitor = resctrl->monitors[j]; + if (monitor->vcpus) + nmonitor++; + } + } + + snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH, + "cpu.cache.monitor.count"); + if (virTypedParamsAddUInt(&record->params, + &record->nparams, + maxparams, + param_name, + nmonitor) < 0) + goto cleanup; + + for (i = 0; i < dom->def->nresctrls; i++) { + resctrl = dom->def->resctrls[i]; + + for (j = 0; j < resctrl->nmonitors; j++) { + size_t l = 0; + + monitor = resctrl->monitors[j]; + + if (!(vcpustr = virBitmapFormat(monitor->vcpus))) + goto cleanup; + + if (qemuDomainVcpuFormatHelper(&vcpustr) < 0) + goto cleanup; + + switch ((virDomainResctrlMonType) monitor->type) { + case VIR_DOMAIN_RESCTRL_MONITOR_CACHE: + case VIR_DOMAIN_RESCTRL_MONITOR_CACHE_MEMBW: + if (!monitor->vcpus) + continue; + + if (virResctrlAllocGetCacheOccupancy(resctrl->alloc, + monitor->id, &nvals, + &ids, &vals) < 0) + goto cleanup; + + snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH, + "cpu.cache.%ld.name", i); + if (virTypedParamsAddString(&record->params, + &record->nparams, + maxparams, + param_name, + monitor->id) < 0) + goto cleanup; + + snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH, + "cpu.cache.%ld.vcpus", i); + + if (virTypedParamsAddString(&record->params, + &record->nparams, + maxparams, + param_name, + vcpustr) < 0) + goto cleanup; + + snprintf(param_name, 
VIR_TYPED_PARAM_FIELD_LENGTH, + "cpu.cache.%ld.bank.count", i); + if (virTypedParamsAddUInt(&record->params, + &record->nparams, + maxparams, + param_name, + nvals) < 0) + goto cleanup; + + for (l = 0; l < nvals; l++) { + snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH, + "cpu.cache.%ld.bank.%ld.id", i, l); + if (virTypedParamsAddUInt(&record->params, + &record->nparams, + maxparams, + param_name, + ids[l]) < 0) + goto cleanup; + + + snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH, + "cpu.cache.%ld.bank.%ld.bytes", i, l); + if (virTypedParamsAddUInt(&record->params, + &record->nparams, + maxparams, + param_name, + vals[l]) < 0) + goto cleanup; + } + break; + + case VIR_DOMAIN_RESCTRL_MONITOR_MEMBW: + case VIR_DOMAIN_RESCTRL_MONITOR_LAST: + default: + break; + } + + VIR_FREE(ids); + VIR_FREE(vals); + VIR_FREE(vcpustr); + nvals = 0; + } + } + + ret = 0; + cleanup: + VIR_FREE(ids); + VIR_FREE(vals); + VIR_FREE(vcpustr); + return ret; +} + static int qemuDomainGetStatsCpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, virDomainObjPtr dom, @@ -19673,29 +19893,32 @@ qemuDomainGetStatsCpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, unsigned long long sys_time = 0; int err = 0; - if (!priv->cgroup) - return 0; + if (priv->cgroup) { + err = virCgroupGetCpuacctUsage(priv->cgroup, &cpu_time); + if (!err && virTypedParamsAddULLong(&record->params, + &record->nparams, + maxparams, + "cpu.time", + cpu_time) < 0) + return -1; - err = virCgroupGetCpuacctUsage(priv->cgroup, &cpu_time); - if (!err && virTypedParamsAddULLong(&record->params, - &record->nparams, - maxparams, - "cpu.time", - cpu_time) < 0) - return -1; + err = virCgroupGetCpuacctStat(priv->cgroup, &user_time, &sys_time); + if (!err && virTypedParamsAddULLong(&record->params, + &record->nparams, + maxparams, + "cpu.user", + user_time) < 0) + return -1; + if (!err && virTypedParamsAddULLong(&record->params, + &record->nparams, + maxparams, + "cpu.system", + sys_time) < 0) + return -1; + } - err = 
virCgroupGetCpuacctStat(priv->cgroup, &user_time, &sys_time); - if (!err && virTypedParamsAddULLong(&record->params, - &record->nparams, - maxparams, - "cpu.user", - user_time) < 0) - return -1; - if (!err && virTypedParamsAddULLong(&record->params, - &record->nparams, - maxparams, - "cpu.system", - sys_time) < 0) + if (qemuDomainGetStatsCpuResource(driver, dom, + record, maxparams, privflags) < 0) return -1; return 0; -- 2.7.4
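The range expansion that `qemuDomainVcpuFormatHelper` performs in the patch above (turning '1-3' into '1,2,3') can be illustrated in isolation. The following is a simplified standalone sketch using only the C standard library, not the libvirt code: the name `expand_vcpu_ranges` is hypothetical, and unlike the patch it does not skip whitespace and it rejects a trailing comma.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: expands "1-3,5" into "1,2,3,5".
 * Returns 0 on success, -1 on malformed input or if `out` is too small. */
static int
expand_vcpu_ranges(const char *in, char *out, size_t outlen)
{
    size_t used = 0;
    int first = 1;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (*in != '\0') {
        char *end = NULL;
        long start, last, v;

        /* every item must begin with a digit */
        if (!isdigit((unsigned char)*in))
            return -1;
        start = strtol(in, &end, 10);
        last = start;
        in = end;

        /* optional "-<last>" turns the item into a range */
        if (*in == '-') {
            in++;
            if (!isdigit((unsigned char)*in))
                return -1;
            last = strtol(in, &end, 10);
            in = end;
            if (last < start)
                return -1;
        }

        /* append each id, comma-separated */
        for (v = start; v <= last; v++) {
            int n = snprintf(out + used, outlen - used, "%s%ld",
                             first ? "" : ",", v);
            if (n < 0 || (size_t)n >= outlen - used)
                return -1;
            used += (size_t)n;
            first = 0;
        }

        if (*in == ',') {
            in++;
            if (*in == '\0')
                return -1;      /* trailing comma rejected in this sketch */
        } else if (*in != '\0') {
            return -1;          /* unexpected character */
        }
    }

    return first ? -1 : 0;      /* empty input is an error */
}
```

With this sketch, the input "4-6" produces "4,5,6", which is the form shown for `cpu.cache.1.vcpus` in the domstats output.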

Hi,

This series was run against 'syntax-check' test by patchew.org, which
failed, please find the details below:

Type: series
Message-id: 1535368993-24901-1-git-send-email-huaqiang.wang@intel.com
Subject: [libvirt] [PATCH 00/10] Introduce x86 Cache Monitoring Technology (CMT)

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
time bash -c './autogen.sh && make syntax-check'
=== TEST SCRIPT END ===

Updating bcb55ab053bc79561b55d0394490f4b64e0f2d01
From https://github.com/patchew-project/libvirt
 * [new tag]         patchew/1535368993-24901-1-git-send-email-huaqiang.wang@intel.com -> patchew/1535368993-24901-1-git-send-email-huaqiang.wang@intel.com
Switched to a new branch 'test'
37e9fb2bc1 qemu: Report cache occupancy (CMT) with domstats
60f4c08fe8 qemu: Introduce resctrl monitoring group
6ccb886a38 conf: introduce resctrl monitor group in domain
da2f5cd445 conf: refactor virDomainResctrlAppend
3821628278 util: Introduce resctrl monitor for CMT
dfa2d7672e util: resctrl: refactoring some functions
5a536c4bef test: add test case for resctrl monitor
c439b99753 conf: Add CMT capability to host
25ad8ce336 util: add interface retrieving CMT capability
c52db949be conf: Renamed 'controlBuf' to 'childrenBuf'

=== OUTPUT BEGIN ===
Updating submodules...
Submodule 'keycodemapdb' (https://gitlab.com/keycodemap/keycodemapdb.git) registered for path 'src/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-p894jcsi/src/src/keycodemapdb'...
Submodule path 'src/keycodemapdb': checked out '16e5b0787687d8904dad2c026107409eb9bfcb95'
error: pathspec '.gnulib' did not match any file(s) known to git.
Running bootstrap...
./bootstrap: Bootstrapping from checked-out libvirt sources...
./bootstrap: consider installing git-merge-changelog from gnulib
./bootstrap: getting gnulib files...
error: pathspec '.gnulib' did not match any file(s) known to git.
error: bootstrap failed

real    0m4.717s
user    0m2.819s
sys     0m1.345s
=== OUTPUT END ===

Test command exited with code: 1

---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Hi reviewers,

I understand that the libvirt community is very active and that you are busy. I am writing to ask whether you have had a chance to look at this patch series; your comments are welcome.

BR

On 2018-08-27 19:23, Wang Huaqiang wrote:
This series of patches introduces the x86 Cache Monitoring Technology (CMT) to libvirt by interacting with the kernel resource control (resctrl) interface. CMT is one of the Intel(R) x86 CPU features that belong to the Resource Director Technology (RDT). CMT reports the occupancy of the last level cache, which is shared by all CPU cores.
We have had several discussions about enabling CMT; please refer to the following links for the RFCs.
RFCv3
https://www.redhat.com/archives/libvir-list/2018-August/msg01213.html
RFCv2
https://www.redhat.com/archives/libvir-list/2018-July/msg00409.html
https://www.redhat.com/archives/libvir-list/2018-July/msg01241.html
RFCv1
https://www.redhat.com/archives/libvir-list/2018-June/msg00674.html
1. Why is CMT necessary in libvirt? The perf events 'CMT, MBML, MBMT' have been phased out since Linux kernel commit c39a0e2c8850f08249383f2425dbd8dbe4baad69, so the perf-based cmt and mbm support in libvirt does not work with the latest Linux kernels. These patches add the CMT feature to libvirt through the kernel resctrl filesystem interface.
2. Interfaces for CMT from the high level.
2.1 Query the host capability of CMT.
The element 'monitor' represents the host capabilities of CMT. The CMT attributes involved are:
- 'maxAllocs' denotes the maximum number of monitoring groups that can be created, which is limited by the number of hardware RMIDs.
- 'threshold' denotes the upper bound of cache occupancy for the current group, in bytes, used to determine whether an RMID can be reused.
- The element 'feature' denotes a supported monitoring feature.
- 'llc_occupancy' is the feature that reports last level cache occupancy information.
# virsh capabilities
...
<cache>
  <bank id='0' level='3' type='both' size='15' unit='MiB' cpus='0-5'>
    <control granularity='768' unit='KiB' type='code' maxAllocs='8'/>
    <control granularity='768' unit='KiB' type='data' maxAllocs='8'/>
+   <monitor threshold='540672' unit='B' maxAllocs='176'>
+     <feature name='llc_occupancy'/>
+   </monitor>
  </bank>
  <bank id='1' level='3' type='both' size='15' unit='MiB' cpus='6-11'>
    <control granularity='768' unit='KiB' type='code' maxAllocs='8'/>
    <control granularity='768' unit='KiB' type='data' maxAllocs='8'/>
+   <monitor threshold='540672' unit='B' maxAllocs='176'>
+     <feature name='llc_occupancy'/>
+   </monitor>
  </bank>
</cache>
...
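The test data added by this series (see the diffstat) mirrors three kernel files under `/sys/fs/resctrl/info/L3_MON/`: `num_rmids`, `max_threshold_occupancy` and `mon_features`. Presumably these back the 'maxAllocs', 'threshold' and 'feature' values shown above. As a rough sketch of the last one, the function below checks whether a feature name appears as a whole line in the text of a `mon_features` file. The name `mon_features_has` is hypothetical and the sample listing in the usage note is illustrative, not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `feature` appears as a complete line in
 * `listing` (the newline-separated contents of a mon_features file), else 0. */
static int
mon_features_has(const char *listing, const char *feature)
{
    size_t flen = strlen(feature);
    const char *line = listing;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* compare whole lines so that a longer name does not match by prefix */
        if (len == flen && strncmp(line, feature, flen) == 0)
            return 1;
        if (!eol)
            break;
        line = eol + 1;
    }
    return 0;
}
```

For a listing such as "llc_occupancy\nmbm_total_bytes\nmbm_local_bytes\n", a lookup of "llc_occupancy" succeeds, which corresponds to the `<feature name='llc_occupancy'/>` element in the capabilities XML.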
2.2 Create cache monitoring group (cache monitor).
The main interface for creating a monitoring group is the domain XML file. The proposed configuration looks like this:
<cputune>
  <cachetune vcpus='1'>
    <cache id='0' level='3' type='code' size='7680' unit='KiB'/>
    <cache id='1' level='3' type='data' size='3840' unit='KiB'/>
+   <monitor vcpus='1'/>
  </cachetune>
  <cachetune vcpus='4-7'>
+   <monitor vcpus='4-6'/>
  </cachetune>
</cputune>
The XML above creates 2 resctrl cache allocation groups and 2 resctrl monitoring groups. Changes to a cache monitor take effect the next time the VM boots.
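In the example, each monitor's vcpus ('1' and '4-6') fall within the vcpus of the enclosing cachetune ('1' and '4-7'). The cover letter does not state this as a rule, so the check below rests on the assumption that a monitor may only watch vCPUs of its own allocation. It is a minimal sketch with a hypothetical function name, using a 64-bit mask in which bit n stands for vCPU n.

```c
#include <stdint.h>

/* Hypothetical check (assumed rule, not confirmed by the cover letter):
 * a monitor must watch at least one vCPU, and every vCPU it watches must
 * also belong to the enclosing cachetune allocation.
 * Bit n of each mask stands for vCPU n, so only vCPUs 0-63 are covered. */
static int
monitor_vcpus_valid(uint64_t alloc_vcpus, uint64_t monitor_vcpus)
{
    return monitor_vcpus != 0 && (monitor_vcpus & ~alloc_vcpus) == 0;
}
```

For the second cachetune above, the allocation mask is 0xF0 (vCPUs 4-7) and the monitor mask is 0x70 (vCPUs 4-6), which passes the check.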
2.3 Show CMT result through command 'domstats'
An interface is added in qemu to report the cache occupancy of each resource monitor group through the command 'virsh domstats --cpu-total'. Below is a typical output:
# virsh domstats 1 --cpu-total
Domain: 'ubuntu16.04-base'
  ...
  cpu.cache.monitor.count=2
  cpu.cache.0.name=vcpus_1
  cpu.cache.0.vcpus=1
  cpu.cache.0.bank.count=2
  cpu.cache.0.bank.0.id=0
  cpu.cache.0.bank.0.bytes=4505600
  cpu.cache.0.bank.1.id=1
  cpu.cache.0.bank.1.bytes=5586944
  cpu.cache.1.name=vcpus_4-6
  cpu.cache.1.vcpus=4,5,6
  cpu.cache.1.bank.count=2
  cpu.cache.1.bank.0.id=0
  cpu.cache.1.bank.0.bytes=17571840
  cpu.cache.1.bank.1.id=1
  cpu.cache.1.bank.1.bytes=29106176
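A consumer of this output might want the per-bank occupancy values in numeric form. The following is a minimal sketch, assuming the key layout shown above; `parse_bank_bytes` is a hypothetical name and this is not part of virsh or libvirt.

```c
#include <stdio.h>

/* Illustrative parser: if `line` has the form
 * "cpu.cache.<M>.bank.<N>.bytes=<value>", stores M, N and the value and
 * returns 1; for any other line it returns 0. */
static int
parse_bank_bytes(const char *line, unsigned int *monitor,
                 unsigned int *bank, unsigned long long *bytes)
{
    char trailing;
    /* the final %c only converts if something follows the number,
     * so exactly 3 conversions means a clean match */
    int n = sscanf(line, "cpu.cache.%u.bank.%u.bytes=%llu%c",
                   monitor, bank, bytes, &trailing);

    return n == 3;
}
```

Feeding it the two `bytes` lines of monitor 1 and adding the results gives 17571840 + 29106176 = 46678016 bytes of last level cache occupied by vCPUs 4-6 across both banks.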
**Changes Since RFCv3**
In the output of 'domstats', 'cpu.cache.<cmt_group_index>.bank.<bank_index>.id' was added to report the OS-assigned cache bank id of the current cache. Changes are prefixed with a '+':
# virsh domstats 1 --cpu-total
Domain: 'ubuntu16.04-base'
  ...
  cpu.cache.monitor.count=2
  cpu.cache.0.name=vcpus_1
  cpu.cache.0.vcpus=1
  cpu.cache.0.bank.count=2
+ cpu.cache.0.bank.0.id=0
  cpu.cache.0.bank.0.bytes=4505600
+ cpu.cache.0.bank.1.id=1
  cpu.cache.0.bank.1.bytes=5586944
  cpu.cache.1.name=vcpus_4-6
  cpu.cache.1.vcpus=4,5,6
  cpu.cache.1.bank.count=2
+ cpu.cache.1.bank.0.id=0
  cpu.cache.1.bank.0.bytes=17571840
+ cpu.cache.1.bank.1.id=1
  cpu.cache.1.bank.1.bytes=29106176
Wang Huaqiang (10):
  conf: Renamed 'controlBuf' to 'childrenBuf'
  util: add interface retrieving CMT capability
  conf: Add CMT capability to host
  test: add test case for resctrl monitor
  util: resctrl: refactoring some functions
  util: Introduce resctrl monitor for CMT
  conf: refactor virDomainResctrlAppend
  conf: introduce resctrl monitor group in domain
  qemu: Introduce resctrl monitoring group
  qemu: Report cache occupancy (CMT) with domstats

 .gnulib                                            |   1 -
 docs/formatdomain.html.in                          |  14 +-
 docs/schemas/capability.rng                        |  28 +
 docs/schemas/domaincommon.rng                      |  11 +-
 src/conf/capabilities.c                            |  51 +-
 src/conf/capabilities.h                            |   1 +
 src/conf/domain_conf.c                             | 159 +++++-
 src/conf/domain_conf.h                             |  20 +
 src/libvirt-domain.c                               |   9 +
 src/libvirt_private.syms                           |   6 +
 src/qemu/qemu_driver.c                             | 265 ++++++++-
 src/qemu/qemu_process.c                            |  40 +-
 src/util/virresctrl.c                              | 597 +++++++++++++++++++--
 src/util/virresctrl.h                              |  48 +-
 tests/genericxml2xmlindata/cachetune-cdp.xml       |   2 +
 .../cachetune-colliding-monitors.xml               |  36 ++
 tests/genericxml2xmlindata/cachetune-small.xml     |   1 +
 tests/genericxml2xmlindata/cachetune.xml           |   3 +
 tests/genericxml2xmltest.c                         |   4 +
 .../resctrl/info/L3_MON/max_threshold_occupancy    |   1 +
 .../linux-resctrl/resctrl/info/L3_MON/mon_features |   3 +
 .../linux-resctrl/resctrl/info/L3_MON/num_rmids    |   1 +
 tests/vircaps2xmldata/vircaps-x86_64-resctrl.xml   |   6 +
 23 files changed, 1208 insertions(+), 99 deletions(-)
 delete mode 160000 .gnulib
 create mode 100644 tests/genericxml2xmlindata/cachetune-colliding-monitors.xml
 create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/max_threshold_occupancy
 create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/mon_features
 create mode 100644 tests/vircaps2xmldata/linux-resctrl/resctrl/info/L3_MON/num_rmids

Hi,

This series was run against 'syntax-check' test by patchew.org, which
failed, please find the details below:

Type: series
Message-id: 1535368993-24901-1-git-send-email-huaqiang.wang@intel.com
Subject: [libvirt] [PATCH 00/10] Introduce x86 Cache Monitoring Technology (CMT)

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
time bash -c './autogen.sh && make syntax-check'
=== TEST SCRIPT END ===

Updating bcb55ab053bc79561b55d0394490f4b64e0f2d01
From https://github.com/patchew-project/libvirt
   39015a6f3a..e9e904b3b7  master     -> master
 t [tag update]            patchew/1535368993-24901-1-git-send-email-huaqiang.wang@intel.com -> patchew/1535368993-24901-1-git-send-email-huaqiang.wang@intel.com
Switched to a new branch 'test'
b59bfacba7 qemu: Report cache occupancy (CMT) with domstats
250d75c15d qemu: Introduce resctrl monitoring group
686045d727 conf: introduce resctrl monitor group in domain
23e3f587a5 conf: refactor virDomainResctrlAppend
6032925e72 util: Introduce resctrl monitor for CMT
1816e759f2 util: resctrl: refactoring some functions
7f1f5b878c test: add test case for resctrl monitor
15f550adf2 conf: Add CMT capability to host
d7d4551e33 util: add interface retrieving CMT capability
8e704c3761 conf: Renamed 'controlBuf' to 'childrenBuf'

=== OUTPUT BEGIN ===
Updating submodules...
Submodule 'keycodemapdb' (https://gitlab.com/keycodemap/keycodemapdb.git) registered for path 'src/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-sf9mnyb2/src/src/keycodemapdb'...
Submodule path 'src/keycodemapdb': checked out '16e5b0787687d8904dad2c026107409eb9bfcb95'
error: pathspec '.gnulib' did not match any file(s) known to git.
Running bootstrap...
./bootstrap: Bootstrapping from checked-out libvirt sources...
./bootstrap: consider installing git-merge-changelog from gnulib
./bootstrap: getting gnulib files...
error: pathspec '.gnulib' did not match any file(s) known to git.
error: bootstrap failed

real    0m7.491s
user    0m3.166s
sys     0m1.824s
=== OUTPUT END ===

Test command exited with code: 1

---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com