[libvirt] [RFC PATCHv2 00/10] x86 RDT Cache Monitoring Technology (CMT)

This is the V2 of the RFC and the POC source code for introducing the x86 RDT
CMT feature. Thanks to Martin Kletzander for his review and constructive
suggestions on V1.

This series aims to provide functions similar to the perf-event-based CMT,
MBMT and MBML features for reporting cache occupancy, total memory bandwidth
utilization and local memory bandwidth utilization information in libvirt.
For now we focus on CMT.

x86 RDT Cache Monitoring Technology (CMT) provides a method to track cache
occupancy per CPU thread. We leverage the kernel resctrl filesystem
implementation and build our patches on top of it.

Describing the functionality from a high level:

1. Extend the output of 'domstats' to report CMT information.

Compared with the perf-event-based CMT implementation in libvirt, this series
extends the output of the 'domstats' command and reports cache occupancy
information like this:

    [root@dl-c200 libvirt]# virsh domstats vm3 --cpu-resource
    Domain: 'vm3'
      cpu.cacheoccupancy.vcpus_2.value=4415488
      cpu.cacheoccupancy.vcpus_2.vcpus=2
      cpu.cacheoccupancy.vcpus_1.value=7839744
      cpu.cacheoccupancy.vcpus_1.vcpus=1
      cpu.cacheoccupancy.vcpus_0,3.value=53796864
      cpu.cacheoccupancy.vcpus_0,3.vcpus=0,3

The vcpus have been arranged into three monitoring groups, covering vcpu 1,
vcpu 2 and vcpus 0,3 respectively. For example,
'cpu.cacheoccupancy.vcpus_0,3.value' reports the cache occupancy for vcpu 0
and vcpu 3, while 'cpu.cacheoccupancy.vcpus_0,3.vcpus' describes the vcpu
group.

To address Martin's suggestion "beware as 1-4 is something else than 1,4 so
you need to differentiate that", the content of 'vcpus'
(cpu.cacheoccupancy.<groupname>.vcpus=xxx) is specially processed: if the
vcpus form a continuous range, e.g. 0-2, then the output will be
'cpu.cacheoccupancy.vcpus_0-2.vcpus=0,1,2' instead of
'cpu.cacheoccupancy.vcpus_0-2.vcpus=0-2'. Please note that 'vcpus_0-2' is
only the name of the monitoring group; it can be set to any other word from
the XML configuration file or changed at run time with the command introduced
below.

2. A new command 'cpu-resource' for changing CMT groups at run time.

A virsh command is introduced in this series to dynamically create and
destroy monitoring groups, as well as to show the existing grouping status.
The general command interface looks like this:

    [root@dl-c200 libvirt]# virsh help cpu-resource
      NAME
        cpu-resource - get or set hardware CPU RDT monitoring group

      SYNOPSIS
        cpu-resource <domain> [--group-name <string>] [--vcpulist <string>]
                     [--create] [--destroy] [--live] [--config] [--current]

      DESCRIPTION
        Create or destroy CPU resource monitoring group.
        To get current CPU resource monitoring group status:
        virsh # cpu-resource [domain]

      OPTIONS
        [--domain] <string>    domain name, id or uuid
        --group-name <string>  group name to manipulate
        --vcpulist <string>    ids of vcpus to manipulate
        --create               Create CPU resctrl monitoring group for
                               functions such as monitoring cache occupancy
        --destroy              Destroy CPU resctrl monitoring group
        --live                 modify/get running state
        --config               modify/get persistent configuration
        --current              affect current domain

This command provides a live interface for changing the resource monitoring
groups and keeps the result in the persistent domain XML configuration file.
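As an illustration of the intended workflow (the domain and group names here
are made-up examples, not from a real run), a group covering vcpus 0 and 3
could be created and later destroyed with:

    virsh # cpu-resource vm3 --create --group-name mygroup --vcpulist 0,3 --live
    virsh # cpu-resource vm3 --destroy --group-name mygroup --live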
3. XML configuration changes for keeping CMT groups.

To keep the monitoring group information and start monitoring CPU cache
resource utilization at launch time, the XML configuration file gains a new
element <resmongroup>:

    <cputune>
      <resmongroup vcpus='0-2'/>
      <resmongroup vcpus='3'/>
    </cputune>

4. About the naming used in this series for the RDT CMT technology.

Regarding the wording and naming used in this series for Intel RDT CMT:
'RDT', 'CMT' and 'resctrl' are the names currently used in Intel documents
and in the kernel namespace in the context of CPU resources, but they are
pretty confusing for a system administrator. 'Resource Control' or
'Monitoring' is not a good choice either; the scope of these two phrases is
too big and normally covers many aspects other than CPU cache and memory
bandwidth. Intel RDT is a technology emphasizing resource allocation and
monitoring within the scope of the CPU, so I would like to use the term
'cpu-resource' to describe what these patches address. This series focuses
on CPU cache occupancy monitoring (CMT), and this naming has a wider scope
than CMT: we could add similar resource monitoring for the MBML and MBMT
technologies under the framework introduced by these patches. The naming is
also applicable to CPU resource allocation; it would be possible to add
commands or arguments to allocate cache or memory bandwidth at run time.

5. About emulator and IO thread CMT.

Currently it is not possible to allocate a dedicated amount of cache or
memory bandwidth for emulator or IO threads, so resource monitoring for
emulator or IO threads is not considered in this series. It could be planned
for the next stage.

Changes since v1:

A lot of things changed, mainly:
* report cache occupancy information based on vcpu groups instead of the
  whole domain
* support destroying a vcpu group at run time
* XML configuration file changed
* renamed the description of 'RDT CMT' to 'cpu-resource'

Wang Huaqiang (10):
  util: add Intel x86 RDT/CMT support
  conf: introduce <resmongroup> element
  tests: add tests for validating <resmongroup>
  libvirt: add public APIs for resource monitoring group
  qemu: enable resctrl monitoring at booting stage
  remote: add remote protocol for resctrl monitoring
  qemu: add interfaces for dynamically manipulating resctrl mon groups
  tools: add command cpu-resource to interact with cpu resources
  tools: show cpu cache occupancy information in domstats
  news: add Intel x86 RDT CMT feature

 docs/formatdomain.html.in                      |  17 +
 docs/news.xml                                  |  10 +
 docs/schemas/domaincommon.rng                  |  14 +
 include/libvirt/libvirt-domain.h               |  14 +
 src/conf/domain_conf.c                         | 320 ++++++++++++++++++
 src/conf/domain_conf.h                         |  25 ++
 src/driver-hypervisor.h                        |  13 +
 src/libvirt-domain.c                           |  96 ++++++
 src/libvirt_private.syms                       |  13 +
 src/libvirt_public.syms                        |   6 +
 src/qemu/qemu_driver.c                         | 357 +++++++++++++++++++
 src/qemu/qemu_process.c                        |  45 ++-
 src/remote/remote_daemon_dispatch.c            |  45 +++
 src/remote/remote_driver.c                     |   4 +-
 src/remote/remote_protocol.x                   |  31 +-
 src/remote_protocol-structs                    |  16 +
 src/util/virresctrl.c                          | 338 +++++++++++++++++++
 src/util/virresctrl.h                          |  40 +++
 tests/genericxml2xmlindata/cachetune-cdp.xml   |   3 +
 tests/genericxml2xmlindata/cachetune-small.xml |   2 +
 tests/genericxml2xmlindata/cachetune.xml       |   2 +
 .../resmongroup-colliding-cachetune.xml        |  34 ++
 tests/genericxml2xmltest.c                     |   3 +
 tools/virsh-domain-monitor.c                   |   7 +
 tools/virsh-domain.c                           | 139 ++++++++
 25 files changed, 1588 insertions(+), 6 deletions(-)
 create mode 100644 tests/genericxml2xmlindata/resmongroup-colliding-cachetune.xml

--
2.7.4

Add the RDT/CMT feature (Intel x86) by interacting with the kernel resctrl
file system. Integrate the code into util/virresctrl.
---
 src/libvirt_private.syms |  10 ++
 src/util/virresctrl.c    | 338 +++++++++++++++++++++++++++++++++++++++++++++++
 src/util/virresctrl.h    |  40 ++++++
 3 files changed, 388 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 3e30490..b10a3a5 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2655,6 +2655,16 @@
 virResctrlAllocSetID;
 virResctrlAllocSetSize;
 virResctrlInfoGetCache;
 virResctrlInfoNew;
+virResctrlMonAddPID;
+virResctrlMonCreate;
+virResctrlMonDeterminePath;
+virResctrlMonGetAlloc;
+virResctrlMonGetCacheOccupancy;
+virResctrlMonGetID;
+virResctrlMonIsRunning;
+virResctrlMonNew;
+virResctrlMonRemove;
+virResctrlMonSetID;

 # util/virrotatingfile.h
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c
index e492a63..3dca937 100644
--- a/src/util/virresctrl.c
+++ b/src/util/virresctrl.c
@@ -90,6 +90,7 @@ typedef virResctrlAllocPerLevel *virResctrlAllocPerLevelPtr;
 /* Class definitions and initializations */
 static virClassPtr virResctrlInfoClass;
 static virClassPtr virResctrlAllocClass;
+static virClassPtr virResctrlMonClass;

 /* virResctrlInfo */
@@ -257,6 +258,33 @@ virResctrlAllocDispose(void *obj)
 }


+/* virResctrlMon */
+struct _virResctrlMon {
+    virObject parent;
+
+    /* alloc: keeps the pointer to an allocation group if sharing the same
+     * resctrlfs subdirectory with an allocation group.
+     * NULL for a standalone monitoring group. */
+    virResctrlAllocPtr alloc;
+    /* The identifier (any unique string for now) */
+    char *id;
+    /* libvirt-generated path, identical to the path of 'alloc' if sharing
+     * the same resctrlfs subdirectory with an allocation group */
+    char *path;
+};
+
+
+static void
+virResctrlMonDispose(void *obj)
+{
+    virResctrlMonPtr resctrlMon = obj;
+
+    VIR_FREE(resctrlMon->id);
+    VIR_FREE(resctrlMon->path);
+}
+
+
 /* Global initialization for classes */
 static int
 virResctrlOnceInit(void)
@@ -267,6 +295,9 @@ virResctrlOnceInit(void)
     if (!VIR_CLASS_NEW(virResctrlAlloc, virClassForObject()))
         return -1;

+    if (!VIR_CLASS_NEW(virResctrlMon, virClassForObject()))
+        return -1;
+
     return 0;
 }
@@ -1612,3 +1643,310 @@ virResctrlAllocRemove(virResctrlAllocPtr alloc)
     return ret;
 }
+
+
+virResctrlMonPtr
+virResctrlMonNew(void)
+{
+    if (virResctrlInitialize() < 0)
+        return NULL;
+
+    return virObjectNew(virResctrlMonClass);
+}
+
+
+int
+virResctrlMonSetID(virResctrlMonPtr mon,
+                   const char *id)
+{
+    if (!id) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Resctrl mon group 'id' cannot be NULL"));
+        return -1;
+    }
+
+    return VIR_STRDUP(mon->id, id);
+}
+
+
+const char *
+virResctrlMonGetID(virResctrlMonPtr mon)
+{
+    return mon->id;
+}
+
+
+int
+virResctrlMonDeterminePath(virResctrlMonPtr mon,
+                           const char *machinename)
+{
+    char *grouppath = NULL;
+    int ret = -1;
+
+    if (!mon->id) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Resctrl mon group id must be set before creation"));
+        goto cleanup;
+    }
+
+    if (!mon->path) {
+        if (virAsprintf(&grouppath, "%s/mon_groups/%s-%s",
+                        SYSFS_RESCTRL_PATH, machinename, mon->id) < 0)
+            goto cleanup;
+
+        VIR_STEAL_PTR(mon->path, grouppath);
+    } else {
+        /* if the path already exists, validate it against the path of the
+         * shared allocation group */
+        if (virAsprintf(&grouppath, "%s/%s-%s",
+                        SYSFS_RESCTRL_PATH, machinename, mon->id) < 0)
+            goto cleanup;
+
+        if (STRNEQ(mon->path, grouppath))
+            goto cleanup;
+    }
+
+    ret = 0;
+ cleanup:
+    VIR_FREE(grouppath);
+    return ret;
+}
+
+
+int
+virResctrlMonAddPID(virResctrlMonPtr mon,
+                    pid_t pid)
+{
+    char *tasks = NULL;
+    char *pidstr = NULL;
+    int ret = -1;
+
+    /* PIDs are only written to a standalone mon group */
+    if (mon->alloc)
+        return 0;
+
+    if (!mon->path) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Cannot add pid to non-existing resctrl mon group"));
+        return -1;
+    }
+
+    VIR_DEBUG("Add PID %lld to group %s",
+              (long long int) pid, mon->path);
+
+    if (virAsprintf(&tasks, "%s/tasks", mon->path) < 0)
+        return -1;
+
+    if (virAsprintf(&pidstr, "%lld", (long long int) pid) < 0)
+        goto cleanup;
+
+    if (virFileWriteStr(tasks, pidstr, 0) < 0) {
+        virReportSystemError(errno,
+                             _("Cannot write pid in tasks file '%s'"),
+                             tasks);
+        goto cleanup;
+    }
+
+    ret = 0;
+ cleanup:
+    VIR_FREE(tasks);
+    VIR_FREE(pidstr);
+    return ret;
+}
+
+
+int
+virResctrlMonCreate(virResctrlAllocPtr alloc,
+                    virResctrlMonPtr mon,
+                    const char *machinename)
+{
+    int ret = -1;
+    int lockfd = -1;
+
+    if (!mon)
+        return -1;
+
+    if (alloc) {
+        if (!virFileExists(alloc->path)) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("resctrl mon group: allocation group exists "
+                             "but is not valid"));
+            return -1;
+        }
+        mon->alloc = alloc;
+
+        if (virResctrlMonDeterminePath(mon, machinename) < 0)
+            return -1;
+    } else {
+        if (virResctrlMonDeterminePath(mon, machinename) < 0)
+            return -1;
+
+        lockfd = virResctrlLockWrite();
+        if (lockfd < 0)
+            goto cleanup;
+
+        if (virFileExists(mon->path)) {
+            VIR_DEBUG("Removing resctrl mon group %s", mon->path);
+            if (rmdir(mon->path) != 0 && errno != ENOENT) {
+                virReportSystemError(errno,
+                                     _("Unable to remove resctrl directory '%s'"),
+                                     mon->path);
+                goto cleanup;
+            }
+        }
+
+        if (virFileMakePath(mon->path) < 0) {
+            virReportSystemError(errno,
+                                 _("Cannot create resctrl directory '%s'"),
+                                 mon->path);
+            goto cleanup;
+        }
+    }
+
+    ret = 0;
+ cleanup:
+    virResctrlUnlock(lockfd);
+    return ret;
+}
+
+
+int
+virResctrlMonRemove(virResctrlMonPtr mon)
+{
+    int ret = 0;
+
+    if (!mon->path)
+        return 0;
+
+    VIR_DEBUG("Removing resctrl mon group %s", mon->path);
+    if (rmdir(mon->path) != 0 && errno != ENOENT) {
+        ret = -errno;
+        VIR_ERROR(_("Unable to remove %s (%d)"), mon->path, errno);
+    }
+
+    return ret;
+}
+
+
+bool
+virResctrlMonIsRunning(virResctrlMonPtr mon)
+{
+    bool ret = false;
+    int rv = -1;
+    char *tasks = NULL;
+
+    if (mon && virFileExists(mon->path)) {
+        rv = virFileReadValueString(&tasks, "%s/tasks", mon->path);
+        if (rv < 0)
+            goto cleanup;
+
+        if (!tasks || !tasks[0])
+            goto cleanup;
+
+        ret = true;
+    }
+
+ cleanup:
+    VIR_FREE(tasks);
+    return ret;
+}
+
+
+static int
+virResctrlMonGetStatistic(virResctrlMonPtr mon,
+                          const char *resfile,
+                          unsigned int *value)
+{
+    DIR *dirp = NULL;
+    int ret = -1;
+    int rv = -1;
+    struct dirent *ent = NULL;
+    unsigned int val = 0;
+    unsigned int valtotal = 0;
+    virBuffer buf = VIR_BUFFER_INITIALIZER;
+    char *mondatapath = NULL;
+
+    if (!mon->path)
+        goto cleanup;
+
+    if (!resfile)
+        goto cleanup;
+
+    *value = 0;
+
+    rv = virDirOpenIfExists(&dirp, mon->path);
+    if (rv <= 0)
+        goto cleanup;
+    VIR_DIR_CLOSE(dirp);
+
+    virBufferAsprintf(&buf, "%s/mon_data", mon->path);
+    mondatapath = virBufferContentAndReset(&buf);
+    if (!mondatapath)
+        goto cleanup;
+
+    VIR_DEBUG("Seek llc_occupancy file from root: %s", mondatapath);
+
+    if (virDirOpen(&dirp, mondatapath) < 0)
+        goto cleanup;
+
+    while ((rv = virDirRead(dirp, &ent, mondatapath)) > 0) {
+        VIR_DEBUG("Parsing file '%s'", ent->d_name);
+        if (ent->d_type != DT_DIR)
+            continue;
+
+        if (STRNEQLEN(ent->d_name, "mon_L", 5))
+            continue;
+
+        rv = virFileReadValueUint(&val,
+                                  "%s/%s/%s",
+                                  mondatapath, ent->d_name, resfile);
+        if (rv == -2) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("file %s/%s/%s does not exist"),
+                           mondatapath, ent->d_name, resfile);
+            goto cleanup;
+        } else if (rv < 0) {
+            goto cleanup;
+        }
+
+        valtotal += val;
+    }
+
+    *value = valtotal;
+    ret = 0;
+ cleanup:
+    VIR_FREE(mondatapath);
+    VIR_DIR_CLOSE(dirp);
+    return ret;
+}
+
+
+int
+virResctrlMonGetCacheOccupancy(virResctrlMonPtr mon,
+                               unsigned int *cacheoccu)
+{
+    const char *cacheoccufile = "llc_occupancy";
+    unsigned int value = 0;
+    int ret = -1;
+
+    *cacheoccu = 0;
+
+    ret = virResctrlMonGetStatistic(mon, cacheoccufile, &value);
+    if (ret >= 0)
+        *cacheoccu = value;
+
+    return ret;
+}
+
+
+virResctrlAllocPtr
+virResctrlMonGetAlloc(virResctrlMonPtr mon)
+{
+    return mon->alloc;
+}
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h
index 9052a2b..6865ab6 100644
--- a/src/util/virresctrl.h
+++ b/src/util/virresctrl.h
@@ -116,4 +116,44 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc,
 int
 virResctrlAllocRemove(virResctrlAllocPtr alloc);

+
+/* Monitoring-related things */
+typedef struct _virResctrlMon virResctrlMon;
+typedef virResctrlMon *virResctrlMonPtr;
+
+virResctrlMonPtr
+virResctrlMonNew(void);
+
+int
+virResctrlMonSetID(virResctrlMonPtr mon,
+                   const char *id);
+
+const char *
+virResctrlMonGetID(virResctrlMonPtr mon);
+
+int
+virResctrlMonDeterminePath(virResctrlMonPtr mon,
+                           const char *machinename);
+
+int
+virResctrlMonAddPID(virResctrlMonPtr mon,
+                    pid_t pid);
+
+int
+virResctrlMonCreate(virResctrlAllocPtr pairedalloc,
+                    virResctrlMonPtr mon,
+                    const char *machinename);
+
+int
+virResctrlMonRemove(virResctrlMonPtr mon);
+
+bool
+virResctrlMonIsRunning(virResctrlMonPtr mon);
+
+int
+virResctrlMonGetCacheOccupancy(virResctrlMonPtr mon,
+                               unsigned int *cacheoccu);
+
+virResctrlAllocPtr
+virResctrlMonGetAlloc(virResctrlMonPtr mon);
 #endif /* __VIR_RESCTRL_H__ */
--
2.7.4
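For orientation, here is a minimal sketch (not part of the patch) of how a
caller inside libvirt might drive this new API; the machine name "qemu-vm3",
the group id "vcpus_0-1" and the vcpupid parameter are illustrative
assumptions only:

    /* Illustrative only: "qemu-vm3" and "vcpus_0-1" are made-up values,
     * and vcpupid stands for a vcpu thread PID obtained elsewhere. */
    static int
    exampleMonGroup(pid_t vcpupid)
    {
        virResctrlMonPtr mon = NULL;
        unsigned int occupancy = 0;
        int ret = -1;

        /* standalone group: no paired allocation, hence NULL alloc below */
        if (!(mon = virResctrlMonNew()) ||
            virResctrlMonSetID(mon, "vcpus_0-1") < 0 ||
            virResctrlMonCreate(NULL, mon, "qemu-vm3") < 0)
            goto cleanup;

        /* move a vcpu thread into the group, then read the occupancy that
         * virResctrlMonGetStatistic sums over all mon_L3_* domains */
        if (virResctrlMonAddPID(mon, vcpupid) < 0 ||
            virResctrlMonGetCacheOccupancy(mon, &occupancy) < 0)
            goto cleanup;

        VIR_DEBUG("cache occupancy: %u bytes", occupancy);

        ret = 0;
     cleanup:
        if (mon)
            virResctrlMonRemove(mon);   /* rmdir the group directory */
        virObjectUnref(mon);
        return ret;
    }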

The resmongroup element is used for the resctrl monitoring group feature, and
keeps the information about how the resctrl monitoring groups are arranged.
---
 docs/formatdomain.html.in     |  17 +++
 docs/schemas/domaincommon.rng |  14 ++
 src/conf/domain_conf.c        | 318 ++++++++++++++++++++++++++++++++++++++++++
 src/conf/domain_conf.h        |  25 ++++
 src/libvirt_private.syms      |   3 +
 5 files changed, 377 insertions(+)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index a3afe13..cfb10c4 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -757,6 +757,8 @@
     <cache id='0' level='3' type='both' size='3' unit='MiB'/>
     <cache id='1' level='3' type='both' size='3' unit='MiB'/>
   </cachetune>
+  <resmongroup vcpus="0-3"/>
+  <resmongroup vcpus="4"/>
 </cputune>
 ...
 </domain>
@@ -952,6 +954,21 @@
   </dl>
 </dd>

+<dt><code>resmongroup</code> <span class="since">Since 4.6.0</span></dt>
+<dd>
+  The optional <code>resmongroup</code> element can be used to create a
+  resctrl monitoring group for the purpose of reporting cache occupancy
+  information. The attribute <code>vcpus</code> specifies which vCPUs this
+  monitoring group applies to, and it interacts with the
+  <code>cachetune</code> <code>vcpus</code> attribute: a
+  <code>resmongroup</code> <code>vcpus</code> is valid if it has exactly
+  the same setting as a <code>cachetune</code> element, but any other kind
+  of overlap between <code>resmongroup</code> <code>vcpus</code> and
+  <code>cachetune</code> <code>vcpus</code> is not permitted. Likewise, no
+  vCPU overlap between monitoring groups is allowed. The optional attribute
+  <code>id</code> specifies the group name.
+</dd>
 </dl>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index bd687ce..a8057b1 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -952,6 +952,20 @@
         </element>
       </zeroOrMore>
       <zeroOrMore>
+        <element name="resmongroup">
+          <attribute name="vcpus">
+            <ref name='cpuset'/>
+          </attribute>
+          <optional>
+            <attribute name="id">
+              <data type="string">
+                <param name='pattern'>[a-zA-Z0-9,-_]+</param>
+              </data>
+            </attribute>
+          </optional>
+        </element>
+      </zeroOrMore>
+      <zeroOrMore>
         <element name="cachetune">
           <attribute name="vcpus">
             <ref name='cpuset'/>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index f4e59f6..0cdad79 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -2975,6 +2975,18 @@ virDomainSEVDefFree(virDomainSEVDefPtr def)
 }


+static void
+virDomainCpuResmonDefFree(virDomainCpuResmonDefPtr resmon)
+{
+    if (!resmon)
+        return;
+
+    virObjectUnref(resmon->mon);
+    virBitmapFree(resmon->vcpus);
+    VIR_FREE(resmon);
+}
+
+
 void virDomainDefFree(virDomainDefPtr def)
 {
     size_t i;
@@ -3152,6 +3164,10 @@ void virDomainDefFree(virDomainDefPtr def)
         virDomainCachetuneDefFree(def->cachetunes[i]);
     VIR_FREE(def->cachetunes);

+    for (i = 0; i < def->nresmons; i++)
+        virDomainCpuResmonDefFree(def->resmons[i]);
+    VIR_FREE(def->resmons);
+
     VIR_FREE(def->keywrap);

     if (def->namespaceData && def->ns.free)
@@ -19055,6 +19071,264 @@ virDomainCachetuneDefParse(virDomainDefPtr def,
 }


+bool
+virDomainCpuResmonDefValidate(virDomainDefPtr def,
+                              const char *id,
+                              virBitmapPtr vcpus,
+                              virResctrlAllocPtr *alloc)
+{
+    ssize_t i = -1;
+
+    /* every vcpu must exist in the current domain */
+    while ((i = virBitmapNextSetBit(vcpus, i)) > -1) {
+        if (!virDomainDefGetVcpu(def, i))
+            return false;
+    }
+
+    if (alloc)
+        *alloc = NULL;
+
+    /* If 'vcpus' equals the vcpus of an existing allocation group, the mon
+     * group shares the same resctrl resource group with the allocation
+     * group, which is a legal case. Otherwise, no vcpu overlap is allowed
+     * between a mon group and any allocation group.
+     * If a mon group shares the same resource group with an allocation
+     * group, we expect the mon group and the allocation group to have the
+     * same 'id', and the 'alloc' pointer points to the allocation group. */
+    for (i = 0; i < def->ncachetunes; i++) {
+        if (virBitmapOverlaps(vcpus, def->cachetunes[i]->vcpus)) {
+            if (virBitmapEqual(vcpus, def->cachetunes[i]->vcpus)) {
+                const char *allocid =
+                    virResctrlAllocGetID(def->cachetunes[i]->alloc);
+                if (!allocid || (id && STRNEQ(id, allocid)))
+                    return false;
+
+                if (alloc)
+                    *alloc = def->cachetunes[i]->alloc;
+                return true;
+            }
+
+            return false;
+        }
+    }
+
+    /* If vcpus equals the vcpus of an existing mon group, that mon group
+     * has already been created, so return true. A new mon group must not
+     * overlap the vcpus of any existing one. */
+    for (i = 0; i < def->nresmons; i++) {
+        if (virBitmapEqual(vcpus, def->resmons[i]->vcpus) &&
+            (!id ||
+             STREQ(id, virResctrlMonGetID(def->resmons[i]->mon))))
+            return true;
+
+        if (virBitmapOverlaps(vcpus, def->resmons[i]->vcpus))
+            return false;
+    }
+
+    return true;
+}
+
+
+virDomainCpuResmonDefPtr
+virDomainCpuResmonDefAdd(virDomainDefPtr def,
+                         virBitmapPtr vcpuslist,
+                         const char *monid)
+{
+    virDomainCpuResmonDefPtr resmon = NULL;
+    virResctrlMonPtr mon = NULL;
+    char *id = NULL;
+    char *vcpus_str = NULL;
+    size_t i = 0;
+    virDomainCpuResmonDefPtr ret = NULL;
+    virBitmapPtr vcpus = virBitmapNewCopy(vcpuslist);
+
+    if (VIR_STRDUP(id, monid) < 0)
+        goto cleanup;
+
+    for (i = 0; i < def->nresmons; i++) {
+        if (virBitmapEqual(vcpus, def->resmons[i]->vcpus)) {
+            if (!id ||
+                STREQ(id, virResctrlMonGetID(def->resmons[i]->mon))) {
+                ret = def->resmons[i];
+                goto cleanup;
+            }
+            virReportError(VIR_ERR_INVALID_ARG,
+                           "%s", _("resource monitoring group id mismatch"));
+            goto cleanup;
+        }
+    }
+
+    /* A resource group created by cachetune also acts as a mon group; if
+     * matched, copy the group id and no subdirectory under the resctrl fs
+     * will be created. */
+    for (i = 0; i < def->ncachetunes; i++) {
+        if (virBitmapEqual(vcpus, def->cachetunes[i]->vcpus)) {
+            const char *allocid
+                = virResctrlAllocGetID(def->cachetunes[i]->alloc);
+            /* A mon group matched in the cachetunes list can never be
+             * disabled, because we cannot disable an allocation group at
+             * run time. */
+            if (!id) {
+                if (VIR_STRDUP(id, allocid) < 0)
+                    goto cleanup;
+            }
+            break;
+        }
+    }
+
+    if (!id) {
+        vcpus_str = virBitmapFormat(vcpus);
+        if (virAsprintf(&id, "vcpus_%s", vcpus_str) < 0)
+            goto cleanup;
+    }
+
+    if (VIR_ALLOC(resmon) < 0)
+        goto cleanup;
+
+    mon = virResctrlMonNew();
+    if (!mon)
+        goto cleanup;
+
+    if (virResctrlMonSetID(mon, id) < 0)
+        goto cleanup;
+
+    VIR_STEAL_PTR(resmon->vcpus, vcpus);
+    VIR_STEAL_PTR(resmon->mon, mon);
+
+    if (VIR_APPEND_ELEMENT(def->resmons, def->nresmons, resmon) < 0)
+        goto cleanup;
+
+    ret = def->resmons[def->nresmons - 1];
+ cleanup:
+    virBitmapFree(vcpus);
+    VIR_FREE(id);
+    VIR_FREE(vcpus_str);
+    virDomainCpuResmonDefFree(resmon);
+    virObjectUnref(mon);
+    return ret;
+}
+
+
+virResctrlMonPtr
+virDomainCpuResmonDefRemove(virDomainDefPtr def,
+                            const char *monid)
+{
+    virDomainCpuResmonDefPtr resmon = NULL;
+    virResctrlMonPtr mon = NULL;
+    size_t i = 0;
+
+    if (!monid) {
+        virReportError(VIR_ERR_INVALID_ARG,
+                       _("Cannot remove resource monitoring group: "
+                         "group name is NULL"));
+        goto error;
+    }
+
+    for (i = 0; i < def->nresmons; i++) {
+        const char *id = virResctrlMonGetID(def->resmons[i]->mon);
+        if (!id) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("Cannot remove resource monitoring group: "
+                             "error in get monitoring group name"));
+            goto error;
+        }
+
+        if (STREQ(monid, id))
+            break;
+    }
+
+    if (i == def->nresmons) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Cannot remove resource monitoring group: "
+                         "no monitoring group '%s' found"),
+                       monid);
+        goto error;
+    }
+
+    resmon = def->resmons[i];
+    VIR_DELETE_ELEMENT(def->resmons, i, def->nresmons);
+
+    mon = resmon->mon;
+    virBitmapFree(resmon->vcpus);
+    VIR_FREE(resmon);
+ error:
+    return mon;
+}
+
+
+static int
+virDomainCpuResmonDefParse(virDomainDefPtr def,
+                           xmlXPathContextPtr ctxt,
+                           unsigned int flags)
+{
+    xmlNodePtr oldnode = ctxt->node;
+    xmlNodePtr *nodes = NULL;
+    virBitmapPtr vcpus = NULL;
+    char *vcpus_str = NULL;
+    char *monid = NULL;
+    size_t i = 0;
+    int n = 0;
+    int ret = -1;
+
+    if ((n = virXPathNodeSet("./cputune/resmongroup", ctxt, &nodes)) < 0)
+        goto cleanup;
+
+    for (i = 0; i < n; i++) {
+        if (!(vcpus_str = virXMLPropString(nodes[i], "vcpus"))) {
+            virReportError(VIR_ERR_XML_ERROR, "%s",
+                           _("Missing resmongroup attribute 'vcpus'"));
+            goto cleanup;
+        }
+
+        if (virBitmapParse(vcpus_str, &vcpus, VIR_DOMAIN_CPUMASK_LEN) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Invalid resmongroup attribute 'vcpus' value '%s'"),
+                           vcpus_str);
+            goto cleanup;
+        }
+
+        virBitmapShrink(vcpus, def->maxvcpus);
+
+        if (virBitmapIsAllClear(vcpus)) {
+            ret = 0;
+            goto cleanup;
+        }
+
+        if (!(flags & VIR_DOMAIN_DEF_PARSE_INACTIVE))
+            monid = virXMLPropString(nodes[i], "id");
+
+        if (!virDomainCpuResmonDefValidate(def, monid, vcpus, NULL)) {
+            virReportError(VIR_ERR_INVALID_ARG, "%s",
+                           _("vcpus or group name conflicts with domain "
+                             "settings"));
+            goto cleanup;
+        }
+
+        if (!virDomainCpuResmonDefAdd(def, vcpus, monid)) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("Error in adding resource monitoring group "
+                             "settings to the configuration file"));
+            goto cleanup;
+        }
+
+        virBitmapFree(vcpus);
+        vcpus = NULL;
+        VIR_FREE(monid);
+        monid = NULL;
+        VIR_FREE(vcpus_str);
+        vcpus_str = NULL;
+    }
+
+    ret = 0;
+ cleanup:
+    ctxt->node = oldnode;
+    virBitmapFree(vcpus);
+    VIR_FREE(monid);
+    VIR_FREE(vcpus_str);
+    VIR_FREE(nodes);
+    return ret;
+}
+
+
 static virDomainDefPtr
 virDomainDefParseXML(xmlDocPtr xml,
                      xmlNodePtr root,
@@ -19648,8 +19922,15 @@ virDomainDefParseXML(xmlDocPtr xml,
         if (virDomainCachetuneDefParse(def, ctxt, nodes[i], flags) < 0)
             goto error;
     }
     VIR_FREE(nodes);

+    if (virDomainCpuResmonDefParse(def, ctxt, flags) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("cannot extract CPU resource monitoring group setting"));
+        goto error;
+    }
+
     if (virCPUDefParseXML(ctxt, "./cpu[1]", VIR_CPU_TYPE_GUEST, &def->cpu) < 0)
         goto error;
@@ -26943,6 +27224,41 @@ virDomainCachetuneDefFormat(virBufferPtr buf,

 static int
+virDomainCpuResmonDefFormat(virBufferPtr buf,
+                            virDomainDefPtr def,
+                            unsigned int flags)
+{
+    char *vcpus = NULL;
+    size_t i = 0;
+    int ret = -1;
+
+    for (i = 0; i < def->nresmons; i++) {
+        vcpus = virBitmapFormat(def->resmons[i]->vcpus);
+        if (!vcpus)
+            goto cleanup;
+
+        virBufferAsprintf(buf, "<resmongroup vcpus='%s'", vcpus);
+
+        if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) {
+            const char *mon_id = virResctrlMonGetID(def->resmons[i]->mon);
+            if (!mon_id)
+                goto cleanup;
+
+            virBufferAsprintf(buf, " id='%s'", mon_id);
+        }
+        virBufferAddLit(buf, "/>\n");
+
+        VIR_FREE(vcpus);
+    }
+
+    ret = 0;
+ cleanup:
+    VIR_FREE(vcpus);
+    return ret;
+}
+
+
+static int
 virDomainCputuneDefFormat(virBufferPtr buf,
                           virDomainDefPtr def,
                           unsigned int flags)
@@ -27047,6 +27363,8 @@ virDomainCputuneDefFormat(virBufferPtr buf,
     for (i = 0; i < def->ncachetunes; i++)
         virDomainCachetuneDefFormat(&childrenBuf, def->cachetunes[i], flags);

+    virDomainCpuResmonDefFormat(&childrenBuf, def, flags);
+
     if (virBufferCheckError(&childrenBuf) < 0)
         return -1;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 41d2748..7d31254 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2237,6 +2237,13 @@ struct _virDomainCachetuneDef {
     virResctrlAllocPtr alloc;
 };

+typedef struct _virDomainCpuResmonDef virDomainCpuResmonDef;
+typedef virDomainCpuResmonDef *virDomainCpuResmonDefPtr;
+
+struct _virDomainCpuResmonDef {
+    virBitmapPtr vcpus;
+    virResctrlMonPtr mon;
+};

 typedef struct _virDomainVcpuDef virDomainVcpuDef;
 typedef virDomainVcpuDef *virDomainVcpuDefPtr;
@@ -2413,6 +2420,9 @@ struct _virDomainDef {
     virDomainCachetuneDefPtr *cachetunes;
     size_t ncachetunes;

+    virDomainCpuResmonDefPtr *resmons;
+    size_t nresmons;
+
     virDomainNumaPtr numa;
     virDomainResourceDefPtr resource;
     virDomainIdMapDef idmap;
@@ -3640,4 +3650,19 @@ virDomainDiskGetDetectZeroesMode(virDomainDiskDiscard discard,
 bool
 virDomainDefHasManagedPR(const virDomainDef *def);

+bool
+virDomainCpuResmonDefValidate(virDomainDefPtr def,
+                              const char *id,
+                              virBitmapPtr vcpus,
+                              virResctrlAllocPtr *pairedalloc);
+
+virDomainCpuResmonDefPtr
+virDomainCpuResmonDefAdd(virDomainDefPtr def,
+                         virBitmapPtr vcpus,
+                         const char *monid);
+
+virResctrlMonPtr
+virDomainCpuResmonDefRemove(virDomainDefPtr def,
+                            const char *monid);
+
 #endif /* __DOMAIN_CONF_H */
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index b10a3a5..9c2b2f0 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -243,6 +243,9 @@ virDomainControllerRemove;
 virDomainControllerTypeToString;
 virDomainCpuPlacementModeTypeFromString;
 virDomainCpuPlacementModeTypeToString;
+virDomainCpuResmonDefAdd;
+virDomainCpuResmonDefRemove;
+virDomainCpuResmonDefValidate;
 virDomainDefAddController;
 virDomainDefAddImplicitDevices;
 virDomainDefAddUSBController;
--
2.7.4
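To illustrate the parse/format round trip above: when no id attribute is
given, virDomainCpuResmonDefAdd derives the group name from the vcpu map
('vcpus_' plus the formatted bitmap), and for a live domain the formatter
emits the id attribute, so a configured <resmongroup vcpus='0-2'/> would
(assuming the default id) be formatted back as:

    <cputune>
      <resmongroup vcpus='0-2' id='vcpus_0-2'/>
    </cputune>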

--- tests/genericxml2xmlindata/cachetune-cdp.xml | 3 ++ tests/genericxml2xmlindata/cachetune-small.xml | 2 ++ tests/genericxml2xmlindata/cachetune.xml | 2 ++ .../resmongroup-colliding-cachetune.xml | 34 ++++++++++++++++++++++ tests/genericxml2xmltest.c | 3 ++ 5 files changed, 44 insertions(+) create mode 100644 tests/genericxml2xmlindata/resmongroup-colliding-cachetune.xml diff --git a/tests/genericxml2xmlindata/cachetune-cdp.xml b/tests/genericxml2xmlindata/cachetune-cdp.xml index 9718f06..9b0874e 100644 --- a/tests/genericxml2xmlindata/cachetune-cdp.xml +++ b/tests/genericxml2xmlindata/cachetune-cdp.xml @@ -15,6 +15,9 @@ <cachetune vcpus='3'> <cache id='1' level='3' type='data' size='6912' unit='KiB'/> </cachetune> + <resmongroup vcpus='0-1'/> + <resmongroup vcpus='2'/> + <resmongroup vcpus='3'/> </cputune> <os> <type arch='i686' machine='pc'>hvm</type> diff --git a/tests/genericxml2xmlindata/cachetune-small.xml b/tests/genericxml2xmlindata/cachetune-small.xml index ab2d9cf..ef4321e 100644 --- a/tests/genericxml2xmlindata/cachetune-small.xml +++ b/tests/genericxml2xmlindata/cachetune-small.xml @@ -8,6 +8,8 @@ <cachetune vcpus='0-1'> <cache id='0' level='3' type='both' size='768' unit='KiB'/> </cachetune> + <resmongroup vcpus='0-1'/> + <resmongroup vcpus='2-3'/> </cputune> <os> <type arch='i686' machine='pc'>hvm</type> diff --git a/tests/genericxml2xmlindata/cachetune.xml b/tests/genericxml2xmlindata/cachetune.xml index 645cab7..d30c730 100644 --- a/tests/genericxml2xmlindata/cachetune.xml +++ b/tests/genericxml2xmlindata/cachetune.xml @@ -12,6 +12,8 @@ <cachetune vcpus='3'> <cache id='0' level='3' type='both' size='3' unit='MiB'/> </cachetune> + <resmongroup vcpus='0-1'/> + <resmongroup vcpus='3'/> </cputune> <os> <type arch='i686' machine='pc'>hvm</type> diff --git a/tests/genericxml2xmlindata/resmongroup-colliding-cachetune.xml b/tests/genericxml2xmlindata/resmongroup-colliding-cachetune.xml new file mode 100644 index 0000000..ff85cd4 --- /dev/null +++ b/tests/genericxml2xmlindata/resmongroup-colliding-cachetune.xml @@ -0,0 +1,34 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>4</vcpu> + <cputune> + <cachetune vcpus='0-1'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + <cache id='1' level='3' type='both' size='3' unit='MiB'/> + </cachetune> + <cachetune vcpus='3'> + <cache id='0' level='3' type='both' size='3' unit='MiB'/> + </cachetune> + <resmongroup vcpus='0'/> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/genericxml2xmltest.c b/tests/genericxml2xmltest.c index 7a4fc1e..68a366e 100644 --- a/tests/genericxml2xmltest.c +++ b/tests/genericxml2xmltest.c @@ -145,6 +145,9 @@ mymain(void) DO_TEST("launch-security-sev"); + DO_TEST_FULL("resmongroup-colliding-cachetune", false, true, + TEST_COMPARE_DOM_XML2XML_RESULT_FAIL_PARSE); + virObjectUnref(caps); virObjectUnref(xmlopt); -- 2.7.4

Support functions to create, destroy and monitor a resctrl monitoring group.
---
 include/libvirt/libvirt-domain.h | 13 ++++++
 src/conf/domain_conf.c           |  2 +
 src/driver-hypervisor.h          | 13 ++++++
 src/libvirt-domain.c             | 96 ++++++++++++++++++++++++++++++++++++++++
 src/libvirt_public.syms          |  6 +++
 5 files changed, 130 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 796f2e1..c703346 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -4785,5 +4785,18 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
                                    virTypedParameterPtr *params,
                                    int *nparams,
                                    unsigned int flags);

+/*
+ * cpures API
+ */
+int virDomainSetCPUResmon(virDomainPtr domain,
+                          const char *vcpustr,
+                          const char *mongroup,
+                          int action,
+                          unsigned int flags);
+
+int virDomainGetCPUResmonSts(virDomainPtr domain,
+                             const char *mongroup,
+                             char **sts);
+
 #endif /* __VIR_LIBVIRT_DOMAIN_H__ */
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 0cdad79..393439a 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -19219,6 +19219,7 @@ virDomainCpuResmonDefRemove(virDomainDefPtr def,
     if (!monid) {
         virReportError(VIR_ERR_INVALID_ARG,
+                       "%s",
                        _("Cannot remove resource monitoring group: "
                          "group name is NULL"));
         goto error;
@@ -19228,6 +19229,7 @@ virDomainCpuResmonDefRemove(virDomainDefPtr def,
         const char *id = virResctrlMonGetID(def->resmons[i]->mon);
         if (!id) {
             virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s",
                            _("Cannot remove resource monitoring group: "
                              "error in get monitoring group name"));
             goto error;
diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h
index eef31eb..8b736da 100644
--- a/src/driver-hypervisor.h
+++ b/src/driver-hypervisor.h
@@ -1321,6 +1321,17 @@ typedef int
                      int *nparams,
                      unsigned int flags);

+typedef int
+(*virDrvDomainSetCPUResmon)(virDomainPtr domain,
+                            const char *vcpustr,
+                            const char *monid,
+                            int action,
+                            unsigned int flags);
+
+typedef char *
+(*virDrvDomainGetCPUResmonSts)(virDomainPtr domain,
+                               const char *monid);
+
 typedef struct _virHypervisorDriver virHypervisorDriver;
 typedef virHypervisorDriver *virHypervisorDriverPtr;
@@ -1572,6 +1583,8 @@ struct _virHypervisorDriver {
     virDrvConnectBaselineHypervisorCPU connectBaselineHypervisorCPU;
     virDrvNodeGetSEVInfo nodeGetSEVInfo;
     virDrvDomainGetLaunchSecurityInfo domainGetLaunchSecurityInfo;
+    virDrvDomainSetCPUResmon domainSetCPUResmon;
+    virDrvDomainGetCPUResmonSts domainGetCPUResmonSts;
 };
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index ab7266d..8b080fc 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -11488,6 +11488,11 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
  *                     long long. It is produced by the
  *                     emulation_faults perf event
  *
+ * VIR_DOMAIN_STATS_CPURES
+ *     "cpu.cacheoccupancy" - the usage of l3 cache (bytes) by applications
+ *                            running on the platform as unsigned long long.
+ *                            It is retrieved from the resctrl file system.
+ *
  * Note that entire stats groups or individual stat fields may be missing from
  * the output in case they are not supported by the given hypervisor, are not
  * applicable for the current state of the guest domain, or their retrieval
@@ -12220,3 +12225,94 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
     virDispatchError(domain->conn);
     return -1;
 }
+
+
+/**
+ * virDomainSetCPUResmon:
+ * @domain: a domain object
+ * @vcpustr: string specifying the vcpu list
+ * @mongroup: mon group id
+ * @action: action to be performed:
+ *          1 for enabling an RDT monitoring group
+ *          2 for disabling an RDT monitoring group
+ *          other values are not valid
+ * @flags: bitwise-OR of virDomainModificationImpact
+ *
+ * Enable or disable resctrl monitoring.
+ *
+ * Returns -1 in case of failure, 0 in case of success.
+ */
+int
+virDomainSetCPUResmon(virDomainPtr domain,
+                      const char *vcpustr,
+                      const char *mongroup,
+                      int action,
+                      unsigned int flags)
+{
+    int ret;
+    virConnectPtr conn;
+
+    virResetLastError();
+
+    virCheckDomainReturn(domain, -1);
+
+    conn = domain->conn;
+
+    if (conn->driver->domainSetCPUResmon) {
+        ret = conn->driver->domainSetCPUResmon(domain,
+                                               vcpustr,
+                                               mongroup,
+                                               action,
+                                               flags);
+        if (ret < 0)
+            goto error;
+        return ret;
+    }
+
+    virReportUnsupportedError();
+ error:
+    virDispatchError(domain->conn);
+    return -1;
+}
+
+
+/**
+ * virDomainGetCPUResmonSts:
+ * @domain: a domain object
+ * @mongroup: mon group id
+ * @status: pointer to a string buffer holding the resctrl mon group status
+ *          string; the caller is responsible for freeing it.
+ *
+ * Get the domain resctrl status.
+ *
+ * Returns -1 in case of failure, 0 in case of success.
+ */
+int
+virDomainGetCPUResmonSts(virDomainPtr domain,
+                         const char *mongroup,
+                         char **status)
+{
+    /* "*allstatus*" is the magic string for retrieving all of the
+     * domain's status */
+    const char *monid = mongroup ? mongroup : "*allstatus*";
+    virConnectPtr conn;
+
+    virResetLastError();
+
+    virCheckDomainReturn(domain, -1);
+
+    conn = domain->conn;
+
+    if (conn->driver->domainGetCPUResmonSts) {
+        *status = conn->driver->domainGetCPUResmonSts(domain, monid);
+        if (*status)
+            return 0;
+
+        goto error;
+    }
+
+    virReportUnsupportedError();
+ error:
+    virDispatchError(domain->conn);
+    return -1;
+}
diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms
index d4cdbd8..0b75146 100644
--- a/src/libvirt_public.syms
+++ b/src/libvirt_public.syms
@@ -809,4 +809,10 @@ LIBVIRT_4.5.0 {
         virNWFilterBindingGetFilterName;
 } LIBVIRT_4.4.0;

+LIBVIRT_4.6.0 {
+    global:
+        virDomainSetCPUResmon;
+        virDomainGetCPUResmonSts;
+} LIBVIRT_4.5.0;
+
 # .... define new API here using predicted next version number ....
--
2.7.4
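A minimal sketch of how a management application might call the new public
APIs; the connection URI, domain name and group id are placeholders, and the
raw action values 1/2 follow the documentation above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        virDomainPtr dom = conn ? virDomainLookupByName(conn, "vm3") : NULL;
        char *status = NULL;

        if (!conn || !dom)
            return 1;

        /* action 1 == create: put vcpus 0 and 3 of the running domain
         * into a monitoring group named "mygroup" */
        if (virDomainSetCPUResmon(dom, "0,3", "mygroup", 1,
                                  VIR_DOMAIN_AFFECT_LIVE) < 0)
            return 1;

        /* query the group status; the caller frees the returned string */
        if (virDomainGetCPUResmonSts(dom, "mygroup", &status) == 0) {
            printf("%s\n", status);
            free(status);
        }

        /* action 2 == destroy the group again */
        virDomainSetCPUResmon(dom, NULL, "mygroup", 2, VIR_DOMAIN_AFFECT_LIVE);

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }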

Fetch the resctrl monitoring group settings from def->resmons and create the
resctrl groups according to the cachetune element status. This patch relies
on the resctrl functions in util.
---
 src/qemu/qemu_process.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 40d35cb..eb0778d 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2444,10 +2444,12 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver,
 {
     int ret = -1;
     size_t i = 0;
+    size_t j = 0;
     virCapsPtr caps = NULL;
+    virResctrlAllocPtr alloc = NULL;
     qemuDomainObjPrivatePtr priv = vm->privateData;

-    if (!vm->def->ncachetunes)
+    if (!vm->def->ncachetunes && !vm->def->nresmons)
         return 0;

     /* Force capability refresh since resctrl info can change
@@ -2463,6 +2465,29 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver,
             goto cleanup;
     }

+    for (i = 0; i < vm->def->nresmons; i++) {
+        alloc = NULL;
+        for (j = 0; j < vm->def->ncachetunes; j++) {
+            const char *monid
+                = virResctrlMonGetID(vm->def->resmons[i]->mon);
+            const char *allocid
+                = virResctrlAllocGetID(vm->def->cachetunes[j]->alloc);
+            if (STREQ(monid, allocid) &&
+                (virBitmapEqual(vm->def->resmons[i]->vcpus,
+                                vm->def->cachetunes[j]->vcpus))) {
+                alloc = vm->def->cachetunes[j]->alloc;
+                break;
+            }
+        }
+
+        if (virResctrlMonCreate(alloc,
+                                vm->def->resmons[i]->mon,
+                                priv->machineName) < 0)
+            goto cleanup;
+    }
+
     ret = 0;
  cleanup:
     virObjectUnref(caps);
@@ -5272,6 +5297,16 @@ qemuProcessSetupVcpu(virDomainObjPtr vm,
         }
     }

+    for (i = 0; i < vm->def->nresmons; i++) {
+        virDomainCpuResmonDefPtr rt = vm->def->resmons[i];
+
+        if (virBitmapIsBitSet(rt->vcpus, vcpuid)) {
+            if (virResctrlMonAddPID(rt->mon, vcpupid) < 0)
+                return -1;
+            break;
+        }
+    }
+
     return 0;
 }
@@ -6960,11 +6995,13 @@ void qemuProcessStop(virQEMUDriverPtr driver,
                   vm->def->name);
     }

-    /* Remove resctrl allocation after cgroups are cleaned up which makes it
-     * kind of safer (although removing the allocation should work even with
-     * pids in tasks file */
+    /* Remove the resctrl allocation and monitoring groups after cgroups are
+     * cleaned up, which makes it kind of safer (although removing the
+     * allocation should work even with pids in the tasks file) */
     for (i = 0; i < vm->def->ncachetunes; i++)
         virResctrlAllocRemove(vm->def->cachetunes[i]->alloc);
+    for (i = 0; i < vm->def->nresmons; i++)
+        virResctrlMonRemove(vm->def->resmons[i]->mon);

     qemuProcessRemoveDomainStatus(driver, vm);
--
2.7.4
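For reference, assuming SYSFS_RESCTRL_PATH is the usual /sys/fs/resctrl mount
and using a simplified machine name 'vm3' with group id 'vcpus_0-1' (both are
examples), the layout this code creates for a standalone monitoring group
would look roughly like:

    /sys/fs/resctrl/mon_groups/vm3-vcpus_0-1/tasks
        (vcpu thread PIDs written by virResctrlMonAddPID)
    /sys/fs/resctrl/mon_groups/vm3-vcpus_0-1/mon_data/mon_L3_00/llc_occupancy
    /sys/fs/resctrl/mon_groups/vm3-vcpus_0-1/mon_data/mon_L3_01/llc_occupancy

virResctrlMonGetCacheOccupancy() then sums the llc_occupancy values over all
mon_L* subdirectories of mon_data.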

Function includes setting and getting the status of resource monitoring group. --- src/remote/remote_daemon_dispatch.c | 45 +++++++++++++++++++++++++++++++++++++ src/remote/remote_driver.c | 4 +++- src/remote/remote_protocol.x | 31 ++++++++++++++++++++++++- src/remote_protocol-structs | 16 +++++++++++++ 4 files changed, 94 insertions(+), 2 deletions(-) diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c index 4a93f09..fbec052 100644 --- a/src/remote/remote_daemon_dispatch.c +++ b/src/remote/remote_daemon_dispatch.c @@ -7213,3 +7213,48 @@ remoteSerializeDomainDiskErrors(virDomainDiskErrorPtr errors, } return -1; } + +static int remoteDispatchDomainGetCPUResmonSts( + virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_domain_get_cpu_resmon_sts_args *args, + remote_domain_get_cpu_resmon_sts_ret *ret) +{ + int rv = -1; + virDomainPtr dom = NULL; + char *sts = NULL; + char **sts_p = NULL; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + if (!(dom = get_nonnull_domain(priv->conn, args->dom))) + goto cleanup; + + if ((rv = virDomainGetCPUResmonSts(dom, args->monid, &sts)) < 0) + goto cleanup; + + if (VIR_ALLOC(sts_p) < 0) + goto cleanup; + + if (VIR_STRDUP(*sts_p, sts) < 0) + goto cleanup; + + ret->sts = sts_p; + rv = 0; + + cleanup: + if (rv < 0) { + virNetMessageSaveError(rerr); + VIR_FREE(sts_p); + } + virObjectUnref(dom); + VIR_FREE(sts); + return rv; +} diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 1d94c2e..3a83df6 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8536,7 +8536,9 @@ static virHypervisorDriver hypervisor_driver = { .connectCompareHypervisorCPU = remoteConnectCompareHypervisorCPU, /* 4.4.0 */ .connectBaselineHypervisorCPU = remoteConnectBaselineHypervisorCPU, /* 4.4.0 */ .nodeGetSEVInfo = remoteNodeGetSEVInfo, /* 4.5.0 */ - .domainGetLaunchSecurityInfo = remoteDomainGetLaunchSecurityInfo /* 4.5.0 */ + .domainGetLaunchSecurityInfo = remoteDomainGetLaunchSecurityInfo, /* 4.5.0 */ + .domainSetCPUResmon = remoteDomainSetCPUResmon, /* 4.5.0 */ + .domainGetCPUResmonSts = remoteDomainGetCPUResmonSts, /* 4.6.0 */ }; static virNetworkDriver network_driver = { diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 28c8feb..fbf88a0 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3557,6 +3557,23 @@ struct remote_connect_list_all_nwfilter_bindings_ret { /* insert@1 */ unsigned int ret; }; +struct remote_domain_set_cpu_resmon_args { + remote_nonnull_domain dom; + remote_string vcpustr; + remote_string monid; + int action; + unsigned int flags; +}; + +struct remote_domain_get_cpu_resmon_sts_args { + remote_nonnull_domain dom; + remote_nonnull_string monid; +}; + +struct remote_domain_get_cpu_resmon_sts_ret { /* insert@1 */ + remote_string sts; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. 
*/ @@ -6312,5 +6329,17 @@ enum remote_procedure { * @acl: connect:search_nwfilter_bindings * @aclfilter: nwfilter_binding:getattr */ - REMOTE_PROC_CONNECT_LIST_ALL_NWFILTER_BINDINGS = 401 + REMOTE_PROC_CONNECT_LIST_ALL_NWFILTER_BINDINGS = 401, + + /** + * @generate: both + * @acl: domain:write + */ + REMOTE_PROC_DOMAIN_SET_CPU_RESMON = 402, + + /** + * @generate: client + * @acl: domain:read + */ + REMOTE_PROC_DOMAIN_GET_CPU_RESMON_STS = 403 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 6343e14..ddbab04 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -2966,6 +2966,20 @@ struct remote_connect_list_all_nwfilter_bindings_ret { } bindings; u_int ret; }; +struct remote_domain_set_cpu_resmon_args { + remote_nonnull_domain dom; + remote_string vcpustr; + remote_string monid; + int action; + unsigned int flags; +}; +struct remote_domain_get_cpu_resmon_sts_args { + remote_nonnull_domain dom; + remote_nonnull_string monid; +}; +struct remote_domain_get_cpu_resmon_sts_ret { + remote_string sts; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3368,4 +3382,6 @@ enum remote_procedure { REMOTE_PROC_NWFILTER_BINDING_CREATE_XML = 399, REMOTE_PROC_NWFILTER_BINDING_DELETE = 400, REMOTE_PROC_CONNECT_LIST_ALL_NWFILTER_BINDINGS = 401, + REMOTE_PROC_DOMAIN_SET_CPU_RESMON = 402, + REMOTE_PROC_DOMAIN_GET_CPU_RESMON_STS = 403, }; -- 2.7.4

Add interfaces for resource monitoring group - query monitoring group status - dynamically create monitoring group - dynamically destory monitoring group --- src/qemu/qemu_driver.c | 252 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 252 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 9a35e04..647d864 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -21607,6 +21607,256 @@ qemuDomainGetLaunchSecurityInfo(virDomainPtr domain, return ret; } + +static int +qemuDomainSetCPUResmon(virDomainPtr dom, + const char *vcpumap, + const char *monid, + int action, + unsigned int flags) +{ + virDomainDefPtr def; + virDomainDefPtr persistentDef; + virQEMUDriverPtr driver = dom->conn->privateData; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + virDomainObjPtr vm = NULL; + virBitmapPtr vcpus = NULL; + qemuDomainObjPrivatePtr priv = NULL; + unsigned int maxvcpus = 0; + size_t i = 0; + int ret = -1; + + virCheckFlags(VIR_DOMAIN_AFFECT_LIVE | VIR_DOMAIN_AFFECT_CONFIG, -1); + + if (action != 1 && action != 2) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("unsupported action.")); + return ret; + } + + if (!(vm = qemuDomObjFromDomain(dom))) + return ret; + + if (vcpumap) { + if (virBitmapParse(vcpumap, &vcpus, QEMU_GUEST_VCPU_MAX_ID) < 0 || + virBitmapLastSetBit(vcpus) < 0) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("no vcpus selected for modification")); + goto cleanup; + } + } + + if (!vcpus) { + if (!monid) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("bad resource monitoring group ID")); + goto cleanup; + } + + for (i = 0; i < vm->def->nresmons; i++) { + const char *id = virResctrlMonGetID(vm->def->resmons[i]->mon); + if (id && STREQ(monid, id)) { + vcpus = virBitmapNewCopy(vm->def->resmons[i]->vcpus); + break; + } + } + + if (!vcpus) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("bad resource monitoring group ID")); + goto cleanup; + } + } + + priv = vm->privateData; + + if (virDomainSetCPUResmonEnsureACL(dom->conn, vm->def) < 0) + goto cleanup; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (virDomainObjGetDefs(vm, flags, &def, &persistentDef) < 0) + goto endjob; + + if (action == 2) { + /* action == 'DESTROY' */ + + if (def) { + virResctrlMonPtr mon = virDomainCpuResmonDefRemove(def, monid); + if (!mon) + goto endjob; + + /* if allocation group exists, there is no way + * to disable it */ + virResctrlAllocPtr alloc = virResctrlMonGetAlloc(mon); + if (!alloc) { + if (virResctrlMonRemove(mon) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Error in remove rdt mon group.")); + goto endjob; + } + } + + virObjectUnref(mon); + } + + if (persistentDef) { + virResctrlMonPtr monpersist = + virDomainCpuResmonDefRemove(persistentDef, monid); + if (!monpersist) + goto endjob; + + virObjectUnref(monpersist); + } + } + + if (action == 1) { + /* action == 'CREATE' */ + + if (def) { + virResctrlAllocPtr alloc = NULL; + if (!virDomainCpuResmonDefValidate(def, + monid, + vcpus, + &alloc)) { + virReportError(VIR_ERR_INVALID_ARG, + "%s", + _("error in create resource monitoring " + "group vcpus or group name conflicts " + "with domain settings")); + goto endjob; + } + + virDomainCpuResmonDefPtr resmon = + virDomainCpuResmonDefAdd(def, vcpus, monid); + + if (!resmon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("cannot create set rdt monitoring for " + "live configuration.")); + goto endjob; + } + + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, + vm, 
driver->caps) < 0) + goto endjob; + + if (!virResctrlMonIsRunning(resmon->mon)) { + if (virResctrlMonCreate(alloc, + resmon->mon, + priv->machineName) < 0) + goto endjob; + + maxvcpus = virDomainDefGetVcpusMax(vm->def); + for (i = 0; i < maxvcpus; i++) { + virDomainVcpuDefPtr vcpu + = virDomainDefGetVcpu(vm->def, i); + + if (!vcpu->online) + continue; + + if (virBitmapIsBitSet(resmon->vcpus, i)) { + pid_t vcpupid = qemuDomainGetVcpuPid(vm, i); + if (virResctrlMonAddPID(resmon->mon, vcpupid) < 0) + goto endjob; + } + } + } + } + + if (persistentDef) { + if (!virDomainCpuResmonDefValidate(persistentDef, + monid, vcpus, + NULL)) { + virReportError(VIR_ERR_INVALID_ARG, + "%s", + _("Error in creating resource monitoring " + "group: vcpus or group name conflicts " + "with domain settings")); + goto endjob; + } + + if (!virDomainCpuResmonDefAdd(persistentDef, + vcpus, + monid)) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("cannot create set resource monitoring group " + "for domain persistent configuration")); + goto endjob; + } + + if (virDomainSaveConfig(cfg->configDir, driver->caps, + persistentDef) < 0) + goto endjob; + } + } + + ret = 0; + endjob: + qemuDomainObjEndJob(driver, vm); + + cleanup: + virBitmapFree(vcpus); + virDomainObjEndAPI(&vm); + virObjectUnref(cfg); + return ret; +} + +static char * +qemuDomainGetCPUResmonSts(virDomainPtr dom, const char *monid) +{ + virDomainObjPtr vm = NULL; + virDomainCpuResmonDefPtr resmon = NULL; + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *bufstr = NULL; + char *sts = NULL; + size_t i = 0; + bool listallstatus = false; + + /* "*allstatus*" is the magic string for getting all existing + * mon group status */ + if (STREQ(monid, "*allstatus*")) + listallstatus = true; + + if (virAsprintf(&sts, "no group found") < 0) + goto cleanup; + + if (!(vm = qemuDomObjFromDomain(dom))) + return sts; + + if (virDomainGetCPUResmonStsEnsureACL(dom->conn, vm->def) < 0) + goto cleanup; + + for (i = 0; i < vm->def->nresmons; i++) { + resmon = vm->def->resmons[i]; + const char *id = virResctrlMonGetID(resmon->mon); + if (!id) + goto cleanup; + + if (!listallstatus && STRNEQ(monid, id)) + continue; + + if (virResctrlMonIsRunning(resmon->mon)) + virBufferStrcat(&buf, "group name: ", id, ";", NULL); + } + + bufstr = virBufferContentAndReset(&buf); + + if (bufstr) { + VIR_FREE(sts); + if (VIR_STRDUP(sts, bufstr) < 0) + goto cleanup; + VIR_FREE(bufstr); + } + + cleanup: + virBufferFreeAndReset(&buf); + virDomainObjEndAPI(&vm); + return sts; +} + + static virHypervisorDriver qemuHypervisorDriver = { .name = QEMU_DRIVER_NAME, .connectURIProbe = qemuConnectURIProbe, @@ -21832,6 +22082,8 @@ static virHypervisorDriver qemuHypervisorDriver = { .connectBaselineHypervisorCPU = qemuConnectBaselineHypervisorCPU, /* 4.4.0 */ .nodeGetSEVInfo = qemuNodeGetSEVInfo, /* 4.5.0 */ .domainGetLaunchSecurityInfo = qemuDomainGetLaunchSecurityInfo, /* 4.5.0 */ + .domainSetCPUResmon = qemuDomainSetCPUResmon, /* 4.6.0 */ + .domainGetCPUResmonSts = qemuDomainGetCPUResmonSts, /* 4.6.0 */ }; -- 2.7.4

A tool to create, destroy and query resource monitoring groups at run time.
---
 tools/virsh-domain.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index e9b88f0..6aa674e 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -7677,6 +7677,139 @@ cmdIOThreadDel(vshControl *ctl, const vshCmd *cmd)
     return ret;
 }

+static const vshCmdInfo info_cpuresource[] = {
+    {.name = "help",
+     .data = N_("get or set hardware CPU RDT monitoring group")
+    },
+    {.name = "desc",
+     .data = N_("Create or destroy CPU resource monitoring group.\n"
+                "    To get current CPU resource monitoring group status:\n"
+                "    virsh # cpu-resource [domain]")
+    },
+    {.name = NULL}
+};
+
+static const vshCmdOptDef opts_cpuresource[] = {
+    VIRSH_COMMON_OPT_DOMAIN_FULL(0),
+    {.name = "group_name",
+     .type = VSH_OT_ALIAS,
+     .help = "group-name"
+    },
+    {.name = "group-name",
+     .type = VSH_OT_STRING,
+     .flags = VSH_OFLAG_REQ_OPT,
+     .help = N_("group name to manipulate")
+    },
+    {.name = "vcpulist",
+     .type = VSH_OT_STRING,
+     .flags = VSH_OFLAG_REQ_OPT,
+     .help = N_("ids of vcpus to manipulate")
+    },
+    {.name = "create",
+     .type = VSH_OT_BOOL,
+     .help = N_("Create CPU resctrl monitoring group for functions such as "
+                "monitoring cache occupancy"),
+    },
+    {.name = "destroy",
+     .type = VSH_OT_BOOL,
+     .help = N_("Destroy CPU resctrl monitoring group")
+    },
+    VIRSH_COMMON_OPT_LIVE(N_("modify/get running state")),
+    VIRSH_COMMON_OPT_CONFIG(N_("modify/get persistent configuration")),
+    VIRSH_COMMON_OPT_DOMAIN_CURRENT,
+    {.name = NULL}
+};
+
+
+static bool
+cmdCpuResource(vshControl *ctl, const vshCmd *cmd)
+{
+    virDomainPtr dom;
+    bool ret = false;
+    char *status = NULL;
+    int action = 0;
+    bool config = vshCommandOptBool(cmd, "config");
+    bool live = vshCommandOptBool(cmd, "live");
+    bool enable = vshCommandOptBool(cmd, "create");
+    bool disable = vshCommandOptBool(cmd, "destroy");
+    bool current = vshCommandOptBool(cmd, "current");
+    const char *vcpustr = NULL;
+    const char *group_name = NULL;
+    unsigned int flags = VIR_DOMAIN_AFFECT_CURRENT;
+    char **tok = NULL;
+    size_t ntok = 0;
+    size_t i = 0;
+    int maxvcpus = 0;
+
+    VSH_EXCLUSIVE_OPTIONS_VAR(enable, disable);
+
+    VSH_EXCLUSIVE_OPTIONS_VAR(current, live);
+    VSH_EXCLUSIVE_OPTIONS_VAR(current, config);
+
+    VSH_REQUIRE_OPTION("destroy", "group-name");
+
+    if (config)
+        flags |= VIR_DOMAIN_AFFECT_CONFIG;
+    if (live)
+        flags |= VIR_DOMAIN_AFFECT_LIVE;
+
+    if (vshCommandOptStringReq(ctl, cmd, "group-name", &group_name))
+        return false;
+
+    if (vshCommandOptStringReq(ctl, cmd, "vcpulist", &vcpustr))
+        return false;
+
+    if (enable && !vcpustr && !group_name) {
+        vshError(ctl, _("Option --create requires at least one "
+                        "of the options --group-name or --vcpulist"));
+        return false;
+    }
+
+    if (!(dom = virshCommandOptDomain(ctl, cmd, NULL)))
+        return false;
+
+    if (!enable && !disable) {
+        if (virDomainGetCPUResmonSts(dom, group_name, &status) < 0)
+            goto cleanup;
+
+        if (!status)
+            goto cleanup;
+
+        maxvcpus = virDomainGetMaxVcpus(dom);
+        if (maxvcpus == -1)
+            goto cleanup;
+
+        if (!(tok = virStringSplitCount(status, ";", maxvcpus + 1, &ntok)))
+            goto cleanup;
+
+        if (ntok > maxvcpus)
+            ntok = maxvcpus;
+
+        vshPrint(ctl, "CPU Resource Monitoring Group Status: \n");
+
+        for (i = 0; i < ntok; i++)
+            vshPrint(ctl, "  %s\n", tok[i]);
+    } else {
+        if (disable)
+            action = 2;
+        if (enable)
+            action = 1;
+
+        if (virDomainSetCPUResmon(dom, vcpustr, group_name, action, flags) < 0)
+            goto cleanup;
+    }
+
+    ret = true;
+ cleanup:
+    VIR_FREE(status);
+    virStringListFree(tok);
+    virshDomainFree(dom);
+    return ret;
+}
+
 /*
  * "cpu-stats" command
  */
@@ -13726,6 +13859,12 @@ const vshCmdDef domManagementCmds[] = {
      .info = info_attach_device,
      .flags = 0
     },
+    {.name = "cpu-resource",
+     .handler = cmdCpuResource,
+     .opts = opts_cpuresource,
+     .info = info_cpuresource,
+     .flags = 0
+    },
     {.name = "attach-disk",
      .handler = cmdAttachDisk,
      .opts = opts_attach_disk,
--
2.7.4
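For illustration, assuming a domain 'vm3' with one group 'mygroup' created as
in the cover letter, querying the status with no --create/--destroy flag
would print something along these lines (the exact group name is whatever was
configured):

    virsh # cpu-resource vm3
    CPU Resource Monitoring Group Status:
      group name: mygroup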

Add cache occupancy information to the output of command 'virsh domstats' for domains that have resource monitoring groups.
---
 include/libvirt/libvirt-domain.h |   1 +
 src/qemu/qemu_driver.c           | 105 +++++++++++++++++++++++++++++++++++++++
 tools/virsh-domain-monitor.c     |   7 +++
 3 files changed, 113 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index c703346..c7ee425 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -2041,6 +2041,7 @@ typedef enum {
     VIR_DOMAIN_STATS_INTERFACE = (1 << 4), /* return domain interfaces info */
     VIR_DOMAIN_STATS_BLOCK = (1 << 5), /* return domain block info */
     VIR_DOMAIN_STATS_PERF = (1 << 6), /* return domain perf event info */
+    VIR_DOMAIN_STATS_CPU_RES = (1 << 7), /* return CPU resource info */
 } virDomainStatsTypes;
 
 typedef enum {
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 647d864..9b6e3fe 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -20330,6 +20330,110 @@ qemuDomainGetStatsPerf(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
     return ret;
 }
 
+
+static int
+qemuDomainGetStatsCPUResmon(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
+                            virDomainObjPtr vm,
+                            virDomainStatsRecordPtr record,
+                            int *maxparams,
+                            unsigned int privflags ATTRIBUTE_UNUSED)
+{
+    char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
+    size_t i = 0;
+    size_t l = 0;
+    unsigned int llc_occu = 0;
+    int ret = -1;
+    char *vcpustr = NULL;
+
+    for (i = 0; i < vm->def->nresmons; i++) {
+        virDomainCpuResmonDefPtr resmon = vm->def->resmons[i];
+
+        llc_occu = 0;
+        if (virResctrlMonIsRunning(resmon->mon)) {
+            if (virResctrlMonGetCacheOccupancy(resmon->mon, &llc_occu) < 0)
+                goto cleanup;
+        }
+
+        const char *mon_id = virResctrlMonGetID(resmon->mon);
+        if (!mon_id)
+            goto cleanup;
+        if (!(vcpustr = virBitmapFormat(resmon->vcpus)))
+            goto cleanup;
+
+        /* For a vcpu string, both '1-3' and '1,3' are valid formats but
+         * represent different vcpu sets. Since they are not easy to
+         * differentiate at first glance, expand any range here, e.g.
+         * substitute '1-3' with '1,2,3'. */
+        for (l = 0; l < strlen(vcpustr); l++) {
+            if (vcpustr[l] == '-') {
+                char strbuf[256];
+                unsigned int cpul = 0;
+                unsigned int cpur = 0;
+                virBuffer buf = VIR_BUFFER_INITIALIZER;
+                unsigned int icpu = 0;
+                char *tmp = NULL;
+
+                /* virStrToLong_ui is tricky when processing '-'; to
+                 * avoid triggering an error, replace '-' with '_' */
+                vcpustr[l] = '_';
+
+                if (virStrToLong_ui(vcpustr, &tmp, 10, &cpul) < 0)
+                    goto cleanup;
+                if (virStrToLong_ui(vcpustr + l + 1, &tmp, 10, &cpur) < 0)
+                    goto cleanup;
+                if (cpur < cpul)
+                    goto cleanup;
+
+                for (icpu = cpul; icpu <= cpur; icpu++) {
+                    snprintf(strbuf, sizeof(strbuf), "%u", icpu);
+                    virBufferStrcat(&buf, strbuf, NULL);
+                    if (icpu != cpur)
+                        virBufferStrcat(&buf, ",", NULL);
+                }
+
+                VIR_FREE(vcpustr);
+                vcpustr = virBufferContentAndReset(&buf);
+
+                break;
+            }
+        }
+
+        snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                 "cpu.cacheoccupancy.%s.value",
+                 mon_id);
+
+        if (virTypedParamsAddUInt(&record->params,
+                                  &record->nparams,
+                                  maxparams,
+                                  param_name,
+                                  llc_occu) < 0)
+            goto cleanup;
+
+        snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
+                 "cpu.cacheoccupancy.%s.vcpus",
+                 mon_id);
+
+        if (virTypedParamsAddString(&record->params,
+                                    &record->nparams,
+                                    maxparams,
+                                    param_name,
+                                    vcpustr) < 0)
+            goto cleanup;
+
+        VIR_FREE(vcpustr);
+    }
+
+    ret = 0;
+ cleanup:
+    VIR_FREE(vcpustr);
+    return ret;
+}
+
+
 typedef int
 (*qemuDomainGetStatsFunc)(virQEMUDriverPtr driver,
                           virDomainObjPtr dom,
@@ -20351,6 +20455,7 @@ static struct qemuDomainGetStatsWorker qemuDomainGetStatsWorkers[] = {
     { qemuDomainGetStatsInterface, VIR_DOMAIN_STATS_INTERFACE, false },
     { qemuDomainGetStatsBlock, VIR_DOMAIN_STATS_BLOCK, true },
     { qemuDomainGetStatsPerf, VIR_DOMAIN_STATS_PERF, false },
+    { qemuDomainGetStatsCPUResmon, VIR_DOMAIN_STATS_CPU_RES, false },
     { NULL, 0, false }
 };
 
diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c
index 87660ee..5f65f3d 100644
--- a/tools/virsh-domain-monitor.c
+++ b/tools/virsh-domain-monitor.c
@@ -2099,6 +2099,10 @@ static const vshCmdOptDef opts_domstats[] = {
      .type = VSH_OT_BOOL,
      .help = N_("report only stats that are accessible instantly"),
     },
+    {.name = "cpu-resource",
+     .type = VSH_OT_BOOL,
+     .help = N_("report cpu resource information"),
+    },
     VIRSH_COMMON_OPT_DOMAIN_OT_ARGV(N_("list of domains to get stats for"), 0),
     {.name = NULL}
 };
@@ -2164,6 +2168,9 @@ cmdDomstats(vshControl *ctl, const vshCmd *cmd)
     if (vshCommandOptBool(cmd, "perf"))
         stats |= VIR_DOMAIN_STATS_PERF;
 
+    if (vshCommandOptBool(cmd, "cpu-resource"))
+        stats |= VIR_DOMAIN_STATS_CPU_RES;
+
     if (vshCommandOptBool(cmd, "list-active"))
         flags |= VIR_CONNECT_GET_ALL_DOMAINS_STATS_ACTIVE;
 
-- 
2.7.4
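As a usage illustration from the API side, a minimal client could fetch this stats group with virConnectGetAllDomainStats(). A sketch only (it assumes this series is applied, so that VIR_DOMAIN_STATS_CPU_RES exists and the '.value'/'.vcpus' fields are emitted as unsigned int and string typed parameters; error handling is trimmed):
<pre>
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
    virDomainStatsRecordPtr *stats = NULL;
    int nstats;
    int i, j;

    if (!conn)
        return 1;

    /* Ask for the CPU resource stats group only. */
    nstats = virConnectGetAllDomainStats(conn, VIR_DOMAIN_STATS_CPU_RES,
                                         &stats, 0);

    for (i = 0; i < nstats; i++) {
        printf("Domain: '%s'\n", virDomainGetName(stats[i]->dom));
        for (j = 0; j < stats[i]->nparams; j++) {
            virTypedParameterPtr p = &stats[i]->params[j];

            /* This patch only emits unsigned int (.value) and
             * string (.vcpus) typed parameters. */
            if (p->type == VIR_TYPED_PARAM_UINT)
                printf("  %s=%u\n", p->field, p->value.ui);
            else if (p->type == VIR_TYPED_PARAM_STRING)
                printf("  %s=%s\n", p->field, p->value.s);
        }
    }

    if (stats)
        virDomainStatsRecordListFree(stats);
    virConnectClose(conn);
    return 0;
}
</pre>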

---
 docs/news.xml | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/news.xml b/docs/news.xml
index 773c95b..d406e51 100644
--- a/docs/news.xml
+++ b/docs/news.xml
@@ -44,6 +44,16 @@
           support should be available to the guest.
         </description>
       </change>
+      <change>
+        <summary>
+          Add support for Intel x86 RDT CMT (Cache Monitoring Technology)
+        </summary>
+        <description>
+          Report domain cache occupancy information based on vcpu groups.
+          Add a live command 'cpu-resource' for creating, destroying and
+          reporting the CPU resource monitoring group status.
+        </description>
+      </change>
     </section>
     <section title="Improvements">
     </section>
-- 
2.7.4

Hi,

Regarding the output of the CMT monitoring result, which is listed in the result of command 'domstats', I'd like to change it by adding a 'cache block id' field to indicate the cache occupancy of each cache block. The CMT related message for every cache monitoring group would then be:
<pre>
cpu.cacheoccupancy.<mon_group_name>.vcpus = <vcpu list>
cpu.cacheoccupancy.<mon_group_name>.<cache_block_id>.value = <cache occupancy in bytes>
</pre>
Since kernel resctrlfs outputs cache occupancy information for each cache block id, I'd like to adhere to resctrlfs's arrangement and dump the cache occupancy information for each cache block in the result of 'domstats'. The output of 'domstats' would then look like this:
<pre>
[root@dl-c200 libvirt]# virsh domstats vm3 --cpu-resource
Domain: 'vm3'
  cpu.cacheoccupancy.vcpus_2.vcpus=2
  cpu.cacheoccupancy.vcpus_2.1.value=27832
  cpu.cacheoccupancy.vcpus_2.0.value=372186
  cpu.cacheoccupancy.vcpus_1.vcpus=1
  cpu.cacheoccupancy.vcpus_1.1.value=0
  cpu.cacheoccupancy.vcpus_1.0.value=90112
  cpu.cacheoccupancy.vcpus_0,3.vcpus=0,3
  cpu.cacheoccupancy.vcpus_0,3.1.value=90112
  cpu.cacheoccupancy.vcpus_0,3.0.value=540672
</pre>
From the above output, it is known that there is a CPU CMT resource monitoring group in the domain with the group name 'vcpus_2'. 'cpu.cacheoccupancy.vcpus_2.vcpus=2' tells us this monitoring group contains one vcpu, vcpu 2. 'cpu.cacheoccupancy.vcpus_2.1.value=27832' and 'cpu.cacheoccupancy.vcpus_2.0.value=372186' indicate the cache occupancy for cache block 1 and cache block 0 respectively. You can get similar information for the monitoring groups 'vcpus_1' and 'vcpus_0,3'.

I'd like to hear your voice regarding this RFC.
Changes since v1: A lot of things changed, mainly:
* report cache occupancy information based on vcpu group instead of whole domain
* allow destroying a vcpu group at run time
* XML configuration file changed
* naming for 'RDT CMT' changed to 'cpu-resource'
Wang Huaqiang (10):
  util: add Intel x86 RDT/CMT support
  conf: introduce <resmongroup> element
  tests: add tests for validating <resmongroup>
  libvirt: add public APIs for resource monitoring group
  qemu: enable resctrl monitoring at booting stage
  remote: add remote protocol for resctrl monitoring
  qemu: add interfaces for dynamically manipulating resctrl mon groups
  tool: add command cpuresource to interact with cpu resources
  tools: show cpu cache occupancy information in domstats
  news: add Intel x86 RDT CMT feature
 docs/formatdomain.html.in                      |  17 +
 docs/news.xml                                  |  10 +
 docs/schemas/domaincommon.rng                  |  14 +
 include/libvirt/libvirt-domain.h               |  14 +
 src/conf/domain_conf.c                         | 320 ++++++++++++++++++
 src/conf/domain_conf.h                         |  25 ++
 src/driver-hypervisor.h                        |  13 +
 src/libvirt-domain.c                           |  96 ++++++
 src/libvirt_private.syms                       |  13 +
 src/libvirt_public.syms                        |   6 +
 src/qemu/qemu_driver.c                         | 357 +++++++++++++++++++
 src/qemu/qemu_process.c                        |  45 ++-
 src/remote/remote_daemon_dispatch.c            |  45 +++
 src/remote/remote_driver.c                     |   4 +-
 src/remote/remote_protocol.x                   |  31 +-
 src/remote_protocol-structs                    |  16 +
 src/util/virresctrl.c                          | 338 +++++++++++++++++++
 src/util/virresctrl.h                          |  40 +++
 tests/genericxml2xmlindata/cachetune-cdp.xml   |   3 +
 tests/genericxml2xmlindata/cachetune-small.xml |   2 +
 tests/genericxml2xmlindata/cachetune.xml       |   2 +
 .../resmongroup-colliding-cachetune.xml        |  34 ++
 tests/genericxml2xmltest.c                     |   3 +
 tools/virsh-domain-monitor.c                   |   7 +
 tools/virsh-domain.c                           | 139 ++++++++
 25 files changed, 1588 insertions(+), 6 deletions(-)
 create mode 100644 tests/genericxml2xmlindata/resmongroup-colliding-cachetune.xml

On Mon, Jul 09, 2018 at 03:00:48PM +0800, Wang Huaqiang wrote:
One small nit regarding the naming (it shouldn't block any reviewers from reviewing, just keep this in mind for the next version, for example) is that this is still inconsistent. The way domstats are structured when there is something like an array could shed some light into this. What you suggested is really kind of hard to parse (although it looks better). What would you say to something like this:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.value=4415488
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.1.value=7839744
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.2.value=53796864
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.name=vcpus_0,3
</pre>
Other than that I didn't go through all the patches now, sorry.

Hi Martin, thanks for your comments.
OK. I'll try to use words such as 'cache', 'cpu resource' and avoid using 'RDT', 'CMT'.
Your arrangement looks more reasonable, thanks for your advice. However, as I mentioned in another email that I sent to libvirt-list hours ago, the kernel resctrl interface provides cache occupancy information for each cache block of every resource group. Maybe we need to expose the cache occupancy for each cache block. If you agree, we need to refine the 'domstats' output message; how about this:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.block.count=2
cpu.cacheoccupancy.0.block.0.bytes=5488
cpu.cacheoccupancy.0.block.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.block.count=2
cpu.cacheoccupancy.1.block.0.bytes=7839744
cpu.cacheoccupancy.1.block.1.bytes=0
cpu.cacheoccupancy.2.name=vcpus_0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.block.count=2
cpu.cacheoccupancy.2.block.0.bytes=53796864
cpu.cacheoccupancy.2.block.1.bytes=0
</pre>

On Tue, Jul 17, 2018 at 07:19:41AM +0000, Wang, Huaqiang wrote:
Oh, you misunderstood, I meant the naming in the domstats output =)
What do you mean by cache block? Is that (cache_size / granularity)? In that case it looks fine, I guess (without putting too much thought into it).

Martin

No. The 'cache block' that I mean is indexed with 'cache id', with the id number kept in '/sys/devices/system/cpu/cpu*/cache/index*/id'. Generally, on a two-socket server node (with CPU E5-2680 v4, for example), each socket has an L3 cache. If a resctrl monitoring group is created (/sys/fs/resctrl/p0, for example), you can find the cache occupancy information for these two L3 cache areas separately in the files
/sys/fs/resctrl/p0/mon_data/mon_L3_00/llc_occupancy and
/sys/fs/resctrl/p0/mon_data/mon_L3_01/llc_occupancy.
Cache information for an individual socket is meaningful for detecting performance issues such as workload imbalance, etc. We'd better expose these details to libvirt users. In short, I am using 'cache block' to describe the CPU cache indexed with the number found in '/sys/devices/system/cpu/cpu*/cache/index*/id'. I welcome suggestions for other naming.
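As an illustration of where these numbers come from, the per-id occupancy can be read back with plain file I/O; a minimal sketch (assuming the example group '/sys/fs/resctrl/p0' and the two L3 cache ids described above):
<pre>
#include <stdio.h>

/* Read the L3 occupancy, in bytes, for one cache id of one resctrl
 * monitoring group; returns -1 when the file cannot be read. */
static long long
read_llc_occupancy(const char *group, unsigned int cache_id)
{
    char path[256];
    long long bytes = -1;
    FILE *fp;

    snprintf(path, sizeof(path),
             "/sys/fs/resctrl/%s/mon_data/mon_L3_%02u/llc_occupancy",
             group, cache_id);

    if (!(fp = fopen(path, "r")))
        return -1;
    if (fscanf(fp, "%lld", &bytes) != 1)
        bytes = -1;
    fclose(fp);
    return bytes;
}

int main(void)
{
    unsigned int id;

    /* Two L3 cache ids on the example two-socket machine. */
    for (id = 0; id < 2; id++)
        printf("p0 cache id %u: %lld bytes\n",
               id, read_llc_occupancy("p0", id));
    return 0;
}
</pre>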

On Wed, Jul 18, 2018 at 02:29:32AM +0000, Wang, Huaqiang wrote:
To be consistent I'd prefer "cache", "cache bank", and "index" or "id". I don't have specific requirements, I just don't want to invent new words. Look at how it is described in capabilities, for example.
Martin

Makes sense. Then let's use 'id' for the purpose, and the output would be:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.id.count=2
cpu.cacheoccupancy.0.id.0.bytes=5488
cpu.cacheoccupancy.0.id.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.id.count=2
cpu.cacheoccupancy.1.id.0.bytes=7839744
cpu.cacheoccupancy.1.id.1.bytes=0
cpu.cacheoccupancy.2.name=vcpus_0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.id.count=2
cpu.cacheoccupancy.2.id.0.bytes=53796864
cpu.cacheoccupancy.2.id.1.bytes=0
</pre>
How about it?

On Wed, Jul 18, 2018 at 12:19:18PM +0000, Wang, Huaqiang wrote:
I'm switching contexts too much and hence I didn't make myself clear. Since IDs are not guaranteed to be consecutive, this might be more future-proof:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.bank.count=2
cpu.cacheoccupancy.0.bank.0.id=0
cpu.cacheoccupancy.0.bank.0.bytes=5488
cpu.cacheoccupancy.0.bank.1.id=1
cpu.cacheoccupancy.0.bank.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.bank.count=2
cpu.cacheoccupancy.1.bank.0.id=0
cpu.cacheoccupancy.1.bank.0.bytes=7839744
cpu.cacheoccupancy.1.bank.1.id=1
cpu.cacheoccupancy.1.bank.1.bytes=0
cpu.cacheoccupancy.2.name=vcpus_0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.bank.count=2
cpu.cacheoccupancy.2.bank.0.id=0
cpu.cacheoccupancy.2.bank.0.bytes=53796864
cpu.cacheoccupancy.2.bank.1.id=1
cpu.cacheoccupancy.2.bank.1.bytes=0
</pre>
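For what it's worth, this indexed layout is indeed mechanical to parse on the client side. A sketch of a consumer (just an illustration of the field names proposed above; it assumes the counts, ids and bytes would be exposed as unsigned int typed parameters):
<pre>
#include <stdio.h>
#include <libvirt/libvirt.h>

/* Walk the proposed cpu.cacheoccupancy.* layout in one stats record. */
static void
print_cacheoccupancy(virTypedParameterPtr params, int nparams)
{
    char field[VIR_TYPED_PARAM_FIELD_LENGTH];
    unsigned int ngroups = 0;
    size_t i, b;

    if (virTypedParamsGetUInt(params, nparams,
                              "cpu.cacheoccupancy.count", &ngroups) <= 0)
        return;

    for (i = 0; i < ngroups; i++) {
        const char *name = "?";
        const char *vcpus = "?";
        unsigned int nbanks = 0;

        snprintf(field, sizeof(field), "cpu.cacheoccupancy.%zu.name", i);
        virTypedParamsGetString(params, nparams, field, &name);
        snprintf(field, sizeof(field), "cpu.cacheoccupancy.%zu.vcpus", i);
        virTypedParamsGetString(params, nparams, field, &vcpus);
        snprintf(field, sizeof(field), "cpu.cacheoccupancy.%zu.bank.count", i);
        if (virTypedParamsGetUInt(params, nparams, field, &nbanks) <= 0)
            continue;

        printf("group '%s' (vcpus %s):\n", name, vcpus);
        for (b = 0; b < nbanks; b++) {
            unsigned int id = 0;
            unsigned int bytes = 0;

            snprintf(field, sizeof(field),
                     "cpu.cacheoccupancy.%zu.bank.%zu.id", i, b);
            virTypedParamsGetUInt(params, nparams, field, &id);
            snprintf(field, sizeof(field),
                     "cpu.cacheoccupancy.%zu.bank.%zu.bytes", i, b);
            virTypedParamsGetUInt(params, nparams, field, &bytes);
            printf("  cache id %u: %u bytes\n", id, bytes);
        }
    }
}
</pre>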

-----Original Message----- From: Martin Kletzander [mailto:mkletzan@redhat.com] Sent: Wednesday, July 18, 2018 10:03 PM To: Wang, Huaqiang <huaqiang.wang@intel.com> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache Monitoring Technology (CMT)
On Wed, Jul 18, 2018 at 12:19:18PM +0000, Wang, Huaqiang wrote:
-----Original Message----- From: Martin Kletzander [mailto:mkletzan@redhat.com] Sent: Wednesday, July 18, 2018 8:07 PM To: Wang, Huaqiang <huaqiang.wang@intel.com> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache Monitoring Technology (CMT)
On Wed, Jul 18, 2018 at 02:29:32AM +0000, Wang, Huaqiang wrote:
-----Original Message----- From: Martin Kletzander [mailto:mkletzan@redhat.com] Sent: Tuesday, July 17, 2018 5:11 PM To: Wang, Huaqiang <huaqiang.wang@intel.com> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache Monitoring Technology (CMT)
On Tue, Jul 17, 2018 at 07:19:41AM +0000, Wang, Huaqiang wrote:
Hi Martin,
Thanks for your comments. Please see my reply inline.
> -----Original Message----- > From: Martin Kletzander [mailto:mkletzan@redhat.com] > Sent: Tuesday, July 17, 2018 2:27 PM > To: Wang, Huaqiang <huaqiang.wang@intel.com> > Cc: libvir-list@redhat.com; Feng, Shaohe > <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, > Jian-feng <jian-feng.ding@intel.com>; Zang, Rui > <rui.zang@intel.com> > Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache > Monitoring Technology (CMT) > > On Mon, Jul 09, 2018 at 03:00:48PM +0800, Wang Huaqiang wrote: > > > >This is the V2 of RFC and the POC source code for introducing > >x86 RDT CMT feature, thanks Martin Kletzander for his review > >and constructive suggestion for V1. > > > >This series is trying to provide the similar functions of the > >perf event based CMT, MBMT and MBML features in reporting > >cache occupancy, total memory bandwidth utilization and local > >memory bandwidth utilization information in livirt. Firstly we focus on
cmt.
> > > >x86 RDT Cache Monitoring Technology (CMT) provides a medthod > >to track the cache occupancy information per CPU thread. We > >are leveraging the implementation of kernel resctrl filesystem > >and create our patches on top of that. > > > >Describing the functionality from a high level: > > > >1. Extend the output of 'domstats' and report CMT inforamtion. > > > >Comparing with perf event based CMT implementation in libvirt, > >this series extends the output of command 'domstat' and > >reports cache occupancy information like these: > ><pre> > >[root@dl-c200 libvirt]# virsh domstats vm3 --cpu-resource > >Domain: 'vm3' > > cpu.cacheoccupancy.vcpus_2.value=4415488 > > cpu.cacheoccupancy.vcpus_2.vcpus=2 > > cpu.cacheoccupancy.vcpus_1.value=7839744 > > cpu.cacheoccupancy.vcpus_1.vcpus=1 > > cpu.cacheoccupancy.vcpus_0,3.value=53796864 > > cpu.cacheoccupancy.vcpus_0,3.vcpus=0,3 > ></pre> > >The vcpus have been arragned into three monitoring groups, > >these three groups cover vcpu 1, vcpu 2 and vcpus 0,3 respectively. > >Take an example, the 'cpu.cacheoccupancy.vcpus_0,3.value' > >reports the cache occupancy information for vcpu 0 and vcpu 3, > >the > 'cpu.cacheoccupancy.vcpus_0,3.vcpus' > >represents the vcpu group information. > > > >To address Martin's suggestion "beware as 1-4 is something > >else than > >1,4 so you need to differentiate that.", the content of 'vcpus' > >(cpu.cacheoccupancy.<groupname>.vcpus=xxx) has been specially > >processed, if vcpus is a continous range, e.g. 0-2, then the > >output of cpu.cacheoccupancy.vcpus_0-2.vcpus will be like > >'cpu.cacheoccupancy.vcpus_0-2.vcpus=0,1,2' > >instead of > >'cpu.cacheoccupancy.vcpus_0-2.vcpus=0-2'. > >Please note that 'vcpus_0-2' is a name of this monitoring > >group, could be specified any other word from the XML > >configuration file or lively changed with the command introduced in following part. > > > > One small nit according to the naming (but it shouldn't block > any reviewers from reviewing, just keep this in mind for next > version for > example) is that this is still inconsistent.
OK. I'll try to use words such as 'cache' and 'cpu resource', and avoid using 'RDT' and 'CMT'.
Oh, you misunderstood, I meant the naming in the domstats output =)
The way domstats are structured when there is something like an array could shed some light into this. What you suggested is really kind of hard to parse (although it looks better). What would you say to something like this:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.value=4415488
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.1.value=7839744
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.2.value=53796864
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.name=0,3
</pre>
Your arrangement looks more reasonable, thanks for your advice. However, as I mentioned in another email I sent to libvirt-list a few hours ago, the kernel resctrl interface provides cache occupancy information for each cache block of every resource group, so maybe we need to expose the cache occupancy for each cache block. If you agree, we need to refine the 'domstats' output message; how about this:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.block.count=2
cpu.cacheoccupancy.0.block.0.bytes=5488
cpu.cacheoccupancy.0.block.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.block.count=2
cpu.cacheoccupancy.1.block.0.bytes=7839744
cpu.cacheoccupancy.1.block.1.bytes=0
cpu.cacheoccupancy.2.name=0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.block.count=2
cpu.cacheoccupancy.2.block.0.bytes=53796864
cpu.cacheoccupancy.2.block.1.bytes=0
</pre>
What do you mean by cache block? Is that (cache_size / granularity)? In that case it looks fine, I guess (without putting too much thought into it).
No. By 'cache block' I mean the cache indexed by 'cache id', with the id number kept in '/sys/devices/system/cpu/cpu*/cache/index*/id'.
Generally, on a two-socket server node (with CPU E5-2680 v4, for example), each socket has its own L3 cache. If a resctrl monitoring group is created (/sys/fs/resctrl/p0, for example), you can find the cache occupancy information for these two L3 cache areas separately in the files /sys/fs/resctrl/p0/mon_data/mon_L3_00/llc_occupancy and /sys/fs/resctrl/p0/mon_data/mon_L3_01/llc_occupancy. Cache information for an individual socket is meaningful for detecting performance issues such as workload imbalance, so we'd better expose these details to libvirt users. To summarize, I am using 'cache block' to describe the CPU cache indexed with the number found in '/sys/devices/system/cpu/cpu*/cache/index*/id'. I welcome suggestions for other naming.
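To make the file layout concrete, here is a minimal sketch (not part of the patches; it assumes an existing monitoring group named 'p0', as in the example above) that reads the per-cache occupancy files exposed by resctrl:
<pre>
/* Read llc_occupancy for every L3 cache of group "p0".
 * One mon_L3_XX directory exists per L3 cache (per socket here). */
#include <stdio.h>
#include <glob.h>

int main(void)
{
    glob_t gl;

    if (glob("/sys/fs/resctrl/p0/mon_data/mon_L3_*/llc_occupancy",
             0, NULL, &gl) != 0)
        return 1;

    for (size_t i = 0; i < gl.gl_pathc; i++) {
        FILE *fp = fopen(gl.gl_pathv[i], "r");
        unsigned long long bytes = 0;

        if (!fp)
            continue;
        if (fscanf(fp, "%llu", &bytes) == 1)
            printf("%s: %llu bytes\n", gl.gl_pathv[i], bytes);
        fclose(fp);
    }

    globfree(&gl);
    return 0;
}
</pre>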
To be consistent I'd prefer "cache", "cache bank", and "index" or "id". I don't have specific requirements, I just don't want to invent new words. Look at how it is described in the capabilities, for example.
Makes sense. Then let's use 'id' for this purpose, and the output would be:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.id.count=2
cpu.cacheoccupancy.0.id.0.bytes=5488
cpu.cacheoccupancy.0.id.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.id.count=2
cpu.cacheoccupancy.1.id.0.bytes=7839744
cpu.cacheoccupancy.1.id.1.bytes=0
cpu.cacheoccupancy.2.name=0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.id.count=2
cpu.cacheoccupancy.2.id.0.bytes=53796864
cpu.cacheoccupancy.2.id.1.bytes=0
</pre>
How about it?
I'm switching contexts too much and hence I didn't make myself clear. Since IDs are not guaranteed to be consecutive, this might be more future-proof:
<pre>
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.bank.count=2
cpu.cacheoccupancy.0.bank.0.id=0
cpu.cacheoccupancy.0.bank.0.bytes=5488
cpu.cacheoccupancy.0.bank.1.id=1
cpu.cacheoccupancy.0.bank.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.bank.count=2
cpu.cacheoccupancy.1.bank.0.id=0
cpu.cacheoccupancy.1.bank.0.bytes=7839744
cpu.cacheoccupancy.1.bank.1.id=1
cpu.cacheoccupancy.1.bank.1.bytes=0
cpu.cacheoccupancy.2.name=0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.bank.count=2
cpu.cacheoccupancy.2.bank.0.id=0
cpu.cacheoccupancy.2.bank.0.bytes=53796864
cpu.cacheoccupancy.2.bank.1.id=1
cpu.cacheoccupancy.2.bank.1.bytes=0
</pre>
That is better. Agreed.
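For illustration, here is a minimal client-side sketch (not from the patches; it assumes the final key layout above and uses libvirt's standard typed-parameter getters) showing that the indexed 'bank' layout is straightforward to walk programmatically:
<pre>
/* Walk the proposed cpu.cacheoccupancy.* typed parameters, e.g. as
 * found in a virDomainStatsRecord returned by virDomainListGetStats(). */
#include <stdio.h>
#include <libvirt/libvirt.h>

static void
print_cache_occupancy(virTypedParameterPtr params, int nparams)
{
    unsigned int ngroups = 0;

    if (virTypedParamsGetUInt(params, nparams,
                              "cpu.cacheoccupancy.count", &ngroups) <= 0)
        return;

    for (unsigned int i = 0; i < ngroups; i++) {
        char key[128];
        const char *name = NULL;
        unsigned int nbanks = 0;

        snprintf(key, sizeof(key), "cpu.cacheoccupancy.%u.name", i);
        virTypedParamsGetString(params, nparams, key, &name);

        snprintf(key, sizeof(key), "cpu.cacheoccupancy.%u.bank.count", i);
        virTypedParamsGetUInt(params, nparams, key, &nbanks);

        for (unsigned int j = 0; j < nbanks; j++) {
            unsigned int id = 0;
            unsigned long long bytes = 0;

            snprintf(key, sizeof(key),
                     "cpu.cacheoccupancy.%u.bank.%u.id", i, j);
            virTypedParamsGetUInt(params, nparams, key, &id);

            snprintf(key, sizeof(key),
                     "cpu.cacheoccupancy.%u.bank.%u.bytes", i, j);
            virTypedParamsGetULLong(params, nparams, key, &bytes);

            printf("group %s: cache id %u occupies %llu bytes\n",
                   name ? name : "?", id, bytes);
        }
    }
}
</pre>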