[libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support

This is an RFC for supporting the CPU Cache Monitoring Technology (CMT) feature in libvirt. Memory Bandwidth Monitoring (MBM) is closely related to CMT, but for simplicity only CMT is discussed here. MBM support will follow once CMT is in place. For details on CMT, refer to section 17.18 of volume 3 of the Intel x86 SDM (https://software.intel.com/en-us/articles/intel-sdm).

## About the '_virResctrlMon' interface

Cache Allocation Technology (CAT) is already implemented in util/virresctrl.*, which interacts with the Linux kernel resctrl file system. Much like CAT, a CMT object is represented by 'struct _virResctrlMon':

```
struct _virResctrlMon {
    virObject parent;

    /* pairedalloc: pointer to the resctrl allocation this group is paired
     * with. NULL for a resctrl monitoring group not associated with
     * any allocation. */
    virResctrlAllocPtr pairedalloc;
    /* The identifier (any unique string for now) */
    char *id;
    /* libvirt-generated path; identical to the allocation path for a
     * paired group */
    char *path;
};
```

Following largely the same logic as '_virResctrlAlloc' (implemented mainly in 'virresctrl.c'), a group of APIs has been designed to manipulate '_virResctrlMon'. '_virResctrlMon' has a lot in common with '_virResctrlAlloc'; the main difference is the 'pairedalloc' field, which stores a pointer to the paired resctrl allocation object.

With the current libvirt resctrl implementation, once a '_virResctrlAlloc' object is created, the CMT hardware is enabled for it automatically and uses the same directory in resctrlfs. I call a '_virResctrlMon' object that shares its resctrlfs directory with an allocation a 'paired' mon group: one '_virResctrlMon' and one '_virResctrlAlloc' form a pair, and the '_virResctrlMon' tracks its partner through 'pairedalloc'. A paired mon group cannot be enabled or disabled at runtime.

'pairedalloc' can also be set to NULL, which creates a non-paired mon group object. This is necessary because CMT can work independently, monitoring the utilization of critical CPU resources (cache or memory bandwidth) without allocating any dedicated cache or memory bandwidth. A non-paired mon group object represents CMT working on its own, and it can be enabled or disabled at runtime.

## About the virsh command 'resctrl'

To set or get the state of the resctrl mon group (hardware CMT), a new virsh command, 'resctrl', is added. Here is the common usage:

```
[root@dl-c200 david]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm3                            running
 3     vm2                            running
 -     vm1                            shut off
```

### Test on the running domain vm3

To get the RDT monitoring status, run 'virsh resctrl <domain>':

```
[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Enabled
```

To enable RDT monitoring, run 'virsh resctrl <domain> --enable':

```
[root@dl-c200 david]# virsh resctrl vm3 --enable
RDT Monitoring Status: Enabled
```

To disable RDT monitoring, run 'virsh resctrl <domain> --disable':

```
[root@dl-c200 david]# virsh resctrl vm3 --disable
RDT Monitoring Status: Disabled
[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Disabled
```

### Test on the inactive domain vm1

If the domain is not active, setting the RDT monitoring status fails, and the status is reported as 'Disabled':

```
[root@dl-c200 david]# virsh resctrl vm1
RDT Monitoring Status: Disabled
[root@dl-c200 david]# virsh resctrl vm1 --enable
error: Requested operation is not valid: domain is not running
[root@dl-c200 david]# virsh resctrl vm1 --disable
error: Requested operation is not valid: domain is not running
```

### Test on the domain vm2

Domain vm2 is active and has CAT enabled through 'cachetune' (configured in the 'cputune/cachetune' section), so its resctrl mon group is a paired one. For a paired mon group, RDT monitoring cannot be disabled.
If disabling a paired mon group were allowed, we would have to destroy the resctrl allocation directory, which the current cache allocation design does not support.

```
[root@dl-c200 libvirt]# virsh resctrl vm2 --enable
RDT Monitoring Status: Enabled (forced by cachetune)
[root@dl-c200 libvirt]# virsh resctrl vm2 --disable
RDT Monitoring Status: Enabled (forced by cachetune)
[root@dl-c200 libvirt]# virsh resctrl vm2
RDT Monitoring Status: Enabled (forced by cachetune)
```

## About showing RDT utilization information

A domstats field has been added to show the utilization of RDT resources:

```
[root@dl-c200 libvirt]# virsh domstats --resctrl
Domain: 'vm1'
  resctrl.cmt=0

Domain: 'vm3'
  resctrl.cmt=180224

Domain: 'vm2'
  resctrl.cmt=2613248
```

Wang Huaqiang (3):
  util: add Intel x86 RDT/CMT support
  tools: virsh: add command for controlling/monitoring resctrl
  tools: virsh domstats: show RDT CMT resource utilization information

 include/libvirt/libvirt-domain.h    |  10 ++
 src/conf/domain_conf.c              |  28 ++++
 src/conf/domain_conf.h              |   3 +
 src/driver-hypervisor.h             |   8 +
 src/libvirt-domain.c                |  92 +++++++++++
 src/libvirt_private.syms            |   9 +
 src/libvirt_public.syms             |   6 +
 src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 +++++++-
 src/remote/remote_daemon_dispatch.c |  45 +++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 +++-
 src/remote_protocol-structs         |  12 ++
 src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h               |  44 +++++
 tools/virsh-domain-monitor.c        |   7 +
 tools/virsh-domain.c                |  74 +++++++++
 17 files changed, 933 insertions(+), 5 deletions(-)

-- 
2.7.4
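The paired/non-paired distinction above mostly comes down to which resctrlfs directory a mon group uses. The following is a minimal sketch of that decision, modelled on virResctrlMonDeterminePath in patch 1/3. It assumes resctrl is mounted at /sys/fs/resctrl, uses plain snprintf instead of libvirt's virAsprintf, and `mon_group_path` is a hypothetical helper name, not a libvirt API. The extra rejection of empty ids and ids containing '/' is an addition of this sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed mount point of the kernel resctrl file system. */
#define RESCTRL_ROOT "/sys/fs/resctrl"

/*
 * Hypothetical helper (sketch): compute the directory of a monitoring group.
 *  - paired group:     shares the allocation's directory,
 *                      <root>/<machinename>-<id>
 *  - non-paired group: a pure monitoring group,
 *                      <root>/mon_groups/<machinename>-<id>
 * Returns 0 on success, -1 on invalid input or if buf is too small.
 */
static int
mon_group_path(char *buf, size_t buflen, bool paired,
               const char *machinename, const char *id)
{
    int n;

    /* The id becomes part of a directory name: reject empty ids and '/'. */
    if (!machinename || !id || !id[0] || strchr(id, '/'))
        return -1;

    if (paired)
        n = snprintf(buf, buflen, "%s/%s-%s",
                     RESCTRL_ROOT, machinename, id);
    else
        n = snprintf(buf, buflen, "%s/mon_groups/%s-%s",
                     RESCTRL_ROOT, machinename, id);

    /* snprintf reports the length it wanted; >= buflen means truncation. */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

For example, a non-paired group with the id "vcpu-rest" (the id patch 2/3 assigns to the group that is not tied to any cachetune) and a hypothetical machine name "qemu-1-vm3" resolves to /sys/fs/resctrl/mon_groups/qemu-1-vm3-vcpu-rest, while a paired group simply reuses the allocation's directory.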

Add RDT/CMT feature (Intel x86) by interacting with kernel resctrl file system. Integrate code into util/resctrl. --- src/libvirt_private.syms | 9 ++ src/util/virresctrl.c | 316 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virresctrl.h | 44 +++++++ 3 files changed, 367 insertions(+), 2 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b4ab1f3..e16c3e0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2634,6 +2634,15 @@ virResctrlAllocSetSize; virResctrlGetInfo; virResctrlInfoGetCache; virResctrlInfoNew; +virResctrlMonNew; +virResctrlMonSetID; +virResctrlMonGetID; +virResctrlMonDeterminePath; +virResctrlMonAddPID; +virResctrlMonCreate; +virResctrlMonRemove; +virResctrlMonIsRunning; +virResctrlMonGetCacheOccupancy; # util/virrotatingfile.h diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index fc11635..e5f5caf 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -224,6 +224,22 @@ struct _virResctrlAlloc { static virClassPtr virResctrlAllocClass; +struct _virResctrlMon { + virObject parent; + + /* pairedalloc: pointer to a resctrl allocaion it paried with. + * NULL for a resctrl monitoring group not associated with + * any allocation. 
*/ + virResctrlAllocPtr pairedalloc; + /* The identifier (any unique string for now) */ + char *id; + /* libvirt-generated path, may be identical to alloction path + * may not if allocation is ready */ + char *path; +}; + +static virClassPtr virResctrlMonClass; + static void virResctrlAllocDispose(void *obj) { @@ -275,7 +291,28 @@ virResctrlAllocOnceInit(void) } +static void +virResctrlMonDispose(void *obj) +{ + virResctrlMonPtr resctrlMon = obj; + + VIR_FREE(resctrlMon->id); + VIR_FREE(resctrlMon->path); +} + + +static int +virResctrlMonOnceInit(void) +{ + if (!VIR_CLASS_NEW(virResctrlMon, virClassForObject())) + return -1; + + return 0; +} + + VIR_ONCE_GLOBAL_INIT(virResctrlAlloc) +VIR_ONCE_GLOBAL_INIT(virResctrlMon) virResctrlAllocPtr @@ -288,6 +325,16 @@ virResctrlAllocNew(void) } +virResctrlMonPtr +virResctrlMonNew(void) +{ + if (virResctrlMonInitialize() < 0) + return NULL; + + return virObjectNew(virResctrlMonClass); +} + + /* Common functions */ #ifdef __linux__ static int @@ -329,8 +376,6 @@ virResctrlLockWrite(void) #endif - - static int virResctrlUnlock(int fd) { @@ -1646,3 +1691,270 @@ virResctrlAllocRemove(virResctrlAllocPtr alloc) return ret; } + + +int +virResctrlMonSetID(virResctrlMonPtr mon, + const char *id) +{ + if (!id) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Resctrl mon group 'id' cannot be NULL")); + return -1; + } + + return VIR_STRDUP(mon->id, id); +} + + +const char * +virResctrlMonGetID(virResctrlMonPtr mon) +{ + return mon->id; +} + + +int +virResctrlMonDeterminePath(virResctrlMonPtr mon, + const char *machinename) +{ + + VIR_DEBUG("mon group, mon->path=%s\n", mon->path); + if (!mon->id) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Resctrl mon group id must be set before creation")); + return -1; + } + + if (mon->path) + return -1; + + if(mon->pairedalloc) + { + if (virAsprintf(&mon->path, "%s/%s-%s", + SYSFS_RESCTRL_PATH, machinename, mon->id) < 0) + return -1; + } + else + { + if (virAsprintf(&mon->path, 
"%s/mon_groups/%s-%s", + SYSFS_RESCTRL_PATH, machinename, mon->id) < 0) + return -1; + } + + return 0; +} + + +int +virResctrlMonAddPID(virResctrlMonPtr mon, + pid_t pid) +{ + char *tasks = NULL; + char *pidstr = NULL; + int ret = 0; + + if (!mon->path) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot add pid to non-existing resctrl mon group")); + return -1; + } + + VIR_DEBUG("Add PID %d to domain %s\n", + pid, mon->path); + + if (virAsprintf(&tasks, "%s/tasks", mon->path) < 0) + return -1; + + if (virAsprintf(&pidstr, "%lld", (long long int) pid) < 0) + goto cleanup; + + if (virFileWriteStr(tasks, pidstr, 0) < 0) { + virReportSystemError(errno, + _("Cannot write pid in tasks file '%s'"), + tasks); + goto cleanup; + } + + ret = 0; +cleanup: + VIR_FREE(tasks); + VIR_FREE(pidstr); + return ret; +} + + +int +virResctrlMonCreate(virResctrlAllocPtr pairedalloc, + virResctrlMonPtr mon, + const char *machinename) +{ + int ret = -1; + int lockfd = -1; + + if (!mon) + return 0; + + + if (pairedalloc) + { + if (!virFileExists(pairedalloc->path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("For paired mon group, the resctrl allocation " + "must be created first")); + goto cleanup; + } + mon->pairedalloc = pairedalloc; + + if (virResctrlMonDeterminePath(mon, machinename) < 0) + return -1; + } + else + { + mon->pairedalloc = NULL; + + /* resctrl mon group object may use for multiple purpose, + * free mon group path information*/ + VIR_FREE(mon->path); + + if (virResctrlMonDeterminePath(mon, machinename) < 0) + return -1; + + lockfd = virResctrlLockWrite(); + if (lockfd < 0) + goto cleanup; + + if (virFileExists(mon->path)) + { + VIR_DEBUG("Removing resctrl mon group %s", mon->path); + if (rmdir(mon->path) != 0 && errno != ENOENT) { + ret = -errno; + VIR_ERROR(_("Unable to remove %s (%d)"), mon->path, errno); + goto cleanup; + } + } + + if (virFileMakePath(mon->path) < 0) { + virReportSystemError(errno, + _("Cannot create resctrl directory '%s'"), + 
mon->path); + goto cleanup; + } + } + + ret = 0; +cleanup: + virResctrlUnlock(lockfd); + return ret; +} + + +int +virResctrlMonRemove(virResctrlMonPtr mon) +{ + int ret = 0; + + if (!mon->path) + return 0; + + VIR_DEBUG("Removing resctrl mon group %s", mon->path); + if (rmdir(mon->path) != 0 && errno != ENOENT) { + ret = -errno; + VIR_ERROR(_("Unable to remove %s (%d)"), mon->path, errno); + } + + return ret; +} + + +bool +virResctrlMonIsRunning(virResctrlMonPtr mon) +{ + bool ret = false; + char *tasks = NULL; + + if (mon && virFileExists(mon->path)) { + ret = virFileReadValueString(&tasks, "%s/tasks", mon->path); + if (ret < 0) + goto cleanup; + + if (!tasks || !tasks[0]) + goto cleanup; + + ret = true; + } + +cleanup: + VIR_FREE(tasks); + + return ret; +} + + +int +virResctrlMonGetCacheOccupancy(virResctrlMonPtr mon, + unsigned int * cacheoccu) +{ + DIR *dirp = NULL; + int ret = -1; + int rv = -1; + struct dirent *ent = NULL; + unsigned int cachetotal = 0; + unsigned int cacheoccyperblock = 0; + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *pathmondata = NULL; + + if (!mon->path) + goto cleanup; + + rv = virDirOpenIfExists(&dirp,mon->path); + if (rv <= 0) { + goto cleanup; + } + + virBufferAsprintf(&buf, "%s/mon_data", + mon->path); + pathmondata = virBufferContentAndReset(&buf); + if (!pathmondata) + goto cleanup; + + VIR_DEBUG("Seek llc_occupancy file from root: %s ", + pathmondata); + + if (virDirOpen(&dirp, pathmondata) < 0) + goto cleanup; + + while ((rv = virDirRead(dirp, &ent, pathmondata)) > 0) { + VIR_DEBUG("Parsing file '%s'", ent->d_name); + if (ent->d_type != DT_DIR) + continue; + + if (STRNEQLEN(ent->d_name, "mon_L", 5)) + continue; + + rv = virFileReadValueUint(&cacheoccyperblock, + "%s/%s/llc_occupancy", + pathmondata, ent->d_name); + if (rv == -2) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "file %s/%s/llc_occupancy does not exist", + pathmondata, ent->d_name); + goto cleanup; + } else if (rv < 0) { + goto cleanup; + } + + 
VIR_DEBUG("%s/%s/llc_occupancy: occupancy %d bytes", + pathmondata, ent->d_name, cacheoccyperblock); + + cachetotal += cacheoccyperblock; + } + + *cacheoccu = cachetotal; + + ret = 0; +cleanup: + VIR_FREE(pathmondata); + VIR_DIR_CLOSE(dirp); + return ret; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 5368ba2..a23c425 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -35,6 +35,12 @@ typedef enum { VIR_ENUM_DECL(virCache); +typedef enum { + VIR_RESCTRL_MONACT_NONE, + VIR_RESCTRL_MONACT_ENABLE, + VIR_RESCTRL_MONACT_DISABLE +} virResctrlMonAct; + typedef struct _virResctrlInfoPerCache virResctrlInfoPerCache; typedef virResctrlInfoPerCache *virResctrlInfoPerCachePtr; @@ -118,4 +124,42 @@ virResctrlAllocAddPID(virResctrlAllocPtr alloc, int virResctrlAllocRemove(virResctrlAllocPtr alloc); + +/* Monitoring-related things */ +typedef struct _virResctrlMon virResctrlMon; +typedef virResctrlMon *virResctrlMonPtr; + +virResctrlMonPtr +virResctrlMonNew(void); + +int +virResctrlMonSetID(virResctrlMonPtr mon, + const char *id); + +const char * +virResctrlMonGetID(virResctrlMonPtr mon); + +int +virResctrlMonDeterminePath(virResctrlMonPtr mon, + const char *machinename); + +int +virResctrlMonAddPID(virResctrlMonPtr alloc, + pid_t pid); + +int +virResctrlMonCreate(virResctrlAllocPtr pairedalloc, + virResctrlMonPtr mon, + const char *machinename); + +int +virResctrlMonRemove(virResctrlMonPtr mon); + +bool +virResctrlMonIsRunning(virResctrlMonPtr mon); + +int +virResctrlMonGetCacheOccupancy(virResctrlMonPtr mon, + unsigned int * cacheoccu); + #endif /* __VIR_RESCTRL_H__ */ -- 2.7.4
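virResctrlMonGetCacheOccupancy above walks the group's mon_data directory and adds up the llc_occupancy counter of every mon_L* subdirectory (one per L3 cache domain); the resctrl.cmt value in the cover letter's domstats output appears to come from this sum. Below is a standalone sketch of the same walk using plain POSIX calls. The function name is hypothetical; unlike the patch, it accumulates into an unsigned long long and uses stat() rather than d_type, which some file systems report as DT_UNKNOWN.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (sketch): sum the llc_occupancy values, in bytes, of
 * every <group>/mon_data/mon_L* directory, i.e. over all L3 cache domains.
 * Returns 0 on success and -1 on any error.
 */
static int
mon_group_llc_occupancy(const char *group, unsigned long long *total)
{
    char dirpath[4096], subdir[4096], file[4096];
    DIR *dir;
    struct dirent *ent;
    unsigned long long sum = 0;
    int n, ret = -1;

    n = snprintf(dirpath, sizeof(dirpath), "%s/mon_data", group);
    if (n < 0 || (size_t)n >= sizeof(dirpath))
        return -1;

    if (!(dir = opendir(dirpath)))
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        struct stat st;
        unsigned long long val;
        FILE *fp;
        int matched;

        /* The kernel creates one mon_L3_XX directory per L3 cache domain. */
        if (strncmp(ent->d_name, "mon_L", 5) != 0)
            continue;

        n = snprintf(subdir, sizeof(subdir), "%s/%s", dirpath, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(subdir))
            goto cleanup;

        /* Skip anything that is not a directory. */
        if (stat(subdir, &st) < 0 || !S_ISDIR(st.st_mode))
            continue;

        n = snprintf(file, sizeof(file), "%s/llc_occupancy", subdir);
        if (n < 0 || (size_t)n >= sizeof(file))
            goto cleanup;

        /* A missing or unparsable counter is treated as an error. */
        if (!(fp = fopen(file, "r")))
            goto cleanup;
        matched = fscanf(fp, "%llu", &val);
        fclose(fp);
        if (matched != 1)
            goto cleanup;

        sum += val;
    }

    *total = sum;
    ret = 0;

 cleanup:
    closedir(dir);
    return ret;
}
```

Note that the directory is opened exactly once and always closed on the way out. Reporting a single total per domain hides the per-socket split, so a per-domain breakdown would need one value per mon_L3_XX entry instead.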

On Fri, Jun 08, 2018 at 05:02:17PM +0800, Wang Huaqiang wrote:
Add RDT/CMT feature (Intel x86) by interacting with kernel resctrl file system. Integrate code into util/resctrl. --- src/libvirt_private.syms | 9 ++ src/util/virresctrl.c | 316 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virresctrl.h | 44 +++++++ 3 files changed, 367 insertions(+), 2 deletions(-)
This will not merge after some of the cleanups I made. There is one more patch that didn't get in, and you could look there for some inspiration as well, but it's just about keeping the data in another part of the code.
Anyway, the conflict is very easy to fix now.
Why isn't it just a matter of setting a boolean?
Align the code and run the checks before posting to the list. For more info, see the contribution guidelines: https://libvirt.org/hacking.html

See my update inline.
-----Original Message----- From: Martin Kletzander [mailto:mkletzan@redhat.com] Sent: Monday, June 11, 2018 4:40 PM To: Wang, Huaqiang <huaqiang.wang@intel.com> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [PATCH 1/3] util: add Intel x86 RDT/CMT support
On Fri, Jun 08, 2018 at 05:02:17PM +0800, Wang Huaqiang wrote:
Add RDT/CMT feature (Intel x86) by interacting with kernel resctrl file system. Integrate code into util/resctrl. --- src/libvirt_private.syms | 9 ++ src/util/virresctrl.c | 316 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virresctrl.h | 44 +++++++ 3 files changed, 367 insertions(+), 2 deletions(-)
This will not merge after some of the cleanups I made. There is one more patch that didn't get in, and you could look there for some inspiration as well, but it's just about keeping the data in another part of the code.
Anyway the conflict is very easy to fix now.
Why isn't it just a matter of setting a boolean?
Align the code and run the checks before posting to the list. For more info, see the contribution guidelines:
Yes, I noticed your patch and will make changes accordingly. I will follow the rules listed in https://libvirt.org/hacking.html and take care of the coding style. Thanks very much.

--- include/libvirt/libvirt-domain.h | 9 +++ src/conf/domain_conf.c | 28 +++++++ src/conf/domain_conf.h | 3 + src/driver-hypervisor.h | 8 ++ src/libvirt-domain.c | 81 +++++++++++++++++++++ src/libvirt_public.syms | 6 ++ src/qemu/qemu_driver.c | 141 ++++++++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 65 ++++++++++++++++- src/remote/remote_daemon_dispatch.c | 45 ++++++++++++ src/remote/remote_driver.c | 2 + src/remote/remote_protocol.x | 28 ++++++- src/remote_protocol-structs | 12 +++ tools/virsh-domain.c | 74 +++++++++++++++++++ 13 files changed, 499 insertions(+), 3 deletions(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index da773b7..598db28 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4767,4 +4767,13 @@ int virDomainSetLifecycleAction(virDomainPtr domain, unsigned int action, unsigned int flags); +/* + * resctrl API + */ +int virDomainSetResctrlMon(virDomainPtr domain, + int enable, int disable); + +int virDomainGetResctrlMonSts(virDomainPtr domain, + char **sts); + #endif /* __VIR_LIBVIRT_DOMAIN_H__ */ diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 5be773c..0ada3dc 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18885,6 +18885,7 @@ virDomainCachetuneDefParse(virDomainDefPtr def, xmlNodePtr *nodes = NULL; virBitmapPtr vcpus = NULL; virResctrlAllocPtr alloc = virResctrlAllocNew(); + virResctrlMonPtr mon= virResctrlMonNew(); virDomainCachetuneDefPtr tmp_cachetune = NULL; char *tmp = NULL; char *vcpus_str = NULL; @@ -18898,6 +18899,9 @@ virDomainCachetuneDefParse(virDomainDefPtr def, if (!alloc) goto cleanup; + if (!mon) + goto cleanup; + if (VIR_ALLOC(tmp_cachetune) < 0) goto cleanup; @@ -18970,8 +18974,12 @@ virDomainCachetuneDefParse(virDomainDefPtr def, if (virResctrlAllocSetID(alloc, alloc_id) < 0) goto cleanup; + if (virResctrlMonSetID(mon, alloc_id) < 0) + goto cleanup; + VIR_STEAL_PTR(tmp_cachetune->vcpus, vcpus); 
VIR_STEAL_PTR(tmp_cachetune->alloc, alloc); + VIR_STEAL_PTR(tmp_cachetune->mon, mon); if (VIR_APPEND_ELEMENT(def->cachetunes, def->ncachetunes, tmp_cachetune) < 0) goto cleanup; @@ -18990,6 +18998,20 @@ virDomainCachetuneDefParse(virDomainDefPtr def, } +static int +virDomainResctrlDefParse(virDomainDefPtr def, + xmlXPathContextPtr ctxr ATTRIBUTE_UNUSED) +{ + virResctrlMonPtr mon= virResctrlMonNew(); + if (virResctrlMonSetID(mon, "vcpu-rest") < 0) + return -1; + + def->resctrlmon_noalloc = mon; + + return 0; +} + + static virDomainDefPtr virDomainDefParseXML(xmlDocPtr xml, xmlNodePtr root, @@ -19585,6 +19607,12 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + if (virDomainResctrlDefParse(def, ctxt) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("cannot extract resctrl")); + goto error; + } + if (virCPUDefParseXML(ctxt, "./cpu[1]", VIR_CPU_TYPE_GUEST, &def->cpu) < 0) goto error; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 8a8121b..2febe62 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2234,6 +2234,7 @@ typedef virDomainCachetuneDef *virDomainCachetuneDefPtr; struct _virDomainCachetuneDef { virBitmapPtr vcpus; virResctrlAllocPtr alloc; + virResctrlMonPtr mon; }; @@ -2389,6 +2390,8 @@ struct _virDomainDef { virDomainCputune cputune; + virResctrlMonPtr resctrlmon_noalloc; + virDomainCachetuneDefPtr *cachetunes; size_t ncachetunes; diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index aa99cbb..c2e5d2a 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -1309,6 +1309,12 @@ typedef int unsigned int action, unsigned int flags); +typedef int +(*virDrvDomainSetResctrlMon)(virDomainPtr domain, + int enable, int disable); + +typedef char * +(*virDrvDomainGetResctrlMonSts)(virDomainPtr domain); typedef struct _virHypervisorDriver virHypervisorDriver; typedef virHypervisorDriver *virHypervisorDriverPtr; @@ -1558,6 +1564,8 @@ struct _virHypervisorDriver { 
virDrvDomainSetLifecycleAction domainSetLifecycleAction; virDrvConnectCompareHypervisorCPU connectCompareHypervisorCPU; virDrvConnectBaselineHypervisorCPU connectBaselineHypervisorCPU; + virDrvDomainSetResctrlMon domainSetResctrlMon; + virDrvDomainGetResctrlMonSts domainGetResctrlMonSts; }; diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index d44b553..07a19a6 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -12154,3 +12154,84 @@ int virDomainSetLifecycleAction(virDomainPtr domain, virDispatchError(domain->conn); return -1; } + + +/** + * virDomainSetResctrlMon: + * @domain: a domain object + * @enable: true(non-zero) for enbling resctrl mon group. + * @disable: true(non-zero) for disbling resctrl mon group. + * valid if @enable is false + * + * Enable or disable resctrl monitoring. + * + * Returns -1 in case of failure, 0 in case of success. + */ +int +virDomainSetResctrlMon(virDomainPtr domain, + int enable, int disable) +{ + int ret; + virConnectPtr conn; + + virResetLastError(); + + if(!disable && !enable) + return 0; + + virCheckDomainReturn(domain, -1); + + conn = domain->conn; + + if (conn->driver->domainSetResctrlMon) { + ret = conn->driver->domainSetResctrlMon(domain, + enable, disable); + if (ret < 0) + goto error; + return ret; + } + + virReportUnsupportedError(); + + error: + virDispatchError(domain->conn); + return -1; +} + + +/** + * virDomainGetResctrlMonSts: + * @domain: a domain object + * @status: pointer of a string buffer for holding resctrl mon + * group status string, caller is responsible for free it. + * + * Get domain resctrl status. + * + * Returns -1 in case of failure, 0 in case of success. 
+ */ +int +virDomainGetResctrlMonSts(virDomainPtr domain, + char **status) +{ + int ret = -1; + virConnectPtr conn; + + virResetLastError(); + + virCheckDomainReturn(domain, -1); + + conn = domain->conn; + + if (conn->driver->domainGetResctrlMonSts) { + *status = conn->driver->domainGetResctrlMonSts(domain); + if (*status) + ret = 0; + + goto done; + } + + virReportUnsupportedError(); + done: + virDispatchError(domain->conn); + return ret; +} diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 4f54b84..fb3eef5 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -798,4 +798,10 @@ LIBVIRT_4.5.0 { virGetLastErrorDomain; } LIBVIRT_4.4.0; +LIBVIRT_4.6.0 { + global: + virDomainSetResctrlMon; + virDomainGetResctrlMonSts; +} LIBVIRT_4.5.0; + # .... define new API here using predicted next version number .... diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 38ea865..4075daa 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -108,6 +108,7 @@ #include "virnuma.h" #include "dirname.h" #include "netdev_bandwidth_conf.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -21437,6 +21438,144 @@ qemuDomainSetLifecycleAction(virDomainPtr dom, } +static int +qemuDomainSetResctrlMon(virDomainPtr dom, + int enable ,int disable) +{ + int ret = -1; + virDomainObjPtr vm; + virResctrlMonAct act = VIR_RESCTRL_MONACT_NONE;; + int i = 0; + unsigned int maxvcpus = 0; + + /* The 'enable' action will override the 'disable' one */ + if(disable) + act = VIR_RESCTRL_MONACT_DISABLE; + if(enable) + act = VIR_RESCTRL_MONACT_ENABLE; + + if (act == VIR_RESCTRL_MONACT_NONE) + return 0; + + if (!(vm = qemuDomObjFromDomain(dom))) + return ret; + + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + + /* If 'resctrl' is enabled in xml configuation file through 'cachetune' + * section, this 
interface doesn't work. return 1 for this case */ + if (vm->def->ncachetunes != 0){ + VIR_DEBUG("resctrl monitoring interface is governed by domain " + "configration 'cachetune' sections. Interface disabled.\n"); + ret = 1; + goto cleanup; + } + + if (act == VIR_RESCTRL_MONACT_ENABLE) { + + if (!vm->def->resctrlmon_noalloc){ + virReportError(VIR_ERR_NO_DOMAIN, + _("resctrlmon_noalloc should be allocated.")); + goto cleanup; + } + + if(!virResctrlMonIsRunning(vm->def->resctrlmon_noalloc)) { + + if (virResctrlMonCreate(NULL, + vm->def->resctrlmon_noalloc, priv->machineName) < 0) + goto cleanup; + + /* Set vcpus */ + maxvcpus = virDomainDefGetVcpusMax(vm->def); + for (i = 0; i < maxvcpus; i++) { + virDomainVcpuDefPtr vcpu + = virDomainDefGetVcpu(vm->def, i); + + if (!vcpu->online) + continue; + + pid_t vcpupid = qemuDomainGetVcpuPid(vm, i); + if (virResctrlMonAddPID(vm->def->resctrlmon_noalloc, + vcpupid) < 0) + goto cleanup; + } + } + + VIR_DEBUG("resctrl monitoring is enabled"); + } else if (act == VIR_RESCTRL_MONACT_DISABLE){ + if (!vm->def->resctrlmon_noalloc){ + virReportError(VIR_ERR_NO_DOMAIN, + _("resctrlmon_noalloc should be allocated.")); + goto cleanup; + } + + if(virResctrlMonIsRunning(vm->def->resctrlmon_noalloc)) { + if (virResctrlMonRemove(vm->def->resctrlmon_noalloc) < 0){ + virReportError(VIR_ERR_NO_DOMAIN, + _("Error in remove resctrl mon group.")); + goto cleanup; + } + } + + VIR_DEBUG("resctrl monitoring is disabled\n"); + } + + ret = 0; +cleanup: + virDomainObjEndAPI(&vm); + return ret; +} + + +static char * +qemuDomainGetResctrlMonSts(virDomainPtr dom) +{ + virDomainObjPtr vm; + char *sts = NULL; + + if (!(vm = qemuDomObjFromDomain(dom))) + return sts; + + if (vm->def->ncachetunes != 0){ + VIR_DEBUG("resctrl monitoring interface is governed by domain " + "'cachetune' sections. 
resctrl monitoring is compulsively enabled.\n"); + + /* only check cachetune[0] for domain resctrl mon group status */ + if (!virResctrlMonIsRunning(vm->def->cachetunes[0]->mon)) { + if (virAsprintf(&sts, "Disabled") < 0) + goto cleanup; + } else { + if (virAsprintf(&sts, "Enabled (forced by cachetune)") < 0) + goto cleanup; + } + + } else { + + if (vm->def->resctrlmon_noalloc && + virResctrlMonIsRunning(vm->def->resctrlmon_noalloc)){ + if (virAsprintf(&sts, "Enabled") < 0) + goto cleanup; + + } else { + if (virAsprintf(&sts, "Disabled") < 0) + goto cleanup; + } + } + + VIR_DEBUG("resctrl monitoring status: %s\n", sts); + +cleanup: + virDomainObjEndAPI(&vm); + return sts; +} + + static virHypervisorDriver qemuHypervisorDriver = { .name = QEMU_DRIVER_NAME, .connectURIProbe = qemuConnectURIProbe, @@ -21660,6 +21799,8 @@ static virHypervisorDriver qemuHypervisorDriver = { .domainSetLifecycleAction = qemuDomainSetLifecycleAction, /* 3.9.0 */ .connectCompareHypervisorCPU = qemuConnectCompareHypervisorCPU, /* 4.4.0 */ .connectBaselineHypervisorCPU = qemuConnectBaselineHypervisorCPU, /* 4.4.0 */ + .domainSetResctrlMon = qemuDomainSetResctrlMon, /*FIXME: assign proper ver string */ + .domainGetResctrlMonSts = qemuDomainGetResctrlMonSts, /*FIXME: assign proper ver string */ }; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 1606f4c..4fab0e1 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2461,6 +2461,11 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver, vm->def->cachetunes[i]->alloc, priv->machineName) < 0) goto cleanup; + + if (virResctrlMonCreate(vm->def->cachetunes[i]->alloc, + vm->def->cachetunes[i]->mon, + priv->machineName) < 0) + goto cleanup; } ret = 0; @@ -5259,6 +5264,7 @@ qemuProcessSetupVcpu(virDomainObjPtr vm, &vcpu->sched) < 0) return -1; + for (i = 0; i < vm->def->ncachetunes; i++) { virDomainCachetuneDefPtr ct = vm->def->cachetunes[i]; @@ -5279,6 +5285,10 @@ qemuProcessSetupVcpus(virDomainObjPtr vm) 
     virDomainVcpuDefPtr vcpu;
     unsigned int maxvcpus = virDomainDefGetVcpusMax(vm->def);
     size_t i;
+    virBitmapPtr vcpuleft = NULL;
+    int ret = -1;
+
+    qemuDomainObjPrivatePtr priv = vm->privateData;

     if ((vm->def->cputune.period || vm->def->cputune.quota) &&
         !virCgroupHasController(((qemuDomainObjPrivatePtr) vm->privateData)->cgroup,
@@ -5308,17 +5318,52 @@ qemuProcessSetupVcpus(virDomainObjPtr vm)
         return 0;
     }

+    /* To monitor whole domain's cache occupancy information
+     * create mon group for un-covered VCPUs */
+    if (!(vcpuleft = virBitmapNew(maxvcpus + 1)))
+        goto cleanup;
+
+    virBitmapClearAll(vcpuleft);
+
+
     for (i = 0; i < maxvcpus; i++) {
         vcpu = virDomainDefGetVcpu(vm->def, i);

         if (!vcpu->online)
             continue;

+        if ( virBitmapSetBit(vcpuleft, i) < 0)
+            goto cleanup;
+
         if (qemuProcessSetupVcpu(vm, i) < 0)
             return -1;
     }

-    return 0;
+    for (i = 0; i < vm->def->ncachetunes; i++) {
+        virDomainCachetuneDefPtr ct = vm->def->cachetunes[i];
+        virBitmapSubtract(vcpuleft, ct->vcpus);
+    }
+
+
+    if (vm->def->ncachetunes &&
+        !virBitmapIsAllClear(vcpuleft)){
+
+        if (virResctrlMonCreate(NULL, vm->def->resctrlmon_noalloc, priv->machineName) < 0)
+            goto cleanup;
+
+        for (i = 0; i < maxvcpus; i++) {
+            if (virBitmapIsBitSet(vcpuleft, i)){
+                pid_t vcpupid = qemuDomainGetVcpuPid(vm, i);
+
+                if (virResctrlMonAddPID(vm->def->resctrlmon_noalloc, vcpupid) < 0)
+                    goto cleanup;
+            }
+        }
+    }
+
+    ret = 0;
+cleanup:
+    virBitmapFree(vcpuleft);
+    return ret;
 }


@@ -6895,8 +6940,14 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     /* Remove resctrl allocation after cgroups are cleaned up which makes it
      * kind of safer (although removing the allocation should work even with
      * pids in tasks file */
-    for (i = 0; i < vm->def->ncachetunes; i++)
+    for (i = 0; i < vm->def->ncachetunes; i++){
         virResctrlAllocRemove(vm->def->cachetunes[i]->alloc);
+        virResctrlMonRemove(vm->def->cachetunes[i]->mon);
+    }
+
+    if(vm->def->resctrlmon_noalloc)
+        virResctrlMonRemove(vm->def->resctrlmon_noalloc);
+

     qemuProcessRemoveDomainStatus(driver, vm);

@@ -7620,8 +7671,18 @@ qemuProcessReconnect(void *opaque)
         if (virResctrlAllocDeterminePath(obj->def->cachetunes[i]->alloc,
                                          priv->machineName) < 0)
             goto error;
+
+        if (virResctrlMonDeterminePath(obj->def->cachetunes[i]->mon,
+                                       priv->machineName) < 0)
+            goto error;
+
     }

+    if(obj->def->resctrlmon_noalloc &&
+        virResctrlMonDeterminePath(obj->def->resctrlmon_noalloc,
+                                    priv->machineName) < 0)
+        goto error;
+
     /* update domain state XML with possibly updated state in virDomainObj */
     if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, obj, driver->caps) < 0)
         goto error;
diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c
index 81d0445..2ef0e5e 100644
--- a/src/remote/remote_daemon_dispatch.c
+++ b/src/remote/remote_daemon_dispatch.c
@@ -7107,3 +7107,48 @@ remoteSerializeDomainDiskErrors(virDomainDiskErrorPtr errors,
     }
     return -1;
 }
+
+static int remoteDispatchDomainGetResctrlMonSts(
+    virNetServerPtr server ATTRIBUTE_UNUSED,
+    virNetServerClientPtr client,
+    virNetMessagePtr msg ATTRIBUTE_UNUSED,
+    virNetMessageErrorPtr rerr,
+    remote_domain_get_resctrl_mon_sts_args *args,
+    remote_domain_get_resctrl_mon_sts_ret *ret)
+{
+    int rv = -1;
+    virDomainPtr dom = NULL;
+    char *sts = NULL;
+    char **sts_p = NULL;
+    struct daemonClientPrivate *priv =
+        virNetServerClientGetPrivateData(client);
+
+    if (!priv->conn) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open"));
+        goto cleanup;
+    }
+
+    if (!(dom = get_nonnull_domain(priv->conn, args->dom)))
+        goto cleanup;
+
+    if ((rv = virDomainGetResctrlMonSts(dom, &sts)) < 0)
+        goto cleanup;
+
+    if (VIR_ALLOC(sts_p) < 0)
+        goto cleanup;
+
+    if (VIR_STRDUP(*sts_p, sts) < 0)
+        goto cleanup;
+
+    ret->sts = sts_p;
+    rv = 0;
+
+cleanup:
+    if (rv < 0) {
+        virNetMessageSaveError(rerr);
+        VIR_FREE(sts_p);
+    }
+    virObjectUnref(dom);
+    VIR_FREE(sts);
+    return rv;
+}
diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c
index c22993c..4a6b101 100644
--- a/src/remote/remote_driver.c
+++ b/src/remote/remote_driver.c
@@ -8451,6 +8451,8 @@ static virHypervisorDriver hypervisor_driver = {
     .domainSetLifecycleAction = remoteDomainSetLifecycleAction, /* 3.9.0 */
     .connectCompareHypervisorCPU = remoteConnectCompareHypervisorCPU, /* 4.4.0 */
     .connectBaselineHypervisorCPU = remoteConnectBaselineHypervisorCPU, /* 4.4.0 */
+    .domainSetResctrlMon = remoteDomainSetResctrlMon, /*FIXME: assign proper ver string */
+    .domainGetResctrlMonSts = remoteDomainGetResctrlMonSts, /*FIXME: assign proper ver string */
 };

 static virNetworkDriver network_driver = {
diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
index a0ab7e9..9242a61 100644
--- a/src/remote/remote_protocol.x
+++ b/src/remote/remote_protocol.x
@@ -3480,6 +3480,20 @@ struct remote_connect_baseline_hypervisor_cpu_ret {
     remote_nonnull_string cpu;
 };

+struct remote_domain_set_resctrl_mon_args {
+    remote_nonnull_domain dom;
+    int enable;
+    int disable;
+};
+
+struct remote_domain_get_resctrl_mon_sts_args {
+    remote_nonnull_domain dom;
+};
+
+struct remote_domain_get_resctrl_mon_sts_ret { /* insert@1 */
+    remote_string sts;
+};
+
 /*----- Protocol. -----*/

 /* Define the program number, protocol version and procedure numbers here. */
@@ -6187,5 +6201,17 @@ enum remote_procedure {
      * @generate: both
      * @acl: connect:write
      */
-    REMOTE_PROC_CONNECT_BASELINE_HYPERVISOR_CPU = 394
+    REMOTE_PROC_CONNECT_BASELINE_HYPERVISOR_CPU = 394,
+
+    /**
+     * @generate: both
+     * @acl: domain:write
+     */
+    REMOTE_PROC_DOMAIN_SET_RESCTRL_MON = 395,
+
+    /**
+     * @generate: client
+     * @acl: domain:read
+     */
+    REMOTE_PROC_DOMAIN_GET_RESCTRL_MON_STS = 396
 };
diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs
index 0c4cfc6..ed6a782 100644
--- a/src/remote_protocol-structs
+++ b/src/remote_protocol-structs
@@ -2907,6 +2907,16 @@ struct remote_connect_baseline_hypervisor_cpu_args {
 struct remote_connect_baseline_hypervisor_cpu_ret {
         remote_nonnull_string      cpu;
 };
+struct remote_domain_set_resctrl_mon_args {
+        remote_nonnull_domain dom;
+        int enable;
+        int disable;
+}
+struct remote_domain_get_resctrl_mon_sts_args {
+        remote_nonnull_domain dom;
+};
+struct remote_domain_get_resctrl_mon_sts_ret {
+        remote_string sts;
 enum remote_procedure {
         REMOTE_PROC_CONNECT_OPEN = 1,
         REMOTE_PROC_CONNECT_CLOSE = 2,
@@ -3302,4 +3312,6 @@ enum remote_procedure {
         REMOTE_PROC_DOMAIN_DETACH_DEVICE_ALIAS = 392,
         REMOTE_PROC_CONNECT_COMPARE_HYPERVISOR_CPU = 393,
         REMOTE_PROC_CONNECT_BASELINE_HYPERVISOR_CPU = 394,
+        REMOTE_PROC_DOMAIN_SET_RESCTRL_MON = 395,
+        REMOTE_PROC_DOMAIN_GET_RESCTRL_MON_STS = 396,
 };
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 6aa79f1..4ae8ed2 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -7677,6 +7677,74 @@ cmdIOThreadDel(vshControl *ctl, const vshCmd *cmd)
     return ret;
 }

+static const vshCmdInfo info_resctrl[] = {
+    {.name = "help",
+     .data = N_("get or set hardware CPU resource monitoring functions")
+    },
+    {.name = "desc",
+     .data = N_("Enable or disable resctrl monitoring for a guest domain.\n"
+                " To get resctrl status use following"
+                " command: \n\n"
+                " virsh # resctrl <domain>")
+    },
+    {.name = NULL}
+};
+
+static const vshCmdOptDef opts_resctrl[] = {
+    VIRSH_COMMON_OPT_DOMAIN_FULL(VIR_CONNECT_LIST_DOMAINS_ACTIVE),
+    {.name = "enable",
+     .type = VSH_OT_BOOL,
+     .help = N_("Enable resctrl function such as monitoring cache occupancy "
+                "or memory bandwidth.")
+    },
+    {.name = "disable",
+     .type = VSH_OT_BOOL,
+     .help = N_("Disable hardware function such as monitoring cache occupancy "
+                "or memory bandwidth.")
+    },
+    {.name = NULL}
+};
+
+
+static bool
+cmdResctrl(vshControl *ctl, const vshCmd *cmd)
+{
+    virDomainPtr dom;
+    bool ret = false;
+    char *ressts = NULL;
+
+    bool enable = vshCommandOptBool(cmd, "enable");
+    bool disable= vshCommandOptBool(cmd, "disable");
+
+    if (!(dom = virshCommandOptDomain(ctl, cmd, NULL)))
+        return false;
+
+    if(!enable && !disable){
+        if (virDomainGetResctrlMonSts(dom, &ressts) < 0)
+            goto cleanup;
+
+        if (!ressts)
+            goto cleanup;
+
+    } else {
+        if (virDomainSetResctrlMon(dom, enable, disable) < 0)
+            goto cleanup;
+
+        if (virDomainGetResctrlMonSts(dom, &ressts) < 0)
+            goto cleanup;
+
+        if (!ressts)
+            goto cleanup;
+    }
+
+    vshPrint(ctl,"RDT Monitoring Status: %s\n", ressts);
+    ret = true;
+cleanup:
+    VIR_FREE(ressts);
+    virshDomainFree(dom);
+    return ret;
+}
+
 /*
  * "cpu-stats" command
  */
@@ -13799,6 +13867,12 @@ const vshCmdDef domManagementCmds[] = {
      .flags = 0
     },
 #endif
+    {.name = "resctrl",
+     .handler = cmdResctrl,
+     .opts = opts_resctrl,
+     .info = info_resctrl,
+     .flags = 0
+    },
     {.name = "cpu-stats",
      .handler = cmdCPUStats,
      .opts = opts_cpu_stats,
-- 
2.7.4
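The vCPU bookkeeping added to qemuProcessSetupVcpus() above is a set difference: mark every online vCPU (virBitmapSetBit), subtract each cachetune's vcpus (virBitmapSubtract), and put whatever remains into the extra non-paired mon group (resctrlmon_noalloc). A minimal sketch of just that step, assuming a plain 64-bit mask in place of virBitmap (so at most 64 vCPUs); uncovered_vcpus() is a hypothetical helper, not a libvirt function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit N of each mask stands for vCPU N.
 * Returns the online vCPUs that no cachetune covers. */
static uint64_t
uncovered_vcpus(uint64_t online, const uint64_t *cachetune_vcpus,
                size_t ncachetunes)
{
    uint64_t left = online;            /* start with every online vCPU */
    size_t i;

    for (i = 0; i < ncachetunes; i++)
        left &= ~cachetune_vcpus[i];   /* drop vCPUs owned by a cachetune */

    return left;                       /* these go into the extra mon group */
}
```

For four online vCPUs with cachetunes on vCPU 0 and on vCPUs 1-2, only vCPU 3 is left over, which matches the "vcpu3" group used later in this thread.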

On Fri, Jun 08, 2018 at 05:02:18PM +0800, Wang Huaqiang wrote:
---
The subject says 'virsh', but the code does:
 include/libvirt/libvirt-domain.h    |   9 +++
 src/conf/domain_conf.c              |  28 +++++++
 src/conf/domain_conf.h              |   3 +
something with XML parsing and/or formatting
 src/driver-hypervisor.h             |   8 ++
 src/libvirt-domain.c                |  81 +++++++++++++++++++++
 src/libvirt_public.syms             |   6 ++
changes some public API?
 src/qemu/qemu_driver.c              | 141 ++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 ++++++++++++++++-
qemu?
 src/remote/remote_daemon_dispatch.c |  45 ++++++++++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 ++++++-
remote driver?
 src/remote_protocol-structs         |  12 +++
 tools/virsh-domain.c                |  74 +++++++++++++++++++
oh, look, here's the virsh change =) Look at the list and check the git log to see how changes are usually split. Also see contributor guidelines, mainly for formatting.

-----Original Message-----
From: Martin Kletzander [mailto:mkletzan@redhat.com]
Sent: Monday, June 11, 2018 4:45 PM
To: Wang, Huaqiang <huaqiang.wang@intel.com>
Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com>
Subject: Re: [libvirt] [PATCH 2/3] tools: virsh: add command for controlling/monitoring resctrl
On Fri, Jun 08, 2018 at 05:02:18PM +0800, Wang Huaqiang wrote:
---
The subject says 'virsh', but the code does:
 include/libvirt/libvirt-domain.h    |   9 +++
 src/conf/domain_conf.c              |  28 +++++++
 src/conf/domain_conf.h              |   3 +
something with XML parsing and/or formatting
 src/driver-hypervisor.h             |   8 ++
 src/libvirt-domain.c                |  81 +++++++++++++++++++++
 src/libvirt_public.syms             |   6 ++
changes some public API?
 src/qemu/qemu_driver.c              | 141 ++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 ++++++++++++++++-
qemu?
 src/remote/remote_daemon_dispatch.c |  45 ++++++++++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 ++++++-
remote driver?
 src/remote_protocol-structs         |  12 +++
 tools/virsh-domain.c                |  74 +++++++++++++++++++
oh, look, here's the virsh change =)
Look at the list and check the git log to see how changes are usually split.
Also see contributor guidelines, mainly for formatting.
Will separate the source code and create a more reasonable patch series in the next update. Thanks.

---
 include/libvirt/libvirt-domain.h |  1 +
 src/libvirt-domain.c             | 11 +++++++++
 src/qemu/qemu_driver.c           | 48 ++++++++++++++++++++++++++++++++++++++++
 tools/virsh-domain-monitor.c     |  7 ++++++
 4 files changed, 67 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 598db28..696b686 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -2041,6 +2041,7 @@ typedef enum {
     VIR_DOMAIN_STATS_INTERFACE = (1 << 4), /* return domain interfaces info */
     VIR_DOMAIN_STATS_BLOCK = (1 << 5), /* return domain block info */
     VIR_DOMAIN_STATS_PERF = (1 << 6), /* return domain perf event info */
+    VIR_DOMAIN_STATS_RESCTRL = (1<<7), /* return resctrlfs monitoring info */
 } virDomainStatsTypes;

 typedef enum {
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 07a19a6..3f1e156 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -11486,6 +11486,17 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
  *                              long long. It is produced by the
  *                              emulation_faults perf event
  *
+ * VIR_DOMAIN_STATS_RESCTRL
+ * "resctrl.cmt" - the usage of l3 cache (bytes) by applications running on
+ *                 the platform as unsigned long long. It is retrieved from
+ *                 resctrl file system.
+ * "resctrl.mbmt" - the total system bandwidth (bytes/s) from one level of
+ *                  cache to another as unsigned long long. Retrieved from
+ *                  resctrl file system.
+ * "resctrl.mbml" - the amount of data (bytes/s) sent through the memory
+ *                  controller on the socket as unsigned long long. Retrieved
+ *                  from resctrl file system.
+ *
  * Note that entire stats groups or individual stat fields may be missing from
  * the output in case they are not supported by the given hypervisor, are not
  * applicable for the current state of the guest domain, or their retrieval
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 4075daa..8004e26 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -20313,6 +20313,53 @@ qemuDomainGetStatsPerf(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
     return ret;
 }

+static int
+qemuDomainGetStatsResctrl(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
+                          virDomainObjPtr vm,
+                          virDomainStatsRecordPtr record,
+                          int *maxparams,
+                          unsigned int privflags ATTRIBUTE_UNUSED)
+{
+    size_t i;
+    unsigned int llc_occu;
+    unsigned int llc_occu_total= 0;
+    int ret = -1;
+#define DOMAIN_STATE_STR_RESCTRL "resctrl"
+
+    for (i = 0; i < vm->def->ncachetunes; i++) {
+        virDomainCachetuneDefPtr ct = vm->def->cachetunes[i];
+
+        if (virResctrlMonIsRunning(ct->mon)) {
+            VIR_DEBUG("llc_occupancy: checking cachetune [%ld] ", i);
+            if (virResctrlMonGetCacheOccupancy(ct->mon, &llc_occu) < 0)
+                goto cleanup;
+            llc_occu_total += llc_occu;
+        }
+    }
+
+    if (vm->def->resctrlmon_noalloc &&
+        virResctrlMonIsRunning(vm->def->resctrlmon_noalloc)) {
+        VIR_DEBUG("llc_occupancy: checking resctrl vcpu-rest");
+        if (virResctrlMonGetCacheOccupancy(
+                vm->def->resctrlmon_noalloc, &llc_occu) < 0)
+            goto cleanup;
+        llc_occu_total += llc_occu;
+    }
+
+    if (virTypedParamsAddInt(&record->params,
+                             &record->nparams,
+                             maxparams,
+                             DOMAIN_STATE_STR_RESCTRL
+                             ".cmt",
+                             llc_occu_total) < 0){
+        goto cleanup;
+    }
+
+    ret = 0;
+cleanup:
+    return ret;
+}
+

 typedef int
 (*qemuDomainGetStatsFunc)(virQEMUDriverPtr driver,
                           virDomainObjPtr dom,
@@ -20334,6 +20381,7 @@ static struct qemuDomainGetStatsWorker qemuDomainGetStatsWorkers[] = {
     { qemuDomainGetStatsInterface, VIR_DOMAIN_STATS_INTERFACE, false },
     { qemuDomainGetStatsBlock, VIR_DOMAIN_STATS_BLOCK, true },
     { qemuDomainGetStatsPerf, VIR_DOMAIN_STATS_PERF, false },
+    { qemuDomainGetStatsResctrl, VIR_DOMAIN_STATS_RESCTRL, false },
     { NULL, 0, false }
 };

diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c
index 8cbb3db..b08d977 100644
--- a/tools/virsh-domain-monitor.c
+++ b/tools/virsh-domain-monitor.c
@@ -1948,6 +1948,10 @@ static const vshCmdOptDef opts_domstats[] = {
      .type = VSH_OT_BOOL,
      .help = N_("report domain perf event statistics"),
     },
+    {.name = "resctrl",
+     .type = VSH_OT_BOOL,
+     .help = N_("report resctrlfs mon group information"),
+    },
     {.name = "list-active",
      .type = VSH_OT_BOOL,
      .help = N_("list only active domains"),
@@ -2057,6 +2061,9 @@ cmdDomstats(vshControl *ctl, const vshCmd *cmd)
     if (vshCommandOptBool(cmd, "perf"))
         stats |= VIR_DOMAIN_STATS_PERF;

+    if (vshCommandOptBool(cmd, "resctrl"))
+        stats |= VIR_DOMAIN_STATS_RESCTRL;
+
     if (vshCommandOptBool(cmd, "list-active"))
         flags |= VIR_CONNECT_GET_ALL_DOMAINS_STATS_ACTIVE;

-- 
2.7.4
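qemuDomainGetStatsResctrl() above relies on virResctrlMonGetCacheOccupancy(), whose body is not part of this excerpt. On the kernel side, a resctrl monitoring group exposes one mon_data/mon_L3_XX directory per L3 cache domain, each with its own llc_occupancy file, so a per-group figure is a sum over those directories. A minimal sketch under that assumption; sum_mon_counter() is a hypothetical helper, and it accumulates into unsigned long long, the type the API documentation above promises for "resctrl.cmt":

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>   /* mkdir(), handy for building a test tree */

/* Hypothetical helper, not a libvirt API: sum one monitoring counter
 * (e.g. "llc_occupancy") over every L3 cache domain of a resctrl
 * monitoring group.  The kernel exposes one mon_data/mon_L3_XX
 * directory per L3 domain.  Returns 0 on success, -1 on any error. */
static int
sum_mon_counter(const char *group, const char *counter,
                unsigned long long *total)
{
    char dirpath[4096];
    char filepath[8192];
    DIR *dir;
    struct dirent *ent;
    int ret = 0;

    *total = 0;
    snprintf(dirpath, sizeof(dirpath), "%s/mon_data", group);
    if (!(dir = opendir(dirpath)))
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        unsigned long long val;
        FILE *fp;

        /* skip ".", ".." and anything that is not an L3 domain */
        if (strncmp(ent->d_name, "mon_L3_", 7) != 0)
            continue;

        snprintf(filepath, sizeof(filepath), "%s/%s/%s",
                 dirpath, ent->d_name, counter);
        if (!(fp = fopen(filepath, "r"))) {
            ret = -1;
            break;
        }
        if (fscanf(fp, "%llu", &val) != 1)
            ret = -1;
        fclose(fp);
        if (ret < 0)
            break;

        *total += val;   /* 64-bit accumulator */
    }

    closedir(dir);
    return ret;
}
```

For example, if mon_L3_00/llc_occupancy contains 180224 and mon_L3_01/llc_occupancy contains 4096, the helper reports 184320 for the group.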

[It would be nice if you wrapped the long lines]

On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
This is an RFC request for supporting the CPU Cache Monitoring Technology (CMT) feature in libvirt. Since MBM is another feature closely related to CMT, for simplicity we only discuss CMT here; MBM is the follow-up that will be implemented after CMT. For details about CMT, please refer to section 17.18 of volume 3 of the Intel x86 SDM (link: https://software.intel.com/en-us/articles/intel-sdm).
Can you elaborate on how this is different from the CMT perf event that is already in libvirt and can be monitored through the domstats API?
https://libvirt.org/formatdomain.html#elementsPerf
## About '_virResctrlMon' interface
The cache allocation technology (CAT) has already been implemented in util/virresctrl.*, which interacts with the Linux kernel resctrl file system. Very similar to CAT, the CMT object is represented by 'struct _virResctrlMon':
```
struct _virResctrlMon {
    virObject parent;

    /* pairedalloc: pointer to the resctrl allocation it is paired with.
     * NULL for a resctrl monitoring group not associated with
     * any allocation. */
    virResctrlAllocPtr pairedalloc;
    /* The identifier (any unique string for now) */
    char *id;
    /* libvirt-generated path, may be identical to allocation path
     * may not if allocation is ready */
    char *path;
};
```
Following almost the same logic as '_virResctrlAlloc', which is mainly implemented in 'virresctrl.c', a group of APIs has been designed to manipulate '_virResctrlMon'. '_virResctrlMon' has a lot in common with '_virResctrlAlloc' except for the field 'pairedalloc', which stores a pointer to the paired resctrl allocation object. With the current libvirt resctrl implementation, if a resctrl '_virResctrlAlloc' object is created, the CMT hardware is enabled automatically and uses the same folder under the resctrl fs. I call a CMT '_virResctrlMon' object that shares its resctrl fs folder with an allocation a 'paired' _virResctrlMon; one '_virResctrlMon' and one '_virResctrlAlloc' form a pair. In '_virResctrlMon' the paired '_virResctrlAlloc' is tracked through 'pairedalloc'. A paired mon group cannot be dynamically enabled or disabled at runtime. 'pairedalloc' can be set to NULL, which creates a non-paired mon group object. This is necessary because CMT can work independently to monitor the utilization of critical CPU resources (cache or memory bandwidth) without allocating any dedicated cache or memory bandwidth. A non-paired mon group object represents an independently working CMT, and it can be enabled or disabled at runtime.
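The paired/non-paired distinction comes down to whether 'pairedalloc' is NULL. A minimal sketch with simplified, hypothetical stand-in types (not the real virResctrlAlloc/virResctrlMon definitions) that encodes the two rules described above: a paired group reuses its allocation's folder and cannot be toggled, while a non-paired group has its own folder and can be:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; only the field names follow the RFC. */
struct alloc {
    const char *path;            /* resctrl folder of the allocation */
};

struct mon {
    struct alloc *pairedalloc;   /* NULL: independent (non-paired) group */
    const char *id;
    const char *path;            /* own folder, used only when non-paired */
};

/* Disabling a paired group would mean removing the allocation's folder,
 * so only non-paired groups may be toggled at runtime. */
static bool
mon_can_toggle(const struct mon *m)
{
    return m->pairedalloc == NULL;
}

/* A paired group lives in its allocation's folder. */
static const char *
mon_effective_path(const struct mon *m)
{
    return m->pairedalloc ? m->pairedalloc->path : m->path;
}
```

The example paths one would pass in (for instance under /sys/fs/resctrl) are illustrative only; how libvirt actually names these folders is not shown in this thread.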
## About virsh command 'resctrl'
To set or get the resctrl mon group (hardware CMT), a virsh command 'resctrl' has been created. Here are the common usages:
The command does make sense for people who know how the stuff works on the inside or have seen the code in libvirt. For other users the name 'resctrl' is going to feel very arbitrary. We're trying to abstract the details away from users, so I don't see why it should be named 'resctrl' when it handles "RDT Monitoring Status".
```
[root@dl-c200 david]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm3                            running
 3     vm2                            running
 -     vm1                            shut off
```
### Test on a running domain vm3
To get RDT monitoring status, type 'virsh resctrl <domain>'
```
[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Enabled
```
To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
```
[root@dl-c200 david]# virsh resctrl vm3 --enable
RDT Monitoring Status: Enabled
```
To disable RDT monitoring, type 'virsh resctrl <domain> --disable'
```
[root@dl-c200 david]# virsh resctrl vm3 --disable
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Disabled
```
### Test on domain vm1, which is not running
If the domain is not active, setting the RDT monitoring status fails, and getting it reports 'Disabled'.
```
[root@dl-c200 david]# virsh resctrl vm1
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm1 --enable
error: Requested operation is not valid: domain is not running

[root@dl-c200 david]# virsh resctrl vm1 --disable
error: Requested operation is not valid: domain is not running
```
Can't these commands enable it in the XML? It would be nice if the XML part was shown here in the explanation.
### Test on domain vm2
Domain vm2 is active and CAT is enabled through 'cachetune' (configured in the 'cputune/cachetune' section), so its resctrl mon group is a 'paired' one. For a 'paired' mon group, RDT monitoring cannot be disabled: allowing that would require destroying the resctrl allocation folder, which the current cache allocation design does not support.
What if you have multiple cachetunes? What if the cachetune is only set for one vcpu and you want to monitor the others as well? I guess I have to see the patches to understand why you have so much information stored for something that looks like a boolean (enable/disable).
```
[root@dl-c200 libvirt]# virsh resctrl vm2 --enable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2 --disable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2
RDT Monitoring Status: Enabled (forced by cachetune)
```
## About showing the utilization information of RDT
A domstats field has been created to show the utilization of RDT resources; the command looks like this:
```
[root@dl-c200 libvirt]# virsh domstats --resctrl
Domain: 'vm1'
  resctrl.cmt=0

Domain: 'vm3'
  resctrl.cmt=180224

Domain: 'vm2'
  resctrl.cmt=2613248
```
Wang Huaqiang (3):
  util: add Intel x86 RDT/CMT support
  tools: virsh: add command for controlling/monitoring resctrl
  tools: virsh domstats: show RDT CMT resource utilization information
 include/libvirt/libvirt-domain.h    |  10 ++
 src/conf/domain_conf.c              |  28 ++++
 src/conf/domain_conf.h              |   3 +
 src/driver-hypervisor.h             |   8 +
 src/libvirt-domain.c                |  92 +++++++++++
 src/libvirt_private.syms            |   9 +
 src/libvirt_public.syms             |   6 +
 src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 +++++++-
 src/remote/remote_daemon_dispatch.c |  45 +++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 +++-
 src/remote_protocol-structs         |  12 ++
 src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h               |  44 +++++
 tools/virsh-domain-monitor.c        |   7 +
 tools/virsh-domain.c                |  74 +++++++++
 17 files changed, 933 insertions(+), 5 deletions(-)
-- 2.7.4
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Hi Martin,

Thanks for your comments; please see my updates inline below.
-----Original Message-----
From: Martin Kletzander [mailto:mkletzan@redhat.com]
Sent: Monday, June 11, 2018 4:30 PM
To: Wang, Huaqiang <huaqiang.wang@intel.com>
Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com>
Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support
[It would be nice if you wrapped the long lines]
I'll pay attention to these long lines. Thanks for the advice.
On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
This is an RFC request for supporting the CPU Cache Monitoring Technology (CMT) feature in libvirt. Since MBM is another feature closely related to CMT, for simplicity we only discuss CMT here; MBM is the follow-up that will be implemented after CMT. For details about CMT, please refer to section 17.18 of volume 3 of the Intel x86 SDM (link: https://software.intel.com/en-us/articles/intel-sdm).
Can you elaborate on how this is different from the CMT perf event that is already in libvirt and can be monitored through the domstats API?
Due to the removal of the kernel interface for the perf events 'cmt', 'mbmt' and 'mbml', these libvirt perf events no longer work with the latest kernel. Please see the following link for details: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/comm...,
This series tries to provide similar functionality to replace the missing part, reporting cmt, mbmt and mbml information. At first we only focus on cmt. Compared with the 'CMT perf event already in libvirt', I am trying to produce almost the same output as 'perf.cmt' in the output of 'domstats', but under another name, such as 'resctrl.cmt' or 'rdt.cmt' (or something else). Another difference is that the underlying implementation is done through the kernel resctrl fs.
This series also attempts to provide a command interface for enabling and disabling the cmt feature for the whole domain, just as the original perf event based cmt can be enabled or disabled by specifying '--enable cmt' or '--disable cmt' when invoking 'virsh perf <domain>'. Our version looks like 'virsh resctrl <domain> --enable', without the 'cmt' suffix. The 'cmt' is omitted because the CMT and MBM functions are both enabled whenever a valid resctrl fs sub-folder is created; there is no way to disable one while enabling the other, such as enabling CMT while disabling MBML at the same time.
This series tries to stick to the interfaces exposed by perf event based CMT/MBM and to provide a substitute interface for them; for example, the perf based CMT only provides cache occupancy information for the whole domain. We are also thinking about providing cache occupancy information per vcpu group, where the groups may be specified in the XML file.
For example, if we have the following configuration:

  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='3-4'/>
    <vcpupin vcpu='2' cpuset='4-5'/>
    <vcpupin vcpu='3' cpuset='6-7'/>
    <cachetune vcpus='0'>
      <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
      <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
    </cachetune>
    <cachetune vcpus='1-2'>
      <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
      <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
    </cachetune>
    <rdt-monitoring vcpu='0' enable='yes'/>
    <rdt-monitoring vcpu='1-2' enable='yes'/>
    <rdt-monitoring vcpu='3' enable='yes'/>
  </cputune>

'domstats' will output the following information regarding cmt:

  [root@dl-c200 libvirt]# virsh domstats vm1 --resctrl
  Domain: 'vm1'
    rdt.cmt.total=645562
    rdt.cmt.vcpu0=104331
    rdt.cmt.vcpu1_2=203200
    rdt.cmt.vcpu3=340129

These updates address your question about how this differs from the CMT perf event that is already in libvirt; any input is welcome.
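The proposed per-group fields could be produced by deriving a field name from each group's vcpus attribute and reporting the total as the sum of the groups. A sketch, assuming '-' in the vcpu list maps to '_' (as in 'rdt.cmt.vcpu1_2' above) and, as an extra assumption not stated in the proposal, that ',' is mapped the same way; both helpers are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "1-2" -> "rdt.cmt.vcpu1_2".
 * Returns 0 on success, -1 if the name does not fit in buf. */
static int
cmt_field_name(char *buf, size_t buflen, const char *vcpus)
{
    int n = snprintf(buf, buflen, "rdt.cmt.vcpu%s", vcpus);
    char *p;

    if (n < 0 || (size_t)n >= buflen)
        return -1;

    /* keep the vcpu list a single token: '-' (and, by assumption, ',')
     * become '_' */
    for (p = buf + strlen("rdt.cmt.vcpu"); *p; p++) {
        if (*p == '-' || *p == ',')
            *p = '_';
    }
    return 0;
}

/* rdt.cmt.total as the sum of the per-group occupancies */
static unsigned long long
cmt_total(const unsigned long long *groups, size_t ngroups)
{
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < ngroups; i++)
        total += groups[i];
    return total;
}
```

Computing the total from the same group readings keeps 'rdt.cmt.total' consistent with the per-group fields reported next to it.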
https://libvirt.org/formatdomain.html#elementsPerf
## About '_virResctrlMon' interface
The cache allocation technology (CAT) has already been implemented in util/virresctrl.*, which interacts with the Linux kernel resctrl file system. Very similar to CAT, the CMT object is represented by 'struct _virResctrlMon':
```
struct _virResctrlMon {
    virObject parent;

    /* pairedalloc: pointer to the resctrl allocation it is paired with.
     * NULL for a resctrl monitoring group not associated with
     * any allocation. */
    virResctrlAllocPtr pairedalloc;
    /* The identifier (any unique string for now) */
    char *id;
    /* libvirt-generated path, may be identical to allocation path
     * may not if allocation is ready */
    char *path;
};
```
Following almost the same logic as '_virResctrlAlloc', which is mainly implemented in 'virresctrl.c', a group of APIs has been designed to manipulate '_virResctrlMon'. '_virResctrlMon' has a lot in common with '_virResctrlAlloc' except for the field 'pairedalloc', which stores a pointer to the paired resctrl allocation object. With the current libvirt resctrl implementation, if a resctrl '_virResctrlAlloc' object is created, the CMT hardware is enabled automatically and uses the same folder under the resctrl fs. I call a CMT '_virResctrlMon' object that shares its resctrl fs folder with an allocation a 'paired' _virResctrlMon; one '_virResctrlMon' and one '_virResctrlAlloc' form a pair. In '_virResctrlMon' the paired '_virResctrlAlloc' is tracked through 'pairedalloc'. A paired mon group cannot be dynamically enabled or disabled at runtime. 'pairedalloc' can be set to NULL, which creates a non-paired mon group object. This is necessary because CMT can work independently to monitor the utilization of critical CPU resources (cache or memory bandwidth) without allocating any dedicated cache or memory bandwidth. A non-paired mon group object represents an independently working CMT, and it can be enabled or disabled at runtime.
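At the resctrl file system level, a non-paired monitoring group is a directory under 'mon_groups', and a task joins it when its PID is written to the group's 'tasks' file (per the kernel's resctrl documentation). A minimal sketch of those two steps; mon_group_create() and mon_group_add_pid() are hypothetical helpers, not the APIs from this series, and the resctrl root (normally /sys/fs/resctrl) is a parameter so the code can also run against an ordinary directory:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create <root>/mon_groups/<name>.  An existing
 * group is treated as success. */
static int
mon_group_create(const char *root, const char *name)
{
    char path[4096];

    snprintf(path, sizeof(path), "%s/mon_groups/%s", root, name);
    /* On a real resctrl mount the kernel populates the new directory
     * (tasks, mon_data/, ...) as soon as it is created. */
    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return -1;
    return 0;
}

/* Hypothetical helper: move one task into the group. */
static int
mon_group_add_pid(const char *root, const char *name, pid_t pid)
{
    char path[4096];
    FILE *fp;
    int ret = 0;

    snprintf(path, sizeof(path), "%s/mon_groups/%s/tasks", root, name);
    /* Same as 'echo PID > tasks': one task ID per write.  On resctrl
     * this moves the task into the group instead of replacing a list. */
    if (!(fp = fopen(path, "w")))
        return -1;
    if (fprintf(fp, "%lld\n", (long long)pid) < 0)
        ret = -1;
    if (fclose(fp) != 0)
        ret = -1;
    return ret;
}
```

Disabling such a group would correspond to removing its directory, which is only possible because no allocation shares it.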
## About virsh command 'resctrl'
To set or get the resctrl mon group (hardware CMT), a virsh command 'resctrl' has been created. Here are the common usages:
The command does make sense for people who know how the stuff works on the inside or have seen the code in libvirt. For other users the name 'resctrl' is going to feel very arbitrary. We're trying to abstract the details away from users, so I don't see why it should be named 'resctrl' when it handles "RDT Monitoring Status".
Agreed. 'resctrl' does cause a lot of confusion for end users. The underlying kernel interface combines the CMT and MBM features: the files 'llc_occupancy', 'mbm_local_bytes' and 'mbm_total_bytes', which represent cache occupancy, local memory bandwidth and total memory bandwidth respectively, are created automatically and simultaneously for each resctrl group, and there is no way to enable one and disable another. So for a command that affects both cache and memory bandwidth, I would like to use 'rdt' as the key command word, since both cache monitoring (CMT) and memory bandwidth monitoring (MBM) belong to RDT monitoring. To replace the confusing word 'resctrl', I'd like to use 'rdtmon' as the command name, so 'virsh resctrl <domain>' would become 'virsh rdtmon <domain>'. Any suggestions from the community are welcome.
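Because the kernel creates all three files together in each mon_data/mon_L3_XX directory of a group, CMT and MBM data for a group always come as a set, which is why a single enable/disable switch covers both. A sketch of that layout as a path builder; mon_file_path() and the mon_files table are hypothetical, while the directory and file names are the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The three monitoring files present in every mon_data/mon_L3_XX
 * directory of a resctrl group. */
static const char *const mon_files[] = {
    "llc_occupancy",     /* CMT: L3 cache occupancy in bytes */
    "mbm_total_bytes",   /* MBM: total memory bandwidth byte counter */
    "mbm_local_bytes",   /* MBM: local memory bandwidth byte counter */
};

/* Hypothetical helper: path of counter 'idx' for L3 cache domain
 * 'domain' of the group at 'group'.  Returns 0, or -1 on error. */
static int
mon_file_path(char *buf, size_t buflen, const char *group,
              unsigned int domain, size_t idx)
{
    int n;

    if (idx >= sizeof(mon_files) / sizeof(mon_files[0]))
        return -1;

    n = snprintf(buf, buflen, "%s/mon_data/mon_L3_%02u/%s",
                 group, domain, mon_files[idx]);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For a group at /sys/fs/resctrl/g1, domain 1 and index 0 give /sys/fs/resctrl/g1/mon_data/mon_L3_01/llc_occupancy.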
```
[root@dl-c200 david]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm3                            running
 3     vm2                            running
 -     vm1                            shut off
```
### Test on a running domain vm3
To get RDT monitoring status, type 'virsh resctrl <domain>'
```
[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Enabled
```
To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
```
[root@dl-c200 david]# virsh resctrl vm3 --enable
RDT Monitoring Status: Enabled
```
To disable RDT monitoring, type 'virsh resctrl <domain> --disable'
```
[root@dl-c200 david]# virsh resctrl vm3 --disable
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Disabled
```
### Test on domain vm1, which is not running
If the domain is not active, setting the RDT monitoring status fails, and getting it reports 'Disabled'.
```
[root@dl-c200 david]# virsh resctrl vm1
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm1 --enable
error: Requested operation is not valid: domain is not running

[root@dl-c200 david]# virsh resctrl vm1 --disable
error: Requested operation is not valid: domain is not running
```
Can't these commands enable it in the XML? It would be nice if the XML part was shown here in the explanation.
In the POC code of the first version there are no XML changes, and monitoring cannot be enabled or disabled through the XML file. Let's discuss adding this function. How about this configuration:

  <cputune>
    <cachetune vcpus='1-2'>
      <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
      <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
    </cachetune>
    <rdt-monitoring vcpu='0' enable='no'/>
    <rdt-monitoring vcpu='1-2' enable='yes'/>
    <rdt-monitoring vcpu='3' enable='yes'/>
  </cputune>

With the above setting:
- Two RDT monitoring groups will be created when the VM is launched.
- <rdt-monitoring vcpu='1-2' enable='yes'> is created automatically because of the <cachetune> setting. Under the resctrl fs, resctrl allocations and RDT monitoring groups are represented as sub-folders, and we cannot put one process into two sub-folders, so a resctrl allocation creates an RDT monitoring group as well. This RDT monitoring group cannot be disabled at runtime because there is no way to disable a resctrl allocation (CAT) at runtime.
- <rdt-monitoring vcpu='3' enable='yes'> creates another RDT monitoring group, enabled by default, and the task id (the pid associated with vcpu3) will be put into its 'tasks' file. RDT monitoring of vcpu 3 can be enabled or disabled at runtime through a command such as 'virsh rdtmon --enable vcpu3'. The MBM feature is also enabled or disabled by this command.
- <rdt-monitoring vcpu='0' enable='no'> specifies the default RDT monitoring state for vcpu0 of the domain, which is disabled after launch and can be changed at runtime.
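The start-up behaviour listed above can be modelled in a few lines. This is a sketch of the proposal, not existing libvirt code: it assumes an <rdt-monitoring> group is forced on exactly when its vcpu set equals the vcpu set of some <cachetune> (partial overlaps are not covered by the proposal and are ignored here), and it uses 64-bit masks for vcpu lists; all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one <rdt-monitoring> element (bit N = vcpu N). */
struct rdtmon_def {
    uint64_t vcpus;
    bool enable;                /* enable='yes' */
};

/* Assumption: a group whose vcpus equal a <cachetune>'s vcpus shares the
 * allocation's folder, so it is always on and cannot be toggled. */
static bool
rdtmon_forced(const struct rdtmon_def *def,
              const uint64_t *cachetunes, size_t ncachetunes)
{
    size_t i;

    for (i = 0; i < ncachetunes; i++) {
        if (cachetunes[i] == def->vcpus)
            return true;
    }
    return false;
}

/* Groups present right after start: enabled ones plus forced ones. */
static size_t
rdtmon_groups_at_start(const struct rdtmon_def *defs, size_t ndefs,
                       const uint64_t *cachetunes, size_t ncachetunes)
{
    size_t i;
    size_t n = 0;

    for (i = 0; i < ndefs; i++) {
        if (defs[i].enable || rdtmon_forced(&defs[i], cachetunes, ncachetunes))
            n++;
    }
    return n;
}
```

With the configuration above (cachetune on vcpus 1-2; monitoring disabled for vcpu 0, enabled for 1-2 and 3) this yields two groups at start-up, and only the 1-2 group is forced.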
### Test on domain vm2
Domain vm2 is active and CAT is enabled through 'cachetune' (configured in the 'cputune/cachetune' section), so its resctrl mon group is a 'paired' one. For a 'paired' mon group, RDT monitoring cannot be disabled: allowing that would require destroying the resctrl allocation folder, which the current cache allocation design does not support.
What if you have multiple cachetunes? What if the cachetune is only set for one vcpu and you want to monitor the others as well? I guess I have to see the patches to understand why you have so much information stored for something that looks like a boolean (enable/disable).
At the time I raised this RFC, there was no design for reporting RDT monitoring information at the granularity of a cachetune; only cache/memory bandwidth information for the whole domain was reported. Now I'd like to discuss the design listed above, which reports RDT monitoring information based on the rdt-monitoring (cachetune) group settings. Your comments are needed.
```
[root@dl-c200 libvirt]# virsh resctrl vm2 --enable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2 --disable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2
RDT Monitoring Status: Enabled (forced by cachetune)
```
## About showing the utilization information of RDT
A domstats field has been created to show the utilization of RDT resources, the command is like this:
```
[root@dl-c200 libvirt]# virsh domstats --resctrl
Domain: 'vm1'
  resctrl.cmt=0

Domain: 'vm3'
  resctrl.cmt=180224

Domain: 'vm2'
  resctrl.cmt=2613248
```
Wang Huaqiang (3):
  util: add Intel x86 RDT/CMT support
  tools: virsh: add command for controlling/monitoring resctrl
  tools: virsh domstats: show RDT CMT resource utilization information
 include/libvirt/libvirt-domain.h    |  10 ++
 src/conf/domain_conf.c              |  28 ++++
 src/conf/domain_conf.h              |   3 +
 src/driver-hypervisor.h             |   8 +
 src/libvirt-domain.c                |  92 +++++++++++
 src/libvirt_private.syms            |   9 +
 src/libvirt_public.syms             |   6 +
 src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 +++++++-
 src/remote/remote_daemon_dispatch.c |  45 +++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 +++-
 src/remote_protocol-structs         |  12 ++
 src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h               |  44 +++++
 tools/virsh-domain-monitor.c        |   7 +
 tools/virsh-domain.c                |  74 +++++++++
 17 files changed, 933 insertions(+), 5 deletions(-)

On Tue, Jun 12, 2018 at 10:11:30AM +0000, Wang, Huaqiang wrote:
Hi Martin,
Thanks for your comments; please see my updates inline below.
-----Original Message-----
From: Martin Kletzander [mailto:mkletzan@redhat.com]
Sent: Monday, June 11, 2018 4:30 PM
To: Wang, Huaqiang <huaqiang.wang@intel.com>
Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com>
Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support
[It would be nice if you wrapped the long lines]
I'll pay attention to these long lines. Thanks for the advice.
No need to; most email clients can do that automatically. Doing stuff like this manually is very unproductive :).
On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
This is an RFC request for supporting the CPU Cache Monitoring Technology (CMT) feature in libvirt. Since MBM is another feature closely related to CMT, for simplicity we only discuss CMT here; MBM is the follow-up that will be implemented after CMT. For details about CMT, please refer to section 17.18 of volume 3 of the Intel x86 SDM (link: https://software.intel.com/en-us/articles/intel-sdm).
Can you elaborate on how this is different from the CMT perf event that is already in libvirt and can be monitored through the domstats API?
Because the kernel removed the perf events 'cmt', 'mbmt' and 'mbml', libvirt's perf-based support for them no longer works with the latest kernels. See the following link for details: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/comm...
This series tries to provide similar functionality to replace the missing cmt, mbmt and mbml reporting; for now we only focus on cmt. Compared with the CMT perf event already in libvirt, I am trying to produce almost the same output as 'perf.cmt' in 'domstats', but under another name, such as 'resctrl.cmt' or 'rdt.cmt' (or something else). The other difference is that the underlying implementation goes through the kernel resctrl fs.
This series also attempts to provide a command interface for enabling and disabling the cmt feature for the whole domain, just as the original perf event based cmt can be enabled or disabled by specifying '--enable cmt' or '--disable cmt' when invoking 'virsh perf <domain>'. Our version looks like 'virsh resctrl <domain> --enable', without the 'cmt' suffix. The 'cmt' is omitted because the CMT and MBM functions are both enabled whenever a valid resctrl fs sub-folder is created; there is no way to disable one while enabling the other, such as enabling CMT while disabling MBML.
This series tries to stick to the interfaces exposed by the perf event based CMT/MBM and to provide a substitute for them. The perf based CMT, for example, only provides cache occupancy information for the whole domain. We are also thinking about providing cache occupancy information for groups of vcpus, which may be specified in the XML file. For example, with the following configuration:

<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3-4'/>
  <vcpupin vcpu='2' cpuset='4-5'/>
  <vcpupin vcpu='3' cpuset='6-7'/>
  <cachetune vcpus='0'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <rdt-monitoring vcpu='0' enable='yes'/>
  <rdt-monitoring vcpu='1-2' enable='yes'/>
  <rdt-monitoring vcpu='3' enable='yes'/>
</cputune>

'domstats' would output the following cmt information:

[root@dl-c200 libvirt]# virsh domstats vm1 --resctrl
Domain: 'vm1'
  rdt.cmt.total=645562
  rdt.cmt.vcpu0=104331
  rdt.cmt.vcpu1_2=203200
  rdt.cmt.vcpu3=340129
Beware that '1-4' is different from '1,4', so you need to differentiate between them. Or, to make it easier to parse for consumers of that API, just list each vcpu on its own line (but then you need to say which ones are counted together). Or group them:

  rdt.cmt.total=645562
  rdt.cmt.group0.value=104331
  rdt.cmt.group0.vcpus=0
  rdt.cmt.group1.value=203200
  rdt.cmt.group1.vcpus=1-2
  rdt.cmt.group2.value=340129
  rdt.cmt.group2.vcpus=3

Honestly, I don't care that much how it is going to look, but it needs to be easy to parse and understand.
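To make the '1-4' versus '1,4' concern concrete, here is a minimal C sketch that emits each group as a `vcpus` key plus a `value` key, with the vcpu set formatted as a canonical range list. The function names and the `rdt.cmt.groupN.*` key layout are hypothetical, taken from the proposal in this thread rather than from any merged libvirt API; inside libvirt, the existing `virBitmapFormat()` helper would presumably do the range formatting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a vcpu membership array as a compact
 * range list, e.g. {1,1,0,1,1,1} -> "0-1,3-5", so that "1-4" and
 * "1,4" can never be confused. Returns 0, or -1 if buf is too small. */
static int
format_vcpu_list(const bool *vcpus, size_t nvcpus, char *buf, size_t buflen)
{
    size_t used = 0;
    size_t i = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    while (i < nvcpus) {
        size_t start;
        int n;

        if (!vcpus[i]) {
            i++;
            continue;
        }

        /* Extend the run of consecutive set vcpus starting at i. */
        start = i;
        while (i + 1 < nvcpus && vcpus[i + 1])
            i++;

        if (start == i)
            n = snprintf(buf + used, buflen - used, "%s%zu",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, buflen - used, "%s%zu-%zu",
                         used ? "," : "", start, i);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
        i++;
    }
    return 0;
}

/* Hypothetical: emit one monitoring group as two keys, so consumers
 * never have to decode vcpu ranges from a key name. */
static int
print_cmt_group(FILE *out, size_t idx, const bool *vcpus, size_t nvcpus,
                unsigned long long value)
{
    char list[128];

    if (format_vcpu_list(vcpus, nvcpus, list, sizeof(list)) < 0)
        return -1;
    fprintf(out, "rdt.cmt.group%zu.vcpus=%s\n", idx, list);
    fprintf(out, "rdt.cmt.group%zu.value=%llu\n", idx, value);
    return 0;
}
```

With this layout a group covering vcpus 1 and 2 is printed as `rdt.cmt.group1.vcpus=1-2` followed by its value, and a consumer only needs to split the list on ',' and '-'.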
These updates answer your question, "Can you elaborate on how this is different from the CMT perf event that is already in libvirt and can be monitored through the domstats API?" Any input is welcome.
Great, this information (or rather a brief summary) should be part of the patch series. Not necessarily the commit messages (some of the things would fit there), but at least the cover letter. Otherwise you might get the same question next time and will have to provide the same answer to the next reviewer and so on.
https://libvirt.org/formatdomain.html#elementsPerf
## About '_virResctrlMon' interface
Cache Allocation Technology (CAT) has already been implemented in util/virresctrl.*, which interacts with the Linux kernel resctrl file system. Very similar to CAT, the CMT object is represented by 'struct _virResctrlMon':
```
struct _virResctrlMon {
    virObject parent;

    /* pairedalloc: pointer to the resctrl allocation it is paired with.
     * NULL for a resctrl monitoring group not associated with
     * any allocation. */
    virResctrlAllocPtr pairedalloc;

    /* The identifier (any unique string for now) */
    char *id;

    /* libvirt-generated path; may be identical to the allocation path,
     * may not be if allocation is ready */
    char *path;
};
```
Following almost the same logic as '_virResctrlAlloc', which is mainly implemented in 'virresctrl.c', a group of APIs has been designed to manipulate '_virResctrlMon'. '_virResctrlMon' has a lot in common with '_virResctrlAlloc' except for the field 'pairedalloc', which stores a pointer to the paired resctrl allocation object. With the current libvirt resctrl implementation, when a '_virResctrlAlloc' object is created, the CMT hardware is enabled automatically and shares the same folder under the resctrl fs. I call a '_virResctrlMon' object that shares its resctrl fs folder with an allocation a 'paired' _virResctrlMon; such a '_virResctrlMon' and '_virResctrlAlloc' form a pair, and the paired '_virResctrlAlloc' is tracked through 'pairedalloc'. A paired mon group cannot be dynamically enabled or disabled at runtime. 'pairedalloc' can also be set to NULL, which creates a non-paired mon group object. This is necessary because CMT can work independently to monitor the utilization of critical CPU resources (cache or memory bandwidth) without allocating any dedicated cache or memory bandwidth. A non-paired mon group object represents an independently working CMT, and it can be enabled or disabled at runtime.
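The paired/non-paired distinction can be sketched as follows, assuming the kernel resctrl layout in which every control group directory also exposes monitoring data, while independent monitoring groups are created under `mon_groups/`. `struct alloc_sketch`, `struct mon_sketch` and the helper functions are hypothetical stand-ins, not the `_virResctrlMon` code from the patches, and the group names used are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a resctrl allocation: only its directory. */
struct alloc_sketch {
    const char *path;            /* e.g. "/sys/fs/resctrl/<alloc id>" */
};

/* Hypothetical stand-in for _virResctrlMon. */
struct mon_sketch {
    const struct alloc_sketch *pairedalloc;  /* NULL => non-paired */
    const char *id;
};

/* A paired mon group reuses the allocation's directory, because the
 * kernel already exposes mon_data/ there. A non-paired group gets its
 * own directory under <root>/mon_groups/. Returns 0, or -1 if the
 * path does not fit into buf. */
static int
mon_sketch_path(const struct mon_sketch *mon, const char *root,
                char *buf, size_t buflen)
{
    int n;

    if (mon->pairedalloc)
        n = snprintf(buf, buflen, "%s", mon->pairedalloc->path);
    else
        n = snprintf(buf, buflen, "%s/mon_groups/%s", root, mon->id);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Only non-paired groups may be toggled at runtime; removing a paired
 * group would mean removing the allocation directory itself. */
static int
mon_sketch_can_toggle(const struct mon_sketch *mon)
{
    return mon->pairedalloc == NULL;
}
```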
## About virsh command 'resctrl'
To set or get the status of the resctrl mon group (hardware CMT), a virsh command, 'resctrl', has been created. Here are the common usages:
The command does make sense for people who know how the stuff works on the inside or have seen the code in libvirt. For other users the name 'resctrl' is going to feel very much arbitrary. We're trying to abstract the details for users, so I don't see why it should be named 'resctrl' when it handles "RDT Monitoring Status".
Agreed. 'resctrl' does cause a lot of confusion for end users. The underlying kernel interface combines the CMT and MBM features: the files 'llc_occupancy', 'mbm_local_bytes' and 'mbm_total_bytes', which report cache occupancy, local memory bandwidth and total memory bandwidth respectively, are created automatically and simultaneously for each resctrl group, and there is no way to enable one and disable another. So for a command that affects both cache and memory bandwidth, I would like to use 'rdt' as the key word. Both cache monitoring (CMT) and memory bandwidth monitoring (MBM) belong to the scope of RDT monitoring. To replace the confusing word 'resctrl', I'd like to use 'rdtmon' as the command name, so 'virsh resctrl <domain>' would become 'virsh rdtmon <domain>'. Any suggestions from the community are welcome.
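As a rough illustration of where these numbers come from, here is a C sketch that sums one counter over all L3 cache domains of a resctrl group. It assumes the kernel layout `<group>/mon_data/mon_L3_XX/<counter>`, where each counter file holds a byte count; `resctrl_sum_counter()` is a hypothetical helper, not a libvirt function. Because the three counters always exist together, the same reader serves CMT and MBM alike.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Sum one monitoring counter ("llc_occupancy", "mbm_total_bytes" or
 * "mbm_local_bytes") across all L3 domains of a resctrl group.
 * Assumes the layout <group>/mon_data/mon_L3_XX/<counter>.
 * Returns 0 and stores the total in *total, or -1 on any error. */
static int
resctrl_sum_counter(const char *group_dir, const char *counter,
                    unsigned long long *total)
{
    char path[4096];
    DIR *dir;
    struct dirent *ent;
    int found = 0;
    int ret = 0;

    snprintf(path, sizeof(path), "%s/mon_data", group_dir);
    if (!(dir = opendir(path)))
        return -1;

    *total = 0;
    while ((ent = readdir(dir))) {
        FILE *fp;
        unsigned long long val;

        /* Skip ".", ".." and anything that is not an L3 domain. */
        if (strncmp(ent->d_name, "mon_L3_", 7) != 0)
            continue;

        snprintf(path, sizeof(path), "%s/mon_data/%s/%s",
                 group_dir, ent->d_name, counter);
        if (!(fp = fopen(path, "r"))) {
            ret = -1;
            break;
        }
        if (fscanf(fp, "%llu", &val) != 1)
            ret = -1;        /* non-numeric content, e.g. unavailable */
        fclose(fp);
        if (ret < 0)
            break;

        *total += val;
        found++;
    }
    closedir(dir);

    return (ret == 0 && found > 0) ? 0 : -1;
}
```

A counter that cannot be parsed as a number (for example if the kernel reports it as unavailable) makes the whole read fail; a real implementation would probably want to report that case separately.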
Libvirt tries to abstract various vendor-specific things. For example AMD's SEV is abstracted under the name `launch-security` IIRC so that if there are more in the future not all the code needs to be duplicated. In the same sense Intel's RDT could be named in a more generic sense. Resource Control and Monitoring seems to reflect what it does, but it's kind of a mouthful. Maybe others will have better ideas. I'm bad at naming.
```
[root@dl-c200 david]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm3                            running
 3     vm2                            running
 -     vm1                            shut off
```
### Test on a running domain vm3
To get the RDT monitoring status, type 'virsh resctrl <domain>':
```
[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Enabled
```
To enable RDT monitoring, type 'virsh resctrl <domain> --enable':
```
[root@dl-c200 david]# virsh resctrl vm3 --enable
RDT Monitoring Status: Enabled
```
To disable RDT monitoring, type 'virsh resctrl <domain> --disable':
```
[root@dl-c200 david]# virsh resctrl vm3 --disable
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Disabled
```
### Test on domain vm1, which is not running
If the domain is not active, setting the RDT monitoring status fails, and getting it reports 'Disabled'.
```
[root@dl-c200 david]# virsh resctrl vm1
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm1 --enable
error: Requested operation is not valid: domain is not running

[root@dl-c200 david]# virsh resctrl vm1 --disable
error: Requested operation is not valid: domain is not running
```
Can't these commands enable it in the XML? It would be nice if the XML part was shown here in the explanation.
The first version of the POC code has no XML changes, so monitoring cannot be enabled or disabled through the XML file.
Let's discuss adding this function. How about this configuration:

<cputune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <rdt-monitoring vcpu='0' enable='no'/>
  <rdt-monitoring vcpu='1-2' enable='yes'/>
  <rdt-monitoring vcpu='3' enable='yes'/>
</cputune>
Just so we are on the same page: it doesn't have to have an option to be enabled/disabled in the XML. However, you probably still need to keep the state of that information somewhere across libvirtd restarts. Maybe you already do; I haven't gone through the code.
With the above settings:
- Two RDT monitoring groups will be created when the VM is launched.
- <rdt-monitoring vcpu='1-2' enable='yes'/> is created automatically because of the <cachetune> setting. Under the resctrl fs, resctrl allocations and RDT monitoring groups are presented as sub-folders, and one process cannot be placed in two resctrl fs sub-folders, so a resctrl allocation creates an RDT monitoring group as well. This RDT monitoring group cannot be disabled at runtime because there is no way to disable a resctrl allocation (CAT) at runtime.
- <rdt-monitoring vcpu='3' enable='yes'/> creates another, enabled-by-default RDT monitoring group, and the task id (the pid associated with vcpu3) will be put into its 'tasks' file. RDT monitoring of vcpu 3 can be enabled or disabled at runtime through a command such as 'virsh rdtmon --enable vcpu3'. The MBM feature is enabled or disabled by the same command.
- <rdt-monitoring vcpu='0' enable='no'/> specifies the default monitoring state for vcpu0 of the domain, which is disabled after launch and can be changed at runtime.
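The runtime enable/disable of a non-paired group maps naturally onto directory operations in the resctrl fs. The following C sketch assumes a resctrl mount where creating a directory under `mon_groups/` creates a monitoring group, writing one thread id at a time to its `tasks` file moves that thread into the group, and removing the directory returns its tasks to the parent group. `mon_group_create()` and `mon_group_destroy()` are hypothetical names, and error handling is kept minimal.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical "enable": create a non-paired monitoring group and move
 * the given vcpu thread ids into it, one id per write to "tasks". */
static int
mon_group_create(const char *mon_groups_dir, const char *name,
                 const pid_t *tids, size_t ntids)
{
    char path[4096];
    size_t i;

    snprintf(path, sizeof(path), "%s/%s", mon_groups_dir, name);
    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return -1;

    snprintf(path, sizeof(path), "%s/%s/tasks", mon_groups_dir, name);
    for (i = 0; i < ntids; i++) {
        FILE *fp = fopen(path, "w");

        if (!fp)
            return -1;
        fprintf(fp, "%lld\n", (long long)tids[i]);
        /* The buffered write is flushed here, so an id rejected by the
         * kernel shows up as an fclose() failure. */
        if (fclose(fp) != 0)
            return -1;
    }
    return 0;
}

/* Hypothetical "disable": remove the group directory. */
static int
mon_group_destroy(const char *mon_groups_dir, const char *name)
{
    char path[4096];

    snprintf(path, sizeof(path), "%s/%s", mon_groups_dir, name);
    return rmdir(path);
}
```

Paired groups never go through these helpers: their directory belongs to the allocation, which is why they stay enabled for the lifetime of the cachetune.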
There are many places where stuff can be created. I started going down the rabbit hole again (like last time when I was implementing CAT) and again, the kernel interface is horrible. Inconsistent naming, poor documentation (or maybe I'm just a very bad reader). I hope someone will join this review because I can't sensibly map the kernel interface to whatever libvirt might do/expose. I already wasted so much time on CAT and I don't want to go back to that again. Let's not do any XML changes unless we find out they are actually needed.
### Test on domain vm2
Domain vm2 is active and CAT is enabled through 'cachetune' (configured in the 'cputune/cachetune' section), so its resctrl mon group is a 'paired' one. For a paired mon group, RDT monitoring cannot be disabled: allowing that would require destroying the resctrl allocation folder, which the current cache allocation design does not support.
What if you have multiple cachetunes? What if the cachetune is only set for one vcpu and you want to monitor the others as well? I guess I have to see the patches to understand why you have so much information stored for something that looks like a boolean (enable/disable).
At the time I raised this RFC, there was no design for reporting RDT monitoring information at cachetune granularity; only cache/memory bandwidth information for the whole domain was reported. Now I'd like to discuss the design listed above, which reports RDT monitoring information based on the rdt-monitoring (cachetune) group settings. I'd appreciate your comments.
I just wanted to know what the preferred approach is: whether we're creating mon_groups/domain_name_vcpus_X/ or just a new resctrl group (there is not much of a difference in that). Does it take hot-(un)plug of vcpus into consideration? How about emulator threads and iothreads? I know libvirt doesn't support them yet for CAT, but that'd be a good way to start adding features to libvirt IMHO. Or live changes to cachetunes. If we have that, then maybe the addition of monitoring will make more sense and it will fit more nicely (since we'll have a more complete picture).
```
[root@dl-c200 libvirt]# virsh resctrl vm2 --enable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2 --disable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2
RDT Monitoring Status: Enabled (forced by cachetune)
```
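The three behaviors exercised in these tests (an inactive domain, a paired group forced on by cachetune, and a freely toggled non-paired group) amount to a small state machine. Below is a hedged C sketch that reproduces those status lines; the enum, struct and `mon_apply()` are hypothetical and only model the observable behavior, not the code in the patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the requests accepted by the proposed command. */
enum mon_request { MON_QUERY, MON_ENABLE, MON_DISABLE };

struct mon_state {
    bool domain_active;
    bool paired;       /* group shares its directory with a cachetune */
    bool enabled;      /* only meaningful for non-paired groups */
};

/* Apply a request and return the status line, or NULL when the request
 * is invalid (changing the state of an inactive domain). */
static const char *
mon_apply(struct mon_state *st, enum mon_request req)
{
    if (!st->domain_active) {
        if (req != MON_QUERY)
            return NULL;   /* "Requested operation is not valid" */
        return "RDT Monitoring Status: Disabled";
    }

    /* A paired group cannot be disabled without dropping the CAT
     * allocation, so every request reports it as forced on. */
    if (st->paired)
        return "RDT Monitoring Status: Enabled (forced by cachetune)";

    if (req == MON_ENABLE)
        st->enabled = true;
    else if (req == MON_DISABLE)
        st->enabled = false;

    return st->enabled ? "RDT Monitoring Status: Enabled"
                       : "RDT Monitoring Status: Disabled";
}
```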
## About showing the utilization information of RDT
A domstats field has been created to show the utilization of RDT resources. The command looks like this:
```
[root@dl-c200 libvirt]# virsh domstats --resctrl
Domain: 'vm1'
  resctrl.cmt=0

Domain: 'vm3'
  resctrl.cmt=180224

Domain: 'vm2'
  resctrl.cmt=2613248
```
Wang Huaqiang (3):
  util: add Intel x86 RDT/CMT support
  tools: virsh: add command for controling/monitoring resctrl
  tools: virsh domstats: show RDT CMT resource utilization information

Please see my inline reply.
-----Original Message----- From: Martin Kletzander [mailto:mkletzan@redhat.com] Sent: Thursday, June 14, 2018 3:54 PM To: Wang, Huaqiang <huaqiang.wang@intel.com> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com> Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support
On Tue, Jun 12, 2018 at 10:11:30AM +0000, Wang, Huaqiang wrote:
Hi Martin,
Thanks for your comments, please see my update inline below.
[It would be nice if you wrapped the long lines] I'll pay attention to these long lines. Thanks for the advice.
No need to, most email clients can do that automatically. Doing stuff like this manually is very unproductive :).
On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
This is an RFC request for supporting the CPU Cache Monitoring Technology (CMT) feature in libvirt. Since MBM is a feature closely related to CMT, for simplicity we only discuss CMT here; MBM is the follow-up that will be implemented after CMT. For details about CMT, please refer to section 17.18 of volume 3 of the Intel x86 SDM (link: https://software.intel.com/en-us/articles/intel-sdm).
Can you elaborate on how this is different from the CMT perf event that is already in libvirt and can be monitored through the domstats API?
Because the kernel removed the perf events 'cmt', 'mbmt' and 'mbml', libvirt's perf-based support for them no longer works with the latest kernels. See the following link for details: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=c39a0e2c8850f08249383f2425dbd8dbe4baad69
This series tries to provide similar functionality to replace the missing cmt, mbmt and mbml reporting; for now we only focus on cmt. Compared with the CMT perf event already in libvirt, I am trying to produce almost the same output as 'perf.cmt' in 'domstats', but under another name, such as 'resctrl.cmt' or 'rdt.cmt' (or something else). The other difference is that the underlying implementation goes through the kernel resctrl fs.
This series also attempts to provide a command interface for enabling and disabling the cmt feature for the whole domain, just as the original perf event based cmt can be enabled or disabled by specifying '--enable cmt' or '--disable cmt' when invoking 'virsh perf <domain>'. Our version looks like 'virsh resctrl <domain> --enable', without the 'cmt' suffix. The 'cmt' is omitted because the CMT and MBM functions are both enabled whenever a valid resctrl fs sub-folder is created; there is no way to disable one while enabling the other, such as enabling CMT while disabling MBML.
This series tries to stick to the interfaces exposed by the perf event based CMT/MBM and to provide a substitute for them. The perf based CMT, for example, only provides cache occupancy information for the whole domain. We are also thinking about providing cache occupancy information for groups of vcpus, which may be specified in the XML file. For example, with the following configuration:

<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3-4'/>
  <vcpupin vcpu='2' cpuset='4-5'/>
  <vcpupin vcpu='3' cpuset='6-7'/>
  <cachetune vcpus='0'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <rdt-monitoring vcpu='0' enable='yes'/>
  <rdt-monitoring vcpu='1-2' enable='yes'/>
  <rdt-monitoring vcpu='3' enable='yes'/>
</cputune>

'domstats' would output the following cmt information:

[root@dl-c200 libvirt]# virsh domstats vm1 --resctrl
Domain: 'vm1'
  rdt.cmt.total=645562
  rdt.cmt.vcpu0=104331
  rdt.cmt.vcpu1_2=203200
  rdt.cmt.vcpu3=340129
Beware that '1-4' is different from '1,4', so you need to differentiate between them. Or, to make it easier to parse for consumers of that API, just list each vcpu on its own line (but then you need to say which ones are counted together). Or group them:

  rdt.cmt.total=645562
  rdt.cmt.group0.value=104331
  rdt.cmt.group0.vcpus=0
  rdt.cmt.group1.value=203200
  rdt.cmt.group1.vcpus=1-2
  rdt.cmt.group2.value=340129
  rdt.cmt.group2.vcpus=3

Honestly, I don't care that much how it is going to look, but it needs to be easy to parse and understand.
Your arrangement, which separates the group vcpus from the group resource value, is much better than my version; thanks for the suggestion. By the way, I may omit the 'rdt.cmt.total' output. The reason is that if not all of the domain's vcpus are covered by resctrl monitoring groups, 'rdt.cmt.total' may be confusing: it could mean either the whole domain's resource utilization or the sum over the created groups. If users want the sum for the currently enabled CMT monitoring groups, they can add the values themselves. If users want the number for the whole domain, they can create groups covering all vcpus.
These updates answer your question, "Can you elaborate on how this is different from the CMT perf event that is already in libvirt and can be monitored through the domstats API?" Any input is welcome.
Great, this information (or rather a brief summary) should be part of the patch series. Not necessarily the commit messages (some of the things would fit there), but at least the cover letter. Otherwise you might get the same question next time and will have to provide the same answer to the next reviewer and so on.
OK. I'll add this part of the discussion to the next version of the RFC, as well as to the cover letter of the POC code.
https://libvirt.org/formatdomain.html#elementsPerf
## About '_virResctrlMon' interface
Cache Allocation Technology (CAT) has already been implemented in util/virresctrl.*, which interacts with the Linux kernel resctrl file system. Very similar to CAT, the CMT object is represented by 'struct _virResctrlMon':
```
struct _virResctrlMon {
    virObject parent;

    /* pairedalloc: pointer to the resctrl allocation it is paired with.
     * NULL for a resctrl monitoring group not associated with
     * any allocation. */
    virResctrlAllocPtr pairedalloc;

    /* The identifier (any unique string for now) */
    char *id;

    /* libvirt-generated path; may be identical to the allocation path,
     * may not be if allocation is ready */
    char *path;
};
```
Following almost the same logic as '_virResctrlAlloc', which is mainly implemented in 'virresctrl.c', a group of APIs has been designed to manipulate '_virResctrlMon'. '_virResctrlMon' has a lot in common with '_virResctrlAlloc' except for the field 'pairedalloc', which stores a pointer to the paired resctrl allocation object. With the current libvirt resctrl implementation, when a '_virResctrlAlloc' object is created, the CMT hardware is enabled automatically and shares the same folder under the resctrl fs. I call a '_virResctrlMon' object that shares its resctrl fs folder with an allocation a 'paired' _virResctrlMon; such a '_virResctrlMon' and '_virResctrlAlloc' form a pair, and the paired '_virResctrlAlloc' is tracked through 'pairedalloc'. A paired mon group cannot be dynamically enabled or disabled at runtime.
'pairedalloc' can also be set to NULL, which creates a non-paired mon group object. This is necessary because CMT can work independently to monitor the utilization of critical CPU resources (cache or memory bandwidth) without allocating any dedicated cache or memory bandwidth. A non-paired mon group object represents an independently working CMT, and it can be enabled or disabled at runtime.
## About virsh command 'resctrl'
To set or get the status of the resctrl mon group (hardware CMT), a virsh command, 'resctrl', has been created. Here are the common usages:
The command does make sense for people who know how the stuff works on the inside or have seen the code in libvirt. For other users the name 'resctrl' is going to feel very much arbitrary. We're trying to abstract the details for users, so I don't see why it should be named 'resctrl' when it handles "RDT Monitoring Status".
Agreed. 'resctrl' does cause a lot of confusion for end users. The underlying kernel interface combines the CMT and MBM features: the files 'llc_occupancy', 'mbm_local_bytes' and 'mbm_total_bytes', which report cache occupancy, local memory bandwidth and total memory bandwidth respectively, are created automatically and simultaneously for each resctrl group, and there is no way to enable one and disable another. So for a command that affects both cache and memory bandwidth, I would like to use 'rdt' as the key word. Both cache monitoring (CMT) and memory bandwidth monitoring (MBM) belong to the scope of RDT monitoring. To replace the confusing word 'resctrl', I'd like to use 'rdtmon' as the command name, so 'virsh resctrl <domain>' would become 'virsh rdtmon <domain>'. Any suggestions from the community are welcome.
Libvirt tries to abstract various vendor-specific things. For example AMD's SEV is abstracted under the name `launch-security` IIRC so that if there are more in the future not all the code needs to be duplicated. In the same sense Intel's RDT could be named in a more generic sense. Resource Control and Monitoring seems to reflect what it does, but it's kind of a mouthful. Maybe others will have better ideas. I'm bad at naming.
I am bad at naming too :) I agree that 'RDT' and 'resctrl' are pretty confusing names for a system administrator. But in my opinion, 'Resource Control' or 'Monitoring' is not a good choice either: these phrases cover too broad a scope, ranging from network resources to memory (DRAM) and other resources. Here we only focus on CPU resources, currently the last level cache and memory bandwidth, so I would like to use 'cpu-resource' or 'cpures' as the name for the general RDT feature. How about the interfaces shown below:

1. A virsh command 'cpu-resource' for checking the status of the domain's resctrl resource groups and for creating/setting resctrl monitoring groups at vcpu granularity. The command may look like this:

   virsh cpu-resource --create <resource type> --destroy <resource type> --vcpulist <vcpulist> --group-name <resctrl group name>

   * '--create' and '--destroy' replace the '--enable' and '--disable' options that I proposed in my last update. "create" and "destroy" are more accurate, since the operations actually set up and delete resource groups.
   * For <resource type>, 'monitoring' will be specified for both CMT and MBM. CAT and MBA could also be supported here if we plan to create functions similar to the 'cachetune' or 'membwtune' commands; this parameter is also extensible to future CPU resources.
   * <vcpulist> specifies the vcpu list associated with a cpu resource group.
   * <group-name> specifies the resource group name. If creating a monitoring group for a specific vcpu list, a null string is expected here so that it matches the virResctrlAllocPtr->id string. This argument is also extensible to support other features, e.g. creating a monitoring group for emulator threads with a specific group name, such as 'emulator'. A resource monitoring group for iothreads could be created in a similar way using the 'group-name' argument.

2. An update for the virsh command 'domstats'. Following the suggestion you made above, the RDT-related output would look like this:

   cpu-resource.cache-occupancy.group0.value=104331
   cpu-resource.cache-occupancy.group0.vcpus=0
   cpu-resource.cache-occupancy.group1.value=203200
   cpu-resource.cache-occupancy.group1.vcpus=1-2
   cpu-resource.cache-occupancy.group2.value=340129
   cpu-resource.cache-occupancy.group2.vcpus=3

   Later, for MBM, the output would be:

   cpu-resource.memory-bandwidth.group0.value=10331
   cpu-resource.memory-bandwidth.group0.vcpus=0
   cpu-resource.memory-bandwidth.group1.value=2000
   cpu-resource.memory-bandwidth.group1.vcpus=1-2
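Parsing the proposed `--vcpulist` argument is the mirror image of formatting the group keys. A minimal C sketch, assuming the same comma-and-range syntax the XML uses (for example '0,1-2,3'); `parse_vcpu_list()` is hypothetical, and within libvirt the existing `virBitmapParse()` would presumably be the natural choice.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a --vcpulist argument such as "0,2-3".
 * Marks each listed vcpu in out[0..nvcpus-1]; rejects empty input,
 * empty items, reversed ranges and ids >= nvcpus. Returns 0 or -1. */
static int
parse_vcpu_list(const char *str, bool *out, size_t nvcpus)
{
    const char *p = str;

    memset(out, 0, nvcpus * sizeof(*out));
    while (*p) {
        char *end;
        unsigned long first, last;

        if (!isdigit((unsigned char)*p))
            return -1;
        first = strtoul(p, &end, 10);
        last = first;
        p = end;

        /* Optional "-N" suffix turns a single id into a range. */
        if (*p == '-') {
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            last = strtoul(p, &end, 10);
            p = end;
        }
        if (last < first || last >= nvcpus)
            return -1;
        for (unsigned long i = first; i <= last; i++)
            out[i] = true;

        if (*p == ',') {
            p++;
            if (!*p)
                return -1;   /* trailing comma */
        } else if (*p) {
            return -1;       /* unexpected character */
        }
    }
    return p == str ? -1 : 0;  /* empty string is an error */
}
```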
```
[root@dl-c200 david]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm3                            running
 3     vm2                            running
 -     vm1                            shut off
```
### Test on a running domain vm3
To get the RDT monitoring status, type 'virsh resctrl <domain>':
```
[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Enabled
```
To enable RDT monitoring, type 'virsh resctrl <domain> --enable':
```
[root@dl-c200 david]# virsh resctrl vm3 --enable
RDT Monitoring Status: Enabled
```
To disable RDT monitoring, type 'virsh resctrl <domain> --disable':
```
[root@dl-c200 david]# virsh resctrl vm3 --disable
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm3
RDT Monitoring Status: Disabled
```
### Test on domain vm1, which is not running
If the domain is not active, setting the RDT monitoring status fails, and getting it reports 'Disabled'.
```
[root@dl-c200 david]# virsh resctrl vm1
RDT Monitoring Status: Disabled

[root@dl-c200 david]# virsh resctrl vm1 --enable
error: Requested operation is not valid: domain is not running

[root@dl-c200 david]# virsh resctrl vm1 --disable
error: Requested operation is not valid: domain is not running
```
Can't these commands enable it in the XML? It would be nice if the XML part was shown here in the explanation.
The first version of the POC code has no XML changes, so monitoring cannot be enabled or disabled through the XML file.
Let's discuss adding this function. How about this configuration:

<cputune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <rdt-monitoring vcpu='0' enable='no'/>
  <rdt-monitoring vcpu='1-2' enable='yes'/>
  <rdt-monitoring vcpu='3' enable='yes'/>
</cputune>
To avoid confusing users, I'm changing 'rdt-monitoring' to 'monitoring' here. Since 'monitoring' is a sub-node of 'cputune', it obviously means CPU (tune) related monitoring. I'm also removing the 'enable' attribute. The XML configuration would be:

<cputune>
  <cachetune vcpus='1-2'>
    <cache id='0' level='3' type='both' size='2816' unit='KiB'/>
    <cache id='1' level='3' type='both' size='2816' unit='KiB'/>
  </cachetune>
  <monitoring vcpu='0'/>
  <monitoring vcpu='1-2'/>
  <monitoring vcpu='3'/>
</cputune>
Just so we are on the same page: it doesn't have to have an option to be enabled/disabled in the XML. However, you probably still need to keep the state of that information somewhere across libvirtd restarts. Maybe you already do; I haven't gone through the code.
Similar to def->ncachetunes and def->cachetunes, def->nmongroups and def->mongroups are created to preserve monitoring group settings.
With the above settings:
- Two RDT monitoring groups will be created when the VM is launched.
- <rdt-monitoring vcpu='1-2' enable='yes'/> is created automatically because of the <cachetune> setting. Under the resctrl fs, resctrl allocations and RDT monitoring groups are presented as sub-folders, and one process cannot be placed in two resctrl fs sub-folders, so a resctrl allocation creates an RDT monitoring group as well. This RDT monitoring group cannot be disabled at runtime because there is no way to disable a resctrl allocation (CAT) at runtime.
- <rdt-monitoring vcpu='3' enable='yes'/> creates another, enabled-by-default RDT monitoring group, and the task id (the pid associated with vcpu3) will be put into its 'tasks' file. RDT monitoring of vcpu 3 can be enabled or disabled at runtime through a command such as 'virsh rdtmon --enable vcpu3'. The MBM feature is enabled or disabled by the same command.
- <rdt-monitoring vcpu='0' enable='no'/> specifies the default monitoring state for vcpu0 of the domain, which is disabled after launch and can be changed at runtime.
There are many places where stuff can be created. I started going down the rabbit hole again (like last time when I was implementing CAT) and again, the kernel interface is horrible. Inconsistent naming, poor documentation (or maybe I'm just a very bad reader). I hope someone will join this review because I can't sensibly map the kernel interface to whatever libvirt might do/expose. I already wasted so much time on CAT and I don't want to go back to that again.
Let's not do any XML changes unless we find out they are actually needed.
Regarding the XML changes, I don't quite understand. If we want to save resource groups and create them at domain startup, the XML is the place to keep that configuration. Do you still want me to remove the XML changes now that I have removed the 'enable' attribute?
### Test on domain vm2
Domain vm2 is active and CAT is enabled through 'cachetune' (configured in the 'cputune/cachetune' section), so its resctrl mon group is a 'paired' one. For a paired mon group, RDT monitoring cannot be disabled: allowing that would require destroying the resctrl allocation folder, which the current cache allocation design does not support.
What if you have multiple cachetunes? What if the cachetune is only set for one vcpu and you want to monitor the others as well? I guess I have to see the patches to understand why you have so much information stored for something that looks like a boolean (enable/disable).
At the time I raised this RFC, there was no design for reporting RDT monitoring information at the granularity of a cachetune; only cache/memory bandwidth information for the whole domain was reported. But now I'd like to discuss the design listed above: reporting RDT monitoring information based on the rdt-monitoring (cachetune) group settings. I need your comments.
I just wanted to know which approach is preferred: whether we create mon_groups/domain_name_vcpus_X/ or just a new resctrl group (there is not much of a difference between the two). Does it take hot-(un)plug of vcpus into consideration? How about emulator threads and iothreads? I know libvirt doesn't support them for CAT yet, but that would be a good way to start adding features to libvirt IMHO.
Or live changes to cachetunes. If we have that, then maybe the addition of monitoring will make more sense and it will fit more nicely (since we'll have a more complete picture).
I haven't taken vcpu hotplug into consideration. It may cause trouble for the libvirt RDT functions, both the resource monitoring and the allocation parts, because a vcpu thread may be destroyed by a setvcpu command. If the resource control interface is not aware of that, things get out of sync: e.g. the resource group sub-directory still exists, but it no longer works as intended because the vcpu thread it tracked has disappeared. If a live-change interface for resource groups (CMT and CAT) exists, we could ask the user to destroy the resource groups first. Or we could define the rule that if you change the vcpu count live, your resource groups, both allocation and monitoring groups, will disappear. I have no special consideration for emulator threads and I/O threads. Reading the CAT source code, I don't find any special handling for emulator and I/O threads; am I missing something? Anyway, do we need this feature right now? You proposed a command, 'cachetune', for live changes of cache allocations; that should be fine to implement, and I will include some code for live changes of monitoring groups in my next POC code. Maybe I could submit a patch for 'cachetune' after this RDT monitoring feature.
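The stale-group situation described above can be detected from user space: resctrl lists only live tasks in a group's 'tasks' file, so a vCPU thread that exits during hot-unplug simply stops appearing while the group directory remains. A minimal sketch, assuming the caller remembers which thread ids were placed in the group; groupHasTask and groupIsStale are hypothetical helpers, not libvirt functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: true if 'tid' is listed in a resctrl 'tasks' file
 * (one id per line). An unreadable file counts as "not listed". */
static bool
groupHasTask(const char *tasksPath, pid_t tid)
{
    FILE *fp;
    long id;
    bool found = false;

    assert(tasksPath != NULL);
    if (!(fp = fopen(tasksPath, "r")))
        return false;
    while (fscanf(fp, "%ld", &id) == 1) {
        if (id == (long)tid) {
            found = true;
            break;
        }
    }
    fclose(fp);
    return found;
}

/* Hypothetical helper: a group is stale when any thread it is expected to
 * hold (e.g. the vCPU threads recorded at creation) is no longer listed.
 * The file is re-read per id, which is fine for a handful of vCPUs. */
static bool
groupIsStale(const char *tasksPath, const pid_t *tids, size_t ntids)
{
    size_t i;

    for (i = 0; i < ntids; i++) {
        if (!groupHasTask(tasksPath, tids[i]))
            return true;
    }
    return false;
}
```

Such a check could back either rule proposed above: refusing a live vcpu change while groups exist, or tearing down groups that have become stale.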
```
[root@dl-c200 libvirt]# virsh resctrl vm2 --enable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2 --disable
RDT Monitoring Status: Enabled (forced by cachetune)

[root@dl-c200 libvirt]# virsh resctrl vm2
RDT Monitoring Status: Enabled (forced by cachetune)
```
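The behavior shown above, where both --enable and --disable leave a paired group enabled, can be modeled as a small state check keyed on the pairedalloc pointer of struct _virResctrlMon. This is a simplified, hypothetical sketch: MonSketch, MonStatus, monSetEnabled and monGetStatus are illustrative names, and the real struct carries more fields (id, path, the virObject parent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct _virResctrlMon: only the
 * fields needed to decide whether monitoring may be toggled. */
typedef struct {
    const void *pairedalloc;  /* non-NULL: shares its folder with a CAT allocation */
    bool enabled;             /* only meaningful for non-paired groups */
} MonSketch;

typedef enum {
    MON_STATUS_DISABLED,
    MON_STATUS_ENABLED,
    MON_STATUS_FORCED,        /* printed as "Enabled (forced by cachetune)" */
} MonStatus;

static MonStatus
monGetStatus(const MonSketch *mon)
{
    assert(mon != NULL);
    if (mon->pairedalloc)
        return MON_STATUS_FORCED;
    return mon->enabled ? MON_STATUS_ENABLED : MON_STATUS_DISABLED;
}

static MonStatus
monSetEnabled(MonSketch *mon, bool enable)
{
    assert(mon != NULL);
    /* A paired group lives as long as its allocation, so requests to
     * enable or disable it are ignored and the forced status reported. */
    if (!mon->pairedalloc)
        mon->enabled = enable;
    return monGetStatus(mon);
}
```

Reporting a distinct "forced" status, rather than failing the request, matches the virsh output above, where the command succeeds but tells the user why nothing changed.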
## About showing the utilization information of RDT
A domstats field has been added to show the utilization of RDT resources; the command looks like this:
```
[root@dl-c200 libvirt]# virsh domstats --resctrl
Domain: 'vm1'
  resctrl.cmt=0

Domain: 'vm3'
  resctrl.cmt=180224

Domain: 'vm2'
  resctrl.cmt=2613248
```
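The resctrl.cmt values are byte counts. One plausible way to produce a single number per group is to sum the kernel's llc_occupancy counters, which resctrl reports in bytes, across every L3 cache domain under the group's mon_data/ directory. The sketch below assumes that aggregation; the thread does not say whether the patches sum the domains or report them per cache id, and groupLlcOccupancy is a hypothetical name.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: total LLC occupancy in bytes of one resctrl group,
 * summed over each <group>/mon_data/mon_L3_XX/llc_occupancy file.
 * Returns -1 on error, including a non-numeric counter such as the
 * "Unavailable" string the kernel may print. */
static long long
groupLlcOccupancy(const char *groupPath)
{
    char dataDir[4096], domDir[4096], file[4096];
    DIR *dir;
    struct dirent *ent;
    struct stat st;
    long long total = 0;

    assert(groupPath != NULL);
    if (snprintf(dataDir, sizeof(dataDir), "%s/mon_data", groupPath) >= (int)sizeof(dataDir))
        return -1;
    if (!(dir = opendir(dataDir)))
        return -1;

    while ((ent = readdir(dir))) {
        FILE *fp;
        long long val;

        /* One mon_L3_<cache id> directory exists per L3 cache domain. */
        if (strncmp(ent->d_name, "mon_L3_", 7) != 0)
            continue;
        if (snprintf(domDir, sizeof(domDir), "%s/%s", dataDir, ent->d_name) >= (int)sizeof(domDir))
            goto error;
        if (stat(domDir, &st) < 0 || !S_ISDIR(st.st_mode))
            continue;
        if (snprintf(file, sizeof(file), "%s/llc_occupancy", domDir) >= (int)sizeof(file))
            goto error;
        if (!(fp = fopen(file, "r")))
            goto error;
        if (fscanf(fp, "%lld", &val) != 1) {
            fclose(fp);
            goto error;
        }
        fclose(fp);
        total += val;
    }
    closedir(dir);
    return total;

 error:
    closedir(dir);
    return -1;
}
```

Under this assumption, a host with two L3 domains reporting 180224 and 2433024 bytes for a group would show resctrl.cmt=2613248.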
Wang Huaqiang (3):
  util: add Intel x86 RDT/CMT support
  tools: virsh: add command for controling/monitoring resctrl
  tools: virsh domstats: show RDT CMT resource utilization information
 include/libvirt/libvirt-domain.h    |  10 ++
 src/conf/domain_conf.c              |  28 ++++
 src/conf/domain_conf.h              |   3 +
 src/driver-hypervisor.h             |   8 +
 src/libvirt-domain.c                |  92 +++++++++++
 src/libvirt_private.syms            |   9 +
 src/libvirt_public.syms             |   6 +
 src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 +++++++-
 src/remote/remote_daemon_dispatch.c |  45 +++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 +++-
 src/remote_protocol-structs         |  12 ++
 src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h               |  44 +++++
 tools/virsh-domain-monitor.c        |   7 +
 tools/virsh-domain.c                |  74 +++++++++
 17 files changed, 933 insertions(+), 5 deletions(-)
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (3)
- Martin Kletzander
- Wang Huaqiang
- Wang, Huaqiang