[libvirt] [v7 00/10] Support cache tune in libvirt

Addressed comment from v7 -> v6 Marcelo: * Fix flock usage while VM initialization. Addressed comment from v6 -> v5 Marcelo: * Support other APPs to operate /sys/fs/resctrl at same time Libvirt will scan /sys/fs/resctrl again before doing cache allocation. patch 10 will address this. Addressed comment from v4 -> v5: Marcelo: * Several typos * Use flock instead of virFileLock Addressed comment from v3 -> v4: Daniel & Marcelo: * Added concurrence support Addressed comment from v2 -> v3: Daniel: * Fixed coding style, passed `make check` and `make syntax-check` * Variables renaming and move from header file to c file. * For locking/mutex support, no progress. There are some discussion from mailing list, but I can not find a better way to add locking support without performance impact. I'll explain the process and please help to advice what shoud we do. VM create: 1) Get the cache left value on each bank of the host. This should be shared amount all VMs. 2) Calculate the schemata on the bank based on all created resctrl domain's schemata 3) Calculate the default schemata by scaning all domain's schemata. 4) Flush default schemata to /sys/fs/resctrl/schemata VM destroy: 1) Remove the resctrl domain of that VM 2) Recalculate default schemata 3) Flush default schemata to /sys/fs/resctrl/schemata The key point is that all VMs shares /sys/fs/resctrl/schemata, and when a VM create a resctrl domain, the schemata of that VM depends on the default schemata and all other exsited schematas. So a global mutex is reqired. Before calculate a schemata or update default schemata, libvirt should gain this global mutex. I will try to think more about how to support this gracefully in next patch set. Marcelo: * Added vcpu support for cachetune, this will allow user to define which vcpu using which cache allocation bank. <cachetune id='0' host_id=0 size='3072' unit='KiB' vcpus='0,1'/> vcpus is a cpumap, the vcpu pids will be added to tasks file * Added cdp compatible, user can specify l3 cache even host enable cdp. See patch 8. On a cdp enabled host, specify l3code/l3data by <cachetune id='0' host_id='0' type='l3' size='3072' unit='KiB'/> This will create a schemata like: L3data:0=0xff00;... L3code:0=0xff00;... * Would you please help to test if the functions work. Martin: * Xml test case, I have no time to work on this yet, would you please show me an example, would like to amend it later. This series patches are for supportting CAT featues, which also called cache tune in libvirt. First to expose cache information which could be tuned in capabilites XML. Then add new domain xml element support to add cacahe bank which will apply on this libvirt domain. This series patches add a util file `resctrl.c/h`, an interface to talk with linux kernel's system fs. There are still one TODO left: 1. Expose a new public interface to get free cache information. 2. Expose a new public interface to set cachetune lively. Some discussion about this feature support can be found from: https://www.redhat.com/archives/libvir-list/2017-January/msg00644.html *** BLURB HERE *** Eli Qiao (10): Resctrl: Add some utils functions Resctrl: expose cache information to capabilities Resctrl: Add new xml element to support cache tune Resctrl: Add private interface to set cachebanks Qemu: Set cache banks Resctrl: enable l3code/l3data Resctrl: Make sure l3data/l3code are pairs Resctrl: Compatible mode for cdp enabled Resctrl: concurrence support Resctrl: Scan resctrl before doing cache allocation docs/schemas/domaincommon.rng | 46 ++ include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/conf/capabilities.c | 56 ++ src/conf/capabilities.h | 23 + src/conf/domain_conf.c | 182 +++++++ src/conf/domain_conf.h | 19 + src/libvirt_private.syms | 11 + src/nodeinfo.c | 64 +++ src/nodeinfo.h | 1 + src/qemu/qemu_capabilities.c | 8 + src/qemu/qemu_driver.c | 6 + src/qemu/qemu_process.c | 53 ++ src/util/virerror.c | 1 + src/util/virresctrl.c | 1202 +++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 88 +++ 17 files changed, 1763 insertions(+) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h -- 1.9.1

This patch adds some utils struct and functions to expose resctrl information. virResCtrlAvailable: If resctrl interface exist on host virResCtrlGet: get specify type resource contral information virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virresctrl.c | 343 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 80 +++++++++++ 7 files changed, 431 insertions(+) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..3dd2d08 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum { VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resource control */ # ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/po/POTFILES.in b/po/POTFILES.in index 365ea66..f7fda98 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -240,6 +240,7 @@ src/util/virportallocator.c src/util/virprocess.c src/util/virqemu.c src/util/virrandom.c +src/util/virresctrl.c src/util/virrotatingfile.c src/util/virscsi.c src/util/virscsivhost.c diff --git a/src/Makefile.am b/src/Makefile.am index 2f32d41..b626f29 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -162,6 +162,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsivhost.c util/virscsivhost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8e994c7..743e5ac 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2313,6 +2313,10 @@ virRandomGenerateWWN; virRandomInt; +# util/virresctrl.h +virResCtrlAvailable; +virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST, "Perf", /* 65 */ "Libssh transport layer", + "Resouce Control", ) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..80481bc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,343 @@ +/* + * virresctrl.c: methods for managing resource contral + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ +#include <config.h> + +#include <sys/ioctl.h> +#if defined HAVE_SYS_SYSCALL_H +# include <sys/syscall.h> +#endif +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "nodeinfo.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN ( 10 * 1024 * 1024) + + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + + +static int virResCtrlGetCPUValue(const char *path, char **value) +{ + int ret = -1; + char *tmp; + + if (virFileReadAll(path, 10, value) < 0) goto cleanup; + if ((tmp = strchr(*value, '\n'))) *tmp = '\0'; + ret = 0; + + cleanup: + return ret; +} + +static int virResctrlGetCPUSocketID(const size_t cpu, int *socket_id) +{ + int ret = -1; + char *physical_package_path = NULL; + char *physical_package = NULL; + if (virAsprintf(&physical_package_path, + "%s/cpu/cpu%zu/topology/physical_package_id", + SYSFS_SYSTEM_PATH, cpu) < 0) { + return -1; + } + + if (virResCtrlGetCPUValue(physical_package_path, + &physical_package) < 0) + goto cleanup; + + if (virStrToLong_i(physical_package, NULL, 0, socket_id) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(physical_package); + VIR_FREE(physical_package_path); + return ret; +} + +static int virResCtrlGetCPUCache(const size_t cpu, int type, int *cache) +{ + int ret = -1; + char *cache_dir = NULL; + char *cache_str = NULL; + char *tmp; + int carry = -1; + + if (virAsprintf(&cache_dir, + "%s/cpu/cpu%zu/cache/index%d/size", + SYSFS_SYSTEM_PATH, cpu, type) < 0) + return -1; + + if (virResCtrlGetCPUValue(cache_dir, &cache_str) < 0) + goto cleanup; + + tmp = cache_str; + + while (*tmp != '\0') tmp++; + + if (*(tmp - 1) == 'K') { + *(tmp - 1) = '\0'; + carry = 1; + } else if (*(tmp - 1) == 'M') { + *(tmp - 1) = '\0'; + carry = 1024; + } + + if (virStrToLong_i(cache_str, NULL, 0, cache) < 0) + goto cleanup; + + *cache = (*cache) * carry; + + if (*cache < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(cache_dir); + VIR_FREE(cache_str); + return ret; +} + +/* Fill all cache bank informations */ +static virResCacheBankPtr virResCtrlGetCacheBanks(int type, int *n_sockets) +{ + int npresent_cpus; + int idx = -1; + size_t i; + virResCacheBankPtr bank; + + *n_sockets = 1; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + if (type == VIR_RDT_RESOURCE_L3 + || type == VIR_RDT_RESOURCE_L3DATA + || type == VIR_RDT_RESOURCE_L3CODE) + idx = 3; + else if (type == VIR_RDT_RESOURCE_L2) + idx = 2; + + if (idx == -1) + return NULL; + + if (VIR_ALLOC_N(bank, *n_sockets) < 0) { + *n_sockets = 0; + return NULL; + } + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if (virResctrlGetCPUSocketID(i, &s_id) < 0) + goto error; + + if (s_id > (*n_sockets - 1)) { + size_t cur = *n_sockets; + size_t exp = s_id - (*n_sockets) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *n_sockets = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if (virResCtrlGetCPUCache(i, idx, &cache_size) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / resctrlall[type].cbm_len; + } + } + return bank; + + error: + *n_sockets = 0; + VIR_FREE(bank); + return NULL; +} + +static int virResCtrlGetConfig(int type) +{ + int ret; + size_t i; + char *str; + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids + */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid) < 0) + return -1; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits) < 0) + return -1; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + return ret; + + /* Calculate cbm length from the default cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + VIR_FREE(str); + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virResCtrlGetCacheBanks(type, + &(resctrlall[type].num_banks)); + + if (resctrlall[type].cache_banks == NULL) + return -1; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /*L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + VIR_ERROR(_("Failed to initialize resource control config")); + return -1; + } + if (virFileExists(tmp)) { + if ((rc = virResCtrlGetConfig(i)) < 0) + VIR_ERROR(_("Failed to get resource control config")); + return -1; + } + + VIR_FREE(tmp); + } + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..aa113f4 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,80 @@ +/* + * * virresctrl.h: methods for managing rscctrl + * * + * * Copyright (C) 2016 Intel, Inc. + * * + * * This library is free software; you can redistribute it and/or + * * modify it under the terms of the GNU Lesser General Public + * * License as published by the Free Software Foundation; either + * * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virutil.h" +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1

On Thu, Feb 16, 2017 at 05:35:12PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
One tiny nitpick, but it might actually help you as well. You can use -v7 parameter to git send-email or git format-patch to automatically add 'v7' to the subject-prefix keeping the "PATCH" in there. Not only could this be easier for you, but people like me, who filter patches and other mails on the list to different folders, would have these properly categorized. Anyway, for the review now.
virResCtrlAvailable: If resctrl interface exist on host virResCtrlGet: get specify type resource contral information
"get specific resource control information" ???
virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virresctrl.c | 343 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 80 +++++++++++ 7 files changed, 431 insertions(+) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..80481bc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,343 @@ +/* + * virresctrl.c: methods for managing resource contral + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ +#include <config.h> + +#include <sys/ioctl.h> +#if defined HAVE_SYS_SYSCALL_H +# include <sys/syscall.h>
What do you need syscall.h for?
+#endif +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "nodeinfo.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" +
If you need SYSFS_..._PATH for anything, it probably could be split into other src/util/ files. Example below.
+#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN ( 10 * 1024 * 1024) + + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + + +static int virResCtrlGetCPUValue(const char *path, char **value)
It would be more consistent if you reused parts of virHostCPUGetValue(), put them in a function, and use that one in both this one an the original one. It chould be also put in the virhostcpu.c since that's about the host cpu.
+static int virResctrlGetCPUSocketID(const size_t cpu, int *socket_id)
No need for this function, just use virHostCPUParseSocket()
+static int virResCtrlGetCPUCache(const size_t cpu, int type, int *cache)
So we have some places in the code that get info from sysfs. I understand that cache is controlled in the resctrl, but one doesn't have to have resctrl to get some cache info, so I would move this function into virhostcpu.c and keep here only the stuff strictly related to resource control.
+{ + int ret = -1; + char *cache_dir = NULL; + char *cache_str = NULL; + char *tmp; + int carry = -1; + + if (virAsprintf(&cache_dir, + "%s/cpu/cpu%zu/cache/index%d/size", + SYSFS_SYSTEM_PATH, cpu, type) < 0) + return -1; + + if (virResCtrlGetCPUValue(cache_dir, &cache_str) < 0) + goto cleanup; + + tmp = cache_str; + + while (*tmp != '\0') tmp++; + + if (*(tmp - 1) == 'K') { + *(tmp - 1) = '\0'; + carry = 1; + } else if (*(tmp - 1) == 'M') { + *(tmp - 1) = '\0'; + carry = 1024; + } + + if (virStrToLong_i(cache_str, NULL, 0, cache) < 0) + goto cleanup; + + *cache = (*cache) * carry; + + if (*cache < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(cache_dir); + VIR_FREE(cache_str); + return ret; +} +
Why all this fuzz? You should instead pass pointer to virStrToLong_i to get where the number ends and then use virScaleInteger() to multiply it properly.
+/* Fill all cache bank informations */ +static virResCacheBankPtr virResCtrlGetCacheBanks(int type, int *n_sockets) +{
Still could be in virhostcpu.c
+ int npresent_cpus; + int idx = -1; + size_t i; + virResCacheBankPtr bank; + + *n_sockets = 1; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + if (type == VIR_RDT_RESOURCE_L3 + || type == VIR_RDT_RESOURCE_L3DATA + || type == VIR_RDT_RESOURCE_L3CODE) + idx = 3; + else if (type == VIR_RDT_RESOURCE_L2) + idx = 2; + + if (idx == -1) + return NULL; +
Indentation, "||" should be on the previous line but, most importantly, why not switch? That'd make sure you won't miss any enum value and if someone adds a new one, compilator will see it's missing here.
+ if (VIR_ALLOC_N(bank, *n_sockets) < 0) { + *n_sockets = 0;
set this before the first return so that this function guarantees n_sockets to be 0 on fail. Moreover, n_sockets is always set to 1 here. Due to the way the rest of the function is designed, this doesn't have to be here at all.
+ return NULL; + } + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if (virResctrlGetCPUSocketID(i, &s_id) < 0) + goto error; + + if (s_id > (*n_sockets - 1)) { + size_t cur = *n_sockets; + size_t exp = s_id - (*n_sockets) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *n_sockets = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if (virResCtrlGetCPUCache(i, idx, &cache_size) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / resctrlall[type].cbm_len; + } + } + return bank; +
Wouldn't it be easier to have list of virResCacheBankPtr *banks; and size_t nbanks; and then just populate each pointer when that socket is on the system, so that you that NULL means the socket info was not filled yet. Or just a list that isn't sparse (like yours is now). The logic here seems hard to read. I'll continue the review tomorrow. Martin
+ error: + *n_sockets = 0; + VIR_FREE(bank); + return NULL; +} + +static int virResCtrlGetConfig(int type) +{ + int ret; + size_t i; + char *str; + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids + */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid) < 0) + return -1; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits) < 0) + return -1; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + return ret; + + /* Calculate cbm length from the default cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + VIR_FREE(str); + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virResCtrlGetCacheBanks(type, + &(resctrlall[type].num_banks)); + + if (resctrlall[type].cache_banks == NULL) + return -1; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /*L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + VIR_ERROR(_("Failed to initialize resource control config")); + return -1; + } + if (virFileExists(tmp)) { + if ((rc = virResCtrlGetConfig(i)) < 0) + VIR_ERROR(_("Failed to get resource control config")); + return -1; + } + + VIR_FREE(tmp); + } + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..aa113f4 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,80 @@ +/* + * * virresctrl.h: methods for managing rscctrl + * * + * * Copyright (C) 2016 Intel, Inc. + * * + * * This library is free software; you can redistribute it and/or + * * modify it under the terms of the GNU Lesser General Public + * * License as published by the Free Software Foundation; either + * * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virutil.h" +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1

On Thu, Feb 16, 2017 at 06:03:50PM +0100, Martin Kletzander wrote:
On Thu, Feb 16, 2017 at 05:35:12PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
One tiny nitpick, but it might actually help you as well. You can use -v7 parameter to git send-email or git format-patch to automatically add 'v7' to the subject-prefix keeping the "PATCH" in there. Not only could this be easier for you, but people like me, who filter patches and other mails on the list to different folders, would have these properly categorized.
Anyway, for the review now.
virResCtrlAvailable: If resctrl interface exist on host virResCtrlGet: get specify type resource contral information
"get specific resource control information" ???
virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virresctrl.c | 343 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 80 +++++++++++ 7 files changed, 431 insertions(+) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..80481bc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,343 @@ +/* + * virresctrl.c: methods for managing resource contral + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ +#include <config.h> + +#include <sys/ioctl.h> +#if defined HAVE_SYS_SYSCALL_H +# include <sys/syscall.h>
What do you need syscall.h for?
+#endif +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "nodeinfo.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" +
If you need SYSFS_..._PATH for anything, it probably could be split into other src/util/ files. Example below.
+#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN ( 10 * 1024 * 1024) + + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + + +static int virResCtrlGetCPUValue(const char *path, char **value)
It would be more consistent if you reused parts of virHostCPUGetValue(), put them in a function, and use that one in both this one an the original one. It chould be also put in the virhostcpu.c since that's about the host cpu.
+static int virResctrlGetCPUSocketID(const size_t cpu, int *socket_id)
No need for this function, just use virHostCPUParseSocket()
+static int virResCtrlGetCPUCache(const size_t cpu, int type, int *cache)
So we have some places in the code that get info from sysfs. I understand that cache is controlled in the resctrl, but one doesn't have to have resctrl to get some cache info, so I would move this function into virhostcpu.c and keep here only the stuff strictly related to resource control.
+{ + int ret = -1; + char *cache_dir = NULL; + char *cache_str = NULL; + char *tmp; + int carry = -1; + + if (virAsprintf(&cache_dir, + "%s/cpu/cpu%zu/cache/index%d/size", + SYSFS_SYSTEM_PATH, cpu, type) < 0) + return -1; + + if (virResCtrlGetCPUValue(cache_dir, &cache_str) < 0) + goto cleanup; + + tmp = cache_str; + + while (*tmp != '\0') tmp++; + + if (*(tmp - 1) == 'K') { + *(tmp - 1) = '\0'; + carry = 1; + } else if (*(tmp - 1) == 'M') { + *(tmp - 1) = '\0'; + carry = 1024; + } + + if (virStrToLong_i(cache_str, NULL, 0, cache) < 0) + goto cleanup; + + *cache = (*cache) * carry; + + if (*cache < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(cache_dir); + VIR_FREE(cache_str); + return ret; +} +
Why all this fuzz? You should instead pass pointer to virStrToLong_i to get where the number ends and then use virScaleInteger() to multiply it properly.
+/* Fill all cache bank informations */ +static virResCacheBankPtr virResCtrlGetCacheBanks(int type, int *n_sockets) +{
Still could be in virhostcpu.c
+ int npresent_cpus; + int idx = -1; + size_t i; + virResCacheBankPtr bank; + + *n_sockets = 1; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + if (type == VIR_RDT_RESOURCE_L3 + || type == VIR_RDT_RESOURCE_L3DATA + || type == VIR_RDT_RESOURCE_L3CODE) + idx = 3; + else if (type == VIR_RDT_RESOURCE_L2) + idx = 2; + + if (idx == -1) + return NULL; +
Indentation, "||" should be on the previous line but, most importantly, why not switch? That'd make sure you won't miss any enum value and if someone adds a new one, compilator will see it's missing here.
+ if (VIR_ALLOC_N(bank, *n_sockets) < 0) { + *n_sockets = 0;
set this before the first return so that this function guarantees n_sockets to be 0 on fail. Moreover, n_sockets is always set to 1 here. Due to the way the rest of the function is designed, this doesn't have to be here at all.
+ return NULL; + } + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if (virResctrlGetCPUSocketID(i, &s_id) < 0) + goto error; + + if (s_id > (*n_sockets - 1)) { + size_t cur = *n_sockets; + size_t exp = s_id - (*n_sockets) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *n_sockets = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if (virResCtrlGetCPUCache(i, idx, &cache_size) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / resctrlall[type].cbm_len; + } + } + return bank; +
Wouldn't it be easier to have list of virResCacheBankPtr *banks; and size_t nbanks; and then just populate each pointer when that socket is on the system, so that you that NULL means the socket info was not filled yet. Or just a list that isn't sparse (like yours is now). The logic here seems hard to read.
I'll continue the review tomorrow.
Martin
+ error: + *n_sockets = 0; + VIR_FREE(bank); + return NULL; +} + +static int virResCtrlGetConfig(int type)
I feel like 'Get' implies you're returning it, I'd go with 'Read' instead.
+{ + int ret; + size_t i; + char *str; + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids + */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid) < 0) + return -1;
You leak str here ^^
+ + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits) < 0) + return -1;
Same here
+ + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + return ret; + + /* Calculate cbm length from the default cbm_mask. */
The comment could say the mask is in hex and that's why strlen(s) * 4 is OK. Question: It can never be, for example, 38 bits or just 2? Just asking to be sure.
+ resctrlall[type].cbm_len = strlen(str) * 4; + VIR_FREE(str); + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virResCtrlGetCacheBanks(type, + &(resctrlall[type].num_banks)); + + if (resctrlall[type].cache_banks == NULL) + return -1; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /*L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++;
Shouldn't this be done only if CDP is not supported?
+ } + + resctrlall[type].enabled = true; + + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + VIR_ERROR(_("Failed to initialize resource control config")); + return -1; + } + if (virFileExists(tmp)) { + if ((rc = virResCtrlGetConfig(i)) < 0) + VIR_ERROR(_("Failed to get resource control config"));
virReportError should be used here, VIR_ERROR has a specific use case that's not applicable here.
+ return -1;
tmp leaks here
+ } + + VIR_FREE(tmp); + } + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */
That's easy to achieve. If you only put the typedef into the header file and accessors into this file, then no other code will be able to modify the data because the pointer will be opaque. Even if it is modifiable, that should be the way to go.
+virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..aa113f4 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,80 @@ +/* + * * virresctrl.h: methods for managing rscctrl + * * + * * Copyright (C) 2016 Intel, Inc. + * * + * * This library is free software; you can redistribute it and/or + * * modify it under the terms of the GNU Lesser General Public + * * License as published by the Free Software Foundation; either + * * version 2.1 of the License, or (at your option) any later version. + *
Too many asterisks.
+ * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virutil.h"
What do you need virutil in the header file for?
+# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, 17 February 2017 at 8:47 PM, Martin Kletzander wrote:
On Thu, Feb 16, 2017 at 06:03:50PM +0100, Martin Kletzander wrote:
On Thu, Feb 16, 2017 at 05:35:12PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
One tiny nitpick, but it might actually help you as well. You can use -v7 parameter to git send-email or git format-patch to automatically add 'v7' to the subject-prefix keeping the "PATCH" in there. Not only could this be easier for you, but people like me, who filter patches and other mails on the list to different folders, would have these properly categorized.
Anyway, for the review now.
virResCtrlAvailable: If resctrl interface exist on host virResCtrlGet: get specify type resource contral information
"get specific resource control information" ???
virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Signed-off-by: Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> --- include/libvirt/virterror.h | 1 + po/POTFILES.in (http://POTFILES.in) | 1 + src/Makefile.am (http://Makefile.am) | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virresctrl.c | 343 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 80 +++++++++++ 7 files changed, 431 insertions(+) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..80481bc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,343 @@ +/* + * virresctrl.c: methods for managing resource contral + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ +#include <config.h> + +#include <sys/ioctl.h> +#if defined HAVE_SYS_SYSCALL_H +# include <sys/syscall.h>
What do you need syscall.h for?
+#endif +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "nodeinfo.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" +
If you need SYSFS_..._PATH for anything, it probably could be split into other src/util/ files. Example below.
+#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN ( 10 * 1024 * 1024) + + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + + +static int virResCtrlGetCPUValue(const char *path, char **value)
It would be more consistent if you reused parts of virHostCPUGetValue(), put them in a function, and use that one in both this one an the original one. It chould be also put in the virhostcpu.c since that's about the host cpu.
+static int virResctrlGetCPUSocketID(const size_t cpu, int *socket_id)
No need for this function, just use virHostCPUParseSocket()
+static int virResCtrlGetCPUCache(const size_t cpu, int type, int *cache)
So we have some places in the code that get info from sysfs. I understand that cache is controlled in the resctrl, but one doesn't have to have resctrl to get some cache info, so I would move this function into virhostcpu.c and keep here only the stuff strictly related to resource control.
+{ + int ret = -1; + char *cache_dir = NULL; + char *cache_str = NULL; + char *tmp; + int carry = -1; + + if (virAsprintf(&cache_dir, + "%s/cpu/cpu%zu/cache/index%d/size", + SYSFS_SYSTEM_PATH, cpu, type) < 0) + return -1; + + if (virResCtrlGetCPUValue(cache_dir, &cache_str) < 0) + goto cleanup; + + tmp = cache_str; + + while (*tmp != '\0') tmp++; + + if (*(tmp - 1) == 'K') { + *(tmp - 1) = '\0'; + carry = 1; + } else if (*(tmp - 1) == 'M') { + *(tmp - 1) = '\0'; + carry = 1024; + } + + if (virStrToLong_i(cache_str, NULL, 0, cache) < 0) + goto cleanup; + + *cache = (*cache) * carry; + + if (*cache < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(cache_dir); + VIR_FREE(cache_str); + return ret; +} +
Why all this fuzz? You should instead pass pointer to virStrToLong_i to get where the number ends and then use virScaleInteger() to multiply it properly.
+/* Fill all cache bank informations */ +static virResCacheBankPtr virResCtrlGetCacheBanks(int type, int *n_sockets) +{
Still could be in virhostcpu.c
+ int npresent_cpus; + int idx = -1; + size_t i; + virResCacheBankPtr bank; + + *n_sockets = 1; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + if (type == VIR_RDT_RESOURCE_L3 + || type == VIR_RDT_RESOURCE_L3DATA + || type == VIR_RDT_RESOURCE_L3CODE) + idx = 3; + else if (type == VIR_RDT_RESOURCE_L2) + idx = 2; + + if (idx == -1) + return NULL; +
Indentation, "||" should be on the previous line but, most importantly, why not switch? That'd make sure you won't miss any enum value and if someone adds a new one, compilator will see it's missing here.
+ if (VIR_ALLOC_N(bank, *n_sockets) < 0) { + *n_sockets = 0;
set this before the first return so that this function guarantees n_sockets to be 0 on fail. Moreover, n_sockets is always set to 1 here. Due to the way the rest of the function is designed, this doesn't have to be here at all.
+ return NULL; + } + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if (virResctrlGetCPUSocketID(i, &s_id) < 0) + goto error; + + if (s_id > (*n_sockets - 1)) { + size_t cur = *n_sockets; + size_t exp = s_id - (*n_sockets) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *n_sockets = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if (virResCtrlGetCPUCache(i, idx, &cache_size) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / resctrlall[type].cbm_len; + } + } + return bank; +
Wouldn't it be easier to have list of virResCacheBankPtr *banks; and size_t nbanks; and then just populate each pointer when that socket is on the system, so that you that NULL means the socket info was not filled yet. Or just a list that isn't sparse (like yours is now). The logic here seems hard to read.
I'll continue the review tomorrow.
Martin
+ error: + *n_sockets = 0; + VIR_FREE(bank); + return NULL; +} + +static int virResCtrlGetConfig(int type)
I feel like 'Get' implies you're returning it, I'd go with 'Read' instead.
Done
+{ + int ret; + size_t i; + char *str; + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids + */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid) < 0) + return -1;
You leak str here ^^
Done
+ + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits) < 0) + return -1;
Same here
Done
+ + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + return ret; + + /* Calculate cbm length from the default cbm_mask. */
The comment could say the mask is in hex and that's why strlen(s) * 4 is OK.
Question: It can never be, for example, 38 bits or just 2? Just asking to be sure.
+ resctrlall[type].cbm_len = strlen(str) * 4; + VIR_FREE(str); + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virResCtrlGetCacheBanks(type, + &(resctrlall[type].num_banks)); + + if (resctrlall[type].cache_banks == NULL) + return -1; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /*L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++;
Shouldn't this be done only if CDP is not supported?
Not really, here ’s tricky. only host id for VIR_RDT_RESOURCE_L3CODE set to VIR_RDT_RESOURCE_L3DATA
+ } + + resctrlall[type].enabled = true; + + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + VIR_ERROR(_("Failed to initialize resource control config")); + return -1; + } + if (virFileExists(tmp)) { + if ((rc = virResCtrlGetConfig(i)) < 0) + VIR_ERROR(_("Failed to get resource control config"));
virReportError should be used here, VIR_ERROR has a specific use case that's not applicable here.
Done.
+ return -1;
tmp leaks here
Done.
+ } + + VIR_FREE(tmp); + } + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */
That's easy to achieve. If you only put the typedef into the header file and accessors into this file, then no other code will be able to modify the data because the pointer will be opaque. Even if it is modifiable, that should be the way to go.
I tried to move accessors to .c file, but failed, I can not even reference it outside.
+virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..aa113f4 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,80 @@ +/* + * * virresctrl.h: methods for managing rscctrl + * * + * * Copyright (C) 2016 Intel, Inc. + * * + * * This library is free software; you can redistribute it and/or + * * modify it under the terms of the GNU Lesser General Public + * * License as published by the Free Software Foundation; either + * * version 2.1 of the License, or (at your option) any later version. + *
Too many asterisks. Done.
+ * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virutil.h"
What do you need virutil in the header file for?
Removed
+# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, 17 February 2017 at 1:03 AM, Martin Kletzander wrote:
On Thu, Feb 16, 2017 at 05:35:12PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
One tiny nitpick, but it might actually help you as well. You can use -v7 parameter to git send-email or git format-patch to automatically add 'v7' to the subject-prefix keeping the "PATCH" in there. Not only could this be easier for you, but people like me, who filter patches and other mails on the list to different folders, would have these properly categorized.
Anyway, for the review now.
Thx
virResCtrlAvailable: If resctrl interface exist on host virResCtrlGet: get specify type resource contral information
"get specific resource control information" ???
Yep
virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Signed-off-by: Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> --- include/libvirt/virterror.h | 1 + po/POTFILES.in (http://POTFILES.in) | 1 + src/Makefile.am (http://Makefile.am) | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virresctrl.c | 343 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 80 +++++++++++ 7 files changed, 431 insertions(+) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..80481bc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,343 @@ +/* + * virresctrl.c: methods for managing resource contral + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ +#include <config.h> + +#include <sys/ioctl.h> +#if defined HAVE_SYS_SYSCALL_H +# include <sys/syscall.h>
What do you need syscall.h for?
Removed
+#endif +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "nodeinfo.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" +
If you need SYSFS_..._PATH for anything, it probably could be split into other src/util/ files. Example below.
+#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN ( 10 * 1024 * 1024) + + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + + +static int virResCtrlGetCPUValue(const char *path, char **value)
It would be more consistent if you reused parts of virHostCPUGetValue(), put them in a function, and use that one in both this one an the original one. It chould be also put in the virhostcpu.c since that's about the host cpu.
Done
+static int virResctrlGetCPUSocketID(const size_t cpu, int *socket_id)
No need for this function, just use virHostCPUParseSocket()
virHostCPUParseSocket is a static function, I created a new one.
+static int virResCtrlGetCPUCache(const size_t cpu, int type, int *cache)
So we have some places in the code that get info from sysfs. I understand that cache is controlled in the resctrl, but one doesn't have to have resctrl to get some cache info, so I would move this function into virhostcpu.c and keep here only the stuff strictly related to resource control.
Done
+{ + int ret = -1; + char *cache_dir = NULL; + char *cache_str = NULL; + char *tmp; + int carry = -1; + + if (virAsprintf(&cache_dir, + "%s/cpu/cpu%zu/cache/index%d/size", + SYSFS_SYSTEM_PATH, cpu, type) < 0) + return -1; + + if (virResCtrlGetCPUValue(cache_dir, &cache_str) < 0) + goto cleanup; + + tmp = cache_str; + + while (*tmp != '\0') tmp++; + + if (*(tmp - 1) == 'K') { + *(tmp - 1) = '\0'; + carry = 1; + } else if (*(tmp - 1) == 'M') { + *(tmp - 1) = '\0'; + carry = 1024; + } + + if (virStrToLong_i(cache_str, NULL, 0, cache) < 0) + goto cleanup; + + *cache = (*cache) * carry; + + if (*cache < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(cache_dir); + VIR_FREE(cache_str); + return ret; +} +
Why all this fuzz? You should instead pass pointer to virStrToLong_i to get where the number ends and then use virScaleInteger() to multiply it properly.
Good to know, thx. done.
+/* Fill all cache bank informations */ +static virResCacheBankPtr virResCtrlGetCacheBanks(int type, int *n_sockets) +{
Still could be in virhostcpu.c
okay, will move it to there.
+ int npresent_cpus; + int idx = -1; + size_t i; + virResCacheBankPtr bank; + + *n_sockets = 1; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + if (type == VIR_RDT_RESOURCE_L3 + || type == VIR_RDT_RESOURCE_L3DATA + || type == VIR_RDT_RESOURCE_L3CODE) + idx = 3; + else if (type == VIR_RDT_RESOURCE_L2) + idx = 2; + + if (idx == -1) + return NULL; +
Indentation, "||" should be on the previous line but, most importantly, why not switch? That'd make sure you won't miss any enum value and if someone adds a new one, compilator will see it's missing here.
Done
+ if (VIR_ALLOC_N(bank, *n_sockets) < 0) { + *n_sockets = 0;
set this before the first return so that this function guarantees n_sockets to be 0 on fail. Moreover, n_sockets is always set to 1 here. Due to the way the rest of the function is designed, this doesn't have to be here at all.
Done.
+ return NULL; + } + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if (virResctrlGetCPUSocketID(i, &s_id) < 0) + goto error; + + if (s_id > (*n_sockets - 1)) { + size_t cur = *n_sockets; + size_t exp = s_id - (*n_sockets) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *n_sockets = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if (virResCtrlGetCPUCache(i, idx, &cache_size) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / resctrlall[type].cbm_len; + } + } + return bank; +
Wouldn't it be easier to have list of virResCacheBankPtr *banks; and size_t nbanks; and then just populate each pointer when that socket is on the system, so that you that NULL means the socket info was not filled yet. Or just a list that isn't sparse (like yours is now). The logic here seems hard to read.
:( sorry, I don’t get you Are you saying pre-allocate memory for virResCacheBankPtr? here nbanks are equal to sockets. I can not know how many nbanks(sockets) on the host before virResCtrlGetCacheBanks. can you show me some code examples?
I'll continue the review tomorrow.
Martin
+ error: + *n_sockets = 0; + VIR_FREE(bank); + return NULL; +} + +static int virResCtrlGetConfig(int type) +{ + int ret; + size_t i; + char *str; + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids + */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid) < 0) + return -1; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + return ret; + + if (virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits) < 0) + return -1; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask + */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + return ret; + + /* Calculate cbm length from the default cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + VIR_FREE(str); + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virResCtrlGetCacheBanks(type, + &(resctrlall[type].num_banks)); + + if (resctrlall[type].cache_banks == NULL) + return -1; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /*L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + VIR_ERROR(_("Failed to initialize resource control config")); + return -1; + } + if (virFileExists(tmp)) { + if ((rc = virResCtrlGetConfig(i)) < 0) + VIR_ERROR(_("Failed to get resource control config")); + return -1; + } + + VIR_FREE(tmp); + } + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..aa113f4 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,80 @@ +/* + * * virresctrl.h: methods for managing rscctrl + * * + * * Copyright (C) 2016 Intel, Inc. + * * + * * This library is free software; you can redistribute it and/or + * * modify it under the terms of the GNU Lesser General Public + * * License as published by the Free Software Foundation; either + * * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virutil.h" +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1

This patch expose cache information to host's capabilites xml. For l3 cache allocation <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> </cache> For l3 cache allocation supported cdp(seperate data/code): <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/> </bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/> </bank> </cache> RFC on mailing list. https://www.redhat.com/archives/libvir-list/2017-January/msg00644.html Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/capabilities.c | 56 ++++++++++++++++++++++++++++++++++++++ src/conf/capabilities.h | 23 ++++++++++++++++ src/libvirt_private.syms | 3 +++ src/nodeinfo.c | 64 ++++++++++++++++++++++++++++++++++++++++++++ src/nodeinfo.h | 1 + src/qemu/qemu_capabilities.c | 8 ++++++ src/qemu/qemu_driver.c | 4 +++ 7 files changed, 159 insertions(+) diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 9ab343b..23e21e4 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,18 @@ virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) } static void +virCapabilitiesClearCacheBank(virCapsHostCacheBankPtr cachebank) +{ + size_t i; + for (i = 0; i < cachebank->ncontrol; i++) + VIR_FREE(cachebank->control[i].scope); + + VIR_FREE(cachebank->type); + VIR_FREE(cachebank->cpus); +} + + +static void virCapabilitiesDispose(void *object) { virCapsPtr caps = object; @@ -221,6 +233,10 @@ virCapabilitiesDispose(void *object) virCapabilitiesClearSecModel(&caps->host.secModels[i]); VIR_FREE(caps->host.secModels); + for (i = 0; i < caps->host.ncachebank; i++) + virCapabilitiesClearCacheBank(caps->host.cachebank[i]); + VIR_FREE(caps->host.cachebank); + VIR_FREE(caps->host.netprefix); VIR_FREE(caps->host.pagesSize); virCPUDefFree(caps->host.cpu); @@ -844,6 +860,41 @@ virCapabilitiesFormatNUMATopology(virBufferPtr buf, return 0; } +static int +virCapabilitiesFormatCache(virBufferPtr buf, + size_t ncachebank, + virCapsHostCacheBankPtr *cachebank) +{ + size_t i; + size_t j; + + virBufferAddLit(buf, "<cache>\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < ncachebank; i++) { + virBufferAsprintf(buf, + "<bank id='%u' type='%s' size='%llu' unit='KiB' cpus='%s'>\n", + cachebank[i]->id, + cachebank[i]->type, + cachebank[i]->size, + cachebank[i]->cpus); + + virBufferAdjustIndent(buf, 2); + for (j = 0; j < cachebank[i]->ncontrol; j++) { + virBufferAsprintf(buf, + "<control min='%llu' reserved='%llu' unit='KiB' scope='%s'/>\n", + cachebank[i]->control[j].min, + cachebank[i]->control[j].reserved, + cachebank[i]->control[j].scope); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</bank>\n"); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</cache>\n"); + return 0; +} + /** * virCapabilitiesFormatXML: * @caps: capabilities to format @@ -931,6 +982,11 @@ virCapabilitiesFormatXML(virCapsPtr caps) virBufferAddLit(&buf, "</migration_features>\n"); } + if (caps->host.ncachebank && + virCapabilitiesFormatCache(&buf, caps->host.ncachebank, + caps->host.cachebank) < 0) + return NULL; + if (caps->host.netprefix) virBufferAsprintf(&buf, "<netprefix>%s</netprefix>\n", caps->host.netprefix); diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index cfdc34a..b446de5 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -138,6 +138,25 @@ struct _virCapsHostSecModel { virCapsHostSecModelLabelPtr labels; }; +typedef struct _virCapsHostCacheControl virCapsHostCacheControl; +typedef virCapsHostCacheControl *virCapsHostCacheControlPtr; +struct _virCapsHostCacheControl { + unsigned long long min; + unsigned long long reserved; + char* scope; +}; + +typedef struct _virCapsHostCacheBank virCapsHostCacheBank; +typedef virCapsHostCacheBank *virCapsHostCacheBankPtr; +struct _virCapsHostCacheBank { + unsigned int id; + char* type; + char* cpus; + unsigned long long size; + size_t ncontrol; + virCapsHostCacheControlPtr control; +}; + typedef struct _virCapsHost virCapsHost; typedef virCapsHost *virCapsHostPtr; struct _virCapsHost { @@ -160,6 +179,10 @@ struct _virCapsHost { size_t nsecModels; virCapsHostSecModelPtr secModels; + size_t ncachebank; + size_t ncachebank_max; + virCapsHostCacheBankPtr *cachebank; + char *netprefix; virCPUDefPtr cpu; int nPagesSize; /* size of pagesSize array */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 743e5ac..cc6c433 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1123,6 +1123,7 @@ virLogManagerNew; # nodeinfo.h nodeCapsInitNUMA; nodeGetInfo; +virCapsInitCache; virHostCPUGetCount; virHostCPUGetKVMMaxVCPUs; virHostCPUGetMap; @@ -2315,8 +2316,10 @@ virRandomInt; # util/virresctrl.h virResCtrlAvailable; +virResCtrlGet; virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/nodeinfo.c b/src/nodeinfo.c index f2ded02..a001cd0 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -48,6 +48,7 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_NONE @@ -416,3 +417,66 @@ nodeCapsInitNUMA(virCapsPtr caps) VIR_FREE(pageinfo); return ret; } + +int +virCapsInitCache(virCapsPtr caps) +{ + size_t i, j; + virResCtrlPtr resctrl; + virCapsHostCacheBankPtr bank; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + /* L3DATA and L3CODE share L3 resources */ + if (i == VIR_RDT_RESOURCE_L3CODE) + continue; + + resctrl = virResCtrlGet(i); + + if (resctrl->enabled) { + for (j = 0; j < resctrl->num_banks; j++) { + if (VIR_RESIZE_N(caps->host.cachebank, caps->host.ncachebank_max, + caps->host.ncachebank, 1) < 0) + return -1; + + if (VIR_ALLOC(bank) < 0) + return -1; + + bank->id = resctrl->cache_banks[j].host_id; + if (VIR_STRDUP(bank->type, resctrl->cache_level) < 0) + goto err; + if (VIR_STRDUP(bank->cpus, virBitmapFormat(resctrl->cache_banks[j].cpu_mask)) < 0) + goto err; + bank->size = resctrl->cache_banks[j].cache_size; + /*L3DATA and L3CODE shares L3 cache resources, so fill them to the control element*/ + if (i == VIR_RDT_RESOURCE_L3DATA) { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 2) < 0) + goto err; + + bank->control[0].min = virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->min_cbm_bits); + if (VIR_STRDUP(bank->control[0].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->name) < 0) + goto err; + + bank->control[1].min = virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->cache_banks[j].cache_min; + bank->control[1].reserved = bank->control[1].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->min_cbm_bits); + if (VIR_STRDUP(bank->control[1].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->name) < 0) + goto err; + } else { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 1) < 0) + goto err; + bank->control[0].min = resctrl->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * resctrl->min_cbm_bits; + if (VIR_STRDUP(bank->control[0].scope, resctrl->name) < 0) + goto err; + } + caps->host.cachebank[caps->host.ncachebank++] = bank; + } + } + } + return 0; + err: + VIR_FREE(bank); + return -1; +} diff --git a/src/nodeinfo.h b/src/nodeinfo.h index 3c4dc46..5eb0f83 100644 --- a/src/nodeinfo.h +++ b/src/nodeinfo.h @@ -28,5 +28,6 @@ int nodeGetInfo(virNodeInfoPtr nodeinfo); int nodeCapsInitNUMA(virCapsPtr caps); +int virCapsInitCache(virCapsPtr caps); #endif /* __VIR_NODEINFO_H__*/ diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 3247d25..662a9ed 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1098,6 +1098,11 @@ virQEMUCapsInitCPU(virCapsPtr caps, goto cleanup; } +static int +virQEMUCapsInitCache(virCapsPtr caps) +{ + return virCapsInitCache(caps); +} static int virQEMUCapsInitPages(virCapsPtr caps) @@ -1144,6 +1149,9 @@ virCapsPtr virQEMUCapsInit(virQEMUCapsCachePtr cache) if (virQEMUCapsInitCPU(caps, hostarch) < 0) VIR_WARN("Failed to get host CPU"); + if (virQEMUCapsInitCache(caps) < 0) + VIR_WARN("Failed to get host cache"); + /* Add the power management features of the host */ if (virNodeSuspendGetTargetMask(&caps->host.powerMgmt) < 0) VIR_WARN("Failed to get host power management capabilities"); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 37ccfdf..7995511 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -105,6 +105,7 @@ #include "vircgroup.h" #include "virperf.h" #include "virnuma.h" +#include "virresctrl.h" #include "dirname.h" #include "network/bridge_driver.h" @@ -849,6 +850,9 @@ qemuStateInitialize(bool privileged, run_gid = cfg->group; } + if (virResCtrlAvailable() && virResCtrlInit() < 0) + VIR_WARN("Faild to initialize resource control."); + qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, cfg->cacheDir, run_uid, -- 1.9.1

This patch adds new xml element to support cache tune as: <cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB' vcpus='1,2'/> ... </cputune> id: any non-minus number host_id: reference of the host's cache banks id, it's from capabilities type: cache bank type size: should be multiples of the min_size of the bank on host. vcpus: cache allocation on vcpu set, if empty, will apply the allocation on all vcpus --- docs/schemas/domaincommon.rng | 46 +++++++++++++ src/conf/domain_conf.c | 152 ++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 19 ++++++ src/util/virresctrl.c | 32 +++++++-- src/util/virresctrl.h | 1 + 5 files changed, 245 insertions(+), 5 deletions(-) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index cc6e0d0..edb2888 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -795,6 +795,32 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name="cachetune"> + <attribute name="id"> + <ref name="cacheid"/> + </attribute> + <attribute name="host_id"> + <ref name="hostid"/> + </attribute> + <attribute name="type"> + <ref name="cachetype"/> + </attribute> + <attribute name="size"> + <ref name="unsignedInt"/> + </attribute> + <optional> + <attribute name="unit"> + <ref name="cacheunit"/> + </attribute> + </optional> + <optional> + <attribute name="vcpus"> + <ref name="cpuset"/> + </attribute> + </optional> + </element> + </zeroOrMore> <optional> <element name="emulatorpin"> <attribute name="cpuset"> @@ -5451,6 +5477,26 @@ <param name="minInclusive">-1</param> </data> </define> + <define name="cacheid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="hostid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="cachetype"> + <data type="string"> + <param name="pattern">(l3)</param> + </data> + </define> + <define name="cacheunit"> + <data type="string"> + <param name="pattern">KiB</param> + </data> + </define> <!-- weight currently is in range [100, 1000] --> <define name="weight"> <data type="unsignedInt"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c06b128..430c451 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -56,6 +56,7 @@ #include "virstring.h" #include "virnetdev.h" #include "virhostdev.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -15604,6 +15605,127 @@ virDomainVcpuPinDefParseXML(virDomainDefPtr def, return ret; } +/* Parse the XML definition for cachetune + * and a cachetune has the form + * <cachetune id='0' host_id='0' type='l3' size='1024' unit='KiB'/> + */ +static int +virDomainCacheTuneDefParseXML(virDomainDefPtr def, + int n, + xmlNodePtr* nodes) +{ + char* tmp = NULL; + size_t i, j; + int type = -1; + virDomainCacheBankPtr bank = NULL; + virResCtrlPtr resctrl; + + if (VIR_ALLOC_N(bank, n) < 0) + goto cleanup; + + for (i = 0; i < n; i++) { + if (!(tmp = virXMLPropString(nodes[i], "id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache id '%s'"), tmp); + goto cleanup; + } + + VIR_FREE(tmp); + if (!(tmp = virXMLPropString(nodes[i], "host_id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing host id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].host_id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache host id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "size"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing size in cache tune")); + goto cleanup; + } + if (virStrToLong_ull(tmp, NULL, 10, &(bank[i].size)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache size '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "type"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("missing cache type")); + goto cleanup; + } + + if ((type = virResCtrlTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'unsupported cache type '%s'"), tmp); + goto cleanup; + } + + resctrl = virResCtrlGet(type); + + if (resctrl == NULL || !resctrl->enabled) { + virReportError(VIR_ERR_XML_ERROR, + _("'host doesn't enabled cache type '%s'"), tmp); + goto cleanup; + } + + bool found_host_id = false; + /* Loop for banks to search host_id */ + for (j = 0; j < resctrl->num_banks; j++) { + if (resctrl->cache_banks[j].host_id == bank[i].host_id) { + found_host_id = true; + break; + } + } + + if (! found_host_id) { + virReportError(VIR_ERR_XML_ERROR, + _("'cache bank's host id %u not found on the host"), + bank[i].host_id); + goto cleanup; + } + + if (bank[i].size == 0 || + bank[i].size % resctrl->cache_banks[j].cache_min != 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'the size should be multiples of '%llu'"), + resctrl->cache_banks[j].cache_min); + goto cleanup; + } + + if (VIR_STRDUP(bank[i].type, tmp) < 0) + goto cleanup; + + if ((tmp = virXMLPropString(nodes[i], "vcpus"))) { + if (virBitmapParse(tmp, &bank[i].vcpus, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(bank[i].vcpus)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'vcpus': %s"), tmp); + goto cleanup; + } + } + } + + def->cachetune.cache_banks = bank; + def->cachetune.n_banks = n; + return 0; + + cleanup: + VIR_FREE(bank); + VIR_FREE(tmp); + return -1; +} /* Parse the XML definition for a iothreadpin * and an iothreadspin has the form @@ -16882,6 +17004,14 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/cachetune", ctxt, &nodes)) < 0) + goto error; + + if (virDomainCacheTuneDefParseXML(def, n, nodes) < 0) + goto error; + + VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/emulatorpin", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cannot extract emulatorpin nodes")); @@ -23398,6 +23528,26 @@ virDomainSchedulerFormat(virBufferPtr buf, } +static void +virDomainCacheTuneDefFormat(virBufferPtr buf, + virDomainCachetunePtr cache) +{ + size_t i; + for (i = 0; i < cache->n_banks; i ++) { + virBufferAsprintf(buf, "<cachetune id='%u' host_id='%u' " + "type='%s' size='%llu' unit='KiB'", + cache->cache_banks[i].id, + cache->cache_banks[i].host_id, + cache->cache_banks[i].type, + cache->cache_banks[i].size); + + if (cache->cache_banks[i].vcpus) + virBufferAsprintf(buf, " vcpus='%s'/>\n", + virBitmapFormat(cache->cache_banks[i].vcpus)); + else + virBufferAddLit(buf, "/>\n"); + } +} static int virDomainCputuneDefFormat(virBufferPtr buf, @@ -23461,6 +23611,8 @@ virDomainCputuneDefFormat(virBufferPtr buf, VIR_FREE(cpumask); } + virDomainCacheTuneDefFormat(&childrenBuf, &def->cachetune); + if (def->cputune.emulatorpin) { char *cpumask; virBufferAddLit(&childrenBuf, "<emulatorpin "); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 507ace8..fa47fe4 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2132,6 +2132,23 @@ struct _virDomainMemtune { unsigned long long swap_hard_limit; /* in kibibytes, limit at off_t bytes */ }; +typedef struct _virDomainCacheBank virDomainCacheBank; +typedef virDomainCacheBank *virDomainCacheBankPtr; +struct _virDomainCacheBank { + unsigned int id; + unsigned int host_id; + unsigned long long size; + char *type; + virBitmapPtr vcpus; +}; + +typedef struct _virDomainCachetune virDomainCachetune; +typedef virDomainCachetune *virDomainCachetunePtr; +struct _virDomainCachetune { + size_t n_banks; + virDomainCacheBankPtr cache_banks; +}; + typedef struct _virDomainPowerManagement virDomainPowerManagement; typedef virDomainPowerManagement *virDomainPowerManagementPtr; @@ -2196,6 +2213,8 @@ struct _virDomainDef { virDomainCputune cputune; + virDomainCachetune cachetune; + virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 80481bc..b49e965 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -40,15 +40,37 @@ VIR_LOG_INIT("util.resctrl"); #define VIR_FROM_THIS VIR_FROM_RESCTRL - -#define RESCTRL_DIR "/sys/fs/resctrl" -#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" -#define SYSFS_SYSTEM_PATH "/sys/devices/system" - #define MAX_CPU_SOCKET_NUM 8 #define MAX_CBM_BIT_LEN 32 #define MAX_SCHEMATA_LEN 1024 #define MAX_FILE_LEN ( 10 * 1024 * 1024) +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +VIR_ENUM_IMPL(virResCtrl, VIR_RDT_RESOURCE_LAST, + "l3", "l3data", "l3code", "l2"); + +#define CONSTRUCT_RESCTRL_PATH(domain_name, item_name) \ +do { \ + if (NULL == domain_name) { \ + if (virAsprintf(&path, "%s/%s", RESCTRL_DIR, item_name) < 0)\ + return -1; \ + } else { \ + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_DIR, \ + domain_name, \ + item_name) < 0) \ + return -1; \ + } \ +} while (0) + +#define VIR_RESCTRL_ENABLED(type) \ + resctrlall[type].enabled + +#define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1) + +#define VIR_RESCTRL_SET_SCHEMATA(p, type, pos, val) \ + p->schematas[type]->schemata_items[pos] = val static unsigned int host_id; diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index aa113f4..3f6fcae 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -36,6 +36,7 @@ enum { VIR_RDT_RESOURCE_LAST, }; +VIR_ENUM_DECL(virResCtrl); typedef struct _virResCacheBank virResCacheBank; typedef virResCacheBank *virResCacheBankPtr; -- 1.9.1

virResCtrlSetCacheBanks: Set cache banks of a libvirt domain. It will create new resource domain under `/sys/fs/resctrl` and fill the schemata according the cache banks configration. virResCtrlUpdate: Update the schemata after libvirt domain destroy. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/libvirt_private.syms | 6 +- src/util/virresctrl.c | 704 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virresctrl.h | 6 +- 3 files changed, 710 insertions(+), 6 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index cc6c433..85b9e94 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2318,7 +2318,11 @@ virRandomInt; virResCtrlAvailable; virResCtrlGet; virResCtrlInit; - +virResCtrlSetCacheBanks; +virResCtrlTypeFromString; +virResCtrlTypeToString; +virResCtrlUpdate; +# # util/virrotatingfile.h virRotatingFileReaderConsume; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index b49e965..3da3ec0 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -72,9 +72,59 @@ do { \ #define VIR_RESCTRL_SET_SCHEMATA(p, type, pos, val) \ p->schematas[type]->schemata_items[pos] = val +/** + * a virResSchemata represents a schemata object under a resource control + * domain. + */ +typedef struct _virResSchemataItem virResSchemataItem; +typedef virResSchemataItem *virResSchemataItemPtr; +struct _virResSchemataItem { + unsigned int socket_no; + unsigned schemata; +}; + +typedef struct _virResSchemata virResSchemata; +typedef virResSchemata *virResSchemataPtr; +struct _virResSchemata { + unsigned int n_schemata_items; + virResSchemataItemPtr schemata_items; +}; + +/** + * a virResDomain represents a resource control domain. It's a double linked + * list. + */ + +typedef struct _virResDomain virResDomain; +typedef virResDomain *virResDomainPtr; + +struct _virResDomain { + char *name; + virResSchemataPtr schematas[VIR_RDT_RESOURCE_LAST]; + char **tasks; + size_t n_tasks; + size_t n_sockets; + virResDomainPtr pre; + virResDomainPtr next; +}; + +/* All resource control domains on this host*/ + +typedef struct _virResCtrlDomain virResCtrlDomain; +typedef virResCtrlDomain *virResCtrlDomainPtr; +struct _virResCtrlDomain { + unsigned int num_domains; + virResDomainPtr domains; +}; + static unsigned int host_id; +/* Global static struct to be maintained which is a interface */ +static virResCtrlDomain domainall; + +/* Global static struct array to be maintained which indicate + * resource status on a host */ static virResCtrl resctrlall[] = { { .name = "L3", @@ -94,6 +144,66 @@ static virResCtrl resctrlall[] = { }, }; +/* + * How many bits is set in schemata + * eg: + * virResCtrlBitsNum(1011) = 2 */ +static int virResCtrlBitsContinuesNum(unsigned schemata) +{ + size_t i; + int ret = 0; + for (i = 0; i < MAX_CBM_BIT_LEN; i ++) { + if ((schemata & 0x1) == 0x1) + ret++; + else + break; + schemata = schemata >> 1; + if (schemata == 0) break; + } + return ret; +} + +static int virResCtrlGetStr(const char *domain_name, const char *item_name, char **ret) +{ + char *path; + int rc = 0; + + CONSTRUCT_RESCTRL_PATH(domain_name, item_name); + + if (virFileReadAll(path, MAX_FILE_LEN, ret) < 0) { + rc = -1; + goto cleanup; + } + + cleanup: + VIR_FREE(path); + return rc; +} + +static int virResCtrlGetTasks(const char *domain_name, char **pids) +{ + return virResCtrlGetStr(domain_name, "tasks", pids); +} + +static int virResCtrlGetSchemata(const int type, const char *name, char **schemata) +{ + int rc; + char *tmp, *end; + char *buf; + + if ((rc = virResCtrlGetStr(name, "schemata", &buf)) < 0) + return rc; + + tmp = strstr(buf, resctrlall[type].name); + end = strchr(tmp, '\n'); + *end = '\0'; + if (VIR_STRDUP(*schemata, tmp) < 0) + rc = -1; + + VIR_FREE(buf); + return rc; +} + static int virResCtrlGetInfoStr(const int type, const char *item, char **str) { int ret = 0; @@ -114,6 +224,70 @@ static int virResCtrlGetInfoStr(const int type, const char *item, char **str) return ret; } +/* Return pointer of and ncount of schemata*/ +static virResSchemataPtr virParseSchemata(const char *schemata_str, size_t *ncount) +{ + const char *p, *q; + int pos; + int ischemata; + virResSchemataPtr schemata; + virResSchemataItemPtr schemataitems, tmpitem; + unsigned int socket_no = 0; + char *tmp; + + if (VIR_ALLOC(schemata) < 0) + goto cleanup; + + p = q = schemata_str; + pos = strchr(schemata_str, ':') - p; + + /* calculate cpu socket count */ + *ncount = 1; + while ((q = strchr(p, ';')) != 0) { + p = q + 1; + (*ncount)++; + } + + /* allocat an arrry to store schemata for each socket*/ + if (VIR_ALLOC_N_QUIET(tmpitem, *ncount) < 0) + goto cleanup; + + schemataitems = tmpitem; + + p = q = schemata_str + pos + 1; + + while (*p != '\0') { + if (*p == '=') { + q = p + 1; + + tmpitem->socket_no = socket_no++; + + while (*p != ';' && *p != '\0') p++; + + if (VIR_STRNDUP(tmp, q, p-q) < 0) + goto cleanup; + + if (virStrToLong_i(tmp, NULL, 16, &ischemata) < 0) + goto cleanup; + + VIR_FREE(tmp); + tmp = NULL; + tmpitem->schemata = ischemata; + tmpitem ++; + schemata->n_schemata_items += 1; + } + p++; + } + + schemata->schemata_items = schemataitems; + return schemata; + + cleanup: + VIR_FREE(schemata); + VIR_FREE(tmpitem); + return NULL; +} + static int virResCtrlGetCPUValue(const char *path, char **value) { @@ -251,6 +425,7 @@ static virResCacheBankPtr virResCtrlGetCacheBanks(int type, int *n_sockets) bank[s_id].cache_size = cache_size; bank[s_id].cache_min = cache_size / resctrlall[type].cbm_len; + bank[s_id].cache_left = cache_size - (bank[s_id].cache_min * resctrlall[type].min_cbm_bits); } } return bank; @@ -267,7 +442,7 @@ static int virResCtrlGetConfig(int type) size_t i; char *str; - /* Read min_cbm_bits from resctrl. + /* Read num_closids from resctrl. eg: /sys/fs/resctrl/info/L3/num_closids */ if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) @@ -320,6 +495,524 @@ static int virResCtrlGetConfig(int type) return ret; } +/* Remove the Domain from sysfs, this should only success no pids in tasks + * of a partition. + */ +static +int virResCtrlRemoveDomain(const char *name) +{ + char *path = NULL; + int rc = 0; + + if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) + return rc; + rc = rmdir(path); + VIR_FREE(path); + return rc; +} + +static +int virResCtrlDestroyDomain(virResDomainPtr p) +{ + size_t i; + int rc; + if ((rc = virResCtrlRemoveDomain(p->name)) < 0) + VIR_WARN("Failed to removed partition %s", p->name); + + VIR_FREE(p->name); + p->name = NULL; + for (i = 0; i < p->n_tasks; i ++) + VIR_FREE(p->tasks[i]); + VIR_FREE(p); + p = NULL; + return rc; +} + + +/* assemble schemata string*/ +static +char* virResCtrlAssembleSchemata(virResSchemataPtr schemata, int type) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + size_t i; + + virBufferAsprintf(&buf, "%s:%u=%x", resctrlall[type].name, + schemata->schemata_items[0].socket_no, + schemata->schemata_items[0].schemata); + + for (i = 1; i < schemata->n_schemata_items; i++) { + virBufferAsprintf(&buf, ";%u=%x", + schemata->schemata_items[i].socket_no, + schemata->schemata_items[i].schemata); + } + + return virBufferContentAndReset(&buf); +} + +/* Refresh default domains' schemata + */ +static +int virResCtrlRefreshSchemata(void) +{ + size_t i, j, k; + unsigned int tmp_schemata; + unsigned int default_schemata; + + virResDomainPtr header, p; + + header = domainall.domains; + + if (!header) + return 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + for (j = 0; j < header->schematas[i]->n_schemata_items; j ++) { + p = header->next; + default_schemata = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].cbm_len); + tmp_schemata = 0; + /* NOTEs: if only header domain, the schemata will be set to default one*/ + for (k = 1; k < domainall.num_domains; k++) { + tmp_schemata |= p->schematas[i]->schemata_items[j].schemata; + p = p->next; + } + /* sys fs doens't let us use 0 */ + int min_bits = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].min_cbm_bits); + if ((tmp_schemata & min_bits) == min_bits) + tmp_schemata -= min_bits; + + default_schemata ^= tmp_schemata; + + int bitsnum = virResCtrlBitsContinuesNum(default_schemata); + // calcuate header's schemata + // NOTES: resctrl sysfs only allow us to set a continues schemata + header->schematas[i]->schemata_items[j].schemata = VIR_RESCTRL_GET_SCHEMATA(bitsnum); + resctrlall[i].cache_banks[j].cache_left = + (bitsnum - resctrlall[i].min_cbm_bits) * resctrlall[i].cache_banks[j].cache_min; + } + } + } + + return 0; + +} + +/* Refresh all domains', remove the domains which has no task ids. + * This will be used after VM pause, restart, destroy etc. + */ +static int +virResCtrlRefresh(void) +{ + size_t i; + char* tasks; + unsigned int origin_count = domainall.num_domains; + virResDomainPtr p, pre, del; + pre = domainall.domains; + p = del = NULL; + if (pre) + p = pre->next; + + for (i = 1; i < origin_count && p; i++) { + if (virResCtrlGetTasks(p->name, &tasks) < 0) { + VIR_WARN("Failed to get tasks from %s", p->name); + pre = p; + p = p->next; + } + if (virStringIsEmpty(tasks)) { + pre->next = p->next; + if (p->next != NULL) + p->next->pre = pre; + + del = p; + p = p->next; + virResCtrlDestroyDomain(del); + domainall.num_domains -= 1; + } else { + pre = p; + p = p->next; + } + VIR_FREE(tasks); + + } + + return virResCtrlRefreshSchemata(); +} + +/* Get a domain ptr by domain's name*/ +static +virResDomainPtr virResCtrlGetDomain(const char* name) { + size_t i; + virResDomainPtr p = domainall.domains; + for (i = 0; i < domainall.num_domains; i++) { + if ((p->name) && STREQ(name, p->name)) + return p; + p = p->next; + } + return NULL; +} + +static int +virResCtrlAddTask(virResDomainPtr dom, pid_t pid) +{ + size_t maxtasks; + + if (!dom->tasks) { + if (VIR_ALLOC_N(dom->tasks, 1) < 0) + return -1; + } else { + if (VIR_RESIZE_N(dom->tasks, maxtasks, dom->n_tasks, 1) < 0) + return -1; + } + + if (virAsprintf(&(dom->tasks[dom->n_tasks]), "%llu", (long long)pid) < 0) + return -1; + + dom->n_tasks += 1; + return 0; +} + +static int +virResCtrlWrite(const char *name, const char *item, const char *content) +{ + char *path; + int writefd; + int rc = -1; + + CONSTRUCT_RESCTRL_PATH(name, item); + + if (!virFileExists(path)) + goto cleanup; + + if ((writefd = open(path, O_WRONLY | O_APPEND, S_IRUSR | S_IWUSR)) < 0) + goto cleanup; + + if (safewrite(writefd, content, strlen(content)) < 0) + goto cleanup; + + rc = 0; + cleanup: + VIR_FREE(path); + VIR_FORCE_CLOSE(writefd); + return rc; +} + +/* if name == NULL we load default schemata */ +static +virResDomainPtr virResCtrlLoadDomain(const char *name) +{ + char *schematas; + virResDomainPtr p; + size_t i; + + if (VIR_ALLOC(p) < 0) + goto cleanup; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + if (virResCtrlGetSchemata(i, name, &schematas) < 0) + goto cleanup; + p->schematas[i] = virParseSchemata(schematas, &(p->n_sockets)); + VIR_FREE(schematas); + } + } + + p->tasks = NULL; + p->n_tasks = 0; + + if ((name != NULL) && (VIR_STRDUP(p->name, name)) < 0) + goto cleanup; + + return p; + + cleanup: + VIR_FREE(p); + return NULL; +} + +static +virResDomainPtr virResCtrlCreateDomain(const char *name) +{ + char *path; + mode_t mode = 0755; + virResDomainPtr p; + size_t i, j; + if (virAsprintf(&path, "%s/%s", RESCTRL_DIR, name) < 0) + return NULL; + + if (virDirCreate(path, mode, 0, 0, 0) < 0) + goto cleanup; + + if ((p = virResCtrlLoadDomain(name)) == NULL) + return p; + + /* sys fs doens't let us use 0. + * reset schemata to min_bits*/ + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + int min_bits = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].min_cbm_bits); + for (j = 0; j < p->n_sockets; j++) + p->schematas[i]->schemata_items[j].schemata = min_bits; + } + } + + VIR_FREE(path); + return p; + + cleanup: + VIR_FREE(path); + return NULL; +} + +/* flush domains's information to sysfs*/ +static int +virResCtrlFlushDomainToSysfs(virResDomainPtr dom) +{ + size_t i; + char* schemata; + char* tmp; + int rc = -1; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + tmp = virResCtrlAssembleSchemata(dom->schematas[i], i); + virBufferAsprintf(&buf, "%s\n", tmp); + VIR_FREE(tmp); + } + } + + schemata = virBufferContentAndReset(&buf); + + if (virResCtrlWrite(dom->name, "schemata", schemata) < 0) + goto cleanup; + + if (dom->n_tasks > 0) { + for (i = 0; i < dom->n_tasks; i++) { + if (virResCtrlWrite(dom->name, "tasks", dom->tasks[i]) < 0) + goto cleanup; + } + } + + rc = 0; + + cleanup: + VIR_FREE(schemata); + return rc; +} + +static virResDomainPtr virResCtrlGetAllDomains(unsigned int *len) +{ + struct dirent *ent; + DIR *dp = NULL; + int direrr; + + *len = 0; + virResDomainPtr header, tmp, tmp_pre; + header = tmp = tmp_pre = NULL; + if (virDirOpenQuiet(&dp, RESCTRL_DIR) < 0) { + if (errno == ENOENT) + return NULL; + VIR_ERROR(_("Unable to open %s (%d)"), RESCTRL_DIR, errno); + goto cleanup; + } + + header = virResCtrlLoadDomain(NULL); + if (header == NULL) + goto cleanup; + + header->next = NULL; + + *len = 1; + + while ((direrr = virDirRead(dp, &ent, NULL)) > 0) { + if ((ent->d_type != DT_DIR) || STREQ(ent->d_name, "info")) + continue; + + tmp = virResCtrlLoadDomain(ent->d_name); + if (tmp == NULL) + goto cleanup; + + tmp->next = NULL; + + if (header->next == NULL) + header->next = tmp; + + if (tmp_pre == NULL) { + tmp->pre = header; + } else { + tmp->pre = tmp_pre; + tmp_pre->next = tmp; + } + + tmp_pre = tmp; + (*len) ++; + } + return header; + + cleanup: + VIR_DIR_CLOSE(dp); + tmp_pre = tmp = header; + while (tmp) { + tmp_pre = tmp; + tmp = tmp->next; + VIR_FREE(tmp_pre); + } + return NULL; +} + +static int +virResCtrlAppendDomain(virResDomainPtr dom) +{ + virResDomainPtr p = domainall.domains; + while (p->next != NULL) p = p->next; + p->next = dom; + dom->pre = p; + domainall.num_domains += 1; + return 0; +} + +static int +virResCtrlGetSocketIdByHostID(int type, unsigned int hostid) +{ + size_t i; + for (i = 0; i < resctrlall[type].num_banks; i++) { + if (resctrlall[type].cache_banks[i].host_id == hostid) + return i; + } + return -1; +} + +static int +virResCtrlCalculateSchemata(int type, + int sid, + unsigned hostid, + unsigned long long size) +{ + size_t i; + int count; + virResDomainPtr p; + unsigned int tmp_schemata; + unsigned int schemata_sum = 0; + + if (resctrlall[type].cache_banks[sid].cache_left < size) { + VIR_ERROR(_("Not enough cache left on bank %u"), hostid); + return -1; + } + if ((count = size / resctrlall[type].cache_banks[sid].cache_min) <= 0) { + VIR_ERROR(_("Error cache size %llu"), size); + return -1; + } + + tmp_schemata = VIR_RESCTRL_GET_SCHEMATA(count); + + p = domainall.domains; + p = p->next; + for (i = 1; i < domainall.num_domains; i ++) { + schemata_sum |= p->schematas[type]->schemata_items[sid].schemata; + p = p->next; + } + + tmp_schemata = tmp_schemata << (resctrlall[type].cbm_len - count); + + while ((tmp_schemata & schemata_sum) != 0) + tmp_schemata = tmp_schemata >> 1; + return tmp_schemata; +} + +int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, + unsigned char *uuid, pid_t *pids, int npid) +{ + size_t i; + char name[VIR_UUID_STRING_BUFLEN]; + virResDomainPtr p; + int type; + int sid; + int schemata; + + virUUIDFormat(uuid, name); + + for (i = 0; i < cachetune->n_banks; i++) { + VIR_DEBUG("cache_banks %u, %u, %llu, %s", + cachetune->cache_banks[i].id, + cachetune->cache_banks[i].host_id, + cachetune->cache_banks[i].size, + cachetune->cache_banks[i].type); + } + + if (cachetune->n_banks < 1) + return 0; + + p = virResCtrlGetDomain(name); + if (p == NULL) { + VIR_DEBUG("no domain name %s found, create new one!", name); + p = virResCtrlCreateDomain(name); + } + + if (p != NULL) { + for (i = 0; i < cachetune->n_banks; i++) { + if ((type = virResCtrlTypeFromString( + cachetune->cache_banks[i].type)) < 0) { + VIR_WARN("Ignore unknown cache type %s.", + cachetune->cache_banks[i].type); + continue; + } + + if ((sid = virResCtrlGetSocketIdByHostID( + type, cachetune->cache_banks[i].host_id)) < 0) { + VIR_WARN("Can not find cache bank host id %u.", + cachetune->cache_banks[i].host_id); + continue; + } + + if ((schemata = virResCtrlCalculateSchemata( + type, sid, cachetune->cache_banks[i].host_id, + cachetune->cache_banks[i].size)) < 0) { + VIR_WARN("Failed to set schemata for cache bank id %u", + cachetune->cache_banks[i].id); + continue; + } + + p->schematas[type]->schemata_items[sid].schemata = schemata; + } + + for (i = 0; i < npid; i++) + virResCtrlAddTask(p, pids[i]); + + if (virResCtrlFlushDomainToSysfs(p) < 0) { + VIR_ERROR(_("failed to flush domain %s to sysfs"), name); + virResCtrlDestroyDomain(p); + return -1; + } + virResCtrlAppendDomain(p); + } else { + VIR_ERROR(_("Failed to create a domain in sysfs")); + return -1; + } + + virResCtrlRefresh(); + /* after refresh, flush header's schemata changes to sys fs */ + if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) + VIR_WARN("failed to flush domain to sysfs"); + + return 0; +} + +/* Should be called after pid disappeared, we recalculate + * schemata of default and flush it to sys fs. + */ +int virResCtrlUpdate(void) +{ + int rc; + + if ((rc = virResCtrlRefresh()) < 0) + VIR_WARN("failed to refresh rescontrol"); + + if ((rc = virResCtrlFlushDomainToSysfs(domainall.domains)) < 0) + VIR_WARN("failed to flush domain to sysfs"); + + return rc; +} + int virResCtrlInit(void) { @@ -333,13 +1026,18 @@ virResCtrlInit(void) return -1; } if (virFileExists(tmp)) { - if ((rc = virResCtrlGetConfig(i)) < 0) + if ((rc = virResCtrlGetConfig(i)) < 0) { VIR_ERROR(_("Failed to get resource control config")); return -1; + } } - VIR_FREE(tmp); } + + domainall.domains = virResCtrlGetAllDomains(&(domainall.num_domains)); + + if ((rc = virResCtrlRefresh()) < 0) + VIR_ERROR(_("Failed to refresh resource control")); return rc; } diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 3f6fcae..ba5697e 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -26,6 +26,7 @@ # include "virutil.h" # include "virbitmap.h" +# include "domain_conf.h" enum { VIR_RDT_RESOURCE_L3, @@ -73,9 +74,10 @@ struct _virResCtrl { virResCacheBankPtr cache_banks; }; - bool virResCtrlAvailable(void); int virResCtrlInit(void); virResCtrlPtr virResCtrlGet(int); - +int virResCtrlSetCacheBanks(virDomainCachetunePtr, + unsigned char *, pid_t *, int); +int virResCtrlUpdate(void); #endif -- 1.9.1

Set cache banks while booting a new domain. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/qemu/qemu_driver.c | 6 ++++-- src/qemu/qemu_process.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 7995511..1e3ed1a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -850,8 +850,10 @@ qemuStateInitialize(bool privileged, run_gid = cfg->group; } - if (virResCtrlAvailable() && virResCtrlInit() < 0) - VIR_WARN("Faild to initialize resource control."); + if (virResCtrlAvailable() && virResCtrlInit() < 0) { + VIR_ERROR(_("Faild to initialize resource control")); + goto error; + } qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, cfg->cacheDir, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 184440d..baf3ff1 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -76,6 +76,7 @@ #include "configmake.h" #include "nwfilter_conf.h" #include "netdev_bandwidth_conf.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -5021,6 +5022,49 @@ qemuProcessVcpusSortOrder(const void *a, return vcpua->order - vcpub->order; } +static int +qemuProcessSetCacheBanks(virDomainObjPtr vm) +{ + size_t i, j; + virDomainCachetunePtr cachetune; + unsigned int max_vcpus = virDomainDefGetVcpusMax(vm->def); + pid_t *pids = NULL; + virDomainVcpuDefPtr vcpu; + size_t npid = 0; + size_t count = 0; + int ret = -1; + + cachetune = &(vm->def->cachetune); + + for (i = 0; i < cachetune->n_banks; i++) { + if (cachetune->cache_banks[i].vcpus) { + for (j = 0; j < max_vcpus; j++) { + if (virBitmapIsBitSet(cachetune->cache_banks[i].vcpus, j)) { + + vcpu = virDomainDefGetVcpu(vm->def, j); + if (!vcpu->online) + continue; + + if (VIR_RESIZE_N(pids, npid, count, 1) < 0) + goto cleanup; + pids[count ++] = qemuDomainGetVcpuPid(vm, j); + } + } + } + } + + if (pids == NULL) { + if (VIR_ALLOC_N(pids, 1) < 0) + goto cleanup; + pids[0] = vm->pid; + count = 1; + } + ret = virResCtrlSetCacheBanks(cachetune, vm->def->uuid, pids, count); + + cleanup: + VIR_FREE(pids); + return ret; +} static int qemuProcessSetupHotpluggableVcpus(virQEMUDriverPtr driver, @@ -5714,6 +5758,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuProcessAutoDestroyAdd(driver, vm, conn) < 0) goto cleanup; + VIR_DEBUG("Cache allocation"); + + if (virResCtrlAvailable() && qemuProcessSetCacheBanks(vm) < 0) + goto cleanup; + ret = 0; cleanup: @@ -6216,6 +6265,10 @@ void qemuProcessStop(virQEMUDriverPtr driver, virPerfFree(priv->perf); priv->perf = NULL; + if (virResCtrlAvailable() && virResCtrlUpdate() < 0) + VIR_WARN("Failed to update resource control for %s", + vm->def->name); + qemuProcessRemoveDomainStatus(driver, vm); /* Remove VNC and Spice ports from port reservation bitmap, but only if -- 1.9.1

Enable l3code/l3data while doing cache tune. l3code/l3data should use a continus cbm in their seperated schemata and the cache size are shared between them, so we need to deal them differently with l3 cache. This should enable cdp feature while mounting /sys/fs/resctrl, eg: mount -t resctrl resctrl -o cdp /sys/fs/resctrl Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- docs/schemas/domaincommon.rng | 2 +- src/util/virresctrl.c | 25 ++++++++++++++++++++++++- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index edb2888..f61c57a 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -5489,7 +5489,7 @@ </define> <define name="cachetype"> <data type="string"> - <param name="pattern">(l3)</param> + <param name="pattern">(l3|l3code|l3data)</param> </data> </define> <define name="cacheunit"> diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 3da3ec0..3ca3600 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -557,6 +557,7 @@ int virResCtrlRefreshSchemata(void) size_t i, j, k; unsigned int tmp_schemata; unsigned int default_schemata; + int pair_type = 0; virResDomainPtr header, p; @@ -567,6 +568,12 @@ int virResCtrlRefreshSchemata(void) for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { if (VIR_RESCTRL_ENABLED(i)) { + + if (i == VIR_RDT_RESOURCE_L3DATA) + pair_type = VIR_RDT_RESOURCE_L3CODE; + if (i == VIR_RDT_RESOURCE_L3CODE) + pair_type = VIR_RDT_RESOURCE_L3DATA; + for (j = 0; j < header->schematas[i]->n_schemata_items; j ++) { p = header->next; default_schemata = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].cbm_len); @@ -574,6 +581,8 @@ int virResCtrlRefreshSchemata(void) /* NOTEs: if only header domain, the schemata will be set to default one*/ for (k = 1; k < domainall.num_domains; k++) { tmp_schemata |= p->schematas[i]->schemata_items[j].schemata; + if (pair_type > 0) + tmp_schemata |= p->schematas[pair_type]->schemata_items[j].schemata; p = p->next; } /* sys fs doens't let us use 0 */ @@ -690,6 +699,7 @@ virResCtrlWrite(const char *name, const char *item, const char *content) goto cleanup; rc = 0; + cleanup: VIR_FREE(path); VIR_FORCE_CLOSE(writefd); @@ -893,6 +903,7 @@ virResCtrlCalculateSchemata(int type, virResDomainPtr p; unsigned int tmp_schemata; unsigned int schemata_sum = 0; + int pair_type = 0; if (resctrlall[type].cache_banks[sid].cache_left < size) { VIR_ERROR(_("Not enough cache left on bank %u"), hostid); @@ -907,8 +918,18 @@ virResCtrlCalculateSchemata(int type, p = domainall.domains; p = p->next; + + /* for type is l3code and l3data, we need to deal them specially*/ + if (type == VIR_RDT_RESOURCE_L3DATA) + pair_type = VIR_RDT_RESOURCE_L3CODE; + + if (type == VIR_RDT_RESOURCE_L3CODE) + pair_type = VIR_RDT_RESOURCE_L3DATA; + for (i = 1; i < domainall.num_domains; i ++) { schemata_sum |= p->schematas[type]->schemata_items[sid].schemata; + if (pair_type > 0) + schemata_sum |= p->schematas[pair_type]->schemata_items[sid].schemata; p = p->next; } @@ -949,6 +970,9 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, } if (p != NULL) { + + virResCtrlAppendDomain(p); + for (i = 0; i < cachetune->n_banks; i++) { if ((type = virResCtrlTypeFromString( cachetune->cache_banks[i].type)) < 0) { @@ -983,7 +1007,6 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, virResCtrlDestroyDomain(p); return -1; } - virResCtrlAppendDomain(p); } else { VIR_ERROR(_("Failed to create a domain in sysfs")); return -1; -- 1.9.1

l3data and l3code type of cache banks should be configured pairs. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/domain_conf.c | 19 +++++++++++++++++++ src/util/virresctrl.c | 1 - src/util/virresctrl.h | 2 ++ 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 430c451..0f7f331 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -15619,9 +15619,13 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, int type = -1; virDomainCacheBankPtr bank = NULL; virResCtrlPtr resctrl; + /* A Array to make sure l3code and l3data are pairs*/ + int* sem = NULL; if (VIR_ALLOC_N(bank, n) < 0) goto cleanup; + if (VIR_ALLOC_N(sem, MAX_CPU_SOCKET_NUM) < 0) + goto cleanup; for (i = 0; i < n; i++) { if (!(tmp = virXMLPropString(nodes[i], "id"))) { @@ -15669,6 +15673,12 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, goto cleanup; } + /* VIR_RDT_RESOURCE_L3DATA and VIR_RDT_RESOURCE_L3CODE should be pair */ + if (type == VIR_RDT_RESOURCE_L3DATA) + sem[bank[i].host_id] ++; + else if (type == VIR_RDT_RESOURCE_L3CODE) + sem[bank[i].host_id] --; + resctrl = virResCtrlGet(type); if (resctrl == NULL || !resctrl->enabled) { @@ -15717,6 +15727,14 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, } } + for (i = 0; i < MAX_CPU_SOCKET_NUM; i ++) { + if (sem[i] != 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'l3code and l3data shoud be show up pairs on bank %zu'"), + i); + goto cleanup; + } + } def->cachetune.cache_banks = bank; def->cachetune.n_banks = n; return 0; @@ -15724,6 +15742,7 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, cleanup: VIR_FREE(bank); VIR_FREE(tmp); + VIR_FREE(sem); return -1; } diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 3ca3600..d15e985 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -40,7 +40,6 @@ VIR_LOG_INIT("util.resctrl"); #define VIR_FROM_THIS VIR_FROM_RESCTRL -#define MAX_CPU_SOCKET_NUM 8 #define MAX_CBM_BIT_LEN 32 #define MAX_SCHEMATA_LEN 1024 #define MAX_FILE_LEN ( 10 * 1024 * 1024) diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index ba5697e..ee7e115 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -28,6 +28,8 @@ # include "virbitmap.h" # include "domain_conf.h" +#define MAX_CPU_SOCKET_NUM 8 + enum { VIR_RDT_RESOURCE_L3, VIR_RDT_RESOURCE_L3DATA, -- 1.9.1

This patch support l3 cache allocation compatible mode if cdp enabled on host. In this case l3code/l3data has same schemata. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/domain_conf.c | 15 +++++++++++++-- src/util/virresctrl.c | 10 ++++++++++ 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 0f7f331..ec11792 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -15682,9 +15682,20 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, resctrl = virResCtrlGet(type); if (resctrl == NULL || !resctrl->enabled) { - virReportError(VIR_ERR_XML_ERROR, + /* support cdp compatible */ + if (type == VIR_RDT_RESOURCE_L3) { + resctrl = virResCtrlGet(type + 1); + if (resctrl == NULL || !resctrl->enabled) { + virReportError(VIR_ERR_XML_ERROR, _("'host doesn't enabled cache type '%s'"), tmp); - goto cleanup; + goto cleanup; + } + VIR_WARN("Use cdp compatible mode."); + } else { + virReportError(VIR_ERR_XML_ERROR, + _("'host doesn't enabled cache type '%s'"), tmp); + goto cleanup; + } } bool found_host_id = false; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index d15e985..17d95a8 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -946,6 +946,7 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, char name[VIR_UUID_STRING_BUFLEN]; virResDomainPtr p; int type; + int pair_type = -1; int sid; int schemata; @@ -979,6 +980,13 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, cachetune->cache_banks[i].type); continue; } + /* use cdp compatible mode */ + if (!VIR_RESCTRL_ENABLED(type) && + (type == VIR_RDT_RESOURCE_L3) && + VIR_RESCTRL_ENABLED(VIR_RDT_RESOURCE_L3DATA)) { + type = VIR_RDT_RESOURCE_L3DATA; + pair_type = VIR_RDT_RESOURCE_L3CODE; + } if ((sid = virResCtrlGetSocketIdByHostID( type, cachetune->cache_banks[i].host_id)) < 0) { @@ -996,6 +1004,8 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, } p->schematas[type]->schemata_items[sid].schemata = schemata; + if (pair_type > 0) + p->schematas[pair_type]->schemata_items[sid].schemata = schemata; } for (i = 0; i < npid; i++) -- 1.9.1

The internal struct list domainall is a list which are resctral domain status shared by all VMs, especiall the default domain, each VM should access it concomitantly. Ues a mutex to control it. Each bank's cache_left field is also a global shared resource we need to be care, add a mutex for each bank. We need also to add lock to access /sys/fs/resctrl, use flock. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/util/virresctrl.c | 97 ++++++++++++++++++++++++++++++++++++++++++--------- src/util/virresctrl.h | 3 ++ 2 files changed, 83 insertions(+), 17 deletions(-) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 17d95a8..2b1ee41 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -24,6 +24,7 @@ #if defined HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#include <sys/file.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> @@ -68,8 +69,8 @@ do { \ #define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1) -#define VIR_RESCTRL_SET_SCHEMATA(p, type, pos, val) \ - p->schematas[type]->schemata_items[pos] = val +#define VIR_RESCTRL_LOCK(fd, op) flock(fd, op) +#define VIR_RESCTRL_UNLOCK(fd) flock(fd, LOCK_UN) /** * a virResSchemata represents a schemata object under a resource control @@ -114,6 +115,8 @@ typedef virResCtrlDomain *virResCtrlDomainPtr; struct _virResCtrlDomain { unsigned int num_domains; virResDomainPtr domains; + + virMutex lock; }; @@ -169,11 +172,16 @@ static int virResCtrlGetStr(const char *domain_name, const char *item_name, char CONSTRUCT_RESCTRL_PATH(domain_name, item_name); + if (!virFileExists(path)) + goto cleanup; + if (virFileReadAll(path, MAX_FILE_LEN, ret) < 0) { rc = -1; goto cleanup; } + rc = 0; + cleanup: VIR_FREE(path); return rc; @@ -616,6 +624,9 @@ virResCtrlRefresh(void) unsigned int origin_count = domainall.num_domains; virResDomainPtr p, pre, del; pre = domainall.domains; + + virMutexLock(&domainall.lock); + p = del = NULL; if (pre) p = pre->next; @@ -640,10 +651,11 @@ virResCtrlRefresh(void) p = p->next; } VIR_FREE(tasks); - } - return virResCtrlRefreshSchemata(); + virResCtrlRefreshSchemata(); + virMutexUnlock(&domainall.lock); + return 0; } /* Get a domain ptr by domain's name*/ @@ -873,10 +885,15 @@ static int virResCtrlAppendDomain(virResDomainPtr dom) { virResDomainPtr p = domainall.domains; + + virMutexLock(&domainall.lock); + while (p->next != NULL) p = p->next; p->next = dom; dom->pre = p; domainall.num_domains += 1; + + virMutexUnlock(&domainall.lock); return 0; } @@ -899,18 +916,22 @@ virResCtrlCalculateSchemata(int type, { size_t i; int count; + int rc = -1; virResDomainPtr p; unsigned int tmp_schemata; unsigned int schemata_sum = 0; int pair_type = 0; + virMutexLock(&resctrlall[type].cache_banks[sid].lock); + if (resctrlall[type].cache_banks[sid].cache_left < size) { VIR_ERROR(_("Not enough cache left on bank %u"), hostid); - return -1; + goto cleanup; } + if ((count = size / resctrlall[type].cache_banks[sid].cache_min) <= 0) { VIR_ERROR(_("Error cache size %llu"), size); - return -1; + goto cleanup; } tmp_schemata = VIR_RESCTRL_GET_SCHEMATA(count); @@ -925,7 +946,7 @@ virResCtrlCalculateSchemata(int type, if (type == VIR_RDT_RESOURCE_L3CODE) pair_type = VIR_RDT_RESOURCE_L3DATA; - for (i = 1; i < domainall.num_domains; i ++) { + for (i = 1; i < domainall.num_domains; i++) { schemata_sum |= p->schematas[type]->schemata_items[sid].schemata; if (pair_type > 0) schemata_sum |= p->schematas[pair_type]->schemata_items[sid].schemata; @@ -936,7 +957,16 @@ virResCtrlCalculateSchemata(int type, while ((tmp_schemata & schemata_sum) != 0) tmp_schemata = tmp_schemata >> 1; - return tmp_schemata; + + resctrlall[type].cache_banks[sid].cache_left -= size; + if (pair_type > 0) + resctrlall[pair_type].cache_banks[sid].cache_left = resctrlall[type].cache_banks[sid].cache_left; + + rc = tmp_schemata; + + cleanup: + virMutexUnlock(&resctrlall[type].cache_banks[sid].lock); + return rc; } int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, @@ -949,8 +979,8 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, int pair_type = -1; int sid; int schemata; - - virUUIDFormat(uuid, name); + int lockfd; + int rc = -1; for (i = 0; i < cachetune->n_banks; i++) { VIR_DEBUG("cache_banks %u, %u, %llu, %s", @@ -963,12 +993,21 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, if (cachetune->n_banks < 1) return 0; + virUUIDFormat(uuid, name); + + if ((lockfd = open(RESCTRL_DIR, O_RDONLY)) < 0) + goto cleanup; + + if (VIR_RESCTRL_LOCK(lockfd, LOCK_EX) < 0) { + virReportSystemError(errno, _("Unable to lock '%s'"), RESCTRL_DIR); + goto cleanup; + } + p = virResCtrlGetDomain(name); if (p == NULL) { VIR_DEBUG("no domain name %s found, create new one!", name); p = virResCtrlCreateDomain(name); } - if (p != NULL) { virResCtrlAppendDomain(p); @@ -1014,19 +1053,25 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, if (virResCtrlFlushDomainToSysfs(p) < 0) { VIR_ERROR(_("failed to flush domain %s to sysfs"), name); virResCtrlDestroyDomain(p); - return -1; + goto cleanup; } } else { VIR_ERROR(_("Failed to create a domain in sysfs")); - return -1; + goto cleanup; } virResCtrlRefresh(); /* after refresh, flush header's schemata changes to sys fs */ - if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) - VIR_WARN("failed to flush domain to sysfs"); + if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) { + VIR_ERROR(_("failed to flush domain to sysfs")); + goto cleanup; + } - return 0; + rc = 0; + + cleanup: + VIR_RESCTRL_UNLOCK(lockfd); + return rc; } /* Should be called after pid disappeared, we recalculate @@ -1048,7 +1093,7 @@ int virResCtrlUpdate(void) int virResCtrlInit(void) { - size_t i = 0; + size_t i, j; char *tmp; int rc = 0; @@ -1068,6 +1113,24 @@ virResCtrlInit(void) domainall.domains = virResCtrlGetAllDomains(&(domainall.num_domains)); + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + for (j = 0; j < resctrlall[i].num_banks; j++) { + if (virMutexInit(&resctrlall[i].cache_banks[j].lock) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to initialize mutex")); + return -1; + } + } + } + } + + if (virMutexInit(&domainall.lock) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to initialize mutex")); + return -1; + } + if ((rc = virResCtrlRefresh()) < 0) VIR_ERROR(_("Failed to refresh resource control")); return rc; diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index ee7e115..07b298d 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -27,6 +27,7 @@ # include "virutil.h" # include "virbitmap.h" # include "domain_conf.h" +# include "virthread.h" #define MAX_CPU_SOCKET_NUM 8 @@ -49,6 +50,8 @@ struct _virResCacheBank { unsigned long long cache_left; unsigned long long cache_min; virBitmapPtr cpu_mask; + + virMutex lock; }; /** -- 1.9.1

Other application may touch resctrl while libvirt's running, scan resctrl again before allocating cache information to the newly created VM. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/util/virresctrl.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 2b1ee41..f13bce1 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -897,6 +897,46 @@ virResCtrlAppendDomain(virResDomainPtr dom) return 0; } +/* scan /sys/fs/resctrl again and refresh default schemata */ +static +int virResCtrlScan(void) +{ + struct dirent *ent; + DIR *dp = NULL; + int direrr; + virResDomainPtr p; + int rc = -1; + + if (virDirOpenQuiet(&dp, RESCTRL_DIR) < 0) { + if (errno == ENOENT) + return -1; + VIR_ERROR(_("Unable to open %s (%d)"), RESCTRL_DIR, errno); + goto cleanup; + } + + while ((direrr = virDirRead(dp, &ent, NULL)) > 0) { + if ((ent->d_type != DT_DIR) || STREQ(ent->d_name, "info")) + continue; + /* test if we'v tracked all domains */ + p = virResCtrlGetDomain(ent->d_name); + if (p == NULL) { + p = virResCtrlLoadDomain(ent->d_name); + if (p == NULL) + continue; + /* This domain is not created by libvirt, so we don't care + * about the tasks, add a fake one to prevent virResCtrlRefresh + * remove it from sysfs */ + virResCtrlAddTask(p, 1); + virResCtrlAppendDomain(p); + } + } + rc = 0; + + cleanup: + VIR_DIR_CLOSE(dp); + return rc; +} + static int virResCtrlGetSocketIdByHostID(int type, unsigned int hostid) { @@ -1003,6 +1043,10 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, goto cleanup; } + if (virResCtrlScan() < 0) { + VIR_ERROR(_("Failed to scan resctrl domain dir")); + return -1; + } p = virResCtrlGetDomain(name); if (p == NULL) { VIR_DEBUG("no domain name %s found, create new one!", name); -- 1.9.1

Hi Eli Qiao, Question about removing resctrlfs directories with empty "tasks" file. /* This domain is not created by libvirt, so we don't care * about the tasks, add a fake one to prevent * virResCtrlRefresh * remove it from sysfs */ virResCtrlAddTask(p, 1); virResCtrlAppendDomain(p); /* Refresh all domains', remove the domains which has no task ids. * This will be used after VM pause, restart, destroy etc. */ static int virResCtrlRefresh(void) Why are you doing this (removing directories which have no tasks in them) ? The code should only read information from other CAT reservations, not write to them. This is not for cleanup purposes, since on VM shutdown the resctrlfs directories are properly removed: /* Remove the Domain from sysfs, this should only success no pids in * tasks * of a partition. */ static int virResCtrlRemoveDomain(const char *name) { char *path = NULL; int rc = 0; if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) return rc; rc = rmdir(path); VIR_FREE(path); return rc; } Should not write to other directories 'tasks' file.

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Sunday, 19 February 2017 at 10:51 PM, Marcelo Tosatti wrote:
Hi Eli Qiao,
Question about removing resctrlfs directories with empty "tasks" file.
/* This domain is not created by libvirt, so we don't care * about the tasks, add a fake one to prevent * virResCtrlRefresh * remove it from sysfs */ virResCtrlAddTask(p, 1); virResCtrlAppendDomain(p);
/* Refresh all domains', remove the domains which has no task ids. * This will be used after VM pause, restart, destroy etc. */ static int virResCtrlRefresh(void)
Why are you doing this (removing directories which have no tasks in them) ? The code should only read information from other CAT reservations, not write to them.
When a VM get shutdown, their pids will disappear from resctrl’s directory, we can check pids to see if VM get KIlled or not. The reason why I would like check if tasks file empty is that: there maybe cases that libvirtd not tracking VM’s status, and the VM gone, left a empty directory with no tasks assigned, and cause resource leak. I think I can refine this later, keep a pointer of ResDomain in VM’s private data and use that to destroy VM’s resctrl directory. but this won’t work for exception, still resource leak happened.
This is not for cleanup purposes, since on VM shutdown the resctrlfs directories are properly removed:
/* Remove the Domain from sysfs, this should only success no pids in * tasks * of a partition. */ static int virResCtrlRemoveDomain(const char *name) { char *path = NULL; int rc = 0;
if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) return rc; rc = rmdir(path); VIR_FREE(path); return rc; }
Should not write to other directories 'tasks' file.
Other Apps can have a lock to resctrl before it write tasks to prevent libvirtd remove empty task directory.

On Mon, Feb 20, 2017 at 04:32:19PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Sunday, 19 February 2017 at 10:51 PM, Marcelo Tosatti wrote:
Hi Eli Qiao,
Question about removing resctrlfs directories with empty "tasks" file.
/* This domain is not created by libvirt, so we don't care * about the tasks, add a fake one to prevent * virResCtrlRefresh * remove it from sysfs */ virResCtrlAddTask(p, 1); virResCtrlAppendDomain(p);
/* Refresh all domains', remove the domains which has no task ids. * This will be used after VM pause, restart, destroy etc. */ static int virResCtrlRefresh(void)
Why are you doing this (removing directories which have no tasks in them) ? The code should only read information from other CAT reservations, not write to them.
When a VM get shutdown, their pids will disappear from resctrl’s directory, we can check pids to see if VM get KIlled or not.
The reason why I would like check if tasks file empty is that: there maybe cases that libvirtd not tracking VM’s status, and the VM gone, left a empty directory with no tasks assigned, and cause resource leak.
I think I can refine this later, keep a pointer of ResDomain in VM’s private data and use that to destroy VM’s resctrl directory.
but this won’t work for exception, still resource leak happened.
Function qemuProcessStop() will *always* be called to cleanup after VMs, all exceptions should be taken into account, so don't worry, just do proper cleanup.
This is not for cleanup purposes, since on VM shutdown the resctrlfs directories are properly removed:
/* Remove the Domain from sysfs, this should only success no pids in * tasks * of a partition. */ static int virResCtrlRemoveDomain(const char *name) { char *path = NULL; int rc = 0;
if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) return rc; rc = rmdir(path); VIR_FREE(path); return rc; }
Should not write to other directories 'tasks' file.
Other Apps can have a lock to resctrl before it write tasks to prevent libvirtd remove empty task directory.

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, 20 February 2017 at 5:07 PM, Martin Kletzander wrote:
On Mon, Feb 20, 2017 at 04:32:19PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Sunday, 19 February 2017 at 10:51 PM, Marcelo Tosatti wrote:
Hi Eli Qiao,
Question about removing resctrlfs directories with empty "tasks" file.
/* This domain is not created by libvirt, so we don't care * about the tasks, add a fake one to prevent * virResCtrlRefresh * remove it from sysfs */ virResCtrlAddTask(p, 1); virResCtrlAppendDomain(p);
/* Refresh all domains', remove the domains which has no task ids. * This will be used after VM pause, restart, destroy etc. */ static int virResCtrlRefresh(void)
Why are you doing this (removing directories which have no tasks in them) ? The code should only read information from other CAT reservations, not write to them.
When a VM get shutdown, their pids will disappear from resctrl’s directory, we can check pids to see if VM get KIlled or not.
The reason why I would like check if tasks file empty is that: there maybe cases that libvirtd not tracking VM’s status, and the VM gone, left a empty directory with no tasks assigned, and cause resource leak.
I think I can refine this later, keep a pointer of ResDomain in VM’s private data and use that to destroy VM’s resctrl directory.
but this won’t work for exception, still resource leak happened.
Function qemuProcessStop() will *always* be called to cleanup after VMs, all exceptions should be taken into account, so don't worry, just do proper cleanup.
hi Martin, thanks for your reply, I am thinking if qemu process being killed while libvirtd not running? will qemuProcessStop called in such case?
This is not for cleanup purposes, since on VM shutdown the resctrlfs directories are properly removed:
/* Remove the Domain from sysfs, this should only success no pids in * tasks * of a partition. */ static int virResCtrlRemoveDomain(const char *name) { char *path = NULL; int rc = 0;
if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) return rc; rc = rmdir(path); VIR_FREE(path); return rc; }
Should not write to other directories 'tasks' file.
Other Apps can have a lock to resctrl before it write tasks to prevent libvirtd remove empty task directory.

On Mon, Feb 20, 2017 at 04:32:19PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Sunday, 19 February 2017 at 10:51 PM, Marcelo Tosatti wrote:
Hi Eli Qiao,
Question about removing resctrlfs directories with empty "tasks" file.
/* This domain is not created by libvirt, so we don't care * about the tasks, add a fake one to prevent * virResCtrlRefresh * remove it from sysfs */ virResCtrlAddTask(p, 1); virResCtrlAppendDomain(p);
/* Refresh all domains', remove the domains which has no task ids. * This will be used after VM pause, restart, destroy etc. */ static int virResCtrlRefresh(void)
Why are you doing this (removing directories which have no tasks in them) ? The code should only read information from other CAT reservations, not write to them.
When a VM get shutdown, their pids will disappear from resctrl’s directory, we can check pids to see if VM get KIlled or not.
The reason why I would like check if tasks file empty is that: there maybe cases that libvirtd not tracking VM’s status, and the VM gone, left a empty directory with no tasks assigned, and cause resource leak.
I think I can refine this later, keep a pointer of ResDomain in VM’s private data and use that to destroy VM’s resctrl directory.
but this won’t work for exception, still resource leak happened.
Can you explain how the resource leak can happen? Why it is not possible to maintain that information inside libvirt itself?
This is not for cleanup purposes, since on VM shutdown the resctrlfs directories are properly removed:
/* Remove the Domain from sysfs, this should only success no pids in * tasks * of a partition. */ static int virResCtrlRemoveDomain(const char *name) { char *path = NULL; int rc = 0;
if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) return rc; rc = rmdir(path); VIR_FREE(path); return rc; }
Should not write to other directories 'tasks' file.
Other Apps can have a lock to resctrl before it write tasks to prevent libvirtd remove empty task directory.
Well then libvirt will never be able to change resctrl filesystem?

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, 21 February 2017 at 4:54 AM, Marcelo Tosatti wrote:
Can you explain how the resource leak can happen?
1. libvirtd creates a qemu process and create a resctrl directory for it, add it’s pids to tasks 2. stop libvirtd or it’s down. 3. qemu process, and tasks file is empty now. 4. libvirt restart, well, the resctrl directory should be deleted as qemu process gone.
Why it is not possible to maintain that information inside libvirt itself?
This is not for cleanup purposes, since on VM shutdown the resctrlfs directories are properly removed: /* Remove the Domain from sysfs, this should only success no pids in * tasks * of a partition. */ static int virResCtrlRemoveDomain(const char *name) { char *path = NULL; int rc = 0; if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) return rc; rc = rmdir(path); VIR_FREE(path); return rc; } Should not write to other directories 'tasks' file.
Other Apps can have a lock to resctrl before it write tasks to prevent libvirtd remove empty task directory.
Well then libvirt will never be able to change resctrl filesystem? no, Apps only grant lock when it creates new cache allocation, should release the lock after adding the app’s pid to tasks.

On Tue, Feb 21, 2017 at 10:25:13AM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, 21 February 2017 at 4:54 AM, Marcelo Tosatti wrote:
Can you explain how the resource leak can happen?
1. libvirtd creates a qemu process and create a resctrl directory for it, add it’s pids to tasks 2. stop libvirtd or it’s down. 3. qemu process, and tasks file is empty now. 4. libvirt restart, well, the resctrl directory should be deleted as qemu process gone.
So there are two problems: P1) If libvirtd is not restarted and guest VM is poweredoff, then CAT reservation leaks. Ideally QEMU should free the resources. How does libvirt deal with this for other resources? P2) Libvirt restart. Please remove resctrlfs directory only for libvirt owned VMs (say if the directory name matches the VM UUID), and not for all directories in resctrlfs (should not change resctrlfs directories which libvirt does not own).

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, 22 February 2017 at 7:36 AM, Marcelo Tosatti wrote:
On Tue, Feb 21, 2017 at 10:25:13AM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Tuesday, 21 February 2017 at 4:54 AM, Marcelo Tosatti wrote:
Can you explain how the resource leak can happen?
1. libvirtd creates a qemu process and create a resctrl directory for it, add it’s pids to tasks 2. stop libvirtd or it’s down. 3. qemu process, and tasks file is empty now. 4. libvirt restart, well, the resctrl directory should be deleted as qemu process gone.
So there are two problems:
P1) If libvirtd is not restarted and guest VM is poweredoff, then CAT reservation leaks.
Ideally QEMU should free the resources.
yes, but qemu doesn’t care about CAT stuff.
How does libvirt deal with this for other resources?
I tested with libvirt, if qemu process exited while libvirtd is not running, next time when libvirtd restart, it will do something cleanup like it will call qemuprocessexit(), so libvirt can aware qemu process exited.
P2) Libvirt restart.
Yes.
Please remove resctrlfs directory only for libvirt owned VMs (say if the directory name matches the VM UUID), and not for all directories in resctrlfs
Yep, I ‘v done the patch update yesterday, and will send out new version today.
(should not change resctrlfs directories which libvirt does not own).
Thanks for the advices.

How does the management software query the amount of allocatable cache again? Section from another discussion:
The second case is necessary to get updated free space information.
Just VM initialization time could be enough as virConnectGetCapabilities would just know the total and free size would be reported in an API (if I rememer the discussion correctly) Martin

On Sun, Feb 19, 2017 at 12:01:57PM -0300, Marcelo Tosatti wrote:
How does the management software query the amount of allocatable cache again?
Section from another discussion:
The second case is necessary to get updated free space information.
Just VM initialization time could be enough as virConnectGetCapabilities would just know the total and free size would be reported in an API (if I rememer the discussion correctly)
Martin
Yes, i think this is missing because the interface was designed with only libvirt in mind: the "reserved" field returns the amount of cache reserved only by VMs. So if there is another application on the same L3 socket with a cache reservation, "reserved" fails to report it. Eli can you expose the amount of free allocatable cache space (where non-free includes space used by other reservations) in a 'free_space' field in the cache output of virConnectGetCapabilities?

On Sun, Feb 19, 2017 at 12:20:55PM -0300, Marcelo Tosatti wrote:
On Sun, Feb 19, 2017 at 12:01:57PM -0300, Marcelo Tosatti wrote:
How does the management software query the amount of allocatable cache again?
Section from another discussion:
The second case is necessary to get updated free space information.
Just VM initialization time could be enough as virConnectGetCapabilities would just know the total and free size would be reported in an API (if I rememer the discussion correctly)
Martin
Yes, i think this is missing because the interface was designed with only libvirt in mind: the "reserved" field returns the amount of cache reserved only by VMs.
So if there is another application on the same L3 socket with a cache reservation, "reserved" fails to report it.
Eli can you expose the amount of free allocatable cache space (where non-free includes space used by other reservations) in a 'free_space' field in the cache output of virConnectGetCapabilities?
There should be an API for that instead. Capabilities are supposed to show what the hardware is capable of, not what the actual state is. If my opinion is not enough, see here: https://www.redhat.com/archives/libvir-list/2017-January/msg00500.html

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, 20 February 2017 at 3:59 PM, Martin Kletzander wrote:
On Sun, Feb 19, 2017 at 12:20:55PM -0300, Marcelo Tosatti wrote:
On Sun, Feb 19, 2017 at 12:01:57PM -0300, Marcelo Tosatti wrote:
How does the management software query the amount of allocatable cache again?
Section from another discussion:
The second case is necessary to get updated free space information.
Just VM initialization time could be enough as virConnectGetCapabilities would just know the total and free size would be reported in an API (if I rememer the discussion correctly)
Martin
Yes, i think this is missing because the interface was designed with only libvirt in mind: the "reserved" field returns the amount of cache reserved only by VMs.
So if there is another application on the same L3 socket with a cache reservation, "reserved" fails to report it.
Eli can you expose the amount of free allocatable cache space (where non-free includes space used by other reservations) in a 'free_space' field in the cache output of virConnectGetCapabilities?
There should be an API for that instead. Capabilities are supposed to show what the hardware is capable of, not what the actual state is.
If my opinion is not enough, see here:
https://www.redhat.com/archives/libvir-list/2017-January/msg00500.html
Sure, thanks Martin point the link , the plan is to provide API to query cache left.
participants (4)
-
Eli Qiao
-
Eli Qiao
-
Marcelo Tosatti
-
Martin Kletzander