[libvirt] [PATCH resend V10 00/12] Support cache tune in libvirt

Addressed comment from v10 -> v9 Marcelo: * Improve default schemata calculate, support allocate from low schemata bits for other APP. Addressed comment from v9 -> v8 Marcelo: * New public API to query cache usage Eli: * Fix core dump while multiple tasks are added. Addressed comment from v8 -> v7 Martin: * Patch subject prefix. * Move some of cpu related information to virhostcpu.c. * Fix some memory leak in src/utils/resctrl.c Martin & Marcelo: * Don't remove directories which are not maintained by libvirt. Addressed comment from v7 -> v6 Marcelo: * Fix flock usage while VM initialization. Addressed comment from v6 -> v5 Marcelo: * Support other APPs to operate /sys/fs/resctrl at same time Libvirt will scan /sys/fs/resctrl again before doing cache allocation. patch 10 will address this. Addressed comment from v4 -> v5: Marcelo: * Several typos * Use flock instead of virFileLock Addressed comment from v3 -> v4: Daniel & Marcelo: * Added concurrence support Addressed comment from v2 -> v3: Daniel: * Fixed coding style, passed `make check` and `make syntax-check` * Variables renaming and move from header file to c file. * For locking/mutex support, no progress. There are some discussion from mailing list, but I can not find a better way to add locking support without performance impact. I'll explain the process and please help to advice what shoud we do. VM create: 1) Get the cache left value on each bank of the host. This should be shared amount all VMs. 2) Calculate the schemata on the bank based on all created resctrl domain's schemata 3) Calculate the default schemata by scaning all domain's schemata. 4) Flush default schemata to /sys/fs/resctrl/schemata VM destroy: 1) Remove the resctrl domain of that VM 2) Recalculate default schemata 3) Flush default schemata to /sys/fs/resctrl/schemata The key point is that all VMs shares /sys/fs/resctrl/schemata, and when a VM create a resctrl domain, the schemata of that VM depends on the default schemata and all other exsited schematas. So a global mutex is reqired. Before calculate a schemata or update default schemata, libvirt should gain this global mutex. I will try to think more about how to support this gracefully in next patch set. Marcelo: * Added vcpu support for cachetune, this will allow user to define which vcpu using which cache allocation bank. <cachetune id='0' host_id=0 size='3072' unit='KiB' vcpus='0,1'/> vcpus is a cpumap, the vcpu pids will be added to tasks file * Added cdp compatible, user can specify l3 cache even host enable cdp. See patch 8. On a cdp enabled host, specify l3code/l3data by <cachetune id='0' host_id='0' type='l3' size='3072' unit='KiB'/> This will create a schemata like: L3data:0=0xff00;... L3code:0=0xff00;... * Would you please help to test if the functions work. Martin: * Xml test case, I have no time to work on this yet, would you please show me an example, would like to amend it later. This series patches are for supportting CAT featues, which also called cache tune in libvirt. First to expose cache information which could be tuned in capabilites XML. Then add new domain xml element support to add cacahe bank which will apply on this libvirt domain. This series patches add a util file `resctrl.c/h`, an interface to talk with linux kernel's system fs. There are still one TODO left: 2. Expose a new public interface to set cachetune lively. Some discussion about this feature support can be found from: https://www.redhat.com/archives/libvir-list/2017-January/msg00644.html *** BLURB HERE *** Eli Qiao (12): Resctrl: Add some utils functions Resctrl: expose cache information to capabilities Resctrl: Add new xml element to support cache tune Resctrl: Add private interfaces to operate cache bank Qemu: Set cache tune while booting a new domain. Resctrl: enable l3code/l3data Resctrl: Make sure l3data/l3code are pairs Resctrl: Compatible mode for cdp enabled Resctrl: concurrence support Resctrl: Scan resctrl before doing cache allocation Resctrl: Add Public API for nodecachestats Resctrl: Add nodecachestats daemon/remote.c | 67 +++ docs/schemas/domaincommon.rng | 46 ++ include/libvirt/libvirt-host.h | 32 ++ include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/conf/capabilities.c | 56 ++ src/conf/capabilities.h | 23 + src/conf/domain_conf.c | 182 +++++++ src/conf/domain_conf.h | 19 + src/driver-hypervisor.h | 7 + src/libvirt-host.c | 41 ++ src/libvirt_private.syms | 12 + src/libvirt_public.syms | 1 + src/nodeinfo.c | 64 +++ src/nodeinfo.h | 1 + src/qemu/qemu_capabilities.c | 8 + src/qemu/qemu_driver.c | 18 + src/qemu/qemu_process.c | 54 ++ src/remote/remote_driver.c | 52 ++ src/remote/remote_protocol.x | 25 +- src/remote_protocol-structs | 16 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++- src/util/virhostcpu.h | 6 + src/util/virresctrl.c | 1098 ++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 96 ++++ tools/virsh-host.c | 49 ++ 28 files changed, 2145 insertions(+), 18 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h -- 1.9.1

This patch adds some utils struct and functions to expose resctrl information. virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information. Some of host cpu related information methods was added in virhostcpu.c Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++++++++++++++++++++++++++++++++---- src/util/virhostcpu.h | 6 ++ src/util/virresctrl.c | 201 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 78 +++++++++++++++++ 9 files changed, 462 insertions(+), 17 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..3dd2d08 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum { VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resource control */ # ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/po/POTFILES.in b/po/POTFILES.in index 7c7f530..4147bc6 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -241,6 +241,7 @@ src/util/virportallocator.c src/util/virprocess.c src/util/virqemu.c src/util/virrandom.c +src/util/virresctrl.c src/util/virrotatingfile.c src/util/virscsi.c src/util/virscsihost.c diff --git a/src/Makefile.am b/src/Makefile.am index 7d42eac..edb946a 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -162,6 +162,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsihost.c util/virscsihost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index aed1d3d..bb7c3ad 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2320,6 +2320,10 @@ virRandomGenerateWWN; virRandomInt; +# util/virresctrl.h +virResCtrlAvailable; +virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST, "Perf", /* 65 */ "Libssh transport layer", + "Resouce Control", ) diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c index f29f312..e6d5102 100644 --- a/src/util/virhostcpu.c +++ b/src/util/virhostcpu.c @@ -206,29 +206,21 @@ void virHostCPUSetSysFSSystemPathLinux(const char *path) sysfs_system_path = SYSFS_SYSTEM_PATH; } -/* Return the positive decimal contents of the given - * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative - * and the file could not be found, return that instead of an error; - * this is useful for machines that cannot hot-unplug cpu0, or where - * hot-unplugging is disabled, or where the kernel is too old - * to support NUMA cells, etc. */ +/* Get a String value*/ static int -virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, - int default_value) +virHostCPUGetStrValue(const char *dir, unsigned int cpu, const char *file, char *value_str) { char *path; FILE *pathfp; - int value = -1; - char value_str[INT_BUFSIZE_BOUND(value)]; - char *tmp; + int ret = -1; if (virAsprintf(&path, "%s/cpu%u/%s", dir, cpu, file) < 0) return -1; pathfp = fopen(path, "r"); if (pathfp == NULL) { - if (default_value >= 0 && errno == ENOENT) - value = default_value; + if (errno == ENOENT) + return -2; else virReportSystemError(errno, _("cannot open %s"), path); goto cleanup; @@ -238,17 +230,84 @@ virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, virReportSystemError(errno, _("cannot read from %s"), path); goto cleanup; } + + ret = 0; + + cleanup: + VIR_FORCE_FCLOSE(pathfp); + VIR_FREE(path); + return ret; +} + + +/* Return the positive decimal contents of the given + * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative + * and the file could not be found, return that instead of an error; + * this is useful for machines that cannot hot-unplug cpu0, or where + * hot-unplugging is disabled, or where the kernel is too old + * to support NUMA cells, etc. */ +static int +virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, + int default_value) +{ + int value = -1; + char value_str[INT_BUFSIZE_BOUND(value)]; + char *tmp; + int ret; + + if ((ret = (virHostCPUGetStrValue(dir, cpu, file, value_str))) < 0) { + if (ret == -2 && default_value >= 0) + return default_value; + else + return -1; + } + if (virStrToLong_i(value_str, &tmp, 10, &value) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("could not convert '%s' to an integer"), value_str); - goto cleanup; + return -1; } + return value; +} - cleanup: - VIR_FORCE_FCLOSE(pathfp); - VIR_FREE(path); +/* Return specific type cache size in KiB of given cpu + -1 on error happened */ +static +int virHostCPUGetCache(unsigned int cpu, unsigned int type) +{ + char *cachedir = NULL; + char *cpudir; + char *unit = NULL; + char *tmp; + int value = -1; + unsigned long long size; + char value_str[INT_BUFSIZE_BOUND(value)]; + if (virAsprintf(&cpudir, "%s/cpu", sysfs_system_path) < 0) + return -1; + + if (virAsprintf(&cachedir, "cache/index%u/size", type) < 0) + goto error; + + if (virHostCPUGetStrValue(cpudir, cpu, cachedir, value_str) < 0) + goto error; + + if ((tmp = strchr(value_str, '\n'))) *tmp = '\0'; + + if (virStrToLong_i(value_str, &unit, 10, &value) < 0) + goto error; + + size = value; + + if (virScaleInteger(&size, unit, 1, ULLONG_MAX) < 0) + goto error; + + return size / 1024; + + error: + VIR_FREE(cpudir); + VIR_FREE(cachedir); return value; } @@ -301,6 +360,23 @@ virHostCPUParseSocket(const char *dir, return ret; } +/* return socket id of a given cpu*/ +static +int virHostCPUGetSocketId(virArch hostarch, unsigned int cpu) +{ + char *cpu_dir; + int ret = -1; + + if (virAsprintf(&cpu_dir, "%s/cpu", sysfs_system_path) < 0) + goto cleanup; + + ret = virHostCPUParseSocket(cpu_dir, hostarch, cpu); + + cleanup: + VIR_FREE(cpu_dir); + return ret; +} + /* parses a node entry, returning number of processors in the node and * filling arguments */ static int @@ -1346,3 +1422,79 @@ virHostCPUGetKVMMaxVCPUs(void) return -1; } #endif /* HAVE_LINUX_KVM_H */ + +/* Fill all cache bank informations + * Return a list of virResCacheBankPtr, and fill cache bank information + * by loop for all cpus on host, number of cache bank will be set in nbanks + * + * NULL if error happened, and nbanks will be set 0. */ +virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, int type, size_t *nbanks, int cbm_len) +{ + int npresent_cpus; + int idx; + size_t i; + virResCacheBankPtr bank; + + *nbanks = 0; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + switch (type) { + case VIR_RDT_RESOURCE_L3: + case VIR_RDT_RESOURCE_L3DATA: + case VIR_RDT_RESOURCE_L3CODE: + idx = 3; + break; + case VIR_RDT_RESOURCE_L2: + idx = 2; + break; + default: + idx = -1; + } + + if (idx == -1) + return NULL; + + if (VIR_ALLOC_N(bank, 1) < 0) + return NULL; + + *nbanks = 1; + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if ((s_id = virHostCPUGetSocketId(arch, i)) < 0) + goto error; + + /* Expand cache bank array */ + if (s_id > (*nbanks - 1)) { + size_t cur = *nbanks; + size_t exp = s_id - (*nbanks) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *nbanks = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if ((cache_size = virHostCPUGetCache(i, idx)) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / cbm_len; + } + } + return bank; + + error: + *nbanks = 0; + VIR_FREE(bank); + return NULL; +} diff --git a/src/util/virhostcpu.h b/src/util/virhostcpu.h index 39f7cf8..27f208e 100644 --- a/src/util/virhostcpu.h +++ b/src/util/virhostcpu.h @@ -27,6 +27,7 @@ # include "internal.h" # include "virarch.h" # include "virbitmap.h" +# include "virresctrl.h" # define VIR_HOST_CPU_MASK_LEN 1024 @@ -58,4 +59,9 @@ int virHostCPUStatsAssign(virNodeCPUStatsPtr param, const char *name, unsigned long long value); +virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, + int type, + size_t *nbanks, + int cbm_len); + #endif /* __VIR_HOSTCPU_H__*/ diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..44a47cc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,201 @@ +/* + * virresctrl.c: methods for managing resource control + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ +#include <config.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "virarch.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN (10 * 1024 * 1024) + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + +static int virResCtrlReadConfig(virArch arch, int type) +{ + int ret; + size_t i, nbanks; + char *str; + + /* Read num_closids from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid)) < 0) + goto error; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits)) < 0) + goto error; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + goto error; + + /* cbm_mask is in hex, eg: "fffff", calculate cbm length from the default + cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virHostCPUGetCacheBanks(arch, + type, + &nbanks, resctrlall[type].cbm_len); + + if (resctrlall[type].cache_banks == NULL) + goto error; + + resctrlall[type].num_banks = nbanks; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /* L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + ret = 0; + + error: + VIR_FREE(str); + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + virArch hostarch; + + hostarch = virArchFromHost(); + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to initialize resource control config")); + goto cleanup; + } + + if (virFileExists(tmp)) { + if ((rc = virResCtrlReadConfig(hostarch, i)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to get resource control config")); + goto cleanup; + } + } + VIR_FREE(tmp); + } + + cleanup: + VIR_FREE(tmp); + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..5a6a344 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,78 @@ +/* + * virresctrl.h: header for managing resctrl control + * + * Copyright (C) 2016 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1

On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you, I'm working on cleaning this up, at least a little bit, so that we can add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com, sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions. I can do that if you don't want, just let me know so we can coordinate. Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on. Martin [1] https://github.com/nertpinx/libvirt
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++++++++++++++++++++++++++++++++---- src/util/virhostcpu.h | 6 ++ src/util/virresctrl.c | 201 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 78 +++++++++++++++++ 9 files changed, 462 insertions(+), 17 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..3dd2d08 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum {
VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resource control */
# ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/po/POTFILES.in b/po/POTFILES.in index 7c7f530..4147bc6 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -241,6 +241,7 @@ src/util/virportallocator.c src/util/virprocess.c src/util/virqemu.c src/util/virrandom.c +src/util/virresctrl.c src/util/virrotatingfile.c src/util/virscsi.c src/util/virscsihost.c diff --git a/src/Makefile.am b/src/Makefile.am index 7d42eac..edb946a 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -162,6 +162,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsihost.c util/virscsihost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index aed1d3d..bb7c3ad 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2320,6 +2320,10 @@ virRandomGenerateWWN; virRandomInt;
+# util/virresctrl.h +virResCtrlAvailable; +virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST,
"Perf", /* 65 */ "Libssh transport layer", + "Resouce Control", )
diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c index f29f312..e6d5102 100644 --- a/src/util/virhostcpu.c +++ b/src/util/virhostcpu.c @@ -206,29 +206,21 @@ void virHostCPUSetSysFSSystemPathLinux(const char *path) sysfs_system_path = SYSFS_SYSTEM_PATH; }
-/* Return the positive decimal contents of the given - * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative - * and the file could not be found, return that instead of an error; - * this is useful for machines that cannot hot-unplug cpu0, or where - * hot-unplugging is disabled, or where the kernel is too old - * to support NUMA cells, etc. */ +/* Get a String value*/ static int -virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, - int default_value) +virHostCPUGetStrValue(const char *dir, unsigned int cpu, const char *file, char *value_str) { char *path; FILE *pathfp; - int value = -1; - char value_str[INT_BUFSIZE_BOUND(value)]; - char *tmp; + int ret = -1;
if (virAsprintf(&path, "%s/cpu%u/%s", dir, cpu, file) < 0) return -1;
pathfp = fopen(path, "r"); if (pathfp == NULL) { - if (default_value >= 0 && errno == ENOENT) - value = default_value; + if (errno == ENOENT) + return -2; else virReportSystemError(errno, _("cannot open %s"), path); goto cleanup; @@ -238,17 +230,84 @@ virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, virReportSystemError(errno, _("cannot read from %s"), path); goto cleanup; } + + ret = 0; + + cleanup: + VIR_FORCE_FCLOSE(pathfp); + VIR_FREE(path); + return ret; +} + + +/* Return the positive decimal contents of the given + * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative + * and the file could not be found, return that instead of an error; + * this is useful for machines that cannot hot-unplug cpu0, or where + * hot-unplugging is disabled, or where the kernel is too old + * to support NUMA cells, etc. */ +static int +virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, + int default_value) +{ + int value = -1; + char value_str[INT_BUFSIZE_BOUND(value)]; + char *tmp; + int ret; + + if ((ret = (virHostCPUGetStrValue(dir, cpu, file, value_str))) < 0) { + if (ret == -2 && default_value >= 0) + return default_value; + else + return -1; + } + if (virStrToLong_i(value_str, &tmp, 10, &value) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("could not convert '%s' to an integer"), value_str); - goto cleanup; + return -1; } + return value; +}
- cleanup: - VIR_FORCE_FCLOSE(pathfp); - VIR_FREE(path); +/* Return specific type cache size in KiB of given cpu + -1 on error happened */ +static +int virHostCPUGetCache(unsigned int cpu, unsigned int type) +{ + char *cachedir = NULL; + char *cpudir; + char *unit = NULL; + char *tmp; + int value = -1; + unsigned long long size; + char value_str[INT_BUFSIZE_BOUND(value)];
+ if (virAsprintf(&cpudir, "%s/cpu", sysfs_system_path) < 0) + return -1; + + if (virAsprintf(&cachedir, "cache/index%u/size", type) < 0) + goto error; + + if (virHostCPUGetStrValue(cpudir, cpu, cachedir, value_str) < 0) + goto error; + + if ((tmp = strchr(value_str, '\n'))) *tmp = '\0'; + + if (virStrToLong_i(value_str, &unit, 10, &value) < 0) + goto error; + + size = value; + + if (virScaleInteger(&size, unit, 1, ULLONG_MAX) < 0) + goto error; + + return size / 1024; + + error: + VIR_FREE(cpudir); + VIR_FREE(cachedir); return value; }
@@ -301,6 +360,23 @@ virHostCPUParseSocket(const char *dir, return ret; }
+/* return socket id of a given cpu*/ +static +int virHostCPUGetSocketId(virArch hostarch, unsigned int cpu) +{ + char *cpu_dir; + int ret = -1; + + if (virAsprintf(&cpu_dir, "%s/cpu", sysfs_system_path) < 0) + goto cleanup; + + ret = virHostCPUParseSocket(cpu_dir, hostarch, cpu); + + cleanup: + VIR_FREE(cpu_dir); + return ret; +} + /* parses a node entry, returning number of processors in the node and * filling arguments */ static int @@ -1346,3 +1422,79 @@ virHostCPUGetKVMMaxVCPUs(void) return -1; } #endif /* HAVE_LINUX_KVM_H */ + +/* Fill all cache bank informations + * Return a list of virResCacheBankPtr, and fill cache bank information + * by loop for all cpus on host, number of cache bank will be set in nbanks + * + * NULL if error happened, and nbanks will be set 0. */ +virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, int type, size_t *nbanks, int cbm_len) +{ + int npresent_cpus; + int idx; + size_t i; + virResCacheBankPtr bank; + + *nbanks = 0; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + switch (type) { + case VIR_RDT_RESOURCE_L3: + case VIR_RDT_RESOURCE_L3DATA: + case VIR_RDT_RESOURCE_L3CODE: + idx = 3; + break; + case VIR_RDT_RESOURCE_L2: + idx = 2; + break; + default: + idx = -1; + } + + if (idx == -1) + return NULL; + + if (VIR_ALLOC_N(bank, 1) < 0) + return NULL; + + *nbanks = 1; + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if ((s_id = virHostCPUGetSocketId(arch, i)) < 0) + goto error; + + /* Expand cache bank array */ + if (s_id > (*nbanks - 1)) { + size_t cur = *nbanks; + size_t exp = s_id - (*nbanks) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *nbanks = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if ((cache_size = virHostCPUGetCache(i, idx)) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / cbm_len; + } + } + return bank; + + error: + *nbanks = 0; + VIR_FREE(bank); + return NULL; +} diff --git a/src/util/virhostcpu.h b/src/util/virhostcpu.h index 39f7cf8..27f208e 100644 --- a/src/util/virhostcpu.h +++ b/src/util/virhostcpu.h @@ -27,6 +27,7 @@ # include "internal.h" # include "virarch.h" # include "virbitmap.h" +# include "virresctrl.h"
# define VIR_HOST_CPU_MASK_LEN 1024
@@ -58,4 +59,9 @@ int virHostCPUStatsAssign(virNodeCPUStatsPtr param, const char *name, unsigned long long value);
+virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, + int type, + size_t *nbanks, + int cbm_len); + #endif /* __VIR_HOSTCPU_H__*/ diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..44a47cc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,201 @@ +/* + * virresctrl.c: methods for managing resource control + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ +#include <config.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "virarch.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN (10 * 1024 * 1024) + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + +static int virResCtrlReadConfig(virArch arch, int type) +{ + int ret; + size_t i, nbanks; + char *str; + + /* Read num_closids from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid)) < 0) + goto error; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits)) < 0) + goto error; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + goto error; + + /* cbm_mask is in hex, eg: "fffff", calculate cbm length from the default + cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virHostCPUGetCacheBanks(arch, + type, + &nbanks, resctrlall[type].cbm_len); + + if (resctrlall[type].cache_banks == NULL) + goto error; + + resctrlall[type].num_banks = nbanks; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /* L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + ret = 0; + + error: + VIR_FREE(str); + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + virArch hostarch; + + hostarch = virArchFromHost(); + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to initialize resource control config")); + goto cleanup; + } + + if (virFileExists(tmp)) { + if ((rc = virResCtrlReadConfig(hostarch, i)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to get resource control config")); + goto cleanup; + } + } + VIR_FREE(tmp); + } + + cleanup: + VIR_FREE(tmp); + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..5a6a344 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,78 @@ +/* + * virresctrl.h: header for managing resctrl control + * + * Copyright (C) 2016 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, 15 March 2017 at 7:57 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you,
I'm working on cleaning this up, at least a little bit, so that we can
Good news.
add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com (http://whatthecommit.com), sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions.
I can do that if you don't want, just let me know so we can coordinate.
of cause we can do some coordinate, but I am glad that you can help on this to speed up the progress to merge them, as you know this patch is in V10 already, and it has 12 patch set, kinds of hard to doing rebase… :(
Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on.
Ah, that’s a good news..
Martin
[1] https://github.com/nertpinx/libvirt
Signed-off-by: Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> --- include/libvirt/virterror.h | 1 + po/POTFILES.in (http://POTFILES.in) | 1 + src/Makefile.am (http://Makefile.am) | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++++++++++++++++++++++++++++++++---- src/util/virhostcpu.h | 6 ++ src/util/virresctrl.c | 201 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 78 +++++++++++++++++ 9 files changed, 462 insertions(+), 17 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..3dd2d08 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum {
VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resource control */
# ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/po/POTFILES.in (http://POTFILES.in) b/po/POTFILES.in (http://POTFILES.in) index 7c7f530..4147bc6 100644 --- a/po/POTFILES.in (http://POTFILES.in) +++ b/po/POTFILES.in (http://POTFILES.in) @@ -241,6 +241,7 @@ src/util/virportallocator.c src/util/virprocess.c src/util/virqemu.c src/util/virrandom.c +src/util/virresctrl.c src/util/virrotatingfile.c src/util/virscsi.c src/util/virscsihost.c diff --git a/src/Makefile.am (http://Makefile.am) b/src/Makefile.am (http://Makefile.am) index 7d42eac..edb946a 100644 --- a/src/Makefile.am (http://Makefile.am) +++ b/src/Makefile.am (http://Makefile.am) @@ -162,6 +162,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsihost.c util/virscsihost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index aed1d3d..bb7c3ad 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2320,6 +2320,10 @@ virRandomGenerateWWN; virRandomInt;
+# util/virresctrl.h +virResCtrlAvailable; +virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST,
"Perf", /* 65 */ "Libssh transport layer", + "Resouce Control", )
diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c index f29f312..e6d5102 100644 --- a/src/util/virhostcpu.c +++ b/src/util/virhostcpu.c @@ -206,29 +206,21 @@ void virHostCPUSetSysFSSystemPathLinux(const char *path) sysfs_system_path = SYSFS_SYSTEM_PATH; }
-/* Return the positive decimal contents of the given - * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative - * and the file could not be found, return that instead of an error; - * this is useful for machines that cannot hot-unplug cpu0, or where - * hot-unplugging is disabled, or where the kernel is too old - * to support NUMA cells, etc. */ +/* Get a String value*/ static int -virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, - int default_value) +virHostCPUGetStrValue(const char *dir, unsigned int cpu, const char *file, char *value_str) { char *path; FILE *pathfp; - int value = -1; - char value_str[INT_BUFSIZE_BOUND(value)]; - char *tmp; + int ret = -1;
if (virAsprintf(&path, "%s/cpu%u/%s", dir, cpu, file) < 0) return -1;
pathfp = fopen(path, "r"); if (pathfp == NULL) { - if (default_value >= 0 && errno == ENOENT) - value = default_value; + if (errno == ENOENT) + return -2; else virReportSystemError(errno, _("cannot open %s"), path); goto cleanup; @@ -238,17 +230,84 @@ virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, virReportSystemError(errno, _("cannot read from %s"), path); goto cleanup; } + + ret = 0; + + cleanup: + VIR_FORCE_FCLOSE(pathfp); + VIR_FREE(path); + return ret; +} + + +/* Return the positive decimal contents of the given + * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative + * and the file could not be found, return that instead of an error; + * this is useful for machines that cannot hot-unplug cpu0, or where + * hot-unplugging is disabled, or where the kernel is too old + * to support NUMA cells, etc. */ +static int +virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, + int default_value) +{ + int value = -1; + char value_str[INT_BUFSIZE_BOUND(value)]; + char *tmp; + int ret; + + if ((ret = (virHostCPUGetStrValue(dir, cpu, file, value_str))) < 0) { + if (ret == -2 && default_value >= 0) + return default_value; + else + return -1; + } + if (virStrToLong_i(value_str, &tmp, 10, &value) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("could not convert '%s' to an integer"), value_str); - goto cleanup; + return -1; } + return value; +}
- cleanup: - VIR_FORCE_FCLOSE(pathfp); - VIR_FREE(path); +/* Return specific type cache size in KiB of given cpu + -1 on error happened */ +static +int virHostCPUGetCache(unsigned int cpu, unsigned int type) +{ + char *cachedir = NULL; + char *cpudir; + char *unit = NULL; + char *tmp; + int value = -1; + unsigned long long size; + char value_str[INT_BUFSIZE_BOUND(value)];
+ if (virAsprintf(&cpudir, "%s/cpu", sysfs_system_path) < 0) + return -1; + + if (virAsprintf(&cachedir, "cache/index%u/size", type) < 0) + goto error; + + if (virHostCPUGetStrValue(cpudir, cpu, cachedir, value_str) < 0) + goto error; + + if ((tmp = strchr(value_str, '\n'))) *tmp = '\0'; + + if (virStrToLong_i(value_str, &unit, 10, &value) < 0) + goto error; + + size = value; + + if (virScaleInteger(&size, unit, 1, ULLONG_MAX) < 0) + goto error; + + return size / 1024; + + error: + VIR_FREE(cpudir); + VIR_FREE(cachedir); return value; }
@@ -301,6 +360,23 @@ virHostCPUParseSocket(const char *dir, return ret; }
+/* return socket id of a given cpu*/ +static +int virHostCPUGetSocketId(virArch hostarch, unsigned int cpu) +{ + char *cpu_dir; + int ret = -1; + + if (virAsprintf(&cpu_dir, "%s/cpu", sysfs_system_path) < 0) + goto cleanup; + + ret = virHostCPUParseSocket(cpu_dir, hostarch, cpu); + + cleanup: + VIR_FREE(cpu_dir); + return ret; +} + /* parses a node entry, returning number of processors in the node and * filling arguments */ static int @@ -1346,3 +1422,79 @@ virHostCPUGetKVMMaxVCPUs(void) return -1; } #endif /* HAVE_LINUX_KVM_H */ + +/* Fill all cache bank informations + * Return a list of virResCacheBankPtr, and fill cache bank information + * by loop for all cpus on host, number of cache bank will be set in nbanks + * + * NULL if error happened, and nbanks will be set 0. */ +virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, int type, size_t *nbanks, int cbm_len) +{ + int npresent_cpus; + int idx; + size_t i; + virResCacheBankPtr bank; + + *nbanks = 0; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + switch (type) { + case VIR_RDT_RESOURCE_L3: + case VIR_RDT_RESOURCE_L3DATA: + case VIR_RDT_RESOURCE_L3CODE: + idx = 3; + break; + case VIR_RDT_RESOURCE_L2: + idx = 2; + break; + default: + idx = -1; + } + + if (idx == -1) + return NULL; + + if (VIR_ALLOC_N(bank, 1) < 0) + return NULL; + + *nbanks = 1; + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if ((s_id = virHostCPUGetSocketId(arch, i)) < 0) + goto error; + + /* Expand cache bank array */ + if (s_id > (*nbanks - 1)) { + size_t cur = *nbanks; + size_t exp = s_id - (*nbanks) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *nbanks = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if ((cache_size = virHostCPUGetCache(i, idx)) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / cbm_len; + } + } + return bank; + + error: + *nbanks = 0; + VIR_FREE(bank); + return NULL; +} diff --git a/src/util/virhostcpu.h b/src/util/virhostcpu.h index 39f7cf8..27f208e 100644 --- a/src/util/virhostcpu.h +++ b/src/util/virhostcpu.h @@ -27,6 +27,7 @@ # include "internal.h" # include "virarch.h" # include "virbitmap.h" +# include "virresctrl.h"
# define VIR_HOST_CPU_MASK_LEN 1024
@@ -58,4 +59,9 @@ int virHostCPUStatsAssign(virNodeCPUStatsPtr param, const char *name, unsigned long long value);
+virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, + int type, + size_t *nbanks, + int cbm_len); + #endif /* __VIR_HOSTCPU_H__*/ diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..44a47cc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,201 @@ +/* + * virresctrl.c: methods for managing resource control + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ +#include <config.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "virarch.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN (10 * 1024 * 1024) + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + +static int virResCtrlReadConfig(virArch arch, int type) +{ + int ret; + size_t i, nbanks; + char *str; + + /* Read num_closids from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid)) < 0) + goto error; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits)) < 0) + goto error; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + goto error; + + /* cbm_mask is in hex, eg: "fffff", calculate cbm length from the default + cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virHostCPUGetCacheBanks(arch, + type, + &nbanks, resctrlall[type].cbm_len); + + if (resctrlall[type].cache_banks == NULL) + goto error; + + resctrlall[type].num_banks = nbanks; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /* L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + ret = 0; + + error: + VIR_FREE(str); + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + virArch hostarch; + + hostarch = virArchFromHost(); + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to initialize resource control config")); + goto cleanup; + } + + if (virFileExists(tmp)) { + if ((rc = virResCtrlReadConfig(hostarch, i)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to get resource control config")); + goto cleanup; + } + } + VIR_FREE(tmp); + } + + cleanup: + VIR_FREE(tmp); + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..5a6a344 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,78 @@ +/* + * virresctrl.h: header for managing resctrl control + * + * Copyright (C) 2016 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Thursday, 16 March 2017 at 3:52 PM, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 15 March 2017 at 7:57 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you,
I'm working on cleaning this up, at least a little bit, so that we can
Good news.
add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com (http://whatthecommit.com), sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions.
I can do that if you don't want, just let me know so we can coordinate.
of cause we can do some coordinate, but I am glad that you can help on this to speed up the progress to merge them, as you know this patch is in V10 already, and it has 12 patch set, kinds of hard to doing rebase… :(
Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on.
Ah, that’s a good news..
Martin
hi Martin So, if I understand you correctly , you want all my patch set to rebased on top of pre-cat branch [1] , I checked that the last commit is 15th March, I wonder if that ’s ready to merged into master? so that I can start doing the rebasing Thx Eli.
Signed-off-by: Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> --- include/libvirt/virterror.h | 1 + po/POTFILES.in (http://POTFILES.in) | 1 + src/Makefile.am (http://Makefile.am) | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++++++++++++++++++++++++++++++++---- src/util/virhostcpu.h | 6 ++ src/util/virresctrl.c | 201 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 78 +++++++++++++++++ 9 files changed, 462 insertions(+), 17 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 2efee8f..3dd2d08 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -132,6 +132,7 @@ typedef enum {
VIR_FROM_PERF = 65, /* Error from perf */ VIR_FROM_LIBSSH = 66, /* Error from libssh connection transport */ + VIR_FROM_RESCTRL = 67, /* Error from resource control */
# ifdef VIR_ENUM_SENTINELS VIR_ERR_DOMAIN_LAST diff --git a/po/POTFILES.in (http://POTFILES.in) b/po/POTFILES.in (http://POTFILES.in) index 7c7f530..4147bc6 100644 --- a/po/POTFILES.in (http://POTFILES.in) +++ b/po/POTFILES.in (http://POTFILES.in) @@ -241,6 +241,7 @@ src/util/virportallocator.c src/util/virprocess.c src/util/virqemu.c src/util/virrandom.c +src/util/virresctrl.c src/util/virrotatingfile.c src/util/virscsi.c src/util/virscsihost.c diff --git a/src/Makefile.am (http://Makefile.am) b/src/Makefile.am (http://Makefile.am) index 7d42eac..edb946a 100644 --- a/src/Makefile.am (http://Makefile.am) +++ b/src/Makefile.am (http://Makefile.am) @@ -162,6 +162,7 @@ UTIL_SOURCES = \ util/virprocess.c util/virprocess.h \ util/virqemu.c util/virqemu.h \ util/virrandom.h util/virrandom.c \ + util/virresctrl.h util/virresctrl.c \ util/virrotatingfile.h util/virrotatingfile.c \ util/virscsi.c util/virscsi.h \ util/virscsihost.c util/virscsihost.h \ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index aed1d3d..bb7c3ad 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2320,6 +2320,10 @@ virRandomGenerateWWN; virRandomInt;
+# util/virresctrl.h +virResCtrlAvailable; +virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST,
"Perf", /* 65 */ "Libssh transport layer", + "Resouce Control", )
diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c index f29f312..e6d5102 100644 --- a/src/util/virhostcpu.c +++ b/src/util/virhostcpu.c @@ -206,29 +206,21 @@ void virHostCPUSetSysFSSystemPathLinux(const char *path) sysfs_system_path = SYSFS_SYSTEM_PATH; }
-/* Return the positive decimal contents of the given - * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative - * and the file could not be found, return that instead of an error; - * this is useful for machines that cannot hot-unplug cpu0, or where - * hot-unplugging is disabled, or where the kernel is too old - * to support NUMA cells, etc. */ +/* Get a String value*/ static int -virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, - int default_value) +virHostCPUGetStrValue(const char *dir, unsigned int cpu, const char *file, char *value_str) { char *path; FILE *pathfp; - int value = -1; - char value_str[INT_BUFSIZE_BOUND(value)]; - char *tmp; + int ret = -1;
if (virAsprintf(&path, "%s/cpu%u/%s", dir, cpu, file) < 0) return -1;
pathfp = fopen(path, "r"); if (pathfp == NULL) { - if (default_value >= 0 && errno == ENOENT) - value = default_value; + if (errno == ENOENT) + return -2; else virReportSystemError(errno, _("cannot open %s"), path); goto cleanup; @@ -238,17 +230,84 @@ virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, virReportSystemError(errno, _("cannot read from %s"), path); goto cleanup; } + + ret = 0; + + cleanup: + VIR_FORCE_FCLOSE(pathfp); + VIR_FREE(path); + return ret; +} + + +/* Return the positive decimal contents of the given + * DIR/cpu%u/FILE, or -1 on error. If DEFAULT_VALUE is non-negative + * and the file could not be found, return that instead of an error; + * this is useful for machines that cannot hot-unplug cpu0, or where + * hot-unplugging is disabled, or where the kernel is too old + * to support NUMA cells, etc. */ +static int +virHostCPUGetValue(const char *dir, unsigned int cpu, const char *file, + int default_value) +{ + int value = -1; + char value_str[INT_BUFSIZE_BOUND(value)]; + char *tmp; + int ret; + + if ((ret = (virHostCPUGetStrValue(dir, cpu, file, value_str))) < 0) { + if (ret == -2 && default_value >= 0) + return default_value; + else + return -1; + } + if (virStrToLong_i(value_str, &tmp, 10, &value) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("could not convert '%s' to an integer"), value_str); - goto cleanup; + return -1; } + return value; +}
- cleanup: - VIR_FORCE_FCLOSE(pathfp); - VIR_FREE(path); +/* Return specific type cache size in KiB of given cpu + -1 on error happened */ +static +int virHostCPUGetCache(unsigned int cpu, unsigned int type) +{ + char *cachedir = NULL; + char *cpudir; + char *unit = NULL; + char *tmp; + int value = -1; + unsigned long long size; + char value_str[INT_BUFSIZE_BOUND(value)];
+ if (virAsprintf(&cpudir, "%s/cpu", sysfs_system_path) < 0) + return -1; + + if (virAsprintf(&cachedir, "cache/index%u/size", type) < 0) + goto error; + + if (virHostCPUGetStrValue(cpudir, cpu, cachedir, value_str) < 0) + goto error; + + if ((tmp = strchr(value_str, '\n'))) *tmp = '\0'; + + if (virStrToLong_i(value_str, &unit, 10, &value) < 0) + goto error; + + size = value; + + if (virScaleInteger(&size, unit, 1, ULLONG_MAX) < 0) + goto error; + + return size / 1024; + + error: + VIR_FREE(cpudir); + VIR_FREE(cachedir); return value; }
@@ -301,6 +360,23 @@ virHostCPUParseSocket(const char *dir, return ret; }
+/* return socket id of a given cpu*/ +static +int virHostCPUGetSocketId(virArch hostarch, unsigned int cpu) +{ + char *cpu_dir; + int ret = -1; + + if (virAsprintf(&cpu_dir, "%s/cpu", sysfs_system_path) < 0) + goto cleanup; + + ret = virHostCPUParseSocket(cpu_dir, hostarch, cpu); + + cleanup: + VIR_FREE(cpu_dir); + return ret; +} + /* parses a node entry, returning number of processors in the node and * filling arguments */ static int @@ -1346,3 +1422,79 @@ virHostCPUGetKVMMaxVCPUs(void) return -1; } #endif /* HAVE_LINUX_KVM_H */ + +/* Fill all cache bank informations + * Return a list of virResCacheBankPtr, and fill cache bank information + * by loop for all cpus on host, number of cache bank will be set in nbanks + * + * NULL if error happened, and nbanks will be set 0. */ +virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, int type, size_t *nbanks, int cbm_len) +{ + int npresent_cpus; + int idx; + size_t i; + virResCacheBankPtr bank; + + *nbanks = 0; + if ((npresent_cpus = virHostCPUGetCount()) < 0) + return NULL; + + switch (type) { + case VIR_RDT_RESOURCE_L3: + case VIR_RDT_RESOURCE_L3DATA: + case VIR_RDT_RESOURCE_L3CODE: + idx = 3; + break; + case VIR_RDT_RESOURCE_L2: + idx = 2; + break; + default: + idx = -1; + } + + if (idx == -1) + return NULL; + + if (VIR_ALLOC_N(bank, 1) < 0) + return NULL; + + *nbanks = 1; + + for (i = 0; i < npresent_cpus; i ++) { + int s_id; + int cache_size; + + if ((s_id = virHostCPUGetSocketId(arch, i)) < 0) + goto error; + + /* Expand cache bank array */ + if (s_id > (*nbanks - 1)) { + size_t cur = *nbanks; + size_t exp = s_id - (*nbanks) + 1; + if (VIR_EXPAND_N(bank, cur, exp) < 0) + goto error; + *nbanks = s_id + 1; + } + + if (bank[s_id].cpu_mask == NULL) { + if (!(bank[s_id].cpu_mask = virBitmapNew(npresent_cpus))) + goto error; + } + + ignore_value(virBitmapSetBit(bank[s_id].cpu_mask, i)); + + if (bank[s_id].cache_size == 0) { + if ((cache_size = virHostCPUGetCache(i, idx)) < 0) + goto error; + + bank[s_id].cache_size = cache_size; + bank[s_id].cache_min = cache_size / cbm_len; + } + } + return bank; + + error: + *nbanks = 0; + VIR_FREE(bank); + return NULL; +} diff --git a/src/util/virhostcpu.h b/src/util/virhostcpu.h index 39f7cf8..27f208e 100644 --- a/src/util/virhostcpu.h +++ b/src/util/virhostcpu.h @@ -27,6 +27,7 @@ # include "internal.h" # include "virarch.h" # include "virbitmap.h" +# include "virresctrl.h"
# define VIR_HOST_CPU_MASK_LEN 1024
@@ -58,4 +59,9 @@ int virHostCPUStatsAssign(virNodeCPUStatsPtr param, const char *name, unsigned long long value);
+virResCacheBankPtr virHostCPUGetCacheBanks(virArch arch, + int type, + size_t *nbanks, + int cbm_len); + #endif /* __VIR_HOSTCPU_H__*/ diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..44a47cc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,201 @@ +/* + * virresctrl.c: methods for managing resource control + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ +#include <config.h> + +#include "virresctrl.h" +#include "viralloc.h" +#include "virerror.h" +#include "virfile.h" +#include "virhostcpu.h" +#include "virlog.h" +#include "virstring.h" +#include "virarch.h" + +VIR_LOG_INIT("util.resctrl"); + +#define VIR_FROM_THIS VIR_FROM_RESCTRL + +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +#define MAX_CPU_SOCKET_NUM 8 +#define MAX_CBM_BIT_LEN 32 +#define MAX_SCHEMATA_LEN 1024 +#define MAX_FILE_LEN (10 * 1024 * 1024) + +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; + +static int virResCtrlGetInfoStr(const int type, const char *item, char **str) +{ + int ret = 0; + char *tmp; + char *path; + + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_INFO_DIR, resctrlall[type].name, item) < 0) + return -1; + if (virFileReadAll(path, 10, str) < 0) { + ret = -1; + goto cleanup; + } + + if ((tmp = strchr(*str, '\n'))) *tmp = '\0'; + + cleanup: + VIR_FREE(path); + return ret; +} + +static int virResCtrlReadConfig(virArch arch, int type) +{ + int ret; + size_t i, nbanks; + char *str; + + /* Read num_closids from resctrl. + eg: /sys/fs/resctrl/info/L3/num_closids */ + if ((ret = virResCtrlGetInfoStr(type, "num_closids", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].num_closid)) < 0) + goto error; + + VIR_FREE(str); + + /* Read min_cbm_bits from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "min_cbm_bits", &str)) < 0) + goto error; + + if ((ret = virStrToLong_i(str, NULL, 10, &resctrlall[type].min_cbm_bits)) < 0) + goto error; + + VIR_FREE(str); + + /* Read cbm_mask string from resctrl. + eg: /sys/fs/resctrl/info/L3/cbm_mask */ + if ((ret = virResCtrlGetInfoStr(type, "cbm_mask", &str)) < 0) + goto error; + + /* cbm_mask is in hex, eg: "fffff", calculate cbm length from the default + cbm_mask. */ + resctrlall[type].cbm_len = strlen(str) * 4; + + /* Get all cache bank informations */ + resctrlall[type].cache_banks = virHostCPUGetCacheBanks(arch, + type, + &nbanks, resctrlall[type].cbm_len); + + if (resctrlall[type].cache_banks == NULL) + goto error; + + resctrlall[type].num_banks = nbanks; + + for (i = 0; i < resctrlall[type].num_banks; i++) { + /* L3CODE and L3DATA shares same L3 resource, so they should + * have same host_id. */ + if (type == VIR_RDT_RESOURCE_L3CODE) + resctrlall[type].cache_banks[i].host_id = resctrlall[VIR_RDT_RESOURCE_L3DATA].cache_banks[i].host_id; + else + resctrlall[type].cache_banks[i].host_id = host_id++; + } + + resctrlall[type].enabled = true; + + ret = 0; + + error: + VIR_FREE(str); + return ret; +} + +int +virResCtrlInit(void) +{ + size_t i = 0; + char *tmp; + int rc = 0; + + virArch hostarch; + + hostarch = virArchFromHost(); + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if ((rc = virAsprintf(&tmp, "%s/%s", RESCTRL_INFO_DIR, resctrlall[i].name)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to initialize resource control config")); + goto cleanup; + } + + if (virFileExists(tmp)) { + if ((rc = virResCtrlReadConfig(hostarch, i)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to get resource control config")); + goto cleanup; + } + } + VIR_FREE(tmp); + } + + cleanup: + VIR_FREE(tmp); + return rc; +} + +/* + * Test whether the host support resource control + */ +bool +virResCtrlAvailable(void) +{ + if (!virFileExists(RESCTRL_INFO_DIR)) + return false; + return true; +} + +/* + * Return an virResCtrlPtr point to virResCtrl object, + * We should not modify it out side of virresctrl.c + */ +virResCtrlPtr +virResCtrlGet(int type) +{ + return &resctrlall[type]; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h new file mode 100644 index 0000000..5a6a344 --- /dev/null +++ b/src/util/virresctrl.h @@ -0,0 +1,78 @@ +/* + * virresctrl.h: header for managing resctrl control + * + * Copyright (C) 2016 Intel, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> + */ + +#ifndef __VIR_RESCTRL_H__ +# define __VIR_RESCTRL_H__ + +# include "virbitmap.h" + +enum { + VIR_RDT_RESOURCE_L3, + VIR_RDT_RESOURCE_L3DATA, + VIR_RDT_RESOURCE_L3CODE, + VIR_RDT_RESOURCE_L2, + /* Must be the last */ + VIR_RDT_RESOURCE_LAST, +}; + + +typedef struct _virResCacheBank virResCacheBank; +typedef virResCacheBank *virResCacheBankPtr; +struct _virResCacheBank { + unsigned int host_id; + unsigned long long cache_size; + unsigned long long cache_left; + unsigned long long cache_min; + virBitmapPtr cpu_mask; +}; + +/** + * struct rdt_resource - attributes of an RDT resource + * @enabled: Is this feature enabled on this machine + * @name: Name to use in "schemata" file + * @num_closid: Number of CLOSIDs available + * @max_cbm: Largest Cache Bit Mask allowed + * @min_cbm_bits: Minimum number of consecutive bits to be set + * in a cache bit mask + * @cache_level: Which cache level defines scope of this domain + * @num_banks: Number of cache bank on this machine. + * @cache_banks: Array of cache bank + */ +typedef struct _virResCtrl virResCtrl; +typedef virResCtrl *virResCtrlPtr; +struct _virResCtrl { + bool enabled; + const char *name; + int num_closid; + int cbm_len; + int min_cbm_bits; + const char* cache_level; + int num_banks; + virResCacheBankPtr cache_banks; +}; + +bool virResCtrlAvailable(void); +int virResCtrlInit(void); +virResCtrlPtr virResCtrlGet(int); + +#endif -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, Mar 24, 2017 at 09:35:33AM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Thursday, 16 March 2017 at 3:52 PM, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 15 March 2017 at 7:57 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you,
I'm working on cleaning this up, at least a little bit, so that we can
Good news.
add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com (http://whatthecommit.com), sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions.
I can do that if you don't want, just let me know so we can coordinate.
of cause we can do some coordinate, but I am glad that you can help on this to speed up the progress to merge them, as you know this patch is in V10 already, and it has 12 patch set, kinds of hard to doing rebase… :(
Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on.
Ah, that’s a good news..
Martin
hi Martin
So, if I understand you correctly , you want all my patch set to rebased on top of pre-cat branch [1] , I checked that the last commit is 15th March, I wonder if that ’s ready to merged into master? so that I can start doing the rebasing
I forgot to do the usual push, it's updated now. The test fails in one occasion, but it's the code's fault, the test is fine. That's the last thing I'm looking at now, after that I'll send it to the list. Look at the changes and see what you can use, it will help simplifying your code a lot, I thing. You can start rebasing on top of that, I'll do that as well after it's posted and I'll be either using and modifying your patches or maybe doing some myself. Martin

hi Martin (cc libvir-list) I am a little confused about cat support. I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though? [1] https://github.com/nertpinx/libvirt/commit/c335de47a4efeca87f23e641a93587b1e... Thanks Eli. -- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, 24 March 2017 at 3:42 PM, Martin Kletzander wrote:
On Fri, Mar 24, 2017 at 09:35:33AM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Thursday, 16 March 2017 at 3:52 PM, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 15 March 2017 at 7:57 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you,
I'm working on cleaning this up, at least a little bit, so that we can
Good news.
add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com (http://whatthecommit.com), sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions.
I can do that if you don't want, just let me know so we can coordinate. of cause we can do some coordinate, but I am glad that you can help on this to speed up the progress to merge them, as you know this patch is in V10 already, and it has 12 patch set, kinds of hard to doing rebase… :(
Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on.
Ah, that’s a good news..
Martin
hi Martin
So, if I understand you correctly , you want all my patch set to rebased on top of pre-cat branch [1] , I checked that the last commit is 15th March, I wonder if that ’s ready to merged into master? so that I can start doing the rebasing
I forgot to do the usual push, it's updated now. The test fails in one occasion, but it's the code's fault, the test is fine. That's the last thing I'm looking at now, after that I'll send it to the list.
Look at the changes and see what you can use, it will help simplifying your code a lot, I thing. You can start rebasing on top of that, I'll do that as well after it's posted and I'll be either using and modifying your patches or maybe doing some myself.
Martin

On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
hi Martin
(cc libvir-list)
I am a little confused about cat support.
I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
[1] https://github.com/nertpinx/libvirt/commit/c335de47a4efeca87f23e641a93587b1e...
Thanks Eli.
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Friday, 24 March 2017 at 3:42 PM, Martin Kletzander wrote:
On Fri, Mar 24, 2017 at 09:35:33AM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Thursday, 16 March 2017 at 3:52 PM, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 15 March 2017 at 7:57 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you,
I'm working on cleaning this up, at least a little bit, so that we can
Good news.
add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com (http://whatthecommit.com), sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions.
I can do that if you don't want, just let me know so we can coordinate. of cause we can do some coordinate, but I am glad that you can help on this to speed up the progress to merge them, as you know this patch is in V10 already, and it has 12 patch set, kinds of hard to doing rebase… :(
Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on.
Ah, that’s a good news..
Martin
hi Martin
So, if I understand you correctly , you want all my patch set to rebased on top of pre-cat branch [1] , I checked that the last commit is 15th March, I wonder if that ’s ready to merged into master? so that I can start doing the rebasing
I forgot to do the usual push, it's updated now. The test fails in one occasion, but it's the code's fault, the test is fine. That's the last thing I'm looking at now, after that I'll send it to the list.
Look at the changes and see what you can use, it will help simplifying your code a lot, I thing. You can start rebasing on top of that, I'll do that as well after it's posted and I'll be either using and modifying your patches or maybe doing some myself.
Martin

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, 29 March 2017 at 3:45 PM, Martin Kletzander wrote:
On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
hi Martin
(cc libvir-list)
I am a little confused about cat support.
I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
Not so fast, only the first patch [1], I found that nodeinfo.c is removed :( I think we need to extend virResCtrlGetInfoStr and virResCtrlGetInfoUint to virsysfs.c thought ? [1]https://github.com/taget/libvirt/commits/cdp_v11
[1] https://github.com/nertpinx/libvirt/commit/c335de47a4efeca87f23e641a93587b1e...
Thanks Eli.
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Friday, 24 March 2017 at 3:42 PM, Martin Kletzander wrote:
On Fri, Mar 24, 2017 at 09:35:33AM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Thursday, 16 March 2017 at 3:52 PM, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 15 March 2017 at 7:57 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote: > This patch adds some utils struct and functions to expose resctrl > information. > > virResCtrlAvailable: if resctrl interface exist on host. > virResCtrlGet: get specific type resource control information. > virResCtrlInit: initialize resctrl struct from the host's sys fs. > resctrlall[]: an array to maintain resource control information. > > Some of host cpu related information methods was added in virhostcpu.c
So to be able to test all this we need to make a bit different approach. I'm not in favour of pushing this without proper tests. Some paths need to be configurable, some readings should be unified. Unfortunately lot of the code is just copy-paste mess from the past. Fortunately for you,
I'm working on cleaning this up, at least a little bit, so that we can
Good news.
add the tests easily. I got almost up to the test (I stumbled upon few rabbit holes on the way and some clean-ups went wrong along the way), so it should be pretty easy for you to modify this code to use what I'll be proposing to add. It's not ready yet, but you can start rebasing your series on top of my branch pre-cat from my github repo [1]. The commits are not very well described right now (for some temporary ones I used whatthecommit.com (http://whatthecommit.com), sorry), but I'll fix all that. I'll be updating the branch, but it will be done with force pushes, so be careful when rebasing on top of newer versions.
I can do that if you don't want, just let me know so we can coordinate. of cause we can do some coordinate, but I am glad that you can help on this to speed up the progress to merge them, as you know this patch is in V10 already, and it has 12 patch set, kinds of hard to doing rebase… :(
Just a quick heads-up, there will be virsysfs that will be used for the reads, some additional helper functions in virhostcpu and virfile, test that scans copy of /sys/devices/system (with that path faked thanks to using the aforementioned virsysfs) and outputs capabilities so that we can check the capability XML and so on.
Ah, that’s a good news..
Martin
hi Martin
So, if I understand you correctly , you want all my patch set to rebased on top of pre-cat branch [1] , I checked that the last commit is 15th March, I wonder if that ’s ready to merged into master? so that I can start doing the rebasing
I forgot to do the usual push, it's updated now. The test fails in one occasion, but it's the code's fault, the test is fine. That's the last thing I'm looking at now, after that I'll send it to the list.
Look at the changes and see what you can use, it will help simplifying your code a lot, I thing. You can start rebasing on top of that, I'll do that as well after it's posted and I'll be either using and modifying your patches or maybe doing some myself.
Martin

On Wed, Mar 29, 2017 at 04:22:16PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 29 March 2017 at 3:45 PM, Martin Kletzander wrote:
On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
hi Martin
(cc libvir-list)
I am a little confused about cat support.
I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
Not so fast, only the first patch [1], I found that nodeinfo.c is removed :(
I think we need to extend virResCtrlGetInfoStr and virResCtrlGetInfoUint to virsysfs.c
thought ?
Yeah, we should wrap around /sys/fs/resctrl as we do with /sys/devices/system so that it can be easily tested. Also I got another idea about keeping the resource info. There is no need for any global data to be stored as you are re-reading almost all of it. The only info that stays the same is caches (that is already saved in capabilities) and what caches are available for resource control (that will be there as well). So I don't think we need yet another global data storage.

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, 29 March 2017 at 5:19 PM, Martin Kletzander wrote:
On Wed, Mar 29, 2017 at 04:22:16PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 29 March 2017 at 3:45 PM, Martin Kletzander wrote:
On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
hi Martin
(cc libvir-list)
I am a little confused about cat support.
I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
Not so fast, only the first patch [1], I found that nodeinfo.c is removed :(
I think we need to extend virResCtrlGetInfoStr and virResCtrlGetInfoUint to virsysfs.c
thought ?
Yeah, we should wrap around /sys/fs/resctrl as we do with /sys/devices/system so that it can be easily tested.
Sure, working on it, and done, will push it for review. Also I will push some fake data for resctrl testing..
Also I got another idea about keeping the resource info. There is no need for any global data to be stored as you are re-reading almost all of it. The only info that stays the same is caches (that is already saved in capabilities) and what caches are available for resource control (that will be there as well). So I don't think we need yet another global data storage.
Do you mean, we re-create all struct (reading them from /sys/fs/resctrl) when we create/destroy instance? also, for get free cache ? This is what I did in my early PoC, that will much easier… but please keep in mind that only one thread can read/write to /sys/fs/resctrl at one time. the neck bottle is /sys/fs/resctrl

On Wed, Mar 29, 2017 at 05:31:42PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
OK, please do something with your client. Having the footer here on top for every reply is *sooooo* bothersome when you are replying inline (that part is fine).
On Wednesday, 29 March 2017 at 5:19 PM, Martin Kletzander wrote:
On Wed, Mar 29, 2017 at 04:22:16PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 29 March 2017 at 3:45 PM, Martin Kletzander wrote:
On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
hi Martin
(cc libvir-list)
I am a little confused about cat support.
I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
Not so fast, only the first patch [1], I found that nodeinfo.c is removed :(
I think we need to extend virResCtrlGetInfoStr and virResCtrlGetInfoUint to virsysfs.c
thought ?
Yeah, we should wrap around /sys/fs/resctrl as we do with /sys/devices/system so that it can be easily tested.
Sure, working on it, and done, will push it for review.
Also I will push some fake data for resctrl testing..
Also I got another idea about keeping the resource info. There is no need for any global data to be stored as you are re-reading almost all of it. The only info that stays the same is caches (that is already saved in capabilities) and what caches are available for resource control (that will be there as well). So I don't think we need yet another global data storage.
Do you mean, we re-create all struct (reading them from /sys/fs/resctrl) when we create/destroy instance? also, for get free cache ?
You have to update that for every request anyway, so what's the point of keeping the data when they immediately become old?
This is what I did in my early PoC, that will much easier… but please keep in mind that only one thread can read/write to /sys/fs/resctrl at one time.
Yeah, that's what we have locks for.
the neck bottle is /sys/fs/resctrl
Sure you mean bottleneck, right? :)

On Wednesday, 29 March 2017 at 5:47 PM, Martin Kletzander wrote:
On Wed, Mar 29, 2017 at 05:31:42PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
OK, please do something with your client. Having the footer here on top for every reply is *sooooo* bothersome when you are replying inline (that part is fine).
Sorry, I removed footer, better now?
On Wednesday, 29 March 2017 at 5:19 PM, Martin Kletzander wrote:
On Wed, Mar 29, 2017 at 04:22:16PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 29 March 2017 at 3:45 PM, Martin Kletzander wrote:
On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
hi Martin
(cc libvir-list)
I am a little confused about cat support.
I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
Not so fast, only the first patch [1], I found that nodeinfo.c is removed :(
I think we need to extend virResCtrlGetInfoStr and virResCtrlGetInfoUint to virsysfs.c
thought ?
Yeah, we should wrap around /sys/fs/resctrl as we do with /sys/devices/system so that it can be easily tested.
Sure, working on it, and done, will push it for review.
Also I will push some fake data for resctrl testing..
Also I got another idea about keeping the resource info. There is no need for any global data to be stored as you are re-reading almost all of it. The only info that stays the same is caches (that is already saved in capabilities) and what caches are available for resource control (that will be there as well). So I don't think we need yet another global data storage.
Do you mean, we re-create all struct (reading them from /sys/fs/resctrl) when we create/destroy instance? also, for get free cache ?
You have to update that for every request anyway, so what's the point of keeping the data when they immediately become old?
I was thought that may reduce the time costing, not all of the content be refreshed, anyway, I will try to avoid global files in my later version. LoL lots of rebasing :( Thanks for your suggestion.
This is what I did in my early PoC, that will much easier… but please keep in mind that only one thread can read/write to /sys/fs/resctrl at one time.
Yeah, that's what we have locks for.
the neck bottle is /sys/fs/resctrl
Sure you mean bottleneck, right? :)
yes, bottleneck,

On Wed, Mar 29, 2017 at 05:59:47PM +0800, Eli Qiao wrote:
On Wednesday, 29 March 2017 at 5:47 PM, Martin Kletzander wrote:
On Wed, Mar 29, 2017 at 05:31:42PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
OK, please do something with your client. Having the footer here on top for every reply is *sooooo* bothersome when you are replying inline (that part is fine).
Sorry, I removed footer, better now?
Way better, thank you very much. I always didn't know whether to look for the rest of the message or not. And as you can see, after some replies, it's hard to read what was discussed:
On Wednesday, 29 March 2017 at 5:19 PM, Martin Kletzander wrote:
On Wed, Mar 29, 2017 at 04:22:16PM +0800, Eli Qiao wrote:
-- Best regards Eli
天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, 29 March 2017 at 3:45 PM, Martin Kletzander wrote:
On Tue, Mar 28, 2017 at 03:22:34PM +0800, Eli Qiao wrote:
Look hoe much space it took ^^, and that's only once in there ;)
> hi Martin > > (cc libvir-list) > > I am a little confused about cat support. > > I am currently rebasing my code on top of pre-cat branch from your private github repo, today when I check it you have removed it and create a cat branch and there are some related code pushed[1], can I know what ’s your plan for my patch set for CAT support ? should I continue my rebasing work? your though?
So we can work together on that. Since the rework of the sysfs functions, some patches are easier to write from scratch then rewrite, but I'm now just trying to setup the test suite, so that we have something to test on, at least some of the code. So where are you in the rebase right now? Do you think anything from the virsysfs.c code could be enhanced?
Not so fast, only the first patch [1], I found that nodeinfo.c is removed :(
I think we need to extend virResCtrlGetInfoStr and virResCtrlGetInfoUint to virsysfs.c
thought ?
Yeah, we should wrap around /sys/fs/resctrl as we do with /sys/devices/system so that it can be easily tested.
Sure, working on it, and done, will push it for review.
Also I will push some fake data for resctrl testing..
Also I got another idea about keeping the resource info. There is no need for any global data to be stored as you are re-reading almost all of it. The only info that stays the same is caches (that is already saved in capabilities) and what caches are available for resource control (that will be there as well). So I don't think we need yet another global data storage.
Do you mean, we re-create all struct (reading them from /sys/fs/resctrl) when we create/destroy instance? also, for get free cache ?
You have to update that for every request anyway, so what's the point of keeping the data when they immediately become old?
I was thought that may reduce the time costing, not all of the content be refreshed, anyway, I will try to avoid global files in my later version.
Well, if there is something that does not need refreshing, then you might consider adding it to capabilities (if it is helpful to the user in any way).

On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++++++++++++++++++++++++++++++++---- src/util/virhostcpu.h | 6 ++ src/util/virresctrl.c | 201 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 78 +++++++++++++++++ 9 files changed, 462 insertions(+), 17 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST,
"Perf", /* 65 */ "Libssh transport layer", + "Resouce Control",
s/resouce/resource/
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..44a47cc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,201 @@
[...]
+ +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; +
You are using global variables, still. But I *still* see no locking. What if yet another driver (not just QEMU) will want to use resctrl? Bunch of these accesses can happen at the same time and break everything. How much of this information do we really need to keep (and not reload)? For example host_id can screw up a lot of things. I might be discussing in the latter patches as well.

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, 15 March 2017 at 8:20 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:30PM +0800, Eli Qiao wrote:
This patch adds some utils struct and functions to expose resctrl information.
virResCtrlAvailable: if resctrl interface exist on host. virResCtrlGet: get specific type resource control information. virResCtrlInit: initialize resctrl struct from the host's sys fs. resctrlall[]: an array to maintain resource control information.
Some of host cpu related information methods was added in virhostcpu.c
Signed-off-by: Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> --- include/libvirt/virterror.h | 1 + po/POTFILES.in (http://POTFILES.in) | 1 + src/Makefile.am (http://Makefile.am) | 1 + src/libvirt_private.syms | 4 + src/util/virerror.c | 1 + src/util/virhostcpu.c | 186 ++++++++++++++++++++++++++++++++++++---- src/util/virhostcpu.h | 6 ++ src/util/virresctrl.c | 201 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 78 +++++++++++++++++ 9 files changed, 462 insertions(+), 17 deletions(-) create mode 100644 src/util/virresctrl.c create mode 100644 src/util/virresctrl.h
diff --git a/src/util/virerror.c b/src/util/virerror.c index ef17fb5..0ba15e6 100644 --- a/src/util/virerror.c +++ b/src/util/virerror.c @@ -139,6 +139,7 @@ VIR_ENUM_IMPL(virErrorDomain, VIR_ERR_DOMAIN_LAST,
"Perf", /* 65 */ "Libssh transport layer", + "Resouce Control",
s/resouce/resource/
diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c new file mode 100644 index 0000000..44a47cc --- /dev/null +++ b/src/util/virresctrl.c @@ -0,0 +1,201 @@
[...]
+ +static unsigned int host_id; + +static virResCtrl resctrlall[] = { + { + .name = "L3", + .cache_level = "l3", + }, + { + .name = "L3DATA", + .cache_level = "l3", + }, + { + .name = "L3CODE", + .cache_level = "l3", + }, + { + .name = "L2", + .cache_level = "l2", + }, +}; +
You are using global variables, still. But I *still* see no locking. What if yet another driver (not just QEMU) will want to use resctrl? Bunch of these accesses can happen at the same time and break everything. How much of this information do we really need to keep (and not reload)?
Yes, we need to maintain a global one as /sys/fs/resctrl is a global one. most of these information are in-mutble and don’t need to reload.
For example host_id can screw up a lot of things. I might be discussing in the latter patches as well.
yep. I see them.

This patch expose cache information to host's capabilites xml. For l3 cache allocation <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> </cache> For l3 cache allocation supported cdp(seperate data/code): <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/> </bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/> </bank> </cache> RFC on mailing list. https://www.redhat.com/archives/libvir-list/2017-January/msg00644.html Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/capabilities.c | 56 ++++++++++++++++++++++++++++++++++++++ src/conf/capabilities.h | 23 ++++++++++++++++ src/libvirt_private.syms | 3 +++ src/nodeinfo.c | 64 ++++++++++++++++++++++++++++++++++++++++++++ src/nodeinfo.h | 1 + src/qemu/qemu_capabilities.c | 8 ++++++ src/qemu/qemu_driver.c | 4 +++ 7 files changed, 159 insertions(+) diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 9ab343b..23e21e4 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,18 @@ virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) } static void +virCapabilitiesClearCacheBank(virCapsHostCacheBankPtr cachebank) +{ + size_t i; + for (i = 0; i < cachebank->ncontrol; i++) + VIR_FREE(cachebank->control[i].scope); + + VIR_FREE(cachebank->type); + VIR_FREE(cachebank->cpus); +} + + +static void virCapabilitiesDispose(void *object) { virCapsPtr caps = object; @@ -221,6 +233,10 @@ virCapabilitiesDispose(void *object) virCapabilitiesClearSecModel(&caps->host.secModels[i]); VIR_FREE(caps->host.secModels); + for (i = 0; i < caps->host.ncachebank; i++) + virCapabilitiesClearCacheBank(caps->host.cachebank[i]); + VIR_FREE(caps->host.cachebank); + VIR_FREE(caps->host.netprefix); VIR_FREE(caps->host.pagesSize); virCPUDefFree(caps->host.cpu); @@ -844,6 +860,41 @@ virCapabilitiesFormatNUMATopology(virBufferPtr buf, return 0; } +static int +virCapabilitiesFormatCache(virBufferPtr buf, + size_t ncachebank, + virCapsHostCacheBankPtr *cachebank) +{ + size_t i; + size_t j; + + virBufferAddLit(buf, "<cache>\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < ncachebank; i++) { + virBufferAsprintf(buf, + "<bank id='%u' type='%s' size='%llu' unit='KiB' cpus='%s'>\n", + cachebank[i]->id, + cachebank[i]->type, + cachebank[i]->size, + cachebank[i]->cpus); + + virBufferAdjustIndent(buf, 2); + for (j = 0; j < cachebank[i]->ncontrol; j++) { + virBufferAsprintf(buf, + "<control min='%llu' reserved='%llu' unit='KiB' scope='%s'/>\n", + cachebank[i]->control[j].min, + cachebank[i]->control[j].reserved, + cachebank[i]->control[j].scope); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</bank>\n"); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</cache>\n"); + return 0; +} + /** * virCapabilitiesFormatXML: * @caps: capabilities to format @@ -931,6 +982,11 @@ virCapabilitiesFormatXML(virCapsPtr caps) virBufferAddLit(&buf, "</migration_features>\n"); } + if (caps->host.ncachebank && + virCapabilitiesFormatCache(&buf, caps->host.ncachebank, + caps->host.cachebank) < 0) + return NULL; + if (caps->host.netprefix) virBufferAsprintf(&buf, "<netprefix>%s</netprefix>\n", caps->host.netprefix); diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index cfdc34a..b446de5 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -138,6 +138,25 @@ struct _virCapsHostSecModel { virCapsHostSecModelLabelPtr labels; }; +typedef struct _virCapsHostCacheControl virCapsHostCacheControl; +typedef virCapsHostCacheControl *virCapsHostCacheControlPtr; +struct _virCapsHostCacheControl { + unsigned long long min; + unsigned long long reserved; + char* scope; +}; + +typedef struct _virCapsHostCacheBank virCapsHostCacheBank; +typedef virCapsHostCacheBank *virCapsHostCacheBankPtr; +struct _virCapsHostCacheBank { + unsigned int id; + char* type; + char* cpus; + unsigned long long size; + size_t ncontrol; + virCapsHostCacheControlPtr control; +}; + typedef struct _virCapsHost virCapsHost; typedef virCapsHost *virCapsHostPtr; struct _virCapsHost { @@ -160,6 +179,10 @@ struct _virCapsHost { size_t nsecModels; virCapsHostSecModelPtr secModels; + size_t ncachebank; + size_t ncachebank_max; + virCapsHostCacheBankPtr *cachebank; + char *netprefix; virCPUDefPtr cpu; int nPagesSize; /* size of pagesSize array */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bb7c3ad..b8445ef 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1125,6 +1125,7 @@ virLogManagerNew; # nodeinfo.h nodeCapsInitNUMA; nodeGetInfo; +virCapsInitCache; virHostCPUGetCount; virHostCPUGetKVMMaxVCPUs; virHostCPUGetMap; @@ -2322,8 +2323,10 @@ virRandomInt; # util/virresctrl.h virResCtrlAvailable; +virResCtrlGet; virResCtrlInit; + # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/nodeinfo.c b/src/nodeinfo.c index f2ded02..a001cd0 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -48,6 +48,7 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_NONE @@ -416,3 +417,66 @@ nodeCapsInitNUMA(virCapsPtr caps) VIR_FREE(pageinfo); return ret; } + +int +virCapsInitCache(virCapsPtr caps) +{ + size_t i, j; + virResCtrlPtr resctrl; + virCapsHostCacheBankPtr bank; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + /* L3DATA and L3CODE share L3 resources */ + if (i == VIR_RDT_RESOURCE_L3CODE) + continue; + + resctrl = virResCtrlGet(i); + + if (resctrl->enabled) { + for (j = 0; j < resctrl->num_banks; j++) { + if (VIR_RESIZE_N(caps->host.cachebank, caps->host.ncachebank_max, + caps->host.ncachebank, 1) < 0) + return -1; + + if (VIR_ALLOC(bank) < 0) + return -1; + + bank->id = resctrl->cache_banks[j].host_id; + if (VIR_STRDUP(bank->type, resctrl->cache_level) < 0) + goto err; + if (VIR_STRDUP(bank->cpus, virBitmapFormat(resctrl->cache_banks[j].cpu_mask)) < 0) + goto err; + bank->size = resctrl->cache_banks[j].cache_size; + /*L3DATA and L3CODE shares L3 cache resources, so fill them to the control element*/ + if (i == VIR_RDT_RESOURCE_L3DATA) { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 2) < 0) + goto err; + + bank->control[0].min = virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->min_cbm_bits); + if (VIR_STRDUP(bank->control[0].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->name) < 0) + goto err; + + bank->control[1].min = virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->cache_banks[j].cache_min; + bank->control[1].reserved = bank->control[1].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->min_cbm_bits); + if (VIR_STRDUP(bank->control[1].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->name) < 0) + goto err; + } else { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 1) < 0) + goto err; + bank->control[0].min = resctrl->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * resctrl->min_cbm_bits; + if (VIR_STRDUP(bank->control[0].scope, resctrl->name) < 0) + goto err; + } + caps->host.cachebank[caps->host.ncachebank++] = bank; + } + } + } + return 0; + err: + VIR_FREE(bank); + return -1; +} diff --git a/src/nodeinfo.h b/src/nodeinfo.h index 3c4dc46..5eb0f83 100644 --- a/src/nodeinfo.h +++ b/src/nodeinfo.h @@ -28,5 +28,6 @@ int nodeGetInfo(virNodeInfoPtr nodeinfo); int nodeCapsInitNUMA(virCapsPtr caps); +int virCapsInitCache(virCapsPtr caps); #endif /* __VIR_NODEINFO_H__*/ diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 359a0d8..a0d7254 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1100,6 +1100,11 @@ virQEMUCapsInitCPU(virCapsPtr caps, goto cleanup; } +static int +virQEMUCapsInitCache(virCapsPtr caps) +{ + return virCapsInitCache(caps); +} static int virQEMUCapsInitPages(virCapsPtr caps) @@ -1146,6 +1151,9 @@ virCapsPtr virQEMUCapsInit(virQEMUCapsCachePtr cache) if (virQEMUCapsInitCPU(caps, hostarch) < 0) VIR_WARN("Failed to get host CPU"); + if (virQEMUCapsInitCache(caps) < 0) + VIR_WARN("Failed to get host cache"); + /* Add the power management features of the host */ if (virNodeSuspendGetTargetMask(&caps->host.powerMgmt) < 0) VIR_WARN("Failed to get host power management capabilities"); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 77d8175..3bfb4ec 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -105,6 +105,7 @@ #include "vircgroup.h" #include "virperf.h" #include "virnuma.h" +#include "virresctrl.h" #include "dirname.h" #include "network/bridge_driver.h" @@ -849,6 +850,9 @@ qemuStateInitialize(bool privileged, run_gid = cfg->group; } + if (virResCtrlAvailable() && virResCtrlInit() < 0) + VIR_WARN("Faild to initialize resource control."); + qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, cfg->cacheDir, run_uid, -- 1.9.1

On Mon, Mar 06, 2017 at 06:06:31PM +0800, Eli Qiao wrote:
This patch expose cache information to host's capabilites xml.
For l3 cache allocation <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> </cache>
For l3 cache allocation supported cdp(seperate data/code): <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/>
I know this was discussed before, but why having a vector value in a single attribute? Why don't we split it to two scalars? It will also be SOO much more readable and close to what other tools expose, e.g.: <control scope="data"/> <control scope="instruction"/> or <control scope="both"/> I also skipped the 'l3' because that is already in the <bank/> and will never be different, right?
</bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/> </bank> </cache>
RFC on mailing list. https://www.redhat.com/archives/libvir-list/2017-January/msg00644.html
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/capabilities.c | 56 ++++++++++++++++++++++++++++++++++++++ src/conf/capabilities.h | 23 ++++++++++++++++ src/libvirt_private.syms | 3 +++ src/nodeinfo.c | 64 ++++++++++++++++++++++++++++++++++++++++++++ src/nodeinfo.h | 1 + src/qemu/qemu_capabilities.c | 8 ++++++ src/qemu/qemu_driver.c | 4 +++ 7 files changed, 159 insertions(+)
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 9ab343b..23e21e4 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,18 @@ virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) }
static void +virCapabilitiesClearCacheBank(virCapsHostCacheBankPtr cachebank) +{ + size_t i; + for (i = 0; i < cachebank->ncontrol; i++) + VIR_FREE(cachebank->control[i].scope); + + VIR_FREE(cachebank->type); + VIR_FREE(cachebank->cpus); +} + + +static void virCapabilitiesDispose(void *object) { virCapsPtr caps = object; @@ -221,6 +233,10 @@ virCapabilitiesDispose(void *object) virCapabilitiesClearSecModel(&caps->host.secModels[i]); VIR_FREE(caps->host.secModels);
+ for (i = 0; i < caps->host.ncachebank; i++) + virCapabilitiesClearCacheBank(caps->host.cachebank[i]); + VIR_FREE(caps->host.cachebank); + VIR_FREE(caps->host.netprefix); VIR_FREE(caps->host.pagesSize); virCPUDefFree(caps->host.cpu); @@ -844,6 +860,41 @@ virCapabilitiesFormatNUMATopology(virBufferPtr buf, return 0; }
+static int +virCapabilitiesFormatCache(virBufferPtr buf, + size_t ncachebank, + virCapsHostCacheBankPtr *cachebank) +{ + size_t i; + size_t j; + + virBufferAddLit(buf, "<cache>\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < ncachebank; i++) { + virBufferAsprintf(buf, + "<bank id='%u' type='%s' size='%llu' unit='KiB' cpus='%s'>\n", + cachebank[i]->id, + cachebank[i]->type, + cachebank[i]->size, + cachebank[i]->cpus); + + virBufferAdjustIndent(buf, 2); + for (j = 0; j < cachebank[i]->ncontrol; j++) { + virBufferAsprintf(buf, + "<control min='%llu' reserved='%llu' unit='KiB' scope='%s'/>\n", + cachebank[i]->control[j].min, + cachebank[i]->control[j].reserved, + cachebank[i]->control[j].scope); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</bank>\n"); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</cache>\n"); + return 0; +} + /** * virCapabilitiesFormatXML: * @caps: capabilities to format @@ -931,6 +982,11 @@ virCapabilitiesFormatXML(virCapsPtr caps) virBufferAddLit(&buf, "</migration_features>\n"); }
+ if (caps->host.ncachebank && + virCapabilitiesFormatCache(&buf, caps->host.ncachebank, + caps->host.cachebank) < 0) + return NULL; + if (caps->host.netprefix) virBufferAsprintf(&buf, "<netprefix>%s</netprefix>\n", caps->host.netprefix); diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index cfdc34a..b446de5 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -138,6 +138,25 @@ struct _virCapsHostSecModel { virCapsHostSecModelLabelPtr labels; };
+typedef struct _virCapsHostCacheControl virCapsHostCacheControl; +typedef virCapsHostCacheControl *virCapsHostCacheControlPtr; +struct _virCapsHostCacheControl { + unsigned long long min; + unsigned long long reserved; + char* scope; +}; + +typedef struct _virCapsHostCacheBank virCapsHostCacheBank; +typedef virCapsHostCacheBank *virCapsHostCacheBankPtr; +struct _virCapsHostCacheBank { + unsigned int id; + char* type; + char* cpus; + unsigned long long size; + size_t ncontrol; + virCapsHostCacheControlPtr control; +}; + typedef struct _virCapsHost virCapsHost; typedef virCapsHost *virCapsHostPtr; struct _virCapsHost { @@ -160,6 +179,10 @@ struct _virCapsHost { size_t nsecModels; virCapsHostSecModelPtr secModels;
+ size_t ncachebank; + size_t ncachebank_max; + virCapsHostCacheBankPtr *cachebank; + char *netprefix; virCPUDefPtr cpu; int nPagesSize; /* size of pagesSize array */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bb7c3ad..b8445ef 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1125,6 +1125,7 @@ virLogManagerNew; # nodeinfo.h nodeCapsInitNUMA; nodeGetInfo; +virCapsInitCache; virHostCPUGetCount; virHostCPUGetKVMMaxVCPUs; virHostCPUGetMap; @@ -2322,8 +2323,10 @@ virRandomInt;
# util/virresctrl.h virResCtrlAvailable; +virResCtrlGet; virResCtrlInit;
+ # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/nodeinfo.c b/src/nodeinfo.c index f2ded02..a001cd0 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -48,6 +48,7 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virresctrl.h"
#define VIR_FROM_THIS VIR_FROM_NONE
@@ -416,3 +417,66 @@ nodeCapsInitNUMA(virCapsPtr caps) VIR_FREE(pageinfo); return ret; } + +int +virCapsInitCache(virCapsPtr caps) +{ + size_t i, j; + virResCtrlPtr resctrl; + virCapsHostCacheBankPtr bank; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + /* L3DATA and L3CODE share L3 resources */ + if (i == VIR_RDT_RESOURCE_L3CODE) + continue; + + resctrl = virResCtrlGet(i); + + if (resctrl->enabled) { + for (j = 0; j < resctrl->num_banks; j++) { + if (VIR_RESIZE_N(caps->host.cachebank, caps->host.ncachebank_max, + caps->host.ncachebank, 1) < 0) + return -1; + + if (VIR_ALLOC(bank) < 0) + return -1; + + bank->id = resctrl->cache_banks[j].host_id; + if (VIR_STRDUP(bank->type, resctrl->cache_level) < 0) + goto err; + if (VIR_STRDUP(bank->cpus, virBitmapFormat(resctrl->cache_banks[j].cpu_mask)) < 0) + goto err; + bank->size = resctrl->cache_banks[j].cache_size; + /*L3DATA and L3CODE shares L3 cache resources, so fill them to the control element*/ + if (i == VIR_RDT_RESOURCE_L3DATA) { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 2) < 0) + goto err; + + bank->control[0].min = virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->min_cbm_bits); + if (VIR_STRDUP(bank->control[0].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->name) < 0) + goto err; + + bank->control[1].min = virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->cache_banks[j].cache_min; + bank->control[1].reserved = bank->control[1].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->min_cbm_bits); + if (VIR_STRDUP(bank->control[1].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->name) < 0) + goto err; + } else { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 1) < 0) + goto err; + bank->control[0].min = resctrl->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * resctrl->min_cbm_bits; + if (VIR_STRDUP(bank->control[0].scope, resctrl->name) < 0) + goto err; + } + caps->host.cachebank[caps->host.ncachebank++] = bank; + } + } + } + return 0; + err: + VIR_FREE(bank); + return -1; +} diff --git a/src/nodeinfo.h b/src/nodeinfo.h index 3c4dc46..5eb0f83 100644 --- a/src/nodeinfo.h +++ b/src/nodeinfo.h @@ -28,5 +28,6 @@
int nodeGetInfo(virNodeInfoPtr nodeinfo); int nodeCapsInitNUMA(virCapsPtr caps); +int virCapsInitCache(virCapsPtr caps);
#endif /* __VIR_NODEINFO_H__*/ diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 359a0d8..a0d7254 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1100,6 +1100,11 @@ virQEMUCapsInitCPU(virCapsPtr caps, goto cleanup; }
+static int +virQEMUCapsInitCache(virCapsPtr caps) +{ + return virCapsInitCache(caps); +}
static int virQEMUCapsInitPages(virCapsPtr caps) @@ -1146,6 +1151,9 @@ virCapsPtr virQEMUCapsInit(virQEMUCapsCachePtr cache) if (virQEMUCapsInitCPU(caps, hostarch) < 0) VIR_WARN("Failed to get host CPU");
+ if (virQEMUCapsInitCache(caps) < 0) + VIR_WARN("Failed to get host cache"); + /* Add the power management features of the host */ if (virNodeSuspendGetTargetMask(&caps->host.powerMgmt) < 0) VIR_WARN("Failed to get host power management capabilities"); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 77d8175..3bfb4ec 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -105,6 +105,7 @@ #include "vircgroup.h" #include "virperf.h" #include "virnuma.h" +#include "virresctrl.h" #include "dirname.h" #include "network/bridge_driver.h"
@@ -849,6 +850,9 @@ qemuStateInitialize(bool privileged, run_gid = cfg->group; }
+ if (virResCtrlAvailable() && virResCtrlInit() < 0) + VIR_WARN("Faild to initialize resource control."); +
s/Faild/Failed/ Also the Available() call seems unnecessary since Init() is graceful to unavailability IIRC. Anyway if this is to be done once per daemon lifetime, it should be wrapped using VIR_ONCE (see VIR_ONCE_GLOBAL_INIT all over the codebase). I believe I mentioned this already in some previous version.
qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, cfg->cacheDir, run_uid, -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

-- Best regards Eli 天涯无处不重逢 a leaf duckweed belongs to the sea, where not to meet in life Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday, 15 March 2017 at 8:53 PM, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:31PM +0800, Eli Qiao wrote:
This patch expose cache information to host's capabilites xml.
For l3 cache allocation <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3'/> </bank> </cache>
For l3 cache allocation supported cdp(seperate data/code): <cache> <bank id='0' type='l3' size='56320' unit='KiB' cpus='0-21,44-65'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/>
I know this was discussed before, but why having a vector value in a single attribute? Why don't we split it to two scalars? It will also be SOO much more readable and close to what other tools expose, e.g.:
<control scope="data"/> <control scope="instruction"/>
or
<control scope="both"/>
I also skipped the 'l3' because that is already in the <bank/> and will never be different, right?
I am okay to ether of them, just follow with previous discussion.
</bank> <bank id='1' type='l3' size='56320' unit='KiB' cpus='22-43,66-87'> <control min='2816' reserved='2816' unit='KiB' scope='L3DATA'/> <control min='2816' reserved='2816' unit='KiB' scope='L3CODE'/> </bank> </cache>
RFC on mailing list. https://www.redhat.com/archives/libvir-list/2017-January/msg00644.html
Signed-off-by: Eli Qiao <liyong.qiao@intel.com (mailto:liyong.qiao@intel.com)> --- src/conf/capabilities.c | 56 ++++++++++++++++++++++++++++++++++++++ src/conf/capabilities.h | 23 ++++++++++++++++ src/libvirt_private.syms | 3 +++ src/nodeinfo.c | 64 ++++++++++++++++++++++++++++++++++++++++++++ src/nodeinfo.h | 1 + src/qemu/qemu_capabilities.c | 8 ++++++ src/qemu/qemu_driver.c | 4 +++ 7 files changed, 159 insertions(+)
diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index 9ab343b..23e21e4 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -198,6 +198,18 @@ virCapabilitiesClearSecModel(virCapsHostSecModelPtr secmodel) }
static void +virCapabilitiesClearCacheBank(virCapsHostCacheBankPtr cachebank) +{ + size_t i; + for (i = 0; i < cachebank->ncontrol; i++) + VIR_FREE(cachebank->control[i].scope); + + VIR_FREE(cachebank->type); + VIR_FREE(cachebank->cpus); +} + + +static void virCapabilitiesDispose(void *object) { virCapsPtr caps = object; @@ -221,6 +233,10 @@ virCapabilitiesDispose(void *object) virCapabilitiesClearSecModel(&caps->host.secModels[i]); VIR_FREE(caps->host.secModels);
+ for (i = 0; i < caps->host.ncachebank; i++) + virCapabilitiesClearCacheBank(caps->host.cachebank[i]); + VIR_FREE(caps->host.cachebank); + VIR_FREE(caps->host.netprefix); VIR_FREE(caps->host.pagesSize); virCPUDefFree(caps->host.cpu); @@ -844,6 +860,41 @@ virCapabilitiesFormatNUMATopology(virBufferPtr buf, return 0; }
+static int +virCapabilitiesFormatCache(virBufferPtr buf, + size_t ncachebank, + virCapsHostCacheBankPtr *cachebank) +{ + size_t i; + size_t j; + + virBufferAddLit(buf, "<cache>\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < ncachebank; i++) { + virBufferAsprintf(buf, + "<bank id='%u' type='%s' size='%llu' unit='KiB' cpus='%s'>\n", + cachebank[i]->id, + cachebank[i]->type, + cachebank[i]->size, + cachebank[i]->cpus); + + virBufferAdjustIndent(buf, 2); + for (j = 0; j < cachebank[i]->ncontrol; j++) { + virBufferAsprintf(buf, + "<control min='%llu' reserved='%llu' unit='KiB' scope='%s'/>\n", + cachebank[i]->control[j].min, + cachebank[i]->control[j].reserved, + cachebank[i]->control[j].scope); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</bank>\n"); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</cache>\n"); + return 0; +} + /** * virCapabilitiesFormatXML: * @caps: capabilities to format @@ -931,6 +982,11 @@ virCapabilitiesFormatXML(virCapsPtr caps) virBufferAddLit(&buf, "</migration_features>\n"); }
+ if (caps->host.ncachebank && + virCapabilitiesFormatCache(&buf, caps->host.ncachebank, + caps->host.cachebank) < 0) + return NULL; + if (caps->host.netprefix) virBufferAsprintf(&buf, "<netprefix>%s</netprefix>\n", caps->host.netprefix); diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index cfdc34a..b446de5 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -138,6 +138,25 @@ struct _virCapsHostSecModel { virCapsHostSecModelLabelPtr labels; };
+typedef struct _virCapsHostCacheControl virCapsHostCacheControl; +typedef virCapsHostCacheControl *virCapsHostCacheControlPtr; +struct _virCapsHostCacheControl { + unsigned long long min; + unsigned long long reserved; + char* scope; +}; + +typedef struct _virCapsHostCacheBank virCapsHostCacheBank; +typedef virCapsHostCacheBank *virCapsHostCacheBankPtr; +struct _virCapsHostCacheBank { + unsigned int id; + char* type; + char* cpus; + unsigned long long size; + size_t ncontrol; + virCapsHostCacheControlPtr control; +}; + typedef struct _virCapsHost virCapsHost; typedef virCapsHost *virCapsHostPtr; struct _virCapsHost { @@ -160,6 +179,10 @@ struct _virCapsHost { size_t nsecModels; virCapsHostSecModelPtr secModels;
+ size_t ncachebank; + size_t ncachebank_max; + virCapsHostCacheBankPtr *cachebank; + char *netprefix; virCPUDefPtr cpu; int nPagesSize; /* size of pagesSize array */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bb7c3ad..b8445ef 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1125,6 +1125,7 @@ virLogManagerNew; # nodeinfo.h nodeCapsInitNUMA; nodeGetInfo; +virCapsInitCache; virHostCPUGetCount; virHostCPUGetKVMMaxVCPUs; virHostCPUGetMap; @@ -2322,8 +2323,10 @@ virRandomInt;
# util/virresctrl.h virResCtrlAvailable; +virResCtrlGet; virResCtrlInit;
+ # util/virrotatingfile.h virRotatingFileReaderConsume; virRotatingFileReaderFree; diff --git a/src/nodeinfo.c b/src/nodeinfo.c index f2ded02..a001cd0 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -48,6 +48,7 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virresctrl.h"
#define VIR_FROM_THIS VIR_FROM_NONE
@@ -416,3 +417,66 @@ nodeCapsInitNUMA(virCapsPtr caps) VIR_FREE(pageinfo); return ret; } + +int +virCapsInitCache(virCapsPtr caps) +{ + size_t i, j; + virResCtrlPtr resctrl; + virCapsHostCacheBankPtr bank; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + /* L3DATA and L3CODE share L3 resources */ + if (i == VIR_RDT_RESOURCE_L3CODE) + continue; + + resctrl = virResCtrlGet(i); + + if (resctrl->enabled) { + for (j = 0; j < resctrl->num_banks; j++) { + if (VIR_RESIZE_N(caps->host.cachebank, caps->host.ncachebank_max, + caps->host.ncachebank, 1) < 0) + return -1; + + if (VIR_ALLOC(bank) < 0) + return -1; + + bank->id = resctrl->cache_banks[j].host_id; + if (VIR_STRDUP(bank->type, resctrl->cache_level) < 0) + goto err; + if (VIR_STRDUP(bank->cpus, virBitmapFormat(resctrl->cache_banks[j].cpu_mask)) < 0) + goto err; + bank->size = resctrl->cache_banks[j].cache_size; + /*L3DATA and L3CODE shares L3 cache resources, so fill them to the control element*/ + if (i == VIR_RDT_RESOURCE_L3DATA) { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 2) < 0) + goto err; + + bank->control[0].min = virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->min_cbm_bits); + if (VIR_STRDUP(bank->control[0].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3DATA)->name) < 0) + goto err; + + bank->control[1].min = virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->cache_banks[j].cache_min; + bank->control[1].reserved = bank->control[1].min * (virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->min_cbm_bits); + if (VIR_STRDUP(bank->control[1].scope, + virResCtrlGet(VIR_RDT_RESOURCE_L3CODE)->name) < 0) + goto err; + } else { + if (VIR_EXPAND_N(bank->control, bank->ncontrol, 1) < 0) + goto err; + bank->control[0].min = resctrl->cache_banks[j].cache_min; + bank->control[0].reserved = bank->control[0].min * resctrl->min_cbm_bits; + if (VIR_STRDUP(bank->control[0].scope, resctrl->name) < 0) + goto err; + } + caps->host.cachebank[caps->host.ncachebank++] = bank; + } + } + } + return 0; + err: + VIR_FREE(bank); + return -1; +} diff --git a/src/nodeinfo.h b/src/nodeinfo.h index 3c4dc46..5eb0f83 100644 --- a/src/nodeinfo.h +++ b/src/nodeinfo.h @@ -28,5 +28,6 @@
int nodeGetInfo(virNodeInfoPtr nodeinfo); int nodeCapsInitNUMA(virCapsPtr caps); +int virCapsInitCache(virCapsPtr caps);
#endif /* __VIR_NODEINFO_H__*/ diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 359a0d8..a0d7254 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1100,6 +1100,11 @@ virQEMUCapsInitCPU(virCapsPtr caps, goto cleanup; }
+static int +virQEMUCapsInitCache(virCapsPtr caps) +{ + return virCapsInitCache(caps); +}
static int virQEMUCapsInitPages(virCapsPtr caps) @@ -1146,6 +1151,9 @@ virCapsPtr virQEMUCapsInit(virQEMUCapsCachePtr cache) if (virQEMUCapsInitCPU(caps, hostarch) < 0) VIR_WARN("Failed to get host CPU");
+ if (virQEMUCapsInitCache(caps) < 0) + VIR_WARN("Failed to get host cache"); + /* Add the power management features of the host */ if (virNodeSuspendGetTargetMask(&caps->host.powerMgmt) < 0) VIR_WARN("Failed to get host power management capabilities"); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 77d8175..3bfb4ec 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -105,6 +105,7 @@ #include "vircgroup.h" #include "virperf.h" #include "virnuma.h" +#include "virresctrl.h" #include "dirname.h" #include "network/bridge_driver.h"
@@ -849,6 +850,9 @@ qemuStateInitialize(bool privileged, run_gid = cfg->group; }
+ if (virResCtrlAvailable() && virResCtrlInit() < 0) + VIR_WARN("Faild to initialize resource control."); +
s/Faild/Failed/
Also the Available() call seems unnecessary since Init() is graceful to unavailability IIRC.
yep, we can remove virResCtrlAvailable.
Anyway if this is to be done once per daemon lifetime, it should be wrapped using VIR_ONCE (see VIR_ONCE_GLOBAL_INIT all over the codebase). I believe I mentioned this already in some previous version.
Ops, good to know that, would like to learn how to use VIR_ONCE.
qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, cfg->cacheDir, run_uid, -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com (mailto:libvir-list@redhat.com) https://www.redhat.com/mailman/listinfo/libvir-list

This patch adds new xml element to support cache tune as: <cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB' vcpus='1,2'/> ... </cputune> id: any non-minus number host_id: reference of the host's cache banks id, it's from capabilities type: cache bank type size: should be multiples of the min_size of the bank on host. vcpus: cache allocation on vcpu set, if empty, will apply the allocation on all vcpus --- docs/schemas/domaincommon.rng | 46 +++++++++++++ src/conf/domain_conf.c | 152 ++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 19 ++++++ src/util/virresctrl.c | 29 ++++++-- src/util/virresctrl.h | 4 +- 5 files changed, 244 insertions(+), 6 deletions(-) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index c64544a..efc84c5 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -825,6 +825,32 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name="cachetune"> + <attribute name="id"> + <ref name="cacheid"/> + </attribute> + <attribute name="host_id"> + <ref name="hostid"/> + </attribute> + <attribute name="type"> + <ref name="cachetype"/> + </attribute> + <attribute name="size"> + <ref name="unsignedInt"/> + </attribute> + <optional> + <attribute name="unit"> + <ref name="cacheunit"/> + </attribute> + </optional> + <optional> + <attribute name="vcpus"> + <ref name="cpuset"/> + </attribute> + </optional> + </element> + </zeroOrMore> <optional> <element name="emulatorpin"> <attribute name="cpuset"> @@ -5490,6 +5516,26 @@ <param name="minInclusive">-1</param> </data> </define> + <define name="cacheid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="hostid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="cachetype"> + <data type="string"> + <param name="pattern">(l3)</param> + </data> + </define> + <define name="cacheunit"> + <data type="string"> + <param name="pattern">KiB</param> + </data> + </define> <!-- weight currently is in range [100, 1000] --> <define name="weight"> <data type="unsignedInt"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 97d42fe..652f4ca 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -56,6 +56,7 @@ #include "virstring.h" #include "virnetdev.h" #include "virhostdev.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -15745,6 +15746,127 @@ virDomainVcpuPinDefParseXML(virDomainDefPtr def, return ret; } +/* Parse the XML definition for cachetune + * and a cachetune has the form + * <cachetune id='0' host_id='0' type='l3' size='1024' unit='KiB'/> + */ +static int +virDomainCacheTuneDefParseXML(virDomainDefPtr def, + int n, + xmlNodePtr* nodes) +{ + char* tmp = NULL; + size_t i, j; + int type = -1; + virDomainCacheBankPtr bank = NULL; + virResCtrlPtr resctrl; + + if (VIR_ALLOC_N(bank, n) < 0) + goto cleanup; + + for (i = 0; i < n; i++) { + if (!(tmp = virXMLPropString(nodes[i], "id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache id '%s'"), tmp); + goto cleanup; + } + + VIR_FREE(tmp); + if (!(tmp = virXMLPropString(nodes[i], "host_id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing host id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].host_id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache host id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "size"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing size in cache tune")); + goto cleanup; + } + if (virStrToLong_ull(tmp, NULL, 10, &(bank[i].size)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache size '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "type"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("missing cache type")); + goto cleanup; + } + + if ((type = virResCtrlTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'unsupported cache type '%s'"), tmp); + goto cleanup; + } + + resctrl = virResCtrlGet(type); + + if (resctrl == NULL || !resctrl->enabled) { + virReportError(VIR_ERR_XML_ERROR, + _("'host doesn't enabled cache type '%s'"), tmp); + goto cleanup; + } + + bool found_host_id = false; + /* Loop for banks to search host_id */ + for (j = 0; j < resctrl->num_banks; j++) { + if (resctrl->cache_banks[j].host_id == bank[i].host_id) { + found_host_id = true; + break; + } + } + + if (! found_host_id) { + virReportError(VIR_ERR_XML_ERROR, + _("'cache bank's host id %u not found on the host"), + bank[i].host_id); + goto cleanup; + } + + if (bank[i].size == 0 || + bank[i].size % resctrl->cache_banks[j].cache_min != 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'the size should be multiples of '%llu'"), + resctrl->cache_banks[j].cache_min); + goto cleanup; + } + + if (VIR_STRDUP(bank[i].type, tmp) < 0) + goto cleanup; + + if ((tmp = virXMLPropString(nodes[i], "vcpus"))) { + if (virBitmapParse(tmp, &bank[i].vcpus, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(bank[i].vcpus)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'vcpus': %s"), tmp); + goto cleanup; + } + } + } + + def->cachetune.cache_banks = bank; + def->cachetune.n_banks = n; + return 0; + + cleanup: + VIR_FREE(bank); + VIR_FREE(tmp); + return -1; +} /* Parse the XML definition for a iothreadpin * and an iothreadspin has the form @@ -17033,6 +17155,14 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/cachetune", ctxt, &nodes)) < 0) + goto error; + + if (virDomainCacheTuneDefParseXML(def, n, nodes) < 0) + goto error; + + VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cputune/emulatorpin", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cannot extract emulatorpin nodes")); @@ -23554,6 +23684,26 @@ virDomainSchedulerFormat(virBufferPtr buf, } +static void +virDomainCacheTuneDefFormat(virBufferPtr buf, + virDomainCachetunePtr cache) +{ + size_t i; + for (i = 0; i < cache->n_banks; i ++) { + virBufferAsprintf(buf, "<cachetune id='%u' host_id='%u' " + "type='%s' size='%llu' unit='KiB'", + cache->cache_banks[i].id, + cache->cache_banks[i].host_id, + cache->cache_banks[i].type, + cache->cache_banks[i].size); + + if (cache->cache_banks[i].vcpus) + virBufferAsprintf(buf, " vcpus='%s'/>\n", + virBitmapFormat(cache->cache_banks[i].vcpus)); + else + virBufferAddLit(buf, "/>\n"); + } +} static int virDomainCputuneDefFormat(virBufferPtr buf, @@ -23617,6 +23767,8 @@ virDomainCputuneDefFormat(virBufferPtr buf, VIR_FREE(cpumask); } + virDomainCacheTuneDefFormat(&childrenBuf, &def->cachetune); + if (def->cputune.emulatorpin) { char *cpumask; virBufferAddLit(&childrenBuf, "<emulatorpin "); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 1e53cc3..3da5d61 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2153,6 +2153,23 @@ struct _virDomainMemtune { int allocation; /* enum virDomainMemoryAllocation */ }; +typedef struct _virDomainCacheBank virDomainCacheBank; +typedef virDomainCacheBank *virDomainCacheBankPtr; +struct _virDomainCacheBank { + unsigned int id; + unsigned int host_id; + unsigned long long size; + char *type; + virBitmapPtr vcpus; +}; + +typedef struct _virDomainCachetune virDomainCachetune; +typedef virDomainCachetune *virDomainCachetunePtr; +struct _virDomainCachetune { + size_t n_banks; + virDomainCacheBankPtr cache_banks; +}; + typedef struct _virDomainPowerManagement virDomainPowerManagement; typedef virDomainPowerManagement *virDomainPowerManagementPtr; @@ -2216,6 +2233,8 @@ struct _virDomainDef { virDomainCputune cputune; + virDomainCachetune cachetune; + virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 44a47cc..eee6675 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -32,15 +32,34 @@ VIR_LOG_INIT("util.resctrl"); #define VIR_FROM_THIS VIR_FROM_RESCTRL - -#define RESCTRL_DIR "/sys/fs/resctrl" -#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" -#define SYSFS_SYSTEM_PATH "/sys/devices/system" - #define MAX_CPU_SOCKET_NUM 8 #define MAX_CBM_BIT_LEN 32 #define MAX_SCHEMATA_LEN 1024 #define MAX_FILE_LEN (10 * 1024 * 1024) +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" + +VIR_ENUM_IMPL(virResCtrl, VIR_RDT_RESOURCE_LAST, + "l3", "l3data", "l3code", "l2"); + +#define CONSTRUCT_RESCTRL_PATH(domain_name, item_name) \ +do { \ + if (NULL == domain_name) { \ + if (virAsprintf(&path, "%s/%s", RESCTRL_DIR, item_name) < 0)\ + return -1; \ + } else { \ + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_DIR, \ + domain_name, \ + item_name) < 0) \ + return -1; \ + } \ +} while (0) + +#define VIR_RESCTRL_ENABLED(type) \ + resctrlall[type].enabled + +#define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1) static unsigned int host_id; diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 5a6a344..074d307 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -25,6 +25,7 @@ # define __VIR_RESCTRL_H__ # include "virbitmap.h" +# include "virutil.h" enum { VIR_RDT_RESOURCE_L3, @@ -32,9 +33,10 @@ enum { VIR_RDT_RESOURCE_L3CODE, VIR_RDT_RESOURCE_L2, /* Must be the last */ - VIR_RDT_RESOURCE_LAST, + VIR_RDT_RESOURCE_LAST }; +VIR_ENUM_DECL(virResCtrl); typedef struct _virResCacheBank virResCacheBank; typedef virResCacheBank *virResCacheBankPtr; -- 1.9.1

On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one. Unfortunately, the only solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
vcpus='1,2'/> ... </cputune>
id: any non-minus number host_id: reference of the host's cache banks id, it's from capabilities type: cache bank type size: should be multiples of the min_size of the bank on host. vcpus: cache allocation on vcpu set, if empty, will apply the allocation on all vcpus --- docs/schemas/domaincommon.rng | 46 +++++++++++++ src/conf/domain_conf.c | 152 ++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 19 ++++++ src/util/virresctrl.c | 29 ++++++-- src/util/virresctrl.h | 4 +- 5 files changed, 244 insertions(+), 6 deletions(-)
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index c64544a..efc84c5 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -825,6 +825,32 @@ </attribute> </element> </zeroOrMore> + <zeroOrMore> + <element name="cachetune"> + <attribute name="id"> + <ref name="cacheid"/> + </attribute>
this should be optional when defining a domain, we can fill in the ids after (or during) parsing.
+ <attribute name="host_id"> + <ref name="hostid"/> + </attribute> + <attribute name="type"> + <ref name="cachetype"/> + </attribute> + <attribute name="size"> + <ref name="unsignedInt"/> + </attribute> + <optional> + <attribute name="unit"> + <ref name="cacheunit"/> + </attribute> + </optional> + <optional> + <attribute name="vcpus"> + <ref name="cpuset"/> + </attribute> + </optional> + </element> + </zeroOrMore> <optional> <element name="emulatorpin"> <attribute name="cpuset"> @@ -5490,6 +5516,26 @@ <param name="minInclusive">-1</param> </data> </define> + <define name="cacheid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param>
unsignedShort doesn't need pattern, wherever you copied that from, it should be cleaned up there as well (of course not needed to be part of this series). I'll include such things with other silly one-line fixes in the series so that we get rid of it.
+ </data> + </define> + <define name="hostid"> + <data type="unsignedShort"> + <param name="pattern">[0-9]+</param> + </data> + </define> + <define name="cachetype"> + <data type="string"> + <param name="pattern">(l3)</param> + </data>
What's the difference to <value>l3</value> ???
+ </define> + <define name="cacheunit"> + <data type="string"> + <param name="pattern">KiB</param> + </data> + </define> <!-- weight currently is in range [100, 1000] --> <define name="weight"> <data type="unsignedInt"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 97d42fe..652f4ca 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -56,6 +56,7 @@ #include "virstring.h" #include "virnetdev.h" #include "virhostdev.h" +#include "virresctrl.h"
#define VIR_FROM_THIS VIR_FROM_DOMAIN
@@ -15745,6 +15746,127 @@ virDomainVcpuPinDefParseXML(virDomainDefPtr def, return ret; }
+/* Parse the XML definition for cachetune + * and a cachetune has the form + * <cachetune id='0' host_id='0' type='l3' size='1024' unit='KiB'/> + */ +static int +virDomainCacheTuneDefParseXML(virDomainDefPtr def, + int n, + xmlNodePtr* nodes) +{ + char* tmp = NULL; + size_t i, j; + int type = -1; + virDomainCacheBankPtr bank = NULL; + virResCtrlPtr resctrl; + + if (VIR_ALLOC_N(bank, n) < 0) + goto cleanup; + + for (i = 0; i < n; i++) { + if (!(tmp = virXMLPropString(nodes[i], "id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache id '%s'"), tmp); + goto cleanup; + } +
As said before, ids can be calculated later, we don't require referencing them in the user's provided XML anywhere, or do we?
+ VIR_FREE(tmp); + if (!(tmp = virXMLPropString(nodes[i], "host_id"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing host id in cache tune")); + goto cleanup; + } + if (virStrToLong_uip(tmp, NULL, 10, &(bank[i].host_id)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache host id '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "size"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", _("missing size in cache tune")); + goto cleanup; + } + if (virStrToLong_ull(tmp, NULL, 10, &(bank[i].size)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid setting for cache size '%s'"), tmp); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "type"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("missing cache type")); + goto cleanup; + } + + if ((type = virResCtrlTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'unsupported cache type '%s'"), tmp); + goto cleanup; + } + + resctrl = virResCtrlGet(type); + + if (resctrl == NULL || !resctrl->enabled) { + virReportError(VIR_ERR_XML_ERROR, + _("'host doesn't enabled cache type '%s'"), tmp);
s/'host/host/, s/enabled/support/?? This should be in PostParse callback, though. Or even better in qemuProcessStartValidate() so that the domain doesn't disappear when the host changes kernels or kernel config or anything else.
+ goto cleanup; + } + + bool found_host_id = false; + /* Loop for banks to search host_id */ + for (j = 0; j < resctrl->num_banks; j++) { + if (resctrl->cache_banks[j].host_id == bank[i].host_id) { + found_host_id = true; + break; + } + } + + if (! found_host_id) {
no spaces after exclamation mark, please
+ virReportError(VIR_ERR_XML_ERROR, + _("'cache bank's host id %u not found on the host"), + bank[i].host_id);
Indentation, ...
+ goto cleanup; + } + + if (bank[i].size == 0 || + bank[i].size % resctrl->cache_banks[j].cache_min != 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'the size should be multiples of '%llu'"), + resctrl->cache_banks[j].cache_min);
... indentation everywhere!
@@ -23554,6 +23684,26 @@ virDomainSchedulerFormat(virBufferPtr buf,
}
+static void +virDomainCacheTuneDefFormat(virBufferPtr buf, + virDomainCachetunePtr cache) +{ + size_t i; + for (i = 0; i < cache->n_banks; i ++) { + virBufferAsprintf(buf, "<cachetune id='%u' host_id='%u' " + "type='%s' size='%llu' unit='KiB'", + cache->cache_banks[i].id, + cache->cache_banks[i].host_id, + cache->cache_banks[i].type, + cache->cache_banks[i].size); + + if (cache->cache_banks[i].vcpus) + virBufferAsprintf(buf, " vcpus='%s'/>\n", + virBitmapFormat(cache->cache_banks[i].vcpus)); + else + virBufferAddLit(buf, "/>\n"); + } +}
static int virDomainCputuneDefFormat(virBufferPtr buf, @@ -23617,6 +23767,8 @@ virDomainCputuneDefFormat(virBufferPtr buf, VIR_FREE(cpumask); }
+ virDomainCacheTuneDefFormat(&childrenBuf, &def->cachetune); + if (def->cputune.emulatorpin) { char *cpumask; virBufferAddLit(&childrenBuf, "<emulatorpin "); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 1e53cc3..3da5d61 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2153,6 +2153,23 @@ struct _virDomainMemtune { int allocation; /* enum virDomainMemoryAllocation */ };
+typedef struct _virDomainCacheBank virDomainCacheBank; +typedef virDomainCacheBank *virDomainCacheBankPtr; +struct _virDomainCacheBank { + unsigned int id; + unsigned int host_id; + unsigned long long size; + char *type; + virBitmapPtr vcpus; +}; + +typedef struct _virDomainCachetune virDomainCachetune; +typedef virDomainCachetune *virDomainCachetunePtr; +struct _virDomainCachetune { + size_t n_banks; + virDomainCacheBankPtr cache_banks; +}; + typedef struct _virDomainPowerManagement virDomainPowerManagement; typedef virDomainPowerManagement *virDomainPowerManagementPtr;
@@ -2216,6 +2233,8 @@ struct _virDomainDef {
virDomainCputune cputune;
+ virDomainCachetune cachetune; + virDomainNumaPtr numa; virDomainResourceDefPtr resource; virDomainIdMapDef idmap; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 44a47cc..eee6675 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -32,15 +32,34 @@ VIR_LOG_INIT("util.resctrl");
#define VIR_FROM_THIS VIR_FROM_RESCTRL - -#define RESCTRL_DIR "/sys/fs/resctrl" -#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" -#define SYSFS_SYSTEM_PATH "/sys/devices/system" - #define MAX_CPU_SOCKET_NUM 8 #define MAX_CBM_BIT_LEN 32 #define MAX_SCHEMATA_LEN 1024 #define MAX_FILE_LEN (10 * 1024 * 1024) +#define RESCTRL_DIR "/sys/fs/resctrl" +#define RESCTRL_INFO_DIR "/sys/fs/resctrl/info" +#define SYSFS_SYSTEM_PATH "/sys/devices/system" +
your diffs are weird, why does it show like this?
+VIR_ENUM_IMPL(virResCtrl, VIR_RDT_RESOURCE_LAST, + "l3", "l3data", "l3code", "l2"); + +#define CONSTRUCT_RESCTRL_PATH(domain_name, item_name) \ +do { \ + if (NULL == domain_name) { \ + if (virAsprintf(&path, "%s/%s", RESCTRL_DIR, item_name) < 0)\ + return -1; \ + } else { \ + if (virAsprintf(&path, "%s/%s/%s", RESCTRL_DIR, \ + domain_name, \ + item_name) < 0) \ + return -1; \ + } \
For macros like this, it's more readable when the backslashes are aligned (your editor should be able to do that pretty automatically) But for such function it's not very nice to use macro. Just use function (if needed after reworking the patches). Also you'r enot using the macro here, but in later patch, so it doesn't make sense to add it in this commit. Each commit should be somehow one isolated change with all things in one commit somehow related to each other.
+} while (0) + +#define VIR_RESCTRL_ENABLED(type) \ + resctrlall[type].enabled + +#define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1)
I thought we have BITS or something similar for this, but I can't find it, but BITS feels more descriptive.
static unsigned int host_id;
diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 5a6a344..074d307 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -25,6 +25,7 @@ # define __VIR_RESCTRL_H__
# include "virbitmap.h" +# include "virutil.h"
enum { VIR_RDT_RESOURCE_L3, @@ -32,9 +33,10 @@ enum { VIR_RDT_RESOURCE_L3CODE, VIR_RDT_RESOURCE_L2, /* Must be the last */ - VIR_RDT_RESOURCE_LAST, + VIR_RDT_RESOURCE_LAST
What's the point of this being in this commit?
};
+VIR_ENUM_DECL(virResCtrl);
typedef struct _virResCacheBank virResCacheBank; typedef virResCacheBank *virResCacheBankPtr; -- 1.9.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Mar 15, 2017 at 03:11:26PM +0100, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one. Unfortunately, the only solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
I tend to view use of this cachetune setting as being similar to using host CPU passthrough - you're intentionally trading off migratability of your guest to get a perf boost. Even without the host_id bit, this is still non-portable, as you might be requesting separate regions for code + data, but the target host of migration may only support shared regions. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Wed, Mar 15, 2017 at 02:23:26PM +0000, Daniel P. Berrange wrote:
On Wed, Mar 15, 2017 at 03:11:26PM +0100, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one. Unfortunately, the only solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
I tend to view use of this cachetune setting as being similar to using host CPU passthrough - you're intentionally trading off migratability of your guest to get a perf boost.
Even without the host_id bit, this is still non-portable, as you might be requesting separate regions for code + data, but the target host of migration may only support shared regions.
Sure, but those are things we can check during migration. I'd be OK with disabling migration (or making it --unsafe) when migrating with cachetune.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Mar 15, 2017 at 03:59:31PM +0100, Martin Kletzander wrote:
On Wed, Mar 15, 2017 at 02:23:26PM +0000, Daniel P. Berrange wrote:
On Wed, Mar 15, 2017 at 03:11:26PM +0100, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one.
This is the same conditions as pinning a vcpu to a pcpu. So yes, it might be that you want to migrate to a host where a different "host ID" number is used (which might or might not be a different socket).
Unfortunately, the only
solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
I tend to view use of this cachetune setting as being similar to using host CPU passthrough - you're intentionally trading off migratability of your guest to get a perf boost.
Even without the host_id bit, this is still non-portable, as you might be requesting separate regions for code + data, but the target host of migration may only support shared regions.
Then migration should fail.
Sure, but those are things we can check during migration. I'd be OK with disabling migration (or making it --unsafe) when migrating with cachetune.
Migration support is required (for NFV usecase they want to migrate VMs around).

On Wed, Mar 15, 2017 at 02:11:23PM -0300, Marcelo Tosatti wrote:
On Wed, Mar 15, 2017 at 03:59:31PM +0100, Martin Kletzander wrote:
On Wed, Mar 15, 2017 at 02:23:26PM +0000, Daniel P. Berrange wrote:
On Wed, Mar 15, 2017 at 03:11:26PM +0100, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one.
This is the same conditions as pinning a vcpu to a pcpu.
So yes, it might be that you want to migrate to a host where a different "host ID" number is used (which might or might not be a different socket).
Unfortunately, the only
solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
I tend to view use of this cachetune setting as being similar to using host CPU passthrough - you're intentionally trading off migratability of your guest to get a perf boost.
Even without the host_id bit, this is still non-portable, as you might be requesting separate regions for code + data, but the target host of migration may only support shared regions.
Then migration should fail.
Yes, it will automatically fail when you can't do the same allocations, that's easy. The difference with host_id is that you cannot foresee whether keeping the host_id makes sense on the destination as well.
Sure, but those are things we can check during migration. I'd be OK with disabling migration (or making it --unsafe) when migrating with cachetune.
Migration support is required (for NFV usecase they want to migrate VMs around).
So we at least need a check for the fact that either the caches are the somehow same or destination XML is supplied. The other option would be what I hinted above, that is to change host_id= to something like level, scope, and pcpu(s)/socket(s). That is fairly easy change from the code point of view, but I'm trying to think more about the user experience. Martin

On Wed, Mar 15, 2017 at 08:49:57PM +0100, Martin Kletzander wrote:
On Wed, Mar 15, 2017 at 02:11:23PM -0300, Marcelo Tosatti wrote:
On Wed, Mar 15, 2017 at 03:59:31PM +0100, Martin Kletzander wrote:
On Wed, Mar 15, 2017 at 02:23:26PM +0000, Daniel P. Berrange wrote:
On Wed, Mar 15, 2017 at 03:11:26PM +0100, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote:
This patch adds new xml element to support cache tune as:
<cputune> ... <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one.
This is the same conditions as pinning a vcpu to a pcpu.
So yes, it might be that you want to migrate to a host where a different "host ID" number is used (which might or might not be a different socket).
Unfortunately, the only
solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
I tend to view use of this cachetune setting as being similar to using host CPU passthrough - you're intentionally trading off migratability of your guest to get a perf boost.
Even without the host_id bit, this is still non-portable, as you might be requesting separate regions for code + data, but the target host of migration may only support shared regions.
Then migration should fail.
Yes, it will automatically fail when you can't do the same allocations, that's easy. The difference with host_id is that you cannot foresee whether keeping the host_id makes sense on the destination as well.
But on the destination, the user can use a different vcpu<->pcpu configuration, right? That is, you do not enforce that if the source has say vcpu0 <-> pcpu2, the destination must have vcpu0 <-> pcpu2 (the configuration for that part is different). So, the same thing can be applied to cache configuration.
Sure, but those are things we can check during migration. I'd be OK with disabling migration (or making it --unsafe) when migrating with cachetune.
Migration support is required (for NFV usecase they want to migrate VMs around).
So we at least need a check for the fact that either the caches are the somehow same or destination XML is supplied.
The other option would be what I hinted above, that is to change host_id= to something like level, scope, and pcpu(s)/socket(s).
host_id maps to L3 socket via: commit d57e3ab7e34c51a8badeea1b500bfb738d0af66e Author: Fenghua Yu <fenghua.yu@intel.com> Date: Sat Oct 22 06:19:50 2016 -0700 x86/intel_cacheinfo: Enable cache id in cache info Cache id is retrieved from APIC ID and CPUID leaf 4 on x86. For more details please see the section on "Cache ID Extraction Parameters" in "Intel 64 Architecture Processor Topology Enumeration". Also the documentation of the CPUID instruction in the "Intel 64 and IA-32 Architectures Software Developer's Manual" So on the destination the user has to configure host_id= of the PCPU which the guest is pinned to.
That is fairly easy change from the code point of view, but I'm trying to think more about the user experience.
Martin

On Wed, Mar 15, 2017 at 05:48:09PM -0300, Marcelo Tosatti wrote:
On Wed, Mar 15, 2017 at 08:49:57PM +0100, Martin Kletzander wrote:
On Wed, Mar 15, 2017 at 02:11:23PM -0300, Marcelo Tosatti wrote:
On Wed, Mar 15, 2017 at 03:59:31PM +0100, Martin Kletzander wrote:
On Wed, Mar 15, 2017 at 02:23:26PM +0000, Daniel P. Berrange wrote:
On Wed, Mar 15, 2017 at 03:11:26PM +0100, Martin Kletzander wrote:
On Mon, Mar 06, 2017 at 06:06:32PM +0800, Eli Qiao wrote: > This patch adds new xml element to support cache tune as: > > <cputune> > ... > <cachetune id='1' host_id='0' type='l3' size='2816' unit='KiB'
Again, this was already discussed, probably, I just can't find the source of it. But host_id actually already selects what cache is supposed to be used, so instead of type='l3' we only need scope='both' (or data/instruction) there. Or am I missing something? What I'm concerned about is that the host_id is mostly stable on one host (when the previously mentioned problems are fixed), but it will make no sense when the VM is migrated to another one.
This is the same conditions as pinning a vcpu to a pcpu.
So yes, it might be that you want to migrate to a host where a different "host ID" number is used (which might or might not be a different socket).
Unfortunately, the only
solution I can think of is using multiple keys to precisely describe the bank we want (e.g. host's cpu id, cache level and scope), but that seems very unclean.
I tend to view use of this cachetune setting as being similar to using host CPU passthrough - you're intentionally trading off migratability of your guest to get a perf boost.
Even without the host_id bit, this is still non-portable, as you might be requesting separate regions for code + data, but the target host of migration may only support shared regions.
Then migration should fail.
Yes, it will automatically fail when you can't do the same allocations, that's easy. The difference with host_id is that you cannot foresee whether keeping the host_id makes sense on the destination as well.
But on the destination, the user can use a different vcpu<->pcpu configuration, right?
That is, you do not enforce that if the source has say vcpu0 <-> pcpu2, the destination must have vcpu0 <-> pcpu2 (the configuration for that part is different).
Yes, you can change it by supplying different destination XML to the migration API.
So, the same thing can be applied to cache configuration.
Yes, it can. However it is way easier to do when you select the particular cache bank by socket/pcpu, level and scope, then just host_id.
Sure, but those are things we can check during migration. I'd be OK with disabling migration (or making it --unsafe) when migrating with cachetune.
Migration support is required (for NFV usecase they want to migrate VMs around).
So we at least need a check for the fact that either the caches are the somehow same or destination XML is supplied.
The other option would be what I hinted above, that is to change host_id= to something like level, scope, and pcpu(s)/socket(s).
host_id maps to L3 socket via:
commit d57e3ab7e34c51a8badeea1b500bfb738d0af66e Author: Fenghua Yu <fenghua.yu@intel.com> Date: Sat Oct 22 06:19:50 2016 -0700
x86/intel_cacheinfo: Enable cache id in cache info
Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.
For more details please see the section on "Cache ID Extraction Parameters" in "Intel 64 Architecture Processor Topology Enumeration".
Also the documentation of the CPUID instruction in the "Intel 64 and IA-32 Architectures Software Developer's Manual"
So on the destination the user has to configure host_id= of the PCPU which the guest is pinned to.
Oh, OK, I missed that it is actually taken from the host. that's because the code generates host_id numbers by using and incrementing global variable, so it does not correspond to the one kernel knows about. Is cache/index*/id the one from APIC?
That is fairly easy change from the code point of view, but I'm trying to think more about the user experience.
Martin
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

virResCtrlSetCacheBanks: Set cache banks of a libvirt domain. It will create new resource domain under `/sys/fs/resctrl` and fill the schemata according the cache bank configration. virResCtrlUpdate: Destroy resctrl domain directory, and update the schemata after libvirt domain destroy. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/libvirt_private.syms | 6 +- src/util/virresctrl.c | 677 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 5 +- 3 files changed, 686 insertions(+), 2 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b8445ef..9cfffb8 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2325,7 +2325,11 @@ virRandomInt; virResCtrlAvailable; virResCtrlGet; virResCtrlInit; - +virResCtrlSetCacheBanks; +virResCtrlTypeFromString; +virResCtrlTypeToString; +virResCtrlUpdate; +# # util/virrotatingfile.h virRotatingFileReaderConsume; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index eee6675..43af0f5 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -19,6 +19,8 @@ * Eli Qiao <liyong.qiao@intel.com> */ #include <config.h> +#include <fcntl.h> +#include <sys/stat.h> #include "virresctrl.h" #include "viralloc.h" @@ -61,8 +63,58 @@ do { \ #define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1) +/** + * a virResSchemata represents a schemata object under a resource control + * domain. + */ +typedef struct _virResSchemataItem virResSchemataItem; +typedef virResSchemataItem *virResSchemataItemPtr; +struct _virResSchemataItem { + unsigned int socket_no; + unsigned schemata; +}; + +typedef struct _virResSchemata virResSchemata; +typedef virResSchemata *virResSchemataPtr; +struct _virResSchemata { + unsigned int n_schemata_items; + virResSchemataItemPtr schemata_items; +}; + +/** + * a virResDomain represents a resource control domain. It's a double linked + * list. + */ + +typedef struct _virResDomain virResDomain; +typedef virResDomain *virResDomainPtr; + +struct _virResDomain { + char *name; + virResSchemataPtr schematas[VIR_RDT_RESOURCE_LAST]; + char **tasks; + size_t n_tasks; + size_t n_sockets; + virResDomainPtr pre; + virResDomainPtr next; +}; + +/* All resource control domains on this host*/ +typedef struct _virResCtrlDomain virResCtrlDomain; +typedef virResCtrlDomain *virResCtrlDomainPtr; + +struct _virResCtrlDomain { + unsigned int num_domains; + virResDomainPtr domains; +}; + static unsigned int host_id; +/* Global static struct to be maintained which is a interface */ +static virResCtrlDomain domainall; + +/* Global static struct array to be maintained which indicate + * resource status on a host */ static virResCtrl resctrlall[] = { { .name = "L3", @@ -82,6 +134,78 @@ static virResCtrl resctrlall[] = { }, }; +/* + * How many bits is set in schemata + * eg: + * virResCtrlBitsNum(10110) = 2 */ +static int virResCtrlBitsContinuesNum(unsigned schemata) +{ + size_t i; + int ret = 0; + for (i = 0; i < MAX_CBM_BIT_LEN; i ++) { + if ((schemata & 0x1) == 0x1) + ret++; + else + if (ret > 0 || schemata == 0) break; + + schemata = schemata >> 1; + } + return ret; +} + +/* Position of the highest continue 1 bit of in schemata + * eg: + * virResctrlBitsContinuesPos(10110) = 3 */ +static int virResCtrlBitsContinuesPos(unsigned schemata) +{ + size_t i; + int flag = 0; + for (i = 0; i < MAX_CBM_BIT_LEN; i ++) { + if ((schemata & 0x1) == 0x0 && flag == 1) + return i; + else if ((schemata & 0x1) == 0x1) flag = 1; + + schemata = schemata >> 1; + } + return 0; +} + +static int virResCtrlGetStr(const char *domain_name, const char *item_name, char **ret) +{ + char *path; + int rc = 0; + + CONSTRUCT_RESCTRL_PATH(domain_name, item_name); + + if (virFileReadAll(path, MAX_FILE_LEN, ret) < 0) { + rc = -1; + goto cleanup; + } + + cleanup: + VIR_FREE(path); + return rc; +} + +static int virResCtrlGetSchemata(const int type, const char *name, char **schemata) +{ + int rc; + char *tmp, *end; + char *buf; + + if ((rc = virResCtrlGetStr(name, "schemata", &buf)) < 0) + return rc; + + tmp = strstr(buf, resctrlall[type].name); + end = strchr(tmp, '\n'); + *end = '\0'; + if (VIR_STRDUP(*schemata, tmp) < 0) + rc = -1; + + VIR_FREE(buf); + return rc; +} + static int virResCtrlGetInfoStr(const int type, const char *item, char **str) { int ret = 0; @@ -102,6 +226,71 @@ static int virResCtrlGetInfoStr(const int type, const char *item, char **str) return ret; } +/* Return pointer of and ncount of schemata*/ +static virResSchemataPtr virParseSchemata(const char *schemata_str, size_t *ncount) +{ + const char *p, *q; + int pos; + int ischemata; + virResSchemataPtr schemata; + virResSchemataItemPtr schemataitems, tmpitem; + unsigned int socket_no = 0; + char *tmp; + + if (VIR_ALLOC(schemata) < 0) + goto cleanup; + + p = q = schemata_str; + pos = strchr(schemata_str, ':') - p; + + /* calculate cpu socket count */ + *ncount = 1; + while ((q = strchr(p, ';')) != 0) { + p = q + 1; + (*ncount)++; + } + + /* allocat an arrry to store schemata for each socket*/ + if (VIR_ALLOC_N_QUIET(tmpitem, *ncount) < 0) + goto cleanup; + + schemataitems = tmpitem; + + p = q = schemata_str + pos + 1; + + while (*p != '\0') { + if (*p == '=') { + q = p + 1; + + tmpitem->socket_no = socket_no++; + + while (*p != ';' && *p != '\0') p++; + + if (VIR_STRNDUP(tmp, q, p-q) < 0) + goto cleanup; + + if (virStrToLong_i(tmp, NULL, 16, &ischemata) < 0) + goto cleanup; + + VIR_FREE(tmp); + tmp = NULL; + tmpitem->schemata = ischemata; + tmpitem ++; + schemata->n_schemata_items += 1; + } + p++; + } + + schemata->schemata_items = schemataitems; + return schemata; + + cleanup: + VIR_FREE(schemata); + VIR_FREE(tmpitem); + return NULL; +} + + static int virResCtrlReadConfig(virArch arch, int type) { int ret; @@ -165,6 +354,488 @@ static int virResCtrlReadConfig(virArch arch, int type) return ret; } +/* Remove the Domain from sysfs, this should only success no pids in tasks + * of a partition. + */ +static +int virResCtrlRemoveDomain(const char *name) +{ + char *path = NULL; + int rc = 0; + + if ((rc = virAsprintf(&path, "%s/%s", RESCTRL_DIR, name)) < 0) + return rc; + rc = rmdir(path); + VIR_FREE(path); + return rc; +} + +static +int virResCtrlDestroyDomain(virResDomainPtr p) +{ + size_t i; + int rc; + if ((rc = virResCtrlRemoveDomain(p->name)) < 0) + VIR_WARN("Failed to removed partition %s", p->name); + + VIR_FREE(p->name); + p->name = NULL; + for (i = 0; i < p->n_tasks; i ++) + VIR_FREE(p->tasks[i]); + VIR_FREE(p); + p = NULL; + return rc; +} + + +/* assemble schemata string*/ +static +char* virResCtrlAssembleSchemata(virResSchemataPtr schemata, int type) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + size_t i; + + virBufferAsprintf(&buf, "%s:%u=%x", resctrlall[type].name, + schemata->schemata_items[0].socket_no, + schemata->schemata_items[0].schemata); + + for (i = 1; i < schemata->n_schemata_items; i++) { + virBufferAsprintf(&buf, ";%u=%x", + schemata->schemata_items[i].socket_no, + schemata->schemata_items[i].schemata); + } + + return virBufferContentAndReset(&buf); +} + +/* Refresh default domains' schemata + */ +static +int virResCtrlRefreshSchemata(void) +{ + size_t i, j, k; + unsigned int tmp_schemata; + unsigned int default_schemata; + unsigned int min_schemata; + + virResDomainPtr header, p; + + header = domainall.domains; + + if (!header) + return 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + min_schemata = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].min_cbm_bits); + + for (j = 0; j < header->schematas[i]->n_schemata_items; j ++) { + p = header->next; + // Reset to default schemata 0xfffff + default_schemata = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].cbm_len); + tmp_schemata = 0; + /* NOTEs: if only header domain, the schemata will be set to default one*/ + for (k = 1; k < domainall.num_domains; k++) { + if (p->schematas[i]->schemata_items[j].schemata > min_schemata) + tmp_schemata |= p->schematas[i]->schemata_items[j].schemata; + p = p->next; + } + + default_schemata ^= tmp_schemata; + + default_schemata &= VIR_RESCTRL_GET_SCHEMATA(virResCtrlBitsContinuesPos(default_schemata)); + header->schematas[i]->schemata_items[j].schemata = default_schemata; + int bitsnum = virResCtrlBitsContinuesNum(default_schemata); + resctrlall[i].cache_banks[j].cache_left = + (bitsnum - resctrlall[i].min_cbm_bits) * resctrlall[i].cache_banks[j].cache_min; + } + } + } + + return 0; + +} + +/* Get a domain ptr by domain's name*/ +static +virResDomainPtr virResCtrlGetDomain(const char* name) { + size_t i; + virResDomainPtr p = domainall.domains; + for (i = 0; i < domainall.num_domains; i++) { + if ((p->name) && STREQ(name, p->name)) + return p; + p = p->next; + } + return NULL; +} + +static int +virResCtrlAddTask(virResDomainPtr dom, pid_t pid) +{ + size_t maxtasks; + + if (VIR_RESIZE_N(dom->tasks, maxtasks, dom->n_tasks + 1, 1) < 0) + return -1; + + if (virAsprintf(&(dom->tasks[dom->n_tasks]), "%llu", (long long)pid) < 0) + return -1; + + dom->n_tasks += 1; + return 0; +} + +static int +virResCtrlWrite(const char *name, const char *item, const char *content) +{ + char *path; + int writefd; + int rc = -1; + + CONSTRUCT_RESCTRL_PATH(name, item); + + if (!virFileExists(path)) + goto cleanup; + + if ((writefd = open(path, O_WRONLY | O_APPEND, S_IRUSR | S_IWUSR)) < 0) + goto cleanup; + + if (safewrite(writefd, content, strlen(content)) < 0) + goto cleanup; + + rc = 0; + cleanup: + VIR_FREE(path); + VIR_FORCE_CLOSE(writefd); + return rc; +} + +/* if name == NULL we load default schemata */ +static +virResDomainPtr virResCtrlLoadDomain(const char *name) +{ + char *schematas; + virResDomainPtr p; + size_t i; + + if (VIR_ALLOC(p) < 0) + goto cleanup; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + if (virResCtrlGetSchemata(i, name, &schematas) < 0) + goto cleanup; + p->schematas[i] = virParseSchemata(schematas, &(p->n_sockets)); + VIR_FREE(schematas); + } + } + + p->tasks = NULL; + p->n_tasks = 0; + + if ((name != NULL) && (VIR_STRDUP(p->name, name)) < 0) + goto cleanup; + + return p; + + cleanup: + VIR_FREE(p); + return NULL; +} + +static +virResDomainPtr virResCtrlCreateDomain(const char *name) +{ + char *path; + mode_t mode = 0755; + virResDomainPtr p; + size_t i, j; + + if (virAsprintf(&path, "%s/%s", RESCTRL_DIR, name) < 0) + return NULL; + + if (virDirCreate(path, mode, 0, 0, 0) < 0) + goto cleanup; + + if ((p = virResCtrlLoadDomain(name)) == NULL) + return p; + + /* sys fs doens't let us use 0. + * reset schemata to min_bits*/ + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + int min_bits = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].min_cbm_bits); + for (j = 0; j < p->n_sockets; j++) + p->schematas[i]->schemata_items[j].schemata = min_bits; + } + } + + VIR_FREE(path); + return p; + + cleanup: + VIR_FREE(path); + return NULL; +} + +/* flush domains's information to sysfs*/ +static int +virResCtrlFlushDomainToSysfs(virResDomainPtr dom) +{ + size_t i; + char* schemata; + char* tmp; + int rc = -1; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + tmp = virResCtrlAssembleSchemata(dom->schematas[i], i); + virBufferAsprintf(&buf, "%s\n", tmp); + VIR_FREE(tmp); + } + } + + schemata = virBufferContentAndReset(&buf); + + if (virResCtrlWrite(dom->name, "schemata", schemata) < 0) + goto cleanup; + + if (dom->n_tasks > 0) { + for (i = 0; i < dom->n_tasks; i++) { + if (virResCtrlWrite(dom->name, "tasks", dom->tasks[i]) < 0) + goto cleanup; + } + } + + rc = 0; + + cleanup: + VIR_FREE(schemata); + return rc; +} + +static virResDomainPtr virResCtrlGetAllDomains(unsigned int *len) +{ + struct dirent *ent; + DIR *dp = NULL; + int direrr; + + *len = 0; + virResDomainPtr header, tmp, tmp_pre; + header = tmp = tmp_pre = NULL; + if (virDirOpenQuiet(&dp, RESCTRL_DIR) < 0) { + if (errno == ENOENT) + return NULL; + VIR_ERROR(_("Unable to open %s (%d)"), RESCTRL_DIR, errno); + goto cleanup; + } + + header = virResCtrlLoadDomain(NULL); + if (header == NULL) + goto cleanup; + + header->next = NULL; + + *len = 1; + + while ((direrr = virDirRead(dp, &ent, NULL)) > 0) { + if ((ent->d_type != DT_DIR) || STREQ(ent->d_name, "info")) + continue; + + tmp = virResCtrlLoadDomain(ent->d_name); + if (tmp == NULL) + goto cleanup; + + tmp->next = NULL; + + if (header->next == NULL) + header->next = tmp; + + if (tmp_pre == NULL) { + tmp->pre = header; + } else { + tmp->pre = tmp_pre; + tmp_pre->next = tmp; + } + + tmp_pre = tmp; + (*len) ++; + } + return header; + + cleanup: + VIR_DIR_CLOSE(dp); + tmp_pre = tmp = header; + while (tmp) { + tmp_pre = tmp; + tmp = tmp->next; + VIR_FREE(tmp_pre); + } + return NULL; +} + +static int +virResCtrlAppendDomain(virResDomainPtr dom) +{ + virResDomainPtr p = domainall.domains; + while (p->next != NULL) p = p->next; + p->next = dom; + dom->pre = p; + domainall.num_domains += 1; + return 0; +} + +static int +virResCtrlGetSocketIdByHostID(int type, unsigned int hostid) +{ + size_t i; + for (i = 0; i < resctrlall[type].num_banks; i++) { + if (resctrlall[type].cache_banks[i].host_id == hostid) + return i; + } + return -1; +} + +static int +virResCtrlCalculateSchemata(int type, + int sid, + unsigned hostid, + unsigned long long size) +{ + size_t i; + int count; + virResDomainPtr p; + unsigned int tmp_schemata; + unsigned int schemata_sum = 0; + + if (resctrlall[type].cache_banks[sid].cache_left < size) { + VIR_ERROR(_("Not enough cache left on bank %u"), hostid); + return -1; + } + if ((count = size / resctrlall[type].cache_banks[sid].cache_min) <= 0) { + VIR_ERROR(_("Error cache size %llu"), size); + return -1; + } + + tmp_schemata = VIR_RESCTRL_GET_SCHEMATA(count); + + p = domainall.domains; + p = p->next; + for (i = 1; i < domainall.num_domains; i ++) { + schemata_sum |= p->schematas[type]->schemata_items[sid].schemata; + p = p->next; + } + + tmp_schemata = tmp_schemata << (resctrlall[type].cbm_len - count); + + while ((tmp_schemata & schemata_sum) != 0) + tmp_schemata = tmp_schemata >> 1; + return tmp_schemata; +} + +int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, + unsigned char *uuid, pid_t *pids, int npid) +{ + size_t i; + char name[VIR_UUID_STRING_BUFLEN]; + virResDomainPtr p; + int type; + int sid; + int schemata; + + virUUIDFormat(uuid, name); + + for (i = 0; i < cachetune->n_banks; i++) { + VIR_DEBUG("cache_banks %u, %u, %llu, %s", + cachetune->cache_banks[i].id, + cachetune->cache_banks[i].host_id, + cachetune->cache_banks[i].size, + cachetune->cache_banks[i].type); + } + + if (cachetune->n_banks < 1) + return 0; + + p = virResCtrlGetDomain(name); + if (p == NULL) { + VIR_DEBUG("no domain name %s found, create new one!", name); + p = virResCtrlCreateDomain(name); + } + + if (p != NULL) { + for (i = 0; i < cachetune->n_banks; i++) { + if ((type = virResCtrlTypeFromString( + cachetune->cache_banks[i].type)) < 0) { + VIR_WARN("Ignore unknown cache type %s.", + cachetune->cache_banks[i].type); + continue; + } + + if ((sid = virResCtrlGetSocketIdByHostID( + type, cachetune->cache_banks[i].host_id)) < 0) { + VIR_WARN("Can not find cache bank host id %u.", + cachetune->cache_banks[i].host_id); + continue; + } + + if ((schemata = virResCtrlCalculateSchemata( + type, sid, cachetune->cache_banks[i].host_id, + cachetune->cache_banks[i].size)) < 0) { + VIR_WARN("Failed to set schemata for cache bank id %u", + cachetune->cache_banks[i].id); + continue; + } + + p->schematas[type]->schemata_items[sid].schemata = schemata; + } + + for (i = 0; i < npid; i++) + virResCtrlAddTask(p, pids[i]); + + if (virResCtrlFlushDomainToSysfs(p) < 0) { + VIR_ERROR(_("failed to flush domain %s to sysfs"), name); + virResCtrlDestroyDomain(p); + return -1; + } + virResCtrlAppendDomain(p); + } else { + VIR_ERROR(_("Failed to create a domain in sysfs")); + return -1; + } + + virResCtrlRefreshSchemata(); + /* after refresh, flush header's schemata changes to sys fs */ + if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) + VIR_WARN("failed to flush domain to sysfs"); + + return 0; +} + +/* Should be called after pid disappeared, we recalculate + * schemata of default and flush it to sys fs. + */ +int virResCtrlUpdate(unsigned char *uuid) +{ + char name[VIR_UUID_STRING_BUFLEN]; + virResDomainPtr del; + + virUUIDFormat(uuid, name); + + del = virResCtrlGetDomain(name); + + if (del != NULL) { + del->pre->next = del->next; + if (del->next != NULL) + del->next->pre = del->pre; + virResCtrlDestroyDomain(del); + domainall.num_domains -= 1; + virResCtrlRefreshSchemata(); + if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) + VIR_WARN("failed to flush domain to sysfs"); + } + return 0; +} + int virResCtrlInit(void) { @@ -193,6 +864,12 @@ virResCtrlInit(void) VIR_FREE(tmp); } + domainall.domains = virResCtrlGetAllDomains(&(domainall.num_domains)); + + if ((rc = virResCtrlRefreshSchemata()) < 0) + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to refresh resource control")); + cleanup: VIR_FREE(tmp); return rc; diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 074d307..3a6fb95 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -26,6 +26,7 @@ # include "virbitmap.h" # include "virutil.h" +# include "conf/domain_conf.h" enum { VIR_RDT_RESOURCE_L3, @@ -76,5 +77,7 @@ struct _virResCtrl { bool virResCtrlAvailable(void); int virResCtrlInit(void); virResCtrlPtr virResCtrlGet(int); - +int virResCtrlSetCacheBanks(virDomainCachetunePtr, + unsigned char *, pid_t *, int); +int virResCtrlUpdate(unsigned char *); #endif -- 1.9.1

On Mon, Mar 06, 2017 at 06:06:33PM +0800, Eli Qiao wrote:
virResCtrlSetCacheBanks: Set cache banks of a libvirt domain. It will create new resource domain under `/sys/fs/resctrl` and fill the schemata according the cache bank configration.
virResCtrlUpdate: Destroy resctrl domain directory, and update the schemata after libvirt domain destroy.
Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/libvirt_private.syms | 6 +- src/util/virresctrl.c | 677 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 5 +- 3 files changed, 686 insertions(+), 2 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b8445ef..9cfffb8 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2325,7 +2325,11 @@ virRandomInt; virResCtrlAvailable; virResCtrlGet; virResCtrlInit; -
Don't delete that line, keep two empty lines so that the file is consistent. Every time you do something somewhere in some file, look around the file and try keeping stuff consistent.
+virResCtrlSetCacheBanks; +virResCtrlTypeFromString; +virResCtrlTypeToString; +virResCtrlUpdate; +#
# util/virrotatingfile.h virRotatingFileReaderConsume; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index eee6675..43af0f5 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -19,6 +19,8 @@ * Eli Qiao <liyong.qiao@intel.com> */ #include <config.h> +#include <fcntl.h> +#include <sys/stat.h>
#include "virresctrl.h" #include "viralloc.h" @@ -61,8 +63,58 @@ do { \
#define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1)
+/** + * a virResSchemata represents a schemata object under a resource control + * domain. + */ +typedef struct _virResSchemataItem virResSchemataItem; +typedef virResSchemataItem *virResSchemataItemPtr; +struct _virResSchemataItem { + unsigned int socket_no; + unsigned schemata; +}; + +typedef struct _virResSchemata virResSchemata; +typedef virResSchemata *virResSchemataPtr; +struct _virResSchemata { + unsigned int n_schemata_items; + virResSchemataItemPtr schemata_items; +}; + +/** + * a virResDomain represents a resource control domain. It's a double linked + * list. + */ + +typedef struct _virResDomain virResDomain; +typedef virResDomain *virResDomainPtr; + +struct _virResDomain { + char *name; + virResSchemataPtr schematas[VIR_RDT_RESOURCE_LAST]; + char **tasks; + size_t n_tasks; + size_t n_sockets; + virResDomainPtr pre; + virResDomainPtr next; +}; + +/* All resource control domains on this host*/ +typedef struct _virResCtrlDomain virResCtrlDomain; +typedef virResCtrlDomain *virResCtrlDomainPtr; + +struct _virResCtrlDomain { + unsigned int num_domains; + virResDomainPtr domains; +}; + static unsigned int host_id;
+/* Global static struct to be maintained which is a interface */ +static virResCtrlDomain domainall; + +/* Global static struct array to be maintained which indicate + * resource status on a host */ static virResCtrl resctrlall[] = { { .name = "L3", @@ -82,6 +134,78 @@ static virResCtrl resctrlall[] = { }, };
+/* + * How many bits is set in schemata + * eg: + * virResCtrlBitsNum(10110) = 2 */
You mean how many consecutive bits? Why are you not using (and extending in case there's something missing) the virBitmap for this kind of things? Also going bit by bit in such a big loop, I doubt compiler will optimize this for us. Can we do something like: ffs((schemata >> ffs(schemata)) + 1) instead?
+static int virResCtrlBitsContinuesNum(unsigned schemata) +{ + size_t i; + int ret = 0; + for (i = 0; i < MAX_CBM_BIT_LEN; i ++) { + if ((schemata & 0x1) == 0x1) + ret++; + else + if (ret > 0 || schemata == 0) break; + + schemata = schemata >> 1; + } + return ret; +} + +/* Position of the highest continue 1 bit of in schemata + * eg: + * virResctrlBitsContinuesPos(10110) = 3 */ +static int virResCtrlBitsContinuesPos(unsigned schemata)
full type names please (unsigned int)
+{ + size_t i; + int flag = 0; + for (i = 0; i < MAX_CBM_BIT_LEN; i ++) { + if ((schemata & 0x1) == 0x0 && flag == 1) + return i; + else if ((schemata & 0x1) == 0x1) flag = 1; + + schemata = schemata >> 1; + } + return 0;
something like: f = ffs(schemata) return f + (schemata >> f) + 1 ??? [...]
+static int virResCtrlGetSchemata(const int type, const char *name, char **schemata) +{ + int rc; + char *tmp, *end; + char *buf; + + if ((rc = virResCtrlGetStr(name, "schemata", &buf)) < 0) + return rc; + + tmp = strstr(buf, resctrlall[type].name);
You will find L3CODE when looking for L3 as well.

While user can assign some specific vcpus list in <cachetune>, adds the vcpus' pids to cache bank, else vm->pid will be added to cache bank. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/qemu/qemu_driver.c | 6 ++++-- src/qemu/qemu_process.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3bfb4ec..0f11ae2 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -850,8 +850,10 @@ qemuStateInitialize(bool privileged, run_gid = cfg->group; } - if (virResCtrlAvailable() && virResCtrlInit() < 0) - VIR_WARN("Faild to initialize resource control."); + if (virResCtrlAvailable() && virResCtrlInit() < 0) { + VIR_ERROR(_("Faild to initialize resource control")); + goto error; + } qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, cfg->cacheDir, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 68378c9..5935502 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -76,6 +76,7 @@ #include "configmake.h" #include "nwfilter_conf.h" #include "netdev_bandwidth_conf.h" +#include "virresctrl.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -5055,6 +5056,50 @@ qemuProcessVcpusSortOrder(const void *a, return vcpua->order - vcpub->order; } +static int +qemuProcessSetCacheBanks(virDomainObjPtr vm) +{ + size_t i, j; + virDomainCachetunePtr cachetune; + unsigned int max_vcpus = virDomainDefGetVcpusMax(vm->def); + pid_t *pids = NULL; + virDomainVcpuDefPtr vcpu; + size_t npid = 0; + size_t count = 0; + int ret = -1; + + cachetune = &(vm->def->cachetune); + + for (i = 0; i < cachetune->n_banks; i++) { + if (cachetune->cache_banks[i].vcpus) { + for (j = 0; j < max_vcpus; j++) { + if (virBitmapIsBitSet(cachetune->cache_banks[i].vcpus, j)) { + + vcpu = virDomainDefGetVcpu(vm->def, j); + if (!vcpu->online) + continue; + + if (VIR_RESIZE_N(pids, npid, count, 1) < 0) + goto cleanup; + pids[count ++] = qemuDomainGetVcpuPid(vm, j); + } + } + } + } + + /* If not specific vcpus in cachetune, add vm->pid */ + if (pids == NULL) { + if (VIR_ALLOC_N(pids, 1) < 0) + goto cleanup; + pids[0] = vm->pid; + count = 1; + } + ret = virResCtrlSetCacheBanks(cachetune, vm->def->uuid, pids, count); + + cleanup: + VIR_FREE(pids); + return ret; +} static int qemuProcessSetupHotpluggableVcpus(virQEMUDriverPtr driver, @@ -5749,6 +5794,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuProcessAutoDestroyAdd(driver, vm, conn) < 0) goto cleanup; + VIR_DEBUG("Cache allocation"); + + if (virResCtrlAvailable() && qemuProcessSetCacheBanks(vm) < 0) + goto cleanup; + ret = 0; cleanup: @@ -6251,6 +6301,10 @@ void qemuProcessStop(virQEMUDriverPtr driver, virPerfFree(priv->perf); priv->perf = NULL; + if (virResCtrlAvailable() && virResCtrlUpdate(vm->def->uuid) < 0) + VIR_WARN("Failed to update resource control for %s", + vm->def->name); + qemuProcessRemoveDomainStatus(driver, vm); /* Remove VNC and Spice ports from port reservation bitmap, but only if -- 1.9.1

Enable l3code/l3data while doing cache tune. l3code/l3data should use a continus cbm in their seperated schemata and the cache size are shared between them, so we need to deal them differently with l3 cache. This should enable cdp feature while mounting /sys/fs/resctrl, eg: mount -t resctrl resctrl -o cdp /sys/fs/resctrl Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- docs/schemas/domaincommon.rng | 2 +- src/util/virresctrl.c | 27 +++++++++++++++++++++++++-- 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index efc84c5..ed8bdb9 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -5528,7 +5528,7 @@ </define> <define name="cachetype"> <data type="string"> - <param name="pattern">(l3)</param> + <param name="pattern">(l3|l3code|l3data)</param> </data> </define> <define name="cacheunit"> diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 43af0f5..e7376a0 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -417,6 +417,7 @@ int virResCtrlRefreshSchemata(void) unsigned int tmp_schemata; unsigned int default_schemata; unsigned int min_schemata; + int pair_type = 0; virResDomainPtr header, p; @@ -429,6 +430,11 @@ int virResCtrlRefreshSchemata(void) if (VIR_RESCTRL_ENABLED(i)) { min_schemata = VIR_RESCTRL_GET_SCHEMATA(resctrlall[i].min_cbm_bits); + if (i == VIR_RDT_RESOURCE_L3DATA) + pair_type = VIR_RDT_RESOURCE_L3CODE; + if (i == VIR_RDT_RESOURCE_L3CODE) + pair_type = VIR_RDT_RESOURCE_L3DATA; + for (j = 0; j < header->schematas[i]->n_schemata_items; j ++) { p = header->next; // Reset to default schemata 0xfffff @@ -436,8 +442,11 @@ int virResCtrlRefreshSchemata(void) tmp_schemata = 0; /* NOTEs: if only header domain, the schemata will be set to default one*/ for (k = 1; k < domainall.num_domains; k++) { - if (p->schematas[i]->schemata_items[j].schemata > min_schemata) + if (p->schematas[i]->schemata_items[j].schemata > min_schemata) { tmp_schemata |= p->schematas[i]->schemata_items[j].schemata; + if (pair_type > 0) + tmp_schemata |= p->schematas[pair_type]->schemata_items[j].schemata; + } p = p->next; } @@ -503,6 +512,7 @@ virResCtrlWrite(const char *name, const char *item, const char *content) goto cleanup; rc = 0; + cleanup: VIR_FREE(path); VIR_FORCE_CLOSE(writefd); @@ -707,6 +717,7 @@ virResCtrlCalculateSchemata(int type, virResDomainPtr p; unsigned int tmp_schemata; unsigned int schemata_sum = 0; + int pair_type = 0; if (resctrlall[type].cache_banks[sid].cache_left < size) { VIR_ERROR(_("Not enough cache left on bank %u"), hostid); @@ -721,8 +732,18 @@ virResCtrlCalculateSchemata(int type, p = domainall.domains; p = p->next; + + /* for type is l3code and l3data, we need to deal them specially*/ + if (type == VIR_RDT_RESOURCE_L3DATA) + pair_type = VIR_RDT_RESOURCE_L3CODE; + + if (type == VIR_RDT_RESOURCE_L3CODE) + pair_type = VIR_RDT_RESOURCE_L3DATA; + for (i = 1; i < domainall.num_domains; i ++) { schemata_sum |= p->schematas[type]->schemata_items[sid].schemata; + if (pair_type > 0) + schemata_sum |= p->schematas[pair_type]->schemata_items[sid].schemata; p = p->next; } @@ -763,6 +784,9 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, } if (p != NULL) { + + virResCtrlAppendDomain(p); + for (i = 0; i < cachetune->n_banks; i++) { if ((type = virResCtrlTypeFromString( cachetune->cache_banks[i].type)) < 0) { @@ -797,7 +821,6 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, virResCtrlDestroyDomain(p); return -1; } - virResCtrlAppendDomain(p); } else { VIR_ERROR(_("Failed to create a domain in sysfs")); return -1; -- 1.9.1

l3data and l3code type of cache banks should be configured pairs. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/domain_conf.c | 19 +++++++++++++++++++ src/util/virresctrl.c | 1 - src/util/virresctrl.h | 2 ++ 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 652f4ca..86c292d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -15760,9 +15760,13 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, int type = -1; virDomainCacheBankPtr bank = NULL; virResCtrlPtr resctrl; + /* An array to make sure l3code and l3data are pairs */ + int* sem = NULL; if (VIR_ALLOC_N(bank, n) < 0) goto cleanup; + if (VIR_ALLOC_N(sem, MAX_CPU_SOCKET_NUM) < 0) + goto cleanup; for (i = 0; i < n; i++) { if (!(tmp = virXMLPropString(nodes[i], "id"))) { @@ -15810,6 +15814,12 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, goto cleanup; } + /* VIR_RDT_RESOURCE_L3DATA and VIR_RDT_RESOURCE_L3CODE should be pair */ + if (type == VIR_RDT_RESOURCE_L3DATA) + sem[bank[i].host_id] ++; + else if (type == VIR_RDT_RESOURCE_L3CODE) + sem[bank[i].host_id] --; + resctrl = virResCtrlGet(type); if (resctrl == NULL || !resctrl->enabled) { @@ -15858,6 +15868,14 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, } } + for (i = 0; i < MAX_CPU_SOCKET_NUM; i ++) { + if (sem[i] != 0) { + virReportError(VIR_ERR_XML_ERROR, + _("'l3code and l3data shoud be show up pairs on bank %zu'"), + i); + goto cleanup; + } + } def->cachetune.cache_banks = bank; def->cachetune.n_banks = n; return 0; @@ -15865,6 +15883,7 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, cleanup: VIR_FREE(bank); VIR_FREE(tmp); + VIR_FREE(sem); return -1; } diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index e7376a0..ee5f043 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -34,7 +34,6 @@ VIR_LOG_INIT("util.resctrl"); #define VIR_FROM_THIS VIR_FROM_RESCTRL -#define MAX_CPU_SOCKET_NUM 8 #define MAX_CBM_BIT_LEN 32 #define MAX_SCHEMATA_LEN 1024 #define MAX_FILE_LEN (10 * 1024 * 1024) diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 3a6fb95..d639de1 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -28,6 +28,8 @@ # include "virutil.h" # include "conf/domain_conf.h" +#define MAX_CPU_SOCKET_NUM 8 + enum { VIR_RDT_RESOURCE_L3, VIR_RDT_RESOURCE_L3DATA, -- 1.9.1

This patch support l3 cache allocation compatible mode if cdp enabled on host. In this case l3code/l3data has same schemata. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/conf/domain_conf.c | 15 +++++++++++++-- src/util/virresctrl.c | 10 ++++++++++ 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 86c292d..710c327 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -15823,9 +15823,20 @@ virDomainCacheTuneDefParseXML(virDomainDefPtr def, resctrl = virResCtrlGet(type); if (resctrl == NULL || !resctrl->enabled) { - virReportError(VIR_ERR_XML_ERROR, + /* support cdp compatible */ + if (type == VIR_RDT_RESOURCE_L3) { + resctrl = virResCtrlGet(type + 1); + if (resctrl == NULL || !resctrl->enabled) { + virReportError(VIR_ERR_XML_ERROR, _("'host doesn't enabled cache type '%s'"), tmp); - goto cleanup; + goto cleanup; + } + VIR_WARN("Use cdp compatible mode."); + } else { + virReportError(VIR_ERR_XML_ERROR, + _("'host doesn't enabled cache type '%s'"), tmp); + goto cleanup; + } } bool found_host_id = false; diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index ee5f043..9f0d05f 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -760,6 +760,7 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, char name[VIR_UUID_STRING_BUFLEN]; virResDomainPtr p; int type; + int pair_type = -1; int sid; int schemata; @@ -793,6 +794,13 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, cachetune->cache_banks[i].type); continue; } + /* use cdp compatible mode */ + if (!VIR_RESCTRL_ENABLED(type) && + (type == VIR_RDT_RESOURCE_L3) && + VIR_RESCTRL_ENABLED(VIR_RDT_RESOURCE_L3DATA)) { + type = VIR_RDT_RESOURCE_L3DATA; + pair_type = VIR_RDT_RESOURCE_L3CODE; + } if ((sid = virResCtrlGetSocketIdByHostID( type, cachetune->cache_banks[i].host_id)) < 0) { @@ -810,6 +818,8 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, } p->schematas[type]->schemata_items[sid].schemata = schemata; + if (pair_type > 0) + p->schematas[pair_type]->schemata_items[sid].schemata = schemata; } for (i = 0; i < npid; i++) -- 1.9.1

The internal struct list domainall is a list which are resctral domain status shared by all VMs, especiall the default domain, each VM should access it concomitantly. Ues a mutex to control it. Each bank's cache_left field is also a global shared resource we need to be care, add a mutex for each bank. We need also to add lock to access /sys/fs/resctrl, use flock. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/util/virresctrl.c | 93 ++++++++++++++++++++++++++++++++++++++++++++------- src/util/virresctrl.h | 3 ++ 2 files changed, 83 insertions(+), 13 deletions(-) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 9f0d05f..3c54e44 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -20,7 +20,9 @@ */ #include <config.h> #include <fcntl.h> +#include <sys/file.h> #include <sys/stat.h> +#include <sys/types.h> #include "virresctrl.h" #include "viralloc.h" @@ -62,6 +64,9 @@ do { \ #define VIR_RESCTRL_GET_SCHEMATA(count) ((1 << count) - 1) +#define VIR_RESCTRL_LOCK(fd, op) flock(fd, op) +#define VIR_RESCTRL_UNLOCK(fd) flock(fd, LOCK_UN) + /** * a virResSchemata represents a schemata object under a resource control * domain. @@ -105,6 +110,8 @@ typedef virResCtrlDomain *virResCtrlDomainPtr; struct _virResCtrlDomain { unsigned int num_domains; virResDomainPtr domains; + + virMutex lock; }; static unsigned int host_id; @@ -176,11 +183,16 @@ static int virResCtrlGetStr(const char *domain_name, const char *item_name, char CONSTRUCT_RESCTRL_PATH(domain_name, item_name); + if (!virFileExists(path)) + goto cleanup; + if (virFileReadAll(path, MAX_FILE_LEN, ret) < 0) { rc = -1; goto cleanup; } + rc = 0; + cleanup: VIR_FREE(path); return rc; @@ -687,10 +699,15 @@ static int virResCtrlAppendDomain(virResDomainPtr dom) { virResDomainPtr p = domainall.domains; + + virMutexLock(&domainall.lock); + while (p->next != NULL) p = p->next; p->next = dom; dom->pre = p; domainall.num_domains += 1; + + virMutexUnlock(&domainall.lock); return 0; } @@ -713,18 +730,22 @@ virResCtrlCalculateSchemata(int type, { size_t i; int count; + int rc = -1; virResDomainPtr p; unsigned int tmp_schemata; unsigned int schemata_sum = 0; int pair_type = 0; + virMutexLock(&resctrlall[type].cache_banks[sid].lock); + if (resctrlall[type].cache_banks[sid].cache_left < size) { VIR_ERROR(_("Not enough cache left on bank %u"), hostid); - return -1; + goto cleanup; } + if ((count = size / resctrlall[type].cache_banks[sid].cache_min) <= 0) { VIR_ERROR(_("Error cache size %llu"), size); - return -1; + goto cleanup; } tmp_schemata = VIR_RESCTRL_GET_SCHEMATA(count); @@ -739,7 +760,7 @@ virResCtrlCalculateSchemata(int type, if (type == VIR_RDT_RESOURCE_L3CODE) pair_type = VIR_RDT_RESOURCE_L3DATA; - for (i = 1; i < domainall.num_domains; i ++) { + for (i = 1; i < domainall.num_domains; i++) { schemata_sum |= p->schematas[type]->schemata_items[sid].schemata; if (pair_type > 0) schemata_sum |= p->schematas[pair_type]->schemata_items[sid].schemata; @@ -750,7 +771,16 @@ virResCtrlCalculateSchemata(int type, while ((tmp_schemata & schemata_sum) != 0) tmp_schemata = tmp_schemata >> 1; - return tmp_schemata; + + resctrlall[type].cache_banks[sid].cache_left -= size; + if (pair_type > 0) + resctrlall[pair_type].cache_banks[sid].cache_left = resctrlall[type].cache_banks[sid].cache_left; + + rc = tmp_schemata; + + cleanup: + virMutexUnlock(&resctrlall[type].cache_banks[sid].lock); + return rc; } int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, @@ -763,8 +793,8 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, int pair_type = -1; int sid; int schemata; - - virUUIDFormat(uuid, name); + int lockfd; + int rc = -1; for (i = 0; i < cachetune->n_banks; i++) { VIR_DEBUG("cache_banks %u, %u, %llu, %s", @@ -777,12 +807,21 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, if (cachetune->n_banks < 1) return 0; + virUUIDFormat(uuid, name); + + if ((lockfd = open(RESCTRL_DIR, O_RDONLY)) < 0) + goto cleanup; + + if (VIR_RESCTRL_LOCK(lockfd, LOCK_EX) < 0) { + virReportSystemError(errno, _("Unable to lock '%s'"), RESCTRL_DIR); + goto cleanup; + } + p = virResCtrlGetDomain(name); if (p == NULL) { VIR_DEBUG("no domain name %s found, create new one!", name); p = virResCtrlCreateDomain(name); } - if (p != NULL) { virResCtrlAppendDomain(p); @@ -828,19 +867,25 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, if (virResCtrlFlushDomainToSysfs(p) < 0) { VIR_ERROR(_("failed to flush domain %s to sysfs"), name); virResCtrlDestroyDomain(p); - return -1; + goto cleanup; } } else { VIR_ERROR(_("Failed to create a domain in sysfs")); - return -1; + goto cleanup; } virResCtrlRefreshSchemata(); /* after refresh, flush header's schemata changes to sys fs */ - if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) - VIR_WARN("failed to flush domain to sysfs"); + if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) { + VIR_ERROR(_("failed to flush domain to sysfs")); + goto cleanup; + } - return 0; + rc = 0; + + cleanup: + VIR_RESCTRL_UNLOCK(lockfd); + return rc; } /* Should be called after pid disappeared, we recalculate @@ -856,6 +901,9 @@ int virResCtrlUpdate(unsigned char *uuid) del = virResCtrlGetDomain(name); if (del != NULL) { + + virMutexLock(&domainall.lock); + del->pre->next = del->next; if (del->next != NULL) del->next->pre = del->pre; @@ -864,6 +912,7 @@ int virResCtrlUpdate(unsigned char *uuid) virResCtrlRefreshSchemata(); if (virResCtrlFlushDomainToSysfs(domainall.domains) < 0) VIR_WARN("failed to flush domain to sysfs"); + virMutexUnlock(&domainall.lock); } return 0; } @@ -871,7 +920,7 @@ int virResCtrlUpdate(unsigned char *uuid) int virResCtrlInit(void) { - size_t i = 0; + size_t i, j; char *tmp; int rc = 0; @@ -898,6 +947,24 @@ virResCtrlInit(void) domainall.domains = virResCtrlGetAllDomains(&(domainall.num_domains)); + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + for (j = 0; j < resctrlall[i].num_banks; j++) { + if (virMutexInit(&resctrlall[i].cache_banks[j].lock) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to initialize mutex")); + return -1; + } + } + } + } + + if (virMutexInit(&domainall.lock) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to initialize mutex")); + return -1; + } + if ((rc = virResCtrlRefreshSchemata()) < 0) virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Failed to refresh resource control")); diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index d639de1..968e0dc 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -27,6 +27,7 @@ # include "virbitmap.h" # include "virutil.h" # include "conf/domain_conf.h" +# include "virthread.h" #define MAX_CPU_SOCKET_NUM 8 @@ -49,6 +50,8 @@ struct _virResCacheBank { unsigned long long cache_left; unsigned long long cache_min; virBitmapPtr cpu_mask; + + virMutex lock; }; /** -- 1.9.1

Other application may touch resctrl while libvirt's running, scan resctrl again before allocating cache information to the newly created VM. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/util/virresctrl.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 3c54e44..16c01a2 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -711,6 +711,42 @@ virResCtrlAppendDomain(virResDomainPtr dom) return 0; } +/* scan /sys/fs/resctrl again and refresh default schemata */ +static +int virResCtrlScan(void) +{ + struct dirent *ent; + DIR *dp = NULL; + int direrr; + virResDomainPtr p; + int rc = -1; + + if (virDirOpenQuiet(&dp, RESCTRL_DIR) < 0) { + if (errno == ENOENT) + return -1; + VIR_ERROR(_("Unable to open %s (%d)"), RESCTRL_DIR, errno); + goto cleanup; + } + + while ((direrr = virDirRead(dp, &ent, NULL)) > 0) { + if ((ent->d_type != DT_DIR) || STREQ(ent->d_name, "info")) + continue; + /* test if we'v tracked all domains */ + p = virResCtrlGetDomain(ent->d_name); + if (p == NULL) { + p = virResCtrlLoadDomain(ent->d_name); + if (p == NULL) + continue; + virResCtrlAppendDomain(p); + } + } + rc = 0; + + cleanup: + VIR_DIR_CLOSE(dp); + return rc; +} + static int virResCtrlGetSocketIdByHostID(int type, unsigned int hostid) { @@ -817,6 +853,10 @@ int virResCtrlSetCacheBanks(virDomainCachetunePtr cachetune, goto cleanup; } + if (virResCtrlScan() < 0) { + VIR_ERROR(_("Failed to scan resctrl domain dir")); + goto cleanup; + } p = virResCtrlGetDomain(name); if (p == NULL) { VIR_DEBUG("no domain name %s found, create new one!", name); -- 1.9.1

This patch expose a public API virNodeCacheStats to query cache stats on a host. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- daemon/remote.c | 67 ++++++++++++++++++++++++++++++++++++++++++ include/libvirt/libvirt-host.h | 32 ++++++++++++++++++++ src/driver-hypervisor.h | 7 +++++ src/libvirt-host.c | 41 ++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + src/remote/remote_driver.c | 52 ++++++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 25 +++++++++++++++- src/remote_protocol-structs | 16 ++++++++++ 8 files changed, 240 insertions(+), 1 deletion(-) diff --git a/daemon/remote.c b/daemon/remote.c index f2b9b9a..af291a7 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -7079,3 +7079,70 @@ remoteSerializeDomainDiskErrors(virDomainDiskErrorPtr errors, } return -1; } + + static int +remoteDispatchNodeGetCacheStats(virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client ATTRIBUTE_UNUSED, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_node_get_cache_stats_args *args, + remote_node_get_cache_stats_ret *ret) +{ + virNodeCacheStatsPtr params = NULL; + size_t i; + int nparams = 0; + unsigned int flags; + int rv = -1; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + flags = args->flags; + + if (args->nparams && VIR_ALLOC_N(params, args->nparams) < 0) + goto cleanup; + nparams = args->nparams; + + if (virNodeGetCacheStats(priv->conn, params, &nparams, flags) < 0) + goto cleanup; + + /* In this case, we need to send back the number of stats + * supported + */ + if (args->nparams == 0) { + ret->nparams = nparams; + goto success; + } + + /* Serialise the memory parameters. */ + ret->params.params_len = nparams; + if (VIR_ALLOC_N(ret->params.params_val, nparams) < 0) + goto cleanup; + + for (i = 0; i < nparams; ++i) { + /* remoteDispatchClientRequest will free this: */ + if (VIR_STRDUP(ret->params.params_val[i].field, params[i].field) < 0) + goto cleanup; + + ret->params.params_val[i].value = params[i].value; + } + + success: + rv = 0; + + cleanup: + if (rv < 0) { + virNetMessageSaveError(rerr); + if (ret->params.params_val) { + for (i = 0; i < nparams; i++) + VIR_FREE(ret->params.params_val[i].field); + VIR_FREE(ret->params.params_val); + } + } + VIR_FREE(params); + return rv; +} diff --git a/include/libvirt/libvirt-host.h b/include/libvirt/libvirt-host.h index 07b5d15..222f361 100644 --- a/include/libvirt/libvirt-host.h +++ b/include/libvirt/libvirt-host.h @@ -140,6 +140,32 @@ struct _virSecurityModel { */ typedef virSecurityModel *virSecurityModelPtr; +/** + * VIR_NODE_CACHE_STATS_FIELD_LENGTH: + * + * Macro providing the field length of virNodeCacheStats + */ + +# define VIR_NODE_CACHE_STATS_FIELD_LENGTH 16 + +/** + * + * virNodeCacheStats is all the cache stats on a host. + */ + +typedef struct _virNodeCacheStats virNodeCacheStats; + +struct _virNodeCacheStats { + char field[VIR_NODE_CACHE_STATS_FIELD_LENGTH]; + unsigned long long value; +}; + +/** + * a virNodeCacheStatsPtr is a pointer to a virNodeCacheStats. + */ + +typedef virNodeCacheStats *virNodeCacheStatsPtr; + /* data types related to virNodePtr */ @@ -603,6 +629,12 @@ int virNodeSuspendForDuration (virConnectPtr conn, unsigned long long duration, unsigned int flags); +int virNodeGetCacheStats (virConnectPtr conn, + virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags); + + /* * NUMA support */ diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index b81420a..a6a1350 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -578,6 +578,12 @@ typedef int unsigned int flags); typedef int +(*virDrvNodeGetCacheStats)(virConnectPtr conn, + virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags); + +typedef int (*virDrvNodeGetCellsFreeMemory)(virConnectPtr conn, unsigned long long *freeMems, int startCell, @@ -1458,6 +1464,7 @@ struct _virHypervisorDriver { virDrvConnectSetKeepAlive connectSetKeepAlive; virDrvConnectIsAlive connectIsAlive; virDrvNodeSuspendForDuration nodeSuspendForDuration; + virDrvNodeGetCacheStats nodeGetCacheStats; virDrvDomainGetPerfEvents domainGetPerfEvents; virDrvDomainSetPerfEvents domainSetPerfEvents; virDrvDomainSetBlockIoTune domainSetBlockIoTune; diff --git a/src/libvirt-host.c b/src/libvirt-host.c index 335798a..87c1279 100644 --- a/src/libvirt-host.c +++ b/src/libvirt-host.c @@ -679,6 +679,47 @@ virNodeSuspendForDuration(virConnectPtr conn, return -1; } +/* + * virNodeGetCacheStats: + * @conn: pointer to the hypervisor connection + * @params: pointer to memory parameter object + * (return value, allocated by the caller) + * @nparams: pointer to number of memory parameters; input and output + * @flags: extra flags; not used yet, so callers should always pass 0 + * + * Get all node cache usage stats. + * + * Returns 0 in case of success, and -1 in case of failure. + * +*/ + +int virNodeGetCacheStats(virConnectPtr conn, + virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags) +{ + VIR_DEBUG("conn=%p, params=%p, nparams=%d, flags=%x", + conn, params, nparams ? *nparams : -1, flags); + virResetLastError(); + + virCheckConnectReturn(conn, -1); + virCheckNonNullArgGoto(nparams, error); + virCheckNonNegativeArgGoto(*nparams, error); + + if (conn->driver->nodeGetCacheStats) { + int ret; + ret = conn->driver->nodeGetCacheStats(conn, params, nparams, flags); + if (ret < 0) + goto error; + return ret; + } + virReportUnsupportedError(); + + error: + virDispatchError(conn); + return -1; + +} /* * virNodeGetMemoryParameters: diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 04ef580..a091cab 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -751,6 +751,7 @@ LIBVIRT_3.0.0 { virStorageVolGetInfoFlags; virConnectSecretEventRegisterAny; virConnectSecretEventDeregisterAny; + virNodeGetCacheStats; } LIBVIRT_2.2.0; LIBVIRT_3.1.0 { diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 0c8bfee..278ffc6 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -1708,6 +1708,57 @@ remoteNodeGetCellsFreeMemory(virConnectPtr conn, } static int +remoteNodeGetCacheStats(virConnectPtr conn, + virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags) +{ + int rv = -1; + size_t i; + remote_node_get_cache_stats_args args; + remote_node_get_cache_stats_ret ret; + struct private_data *priv = conn->privateData; + + remoteDriverLock(priv); + + args.nparams = *nparams; + args.flags = flags; + + memset(&ret, 0, sizeof(ret)); + if (call(conn, priv, 0, REMOTE_PROC_NODE_GET_CACHE_STATS, + (xdrproc_t) xdr_remote_node_get_cache_stats_args, (char *) &args, + (xdrproc_t) xdr_remote_node_get_cache_stats_ret, (char *) &ret) == -1) + goto done; + + if (*nparams == 0) { + *nparams = ret.nparams; + rv = 0; + goto cleanup; + } + + *nparams = ret.params.params_len; + + /* Deserialise the result. */ + for (i = 0; i < *nparams; ++i) { + if (virStrcpyStatic(params[i].field, ret.params.params_val[i].field) == NULL) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Stats %s too big for destination"), + ret.params.params_val[i].field); + goto cleanup; + } + params[i].value = ret.params.params_val[i].value; + } + + rv = 0; + + cleanup: + xdr_free((xdrproc_t) xdr_remote_node_get_cache_stats_ret, (char *) &ret); + done: + remoteDriverUnlock(priv); + return rv; +} + +static int remoteConnectListDomains(virConnectPtr conn, int *ids, int maxids) { int rv = -1; @@ -8291,6 +8342,7 @@ static virHypervisorDriver hypervisor_driver = { .nodeGetMemoryStats = remoteNodeGetMemoryStats, /* 0.9.3 */ .nodeGetCellsFreeMemory = remoteNodeGetCellsFreeMemory, /* 0.3.3 */ .nodeGetFreeMemory = remoteNodeGetFreeMemory, /* 0.3.3 */ + .nodeGetCacheStats = remoteNodeGetCacheStats, /* 3.0.0 */ .connectDomainEventRegister = remoteConnectDomainEventRegister, /* 0.5.0 */ .connectDomainEventDeregister = remoteConnectDomainEventDeregister, /* 0.5.0 */ .domainMigratePrepare2 = remoteDomainMigratePrepare2, /* 0.5.0 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index abe63af..1c349ce 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -121,6 +121,9 @@ const REMOTE_NODE_CPU_STATS_MAX = 16; /* Upper limit on list of node memory stats. */ const REMOTE_NODE_MEMORY_STATS_MAX = 16; +/* Upper limit on list of node cache stats */ +const REMOTE_NODE_CACHE_STATS_MAX = 16; + /* Upper limit on list of block stats. */ const REMOTE_DOMAIN_BLOCK_STATS_PARAMETERS_MAX = 16; @@ -394,6 +397,11 @@ struct remote_node_get_memory_stats { unsigned hyper value; }; +struct remote_node_get_cache_stats { + remote_nonnull_string field; + unsigned hyper value; +}; + struct remote_domain_disk_error { remote_nonnull_string disk; int error; @@ -492,6 +500,16 @@ struct remote_node_get_info_ret { /* insert@1 */ int threads; }; +struct remote_node_get_cache_stats_args { + int nparams; + u_int flags; +}; + +struct remote_node_get_cache_stats_ret { + remote_node_get_cache_stats params<REMOTE_NODE_CACHE_STATS_MAX>; + int nparams; +}; + struct remote_connect_get_capabilities_ret { remote_nonnull_string capabilities; }; @@ -6033,5 +6051,10 @@ enum remote_procedure { * @acl: domain:save:!VIR_DOMAIN_AFFECT_CONFIG|VIR_DOMAIN_AFFECT_LIVE * @acl: domain:save:VIR_DOMAIN_AFFECT_CONFIG */ - REMOTE_PROC_DOMAIN_SET_VCPU = 384 + REMOTE_PROC_DOMAIN_SET_VCPU = 384, + /** + * @generate: none + * @acl: connect:read + */ + REMOTE_PROC_NODE_GET_CACHE_STATS = 385 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index e1e53d2..c8868d2 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -94,6 +94,10 @@ struct remote_node_get_memory_stats { remote_nonnull_string field; uint64_t value; }; +struct remote_node_get_cache_stats { + remote_nonnull_string field; + uint64_t value; +}; struct remote_domain_disk_error { remote_nonnull_string disk; int error; @@ -2380,6 +2384,17 @@ struct remote_node_get_cpu_map_ret { u_int online; int ret; }; +struct remote_node_get_cache_stats_args { + int nparams; + u_int flags; +}; +struct remote_node_get_cache_stats_ret { + struct { + u_int paramms_len; + remote_node_get_cache_stats * params_val; + } params; + int nparams; +}; struct remote_domain_fstrim_args { remote_nonnull_domain dom; remote_string mountPoint; @@ -3217,4 +3232,5 @@ enum remote_procedure { REMOTE_PROC_SECRET_EVENT_LIFECYCLE = 382, REMOTE_PROC_SECRET_EVENT_VALUE_CHANGED = 383, REMOTE_PROC_DOMAIN_SET_VCPU = 384, + REMOTE_PROC_NODE_GET_CACHE_STATS = 385, }; -- 1.9.1

Add new virsh command line `nodecachestats` to expose the cache usage on a node. Signed-off-by: Eli Qiao <liyong.qiao@intel.com> --- src/libvirt_private.syms | 3 ++- src/qemu/qemu_driver.c | 12 ++++++++++ src/util/virresctrl.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virresctrl.h | 8 +++++++ tools/virsh-host.c | 49 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 133 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 9cfffb8..75a4c98 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2323,13 +2323,14 @@ virRandomInt; # util/virresctrl.h virResCtrlAvailable; +virResCtrlCacheGetStats; virResCtrlGet; virResCtrlInit; virResCtrlSetCacheBanks; virResCtrlTypeFromString; virResCtrlTypeToString; virResCtrlUpdate; -# + # util/virrotatingfile.h virRotatingFileReaderConsume; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0f11ae2..4677406 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -18285,6 +18285,17 @@ qemuNodeGetCPUStats(virConnectPtr conn, return virHostCPUGetStats(cpuNum, params, nparams, flags); } +static int +qemuNodeGetCacheStats(virConnectPtr conn, + virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags) +{ + if (virNodeGetCacheStatsEnsureACL(conn) < 0) + return -1; + + return virResCtrlCacheGetStats(params, nparams, flags); +} static int qemuNodeGetMemoryStats(virConnectPtr conn, @@ -20391,6 +20402,7 @@ static virHypervisorDriver qemuHypervisorDriver = { .domainMemoryPeek = qemuDomainMemoryPeek, /* 0.4.4 */ .domainGetBlockInfo = qemuDomainGetBlockInfo, /* 0.8.1 */ .nodeGetCPUStats = qemuNodeGetCPUStats, /* 0.9.3 */ + .nodeGetCacheStats = qemuNodeGetCacheStats, /* 3.1.0 */ .nodeGetMemoryStats = qemuNodeGetMemoryStats, /* 0.9.3 */ .nodeGetCellsFreeMemory = qemuNodeGetCellsFreeMemory, /* 0.4.4 */ .nodeGetFreeMemory = qemuNodeGetFreeMemory, /* 0.4.4 */ diff --git a/src/util/virresctrl.c b/src/util/virresctrl.c index 16c01a2..97f7e84 100644 --- a/src/util/virresctrl.c +++ b/src/util/virresctrl.c @@ -1034,3 +1034,65 @@ virResCtrlGet(int type) { return &resctrlall[type]; } + +int virResCtrlCacheGetStats(virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags) +{ + virCheckFlags(0, -1); + size_t i, j, k; + char *value; + int rc = -1; + int lockfd; + + if (*nparams == 0) { + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) + *nparams += resctrlall[i].num_banks; + } + } + if (params == NULL) + return 0; + + if ((lockfd = open(RESCTRL_DIR, O_RDONLY)) < 0) + goto cleanup; + + if (VIR_RESCTRL_LOCK(lockfd, LOCK_SH) < 0) { + virReportSystemError(errno, _("Unable to lock '%s'"), RESCTRL_DIR); + goto cleanup; + } + if (virResCtrlScan() < 0) { + VIR_ERROR(_("Failed to scan resctrl domain dir")); + goto cleanup; + } + + virResCtrlRefreshSchemata(); + + if ((rc = virResCtrlFlushDomainToSysfs(domainall.domains)) < 0) + goto cleanup; + + k = 0; + + for (i = 0; i < VIR_RDT_RESOURCE_LAST; i++) { + if (VIR_RESCTRL_ENABLED(i)) { + for (j = 0; j < resctrlall[i].num_banks; j++) { + + if (virAsprintf(&value, "%s.%zu", + resctrlall[i].name, j) < 0) + goto cleanup; + + if (virStrcpyStatic((¶ms[k])->field, value) == NULL) + goto cleanup; + + (¶ms[k++])->value = resctrlall[i].cache_banks[j].cache_left; + } + } + } + + rc = 0; + + cleanup: + VIR_FREE(value); + VIR_RESCTRL_UNLOCK(lockfd); + return rc; +} diff --git a/src/util/virresctrl.h b/src/util/virresctrl.h index 968e0dc..eef5370 100644 --- a/src/util/virresctrl.h +++ b/src/util/virresctrl.h @@ -80,9 +80,17 @@ struct _virResCtrl { }; bool virResCtrlAvailable(void); + int virResCtrlInit(void); + virResCtrlPtr virResCtrlGet(int); + int virResCtrlSetCacheBanks(virDomainCachetunePtr, unsigned char *, pid_t *, int); + int virResCtrlUpdate(unsigned char *); + +int virResCtrlCacheGetStats(virNodeCacheStatsPtr params, + int *nparams, + unsigned int flags); #endif diff --git a/tools/virsh-host.c b/tools/virsh-host.c index 24ebde2..c90bd2e 100644 --- a/tools/virsh-host.c +++ b/tools/virsh-host.c @@ -946,6 +946,49 @@ cmdNodeMemStats(vshControl *ctl, const vshCmd *cmd) VIR_FREE(params); return ret; } +/* "nodecachestats" command + */ +static const vshCmdInfo info_nodecachestats[] = { + {.name = "help", + .data = N_("Prints cache stats of the node.") + }, + {.name = "desc", + .data = N_("Returns cache stats of the node, in kilobytes.") + }, + {.name = NULL} +}; + +static bool +cmdNodeCacheStats(vshControl *ctl, const vshCmd *cmd ATTRIBUTE_UNUSED) +{ + virshControlPtr priv = ctl->privData; + virNodeCacheStatsPtr params; + int nparams = 0; + size_t i; + bool ret = false; + + if (virNodeGetCacheStats(priv->conn, NULL, &nparams, 0) != 0) { + vshError(ctl, "%s", + _("Unable to get number of cache stats")); + return false; + } + if (nparams == 0) { + /* nothing to output */ + return true; + } + + params = vshCalloc(ctl, nparams, sizeof(*params)); + if (virNodeGetCacheStats(priv->conn, params, &nparams, 0) != 0) { + vshError(ctl, "%s", _("Unable to get node cache stats")); + goto cleanup; + } + + for (i = 0; i < nparams; i++) + vshPrint(ctl, "%s: %llu KiB\n", params[i].field, params[i].value); + + cleanup: + return ret; +} /* * "nodesuspend" command @@ -1455,6 +1498,12 @@ const vshCmdDef hostAndHypervisorCmds[] = { .info = info_nodememstats, .flags = 0 }, + {.name = "nodecachestats", + .handler = cmdNodeCacheStats, + .opts = NULL, + .info = info_nodecachestats, + .flags = 0 + }, {.name = "nodesuspend", .handler = cmdNodeSuspend, .opts = opts_node_suspend, -- 1.9.1
participants (5)
-
Daniel P. Berrange
-
Eli Qiao
-
Eli Qiao
-
Marcelo Tosatti
-
Martin Kletzander