[libvirt] [PATCH 00/17] Support for hypervisor-pin and hypervisor-bandwidth

This series is a merge of

  1) "Support hypervisor-threads-pin in vcpupin"
     (https://www.redhat.com/archives/libvir-list/2012-July/msg01361.html)
  2) "support to set cpu bandwidth for hypervisor threads"
     (https://www.redhat.com/archives/libvir-list/2012-June/msg01161.html)

to make life easier, because the two share some patches. Patches 1-12 are
from 1), patches 13-17 are from 2).

Changes:

1. rebase to the latest git tree (removal of qemuReportError, split of virsh.c)
2. some typo fixes
3. make it pass syntax-check

Hu Tao (2):
  limit cpu bandwidth only for vcpus
  update doc about hypervisor_period/hypervisor_quota

Tang Chen (9):
  Enable cpuset cgroup and synchronous vcpupin info to cgroup.
  Support hypervisorpin xml parse.
  Introduce qemuSetupCgroupHypervisorPin and synchronize hypervisorpin
    info to cgroup.
  Add qemuProcessSetHypervisorAffinites and set hypervisor threads
    affinities
  Introduce virDomainHypervisorPinAdd and virDomainHypervisorPinDel
    functions
  Introduce virDomainPinHypervisorFlags and virDomainGetHypervisorPinInfo
    functions.
  Introduce qemudDomainPinHypervisorFlags and
    qemudDomainGetHypervisorPinInfo in qemu driver.
  Introduce remoteDomainPinHypervisorFlags and
    remoteDomainGetHypervisorPinInfo functions in remote driver.
  Improve vcpupin to support hypervisorpin dynamically.

Wen Congyang (6):
  Introduce the function virCgroupForHypervisor
  Introduce the function virCgroupMoveTask
  create a new cgroup and move all hypervisor threads to the new cgroup
  Update XML Schema for new entries
  qemu: Implement hypervisor's period and quota tunable XML configuration
    and parsing
  qemu: Implement hypervisor_period and hypervisor_quota's modification

 daemon/remote.c                                 |  103 +++++
 docs/schemas/domaincommon.rng                   |   17 +
 include/libvirt/libvirt.h.in                    |   26 ++
 src/conf/domain_conf.c                          |  189 ++++++++-
 src/conf/domain_conf.h                          |    9 +
 src/driver.h                                    |   13 +-
 src/libvirt.c                                   |  147 +++++++
 src/libvirt_private.syms                        |    7 +
 src/libvirt_public.syms                         |    2 +
 src/qemu/qemu_cgroup.c                          |  193 ++++++++--
 src/qemu/qemu_cgroup.h                          |    5 +
 src/qemu/qemu_driver.c                          |  465 +++++++++++++++++++----
 src/qemu/qemu_process.c                         |   60 ++-
 src/remote/remote_driver.c                      |  102 +++++
 src/remote/remote_protocol.x                    |   23 +-
 src/remote_protocol-structs                     |   24 ++
 src/util/cgroup.c                               |  188 ++++++++-
 src/util/cgroup.h                               |   15 +
 tests/qemuxml2argvdata/qemuxml2argv-cputune.xml |    1 +
 tests/vcpupin                                   |    6 +-
 tools/virsh-domain.c                            |  147 ++++---
 tools/virsh.pod                                 |   23 +-
 22 files changed, 1594 insertions(+), 171 deletions(-)

-- 
1.7.10.2

From: Wen Congyang <wency@cn.fujitsu.com>

Introduce the function virCgroupForHypervisor() to create a sub-directory
for hypervisor threads (including I/O threads and vhost-net threads).

There is already a function to create cgroup directories for vcpus.
virCgroupForHypervisor() creates a cgroup directory for the hypervisor
threads, like those for vcpus. The hierarchy looks like this:

cgroup mount point
  +--libvirt
      +--qemu
          +--domain name
              +--vcpu0
              ...
              +--vcpuN
              +--hypervisor

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/libvirt_private.syms |    1 +
 src/util/cgroup.c        |   42 ++++++++++++++++++++++++++++++++++++++++++
 src/util/cgroup.h        |    4 ++++
 3 files changed, 47 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index aace748..0904b60 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -71,6 +71,7 @@ virCgroupDenyDeviceMajor;
 virCgroupDenyDevicePath;
 virCgroupForDomain;
 virCgroupForDriver;
+virCgroupForHypervisor;
 virCgroupForVcpu;
 virCgroupFree;
 virCgroupGetBlkioWeight;
diff --git a/src/util/cgroup.c b/src/util/cgroup.c
index 6c29c87..00b559f 100644
--- a/src/util/cgroup.c
+++ b/src/util/cgroup.c
@@ -958,6 +958,48 @@ int virCgroupForVcpu(virCgroupPtr driver ATTRIBUTE_UNUSED,
 #endif
 
 /**
+ * virCgroupForHypervisor:
+ *
+ * @driver: group for the domain
+ * @group: Pointer to returned virCgroupPtr
+ *
+ * Returns: 0 on success or -errno on failure
+ */
+#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R
+int virCgroupForHypervisor(virCgroupPtr driver,
+                           virCgroupPtr *group,
+                           int create)
+{
+    int rc;
+    char *path;
+
+    if (driver == NULL)
+        return -EINVAL;
+
+    if (virAsprintf(&path, "%s/hypervisor", driver->path) < 0)
+        return -ENOMEM;
+
+    rc = virCgroupNew(path, group);
+    VIR_FREE(path);
+
+    if (rc == 0) {
+        rc = virCgroupMakeGroup(driver, *group, create, VIR_CGROUP_VCPU);
+        if (rc != 0)
+            virCgroupFree(group);
+    }
+
+    return rc;
+}
+#else
+int virCgroupForHypervisor(virCgroupPtr driver ATTRIBUTE_UNUSED,
+                           virCgroupPtr *group ATTRIBUTE_UNUSED,
+                           int create ATTRIBUTE_UNUSED)
+{
+    return -ENXIO;
+}
+
+#endif
+/**
  * virCgroupSetBlkioWeight:
  *
  * @group: The cgroup to change io weight for
diff --git a/src/util/cgroup.h b/src/util/cgroup.h
index 265f7c9..91f41ce 100644
--- a/src/util/cgroup.h
+++ b/src/util/cgroup.h
@@ -59,6 +59,10 @@ int virCgroupForVcpu(virCgroupPtr driver,
                      virCgroupPtr *group,
                      int create);
 
+int virCgroupForHypervisor(virCgroupPtr driver,
+                           virCgroupPtr *group,
+                           int create);
+
 int virCgroupPathOfController(virCgroupPtr group,
                               int controller,
                               const char *key,
-- 
1.7.10.2
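For orientation, here is a minimal sketch (not part of the patch) of how a
driver could use the new entry point, following the virCgroupForDomain /
virCgroupForVcpu pattern already in the tree; the helper name and inputs
are assumptions:

static int get_hypervisor_cgroup(virCgroupPtr driver_cgroup,
                                 const char *domname,
                                 virCgroupPtr *hyper)
{
    virCgroupPtr dom = NULL;
    int rc;

    /* <mount point>/libvirt/qemu/<domname> */
    rc = virCgroupForDomain(driver_cgroup, domname, &dom, 0);
    if (rc != 0)
        return rc;

    /* <mount point>/libvirt/qemu/<domname>/hypervisor, created on demand */
    rc = virCgroupForHypervisor(dom, hyper, 1);

    virCgroupFree(&dom);
    return rc;
}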

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
Introduce the function virCgroupForHypervisor() to create a sub-directory for hypervisor threads (including I/O threads and vhost-net threads).
There is already a function to create cgroup directories for vcpus. virCgroupForHypervisor() creates a cgroup directory for the hypervisor threads, like those for vcpus. The hierarchy looks like this:
cgroup mount point
  +--libvirt
      +--qemu
          +--domain name
              +--vcpu0
              ...
              +--vcpuN
              +--hypervisor
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/libvirt_private.syms |    1 +
 src/util/cgroup.c        |   42 ++++++++++++++++++++++++++++++++++++++++++
 src/util/cgroup.h        |    4 ++++
 3 files changed, 47 insertions(+)
Looks okay in isolation, but I'll continue reviewing the entire series
before I push anything.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From: Wen Congyang <wency@cn.fujitsu.com>

Introduce a new API to move tasks of one controller from one cgroup to
another.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/libvirt_private.syms |    2 +
 src/util/cgroup.c        |  111 ++++++++++++++++++++++++++++++++++++++++++++++
 src/util/cgroup.h        |    8 ++++
 3 files changed, 121 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 0904b60..0511a0f 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -60,6 +60,7 @@ virCapabilitiesSetMacPrefix;
 
 # cgroup.h
 virCgroupAddTask;
+virCgroupAddTaskController;
 virCgroupAllowDevice;
 virCgroupAllowDeviceMajor;
 virCgroupAllowDevicePath;
@@ -91,6 +92,7 @@ virCgroupKill;
 virCgroupKillPainfully;
 virCgroupKillRecursive;
 virCgroupMounted;
+virCgroupMoveTask;
 virCgroupPathOfController;
 virCgroupRemove;
 virCgroupSetBlkioDeviceWeight;
diff --git a/src/util/cgroup.c b/src/util/cgroup.c
index 00b559f..23837fe 100644
--- a/src/util/cgroup.c
+++ b/src/util/cgroup.c
@@ -803,6 +803,117 @@ int virCgroupAddTask(virCgroupPtr group, pid_t pid)
     return rc;
 }
 
+/**
+ * virCgroupAddTaskController:
+ *
+ * @group: The cgroup to add a task to
+ * @pid: The pid of the task to add
+ * @controller: The cgroup controller to be operated on
+ *
+ * Returns: 0 on success or -errno on failure
+ */
+int virCgroupAddTaskController(virCgroupPtr group, pid_t pid, int controller)
+{
+    if (controller < VIR_CGROUP_CONTROLLER_CPU ||
+        controller > VIR_CGROUP_CONTROLLER_BLKIO)
+        return -EINVAL;
+
+    if (!group->controllers[controller].mountPoint)
+        return -EINVAL;
+
+    return virCgroupSetValueU64(group, controller, "tasks",
+                                (unsigned long long)pid);
+}
+
+
+static int virCgroupAddTaskStrController(virCgroupPtr group,
+                                         const char *pidstr,
+                                         int controller)
+{
+    char *str = NULL, *cur = NULL, *next = NULL;
+    unsigned long long p = 0;
+    int rc = 0;
+
+    if (virAsprintf(&str, "%s", pidstr) < 0)
+        return -1;
+
+    cur = str;
+    while ((next = strchr(cur, '\n')) != NULL) {
+        *next = '\0';
+        rc = virStrToLong_ull(cur, NULL, 10, &p);
+        if (rc != 0)
+            goto cleanup;
+        cur = next + 1;
+
+        rc = virCgroupAddTaskController(group, p, controller);
+        if (rc != 0)
+            goto cleanup;
+    }
+    if (*cur != '\0') {
+        rc = virStrToLong_ull(cur, NULL, 10, &p);
+        if (rc != 0)
+            goto cleanup;
+        rc = virCgroupAddTaskController(group, p, controller);
+        if (rc != 0)
+            goto cleanup;
+    }
+
+cleanup:
+    VIR_FREE(str);
+    return rc;
+}
+
+/**
+ * virCgroupMoveTask:
+ *
+ * @src_group: The source cgroup where all tasks are removed from
+ * @dest_group: The destination where all tasks are added to
+ * @controller: The cgroup controller to be operated on
+ *
+ * Returns: 0 on success or -errno on failure
+ */
+int virCgroupMoveTask(virCgroupPtr src_group, virCgroupPtr dest_group,
+                      int controller)
+{
+    int rc = 0, err = 0;
+    char *content = NULL;
+
+    if (controller < VIR_CGROUP_CONTROLLER_CPU ||
+        controller > VIR_CGROUP_CONTROLLER_BLKIO)
+        return -EINVAL;
+
+    if (!src_group->controllers[controller].mountPoint ||
+        !dest_group->controllers[controller].mountPoint) {
+        VIR_WARN("no vm cgroup in controller %d", controller);
+        return 0;
+    }
+
+    rc = virCgroupGetValueStr(src_group, controller, "tasks", &content);
+    if (rc != 0)
+        return rc;
+
+    rc = virCgroupAddTaskStrController(dest_group, content, controller);
+    if (rc != 0)
+        goto cleanup;
+
+    VIR_FREE(content);
+
+    return 0;
+
+cleanup:
+    /*
+     * We don't need to recover dest_group because cgroup will make sure
+     * that one task only resides in one cgroup of the same controller.
+     */
+    err = virCgroupAddTaskStrController(src_group, content, controller);
+    if (err != 0)
+        VIR_ERROR(_("Cannot recover cgroup %s from %s"),
+                  src_group->controllers[controller].mountPoint,
+                  dest_group->controllers[controller].mountPoint);
+    VIR_FREE(content);
+
+    return rc;
+}
 
 /**
  * virCgroupForDriver:
diff --git a/src/util/cgroup.h b/src/util/cgroup.h
index 91f41ce..29c52c1 100644
--- a/src/util/cgroup.h
+++ b/src/util/cgroup.h
@@ -70,6 +70,14 @@ int virCgroupPathOfController(virCgroupPtr group,
 
 int virCgroupAddTask(virCgroupPtr group, pid_t pid);
 
+int virCgroupAddTaskController(virCgroupPtr group,
+                               pid_t pid,
+                               int controller);
+
+int virCgroupMoveTask(virCgroupPtr src_group,
+                      virCgroupPtr dest_group,
+                      int controller);
+
 int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned int weight);
 int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned int *weight);
-- 
1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
Introduce a new API to move tasks of one controller from one cgroup to another.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
+static int virCgroupAddTaskStrController(virCgroupPtr group,
+                                         const char *pidstr,
+                                         int controller)
+{
+    char *str = NULL, *cur = NULL, *next = NULL;
+    unsigned long long p = 0;
+    int rc = 0;
+
+    if (virAsprintf(&str, "%s", pidstr) < 0)
+        return -1;
+
+    cur = str;
+    while ((next = strchr(cur, '\n')) != NULL) {
+        *next = '\0';
+        rc = virStrToLong_ull(cur, NULL, 10, &p);
+        if (rc != 0)
+            goto cleanup;
+        cur = next + 1;
+
+        rc = virCgroupAddTaskController(group, p, controller);
+        if (rc != 0)
+            goto cleanup;
+    }
+    if (*cur != '\0') {
+        rc = virStrToLong_ull(cur, NULL, 10, &p);
+        if (rc != 0)
+            goto cleanup;
+        rc = virCgroupAddTaskController(group, p, controller);
+        if (rc != 0)
+            goto cleanup;
+    }
Can this last if statement be folded into the while loop for less code
duplication? If the series passes review now, then I'm probably not worried
enough to change it and would probably push as is; but if we need a v2, then
it's worth improving.

Another patch that looks decent in isolation.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
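For what it's worth, a sketch of the fold Eric suggests: treat end-of-string
like a terminating newline so one loop handles every pid. This is an
illustration only, not the code that was eventually merged:

static int virCgroupAddTaskStrController(virCgroupPtr group,
                                         const char *pidstr,
                                         int controller)
{
    char *str = NULL, *cur = NULL, *next = NULL;
    unsigned long long p = 0;
    int rc = 0;

    if (virAsprintf(&str, "%s", pidstr) < 0)
        return -1;

    for (cur = str; cur && *cur != '\0'; cur = next) {
        if ((next = strchr(cur, '\n')) != NULL)
            *next++ = '\0';          /* terminate this pid, advance to next */

        rc = virStrToLong_ull(cur, NULL, 10, &p);
        if (rc != 0)
            goto cleanup;
        rc = virCgroupAddTaskController(group, p, controller);
        if (rc != 0)
            goto cleanup;
    }

cleanup:
    VIR_FREE(str);
    return rc;
}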

On Mon, Aug 06, 2012 at 03:09:51PM -0600, Eric Blake wrote:
On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
Introduce a new API to move tasks of one controller from one cgroup to another.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
+static int virCgroupAddTaskStrController(virCgroupPtr group,
+                                         const char *pidstr,
+                                         int controller)
+{
+    char *str = NULL, *cur = NULL, *next = NULL;
+    unsigned long long p = 0;
+    int rc = 0;
+
+    if (virAsprintf(&str, "%s", pidstr) < 0)
+        return -1;
+
+    cur = str;
+    while ((next = strchr(cur, '\n')) != NULL) {
+        *next = '\0';
+        rc = virStrToLong_ull(cur, NULL, 10, &p);
+        if (rc != 0)
+            goto cleanup;
+        cur = next + 1;
+
+        rc = virCgroupAddTaskController(group, p, controller);
+        if (rc != 0)
+            goto cleanup;
+    }
+    if (*cur != '\0') {
+        rc = virStrToLong_ull(cur, NULL, 10, &p);
+        if (rc != 0)
+            goto cleanup;
+        rc = virCgroupAddTaskController(group, p, controller);
+        if (rc != 0)
+            goto cleanup;
+    }
Can this last if statement be folded into the while loop for less code duplication? If the series passes review now, then I'm probably not worried enough to change it and would probably push as is; but if we need a v2, then it's worth improving.
OK, I'll improve it.

-- 
Thanks,
Hu Tao

From: Wen Congyang <wency@cn.fujitsu.com>

Create a new cgroup and move all hypervisor threads to the new cgroup.
Then we can do the other things:

1. limit only vcpu usage rather than the whole qemu process
2. set limits for hypervisor threads (including vhost-net threads)

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/qemu/qemu_cgroup.c  |   71 ++++++++++++++++++++++++++++++++++++++++++---
 src/qemu/qemu_cgroup.h  |    2 ++
 src/qemu/qemu_process.c |    6 +++-
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index dee9de6..9400efe 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -521,11 +521,12 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm)
     }
 
     if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) {
-        /* If we does not know VCPU<->PID mapping or all vcpu runs in the same
+        /* If we do not know the VCPU<->PID mapping or all vcpus run in the same
          * thread, we cannot control each vcpu.
          */
-        virCgroupFree(&cgroup);
-        return 0;
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Unable to get vcpus' pids."));
+        goto cleanup;
     }
 
     for (i = 0; i < priv->nvcpupids; i++) {
@@ -562,7 +563,11 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm)
     return 0;
 
 cleanup:
-    virCgroupFree(&cgroup_vcpu);
+    if (cgroup_vcpu) {
+        virCgroupRemove(cgroup_vcpu);
+        virCgroupFree(&cgroup_vcpu);
+    }
+
     if (cgroup) {
         virCgroupRemove(cgroup);
         virCgroupFree(&cgroup);
@@ -571,6 +576,64 @@ cleanup:
     return -1;
 }
 
+int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
+                                 virDomainObjPtr vm)
+{
+    virCgroupPtr cgroup = NULL;
+    virCgroupPtr cgroup_hypervisor = NULL;
+    int rc, i;
+
+    if (driver->cgroup == NULL)
+        return 0; /* Not supported, so claim success */
+
+    rc = virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0);
+    if (rc != 0) {
+        virReportSystemError(-rc,
+                             _("Unable to find cgroup for %s"),
+                             vm->def->name);
+        goto cleanup;
+    }
+
+    rc = virCgroupForHypervisor(cgroup, &cgroup_hypervisor, 1);
+    if (rc < 0) {
+        virReportSystemError(-rc,
+                             _("Unable to create hypervisor cgroup for %s"),
+                             vm->def->name);
+        goto cleanup;
+    }
+
+    for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
+        if (!qemuCgroupControllerActive(driver, i)) {
+            VIR_WARN("cgroup %d is not active", i);
+            continue;
+        }
+        rc = virCgroupMoveTask(cgroup, cgroup_hypervisor, i);
+        if (rc < 0) {
+            virReportSystemError(-rc,
+                                 _("Unable to move tasks from domain cgroup to "
+                                   "hypervisor cgroup in controller %d for %s"),
+                                 i, vm->def->name);
+            goto cleanup;
+        }
+    }
+
+    virCgroupFree(&cgroup_hypervisor);
+    virCgroupFree(&cgroup);
+    return 0;
+
+cleanup:
+    if (cgroup_hypervisor) {
+        virCgroupRemove(cgroup_hypervisor);
+        virCgroupFree(&cgroup_hypervisor);
+    }
+
+    if (cgroup) {
+        virCgroupRemove(cgroup);
+        virCgroupFree(&cgroup);
+    }
+
+    return rc;
+}
 
 int qemuRemoveCgroup(struct qemud_driver *driver,
                      virDomainObjPtr vm,
diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h
index 5973430..3380ee2 100644
--- a/src/qemu/qemu_cgroup.h
+++ b/src/qemu/qemu_cgroup.h
@@ -54,6 +54,8 @@ int qemuSetupCgroupVcpuBW(virCgroupPtr cgroup,
                           unsigned long long period,
                           long long quota);
 int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm);
+int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
+                                 virDomainObjPtr vm);
 int qemuRemoveCgroup(struct qemud_driver *driver,
                      virDomainObjPtr vm,
                      int quiet);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 685ea7c..d89b4d5 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3753,10 +3753,14 @@ int qemuProcessStart(virConnectPtr conn,
     if (qemuProcessDetectVcpuPIDs(driver, vm) < 0)
         goto cleanup;
 
-    VIR_DEBUG("Setting cgroup for each VCPU(if required)");
+    VIR_DEBUG("Setting cgroup for each VCPU (if required)");
     if (qemuSetupCgroupForVcpu(driver, vm) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Setting cgroup for hypervisor (if required)");
+    if (qemuSetupCgroupForHypervisor(driver, vm) < 0)
+        goto cleanup;
+
     VIR_DEBUG("Setting VCPU affinities");
     if (qemuProcessSetVcpuAffinites(conn, vm) < 0)
         goto cleanup;
-- 
1.7.10.2
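A note on where this leads: with the hypervisor threads isolated in their own
group, the bandwidth patches later in the series (hypervisor_period /
hypervisor_quota) have a natural place to apply limits. A rough sketch using
the qemuSetupCgroupVcpuBW() helper already declared in qemu_cgroup.h; the
values are illustrative only:

/* After qemuSetupCgroupForHypervisor() succeeds, cap the emulator and
 * I/O threads by writing period/quota into the new sub-group. */
unsigned long long period = 100000;   /* cpu.cfs_period_us, in microseconds */
long long quota = 50000;              /* cpu.cfs_quota_us: i.e. half a CPU */

if (qemuSetupCgroupVcpuBW(cgroup_hypervisor, period, quota) < 0)
    goto cleanup;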

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
Create a new cgroup and move all hypervisor threads to the new cgroup. Then we can do the other things:
1. limit only vcpu usage rather than the whole qemu process
2. set limits for hypervisor threads (including vhost-net threads)
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
+int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
+                                 virDomainObjPtr vm)
+{
+
+    for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
+        if (!qemuCgroupControllerActive(driver, i)) {
+            VIR_WARN("cgroup %d is not active", i);
+            continue;
+        }
Do we need to do this for every controller, or only for the cpu and cpuacct
controllers?

Again, looks reasonable in isolation.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Mon, Aug 06, 2012 at 03:18:07PM -0600, Eric Blake wrote:
On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
Create a new cgroup and move all hypervisor threads to the new cgroup. Then we can do the other things:
1. limit only vcpu usage rather than the whole qemu process
2. set limits for hypervisor threads (including vhost-net threads)
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
+int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
+                                 virDomainObjPtr vm)
+{
+
+    for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
+        if (!qemuCgroupControllerActive(driver, i)) {
+            VIR_WARN("cgroup %d is not active", i);
+            continue;
+        }
Do we need to do this for every controller, or only for the cpu and cpuacct controllers?
How about adding a third parameter to let the caller choose which cgroup
controllers to set up?

-- 
Thanks,
Hu Tao
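One possible shape for that third parameter, sketched with hypothetical names
(a controller bitmask instead of the unconditional loop); this is not code
from the series:

static int qemuMoveTasksToHypervisorCgroup(struct qemud_driver *driver,
                                           virCgroupPtr cgroup,
                                           virCgroupPtr cgroup_hypervisor,
                                           unsigned int controllers)
{
    int i, rc;

    for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
        if (!(controllers & (1U << i)))
            continue;                /* caller did not ask for this one */
        if (!qemuCgroupControllerActive(driver, i)) {
            VIR_WARN("cgroup %d is not active", i);
            continue;
        }
        rc = virCgroupMoveTask(cgroup, cgroup_hypervisor, i);
        if (rc < 0)
            return rc;
    }
    return 0;
}

/* e.g. only the two controllers Eric asks about:
 *   qemuMoveTasksToHypervisorCgroup(driver, cgroup, cgroup_hypervisor,
 *                                   (1U << VIR_CGROUP_CONTROLLER_CPU) |
 *                                   (1U << VIR_CGROUP_CONTROLLER_CPUACCT));
 */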

From: Tang Chen <tangchen@cn.fujitsu.com>

vcpu thread pinning is implemented using sched_setaffinity(), but is not
controlled by cgroup. This patch does the following things:

1) enable the cpuset cgroup
2) reflect all the vcpu thread pinning info to cgroup

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/libvirt_private.syms |    2 ++
 src/qemu/qemu_cgroup.c   |   44 ++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_cgroup.h   |    2 ++
 src/qemu/qemu_driver.c   |   44 ++++++++++++++++++++++++++++++++++++--------
 src/util/cgroup.c        |   35 ++++++++++++++++++++++++++++++++++-
 src/util/cgroup.h        |    3 +++
 6 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 0511a0f..ae7f124 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -82,6 +82,7 @@ virCgroupGetCpuShares;
 virCgroupGetCpuacctPercpuUsage;
 virCgroupGetCpuacctStat;
 virCgroupGetCpuacctUsage;
+virCgroupGetCpusetCpus;
 virCgroupGetCpusetMems;
 virCgroupGetFreezerState;
 virCgroupGetMemSwapHardLimit;
@@ -100,6 +101,7 @@ virCgroupSetBlkioWeight;
 virCgroupSetCpuCfsPeriod;
 virCgroupSetCpuCfsQuota;
 virCgroupSetCpuShares;
+virCgroupSetCpusetCpus;
 virCgroupSetCpusetMems;
 virCgroupSetFreezerState;
 virCgroupSetMemSwapHardLimit;
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 9400efe..ceca4de 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -479,11 +479,49 @@ cleanup:
     return -1;
 }
 
+int qemuSetupCgroupVcpuPin(virCgroupPtr cgroup, virDomainDefPtr def,
+                           int vcpuid)
+{
+    int i, rc = 0;
+    char *new_cpus = NULL;
+
+    if (vcpuid < 0 || vcpuid >= def->vcpus) {
+        virReportSystemError(EINVAL,
+                             _("invalid vcpuid: %d"), vcpuid);
+        return -EINVAL;
+    }
+
+    for (i = 0; i < def->cputune.nvcpupin; i++) {
+        if (vcpuid == def->cputune.vcpupin[i]->vcpuid) {
+            new_cpus = virDomainCpuSetFormat(def->cputune.vcpupin[i]->cpumask,
+                                             VIR_DOMAIN_CPUMASK_LEN);
+            if (!new_cpus) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("failed to convert cpu mask"));
+                rc = -1;
+                goto cleanup;
+            }
+            rc = virCgroupSetCpusetCpus(cgroup, new_cpus);
+            if (rc != 0) {
+                virReportSystemError(-rc,
+                                     "%s",
+                                     _("Unable to set cpuset.cpus"));
+                goto cleanup;
+            }
+        }
+    }
+
+cleanup:
+    VIR_FREE(new_cpus);
+    return rc;
+}
+
 int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm)
 {
     virCgroupPtr cgroup = NULL;
     virCgroupPtr cgroup_vcpu = NULL;
     qemuDomainObjPrivatePtr priv = vm->privateData;
+    virDomainDefPtr def = vm->def;
     int rc;
     unsigned int i;
     unsigned long long period = vm->def->cputune.period;
@@ -555,6 +593,12 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm)
             }
         }
 
+        /* Set vcpupin in cgroup if vcpupin xml is provided */
+        if (def->cputune.nvcpupin &&
+            qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET) &&
+            qemuSetupCgroupVcpuPin(cgroup_vcpu, def, i) < 0)
+            goto cleanup;
+
         virCgroupFree(&cgroup_vcpu);
     }
 
diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h
index 3380ee2..62ec953 100644
--- a/src/qemu/qemu_cgroup.h
+++ b/src/qemu/qemu_cgroup.h
@@ -53,6 +53,8 @@ int qemuSetupCgroup(struct qemud_driver *driver,
 int qemuSetupCgroupVcpuBW(virCgroupPtr cgroup,
                           unsigned long long period,
                           long long quota);
+int qemuSetupCgroupVcpuPin(virCgroupPtr cgroup, virDomainDefPtr def,
+                           int vcpuid);
 int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm);
 int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
                                  virDomainObjPtr vm);
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 270e4dd..2da13a4 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3613,6 +3613,8 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm;
     virDomainDefPtr persistentDef = NULL;
+    virCgroupPtr cgroup_dom = NULL;
+    virCgroupPtr cgroup_vcpu = NULL;
     int maxcpu, hostcpus;
     virNodeInfo nodeinfo;
     int ret = -1;
@@ -3667,9 +3669,38 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
 
     if (flags & VIR_DOMAIN_AFFECT_LIVE) {
 
         if (priv->vcpupids != NULL) {
+            /* Add config to vm->def first, because qemuSetupCgroupVcpuPin
+             * needs it. */
+            if (virDomainVcpuPinAdd(vm->def, cpumap, maplen, vcpu) < 0) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("failed to update or add vcpupin xml of "
+                                 "a running domain"));
+                goto cleanup;
+            }
+
+            /* Configure the corresponding cpuset cgroup before setting affinity. */
+            if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+                if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup_dom, 0) == 0 &&
+                    virCgroupForVcpu(cgroup_dom, vcpu, &cgroup_vcpu, 0) == 0 &&
+                    qemuSetupCgroupVcpuPin(cgroup_vcpu, vm->def, vcpu) < 0) {
+                    virReportError(VIR_ERR_OPERATION_INVALID,
+                                   _("failed to set cpuset.cpus in cgroup"
+                                     " for vcpu %d"), vcpu);
+                    goto cleanup;
+                }
+            } else {
+                /* Here, we should go on because even if cgroup is not active,
+                 * we can still use virProcessInfoSetAffinity.
+                 */
+                VIR_WARN("cpuset cgroup is not active");
+            }
+
             if (virProcessInfoSetAffinity(priv->vcpupids[vcpu],
-                                          cpumap, maplen, maxcpu) < 0)
+                                          cpumap, maplen, maxcpu) < 0) {
+                virReportError(VIR_ERR_SYSTEM_ERROR,
+                               _("failed to set cpu affinity for vcpu %d"),
+                               vcpu);
                 goto cleanup;
+            }
         } else {
             virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                            _("cpu affinity is not supported"));
@@ -3683,13 +3714,6 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
                                "a running domain"));
                 goto cleanup;
             }
-        } else {
-            if (virDomainVcpuPinAdd(vm->def, cpumap, maplen, vcpu) < 0) {
-                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                               _("failed to update or add vcpupin xml of "
-                                 "a running domain"));
-                goto cleanup;
-            }
         }
 
     if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0)
@@ -3721,6 +3745,10 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
     ret = 0;
 
 cleanup:
+    if (cgroup_vcpu)
+        virCgroupFree(&cgroup_vcpu);
+    if (cgroup_dom)
+        virCgroupFree(&cgroup_dom);
     if (vm)
         virDomainObjUnlock(vm);
     return ret;
diff --git a/src/util/cgroup.c b/src/util/cgroup.c
index 23837fe..c35828c 100644
--- a/src/util/cgroup.c
+++ b/src/util/cgroup.c
@@ -544,7 +544,8 @@ static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group,
             /* We need to control cpu bandwidth for each vcpu now */
             if ((flags & VIR_CGROUP_VCPU) &&
                 (i != VIR_CGROUP_CONTROLLER_CPU &&
-                 i != VIR_CGROUP_CONTROLLER_CPUACCT)) {
+                 i != VIR_CGROUP_CONTROLLER_CPUACCT &&
+                 i != VIR_CGROUP_CONTROLLER_CPUSET)) {
                 /* treat it as unmounted and we can use virCgroupAddTask */
                 VIR_FREE(group->controllers[i].mountPoint);
                 continue;
@@ -1405,6 +1406,38 @@ int virCgroupGetCpusetMems(virCgroupPtr group, char **mems)
 }
 
 /**
+ * virCgroupSetCpusetCpus:
+ *
+ * @group: The cgroup to set cpuset.cpus for
+ * @cpus: the cpus to set
+ *
+ * Returns: 0 on success
+ */
+int virCgroupSetCpusetCpus(virCgroupPtr group, const char *cpus)
+{
+    return virCgroupSetValueStr(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.cpus",
+                                cpus);
+}
+
+/**
+ * virCgroupGetCpusetCpus:
+ *
+ * @group: The cgroup to get cpuset.cpus for
+ * @cpus: the cpus to get
+ *
+ * Returns: 0 on success
+ */
+int virCgroupGetCpusetCpus(virCgroupPtr group, char **cpus)
+{
+    return virCgroupGetValueStr(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.cpus",
+                                cpus);
+}
+
+/**
  * virCgroupDenyAllDevices:
  *
  * @group: The cgroup to deny all permissions, for all devices
diff --git a/src/util/cgroup.h b/src/util/cgroup.h
index 29c52c1..1409df9 100644
--- a/src/util/cgroup.h
+++ b/src/util/cgroup.h
@@ -151,6 +151,9 @@ int virCgroupGetFreezerState(virCgroupPtr group, char **state);
 int virCgroupSetCpusetMems(virCgroupPtr group, const char *mems);
 int virCgroupGetCpusetMems(virCgroupPtr group, char **mems);
 
+int virCgroupSetCpusetCpus(virCgroupPtr group, const char *cpus);
+int virCgroupGetCpusetCpus(virCgroupPtr group, char **cpus);
+
 int virCgroupRemove(virCgroupPtr group);
 
 void virCgroupFree(virCgroupPtr *group);
-- 
1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
vcpu thread pinning is implemented using sched_setaffinity(), but is not controlled by cgroup. This patch does the following things:
1) enable the cpuset cgroup
2) reflect all the vcpu thread pinning info to cgroup
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
@@ -3667,9 +3669,38 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
 
     if (flags & VIR_DOMAIN_AFFECT_LIVE) {
 
         if (priv->vcpupids != NULL) {
+            /* Add config to vm->def first, because qemuSetupCgroupVcpuPin
+             * needs it. */
+            if (virDomainVcpuPinAdd(vm->def, cpumap, maplen, vcpu) < 0) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("failed to update or add vcpupin xml of "
+                                 "a running domain"));
+                goto cleanup;
+            }
If this succeeds,
+
+            /* Configure the corresponding cpuset cgroup before setting affinity. */
+            if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+                if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup_dom, 0) == 0 &&
+                    virCgroupForVcpu(cgroup_dom, vcpu, &cgroup_vcpu, 0) == 0 &&
+                    qemuSetupCgroupVcpuPin(cgroup_vcpu, vm->def, vcpu) < 0) {
+                    virReportError(VIR_ERR_OPERATION_INVALID,
+                                   _("failed to set cpuset.cpus in cgroup"
+                                     " for vcpu %d"), vcpu);
+                    goto cleanup;
but this fails, then where do we roll vm->def back to its pre-attempt state?
+                }
+            } else {
+                /* Here, we should go on because even if cgroup is not active,
+                 * we can still use virProcessInfoSetAffinity.
+                 */
+                VIR_WARN("cpuset cgroup is not active");
+            }
+
             if (virProcessInfoSetAffinity(priv->vcpupids[vcpu],
-                                          cpumap, maplen, maxcpu) < 0)
+                                          cpumap, maplen, maxcpu) < 0) {
+                virReportError(VIR_ERR_SYSTEM_ERROR,
+                               _("failed to set cpu affinity for vcpu %d"),
+                               vcpu);
                 goto cleanup;
Same situation - if set_affinity failed, where do we roll back to the
original vm->def state (also, should we roll back to the original cpuset
state)?

If we don't roll back, then a failure can leave the domain in an unknown
state for cpu pinning. Or are we just declaring that if these functions
fail, the domain is in an unknown set of cpu pinning?

Do we need both cpuset and set_affinity? Or is cpuset a superset of affinity
(that is, if I alter cpuset, can I completely skip out on calling
set_affinity and still get the same results)? If cpuset gives stronger
pinning than affinity, then maybe we don't need to try both methods, which
gives us a bit better mechanism for rollbacks; the only remaining thing is
making sure you avoid altering vm->def except on success (perhaps by using a
temporary structure rather than vm->def at the time you attempt the pinning).
+            }
         } else {
             virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                            _("cpu affinity is not supported"));
@@ -3683,13 +3714,6 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
                                "a running domain"));
                 goto cleanup;
             }
-        } else {
-            if (virDomainVcpuPinAdd(vm->def, cpumap, maplen, vcpu) < 0) {
-                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                               _("failed to update or add vcpupin xml of "
-                                 "a running domain"));
-                goto cleanup;
-            }
In other words, I think floating this code to occur before the pinning is
dangerous, unless you also add a rollback mechanism.

That said, if you ignore this particular corner case of failure recovery,
the rest of the patch seems fine, so it might be okay to add failure
recovery as a followup patch. I'll continue reviewing the series, and try
actually playing with the results (which, in the common case of success,
should just work).

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Mon, Aug 06, 2012 at 03:41:29PM -0600, Eric Blake wrote:
On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
vcpu thread pinning is implemented using sched_setaffinity(), but is not controlled by cgroup. This patch does the following things:
1) enable the cpuset cgroup
2) reflect all the vcpu thread pinning info to cgroup
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
@@ -3667,9 +3669,38 @@ qemudDomainPinVcpuFlags(virDomainPtr dom,
 
     if (flags & VIR_DOMAIN_AFFECT_LIVE) {
 
         if (priv->vcpupids != NULL) {
+            /* Add config to vm->def first, because qemuSetupCgroupVcpuPin
+             * needs it. */
+            if (virDomainVcpuPinAdd(vm->def, cpumap, maplen, vcpu) < 0) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("failed to update or add vcpupin xml of "
+                                 "a running domain"));
+                goto cleanup;
+            }
If this succeeds,
+
+            /* Configure the corresponding cpuset cgroup before setting affinity. */
+            if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+                if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup_dom, 0) == 0 &&
+                    virCgroupForVcpu(cgroup_dom, vcpu, &cgroup_vcpu, 0) == 0 &&
+                    qemuSetupCgroupVcpuPin(cgroup_vcpu, vm->def, vcpu) < 0) {
+                    virReportError(VIR_ERR_OPERATION_INVALID,
+                                   _("failed to set cpuset.cpus in cgroup"
+                                     " for vcpu %d"), vcpu);
+                    goto cleanup;
but this fails, then where do we roll vm->def back to its pre-attempt state?
+                }
+            } else {
+                /* Here, we should go on because even if cgroup is not active,
+                 * we can still use virProcessInfoSetAffinity.
+                 */
+                VIR_WARN("cpuset cgroup is not active");
+            }
+
             if (virProcessInfoSetAffinity(priv->vcpupids[vcpu],
-                                          cpumap, maplen, maxcpu) < 0)
+                                          cpumap, maplen, maxcpu) < 0) {
+                virReportError(VIR_ERR_SYSTEM_ERROR,
+                               _("failed to set cpu affinity for vcpu %d"),
+                               vcpu);
                 goto cleanup;
Same situation - if set_affinity failed, where do we roll back to the original vm->def state (also, should we roll back to the original cpuset state)?
If we don't roll back, then a failure can leave the domain in an unknown state for cpu pinning. Or are we just declaring that if these functions fail, the domain is in an unknown set of cpu pinning?
Oh yes, this is indeed a problem here, I'll fix it.
Do we need both cpuset and set_affinity? Or is cpuset a superset of affinity (that is, if I alter cpuset, can I completely skip out on calling set_affinity and still get the same results)? If cpuset gives stronger pinning than affinity, then maybe we don't need to try both methods, which gives us a bit better mechanism for rollbacks; the only
The man page[1] says cpusets are integrated with sched_setaffinity(), so I think it is better to choose one of them. If cpuset is available, we use cpuset. Otherwise we use sched_setaffinity().
remaining thing is making sure you avoid altering vm->def except on success (perhaps by using a temporary structure rather than vm->def at the time you attempt the pinning).
Yes.

[1] man 7 cpuset

-- 
Thanks,
Hu Tao
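A sketch of the rollback being discussed, in outline only; saveVcpuPin() and
applyCgroupAndAffinity() are hypothetical stand-ins for the snapshot step and
for the cgroup + affinity code above:

unsigned char *oldmap = NULL;   /* copy of the previous map for this vcpu */

oldmap = saveVcpuPin(vm->def, vcpu);   /* hypothetical: NULL if unpinned */

if (virDomainVcpuPinAdd(vm->def, cpumap, maplen, vcpu) < 0)
    goto cleanup;

if (applyCgroupAndAffinity(vm, vcpu) < 0) {   /* the cgroup + affinity steps */
    /* Roll vm->def back to its pre-attempt state. */
    if (oldmap)
        virDomainVcpuPinAdd(vm->def, oldmap, maplen, vcpu);
    else
        virDomainVcpuPinDel(vm->def, vcpu);
    goto cleanup;
}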

From: Tang Chen <tangchen@cn.fujitsu.com>

This patch adds a new xml element <hypervisorpin cpuset='1'>, and also the
parser functions, docs, and tests. hypervisorpin means pinning hypervisor
threads, and cpuset = '1' means pinning all hypervisor threads to cpu 1.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 docs/schemas/domaincommon.rng                   |    7 ++
 src/conf/domain_conf.c                          |   97 ++++++++++++++++++++++-
 src/conf/domain_conf.h                          |    1 +
 tests/qemuxml2argvdata/qemuxml2argv-cputune.xml |    1 +
 4 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index c85d763..c0b83b3 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -576,6 +576,13 @@
             </attribute>
           </element>
         </zeroOrMore>
+        <optional>
+          <element name="hypervisorpin">
+            <attribute name="cpuset">
+              <ref name="cpuset"/>
+            </attribute>
+          </element>
+        </optional>
       </element>
     </optional>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 43b3f80..08a662e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -7828,6 +7828,51 @@ error:
     goto cleanup;
 }
 
+/* Parse the XML definition for hypervisorpin */
+static virDomainVcpuPinDefPtr
+virDomainHypervisorPinDefParseXML(const xmlNodePtr node)
+{
+    virDomainVcpuPinDefPtr def = NULL;
+    char *tmp = NULL;
+
+    if (VIR_ALLOC(def) < 0) {
+        virReportOOMError();
+        return NULL;
+    }
+
+    def->vcpuid = -1;
+
+    tmp = virXMLPropString(node, "cpuset");
+
+    if (tmp) {
+        char *set = tmp;
+        int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
+
+        if (VIR_ALLOC_N(def->cpumask, cpumasklen) < 0) {
+            virReportOOMError();
+            goto error;
+        }
+
+        if (virDomainCpuSetParse(set, 0, def->cpumask,
+                                 cpumasklen) < 0)
+            goto error;
+
+        VIR_FREE(tmp);
+    } else {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       "%s", _("missing cpuset for hypervisor pin"));
+        goto error;
+    }
+
+cleanup:
+    return def;
+
+error:
+    VIR_FREE(tmp);
+    VIR_FREE(def);
+    goto cleanup;
+}
+
 static int
 virDomainDefMaybeAddController(virDomainDefPtr def,
                                int type, int idx)
@@ -8223,6 +8268,34 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
     }
     VIR_FREE(nodes);
 
+    if ((n = virXPathNodeSet("./cputune/hypervisorpin", ctxt, &nodes)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("cannot extract hypervisorpin nodes"));
+        goto error;
+    }
+
+    if (n > 1) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("only one hypervisorpin is supported"));
+        VIR_FREE(nodes);
+        goto error;
+    }
+
+    if (n && VIR_ALLOC(def->cputune.hypervisorpin) < 0) {
+        goto no_memory;
+    }
+
+    if (n) {
+        virDomainVcpuPinDefPtr hypervisorpin = NULL;
+        hypervisorpin = virDomainHypervisorPinDefParseXML(nodes[0]);
+
+        if (!hypervisorpin)
+            goto error;
+
+        def->cputune.hypervisorpin = hypervisorpin;
+    }
+    VIR_FREE(nodes);
+
     /* Extract numatune if exists. */
     if ((n = virXPathNodeSet("./numatune", ctxt, &nodes)) < 0) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -9280,7 +9353,7 @@ no_memory:
     virReportOOMError();
     /* fallthrough */
 
- error:
+error:
     VIR_FREE(tmp);
     VIR_FREE(nodes);
     virBitmapFree(bootMap);
@@ -12848,7 +12921,8 @@ virDomainDefFormatInternal(virDomainDefPtr def,
         virBufferAsprintf(buf, ">%u</vcpu>\n", def->maxvcpus);
 
     if (def->cputune.shares || def->cputune.vcpupin ||
-        def->cputune.period || def->cputune.quota)
+        def->cputune.period || def->cputune.quota ||
+        def->cputune.hypervisorpin)
         virBufferAddLit(buf, "  <cputune>\n");
 
     if (def->cputune.shares)
@@ -12880,8 +12954,25 @@ virDomainDefFormatInternal(virDomainDefPtr def,
         }
     }
 
+    if (def->cputune.hypervisorpin) {
+        virBufferAsprintf(buf, "    <hypervisorpin ");
+
+        char *cpumask = NULL;
+        cpumask = virDomainCpuSetFormat(def->cputune.hypervisorpin->cpumask,
+                                        VIR_DOMAIN_CPUMASK_LEN);
+        if (cpumask == NULL) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           "%s", _("failed to format cpuset for hypervisor"));
+            goto cleanup;
+        }
+
+        virBufferAsprintf(buf, "cpuset='%s'/>\n", cpumask);
+        VIR_FREE(cpumask);
+    }
+
     if (def->cputune.shares || def->cputune.vcpupin ||
-        def->cputune.period || def->cputune.quota)
+        def->cputune.period || def->cputune.quota ||
+        def->cputune.hypervisorpin)
         virBufferAddLit(buf, "  </cputune>\n");
 
     if (def->numatune.memory.nodemask ||
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 9fdda78..03cab2b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1609,6 +1609,7 @@ struct _virDomainDef {
         long long quota;
         int nvcpupin;
         virDomainVcpuPinDefPtr *vcpupin;
+        virDomainVcpuPinDefPtr hypervisorpin;
     } cputune;
 
     virDomainNumatuneDef numatune;
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cputune.xml b/tests/qemuxml2argvdata/qemuxml2argv-cputune.xml
index df3101d..b72af1b 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-cputune.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-cputune.xml
@@ -10,6 +10,7 @@
     <quota>-1</quota>
     <vcpupin vcpu='0' cpuset='0'/>
     <vcpupin vcpu='1' cpuset='1'/>
+    <hypervisorpin cpuset='1'/>
   </cputune>
   <os>
     <type arch='i686' machine='pc'>hvm</type>
-- 
1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
This patch adds a new xml element <hypervisorpin cpuset='1'>,
I would mention that this is a sibling to the existing <vcpupin> element under the <cputune>.
and also the parser functions, docs, and tests. hypervisorpin means pinning hypervisor threads, and cpuset = '1' means pinning all hypervisor threads to cpu 1.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 docs/schemas/domaincommon.rng                   |    7 ++
Missing documentation. I can't apply this as-is unless we also update the elementsCPUTuning section of docs/formatdomain.html.in. That won't stop me from reviewing the rest of the patch, though.
 src/conf/domain_conf.c                          |   97 ++++++++++++++++++++++-
 src/conf/domain_conf.h                          |    1 +
 tests/qemuxml2argvdata/qemuxml2argv-cputune.xml |    1 +
 4 files changed, 103 insertions(+), 3 deletions(-)
+++ b/src/conf/domain_conf.c
@@ -7828,6 +7828,51 @@ error:
     goto cleanup;
 }
+/* Parse the XML definition for hypervisorpin */
+static virDomainVcpuPinDefPtr
+virDomainHypervisorPinDefParseXML(const xmlNodePtr node)
+{
...
+}
This is a lot of code duplication. It might be worth refactoring things to write a helper function that parses '@cpuset', and which can be shared by both the existing virDomainVcpuPinDefParseXML and your new function.
@@ -9280,7 +9353,7 @@ no_memory:
     virReportOOMError();
     /* fallthrough */
 
- error:
+error:
     VIR_FREE(tmp);
     VIR_FREE(nodes);
     virBitmapFree(bootMap);
On its own, this whitespace cleanup has no bearing on the patch; it's
generally wise to limit cleanups to portions already affected by the patch.

But in general, the patch looked reasonable.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
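A sketch of the shared @cpuset helper Eric suggests, built only from calls
already used in this patch (virXMLPropString, virDomainCpuSetParse); the
helper name is hypothetical:

static char *
virDomainCpuPinMaskParseXML(xmlNodePtr node)
{
    char *tmp = NULL;
    char *cpumask = NULL;

    if (!(tmp = virXMLPropString(node, "cpuset"))) {
        virReportError(VIR_ERR_INTERNAL_ERROR,
                       "%s", _("missing cpuset attribute"));
        return NULL;
    }

    if (VIR_ALLOC_N(cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) {
        virReportOOMError();
        goto cleanup;
    }

    if (virDomainCpuSetParse(tmp, 0, cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0)
        VIR_FREE(cpumask);    /* VIR_FREE also resets cpumask to NULL */

cleanup:
    VIR_FREE(tmp);
    return cpumask;           /* NULL on any failure */
}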

On Mon, Aug 06, 2012 at 04:51:49PM -0600, Eric Blake wrote:
On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
This patch adds a new xml element <hypervisorpin cpuset='1'>,
I would mention that this is a sibling to the existing <vcpupin> element under the <cputune>.
and also the parser functions, docs, and tests. hypervisorpin means pinning hypervisor threads, and cpuset = '1' means pinning all hypervisor threads to cpu 1.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 docs/schemas/domaincommon.rng                   |    7 ++
Missing documentation. I can't apply this as-is unless we also update the elementsCPUTuning section of docs/formatdomain.html.in. That won't stop me from reviewing the rest of the patch, though.
 src/conf/domain_conf.c                          |   97 ++++++++++++++++++++++-
 src/conf/domain_conf.h                          |    1 +
 tests/qemuxml2argvdata/qemuxml2argv-cputune.xml |    1 +
 4 files changed, 103 insertions(+), 3 deletions(-)
+++ b/src/conf/domain_conf.c
@@ -7828,6 +7828,51 @@ error:
     goto cleanup;
 }
+/* Parse the XML definition for hypervisorpin */
+static virDomainVcpuPinDefPtr
+virDomainHypervisorPinDefParseXML(const xmlNodePtr node)
+{
...
+}
This is a lot of code duplication. It might be worth refactoring things to write a helper function that parses '@cpuset', and which can be shared by both the existing virDomainVcpuPinDefParseXML and your new function.
@@ -9280,7 +9353,7 @@ no_memory:
     virReportOOMError();
     /* fallthrough */
 
- error:
+error:
    VIR_FREE(tmp);
    VIR_FREE(nodes);
    virBitmapFree(bootMap);
On its own, this whitespace cleanup has no bearing on the patch; it's generally wise to limit cleanups to portions already affected by the patch.
But in general, the patch looked reasonable.
Thank you, I'll improve this patch in v2.

-- 
Thanks,
Hu Tao

From: Tang Chen <tangchen@cn.fujitsu.com>

Introduce a qemuSetupCgroupHypervisorPin() function to add hypervisor
thread pinning info to the cpuset cgroup, the same as for vcpupin.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/qemu/qemu_cgroup.c |   36 ++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_cgroup.h |    1 +
 2 files changed, 37 insertions(+)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index ceca4de..247f987 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -516,6 +516,36 @@ cleanup:
     return rc;
 }
 
+int qemuSetupCgroupHypervisorPin(virCgroupPtr cgroup, virDomainDefPtr def)
+{
+    int rc = 0;
+    char *new_cpus = NULL;
+
+    if (!def->cputune.hypervisorpin)
+        return 0;
+
+    new_cpus = virDomainCpuSetFormat(def->cputune.hypervisorpin->cpumask,
+                                     VIR_DOMAIN_CPUMASK_LEN);
+    if (!new_cpus) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("failed to convert cpu mask"));
+        rc = -1;
+        goto cleanup;
+    }
+
+    rc = virCgroupSetCpusetCpus(cgroup, new_cpus);
+    if (rc < 0) {
+        virReportSystemError(-rc,
+                             "%s",
+                             _("Unable to set cpuset.cpus"));
+        goto cleanup;
+    }
+
+cleanup:
+    VIR_FREE(new_cpus);
+    return rc;
+}
+
 int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm)
 {
     virCgroupPtr cgroup = NULL;
@@ -625,6 +655,7 @@ int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
 {
     virCgroupPtr cgroup = NULL;
     virCgroupPtr cgroup_hypervisor = NULL;
+    virDomainDefPtr def = vm->def;
     int rc, i;
 
     if (driver->cgroup == NULL)
@@ -661,6 +692,11 @@ int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
         }
     }
 
+    if (def->cputune.hypervisorpin &&
+        qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET) &&
+        qemuSetupCgroupHypervisorPin(cgroup_hypervisor, def) < 0)
+        goto cleanup;
+
     virCgroupFree(&cgroup_hypervisor);
     virCgroupFree(&cgroup);
     return 0;
diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h
index 62ec953..f321683 100644
--- a/src/qemu/qemu_cgroup.h
+++ b/src/qemu/qemu_cgroup.h
@@ -55,6 +55,7 @@ int qemuSetupCgroupVcpuBW(virCgroupPtr cgroup,
                           long long quota);
 int qemuSetupCgroupVcpuPin(virCgroupPtr cgroup, virDomainDefPtr def,
                            int vcpuid);
+int qemuSetupCgroupHypervisorPin(virCgroupPtr cgroup, virDomainDefPtr def);
 int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm);
 int qemuSetupCgroupForHypervisor(struct qemud_driver *driver,
                                  virDomainObjPtr vm);
-- 
1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
That's a long subject line (longer than 80 characters, when we usually go for less than 60). I'd suggest: qemu: synchronize hypervisorpin info to cgroup
Introduce a qemuSetupCgroupHypervisorPin() function to add hypervisor thread pinning info to the cpuset cgroup, the same as for vcpupin.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/qemu/qemu_cgroup.c |   36 ++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_cgroup.h |    1 +
 2 files changed, 37 insertions(+)
Looks reasonable.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From: Tang Chen <tangchen@cn.fujitsu.com>

Hypervisor threads should also be pinned by sched_setaffinity(), just the
same as vcpu threads.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/qemu/qemu_process.c |   54 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index d89b4d5..bb1640a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2005,6 +2005,56 @@ cleanup:
     return ret;
 }
 
+/* Set CPU affinities for hypervisor threads if hypervisorpin xml provided. */
+static int
+qemuProcessSetHypervisorAffinites(virConnectPtr conn,
+                                  virDomainObjPtr vm)
+{
+    virDomainDefPtr def = vm->def;
+    pid_t pid = vm->pid;
+    unsigned char *cpumask = NULL;
+    unsigned char *cpumap = NULL;
+    virNodeInfo nodeinfo;
+    int cpumaplen, hostcpus, maxcpu, i;
+    int ret = -1;
+
+    if (virNodeGetInfo(conn, &nodeinfo) != 0)
+        return -1;
+
+    if (!def->cputune.hypervisorpin)
+        return 0;
+
+    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+    cpumaplen = VIR_CPU_MAPLEN(hostcpus);
+    maxcpu = cpumaplen * 8;
+
+    if (maxcpu > hostcpus)
+        maxcpu = hostcpus;
+
+    if (VIR_ALLOC_N(cpumap, cpumaplen) < 0) {
+        virReportOOMError();
+        return -1;
+    }
+
+    cpumask = (unsigned char *)def->cputune.hypervisorpin->cpumask;
+    for (i = 0; i < VIR_DOMAIN_CPUMASK_LEN; i++) {
+        if (cpumask[i])
+            VIR_USE_CPU(cpumap, i);
+    }
+
+    if (virProcessInfoSetAffinity(pid,
+                                  cpumap,
+                                  cpumaplen,
+                                  maxcpu) < 0) {
+        goto cleanup;
+    }
+
+    ret = 0;
+cleanup:
+    VIR_FREE(cpumap);
+    return ret;
+}
+
 static int
 qemuProcessInitPasswords(virConnectPtr conn,
                          struct qemud_driver *driver,
@@ -3765,6 +3815,10 @@ int qemuProcessStart(virConnectPtr conn,
     if (qemuProcessSetVcpuAffinites(conn, vm) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Setting hypervisor threads affinities");
+    if (qemuProcessSetHypervisorAffinites(conn, vm) < 0)
+        goto cleanup;
+
     VIR_DEBUG("Setting any required VM passwords");
     if (qemuProcessInitPasswords(conn, driver, vm) < 0)
         goto cleanup;
-- 
1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Hypervisor threads should also be pinned by sched_setaffinity(), just the same as vcpu threads.
Indeed, this fallback makes sense when cpuset cgroup is not present.

Same question as earlier in the series - is affinity necessary when cpuset
is in effect, or does cpuset already guarantee everything that affinity
would already provide?
+static int
+qemuProcessSetHypervisorAffinites(virConnectPtr conn,
+                                  virDomainObjPtr vm)
+{
+    virDomainDefPtr def = vm->def;
+    pid_t pid = vm->pid;
+    unsigned char *cpumask = NULL;
+    unsigned char *cpumap = NULL;
+    virNodeInfo nodeinfo;
+    int cpumaplen, hostcpus, maxcpu, i;
+    int ret = -1;
+
+    if (virNodeGetInfo(conn, &nodeinfo) != 0)
+        return -1;
+
+    if (!def->cputune.hypervisorpin)
+        return 0;
+
+    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+    cpumaplen = VIR_CPU_MAPLEN(hostcpus);
+    maxcpu = cpumaplen * 8;
That's a magic number; I prefer CHAR_BIT from <limits.h>.

Not your fault, but I really hate the amount of gross code duplication we
have for dealing with conversions between strings, cpu maps, and bitmaps.
It would be really nice to scrub the code to make nice helper functions
that can provide easier conversion interfaces rather than open coding the
conversion at each client.
+
+    if (virProcessInfoSetAffinity(pid,
+                                  cpumap,
+                                  cpumaplen,
+                                  maxcpu) < 0) {
Does this work, or does it slam the affinity of all the vcpu children threads at the same time, even when we have used <vcpupin> to request different pinning?
@@ -3765,6 +3815,10 @@ int qemuProcessStart(virConnectPtr conn,
     if (qemuProcessSetVcpuAffinites(conn, vm) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Setting hypervisor threads affinities");
Reads awkwardly; maybe 'setting affinity of hypervisor threads'.

But the basic idea still looks sound, and if we save the conversion cleanup
for later patches, then we're down to nits on this patch.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
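A sketch of the kind of conversion helper Eric is asking for, using CHAR_BIT
instead of a bare 8. The helper name is hypothetical and this is not code
from the series; it turns the byte-per-cpu cpumask stored in the domain def
into the bit-per-cpu cpumap that virProcessInfoSetAffinity() wants:

#include <limits.h>   /* CHAR_BIT */
#include <string.h>   /* memset */

static void cpumaskToCpumap(const unsigned char *cpumask, int masklen,
                            unsigned char *cpumap, int maplen)
{
    int i;

    memset(cpumap, 0, maplen);
    for (i = 0; i < masklen && i < maplen * CHAR_BIT; i++) {
        if (cpumask[i])   /* byte-per-cpu mask -> bit-per-cpu map */
            cpumap[i / CHAR_BIT] |= 1 << (i % CHAR_BIT);
    }
}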

On Mon, Aug 06, 2012 at 05:09:04PM -0600, Eric Blake wrote:
On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Hypervisor threads should also be pinned by sched_setaffinity(), just the same as vcpu threads.
Indeed, this fallback makes sense when cpuset cgroup is not present.
Same question as earlier in the series - is affinity necessary when cpuset is in effect, or does cpuset already guarantee everything that affinity would already provide?
+static int
+qemuProcessSetHypervisorAffinites(virConnectPtr conn,
+                                  virDomainObjPtr vm)
+{
+    virDomainDefPtr def = vm->def;
+    pid_t pid = vm->pid;
+    unsigned char *cpumask = NULL;
+    unsigned char *cpumap = NULL;
+    virNodeInfo nodeinfo;
+    int cpumaplen, hostcpus, maxcpu, i;
+    int ret = -1;
+
+    if (virNodeGetInfo(conn, &nodeinfo) != 0)
+        return -1;
+
+    if (!def->cputune.hypervisorpin)
+        return 0;
+
+    hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
+    cpumaplen = VIR_CPU_MAPLEN(hostcpus);
+    maxcpu = cpumaplen * 8;
That's a magic number; I prefer CHAR_BIT from <limits.h>.
Not your fault, but I really hate the amount of gross code duplication we have for dealing with conversions between strings, cpu maps, and bitmaps. It would be really nice to scrub the code to make nice helper functions that can provide easier conversion interfaces rather than open coding the conversion at each client.
+
+    if (virProcessInfoSetAffinity(pid,
+                                  cpumap,
+                                  cpumaplen,
+                                  maxcpu) < 0) {
Does this work, or does it slam the affinity of all the vcpu children threads at the same time, even when we have used <vcpupin> to request different pinning?
It doesn't slam them; existing child threads are not affected.

-- 
Thanks,
Hu Tao
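Hu Tao's point can be checked with a few lines of standalone C (not libvirt
code): changing the main thread's affinity after a child thread exists leaves
the child's mask untouched, because sched_setaffinity() operates on a single
thread:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    pause();   /* just exist so its affinity can be inspected */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    cpu_set_t set;

    pthread_create(&tid, NULL, worker, NULL);

    /* Pin the calling (main) thread to CPU 0 only; 0 == current thread. */
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);

    /* The already-running worker keeps its original, wider mask. */
    pthread_getaffinity_np(tid, sizeof(set), &set);
    printf("worker still allowed on %d CPUs\n", CPU_COUNT(&set));
    return 0;
}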

From: Tang Chen <tangchen@cn.fujitsu.com>

Introduce 2 APIs to support hypervisor thread pinning:
1) virDomainHypervisorPinAdd: set up hypervisor thread pinning with a
   given cpumap string.
2) virDomainHypervisorPinDel: remove all hypervisor thread pinning.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/conf/domain_conf.c   |   71 ++++++++++++++++++++++++++++++++++++++++++++++
 src/conf/domain_conf.h   |    6 ++++
 src/libvirt_private.syms |    2 ++
 3 files changed, 79 insertions(+)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 08a662e..98ea335 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -11011,6 +11011,77 @@ virDomainVcpuPinDel(virDomainDefPtr def, int vcpu)
     return 0;
 }
 
+int
+virDomainHypervisorPinAdd(virDomainDefPtr def,
+                          unsigned char *cpumap,
+                          int maplen)
+{
+    virDomainVcpuPinDefPtr hypervisorpin = NULL;
+    char *cpumask = NULL;
+    int i;
+
+    if (VIR_ALLOC_N(cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) {
+        virReportOOMError();
+        goto cleanup;
+    }
+
+    /* Convert bitmap (cpumap) to cpumask, which is a byte map. */
+    for (i = 0; i < maplen; i++) {
+        int cur;
+
+        for (cur = 0; cur < 8; cur++) {
+            if (cpumap[i] & (1 << cur))
+                cpumask[i * 8 + cur] = 1;
+        }
+    }
+
+    if (!def->cputune.hypervisorpin) {
+        /* No hypervisorpin exists yet. */
+        if (VIR_ALLOC(hypervisorpin) < 0) {
+            virReportOOMError();
+            goto cleanup;
+        }
+
+        hypervisorpin->vcpuid = -1;
+        hypervisorpin->cpumask = cpumask;
+        def->cputune.hypervisorpin = hypervisorpin;
+    } else {
+        /* Since there is only 1 hypervisorpin for each vm,
+         * just replace the old one.
+         */
+        VIR_FREE(def->cputune.hypervisorpin->cpumask);
+        def->cputune.hypervisorpin->cpumask = cpumask;
+    }
+
+    return 0;
+
+cleanup:
+    VIR_FREE(cpumask);
+    return -1;
+}
+
+int
+virDomainHypervisorPinDel(virDomainDefPtr def)
+{
+    virDomainVcpuPinDefPtr hypervisorpin = NULL;
+
+    /* No hypervisorpin exists yet */
+    if (!def->cputune.hypervisorpin) {
+        return 0;
+    }
+
+    hypervisorpin = def->cputune.hypervisorpin;
+
+    VIR_FREE(hypervisorpin->cpumask);
+    VIR_FREE(hypervisorpin);
+    def->cputune.hypervisorpin = NULL;
+
+    if (def->cputune.hypervisorpin)
+        return -1;
+
+    return 0;
+}
+
 static int
 virDomainLifecycleDefFormat(virBufferPtr buf,
                             int type,
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 03cab2b..9f63669 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1990,6 +1990,12 @@ int virDomainVcpuPinAdd(virDomainDefPtr def,
 
 int virDomainVcpuPinDel(virDomainDefPtr def, int vcpu);
 
+int virDomainHypervisorPinAdd(virDomainDefPtr def,
+                              unsigned char *cpumap,
+                              int maplen);
+
+int virDomainHypervisorPinDel(virDomainDefPtr def);
+
 int virDomainDiskIndexByName(virDomainDefPtr def, const char *name,
                              bool allow_ambiguous);
 const char *virDomainDiskPathByName(virDomainDefPtr, const char *name);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index ae7f124..77220f4 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -492,6 +492,8 @@ virDomainTimerTrackTypeFromString;
 virDomainTimerTrackTypeToString;
 virDomainVcpuPinAdd;
 virDomainVcpuPinDel;
+virDomainHypervisorPinAdd;
+virDomainHypervisorPinDel;
 virDomainVcpuPinFindByVcpu;
 virDomainVcpuPinIsDuplicate;
 virDomainVideoDefFree;
-- 
1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Introduce 2 APIs to support hypervisor thread pinning:
1) virDomainHypervisorPinAdd: set up hypervisor thread pinning with a given cpumap string.
2) virDomainHypervisorPinDel: remove all hypervisor thread pinning.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
---
 src/conf/domain_conf.c   |   71 ++++++++++++++++++++++++++++++++++++++++++++++
 src/conf/domain_conf.h   |    6 ++++
 src/libvirt_private.syms |    2 ++
 3 files changed, 79 insertions(+)
Yuck. More code duplication. These new functions share code in part with the
existing vcpupin code (that is, the notion of which thread to pin differs
based on vcpu id vs. all other hypervisor threads, but once you have a thread
or set of threads to pin, the work of actually pinning it, and the work of
representing the pinning map in XML, should be common between the two).

But if you overlook the need to refactor this code, then what you have done
is an accurate, albeit hard-to-maintain, mirror of what the vcpupin code was
doing.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
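A sketch of the common helper Eric has in mind: one routine that converts the
bit-per-cpu map into the byte-per-cpu mask and installs it, so both
virDomainVcpuPinAdd and virDomainHypervisorPinAdd become thin wrappers. The
name and exact shape are hypothetical:

#include <limits.h>   /* CHAR_BIT */

static int
virDomainPinDefUpdate(virDomainVcpuPinDefPtr *pindef,
                      int vcpuid,                /* -1 for hypervisor threads */
                      unsigned char *cpumap,
                      int maplen)
{
    char *cpumask = NULL;
    int i, cur;

    if (VIR_ALLOC_N(cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) {
        virReportOOMError();
        return -1;
    }

    /* bit-per-cpu map -> byte-per-cpu mask, bounds-checked */
    for (i = 0; i < maplen; i++)
        for (cur = 0; cur < CHAR_BIT; cur++)
            if ((cpumap[i] & (1 << cur)) &&
                i * CHAR_BIT + cur < VIR_DOMAIN_CPUMASK_LEN)
                cpumask[i * CHAR_BIT + cur] = 1;

    if (!*pindef) {
        if (VIR_ALLOC(*pindef) < 0) {
            virReportOOMError();
            VIR_FREE(cpumask);
            return -1;
        }
        (*pindef)->vcpuid = vcpuid;
    } else {
        VIR_FREE((*pindef)->cpumask);   /* replace the old mask */
    }
    (*pindef)->cpumask = cpumask;
    return 0;
}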

From: Tang Chen <tangchen@cn.fujitsu.com> Introduce 2 APIs for client to use. 1) virDomainPinHypervisorFlags: call remote driver api, such as remoteDomainPinHypervisorFlags. 2) virDomainGetHypervisorPinInfo: call remote driver api, such as remoteDomainGetHypervisorPinInfo. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- include/libvirt/libvirt.h.in | 10 +++ src/driver.h | 13 +++- src/libvirt.c | 147 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 2 + 4 files changed, 171 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d21d029..15c08c1 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1914,6 +1914,16 @@ int virDomainGetVcpuPinInfo (virDomainPtr domain, int maplen, unsigned int flags); +int virDomainPinHypervisorFlags (virDomainPtr domain, + unsigned char *cpumap, + int maplen, + unsigned int flags); + +int virDomainGetHypervisorPinInfo (virDomainPtr domain, + unsigned char *cpumaps, + int maplen, + unsigned int flags); + /** * VIR_USE_CPU: * @cpumap: pointer to a bit map of real CPUs (in 8-bit bytes) (IN/OUT) diff --git a/src/driver.h b/src/driver.h index aab9766..95d85e8 100644 --- a/src/driver.h +++ b/src/driver.h @@ -306,7 +306,16 @@ typedef int unsigned char *cpumaps, int maplen, unsigned int flags); - + typedef int + (*virDrvDomainPinHypervisorFlags) (virDomainPtr domain, + unsigned char *cpumap, + int maplen, + unsigned int flags); +typedef int + (*virDrvDomainGetHypervisorPinInfo) (virDomainPtr domain, + unsigned char *cpumaps, + int maplen, + unsigned int flags); typedef int (*virDrvDomainGetVcpus) (virDomainPtr domain, virVcpuInfoPtr info, @@ -938,6 +947,8 @@ struct _virDriver { virDrvDomainPinVcpu domainPinVcpu; virDrvDomainPinVcpuFlags domainPinVcpuFlags; virDrvDomainGetVcpuPinInfo domainGetVcpuPinInfo; + virDrvDomainPinHypervisorFlags domainPinHypervisorFlags; + virDrvDomainGetHypervisorPinInfo domainGetHypervisorPinInfo; virDrvDomainGetVcpus domainGetVcpus; virDrvDomainGetMaxVcpus domainGetMaxVcpus; virDrvDomainGetSecurityLabel domainGetSecurityLabel; diff --git a/src/libvirt.c b/src/libvirt.c index 3c4bf8c..c94caa7 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -8864,6 +8864,153 @@ error: } /** + * virDomainPinHypervisorFlags: + * @domain: pointer to domain object, or NULL for Domain0 + * @cpumap: pointer to a bit map of real CPUs (in 8-bit bytes) (IN) + * Each bit set to 1 means that corresponding CPU is usable. + * Bytes are stored in little-endian order: CPU0-7, 8-15... + * In each byte, lowest CPU number is least significant bit. + * @maplen: number of bytes in cpumap, from 1 up to size of CPU map in + * underlying virtualization system (Xen...). + * If maplen < size, missing bytes are set to zero. + * If maplen > size, failure code is returned. + * @flags: bitwise-OR of virDomainModificationImpact + * + * Dynamically change the real CPUs which can be allocated to all hypervisor + * threads. This function may require privileged access to the hypervisor. + * + * @flags may include VIR_DOMAIN_AFFECT_LIVE or VIR_DOMAIN_AFFECT_CONFIG. + * Both flags may be set. + * If VIR_DOMAIN_AFFECT_LIVE is set, the change affects a running domain + * and may fail if domain is not alive. + * If VIR_DOMAIN_AFFECT_CONFIG is set, the change affects persistent state, + * and will fail for transient domains. 
If neither flag is specified (that is, + * @flags is VIR_DOMAIN_AFFECT_CURRENT), then an inactive domain modifies + * persistent setup, while an active domain is hypervisor-dependent on whether + * just live or both live and persistent state is changed. + * Not all hypervisors can support all flag combinations. + * + * See also virDomainGetHypervisorPinInfo for querying this information. + * + * Returns 0 in case of success, -1 in case of failure. + * + */ +int +virDomainPinHypervisorFlags(virDomainPtr domain, unsigned char *cpumap, + int maplen, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain, "cpumap=%p, maplen=%d, flags=%x", + cpumap, maplen, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN(domain)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + + if (domain->conn->flags & VIR_CONNECT_RO) { + virLibDomainError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + + if ((cpumap == NULL) || (maplen < 1)) { + virLibDomainError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + + conn = domain->conn; + + if (conn->driver->domainPinHypervisorFlags) { + int ret; + ret = conn->driver->domainPinHypervisorFlags (domain, cpumap, maplen, flags); + if (ret < 0) + goto error; + return ret; + } + + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(domain->conn); + return -1; +} + +/** + * virDomainGetHypervisorPinInfo: + * @domain: pointer to domain object, or NULL for Domain0 + * @cpumap: pointer to a bit map of real CPUs for all hypervisor threads of + * this domain (in 8-bit bytes) (OUT) + * There is only one cpumap for all hypervisor threads. + * Must not be NULL. + * @maplen: the number of bytes in one cpumap, from 1 up to size of CPU map. + * Must be positive. + * @flags: bitwise-OR of virDomainModificationImpact + * Must not be VIR_DOMAIN_AFFECT_LIVE and + * VIR_DOMAIN_AFFECT_CONFIG concurrently. + * + * Query the CPU affinity setting of all hypervisor threads of domain, store + * it in cpumap. + * + * Returns 1 in case of success, + * 0 in case of no hypervisor threads are pined to pcpus, + * -1 in case of failure. + */ +int +virDomainGetHypervisorPinInfo(virDomainPtr domain, unsigned char *cpumap, + int maplen, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain, "cpumap=%p, maplen=%d, flags=%x", + cpumap, maplen, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN(domain)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + + if (!cpumap || maplen <= 0) { + virLibDomainError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + if (INT_MULTIPLY_OVERFLOW(1, maplen)) { + virLibDomainError(VIR_ERR_OVERFLOW, _("input too large: 1 * %d"), + maplen); + goto error; + } + + /* At most one of these two flags should be set. 
*/ + if ((flags & VIR_DOMAIN_AFFECT_LIVE) && + (flags & VIR_DOMAIN_AFFECT_CONFIG)) { + virLibDomainError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + conn = domain->conn; + + if (conn->driver->domainGetHypervisorPinInfo) { + int ret; + ret = conn->driver->domainGetHypervisorPinInfo(domain, cpumap, + maplen, flags); + if (ret < 0) + goto error; + return ret; + } + + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(domain->conn); + return -1; +} + +/** * virDomainGetVcpus: * @domain: pointer to domain object, or NULL for Domain0 * @info: pointer to an array of virVcpuInfo structures (OUT) diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index e3ba119..1c606f3 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -549,6 +549,8 @@ LIBVIRT_0.10.0 { virDomainGetHostname; virConnectRegisterCloseCallback; virConnectUnregisterCloseCallback; + virDomainPinHypervisorFlags; + virDomainGetHypervisorPinInfo; } LIBVIRT_0.9.13; # .... define new API here using predicted next version number .... -- 1.7.10.2
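For illustration, a client-side sketch of the new APIs as posted; the wrapper name pin_emulator_to_cpu0 and the choice of pCPU 0 are made up here, while virNodeGetInfo, VIR_CPU_MAPLEN and VIR_USE_CPU are existing public API:

    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    /* Pin all of dom's hypervisor threads to pCPU 0 on the live domain. */
    static int
    pin_emulator_to_cpu0(virConnectPtr conn, virDomainPtr dom)
    {
        virNodeInfo nodeinfo;
        unsigned char *cpumap;
        int maplen, rc;

        if (virNodeGetInfo(conn, &nodeinfo) < 0)
            return -1;

        maplen = VIR_CPU_MAPLEN(VIR_NODEINFO_MAXCPUS(nodeinfo));
        if (!(cpumap = calloc(maplen, 1)))
            return -1;

        VIR_USE_CPU(cpumap, 0);              /* set the bit for pCPU 0 */
        rc = virDomainPinHypervisorFlags(dom, cpumap, maplen,
                                         VIR_DOMAIN_AFFECT_LIVE);
        free(cpumap);
        return rc;
    }

The query side is symmetric: virDomainGetHypervisorPinInfo() with the same maplen fills cpumap and returns 1 when a pinning exists, or 0 when all pCPUs are usable.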

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Introduce 2 APIs for client to use. 1) virDomainPinHypervisorFlags: call remote driver api, such as remoteDomainPinHypervisorFlags. 2) virDomainGetHypervisorPinInfo: call remote driver api, such as remoteDomainGetHypervisorPinInfo.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- include/libvirt/libvirt.h.in | 10 +++ src/driver.h | 13 +++- src/libvirt.c | 147 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 2 + 4 files changed, 171 insertions(+), 1 deletion(-)
Here's where I start to have questions on whether this approach makes sense. We already have a function for doing pinning, so why not add a new flag and reuse the existing function instead of adding a new one? That is, it might be nicer to change virDomainPinVcpuFlags (side note, why did we name it with 'Flags' in the name? In retrospect, that was a dumb naming choice) to have the 'vcpu' argument become signed, with -1 now being a valid value for all hypervisor threads not tied to a vcpu thread. Since the function has extern "C" linkage, it would not be an ABI break (it _would_ be a minor API break, but the only code that would fail to recompile is code that uses a function pointer to virDomainPinVcpuFlags, since most code would just get C's implicit casting rules). That is, instead of adding a new function, why can't: virDomainPinVcpuFlags(dom, -1, map, len, flags) serve as a way to pin all non-vcpu threads? Or if changing from 'unsigned int vcpu' to 'int vcpu' in the pinning function is unpalatable, then we could use: virDomainPinVcpuFlags(dom, 0, map, len, flags | VIR_DOMAIN_PIN_VCPU_HYPERVISOR) And for querying the hypervisor pinning, how about: virDomainGetVcpuPinInfo(dom, 1, map, len, flags | VIR_DOMAIN_GET_VCPU_PIN_HYPERVISOR)
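A self-contained illustration of that API/ABI point (plain C, not libvirt code): ordinary callers compile unchanged when the parameter flips from unsigned int to int, and only function-pointer users notice:

    #include <stdio.h>

    /* was: static int pin(unsigned int vcpu); now signed to allow -1 */
    static int pin(int vcpu) { return vcpu; }

    int main(void)
    {
        printf("%d\n", pin(2));   /* ordinary call: implicit conversion, compiles as before */
        printf("%d\n", pin(-1));  /* the new sentinel is now representable */
        /* int (*old)(unsigned int) = pin;
         *   ^-- the only breakage: incompatible pointer types, so code
         *   taking a function pointer to the old signature fails to compile. */
        return 0;
    }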
+++ b/include/libvirt/libvirt.h.in @@ -1914,6 +1914,16 @@ int virDomainGetVcpuPinInfo (virDomainPtr domain, int maplen, unsigned int flags);
+int virDomainPinHypervisorFlags (virDomainPtr domain, + unsigned char *cpumap, + int maplen, + unsigned int flags);
_If_ we agree that a new API is needed instead of reusing the existing API with better semantics, then at least name it sensibly: virDomainPinHypervisor() by omitting the Flags suffix (no need to repeat the stupid naming mistake of vcpu pinning)
+ +int virDomainGetHypervisorPinInfo (virDomainPtr domain, + unsigned char *cpumaps, + int maplen, + unsigned int flags);
Indentation looks off.
+++ b/src/libvirt.c @@ -8864,6 +8864,153 @@ error: }
/** + * virDomainPinHypervisorFlags: + * @domain: pointer to domain object, or NULL for Domain0 + * @cpumap: pointer to a bit map of real CPUs (in 8-bit bytes) (IN) + * Each bit set to 1 means that corresponding CPU is usable. + * Bytes are stored in little-endian order: CPU0-7, 8-15... + * In each byte, lowest CPU number is least significant bit. + * @maplen: number of bytes in cpumap, from 1 up to size of CPU map in + * underlying virtualization system (Xen...). + * If maplen < size, missing bytes are set to zero. + * If maplen > size, failure code is returned. + * @flags: bitwise-OR of virDomainModificationImpact + * + * Dynamically change the real CPUs which can be allocated to all hypervisor + * threads. This function may require privileged access to the hypervisor.
This terminology of 'all hypervisor threads' feels rather loose; I'm thinking it might be better as 'all hypervisor threads not related to a specific vcpu', to make it clear that this is the catch-all for all remaining threads in the domain.
+/** + * virDomainGetHypervisorPinInfo: + * @domain: pointer to domain object, or NULL for Domain0 + * @cpumap: pointer to a bit map of real CPUs for all hypervisor threads of + * this domain (in 8-bit bytes) (OUT) + * There is only one cpumap for all hypervisor threads. + * Must not be NULL. + * @maplen: the number of bytes in one cpumap, from 1 up to size of CPU map. + * Must be positive. + * @flags: bitwise-OR of virDomainModificationImpact + * Must not be VIR_DOMAIN_AFFECT_LIVE and + * VIR_DOMAIN_AFFECT_CONFIG concurrently. + * + * Query the CPU affinity setting of all hypervisor threads of domain, store + * it in cpumap. + * + * Returns 1 in case of success, + * 0 in case of no hypervisor threads are pined to pcpus,
s/pined/pinned/
+++ b/src/libvirt_public.syms @@ -549,6 +549,8 @@ LIBVIRT_0.10.0 { virDomainGetHostname; virConnectRegisterCloseCallback; virConnectUnregisterCloseCallback; + virDomainPinHypervisorFlags; + virDomainGetHypervisorPinInfo;
I tend to sort these lists, but you're starting with unsorted text. :) I'd like an opinion from anyone else on whether reusing existing API with new flags makes more sense than adding new API. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Aug 06, 2012 at 05:31:23PM -0600, Eric Blake wrote:
On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Introduce 2 APIs for client to use. 1) virDomainPinHypervisorFlags: call remote driver api, such as remoteDomainPinHypervisorFlags. 2) virDomainGetHypervisorPinInfo: call remote driver api, such as remoteDomainGetHypervisorPinInfo.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- include/libvirt/libvirt.h.in | 10 +++ src/driver.h | 13 +++- src/libvirt.c | 147 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 2 + 4 files changed, 171 insertions(+), 1 deletion(-)
Here's where I start to have questions on whether this approach makes sense. We already have a function for doing pinning, so why not add a new flag and reuse the existing function instead of adding a new one? That is, it might be nicer to change virDomainPinVcpuFlags (side note, why did we name it with 'Flags' in the name? In retrospect, that was a dumb naming choice) to have the 'vcpu' argument become signed, with -1 now being a valid value for all hypervisor threads not tied to a vcpu thread. Since the function has extern "C" linkage, it would not be an ABI break (it _would_ be a minor API break, but the only code that would fail to recompile is code that uses a function pointer to virDomainPinVcpuFlags, since most code would just get C's implicit casting rules).
That is, instead of adding a new function, why can't:
virDomainPinVcpuFlags(dom, -1, map, len, flags)
serve as a way to pin all non-vcpu threads? Or if changing from 'unsigned int vcpu' to 'int vcpu' in the pinning function is unpalatable, then we could use:
virDomainPinVcpuFlags(dom, 0, map, len, flags | VIR_DOMAIN_PIN_VCPU_HYPERVISOR)
And for querying the hypervisor pinning, how about:
virDomainGetVcpuPinInfo(dom, 1, map, len, flags | VIR_DOMAIN_GET_VCPU_PIN_HYPERVISOR)
While what you describe could be made to work, I tend to prefer the idea of adding in new APIs specifically for this, and dealing with code duplication inside the drivers instead of at the public API level. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

From: Tang Chen <tangchen@cn.fujitsu.com> Introduce 2 APIs to support hypervisor threads pin in qemu driver. 1) qemudDomainPinHypervisorFlags: setup hypervisor threads pin info. 2) qemudDomainGetHypervisorPinInfo: get all hypervisor threads pin info. They are similar to qemudDomainPinVcpuFlags and qemudDomainGetVcpuPinInfo. And also, remoteDispatchDomainPinHypervisorFlags and remoteDispatchDomainGetHypervisorPinInfo functions are introduced. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- src/qemu/qemu_driver.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 223 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2da13a4..3b1bf2c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3851,6 +3851,227 @@ cleanup: } static int +qemudDomainPinHypervisorFlags(virDomainPtr dom, + unsigned char *cpumap, + int maplen, + unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm; + virCgroupPtr cgroup_dom = NULL; + virCgroupPtr cgroup_hypervisor = NULL; + pid_t pid; + virDomainDefPtr persistentDef = NULL; + int maxcpu, hostcpus; + virNodeInfo nodeinfo; + int ret = -1; + qemuDomainObjPrivatePtr priv; + bool canResetting = true; + int pcpu; + + virCheckFlags(VIR_DOMAIN_AFFECT_LIVE | + VIR_DOMAIN_AFFECT_CONFIG, -1); + + qemuDriverLock(driver); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + qemuDriverUnlock(driver); + + if (!vm) { + char uuidstr[VIR_UUID_STRING_BUFLEN]; + virUUIDFormat(dom->uuid, uuidstr); + virReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + if (virDomainLiveConfigHelperMethod(driver->caps, vm, &flags, + &persistentDef) < 0) + goto cleanup; + + priv = vm->privateData; + + if (nodeGetInfo(dom->conn, &nodeinfo) < 0) + goto cleanup; + hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo); + maxcpu = maplen * 8; + if (maxcpu > hostcpus) + maxcpu = hostcpus; + /* pinning to all physical cpus means resetting, + * so check if we can reset setting. + */ + for (pcpu = 0; pcpu < hostcpus; pcpu++) { + if ((cpumap[pcpu/8] & (1 << (pcpu % 8))) == 0) { + canResetting = false; + break; + } + } + + pid = vm->pid; + + if (flags & VIR_DOMAIN_AFFECT_LIVE) { + + if (priv->vcpupids != NULL) { + if (virDomainHypervisorPinAdd(vm->def, cpumap, maplen) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("failed to update or add hypervisorpin xml " + "of a running domain")); + goto cleanup; + } + + if (qemuCgroupControllerActive(driver, + VIR_CGROUP_CONTROLLER_CPUSET)) { + /* + * Configure the corresponding cpuset cgroup. + * If no cgroup for domain or hypervisor exists, do nothing. 
+ */ + if (virCgroupForDomain(driver->cgroup, vm->def->name, + &cgroup_dom, 0) == 0) { + if (virCgroupForHypervisor(cgroup_dom, &cgroup_hypervisor, 0) == 0) { + if (qemuSetupCgroupHypervisorPin(cgroup_hypervisor, vm->def) < 0) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("failed to set cpuset.cpus in cgroup" + " for hypervisor threads")); + goto cleanup; + } + } + } + } + + if (canResetting) { + if (virDomainHypervisorPinDel(vm->def) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("failed to delete hypervisorpin xml of " + "a running domain")); + goto cleanup; + } + } + + if (virProcessInfoSetAffinity(pid, cpumap, maplen, maxcpu) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to set cpu affinity for " + "hypervisor threads")); + goto cleanup; + } + } else { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("cpu affinity is not supported")); + goto cleanup; + } + + if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0) + goto cleanup; + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + + if (canResetting) { + if (virDomainHypervisorPinDel(persistentDef) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("failed to delete hypervisorpin xml of " + "a persistent domain")); + goto cleanup; + } + } else { + if (virDomainHypervisorPinAdd(persistentDef, cpumap, maplen) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("failed to update or add hypervisorpin xml " + "of a persistent domain")); + goto cleanup; + } + } + + ret = virDomainSaveConfig(driver->configDir, persistentDef); + goto cleanup; + } + + ret = 0; + +cleanup: + if (cgroup_hypervisor) + virCgroupFree(&cgroup_hypervisor); + if (cgroup_dom) + virCgroupFree(&cgroup_dom); + + if (vm) + virDomainObjUnlock(vm); + return ret; +} + +static int +qemudDomainGetHypervisorPinInfo(virDomainPtr dom, + unsigned char *cpumaps, + int maplen, + unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + virNodeInfo nodeinfo; + virDomainDefPtr targetDef = NULL; + int ret = -1; + int maxcpu, hostcpus, pcpu; + virDomainVcpuPinDefPtr hypervisorpin = NULL; + char *cpumask = NULL; + + virCheckFlags(VIR_DOMAIN_AFFECT_LIVE | + VIR_DOMAIN_AFFECT_CONFIG, -1); + + qemuDriverLock(driver); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + qemuDriverUnlock(driver); + + if (!vm) { + char uuidstr[VIR_UUID_STRING_BUFLEN]; + virUUIDFormat(dom->uuid, uuidstr); + virReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + if (virDomainLiveConfigHelperMethod(driver->caps, vm, &flags, + &targetDef) < 0) + goto cleanup; + + if (flags & VIR_DOMAIN_AFFECT_LIVE) + targetDef = vm->def; + + /* Coverity didn't realize that targetDef must be set if we got here. 
*/ + sa_assert(targetDef); + + if (nodeGetInfo(dom->conn, &nodeinfo) < 0) + goto cleanup; + hostcpus = VIR_NODEINFO_MAXCPUS(nodeinfo); + maxcpu = maplen * 8; + if (maxcpu > hostcpus) + maxcpu = hostcpus; + + /* initialize cpumaps */ + memset(cpumaps, 0xff, maplen); + if (maxcpu % 8) { + cpumaps[maplen - 1] &= (1 << maxcpu % 8) - 1; + } + + /* If no hypervisorpin, all cpus should be used */ + hypervisorpin = targetDef->cputune.hypervisorpin; + if (!hypervisorpin) { + ret = 0; + goto cleanup; + } + + cpumask = hypervisorpin->cpumask; + for (pcpu = 0; pcpu < maxcpu; pcpu++) { + if (cpumask[pcpu] == 0) + VIR_UNUSE_CPU(cpumaps, pcpu); + } + + ret = 1; + +cleanup: + if (vm) + virDomainObjUnlock(vm); + return ret; +} + +static int qemudDomainGetVcpus(virDomainPtr dom, virVcpuInfoPtr info, int maxinfo, @@ -13255,6 +13476,8 @@ static virDriver qemuDriver = { .domainPinVcpu = qemudDomainPinVcpu, /* 0.4.4 */ .domainPinVcpuFlags = qemudDomainPinVcpuFlags, /* 0.9.3 */ .domainGetVcpuPinInfo = qemudDomainGetVcpuPinInfo, /* 0.9.3 */ + .domainPinHypervisorFlags = qemudDomainPinHypervisorFlags, /* 0.9.13 */ + .domainGetHypervisorPinInfo = qemudDomainGetHypervisorPinInfo, /* 0.9.13 */ .domainGetVcpus = qemudDomainGetVcpus, /* 0.4.4 */ .domainGetMaxVcpus = qemudDomainGetMaxVcpus, /* 0.4.4 */ .domainGetSecurityLabel = qemudDomainGetSecurityLabel, /* 0.6.1 */ -- 1.7.10.2
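As an aside, the bit arithmetic used above (cpumap[pcpu/8] & (1 << (pcpu % 8))) follows the documented cpumap layout; a standalone demonstration (plain C, not libvirt code):

    #include <stdio.h>

    int main(void)
    {
        unsigned char cpumap[2] = { 0, 0 };
        int pcpus[] = { 0, 3, 9 };
        int i;

        /* byte i covers pCPUs 8*i..8*i+7, with the lowest CPU number in
         * the least significant bit of each byte */
        for (i = 0; i < 3; i++)
            cpumap[pcpus[i] / 8] |= 1 << (pcpus[i] % 8);

        printf("%02x %02x\n", cpumap[0], cpumap[1]);   /* prints: 09 02 */
        return 0;
    }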

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Again, a subject line longer than 80 columns. I suggest the much shorter: qemu: support hypervisor pinning
Introduce 2 APIs to support hypervisor threads pin in qemu driver. 1) qemudDomainPinHypervisorFlags: setup hypervisor threads pin info. 2) qemudDomainGetHypervisorPinInfo: get all hypervisor threads pin info.
Depending on the answer to 9/17, you may not need these functions, so much as tweaking the existing vcpupin functions to take a new flag.
static int +qemudDomainPinHypervisorFlags(virDomainPtr dom, + unsigned char *cpumap, + int maplen, + unsigned int flags) +{
Again, this is a lot of code duplication; refactoring things into common helper routines would make me feel better (in fact, using a flag would be a form of refactoring; the bulk of the code between vcpupin and hypervisor pin is then shared, with the difference being the flag that controls which one to pin).
@@ -13255,6 +13476,8 @@ static virDriver qemuDriver = { .domainPinVcpu = qemudDomainPinVcpu, /* 0.4.4 */ .domainPinVcpuFlags = qemudDomainPinVcpuFlags, /* 0.9.3 */ .domainGetVcpuPinInfo = qemudDomainGetVcpuPinInfo, /* 0.9.3 */ + .domainPinHypervisorFlags = qemudDomainPinHypervisorFlags, /* 0.9.13 */ + .domainGetHypervisorPinInfo = qemudDomainGetHypervisorPinInfo, /* 0.9.13 */
0.10.0 now. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Tang Chen <tangchen@cn.fujitsu.com> Introduce 2 APIs to support hypervisor threads in remote driver. 1) remoteDomainPinHypervisorFlags: call driver api, such as qemudDomainPinHypervisorFlags. 2) remoteDomainGetHypervisorPinInfo: call driver api, such as qemudDomainGetHypervisorPinInfo. They are similar to remoteDomainPinVcpuFlags and remoteDomainGetVcpuPinInfo. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- daemon/remote.c | 103 ++++++++++++++++++++++++++++++++++++++++++ src/remote/remote_driver.c | 102 +++++++++++++++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 23 +++++++++- src/remote_protocol-structs | 24 ++++++++++ 4 files changed, 251 insertions(+), 1 deletion(-) diff --git a/daemon/remote.c b/daemon/remote.c index d25717c..3677193 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1533,6 +1533,109 @@ no_memory: } static int +remoteDispatchDomainPinHypervisorFlags(virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_domain_pin_hypervisor_flags_args *args) +{ + int rv = -1; + virDomainPtr dom = NULL; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + if (!(dom = get_nonnull_domain(priv->conn, args->dom))) + goto cleanup; + + if (virDomainPinHypervisorFlags(dom, + (unsigned char *) args->cpumap.cpumap_val, + args->cpumap.cpumap_len, + args->flags) < 0) + goto cleanup; + + rv = 0; + +cleanup: + if (rv < 0) + virNetMessageSaveError(rerr); + if (dom) + virDomainFree(dom); + return rv; +} + + +static int +remoteDispatchDomainGetHypervisorPinInfo(virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client ATTRIBUTE_UNUSED, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_domain_get_hypervisor_pin_info_args *args, + remote_domain_get_hypervisor_pin_info_ret *ret) +{ + virDomainPtr dom = NULL; + unsigned char *cpumaps = NULL; + int num; + int rv = -1; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + if (!(dom = get_nonnull_domain(priv->conn, args->dom))) + goto cleanup; + + /* There is only one cpumap struct for all hypervisor threads */ + if (args->ncpumaps != 1) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("ncpumaps != 1")); + goto cleanup; + } + + if (INT_MULTIPLY_OVERFLOW(args->ncpumaps, args->maplen) || + args->ncpumaps * args->maplen > REMOTE_CPUMAPS_MAX) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("maxinfo * maplen > REMOTE_CPUMAPS_MAX")); + goto cleanup; + } + + /* Allocate buffers to take the results */ + if (args->maplen > 0 && + VIR_ALLOC_N(cpumaps, args->maplen) < 0) + goto no_memory; + + if ((num = virDomainGetHypervisorPinInfo(dom, + cpumaps, + args->maplen, + args->flags)) < 0) + goto cleanup; + + ret->num = num; + ret->cpumaps.cpumaps_len = args->maplen; + ret->cpumaps.cpumaps_val = (char *) cpumaps; + cpumaps = NULL; + + rv = 0; + +cleanup: + if (rv < 0) + virNetMessageSaveError(rerr); + VIR_FREE(cpumaps); + if (dom) + virDomainFree(dom); + return rv; + +no_memory: + virReportOOMError(); + goto cleanup; +} + +static int remoteDispatchDomainGetVcpus(virNetServerPtr server ATTRIBUTE_UNUSED, virNetServerClientPtr client ATTRIBUTE_UNUSED, virNetMessagePtr 
msg ATTRIBUTE_UNUSED, diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index b9e2127..f003343 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -1841,6 +1841,106 @@ done: } static int +remoteDomainPinHypervisorFlags (virDomainPtr dom, + unsigned char *cpumap, + int cpumaplen, + unsigned int flags) +{ + int rv = -1; + struct private_data *priv = dom->conn->privateData; + remote_domain_pin_hypervisor_flags_args args; + + remoteDriverLock(priv); + + if (cpumaplen > REMOTE_CPUMAP_MAX) { + virReportError(VIR_ERR_RPC, + _("%s length greater than maximum: %d > %d"), + "cpumap", (int)cpumaplen, REMOTE_CPUMAP_MAX); + goto done; + } + + make_nonnull_domain(&args.dom, dom); + args.vcpu = -1; + args.cpumap.cpumap_val = (char *)cpumap; + args.cpumap.cpumap_len = cpumaplen; + args.flags = flags; + + if (call(dom->conn, priv, 0, REMOTE_PROC_DOMAIN_PIN_HYPERVISOR_FLAGS, + (xdrproc_t) xdr_remote_domain_pin_hypervisor_flags_args, + (char *) &args, + (xdrproc_t) xdr_void, (char *) NULL) == -1) { + goto done; + } + + rv = 0; + +done: + remoteDriverUnlock(priv); + return rv; +} + + +static int +remoteDomainGetHypervisorPinInfo (virDomainPtr domain, + unsigned char *cpumaps, + int maplen, + unsigned int flags) +{ + int rv = -1; + int i; + remote_domain_get_hypervisor_pin_info_args args; + remote_domain_get_hypervisor_pin_info_ret ret; + struct private_data *priv = domain->conn->privateData; + + remoteDriverLock(priv); + + /* There is only one cpumap for all hypervisor threads */ + if (INT_MULTIPLY_OVERFLOW(1, maplen) || + maplen > REMOTE_CPUMAPS_MAX) { + virReportError(VIR_ERR_RPC, + _("vCPU map buffer length exceeds maximum: %d > %d"), + maplen, REMOTE_CPUMAPS_MAX); + goto done; + } + + make_nonnull_domain(&args.dom, domain); + args.ncpumaps = 1; + args.maplen = maplen; + args.flags = flags; + + memset(&ret, 0, sizeof(ret)); + + if (call (domain->conn, priv, 0, REMOTE_PROC_DOMAIN_GET_HYPERVISOR_PIN_INFO, + (xdrproc_t) xdr_remote_domain_get_hypervisor_pin_info_args, + (char *) &args, + (xdrproc_t) xdr_remote_domain_get_hypervisor_pin_info_ret, + (char *) &ret) == -1) + goto done; + + if (ret.cpumaps.cpumaps_len > maplen) { + virReportError(VIR_ERR_RPC, + _("host reports map buffer length exceeds maximum: %d > %d"), + ret.cpumaps.cpumaps_len, maplen); + goto cleanup; + } + + memset(cpumaps, 0, maplen); + + for (i = 0; i < ret.cpumaps.cpumaps_len; ++i) + cpumaps[i] = ret.cpumaps.cpumaps_val[i]; + + rv = ret.num; + +cleanup: + xdr_free ((xdrproc_t) xdr_remote_domain_get_hypervisor_pin_info_ret, + (char *) &ret); + +done: + remoteDriverUnlock(priv); + return rv; +} + +static int remoteDomainGetVcpus (virDomainPtr domain, virVcpuInfoPtr info, int maxinfo, @@ -5254,6 +5354,8 @@ static virDriver remote_driver = { .domainPinVcpu = remoteDomainPinVcpu, /* 0.3.0 */ .domainPinVcpuFlags = remoteDomainPinVcpuFlags, /* 0.9.3 */ .domainGetVcpuPinInfo = remoteDomainGetVcpuPinInfo, /* 0.9.3 */ + .domainPinHypervisorFlags = remoteDomainPinHypervisorFlags, /* 0.9.13 */ + .domainGetHypervisorPinInfo = remoteDomainGetHypervisorPinInfo, /* 0.9.13 */ .domainGetVcpus = remoteDomainGetVcpus, /* 0.3.0 */ .domainGetMaxVcpus = remoteDomainGetMaxVcpus, /* 0.3.0 */ .domainGetSecurityLabel = remoteDomainGetSecurityLabel, /* 0.6.1 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 200fe75..8d5d991 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -1054,6 +1054,25 @@ struct remote_domain_get_vcpu_pin_info_ret { int num; }; +struct 
remote_domain_pin_hypervisor_flags_args { + remote_nonnull_domain dom; + unsigned int vcpu; + opaque cpumap<REMOTE_CPUMAP_MAX>; /* (unsigned char *) */ + unsigned int flags; +}; + +struct remote_domain_get_hypervisor_pin_info_args { + remote_nonnull_domain dom; + int ncpumaps; + int maplen; + unsigned int flags; +}; + +struct remote_domain_get_hypervisor_pin_info_ret { + opaque cpumaps<REMOTE_CPUMAPS_MAX>; + int num; +}; + struct remote_domain_get_vcpus_args { remote_nonnull_domain dom; int maxinfo; @@ -2854,7 +2873,9 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_LIST_ALL_SNAPSHOTS = 274, /* skipgen skipgen priority:high */ REMOTE_PROC_DOMAIN_SNAPSHOT_LIST_ALL_CHILDREN = 275, /* skipgen skipgen priority:high */ REMOTE_PROC_DOMAIN_EVENT_BALLOON_CHANGE = 276, /* autogen autogen */ - REMOTE_PROC_DOMAIN_GET_HOSTNAME = 277 /* autogen autogen */ + REMOTE_PROC_DOMAIN_GET_HOSTNAME = 277, /* autogen autogen */ + REMOTE_PROC_DOMAIN_PIN_HYPERVISOR_FLAGS = 278, /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_GET_HYPERVISOR_PIN_INFO = 279 /* skipgen skipgen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 8d09138..b9f5260 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -718,6 +718,28 @@ struct remote_domain_get_vcpu_pin_info_ret { } cpumaps; int num; }; +struct remote_domain_pin_hypervisor_flags_args { + remote_nonnull_domain dom; + u_int vcpu; + struct { + u_int cpumap_len; + char * cpumap_val; + } cpumap; + u_int flags; +}; +struct remote_domain_get_hypervisor_pin_info_args { + remote_nonnull_domain dom; + int ncpumaps; + int maplen; + u_int flags; +}; +struct remote_domain_get_hypervisor_pin_info_ret { + struct { + u_int cpumaps_len; + char * cpumaps_val; + } cpumaps; + int num; +}; struct remote_domain_get_vcpus_args { remote_nonnull_domain dom; int maxinfo; @@ -2259,4 +2281,6 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_SNAPSHOT_LIST_ALL_CHILDREN = 275, REMOTE_PROC_DOMAIN_EVENT_BALLOON_CHANGE = 276, REMOTE_PROC_DOMAIN_GET_HOSTNAME = 277, + REMOTE_PROC_DOMAIN_PIN_HYPERVISOR_FLAGS = 278, + REMOTE_PROC_DOMAIN_GET_HYPERVISOR_PIN_INFO = 279, }; -- 1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Again, a long subject line. Given Dan's rename suggestion, I propose: remote: introduce emulator pinning RPCs
static int +remoteDispatchDomainPinHypervisorFlags(virNetServerPtr server ATTRIBUTE_UNUSED,
No need for the 'Flags' suffix in the name.
+ +static int +remoteDispatchDomainGetHypervisorPinInfo(virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client ATTRIBUTE_UNUSED, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_domain_get_hypervisor_pin_info_args *args, + remote_domain_get_hypervisor_pin_info_ret *ret) +{ + virDomainPtr dom = NULL; + unsigned char *cpumaps = NULL; + int num; + int rv = -1; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + if (!(dom = get_nonnull_domain(priv->conn, args->dom))) + goto cleanup; + + /* There is only one cpumap struct for all hypervisor threads */ + if (args->ncpumaps != 1) {
If that's true, then we don't need args->ncpumaps in the first place. That's a tweak to make in the .x file.
+++ b/src/remote/remote_driver.c @@ -1841,6 +1841,106 @@ done: }
static int +remoteDomainPinHypervisorFlags (virDomainPtr dom, + unsigned char *cpumap, + int cpumaplen, + unsigned int flags) +{ + int rv = -1; + struct private_data *priv = dom->conn->privateData; + remote_domain_pin_hypervisor_flags_args args; + + remoteDriverLock(priv); + + if (cpumaplen > REMOTE_CPUMAP_MAX) { + virReportError(VIR_ERR_RPC, + _("%s length greater than maximum: %d > %d"), + "cpumap", (int)cpumaplen, REMOTE_CPUMAP_MAX);
Why are we casting cpumaplen, which is already an int?
+static int +remoteDomainGetHypervisorPinInfo (virDomainPtr domain, + unsigned char *cpumaps, + int maplen, + unsigned int flags) +{ + int rv = -1; + int i; + remote_domain_get_hypervisor_pin_info_args args; + remote_domain_get_hypervisor_pin_info_ret ret; + struct private_data *priv = domain->conn->privateData; + + remoteDriverLock(priv); + + /* There is only one cpumap for all hypervisor threads */ + if (INT_MULTIPLY_OVERFLOW(1, maplen) ||
This expression is always false ('1 * anything' cannot overflow), and therefore dead code worth simplifying.
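With ncpumaps fixed at 1, the whole thing reduces to a plain bounds test; a sketch of the simplified check, reusing the error message already in the patch:

    if (maplen > REMOTE_CPUMAPS_MAX) {
        virReportError(VIR_ERR_RPC,
                       _("vCPU map buffer length exceeds maximum: %d > %d"),
                       maplen, REMOTE_CPUMAPS_MAX);
        goto done;
    }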
@@ -5254,6 +5354,8 @@ static virDriver remote_driver = { .domainPinVcpu = remoteDomainPinVcpu, /* 0.3.0 */ .domainPinVcpuFlags = remoteDomainPinVcpuFlags, /* 0.9.3 */ .domainGetVcpuPinInfo = remoteDomainGetVcpuPinInfo, /* 0.9.3 */ + .domainPinHypervisorFlags = remoteDomainPinHypervisorFlags, /* 0.9.13 */ + .domainGetHypervisorPinInfo = remoteDomainGetHypervisorPinInfo, /* 0.9.13 */
0.10.0
+++ b/src/remote/remote_protocol.x @@ -1054,6 +1054,25 @@ struct remote_domain_get_vcpu_pin_info_ret { int num; };
+struct remote_domain_pin_hypervisor_flags_args {
Again, no need for '_flags' in the name.
+ remote_nonnull_domain dom; + unsigned int vcpu;
No need for a wasted 'vcpu' arg.
+ opaque cpumap<REMOTE_CPUMAP_MAX>; /* (unsigned char *) */ + unsigned int flags; +}; + +struct remote_domain_get_hypervisor_pin_info_args { + remote_nonnull_domain dom; + int ncpumaps;
No need for an 'ncpumaps' arg.
+ int maplen; + unsigned int flags; +}; + +struct remote_domain_get_hypervisor_pin_info_ret { + opaque cpumaps<REMOTE_CPUMAPS_MAX>; + int num;
'num' is a bit misleading for a name, and probably not necessary. The return value of virDomainGetHypervisorPinInfo is either 0 (no pinning present) or 1 (pinning present and details returned) (or negative for error, but that doesn't go through this RPC struct). If RPC allows for 0-length cpumaps<>, then you don't need num; otherwise, I'd rename num to 'ret', since it is recording the return value of 0 or 1. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Tang Chen <tangchen@cn.fujitsu.com> Modify vcpupin command to support hypervisor threads pin. 1) add "--hypervisor" option to get hypervisor threads info. 2) add "--hypervisor cpuset" to set hypervisor threads to specified cpuset. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- tests/vcpupin | 6 +-- tools/virsh-domain.c | 147 +++++++++++++++++++++++++++++++++----------------- tools/virsh.pod | 12 +++-- 3 files changed, 109 insertions(+), 56 deletions(-) diff --git a/tests/vcpupin b/tests/vcpupin index f1fb038..e55806e 100755 --- a/tests/vcpupin +++ b/tests/vcpupin @@ -31,16 +31,16 @@ fi fail=0 # Invalid syntax. -$abs_top_builddir/tools/virsh --connect test:///default vcpupin test a 0,1 > out 2>&1 +$abs_top_builddir/tools/virsh --connect test:///default vcpupin test a --vcpu 0,1 > out 2>&1 test $? = 1 || fail=1 cat <<\EOF > exp || fail=1 -error: vcpupin: Invalid or missing vCPU number. +error: vcpupin: Invalid or missing vCPU number, or missing --hypervisor option. EOF compare exp out || fail=1 # An out-of-range vCPU number deserves a diagnostic, too. -$abs_top_builddir/tools/virsh --connect test:///default vcpupin test 100 0,1 > out 2>&1 +$abs_top_builddir/tools/virsh --connect test:///default vcpupin test --vcpu 100 0,1 > out 2>&1 test $? = 1 || fail=1 cat <<\EOF > exp || fail=1 error: vcpupin: Invalid vCPU number. diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 33b1727..74603e8 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -4373,14 +4373,15 @@ cmdVcpuinfo(vshControl *ctl, const vshCmd *cmd) * "vcpupin" command */ static const vshCmdInfo info_vcpupin[] = { - {"help", N_("control or query domain vcpu affinity")}, - {"desc", N_("Pin domain VCPUs to host physical CPUs.")}, + {"help", N_("control or query domain vcpu or hypervisor threads affinities")}, + {"desc", N_("Pin domain VCPUs or hypervisor threads to host physical CPUs.")}, {NULL, NULL} }; static const vshCmdOptDef opts_vcpupin[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, - {"vcpu", VSH_OT_INT, 0, N_("vcpu number")}, + {"vcpu", VSH_OT_INT, VSH_OFLAG_REQ_OPT, N_("vcpu number")}, + {"hypervisor", VSH_OT_BOOL, VSH_OFLAG_REQ_OPT, N_("pin hypervisor threads")}, {"cpulist", VSH_OT_DATA, VSH_OFLAG_EMPTY_OK, N_("host cpu number(s) to set, or omit option to query")}, {"config", VSH_OT_BOOL, 0, N_("affect next boot")}, @@ -4389,6 +4390,45 @@ static const vshCmdOptDef opts_vcpupin[] = { {NULL, 0, 0, NULL} }; +/* + * Helper function to print vcpupin and hypervisorpin info. 
+ */ +static bool +printPinInfo(unsigned char *cpumaps, size_t cpumaplen, + int maxcpu, int vcpuindex) +{ + int cpu, lastcpu; + bool bit, lastbit, isInvert; + + if (!cpumaps || cpumaplen <= 0 || maxcpu <= 0 || vcpuindex < 0) { + return false; + } + + bit = lastbit = isInvert = false; + lastcpu = -1; + + for (cpu = 0; cpu < maxcpu; cpu++) { + bit = VIR_CPU_USABLE(cpumaps, cpumaplen, vcpuindex, cpu); + + isInvert = (bit ^ lastbit); + if (bit && isInvert) { + if (lastcpu == -1) + vshPrint(ctl, "%d", cpu); + else + vshPrint(ctl, ",%d", cpu); + lastcpu = cpu; + } + if (!bit && isInvert && lastcpu != cpu - 1) + vshPrint(ctl, "-%d", cpu - 1); + lastbit = bit; + } + if (bit && !isInvert) { + vshPrint(ctl, "-%d", maxcpu - 1); + } + + return true; +} + static bool cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) { @@ -4401,13 +4441,13 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) unsigned char *cpumap = NULL; unsigned char *cpumaps = NULL; size_t cpumaplen; - bool bit, lastbit, isInvert; - int i, cpu, lastcpu, maxcpu, ncpus; + int i, cpu, lastcpu, maxcpu, ncpus, nhyper; bool unuse = false; const char *cur; bool config = vshCommandOptBool(cmd, "config"); bool live = vshCommandOptBool(cmd, "live"); bool current = vshCommandOptBool(cmd, "current"); + bool hypervisor = vshCommandOptBool(cmd, "hypervisor"); bool query = false; /* Query mode if no cpulist */ unsigned int flags = 0; @@ -4442,8 +4482,18 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) /* In query mode, "vcpu" is optional */ if (vshCommandOptInt(cmd, "vcpu", &vcpu) < !query) { - vshError(ctl, "%s", - _("vcpupin: Invalid or missing vCPU number.")); + if (!hypervisor) { + vshError(ctl, "%s", + _("vcpupin: Invalid or missing vCPU number, " + "or missing --hypervisor option.")); + virDomainFree(dom); + return false; + } + } + + if (hypervisor && vcpu != -1) { + vshError(ctl, "%s", _("vcpupin: --hypervisor must be specified " + "exclusively to --vcpu.")); virDomainFree(dom); return false; } @@ -4475,47 +4525,45 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) if (flags == -1) flags = VIR_DOMAIN_AFFECT_CURRENT; - cpumaps = vshMalloc(ctl, info.nrVirtCpu * cpumaplen); - if ((ncpus = virDomainGetVcpuPinInfo(dom, info.nrVirtCpu, - cpumaps, cpumaplen, flags)) >= 0) { - - vshPrint(ctl, "%s %s\n", _("VCPU:"), _("CPU Affinity")); - vshPrint(ctl, "----------------------------------\n"); - for (i = 0; i < ncpus; i++) { - - if (vcpu != -1 && i != vcpu) - continue; - - bit = lastbit = isInvert = false; - lastcpu = -1; - - vshPrint(ctl, "%4d: ", i); - for (cpu = 0; cpu < maxcpu; cpu++) { - - bit = VIR_CPU_USABLE(cpumaps, cpumaplen, i, cpu); - - isInvert = (bit ^ lastbit); - if (bit && isInvert) { - if (lastcpu == -1) - vshPrint(ctl, "%d", cpu); - else - vshPrint(ctl, ",%d", cpu); - lastcpu = cpu; - } - if (!bit && isInvert && lastcpu != cpu - 1) - vshPrint(ctl, "-%d", cpu - 1); - lastbit = bit; - } - if (bit && !isInvert) { - vshPrint(ctl, "-%d", maxcpu - 1); - } - vshPrint(ctl, "\n"); + if (!hypervisor) { + cpumaps = vshMalloc(ctl, info.nrVirtCpu * cpumaplen); + if ((ncpus = virDomainGetVcpuPinInfo(dom, info.nrVirtCpu, + cpumaps, cpumaplen, flags)) >= 0) { + vshPrint(ctl, "%s %s\n", _("VCPU:"), _("CPU Affinity")); + vshPrint(ctl, "----------------------------------\n"); + for (i = 0; i < ncpus; i++) { + if (vcpu != -1 && i != vcpu) + continue; + + vshPrint(ctl, "%4d: ", i); + ret = printPinInfo(cpumaps, cpumaplen, maxcpu, i); + vshPrint(ctl, "\n"); + if (!ret) + break; + } + } else { + ret = false; } + VIR_FREE(cpumaps); + } - } else { - ret = false; + if 
(vcpu == -1) { + cpumaps = vshMalloc(ctl, cpumaplen); + if ((nhyper = virDomainGetHypervisorPinInfo(dom, cpumaps, + cpumaplen, flags)) >= 0) { + if (!hypervisor) + vshPrint(ctl, "\n"); + vshPrint(ctl, "%s %s\n", _("Hypervisor:"), _("CPU Affinity")); + vshPrint(ctl, "----------------------------------\n"); + + vshPrint(ctl, " *: "); + ret = printPinInfo(cpumaps, cpumaplen, maxcpu, 0); + vshPrint(ctl, "\n"); + } else if (nhyper < 0) { + ret = false; + } + VIR_FREE(cpumaps); } - VIR_FREE(cpumaps); goto cleanup; } @@ -4593,13 +4641,14 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) } if (flags == -1) { - if (virDomainPinVcpu(dom, vcpu, cpumap, cpumaplen) != 0) { + flags = VIR_DOMAIN_AFFECT_LIVE; + } + if (!hypervisor) { + if (virDomainPinVcpuFlags(dom, vcpu, cpumap, cpumaplen, flags) != 0) ret = false; - } } else { - if (virDomainPinVcpuFlags(dom, vcpu, cpumap, cpumaplen, flags) != 0) { + if (virDomainPinHypervisorFlags(dom, cpumap, cpumaplen, flags) != 0) ret = false; - } } cleanup: diff --git a/tools/virsh.pod b/tools/virsh.pod index 35613c4..ffc0777 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -1592,12 +1592,16 @@ Thus, this command always takes exactly zero or two flags. Returns basic information about the domain virtual CPUs, like the number of vCPUs, the running time, the affinity to physical processors. -=item B<vcpupin> I<domain> [I<vcpu>] [I<cpulist>] [[I<--live>] -[I<--config>] | [I<--current>]] +=item B<vcpupin> I<domain-id> [I<vcpu>] [I<hypervisor>] [I<cpulist>] +[[I<--live>] [I<--config>] | [I<--current>]] -Query or change the pinning of domain VCPUs to host physical CPUs. To -pin a single I<vcpu>, specify I<cpulist>; otherwise, you can query one +Query or change the pinning of domain VCPUs or hypervisor threads to host physical CPUs. +To pin a single I<vcpu>, specify I<cpulist>; otherwise, you can query one I<vcpu> or omit I<vcpu> to list all at once. +To pin all I<hypervisor> threads, specify I<cpulist>; otherwise, you can +query I<hypervisor>. +You can also omit I<vcpu> or I<hypervisor> to list vcpus and hypervisor threads +all at once. I<cpulist> is a list of physical CPU numbers. Its syntax is a comma separated list and a special markup using '-' and '^' (ex. '0-4', '0-3,^2') can -- 1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Tang Chen <tangchen@cn.fujitsu.com>
Modify vcpupin command to support hypervisor threads pin. 1) add "--hypervisor" option to get hypervisor threads info. 2) add "--hypervisor cpuset" to set hypervisor threads to specified cpuset.
Dan's comment of s/hypervisor/emulator/ applies here as well.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> --- tests/vcpupin | 6 +-- tools/virsh-domain.c | 147 +++++++++++++++++++++++++++++++++----------------- tools/virsh.pod | 12 +++-- 3 files changed, 109 insertions(+), 56 deletions(-)
diff --git a/tests/vcpupin b/tests/vcpupin index f1fb038..e55806e 100755 --- a/tests/vcpupin +++ b/tests/vcpupin @@ -31,16 +31,16 @@ fi fail=0
# Invalid syntax. -$abs_top_builddir/tools/virsh --connect test:///default vcpupin test a 0,1 > out 2>&1 +$abs_top_builddir/tools/virsh --connect test:///default vcpupin test a --vcpu 0,1 > out 2>&1
Why did you have to change this example invocation? If you broke back-compat with existing clients, then we need to rethink how to add the new functionality into virsh. Since this was a negative test (where the command used to fail), if the command now succeeds where it used to fail, and adding '--vcpu' preserves the negative test aspect, I could understand it; but that doesn't appear to be the case here.
test $? = 1 || fail=1 cat <<\EOF > exp || fail=1 -error: vcpupin: Invalid or missing vCPU number. +error: vcpupin: Invalid or missing vCPU number, or missing --hypervisor option.
Changing an error message is okay, though.
EOF compare exp out || fail=1
# An out-of-range vCPU number deserves a diagnostic, too. -$abs_top_builddir/tools/virsh --connect test:///default vcpupin test 100 0,1 > out 2>&1 +$abs_top_builddir/tools/virsh --connect test:///default vcpupin test --vcpu 100 0,1 > out 2>&1
Again, you shouldn't be needing to modify this test.
test $? = 1 || fail=1 cat <<\EOF > exp || fail=1 error: vcpupin: Invalid vCPU number. diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 33b1727..74603e8 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -4373,14 +4373,15 @@ cmdVcpuinfo(vshControl *ctl, const vshCmd *cmd) * "vcpupin" command */ static const vshCmdInfo info_vcpupin[] = { - {"help", N_("control or query domain vcpu affinity")}, - {"desc", N_("Pin domain VCPUs to host physical CPUs.")}, + {"help", N_("control or query domain vcpu or hypervisor threads affinities")}, + {"desc", N_("Pin domain VCPUs or hypervisor threads to host physical CPUs.")}, {NULL, NULL} };
static const vshCmdOptDef opts_vcpupin[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, - {"vcpu", VSH_OT_INT, 0, N_("vcpu number")}, + {"vcpu", VSH_OT_INT, VSH_OFLAG_REQ_OPT, N_("vcpu number")},
This is the broken change - you can't require existing users to start typing --vcpu.
+ {"hypervisor", VSH_OT_BOOL, VSH_OFLAG_REQ_OPT, N_("pin hypervisor threads")},
bool options cannot require an option, so using VSH_OFLAG_REQ_OPT is wrong here (hmm, virsh.c's self-check validation should probably be erroring out on this combination). I think what you want to end up with is: vcpupin dom => query ALL pinnings vcpupin dom --emulator => query just the emulator pinning vcpupin dom [--vcpu] number => query just vcpuN pinning vcpupin dom --emulator cpulist => set the emulator pinning to cpulist vcpupin dom [--vcpu] number cpulist => set the vcpuN pinning to cpulist Or in virsh.pod syntax: vcpupin <domain> [ { [--vcpu] <number> | <--emulator> } [<cpulist>] ] [--config] [--live] [--current] such that either --emulator or an integer but not both must be present if cpulist is also present; and that an integer can optionally be prefixed with --vcpu. There's no way to write that directly in opts_vcpupin, so it requires listing both options as optional, then manual checks in the function body that exactly one of the two options was present. Hmm, I see a potential gotcha: the parser would treat: vcpupin dom --emulator 0 as assigning '0' to --vcpu, instead of to --cpulist, so users of --emulator would always have to spell out: vcpupin dom --emulator --cpulist 0 but since that's a new command, at least we wouldn't be breaking back-compat. Maybe a solution would be to take --vcpu -1 as the special case of requesting the emulator pinning, instead of adding a new flag --emulator. That makes sense if we do the same overloading at the libvirt API level. Or maybe the solution is to add a new command 'virsh emulatorpin' for emulator pinning, instead of trying to overload 'virsh vcpupin', especially, since you have proposed separate libvirt APIs. That does mean you can't list all pinnings in one command, but that's not too tough of a restriction. I'm still thinking about the best way to resolve this UI issue, and I think part of the answer is determined by the API answer in 9/17.
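For reference, a sketch of the manual exclusivity checks described above, reusing names from the posted patch (the option could end up spelled --hypervisor or --emulator; cpulist stands for the already-parsed cpulist argument):

    bool hypervisor = vshCommandOptBool(cmd, "hypervisor");
    int vcpu = -1;
    bool have_vcpu = vshCommandOptInt(cmd, "vcpu", &vcpu) > 0;

    if (hypervisor && have_vcpu) {
        vshError(ctl, "%s",
                 _("--hypervisor and --vcpu cannot both be specified"));
        return false;
    }
    if (cpulist && !hypervisor && !have_vcpu) {
        vshError(ctl, "%s",
                 _("one of --hypervisor or --vcpu is required when "
                   "setting a pinning"));
        return false;
    }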
+/* + * Helper function to print vcpupin and hypervisorpin info. + */ +static bool +printPinInfo(unsigned char *cpumaps, size_t cpumaplen, + int maxcpu, int vcpuindex)
We should probably name this vshPrintPinInfo, to keep our helper functions in the virsh namespace. For patch review purposes, it would help to split this into two patches - one that refactors code to introduce this new function and call it from the old location but has no semantic changes, and then one to add the new options and/or command and becomes a second client of the helper function.
static bool cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) { @@ -4401,13 +4441,13 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) unsigned char *cpumap = NULL; unsigned char *cpumaps = NULL; size_t cpumaplen; - bool bit, lastbit, isInvert; - int i, cpu, lastcpu, maxcpu, ncpus; + int i, cpu, lastcpu, maxcpu, ncpus, nhyper;
Why nhyper? There is only one emulator pinning value.
+ if (hypervisor && vcpu != -1) { + vshError(ctl, "%s", _("vcpupin: --hypervisor must be specified " + "exclusively to --vcpu."));
reads better as: --hypervisor and --vcpu cannot both be specified
+ if (vcpu == -1) { + cpumaps = vshMalloc(ctl, cpumaplen); + if ((nhyper = virDomainGetHypervisorPinInfo(dom, cpumaps, + cpumaplen, flags)) >= 0) {
Why >= 0? Either it is 0 (no pinning known) or 1 (exactly one pinning known).
+++ b/tools/virsh.pod @@ -1592,12 +1592,16 @@ Thus, this command always takes exactly zero or two flags. Returns basic information about the domain virtual CPUs, like the number of vCPUs, the running time, the affinity to physical processors.
-=item B<vcpupin> I<domain> [I<vcpu>] [I<cpulist>] [[I<--live>] -[I<--config>] | [I<--current>]] +=item B<vcpupin> I<domain-id> [I<vcpu>] [I<hypervisor>] [I<cpulist>]
You are introducing a regression on the spelling of 'I<domain>'. --hypervisor is a bool, so it should be listed as <--hypervisor>. See my above attempt at describing this virsh.pod layout using {} to indicate mutual exclusion, if we decide to overload this command rather than adding a new one. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This patch changes the behaviour of xml element cputune.period and cputune.quota to limit cpu bandwidth only for vcpus, and no longer limit cpu bandwidth for the whole guest. The reasons to do this are: - This matches docs of cputune.period and cputune.quota. - The other parts excepting vcpus are treated as "hypervisor", and there are seperate period/quota settings for hypervisor in the subsequent patches --- src/qemu/qemu_cgroup.c | 33 ++++++------------------ src/qemu/qemu_driver.c | 65 ------------------------------------------------ 2 files changed, 8 insertions(+), 90 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 247f987..4d97665 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -556,11 +556,16 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm) unsigned int i; unsigned long long period = vm->def->cputune.period; long long quota = vm->def->cputune.quota; - long long vm_quota = 0; if (driver->cgroup == NULL) return 0; /* Not supported, so claim success */ + if (!qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPU)) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("cgroup cpu is not active")); + return -1; + } + rc = virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0); if (rc != 0) { virReportSystemError(-rc, @@ -569,25 +574,6 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm) goto cleanup; } - /* Set cpu bandwidth for the vm */ - if (period || quota) { - if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPU)) { - /* Ensure that we can multiply by vcpus without overflowing. */ - if (quota > LLONG_MAX / vm->def->vcpus) { - virReportSystemError(EINVAL, "%s", - _("Unable to set cpu bandwidth quota")); - goto cleanup; - } - - if (quota > 0) - vm_quota = quota * vm->def->vcpus; - else - vm_quota = quota; - if (qemuSetupCgroupVcpuBW(cgroup, period, vm_quota) < 0) - goto cleanup; - } - } - if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) { /* If we does not know VCPU<->PID mapping or all vcpus run in the same * thread, we cannot control each vcpu. @@ -617,10 +603,8 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm) } if (period || quota) { - if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPU)) { - if (qemuSetupCgroupVcpuBW(cgroup_vcpu, period, quota) < 0) - goto cleanup; - } + if (qemuSetupCgroupVcpuBW(cgroup_vcpu, period, quota) < 0) + goto cleanup; } /* Set vcpupin in cgroup if vcpupin xml is provided */ @@ -632,7 +616,6 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm) virCgroupFree(&cgroup_vcpu); } - virCgroupFree(&cgroup_vcpu); virCgroupFree(&cgroup); return 0; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3b1bf2c..7d1d093 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7386,69 +7386,10 @@ qemuSetVcpusBWLive(virDomainObjPtr vm, virCgroupPtr cgroup, qemuDomainObjPrivatePtr priv = vm->privateData; virCgroupPtr cgroup_vcpu = NULL; int rc; - long long vm_quota = 0; - long long old_quota = 0; - unsigned long long old_period = 0; if (period == 0 && quota == 0) return 0; - /* Ensure that we can multiply by vcpus without overflowing. 
*/ - if (quota > LLONG_MAX / vm->def->vcpus) { - virReportSystemError(EINVAL, "%s", - _("Unable to set cpu bandwidth quota")); - goto cleanup; - } - - if (quota > 0) - vm_quota = quota * vm->def->vcpus; - else - vm_quota = quota; - - rc = virCgroupGetCpuCfsQuota(cgroup, &old_quota); - if (rc < 0) { - virReportSystemError(-rc, "%s", - _("unable to get cpu bandwidth tunable")); - goto cleanup; - } - - rc = virCgroupGetCpuCfsPeriod(cgroup, &old_period); - if (rc < 0) { - virReportSystemError(-rc, "%s", - _("unable to get cpu bandwidth period tunable")); - goto cleanup; - } - - /* - * If quota will be changed to a small value, we should modify vcpu's quota - * first. Otherwise, we should modify vm's quota first. - * - * If period will be changed to a small value, we should modify vm's period - * first. Otherwise, we should modify vcpu's period first. - * - * If both quota and period will be changed to a big/small value, we cannot - * modify period and quota together. - */ - if ((quota != 0) && (period != 0)) { - if (((quota > old_quota) && (period > old_period)) || - ((quota < old_quota) && (period < old_period))) { - /* modify period */ - if (qemuSetVcpusBWLive(vm, cgroup, period, 0) < 0) - goto cleanup; - - /* modify quota */ - if (qemuSetVcpusBWLive(vm, cgroup, 0, quota) < 0) - goto cleanup; - return 0; - } - } - - if (((vm_quota != 0) && (vm_quota > old_quota)) || - ((period != 0) && (period < old_period))) - /* Set cpu bandwidth for the vm */ - if (qemuSetupCgroupVcpuBW(cgroup, period, vm_quota) < 0) - goto cleanup; - /* If we does not know VCPU<->PID mapping or all vcpu runs in the same * thread, we cannot control each vcpu. So we only modify cpu bandwidth * when each vcpu has a separated thread. @@ -7471,12 +7412,6 @@ qemuSetVcpusBWLive(virDomainObjPtr vm, virCgroupPtr cgroup, } } - if (((vm_quota != 0) && (vm_quota <= old_quota)) || - ((period != 0) && (period >= old_period))) - /* Set cpu bandwidth for the vm */ - if (qemuSetupCgroupVcpuBW(cgroup, period, vm_quota) < 0) - goto cleanup; - return 0; cleanup: -- 1.7.10.2
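A worked example of the behavioural change: with 4 vcpus and <quota>50000</quota>, the old code capped the domain-level cgroup at 4 * 50000 = 200000 microseconds per period, so vcpu and emulator threads were throttled together; after this patch each per-vcpu cgroup is capped at 50000 and the domain-level group is left alone, so the emulator threads no longer count against the quota (they get their own hypervisor_period/hypervisor_quota knobs later in this series).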

On 08/03/2012 12:36 AM, Hu Tao wrote:
This patch changes the behaviour of xml element cputune.period and cputune.quota to limit cpu bandwidth only for vcpus, and no longer limit cpu bandwidth for the whole guest.
The reasons to do this are:
- This matches docs of cputune.period and cputune.quota. - The other parts excepting vcpus are treated as "hypervisor", and there are seperate period/quota settings for hypervisor
s/hypervisor/emulator/ throughout, s/seperate/separate/

Is it worth being able to put a separate quota on the entire domain (emulator+vcpus), independently of the smaller quotas on a per-vcpu or per-emulator basis? While I do think this is an appropriate fix (normally, what we are trying to rate limit is the guest itself, and that means running the emulator as fast as possible to let the guest keep up), I do worry that someone who was previously using this setting to limit the entire domain would now be oversubscribed, unless we also preserve a way to limit the entire domain. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Wen Congyang <wency@cn.fujitsu.com> --- docs/schemas/domaincommon.rng | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index c0b83b3..8729c02 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -566,6 +566,16 @@ <ref name="cpuquota"/> </element> </optional> + <optional> + <element name="hypervisor_period"> + <ref name="cpuperiod"/> + </element> + </optional> + <optional> + <element name="hypervisor_quota"> + <ref name="cpuquota"/> + </element> + </optional> <zeroOrMore> <element name="vcpupin"> <attribute name="vcpu"> -- 1.7.10.2

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
--- docs/schemas/domaincommon.rng | 10 ++++++++++ 1 file changed, 10 insertions(+)
Missing counterpart documentation in docs/formatdomain.html.in, and a corresponding test case somewhere in tests. I would consider squashing this in with 15/17 - the new schema doesn't do us any good without also being able to parse it. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Wen Congyang <wency@cn.fujitsu.com> Set the hypervisor's period and quota to limit cpu bandwidth when the vm starts. --- src/conf/domain_conf.c | 25 +++++++++++++++++++++++-- src/conf/domain_conf.h | 2 ++ src/qemu/qemu_cgroup.c | 11 ++++++++++- 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 98ea335..fd437a9 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -8235,6 +8235,14 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps, &def->cputune.quota) < 0) def->cputune.quota = 0; + if (virXPathULongLong("string(./cputune/hypervisor_period[1])", ctxt, + &def->cputune.hypervisor_period) < 0) + def->cputune.hypervisor_period = 0; + + if (virXPathLongLong("string(./cputune/hypervisor_quota[1])", ctxt, + &def->cputune.hypervisor_quota) < 0) + def->cputune.hypervisor_quota = 0; + if ((n = virXPathNodeSet("./cputune/vcpupin", ctxt, &nodes)) < 0) { goto error; } @@ -12993,7 +13001,8 @@ virDomainDefFormatInternal(virDomainDefPtr def, if (def->cputune.shares || def->cputune.vcpupin || def->cputune.period || def->cputune.quota || - def->cputune.hypervisorpin) + def->cputune.hypervisorpin || + def->cputune.hypervisor_period || def->cputune.hypervisor_quota) virBufferAddLit(buf, " <cputune>\n"); if (def->cputune.shares) @@ -13005,6 +13014,17 @@ virDomainDefFormatInternal(virDomainDefPtr def, if (def->cputune.quota) virBufferAsprintf(buf, " <quota>%lld</quota>\n", def->cputune.quota); + + if (def->cputune.hypervisor_period) + virBufferAsprintf(buf, " <hypervisor_period>%llu" + "</hypervisor_period>\n", + def->cputune.hypervisor_period); + + if (def->cputune.hypervisor_quota) + virBufferAsprintf(buf, " <hypervisor_quota>%lld" + "</hypervisor_quota>\n", + def->cputune.hypervisor_quota); + if (def->cputune.vcpupin) { for (i = 0; i < def->cputune.nvcpupin; i++) { virBufferAsprintf(buf, " <vcpupin vcpu='%u' ", @@ -13043,7 +13063,8 @@ virDomainDefFormatInternal(virDomainDefPtr def, if (def->cputune.shares || def->cputune.vcpupin || def->cputune.period || def->cputune.quota || - def->cputune.hypervisorpin) + def->cputune.hypervisorpin || + def->cputune.hypervisor_period || def->cputune.hypervisor_quota) virBufferAddLit(buf, " </cputune>\n"); if (def->numatune.memory.nodemask || diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 9f63669..44283ef 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1607,6 +1607,8 @@ struct _virDomainDef { unsigned long shares; unsigned long long period; long long quota; + unsigned long long hypervisor_period; + long long hypervisor_quota; int nvcpupin; virDomainVcpuPinDefPtr *vcpupin; virDomainVcpuPinDefPtr hypervisorpin; diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 4d97665..27c0c8a 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -575,7 +575,7 @@ int qemuSetupCgroupForVcpu(struct qemud_driver *driver, virDomainObjPtr vm) } if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) { - /* If we does not know VCPU<->PID mapping or all vcpus run in the same + /* If we don't know VCPU<->PID mapping or all vcpus run in the same * thread, we cannot control each vcpu.
*/ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -639,6 +639,8 @@ int qemuSetupCgroupForHypervisor(struct qemud_driver *driver, virCgroupPtr cgroup = NULL; virCgroupPtr cgroup_hypervisor = NULL; virDomainDefPtr def = vm->def; + unsigned long long period = vm->def->cputune.hypervisor_period; + long long quota = vm->def->cputune.hypervisor_quota; int rc, i; if (driver->cgroup == NULL) @@ -680,6 +682,13 @@ int qemuSetupCgroupForHypervisor(struct qemud_driver *driver, qemuSetupCgroupHypervisorPin(cgroup_hypervisor, def) < 0) goto cleanup; + if (period || quota) { + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPU)) { + if (qemuSetupCgroupVcpuBW(cgroup_hypervisor, period, quota) < 0) + goto cleanup; + } + } + virCgroupFree(&cgroup_hypervisor); virCgroupFree(&cgroup); return 0; -- 1.7.10.2
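To make the round trip concrete: with this patch applied, a domain definition could carry the new tunables like this (illustrative values - a 100ms enforcement period with a 50ms quota caps the hypervisor threads at half a CPU):

  <cputune>
    <hypervisor_period>100000</hypervisor_period>
    <hypervisor_quota>50000</hypervisor_quota>
  </cputune>

At startup, qemuSetupCgroupForHypervisor() hands these values to qemuSetupCgroupVcpuBW() for the hypervisor sub-group, so on Linux they end up in that group's cpu.cfs_period_us and cpu.cfs_quota_us files.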

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
The subject is long. I'd suggest:

qemu: introduce period/quota tuning for emulator

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From: Wen Congyang <wency@cn.fujitsu.com>

Allow the user to change/get the hypervisor's period and quota while the VM is running.

---
 include/libvirt/libvirt.h.in |  16 +++++
 src/qemu/qemu_driver.c       | 133 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index 15c08c1..dd34295 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -692,6 +692,22 @@ typedef virTypedParameter *virTypedParameterPtr;
 #define VIR_DOMAIN_SCHEDULER_VCPU_QUOTA "vcpu_quota"
 
 /**
+ * VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD:
+ *
+ * Macro represents the enforcement period for a quota, in microseconds,
+ * when using the posix scheduler, as a ullong.
+ */
+#define VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD "hypervisor_period"
+
+/**
+ * VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA:
+ *
+ * Macro represents the maximum bandwidth to be used within a period,
+ * when using the posix scheduler, as an llong.
+ */
+#define VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA "hypervisor_quota"
+
+/**
  * VIR_DOMAIN_SCHEDULER_WEIGHT:
  *
  * Macro represents the relative weight, when using the credit

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 7d1d093..e1274c2 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6349,7 +6349,7 @@ static char *qemuGetSchedulerType(virDomainPtr dom,
         else if (rc == 0)
             *nparams = 1;
         else
-            *nparams = 3;
+            *nparams = 5;
     }
 
     ret = strdup("posix");
@@ -7420,6 +7420,40 @@ cleanup:
 }
 
 static int
+qemuSetHypervisorBWLive(virDomainObjPtr vm, virCgroupPtr cgroup,
+                        unsigned long long period, long long quota)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    virCgroupPtr cgroup_hypervisor = NULL;
+    int rc;
+
+    if (period == 0 && quota == 0)
+        return 0;
+
+    if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) {
+        return 0;
+    }
+
+    rc = virCgroupForHypervisor(cgroup, &cgroup_hypervisor, 0);
+    if (rc < 0) {
+        virReportSystemError(-rc,
+                             _("Unable to find hypervisor cgroup for %s"),
+                             vm->def->name);
+        goto cleanup;
+    }
+
+    if (qemuSetupCgroupVcpuBW(cgroup_hypervisor, period, quota) < 0)
+        goto cleanup;
+
+    virCgroupFree(&cgroup_hypervisor);
+    return 0;
+
+cleanup:
+    virCgroupFree(&cgroup_hypervisor);
+    return -1;
+}
+
+static int
 qemuSetSchedulerParametersFlags(virDomainPtr dom,
                                 virTypedParameterPtr params,
                                 int nparams,
@@ -7442,6 +7476,10 @@ qemuSetSchedulerParametersFlags(virDomainPtr dom,
                                VIR_TYPED_PARAM_ULLONG,
                                VIR_DOMAIN_SCHEDULER_VCPU_QUOTA,
                                VIR_TYPED_PARAM_LLONG,
+                               VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD,
+                               VIR_TYPED_PARAM_ULLONG,
+                               VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA,
+                               VIR_TYPED_PARAM_LLONG,
                                NULL) < 0)
         return -1;
 
@@ -7524,6 +7562,32 @@ qemuSetSchedulerParametersFlags(virDomainPtr dom,
             if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
                 vmdef->cputune.quota = params[i].value.l;
             }
+        } else if (STREQ(param->field, VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD)) {
+            if (flags & VIR_DOMAIN_AFFECT_LIVE) {
+                rc = qemuSetHypervisorBWLive(vm, group, params[i].value.ul, 0);
+                if (rc != 0)
+                    goto cleanup;
+
+                if (params[i].value.ul)
+                    vm->def->cputune.hypervisor_period = params[i].value.ul;
+            }
+
+            if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+                vmdef->cputune.hypervisor_period = params[i].value.ul;
+            }
+        } else if (STREQ(param->field, VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA)) {
+            if (flags & VIR_DOMAIN_AFFECT_LIVE) {
+                rc = qemuSetHypervisorBWLive(vm, group, 0, params[i].value.l);
+                if (rc != 0)
+                    goto cleanup;
+
+                if (params[i].value.l)
+                    vm->def->cputune.hypervisor_quota = params[i].value.l;
+            }
+
+            if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+                vmdef->cputune.hypervisor_quota = params[i].value.l;
+            }
         }
     }
 
@@ -7628,6 +7692,43 @@ cleanup:
 }
 
 static int
+qemuGetHypervisorBWLive(virDomainObjPtr vm, virCgroupPtr cgroup,
+                        unsigned long long *period, long long *quota)
+{
+    virCgroupPtr cgroup_hypervisor = NULL;
+    qemuDomainObjPrivatePtr priv = NULL;
+    int rc;
+    int ret = -1;
+
+    priv = vm->privateData;
+    if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) {
+        /* We donot create sub dir for each vcpu */
+        *period = 0;
+        *quota = 0;
+        return 0;
+    }
+
+    /* get period and quota for hypervisor */
+    rc = virCgroupForHypervisor(cgroup, &cgroup_hypervisor, 0);
+    if (!cgroup_hypervisor) {
+        virReportSystemError(-rc,
+                             _("Unable to find hypervisor cgroup for %s"),
+                             vm->def->name);
+        goto cleanup;
+    }
+
+    rc = qemuGetVcpuBWLive(cgroup_hypervisor, period, quota);
+    if (rc < 0)
+        goto cleanup;
+
+    ret = 0;
+
+cleanup:
+    virCgroupFree(&cgroup_hypervisor);
+    return ret;
+}
+
+static int
 qemuGetSchedulerParametersFlags(virDomainPtr dom,
                                 virTypedParameterPtr params,
                                 int *nparams,
@@ -7639,6 +7740,8 @@ qemuGetSchedulerParametersFlags(virDomainPtr dom,
     unsigned long long shares;
     unsigned long long period;
     long long quota;
+    unsigned long long hypervisor_period;
+    long long hypervisor_quota;
     int ret = -1;
     int rc;
     bool cpu_bw_status = false;
@@ -7678,6 +7781,8 @@ qemuGetSchedulerParametersFlags(virDomainPtr dom,
         if (*nparams > 1 && cpu_bw_status) {
             period = persistentDef->cputune.period;
             quota = persistentDef->cputune.quota;
+            hypervisor_period = persistentDef->cputune.hypervisor_period;
+            hypervisor_quota = persistentDef->cputune.hypervisor_quota;
         }
         goto out;
     }
@@ -7706,6 +7811,14 @@ qemuGetSchedulerParametersFlags(virDomainPtr dom,
             if (rc != 0)
                 goto cleanup;
         }
+
+        if (*nparams > 3 && cpu_bw_status) {
+            rc = qemuGetHypervisorBWLive(vm, group, &hypervisor_period,
+                                         &hypervisor_quota);
+            if (rc != 0)
+                goto cleanup;
+        }
+
 out:
     if (virTypedParameterAssign(&params[0], VIR_DOMAIN_SCHEDULER_CPU_SHARES,
                                 VIR_TYPED_PARAM_ULLONG, shares) < 0)
@@ -7728,6 +7841,24 @@ out:
             goto cleanup;
         saved_nparams++;
     }
+
+    if (*nparams > saved_nparams) {
+        if (virTypedParameterAssign(&params[3],
+                                    VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD,
+                                    VIR_TYPED_PARAM_ULLONG,
+                                    hypervisor_period) < 0)
+            goto cleanup;
+        saved_nparams++;
+    }
+
+    if (*nparams > saved_nparams) {
+        if (virTypedParameterAssign(&params[4],
+                                    VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA,
+                                    VIR_TYPED_PARAM_LLONG,
+                                    hypervisor_quota) < 0)
+            goto cleanup;
+        saved_nparams++;
+    }
     }
 
     *nparams = saved_nparams;
--
1.7.10.2
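As a client-side usage sketch (not part of the patch - the guest name and values are invented, and error handling is trimmed), the new parameters are driven through the existing virDomainSetSchedulerParametersFlags() entry point:

#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom;
    virTypedParameter params[2];

    if (!conn || !(dom = virDomainLookupByName(conn, "demo")))
        return 1;

    memset(params, 0, sizeof(params));

    /* enforcement period for the hypervisor (emulator) threads, in usec */
    strncpy(params[0].field, VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD,
            VIR_TYPED_PARAM_FIELD_LENGTH - 1);
    params[0].type = VIR_TYPED_PARAM_ULLONG;
    params[0].value.ul = 100000;

    /* bandwidth usable within each period, in usec */
    strncpy(params[1].field, VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA,
            VIR_TYPED_PARAM_FIELD_LENGTH - 1);
    params[1].type = VIR_TYPED_PARAM_LLONG;
    params[1].value.l = 50000;

    if (virDomainSetSchedulerParametersFlags(dom, params, 2,
                                             VIR_DOMAIN_AFFECT_LIVE) < 0)
        fprintf(stderr, "failed to set hypervisor bandwidth\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}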

On 08/03/2012 12:36 AM, Hu Tao wrote:
From: Wen Congyang <wency@cn.fujitsu.com>
Allow the user to change/get the hypervisor's period and quota while the VM is running.

---
 include/libvirt/libvirt.h.in |  16 +++++
 src/qemu/qemu_driver.c       | 133 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index 15c08c1..dd34295 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -692,6 +692,22 @@ typedef virTypedParameter *virTypedParameterPtr;
 #define VIR_DOMAIN_SCHEDULER_VCPU_QUOTA "vcpu_quota"
 
 /**
+ * VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD:
more of the s/hypervisor/emulator/ renaming
+ *
+ * Macro represents the enforcement period for a quota, in microseconds,
+ * when using the posix scheduler, as a ullong.
Enforcement period for what? I would suggest:

Macro represents the enforcement period for a quota in microseconds, when
using the posix scheduler, for all emulator activity not tied to vcpus, as a
ullong.
+ */
+#define VIR_DOMAIN_SCHEDULER_HYPERVISOR_PERIOD "hypervisor_period"
+
+/**
+ * VIR_DOMAIN_SCHEDULER_HYPERVISOR_QUOTA:
+ *
+ * Macro represents the maximum bandwidth to be used within a period,
+ * when using the posix scheduler, as an llong.
and similarly. It may mean rewording the existing descriptions of vcpu quota to be more obvious that they limit only vcpu activity.
@@ -7420,6 +7420,40 @@ cleanup:
 }
 
 static int
+qemuSetHypervisorBWLive(virDomainObjPtr vm, virCgroupPtr cgroup,
Probably copying from existing naming, but BW looks awkward to me (I first read it as black/white, not bandwidth); spelling it out would help.
 static int
+qemuGetHypervisorBWLive(virDomainObjPtr vm, virCgroupPtr cgroup,
+                        unsigned long long *period, long long *quota)
Same comment about spelling out BW as Bandwidth.
+{
+    virCgroupPtr cgroup_hypervisor = NULL;
+    qemuDomainObjPrivatePtr priv = NULL;
+    int rc;
+    int ret = -1;
+
+    priv = vm->privateData;
+    if (priv->nvcpupids == 0 || priv->vcpupids[0] == vm->pid) {
+        /* We donot create sub dir for each vcpu */
s/donot/don't/

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

---
 tools/virsh.pod | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/virsh.pod b/tools/virsh.pod
index ffc0777..92cb897 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -1202,7 +1202,8 @@ available for each hypervisor are:
 
 LXC (posix scheduler) : cpu_shares
 
-QEMU/KVM (posix scheduler): cpu_shares, vcpu_period, vcpu_quota
+QEMU/KVM (posix scheduler): cpu_shares, vcpu_period, vcpu_quota,
+hypervisor_period, hypervisor_quota
 
 Xen (credit scheduler): weight, cap
 
@@ -1220,10 +1221,10 @@ values 0 and 1 are automatically converted to a minimal value of 2.
 B<Note>: The weight and cap parameters are defined only for the
 XEN_CREDIT scheduler and are now I<DEPRECATED>.
 
-B<Note>: The vcpu_period parameter has a valid value range of 1000-1000000 or
-0, and the vcpu_quota parameter has a valid value range of
-1000-18446744073709551 or less than 0. The value 0 for either parameter is
-the same as not specifying that parameter.
+B<Note>: The vcpu_period/hypervisor_period parameters have a valid value range
+of 1000-1000000 or 0, and the vcpu_quota/hypervisor_quota parameters have a
+valid value range of 1000-18446744073709551 or less than 0. The value 0 for
+either parameter is the same as not specifying that parameter.
 
 =item B<screenshot> I<domain> [I<imagefilepath>] [I<--screen> B<screenID>]
--
1.7.10.2
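For completeness, a shell-level sketch of the same tuning (guest name invented; parameter names as posted, which may yet become emulator_*):

  # show all five scheduler parameters
  virsh schedinfo guest1

  # cap the emulator threads of a running guest at half a CPU
  virsh schedinfo guest1 --live --set hypervisor_period=100000
  virsh schedinfo guest1 --live --set hypervisor_quota=50000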

On 08/03/2012 12:36 AM, Hu Tao wrote:
---
 tools/virsh.pod | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/virsh.pod b/tools/virsh.pod
index ffc0777..92cb897 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -1202,7 +1202,8 @@ available for each hypervisor are:
LXC (posix scheduler) : cpu_shares
-QEMU/KVM (posix scheduler): cpu_shares, vcpu_period, vcpu_quota
+QEMU/KVM (posix scheduler): cpu_shares, vcpu_period, vcpu_quota,
+hypervisor_period, hypervisor_quota
Make sure you match the naming in the API if we go with emulator_quota. I would rebase this series to split 16 and 17 differently - have 16 be the new API (libvirt.h.in and this file) and its first client, but not the implementation; then 17 would be the first implementation of that API (qemu_driver.c).

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 08/03/2012 12:36 AM, Hu Tao wrote:
This series is a merge of
1) "Support hypervisor-threads-pin in vcpupin" (https://www.redhat.com/archives/libvir-list/2012-July/msg01361.html) 2) "support to set cpu bandwidth for hypervisor threads" (https://www.redhat.com/archives/libvir-list/2012-June/msg01161.html)
to make life easier because of the two share some patches.
Patches 1-12 are from 1), patches 13-17 are from 2).
Changes:
1. rebase to the latest git tree (removal of qemuReportError, split of virsh.c)
2. some typo fixes
3. make it pass syntax-check
Thanks for reposting. I do intend to review this series next, and this refresher is a big help.

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Fri, Aug 03, 2012 at 02:36:02PM +0800, Hu Tao wrote:
This series is a merge of
1) "Support hypervisor-threads-pin in vcpupin" (https://www.redhat.com/archives/libvir-list/2012-July/msg01361.html) 2) "support to set cpu bandwidth for hypervisor threads" (https://www.redhat.com/archives/libvir-list/2012-June/msg01161.html)
to make life easier because of the two share some patches.
This series is really focusing on pinning threads associated with the <emulator> element, rather than the hypervisor. The hypervisor is a separate entity that is shared. So I'm thinking that this entire patch series could replace 'hypervisor' with 'emulator' everywhere. Does anyone agree or disagree?

Daniel

--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

On 08/07/2012 09:47 AM, Daniel P. Berrange wrote:
This series is really focusing on pinning threads associated with the <emulator> element, rather than the hypervisor. The hypervisor is a separate entity that is shared.
So I'm thinking that this entire patch series could replace 'hypervisor' with 'emulator' everywhere. Does anyone agree or disagree?
I definitely agree - when I hear 'hypervisor', I think 'qemu:///system', which is the technology used to run multiple guests, but when I hear 'emulator', I think of a subset of a domain, namely the specific qemu pid_t running a given guest. Also, we're not pinning all of the hypervisor's threads, but just the threads that are associated with emulation but not a specific vcpu.

That is, marking up your comment in 1/17:

cgroup mount point
 +--libvirt               <= setting up a namespace (*)
    +--qemu               <= hypervisor level
       +--domain name     <= domain level
          +--vcpu0        <= vcpu level
          ...
          +--vcpuN
          +--"hypervisor" <= emulator

so a domain really is made up of an 'emulator' and 'vcpu' threads, and a 'hypervisor' contains domains, rather than making up a portion of a domain.

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
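To make the levels concrete: with the cpu controller mounted at /sys/fs/cgroup/cpu (a typical layout, not one libvirt mandates), the group this series tunes for a guest named guest1 would be

  /sys/fs/cgroup/cpu/libvirt/qemu/guest1/hypervisor/

with the bandwidth knobs in that directory's cpu.cfs_period_us and cpu.cfs_quota_us files.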

On Tue, Aug 07, 2012 at 10:13:19AM -0600, Eric Blake wrote:
On 08/07/2012 09:47 AM, Daniel P. Berrange wrote:
This series is really focusing on pinning threads associated with the <emulator> element, rather than the hypervisor. The hypervisor is a separate entity that is shared.
So I'm thinking that this entire patch series could replace 'hypervisor' with 'emulator' everywhere. Does anyone agree or disagree?
I definitely agree - when I hear 'hypervisor', I think 'qemu:///system', which is the technology used to run multiple guests, but when I hear 'emulator', I think of a subset of a domain, namely the specific qemu pid_t running a given guest. Also, we're not pinning all of the hypervisor's threads, but just the threads that are associated with emulation but not a specific vcpu.
That is, marking up your comment in 1/17:
cgroup mount point
 +--libvirt               <= setting up a namespace (*)
    +--qemu               <= hypervisor level
       +--domain name     <= domain level
          +--vcpu0        <= vcpu level
          ...
          +--vcpuN
          +--"hypervisor" <= emulator
so a domain really is made up of an 'emulator' and 'vcpu' threads, and a 'hypervisor' contains domains, rather than making up a portion of a domain.
Also note that LXC has an emulator, "/usr/libexec/libvirt_lxc", to which all these new APIs apply, but the term 'hypervisor' is meaningless for container-based virt.

Daniel

--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

On Tue, Aug 07, 2012 at 06:35:32PM +0100, Daniel P. Berrange wrote:
Also note that LXC has an emulator, "/usr/libexec/libvirt_lxc", to which all these new APIs apply, but the term 'hypervisor' is meaningless for container-based virt.
Thank you for your advice on the naming. I'll change 'hypervisor' to 'emulator' throughout the series.

--
Thanks,
Hu Tao