[libvirt] [PATCH 0/7] Add more options for IOThreads

The following patches add more support for IOThreads that was not
completed in the initial patches. These changes support the remaining
elements of bz https://bugzilla.redhat.com/show_bug.cgi?id=1101574
(working through unsetting the private bits - as there's nothing in
there that should necessarily be private). Changes:

1. Add an "--iothread" option to virsh attach-disk to allow using a
   specific iothread when attaching a disk.

2. Add the ability to set CPU affinity for an IOThread. This involves
   multiple steps (patches 2-6) of adding the infrastructure to support
   setting scheduler affinity for the IOThread, including cgroup setup.
   For the most part it's a "copy" of the vCPU pinning code, but without
   the external interfaces - those will come at a later time after
   RHEL7.1.

3. Add to <cputune> a new element <iothreadpin iothread='#' cpuset='string'/>.

NOTE: I can combine any/all of patches 2-6 - I just kept them separate
so it wasn't a larger single patch. Although future changes will support
APIs and virsh commands to modify the iothreadpin, this set of changes
at least is backportable to RHEL7.1 since there are no external API
changes.
John Ferlan (7):
  virsh: Add iothread to 'attach-disk'
  qemu: Issue query-iothreads and to get list of active IOThreads
  vircgroup: Introduce virCgroupNewIOThread
  qemu_domain: Add niothreadpids and iothreadpids
  qemu_cgroup: Introduce cgroup functions for IOThreads
  qemu: Allow pinning specific IOThreads to a CPU
  domain_conf: Add iothreadpin to cputune

 docs/formatdomain.html.in                          |  18 +++
 docs/schemas/domaincommon.rng                      |  10 ++
 src/conf/domain_conf.c                             | 124 +++++++++++++++++++--
 src/conf/domain_conf.h                             |   2 +
 src/libvirt_private.syms                           |   1 +
 src/qemu/qemu_cgroup.c                             | 104 +++++++++++++++++
 src/qemu/qemu_cgroup.h                             |   5 +
 src/qemu/qemu_domain.c                             |  36 ++++++
 src/qemu/qemu_domain.h                             |   3 +
 src/qemu/qemu_driver.c                             |   8 ++
 src/qemu/qemu_monitor.c                            |  41 +++++++
 src/qemu/qemu_monitor.h                            |  12 ++
 src/qemu/qemu_monitor_json.c                       |  91 +++++++++++++++
 src/qemu/qemu_monitor_json.h                       |   4 +
 src/qemu/qemu_process.c                            |  96 ++++++++++++++++
 src/util/vircgroup.c                               |  43 +++++++
 src/util/vircgroup.h                               |   6 +
 tests/qemumonitorjsontest.c                        |  71 ++++++++++++
 .../qemuxml2argv-cputune-iothreads.xml             |  38 +++++++
 tests/qemuxml2xmltest.c                            |   1 +
 tools/virsh-domain.c                               |  13 ++-
 tools/virsh.pod                                    |   5 +-
 22 files changed, 717 insertions(+), 15 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml
-- 
1.9.3

Add an iothread parameter to allow attaching to an IOThread, such as:

  virsh attach-disk $dom $source $target --live --config --iothread 2 \
     --targetbus virtio --driver qemu --subdriver raw --type disk

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 tools/virsh-domain.c | 13 ++++++++++---
 tools/virsh.pod      |  5 ++++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index c75cd73..21f024b 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -311,6 +311,10 @@ static const vshCmdOptDef opts_attach_disk[] = {
      .type = VSH_OT_STRING,
      .help = N_("subdriver of disk device")
     },
+    {.name = "iothread",
+     .type = VSH_OT_STRING,
+     .help = N_("IOThread to be used by supported device")
+    },
     {.name = "cache",
      .type = VSH_OT_STRING,
      .help = N_("cache mode of disk device")
@@ -527,8 +531,8 @@ cmdAttachDisk(vshControl *ctl, const vshCmd *cmd)
     virDomainPtr dom = NULL;
     const char *source = NULL, *target = NULL, *driver = NULL,
                *subdriver = NULL, *type = NULL, *mode = NULL,
-               *cache = NULL, *serial = NULL, *straddr = NULL,
-               *wwn = NULL, *targetbus = NULL;
+               *iothread = NULL, *cache = NULL, *serial = NULL,
+               *straddr = NULL, *wwn = NULL, *targetbus = NULL;
     struct DiskAddress diskAddr;
     bool isFile = false, functionReturn = false;
     int ret;
@@ -558,6 +562,7 @@ cmdAttachDisk(vshControl *ctl, const vshCmd *cmd)
         vshCommandOptStringReq(ctl, cmd, "subdriver", &subdriver) < 0 ||
         vshCommandOptStringReq(ctl, cmd, "type", &type) < 0 ||
         vshCommandOptStringReq(ctl, cmd, "mode", &mode) < 0 ||
+        vshCommandOptStringReq(ctl, cmd, "iothread", &iothread) < 0 ||
         vshCommandOptStringReq(ctl, cmd, "cache", &cache) < 0 ||
         vshCommandOptStringReq(ctl, cmd, "serial", &serial) < 0 ||
         vshCommandOptStringReq(ctl, cmd, "wwn", &wwn) < 0 ||
@@ -601,13 +606,15 @@ cmdAttachDisk(vshControl *ctl, const vshCmd *cmd)
     virBufferAddLit(&buf, ">\n");
     virBufferAdjustIndent(&buf, 2);
 
-    if (driver || subdriver || cache) {
+    if (driver || subdriver || iothread || cache) {
         virBufferAddLit(&buf, "<driver");
 
         if (driver)
             virBufferAsprintf(&buf, " name='%s'", driver);
         if (subdriver)
             virBufferAsprintf(&buf, " type='%s'", subdriver);
+        if (iothread)
+            virBufferAsprintf(&buf, " iothread='%s'", iothread);
         if (cache)
             virBufferAsprintf(&buf, " cache='%s'", cache);
 
diff --git a/tools/virsh.pod b/tools/virsh.pod
index ea9267e..0f25387 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2211,7 +2211,8 @@ expected.
 =item B<attach-disk> I<domain> I<source> I<target>
 [[[I<--live>] [I<--config>] | [I<--current>]] | [I<--persistent>]]
 [I<--targetbus bus>] [I<--driver
-driver>] [I<--subdriver subdriver>] [I<--cache cache>] [I<--type type>]
+driver>] [I<--subdriver subdriver>] [I<--iothread iothread>]
+[I<--cache cache>] [I<--type type>]
 [I<--mode mode>] [I<--sourcetype sourcetype>]
 [I<--serial serial>] [I<--wwn wwn>] [I<--rawio>] [I<--address address>]
 [I<--multifunction>] [I<--print-xml>]
@@ -2238,6 +2239,8 @@ I<mode> can specify the two specific mode I<readonly> or I<shareable>.
 I<sourcetype> can indicate the type of source (block|file)
 I<cache> can be one of "default", "none", "writethrough", "writeback",
 "directsync" or "unsafe".
+I<iothread> is the number within the range of domain IOThreads to which
+this disk may be attached (QEMU only).
 I<serial> is the serial of disk device. I<wwn> is the wwn of disk device.
 I<rawio> indicates the disk needs rawio capability.
 I<address> is the address of disk device in the form of
 pci:domain.bus.slot.function,
-- 
1.9.3
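[Editor's illustrative sketch, not part of the patch.] The hunk above builds the disk `<driver>` element by appending one attribute per optional flag and emitting the element only when at least one flag was given. The same logic, sketched in Python (function name and return convention are hypothetical; attribute names mirror the libvirt disk XML):

```python
def build_driver_element(driver=None, subdriver=None, iothread=None, cache=None):
    """Mimic cmdAttachDisk's virBuffer logic for the <driver> element."""
    attrs = []
    if driver:
        attrs.append(" name='%s'" % driver)
    if subdriver:
        attrs.append(" type='%s'" % subdriver)
    if iothread:
        attrs.append(" iothread='%s'" % iothread)
    if cache:
        attrs.append(" cache='%s'" % cache)
    if not attrs:
        return ""  # no <driver> element is emitted at all
    return "<driver%s/>" % "".join(attrs)

# e.g. --driver qemu --subdriver raw --iothread 2 yields
print(build_driver_element(driver="qemu", subdriver="raw", iothread="2"))
# → <driver name='qemu' type='raw' iothread='2'/>
```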

Generate infrastructure and test to handle fetching the QMP IOThreads data.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/qemu/qemu_monitor.c      | 41 ++++++++++++++++++++
 src/qemu/qemu_monitor.h      | 12 ++++++
 src/qemu/qemu_monitor_json.c | 91 ++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |  4 ++
 tests/qemumonitorjsontest.c  | 71 ++++++++++++++++++++++++++++++++++
 5 files changed, 219 insertions(+)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 5b2952a..66ebd42 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -4071,3 +4071,44 @@ qemuMonitorRTCResetReinjection(qemuMonitorPtr mon)
 
     return qemuMonitorJSONRTCResetReinjection(mon);
 }
+
+/**
+ * qemuMonitorGetIOThreads:
+ * @mon: Pointer to the monitor
+ * @iothreads: Location to return array of IOThreadInfo data
+ *
+ * Issue query-iothreads command.
+ * Retrieve the list of iothreads defined/running for the machine
+ *
+ * Returns count of IOThreadInfo structures on success
+ *         -1 on error.
+ */
+int
+qemuMonitorGetIOThreads(qemuMonitorPtr mon,
+                        qemuMonitorIOThreadsInfoPtr **iothreads)
+{
+
+    VIR_DEBUG("mon=%p iothreads=%p", mon, iothreads);
+
+    if (!mon) {
+        virReportError(VIR_ERR_INVALID_ARG, "%s",
+                       _("monitor must not be NULL"));
+        return -1;
+    }
+
+    if (!mon->json) {
+        virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+                       _("JSON monitor is required"));
+        return -1;
+    }
+
+    return qemuMonitorJSONGetIOThreads(mon, iothreads);
+}
+
+void qemuMonitorIOThreadsInfoFree(qemuMonitorIOThreadsInfoPtr iothread)
+{
+    if (!iothread)
+        return;
+    VIR_FREE(iothread->name);
+    VIR_FREE(iothread);
+}
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 4fd6f01..c8915ec 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -792,6 +792,18 @@ int qemuMonitorGetGuestCPU(qemuMonitorPtr mon,
 
 int qemuMonitorRTCResetReinjection(qemuMonitorPtr mon);
 
+typedef struct _qemuMonitorIOThreadsInfo qemuMonitorIOThreadsInfo;
+typedef qemuMonitorIOThreadsInfo *qemuMonitorIOThreadsInfoPtr;
+
+struct _qemuMonitorIOThreadsInfo {
+    char *name;
+    int thread_id;
+};
+int qemuMonitorGetIOThreads(qemuMonitorPtr mon,
+                            qemuMonitorIOThreadsInfoPtr **iothreads);
+
+void qemuMonitorIOThreadsInfoFree(qemuMonitorIOThreadsInfoPtr iothread);
+
 /**
  * When running two dd process and using <> redirection, we need a
  * shell that will not truncate files.  These two strings serve that
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 62e7d5d..b6122dc 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -5872,3 +5872,94 @@ qemuMonitorJSONRTCResetReinjection(qemuMonitorPtr mon)
     virJSONValueFree(reply);
     return ret;
 }
+
+/**
+ * Query and parse returned array of data such as:
+ *
+ * {u'return': [{u'id': u'iothread1', u'thread-id': 30992}, \
+ *              {u'id': u'iothread2', u'thread-id': 30993}]}
+ */
+int
+qemuMonitorJSONGetIOThreads(qemuMonitorPtr mon,
+                            qemuMonitorIOThreadsInfoPtr **iothreads)
+{
+    int ret = -1;
+    virJSONValuePtr cmd;
+    virJSONValuePtr reply = NULL;
+    virJSONValuePtr data;
+    qemuMonitorIOThreadsInfoPtr *infolist = NULL;
+    int n = 0;
+    size_t i;
+
+    *iothreads = NULL;
+
+    if (!(cmd = qemuMonitorJSONMakeCommand("query-iothreads", NULL)))
+        return ret;
+
+    ret = qemuMonitorJSONCommand(mon, cmd, &reply);
+
+    if (ret == 0)
+        ret = qemuMonitorJSONCheckError(cmd, reply);
+
+    if (ret < 0)
+        goto cleanup;
+
+    ret = -1;
+
+    if (!(data = virJSONValueObjectGet(reply, "return"))) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("query-iothreads reply was missing return data"));
+        goto cleanup;
+    }
+
+    if ((n = virJSONValueArraySize(data)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("query-iothreads reply data was not an array"));
+        goto cleanup;
+    }
+
+    /* null-terminated list */
+    if (VIR_ALLOC_N(infolist, n + 1) < 0)
+        goto cleanup;
+
+    for (i = 0; i < n; i++) {
+        virJSONValuePtr child = virJSONValueArrayGet(data, i);
+        const char *tmp;
+        qemuMonitorIOThreadsInfoPtr info;
+
+        if (VIR_ALLOC(info) < 0)
+            goto cleanup;
+
+        infolist[i] = info;
+
+        if (!(tmp = virJSONValueObjectGetString(child, "id"))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("query-iothreads reply data was missing 'id'"));
+            goto cleanup;
+        }
+
+        if (VIR_STRDUP(info->name, tmp) < 0)
+            goto cleanup;
+
+        if (virJSONValueObjectGetNumberInt(child, "thread-id",
+                                           &info->thread_id) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("query-iothreads reply has malformed "
+                             "'thread-id' data"));
+            goto cleanup;
+        }
+    }
+
+    ret = n;
+    *iothreads = infolist;
+
+ cleanup:
+    if (ret < 0 && infolist) {
+        for (i = 0; i < n; i++)
+            qemuMonitorIOThreadsInfoFree(infolist[i]);
+        VIR_FREE(infolist);
+    }
+    virJSONValueFree(cmd);
+    virJSONValueFree(reply);
+    return ret;
+}
diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h
index d8c9308..4a6dbee 100644
--- a/src/qemu/qemu_monitor_json.h
+++ b/src/qemu/qemu_monitor_json.h
@@ -439,4 +439,8 @@ int qemuMonitorJSONGetGuestCPU(qemuMonitorPtr mon,
                                virCPUDataPtr *data);
 
 int qemuMonitorJSONRTCResetReinjection(qemuMonitorPtr mon);
+
+int qemuMonitorJSONGetIOThreads(qemuMonitorPtr mon,
+                                qemuMonitorIOThreadsInfoPtr **iothreads)
+    ATTRIBUTE_NONNULL(2);
 #endif /* QEMU_MONITOR_JSON_H */
diff --git a/tests/qemumonitorjsontest.c b/tests/qemumonitorjsontest.c
index e3fb4f7..d53ab05 100644
--- a/tests/qemumonitorjsontest.c
+++ b/tests/qemumonitorjsontest.c
@@ -2217,6 +2217,76 @@ testQemuMonitorJSONGetNonExistingCPUData(const void *opaque)
 }
 
 static int
+testQemuMonitorJSONGetIOThreads(const void *data)
+{
+    virDomainXMLOptionPtr xmlopt = (virDomainXMLOptionPtr)data;
+    qemuMonitorTestPtr test = qemuMonitorTestNewSimple(true, xmlopt);
+    qemuMonitorIOThreadsInfoPtr *info;
+    int ninfo = 0;
+    int ret = -1;
+    size_t i;
+
+    if (!test)
+        return -1;
+
+    if (qemuMonitorTestAddItem(test, "query-iothreads",
+                               "{ "
+                               "  \"return\": [ "
+                               "   { "
+                               "     \"id\": \"iothread1\", "
+                               "     \"thread-id\": 30992 "
+                               "   }, "
+                               "   { "
+                               "     \"id\": \"iothread2\", "
+                               "     \"thread-id\": 30993 "
+                               "   } "
+                               "  ]"
+                               "}") < 0)
+        goto cleanup;
+
+    if ((ninfo = qemuMonitorGetIOThreads(qemuMonitorTestGetMonitor(test),
+                                         &info)) < 0)
+        goto cleanup;
+
+    if (ninfo != 2) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       "ninfo %d is not 2", ninfo);
+        goto cleanup;
+    }
+
+#define CHECK(i, wantname, wantthread_id)                               \
+    do {                                                                \
+        if (STRNEQ(info[i]->name, (wantname))) {                        \
+            virReportError(VIR_ERR_INTERNAL_ERROR,                      \
+                           "name %s is not %s",                         \
+                           info[i]->name, (wantname));                  \
+            goto cleanup;                                               \
+        }                                                               \
+        if (info[i]->thread_id != (wantthread_id)) {                    \
+            virReportError(VIR_ERR_INTERNAL_ERROR,                      \
+                           "thread_id %d is not %d",                    \
+                           info[i]->thread_id, (wantthread_id));        \
+            goto cleanup;                                               \
+        }                                                               \
+    } while (0)
+
+    CHECK(0, "iothread1", 30992);
+    CHECK(1, "iothread2", 30993);
+
+#undef CHECK
+
+    ret = 0;
+
+ cleanup:
+    qemuMonitorTestFree(test);
+    for (i = 0; i < ninfo; i++)
+        qemuMonitorIOThreadsInfoFree(info[i]);
+    VIR_FREE(info);
+
+    return ret;
+}
+
+static int
 mymain(void)
 {
     int ret = 0;
@@ -2272,6 +2342,7 @@ mymain(void)
     DO_TEST(GetDeviceAliases);
     DO_TEST(CPU);
     DO_TEST(GetNonExistingCPUData);
+    DO_TEST(GetIOThreads);
     DO_TEST_SIMPLE("qmp_capabilities", qemuMonitorJSONSetCapabilities);
     DO_TEST_SIMPLE("system_powerdown", qemuMonitorJSONSystemPowerdown);
     DO_TEST_SIMPLE("system_reset", qemuMonitorJSONSystemReset);
-- 
1.9.3
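[Editor's illustrative sketch, not part of the patch.] The parse-and-validate flow of qemuMonitorJSONGetIOThreads — check for a "return" array, require an "id" per entry, and extract "thread-id" — can be sketched standalone in Python against the same QMP reply shape (function name is hypothetical):

```python
import json

def parse_iothreads(reply_json):
    """Parse a query-iothreads reply into (name, thread_id) pairs,
    mirroring the error checks in qemuMonitorJSONGetIOThreads."""
    reply = json.loads(reply_json)
    data = reply.get("return")
    if not isinstance(data, list):
        raise ValueError("query-iothreads reply was missing return data")
    out = []
    for child in data:
        if "id" not in child:
            raise ValueError("query-iothreads reply data was missing 'id'")
        out.append((child["id"], int(child["thread-id"])))
    return out

reply = ('{"return": [{"id": "iothread1", "thread-id": 30992}, '
         '{"id": "iothread2", "thread-id": 30993}]}')
print(parse_iothreads(reply))
# → [('iothread1', 30992), ('iothread2', 30993)]
```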

Add virCgroupNewIOThread() to mimic virCgroupNewVcpu(), except the
naming scheme will use "iothread" rather than "vcpu".

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/libvirt_private.syms |  1 +
 src/util/vircgroup.c     | 43 +++++++++++++++++++++++++++++++++++++++++++
 src/util/vircgroup.h     |  6 ++++++
 3 files changed, 50 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index f136d24..8481ed6 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1085,6 +1085,7 @@ virCgroupNewDetectMachine;
 virCgroupNewDomainPartition;
 virCgroupNewEmulator;
 virCgroupNewIgnoreError;
+virCgroupNewIOThread;
 virCgroupNewMachine;
 virCgroupNewPartition;
 virCgroupNewSelf;
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 8b554a9..2842831 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1492,6 +1492,49 @@ virCgroupNewEmulator(virCgroupPtr domain,
 }
 
 
+/**
+ * virCgroupNewIOThread:
+ *
+ * @domain: group for the domain
+ * @iothreadid: id of the iothread
+ * @create: true to create if not already existing
+ * @group: Pointer to returned virCgroupPtr
+ *
+ * Returns 0 on success, or -1 on error
+ */
+int
+virCgroupNewIOThread(virCgroupPtr domain,
+                     int iothreadid,
+                     bool create,
+                     virCgroupPtr *group)
+{
+    int ret = -1;
+    char *name = NULL;
+    int controllers;
+
+    if (virAsprintf(&name, "iothread%d", iothreadid) < 0)
+        goto cleanup;
+
+    controllers = ((1 << VIR_CGROUP_CONTROLLER_CPU) |
+                   (1 << VIR_CGROUP_CONTROLLER_CPUACCT) |
+                   (1 << VIR_CGROUP_CONTROLLER_CPUSET));
+
+    if (virCgroupNew(-1, name, domain, controllers, group) < 0)
+        goto cleanup;
+
+    if (virCgroupMakeGroup(domain, *group, create, VIR_CGROUP_NONE) < 0) {
+        virCgroupRemove(*group);
+        virCgroupFree(group);
+        goto cleanup;
+    }
+
+    ret = 0;
+ cleanup:
+    VIR_FREE(name);
+    return ret;
+}
+
+
 int
 virCgroupNewDetect(pid_t pid,
                    int controllers,
diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h
index 90b41f7..19e82d1 100644
--- a/src/util/vircgroup.h
+++ b/src/util/vircgroup.h
@@ -76,6 +76,12 @@ int virCgroupNewEmulator(virCgroupPtr domain,
                          virCgroupPtr *group)
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3);
 
+int virCgroupNewIOThread(virCgroupPtr domain,
+                         int iothreadid,
+                         bool create,
+                         virCgroupPtr *group)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4);
+
 int virCgroupNewDetect(pid_t pid,
                        int controllers,
                        virCgroupPtr *group);
-- 
1.9.3
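[Editor's illustrative sketch, not part of the patch.] virCgroupNewIOThread builds a per-thread child group name under the domain's cgroup and restricts it to the cpu, cpuacct, and cpuset controllers. A small Python sketch of that naming and bitmask logic (the controller bit positions here are hypothetical stand-ins for the VIR_CGROUP_CONTROLLER_* enum values):

```python
# Hypothetical stand-ins for VIR_CGROUP_CONTROLLER_CPU/CPUACCT/CPUSET
CPU, CPUACCT, CPUSET = 0, 1, 2

def iothread_cgroup(domain_path, iothreadid):
    """Return (child group path, controller bitmask) for one IOThread,
    mirroring the "iothread%d" naming in virCgroupNewIOThread."""
    name = "iothread%d" % iothreadid
    controllers = (1 << CPU) | (1 << CPUACCT) | (1 << CPUSET)
    return "%s/%s" % (domain_path, name), controllers

path, mask = iothread_cgroup("/machine/qemu-guest", 1)
print(path, bin(mask))
# → /machine/qemu-guest/iothread1 0b111
```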

Add new 'niothreadpids' and 'iothreadpids' to mimic the 'ncpupids' and
'vcpupids' that already exist.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/qemu/qemu_domain.c | 36 ++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_domain.h |  3 +++
 2 files changed, 39 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index e9506e0..b23217b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -253,6 +253,7 @@ qemuDomainObjPrivateFree(void *data)
     virDomainChrSourceDefFree(priv->monConfig);
     qemuDomainObjFreeJob(priv);
     VIR_FREE(priv->vcpupids);
+    VIR_FREE(priv->iothreadpids);
     VIR_FREE(priv->lockState);
     VIR_FREE(priv->origname);
 
@@ -311,6 +312,18 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data)
         virBufferAddLit(buf, "</vcpus>\n");
     }
 
+    if (priv->niothreadpids) {
+        size_t i;
+        virBufferAddLit(buf, "<iothreads>\n");
+        virBufferAdjustIndent(buf, 2);
+        for (i = 0; i < priv->niothreadpids; i++) {
+            virBufferAsprintf(buf, "<iothread pid='%d'/>\n",
+                              priv->iothreadpids[i]);
+        }
+        virBufferAdjustIndent(buf, -2);
+        virBufferAddLit(buf, "</iothreads>\n");
+    }
+
     if (priv->qemuCaps) {
         size_t i;
         virBufferAddLit(buf, "<qemuCaps>\n");
@@ -434,6 +447,29 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data)
         VIR_FREE(nodes);
     }
 
+    n = virXPathNodeSet("./iothreads/iothread", ctxt, &nodes);
+    if (n < 0)
+        goto error;
+    if (n) {
+        priv->niothreadpids = n;
+        if (VIR_REALLOC_N(priv->iothreadpids, priv->niothreadpids) < 0)
+            goto error;
+
+        for (i = 0; i < n; i++) {
+            char *pidstr = virXMLPropString(nodes[i], "pid");
+            if (!pidstr)
+                goto error;
+
+            if (virStrToLong_i(pidstr, NULL, 10,
+                               &(priv->iothreadpids[i])) < 0) {
+                VIR_FREE(pidstr);
+                goto error;
+            }
+            VIR_FREE(pidstr);
+        }
+        VIR_FREE(nodes);
+    }
+
     if ((n = virXPathNodeSet("./qemuCaps/flag", ctxt, &nodes)) < 0) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                        _("failed to parse qemu capabilities flags"));
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 8736889..d27a458 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -142,6 +142,9 @@ struct _qemuDomainObjPrivate {
     int nvcpupids;
     int *vcpupids;
 
+    int niothreadpids;
+    int *iothreadpids;
+
     virDomainPCIAddressSetPtr pciaddrs;
     virDomainCCWAddressSetPtr ccwaddrs;
     int persistentAddrs;
-- 
1.9.3
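[Editor's illustrative sketch, not part of the patch.] The patch above persists IOThread pids in the private status XML as `<iothreads><iothread pid='...'/></iothreads>` and parses them back on daemon restart. The round trip can be sketched in Python with the standard XML library (function names are hypothetical):

```python
import xml.etree.ElementTree as ET

def format_status_xml(pids):
    """Format the private-status fragment like qemuDomainObjPrivateXMLFormat."""
    root = ET.Element("iothreads")
    for pid in pids:
        ET.SubElement(root, "iothread", pid=str(pid))
    return ET.tostring(root, encoding="unicode")

def parse_status_xml(text):
    """Recover the pid list like qemuDomainObjPrivateXMLParse."""
    root = ET.fromstring(text)
    return [int(n.get("pid")) for n in root.findall("iothread")]

pids = [30992, 30993]
assert parse_status_xml(format_status_xml(pids)) == pids
```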

In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin
and qemuSetupCgroupForIOThreads to mimic the existing Vcpu APIs.

These will support having an 'iothreadpin' element in the 'cputune' in
order to pin named IOThreads to specific CPUs.  The IOThread pin names
will follow the IOThread naming scheme starting at 1 (eg "iothread1")
up through and including the def->iothreads value.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/qemu/qemu_cgroup.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_cgroup.h |   5 +++
 2 files changed, 109 insertions(+)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 43d14d4..9486651 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -903,6 +903,23 @@ qemuSetupCgroupVcpuPin(virCgroupPtr cgroup,
 }
 
 int
+qemuSetupCgroupIOThreadsPin(virCgroupPtr cgroup,
+                            virDomainVcpuPinDefPtr *iothreadspin,
+                            int niothreadspin,
+                            int iothreadid)
+{
+    size_t i;
+
+    for (i = 0; i < niothreadspin; i++) {
+        if (iothreadid == iothreadspin[i]->vcpuid) {
+            return qemuSetupCgroupEmulatorPin(cgroup, iothreadspin[i]->cpumask);
+        }
+    }
+
+    return -1;
+}
+
+int
 qemuSetupCgroupEmulatorPin(virCgroupPtr cgroup,
                            virBitmapPtr cpumask)
 {
@@ -1083,6 +1100,93 @@ qemuSetupCgroupForEmulator(virQEMUDriverPtr driver,
 }
 
 int
+qemuSetupCgroupForIOThreads(virDomainObjPtr vm)
+{
+    virCgroupPtr cgroup_iothread = NULL;
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    virDomainDefPtr def = vm->def;
+    size_t i, j;
+    unsigned long long period = vm->def->cputune.period;
+    long long quota = vm->def->cputune.quota;
+
+    if ((period || quota) &&
+        !virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("cgroup cpu is required for scheduler tuning"));
+        return -1;
+    }
+
+    /*
+     * If CPU cgroup controller is not initialized here, then we need
+     * neither period nor quota settings.  And if CPUSET controller is
+     * not initialized either, then there's nothing to do anyway.
+     */
+    if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU) &&
+        !virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET))
+        return 0;
+
+    /* We are trying to setup cgroups for CPU pinning, which can also be done
+     * with virProcessSetAffinity, thus the lack of cgroups is not fatal here.
+     */
+    if (priv->cgroup == NULL)
+        return 0;
+
+    if (priv->niothreadpids == 0) {
+        VIR_WARN("Unable to get iothreads' pids.");
+        return 0;
+    }
+
+    for (i = 0; i < priv->niothreadpids; i++) {
+        /* IOThreads are numbered 1..n, although the array is 0..n-1,
+         * so we will laccount for that here
+         */
+        if (virCgroupNewIOThread(priv->cgroup, i+1, true, &cgroup_iothread) < 0)
+            goto cleanup;
+
+        /* move the thread for iothread to sub dir */
+        if (virCgroupAddTask(cgroup_iothread, priv->iothreadpids[i]) < 0)
+            goto cleanup;
+
+        if (period || quota) {
+            if (qemuSetupCgroupVcpuBW(cgroup_iothread, period, quota) < 0)
+                goto cleanup;
+        }
+
+        /* Set iothreadpin in cgroup if iothreadpin xml is provided */
+        if (virCgroupHasController(priv->cgroup,
+                                   VIR_CGROUP_CONTROLLER_CPUSET)) {
+            /* find the right CPU to pin, otherwise
+             * qemuSetupCgroupIOThreadsPin will fail. */
+            for (j = 0; j < def->cputune.niothreadspin; j++) {
+                /* IOThreads are numbered/named 1..n */
+                if (def->cputune.iothreadspin[j]->vcpuid != i+1)
+                    continue;
+
+                if (qemuSetupCgroupIOThreadsPin(cgroup_iothread,
+                                                def->cputune.iothreadspin,
+                                                def->cputune.niothreadspin,
+                                                i+1) < 0)
+                    goto cleanup;
+
+                break;
+            }
+        }
+
+        virCgroupFree(&cgroup_iothread);
+    }
+
+    return 0;
+
+ cleanup:
+    if (cgroup_iothread) {
+        virCgroupRemove(cgroup_iothread);
+        virCgroupFree(&cgroup_iothread);
+    }
+
+    return -1;
+}
+
+int
 qemuRemoveCgroup(virDomainObjPtr vm)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h
index 7394969..8a2c723 100644
--- a/src/qemu/qemu_cgroup.h
+++ b/src/qemu/qemu_cgroup.h
@@ -57,7 +57,12 @@ int qemuSetupCgroupVcpuPin(virCgroupPtr cgroup,
                            int nvcpupin,
                            int vcpuid);
 int qemuSetupCgroupEmulatorPin(virCgroupPtr cgroup, virBitmapPtr cpumask);
+int qemuSetupCgroupIOThreadsPin(virCgroupPtr cgroup,
+                                virDomainVcpuPinDefPtr *iothreadspin,
+                                int niothreadspin,
+                                int iothreadid);
 int qemuSetupCgroupForVcpu(virDomainObjPtr vm);
+int qemuSetupCgroupForIOThreads(virDomainObjPtr vm);
 int qemuSetupCgroupForEmulator(virQEMUDriverPtr driver,
                                virDomainObjPtr vm,
                                virBitmapPtr nodemask);
-- 
1.9.3
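[Editor's illustrative sketch, not part of the patch.] The loop in qemuSetupCgroupForIOThreads pairs a 0-based pid array with 1-based IOThread ids and applies a pin only when a matching iothreadpin entry exists. That index arithmetic and lookup, sketched in Python (function name and dict shape are hypothetical):

```python
def setup_iothread_pins(niothreadpids, iothreadspin):
    """iothreadspin: {iothread_id: cpuset_string}, ids 1-based like
    "iothread1". Returns the pins that would actually be applied."""
    pinned = {}
    for i in range(niothreadpids):   # pid array is indexed 0..n-1
        iothreadid = i + 1           # IOThreads are numbered 1..n
        if iothreadid in iothreadspin:
            pinned[iothreadid] = iothreadspin[iothreadid]
    return pinned

# Only iothread2 has a pin definition, so only it gets pinned:
print(setup_iothread_pins(2, {2: "7,8"}))
# → {2: '7,8'}
```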

On 09/03/2014 06:15 PM, John Ferlan wrote:
> In order to support cpuset setting, introduce qemuSetupCgroupIOThreadsPin
> and qemuSetupCgroupForIOThreads to mimic the existing Vcpu APIs.
>
> These will support having an 'iothreadpin' element in the 'cputune' in
> order to pin named IOThreads to specific CPUs. The IOThread pin names
> will follow the IOThread naming scheme starting at 1 (eg "iothread1")
> up through and including the def->iothreads value.
>
> Signed-off-by: John Ferlan <jferlan@redhat.com>
> ---
>  src/qemu/qemu_cgroup.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++
>  src/qemu/qemu_cgroup.h |   5 +++
>  2 files changed, 109 insertions(+)

>  int
> +qemuSetupCgroupForIOThreads(virDomainObjPtr vm)
> +{

...

> +    }
> +
> +    for (i = 0; i < priv->niothreadpids; i++) {
> +        /* IOThreads are numbered 1..n, although the array is 0..n-1,
> +         * so we will laccount for that here

s/laccount/account/

> +         */
> +        if (virCgroupNewIOThread(priv->cgroup, i+1, true, &cgroup_iothread) < 0)
> +            goto cleanup;
> +
> +        /* move the thread for iothread to sub dir */
Jan

Modify qemuProcessStart() in order to allow setting affinity to specific
CPUs for IOThreads.  The process followed is similar to that for the
vCPUs.

This involves adding a function to fetch the IOThread id's via
qemuMonitorGetIOThreads() and adding them to the iothreadpids[] list.
Then making sure all the cgroup data has been properly set up and
finally assigning affinity.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/qemu/qemu_driver.c  |  8 +++++
 src/qemu/qemu_process.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 239a300..cb94b02 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -8733,6 +8733,14 @@ qemuDomainSetNumaParamsLive(virDomainObjPtr vm,
         virCgroupSetCpusetMems(priv->cgroup, nodeset_str) < 0)
         goto cleanup;
 
+    for (i = 0; i < priv->niothreadpids; i++) {
+        if (virCgroupNewIOThread(priv->cgroup, i, false, &cgroup_temp) < 0 ||
+            virCgroupSetCpusetMems(cgroup_temp, nodeset_str) < 0)
+            goto cleanup;
+        virCgroupFree(&cgroup_temp);
+    }
+
+
     ret = 0;
 
  cleanup:
     VIR_FREE(nodeset_str);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f76aa5a..3a02b9e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2089,6 +2089,49 @@ qemuProcessDetectVcpuPIDs(virQEMUDriverPtr driver,
 }
 
 
+static int
+qemuProcessDetectIOThreadPIDs(virQEMUDriverPtr driver,
+                              virDomainObjPtr vm,
+                              int asyncJob)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    qemuMonitorIOThreadsInfoPtr *iothreads = NULL;
+    int niothreads = 0;
+    int ret = -1;
+    size_t i;
+
+    /* Get the list of IOThreads from qemu */
+    if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
+        goto cleanup;
+    niothreads = qemuMonitorGetIOThreads(priv->mon, &iothreads);
+    qemuDomainObjExitMonitor(driver, vm);
+    if (niothreads <= 0)
+        goto cleanup;
+
+    if (niothreads != vm->def->iothreads) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("got wrong number of IOThread pids from QEMU monitor. "
+                         "got %d, wanted %d"),
+                       niothreads, vm->def->iothreads);
+        goto cleanup;
+    }
+
+    if (VIR_ALLOC_N(priv->iothreadpids, niothreads) < 0)
+        goto cleanup;
+    priv->niothreadpids = niothreads;
+
+    for (i = 0; i < priv->niothreadpids; i++)
+        priv->iothreadpids[i] = iothreads[i]->thread_id;
+
+    ret = 0;
+
+ cleanup:
+    for (i = 0; i < niothreads; i++)
+        qemuMonitorIOThreadsInfoFree(iothreads[i]);
+    VIR_FREE(iothreads);
+    return ret;
+}
+
 /* Helper to prepare cpumap for affinity setting, convert
  * NUMA nodeset into cpuset if @nodemask is not NULL, otherwise
  * just return a new allocated bitmap.
@@ -2281,6 +2324,41 @@ qemuProcessSetEmulatorAffinity(virDomainObjPtr vm)
     return ret;
 }
 
+/* Set CPU affinities for IOThreads threads. */
+static int
+qemuProcessSetIOThreadsAffinity(virDomainObjPtr vm)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    virDomainDefPtr def = vm->def;
+    virDomainVcpuPinDefPtr pininfo;
+    size_t i;
+    int ret = -1;
+
+    if (!def->cputune.niothreadspin)
+        return 0;
+
+    if (priv->iothreadpids == NULL) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       "%s", _("IOThread affinity is not supported"));
+        return -1;
+    }
+
+    for (i = 0; i < def->iothreads; i++) {
+        /* set affinity only for existing vcpus */
+        if (!(pininfo = virDomainVcpuPinFindByVcpu(def->cputune.iothreadspin,
+                                                   def->cputune.niothreadspin,
+                                                   i+1)))
+            continue;
+
+        if (virProcessSetAffinity(priv->iothreadpids[i], pininfo->cpumask) < 0)
+            goto cleanup;
+    }
+    ret = 0;
+
+ cleanup:
+    return ret;
+}
+
 static int
 qemuProcessInitPasswords(virConnectPtr conn,
                          virQEMUDriverPtr driver,
@@ -4268,6 +4346,10 @@ int qemuProcessStart(virConnectPtr conn,
     if (qemuProcessDetectVcpuPIDs(driver, vm, asyncJob) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Detecting IOThread PIDs");
+    if (qemuProcessDetectIOThreadPIDs(driver, vm, asyncJob) < 0)
+        goto cleanup;
+
     VIR_DEBUG("Setting cgroup for each VCPU (if required)");
     if (qemuSetupCgroupForVcpu(vm) < 0)
         goto cleanup;
@@ -4276,6 +4358,10 @@ int qemuProcessStart(virConnectPtr conn,
     if (qemuSetupCgroupForEmulator(driver, vm, nodemask) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Setting cgroup for each IOThread (if required)");
+    if (qemuSetupCgroupForIOThreads(vm) < 0)
+        goto cleanup;
+
     VIR_DEBUG("Setting VCPU affinities");
     if (qemuProcessSetVcpuAffinities(vm) < 0)
         goto cleanup;
@@ -4284,6 +4370,10 @@ int qemuProcessStart(virConnectPtr conn,
     if (qemuProcessSetEmulatorAffinity(vm) < 0)
         goto cleanup;
 
+    VIR_DEBUG("Setting affinity of IOThread threads");
+    if (qemuProcessSetIOThreadsAffinity(vm) < 0)
+        goto cleanup;
+
     VIR_DEBUG("Setting any required VM passwords");
     if (qemuProcessInitPasswords(conn, driver, vm, asyncJob) < 0)
         goto cleanup;
@@ -4697,6 +4787,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
     VIR_FREE(priv->vcpupids);
     priv->nvcpupids = 0;
+    VIR_FREE(priv->iothreadpids);
+    priv->niothreadpids = 0;
     virObjectUnref(priv->qemuCaps);
     priv->qemuCaps = NULL;
     VIR_FREE(priv->pidfile);
@@ -4889,6 +4981,10 @@ int qemuProcessAttach(virConnectPtr conn ATTRIBUTE_UNUSED,
     if (qemuProcessDetectVcpuPIDs(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
         goto error;
 
+    VIR_DEBUG("Detecting IOThread PIDs");
+    if (qemuProcessDetectIOThreadPIDs(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
+        goto error;
+
     /* If we have -device, then addresses are assigned explicitly.
      * If not, then we have to detect dynamic ones here */
     if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) {
-- 
1.9.3
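[Editor's illustrative sketch, not part of the patch.] qemuProcessDetectIOThreadPIDs rejects the monitor reply unless the number of IOThreads QEMU reports matches the domain definition, then flattens the reply into a pid array. That check-and-flatten step, sketched in Python (function name and exception type are hypothetical):

```python
def detect_iothread_pids(monitor_reply, expected):
    """monitor_reply: list of (name, thread_id) pairs, as produced by
    query-iothreads. Returns the pid array kept in iothreadpids[]."""
    if len(monitor_reply) != expected:
        raise RuntimeError(
            "got wrong number of IOThread pids from QEMU monitor. "
            "got %d, wanted %d" % (len(monitor_reply), expected))
    return [thread_id for _name, thread_id in monitor_reply]

print(detect_iothread_pids([("iothread1", 30992), ("iothread2", 30993)], 2))
# → [30992, 30993]
```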

https://bugzilla.redhat.com/show_bug.cgi?id=1101574

Add an option 'iothreadpin' to the <cputune> to allow for setting the
CPU affinity for each IOThread.

The iothreadspin will mimic the vcpupin with respect to being able to
assign each iothread to a specific CPU, although iothread ids start at 1
while vcpu ids start at 0.  This matches the iothread naming scheme.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 docs/formatdomain.html.in                          |  18 +++
 docs/schemas/domaincommon.rng                      |  10 ++
 src/conf/domain_conf.c                             | 124 +++++++++++++++++++--
 src/conf/domain_conf.h                             |   2 +
 .../qemuxml2argv-cputune-iothreads.xml             |  38 +++++++
 tests/qemuxml2xmltest.c                            |   1 +
 6 files changed, 182 insertions(+), 11 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 94236dd..7859076 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -508,6 +508,8 @@
     <vcpupin vcpu="2" cpuset="2,3"/>
     <vcpupin vcpu="3" cpuset="0,4"/>
     <emulatorpin cpuset="1-3"/>
+    <iothreadpin iothread="1" cpuset="5,6"/>
+    <iothreadpin iothread="2" cpuset="7,8"/>
     <shares>2048</shares>
     <period>1000000</period>
     <quota>-1</quota>
@@ -549,6 +551,22 @@
       attribute <code>placement</code> of element <code>vcpu</code> is
       "auto".
      </dd>
+     <dt><code>iothreadpin</code></dt>
+     <dd>
+      The optional <code>iothreadpin</code> element specifies which of host
+      physical CPUs the IOThreads will be pinned to. If this is omitted
+      and attribute <code>cpuset</code> of element <code>vcpu</code> is
+      not specified, the IOThreads are pinned to all the physical CPUs
+      by default. There are two required attributes, the attribute
+      <code>iothread</code> specifies the IOThread id and the attribute
+      <code>cpuset</code> specifying which physical CPUs to pin to. The
+      <code>iothread</code> value begins at "1" through the number of
+      <a href="#elementsIOThreadsAllocation"><code>iothreads</code></a>
+      allocated to the domain. A value of "0" is not permitted.
+      NB, <code>iothreadpin</code> is not allowed if attribute
+      <code>placement</code> of element <code>vcpu</code> is "auto".
+      <span class="since">Since 1.2.9</span>
+     </dd>
      <dt><code>shares</code></dt>
      <dd>
       The optional <code>shares</code> element specifies the proportional
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index cedceae..9dc8dfb 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -782,6 +782,16 @@
           </attribute>
         </element>
       </optional>
+      <zeroOrMore>
+        <element name="iothreadpin">
+          <attribute name="iothread">
+            <ref name="unsignedInt"/>
+          </attribute>
+          <attribute name="cpuset">
+            <ref name="cpuset"/>
+          </attribute>
+        </element>
+      </zeroOrMore>
     </element>
   </define>
 
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 53ef694..88128f7 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -2150,6 +2150,9 @@ void virDomainDefFree(virDomainDefPtr def)
 
     virDomainVcpuPinDefFree(def->cputune.emulatorpin);
 
+    virDomainVcpuPinDefArrayFree(def->cputune.iothreadspin,
+                                 def->cputune.niothreadspin);
+
     virDomainNumatuneFree(def->numatune);
 
     virSysinfoDefFree(def->sysinfo);
@@ -11393,6 +11396,9 @@ virDomainPanicDefParseXML(xmlNodePtr node)
  * and emulatorpin has the form of
  *   <emulatorpin cpuset='0'/>
  *
+ * and an iothreadspin has the form
+ *   <iothreadpin iothread='1' cpuset='2'/>
+ *
  * A vcpuid of -1 is valid and only valid for emulatorpin. So callers
  * have to check the returned cpuid for validity.
  */
@@ -11400,11 +11406,13 @@ static virDomainVcpuPinDefPtr
 virDomainVcpuPinDefParseXML(xmlNodePtr node,
                             xmlXPathContextPtr ctxt,
                             int maxvcpus,
-                            bool emulator)
+                            bool emulator,
+                            bool iothreads)
 {
     virDomainVcpuPinDefPtr def;
     xmlNodePtr oldnode = ctxt->node;
     int vcpuid = -1;
+    unsigned int iothreadid;
     char *tmp = NULL;
     int ret;
 
@@ -11413,7 +11421,7 @@ virDomainVcpuPinDefParseXML(xmlNodePtr node,
 
     ctxt->node = node;
 
-    if (!emulator) {
+    if (!emulator && !iothreads) {
         ret = virXPathInt("string(./@vcpu)", ctxt, &vcpuid);
         if ((ret == -2) || (vcpuid < -1)) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -11434,10 +11442,41 @@ virDomainVcpuPinDefParseXML(xmlNodePtr node,
         def->vcpuid = vcpuid;
     }
 
+    if (iothreads) {
+        tmp = virXPathString("string(./@iothread)", ctxt);
+        if (tmp && virStrToLong_uip(tmp, NULL, 10, &iothreadid) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("invalid setting for iothread '%s'"), tmp);
+            goto error;
+        }
+
+        if (iothreadid == 0) {
+            virReportError(VIR_ERR_XML_ERROR, "%s",
+                           _("zero is an invalid iothread id value"));
+            goto error;
+        }
+
+        /* NB: maxvcpus is actually def->iothreads
+         * IOThreads are numbered "iothread1...iothread<n>", where
+         * "n" is the iothreads value
+         */
+        if (iothreadid > maxvcpus) {
+            virReportError(VIR_ERR_XML_ERROR, "%s",
+                           _("iothread id must not exceed iothreads"));
+            goto error;
+        }
+
+        /* Rather than creating our own structure we are reusing the vCPU */
+        def->vcpuid = iothreadid;
+    }
+
     if (!(tmp = virXMLPropString(node, "cpuset"))) {
         if (emulator)
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("missing cpuset for emulatorpin"));
+        else if (iothreads)
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("missing cpuset for iothreadpin"));
         else
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("missing cpuset for vcpupin"));
@@ -12080,7 +12119,7 @@ virDomainDefParseXML(xmlDocPtr xml,
         for (i = 0; i < n; i++) {
             virDomainVcpuPinDefPtr vcpupin = NULL;
             vcpupin = virDomainVcpuPinDefParseXML(nodes[i], ctxt,
-                                                  def->maxvcpus, false);
+                                                  def->maxvcpus, false, false);
 
             if (!vcpupin)
                 goto error;
@@ -12155,8 +12194,9 @@ virDomainDefParseXML(xmlDocPtr xml,
             goto error;
         }
 
-        def->cputune.emulatorpin = virDomainVcpuPinDefParseXML(nodes[0], ctxt,
-                                                               0, true);
+        def->cputune.emulatorpin = virDomainVcpuPinDefParseXML(nodes[0],
+                                                               ctxt, 0,
+                                                               true, false);
 
         if (!def->cputune.emulatorpin)
             goto error;
@@ -12166,6 +12206,49 @@ virDomainDefParseXML(xmlDocPtr xml,
     }
     VIR_FREE(nodes);
 
+
+    if ((n = virXPathNodeSet("./cputune/iothreadpin", ctxt, &nodes)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("cannot extract iothreadpin nodes"));
+        goto error;
+    }
+
+    /* Ignore iothreadpin if <vcpu> placement is "auto", they
+     * conflict with each other, and <vcpu> placement can't be
+     * simply ignored, as <numatune>'s placement defaults to it.
+     */
+    if (n) {
+        if (def->placement_mode != VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO) {
+            if (VIR_ALLOC_N(def->cputune.iothreadspin, n) < 0)
+                goto error;
+
+            for (i = 0; i < n; i++) {
+                virDomainVcpuPinDefPtr iothreadpin = NULL;
+                iothreadpin = virDomainVcpuPinDefParseXML(nodes[i], ctxt,
+                                                          def->iothreads,
+                                                          false, true);
+                if (!iothreadpin)
+                    goto error;
+
+                if (virDomainVcpuPinIsDuplicate(def->cputune.iothreadspin,
+                                                def->cputune.niothreadspin,
+                                                iothreadpin->vcpuid)) {
+                    virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                                   _("duplicate iothreadpin for same iothread"));
+                    virDomainVcpuPinDefFree(iothreadpin);
+                    goto error;
+                }
+
+                def->cputune.iothreadspin[def->cputune.niothreadspin++] =
+                    iothreadpin;
+            }
+        } else {
+            VIR_WARN("Ignore iothreadpin for <vcpu> placement is 'auto'");
+        }
+    }
+    VIR_FREE(nodes);
+
+
     /* analysis of cpu handling */
     if ((node = virXPathNode("./cpu[1]", ctxt)) != NULL) {
         xmlNodePtr oldnode = ctxt->node;
@@ -17818,6 +17901,7 @@ virDomainDefFormatInternal(virDomainDefPtr def,
     int n;
     size_t i;
     bool blkio = false;
+    bool cputune = false;
 
     virCheckFlags(DUMPXML_FLAGS |
                   VIR_DOMAIN_XML_INTERNAL_STATUS |
@@ -18009,6 +18093,8 @@
virDomainDefFormatInternal(virDomainDefPtr def, (def->cputune.nvcpupin && !virDomainIsAllVcpupinInherited(def)) || def->cputune.period || def->cputune.quota || def->cputune.emulatorpin || - def->cputune.emulator_period || def->cputune.emulator_quota) + def->cputune.emulator_period || def->cputune.emulator_quota || + def->cputune.niothreadspin) { virBufferAddLit(buf, "<cputune>\n"); + cputune = true; + } virBufferAdjustIndent(buf, 2); if (def->cputune.sharesSpecified) @@ -18061,12 +18148,27 @@ virDomainDefFormatInternal(virDomainDefPtr def, virBufferAsprintf(buf, "cpuset='%s'/>\n", cpumask); VIR_FREE(cpumask); } + + for (i = 0; i < def->cputune.niothreadspin; i++) { + char *cpumask; + /* Ignore the iothreadpin which inherit from "cpuset of "<vcpu>." */ + if (def->cpumask && + virBitmapEqual(def->cpumask, + def->cputune.iothreadspin[i]->cpumask)) + continue; + + virBufferAsprintf(buf, "<iothreadpin iothread='%u' ", + def->cputune.iothreadspin[i]->vcpuid); + + if (!(cpumask = virBitmapFormat(def->cputune.iothreadspin[i]->cpumask))) + goto error; + + virBufferAsprintf(buf, "cpuset='%s'/>\n", cpumask); + VIR_FREE(cpumask); + } + virBufferAdjustIndent(buf, -2); - if (def->cputune.sharesSpecified || - (def->cputune.nvcpupin && !virDomainIsAllVcpupinInherited(def)) || - def->cputune.period || def->cputune.quota || - def->cputune.emulatorpin || - def->cputune.emulator_period || def->cputune.emulator_quota) + if (cputune) virBufferAddLit(buf, "</cputune>\n"); if (virDomainNumatuneFormatXML(buf, def->numatune) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 9586c3b..25ff031 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1928,6 +1928,8 @@ struct _virDomainDef { size_t nvcpupin; virDomainVcpuPinDefPtr *vcpupin; virDomainVcpuPinDefPtr emulatorpin; + size_t niothreadspin; + virDomainVcpuPinDefPtr *iothreadspin; } cputune; virDomainNumatunePtr numatune; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml 
b/tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml new file mode 100644 index 0000000..435d0ae --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml @@ -0,0 +1,38 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>2</vcpu> + <iothreads>2</iothreads> + <cputune> + <shares>2048</shares> + <period>1000000</period> + <quota>-1</quota> + <vcpupin vcpu='0' cpuset='0'/> + <vcpupin vcpu='1' cpuset='1'/> + <emulatorpin cpuset='1'/> + <iothreadpin iothread='1' cpuset='2'/> + <iothreadpin iothread='2' cpuset='3'/> + </cputune> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index b4ab671..4f327fc 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -302,6 +302,7 @@ mymain(void) DO_TEST("smp"); DO_TEST("iothreads"); + DO_TEST("cputune-iothreads"); DO_TEST("iothreads-disk"); DO_TEST("lease"); DO_TEST("event_idx"); -- 1.9.3

On 09/03/2014 06:15 PM, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1101574
Add an option 'iothreadpin' to <cputune> to allow setting the CPU affinity for each IOThread.
The iothreadpin element mimics vcpupin in allowing each iothread to be assigned to a specific CPU, although iothread ids start at 1 while vcpu ids start at 0. This matches the iothread naming scheme.
Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 docs/formatdomain.html.in                          |  18 +++
 docs/schemas/domaincommon.rng                      |  10 ++
 src/conf/domain_conf.c                             | 124 +++++++++++++++++++--
 src/conf/domain_conf.h                             |   2 +
 .../qemuxml2argv-cputune-iothreads.xml             |  38 +++++++
 tests/qemuxml2xmltest.c                            |   1 +
 6 files changed, 182 insertions(+), 11 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml
@@ -11434,10 +11442,41 @@ virDomainVcpuPinDefParseXML(xmlNodePtr node,
         def->vcpuid = vcpuid;
     }
+    if (iothreads) {
+        tmp = virXPathString("string(./@iothread)", ctxt);
+        if (tmp && virStrToLong_uip(tmp, NULL, 10, &iothreadid) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("invalid setting for iothread '%s'"), tmp);
+            goto error;
+        }
VIR_FREE(tmp);
+
+        if (iothreadid == 0) {
+            virReportError(VIR_ERR_XML_ERROR, "%s",
+                           _("zero is an invalid iothread id value"));
Jan

On 09/03/2014 10:15 AM, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1101574
Add an option 'iothreadpin' to <cputune> to allow setting the CPU affinity for each IOThread.
The iothreadpin element mimics vcpupin in allowing each iothread to be assigned to a specific CPU, although iothread ids start at 1 while vcpu ids start at 0. This matches the iothread naming scheme.
Signed-off-by: John Ferlan <jferlan@redhat.com>
---
+++ b/docs/schemas/domaincommon.rng
@@ -782,6 +782,16 @@
         </attribute>
       </element>
     </optional>
+    <zeroOrMore>
+      <element name="iothreadpin">
+        <attribute name="iothread">
+          <ref name="unsignedInt"/>
+        </attribute>
+        <attribute name="cpuset">
+          <ref name="cpuset"/>
+        </attribute>
+      </element>
+    </zeroOrMore>
   </element>
 </define>
We ought to do a followup patch that adds an <interleave>, so that all
sub-elements of <cputune> can come in any order.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
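[A sketch of the followup Eric suggests - hypothetical, not part of this series; the existing shares/period/quota/vcpupin/emulatorpin patterns would simply move inside the wrapper unchanged:

```xml
<define name="cputune">
  <element name="cputune">
    <interleave>
      <!-- existing shares/period/quota/vcpupin/emulatorpin
           patterns would sit here, unchanged -->
      <zeroOrMore>
        <element name="iothreadpin">
          <attribute name="iothread">
            <ref name="unsignedInt"/>
          </attribute>
          <attribute name="cpuset">
            <ref name="cpuset"/>
          </attribute>
        </element>
      </zeroOrMore>
    </interleave>
  </element>
</define>
```

With <interleave>, RELAX NG accepts the child elements in any document order rather than the order they appear in the grammar.]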

On Wed, Sep 3, 2014 at 5:15 PM, John Ferlan <jferlan@redhat.com> wrote:
The following patches add more IOThreads support that was not completed in the initial patches. These changes support the remaining elements of bz https://bugzilla.redhat.com/show_bug.cgi?id=1101574 (working through unsetting the private bits - as there's nothing in there that should necessarily be private).
Changes:
1. Add "--iothread" option to virsh attach-disk to allow using a specific iothread when attaching a disk
2. Add the ability to set CPU affinity for an IOThread. This involves multiple steps (patches 2-6) of adding the infrastructure to support setting scheduler affinity for the IOThread including cgroup setup. For the most part it's a "copy" of the vCPU pinning code, but without the external interfaces - those will come at a later time after RHEL7.1.
3. Add to <cputune/> a new element <iothreadpin iothread='#' cpuset='string'/>
NOTE: I can combine any/all of patches 2-6 - I just kept them separate so it wasn't a larger single patch.
Although future changes will support API's and virsh commands to modify the iothreadpin, this set of changes at least is backportable to RHEL7.1 since there are no external API changes.
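[For context, the new element from item 3 nests under <cputune> alongside vcpupin and emulatorpin; a minimal illustrative fragment - cpuset values arbitrary, mirroring the test file added in patch 7:

```xml
<vcpu placement='static'>2</vcpu>
<iothreads>2</iothreads>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <emulatorpin cpuset='1'/>
  <!-- iothread ids start at 1 and may not exceed the <iothreads> count -->
  <iothreadpin iothread='1' cpuset='2'/>
  <iothreadpin iothread='2' cpuset='3'/>
</cputune>
```
]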
John Ferlan (7):
  virsh: Add iothread to 'attach-disk'
  qemu: Issue query-iothreads to get list of active IOThreads
  vircgroup: Introduce virCgroupNewIOThread
  qemu_domain: Add niothreadpids and iothreadpids
  qemu_cgroup: Introduce cgroup functions for IOThreads
  qemu: Allow pinning specific IOThreads to a CPU
  domain_conf: Add iothreadpin to cputune
 docs/formatdomain.html.in                          |  18 +++
 docs/schemas/domaincommon.rng                      |  10 ++
 src/conf/domain_conf.c                             | 124 +++++++++++++++++++--
 src/conf/domain_conf.h                             |   2 +
 src/libvirt_private.syms                           |   1 +
 src/qemu/qemu_cgroup.c                             | 104 +++++++++++++++++
 src/qemu/qemu_cgroup.h                             |   5 +
 src/qemu/qemu_domain.c                             |  36 ++++++
 src/qemu/qemu_domain.h                             |   3 +
 src/qemu/qemu_driver.c                             |   8 ++
 src/qemu/qemu_monitor.c                            |  41 +++++++
 src/qemu/qemu_monitor.h                            |  12 ++
 src/qemu/qemu_monitor_json.c                       |  91 +++++++++++++++
 src/qemu/qemu_monitor_json.h                       |   4 +
 src/qemu/qemu_process.c                            |  96 ++++++++++++++++
 src/util/vircgroup.c                               |  43 +++++++
 src/util/vircgroup.h                               |   6 +
 tests/qemumonitorjsontest.c                        |  71 ++++++++++++
 .../qemuxml2argv-cputune-iothreads.xml             |  38 +++++++
 tests/qemuxml2xmltest.c                            |   1 +
 tools/virsh-domain.c                               |  13 ++-
 tools/virsh.pod                                    |   5 +-
 22 files changed, 717 insertions(+), 15 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml
Looks good

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

On 09/03/2014 06:15 PM, John Ferlan wrote:
The following patches add more IOThreads support that was not completed in the initial patches. These changes support the remaining elements of bz https://bugzilla.redhat.com/show_bug.cgi?id=1101574 (working through unsetting the private bits - as there's nothing in there that should necessarily be private).
Changes:
1. Add "--iothread" option to virsh attach-disk to allow using a specific iothread when attaching a disk
2. Add the ability to set CPU affinity for an IOThread. This involves multiple steps (patches 2-6) of adding the infrastructure to support setting scheduler affinity for the IOThread including cgroup setup. For the most part it's a "copy" of the vCPU pinning code, but without the external interfaces - those will come at a later time after RHEL7.1.
3. Add to <cputune/> a new element <iothreadpin iothread='#' cpuset='string'/>
NOTE: I can combine any/all of patches 2-6 - I just kept them separate so it wasn't a larger single patch.
IMO the split is fine as-is (maybe with the exception of the little cleanup in 5/7).
John Ferlan (7):
  virsh: Add iothread to 'attach-disk'
  qemu: Issue query-iothreads to get list of active IOThreads
  vircgroup: Introduce virCgroupNewIOThread
  qemu_domain: Add niothreadpids and iothreadpids
  qemu_cgroup: Introduce cgroup functions for IOThreads
  qemu: Allow pinning specific IOThreads to a CPU
  domain_conf: Add iothreadpin to cputune
...
 22 files changed, 717 insertions(+), 15 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml
Looks good to me as well.

ACK series.

Jan

On 09/03/2014 12:15 PM, John Ferlan wrote: <...snip...>
John Ferlan (7):
  virsh: Add iothread to 'attach-disk'
  qemu: Issue query-iothreads to get list of active IOThreads
  vircgroup: Introduce virCgroupNewIOThread
  qemu_domain: Add niothreadpids and iothreadpids
  qemu_cgroup: Introduce cgroup functions for IOThreads
  qemu: Allow pinning specific IOThreads to a CPU
  domain_conf: Add iothreadpin to cputune
 docs/formatdomain.html.in                          |  18 +++
 docs/schemas/domaincommon.rng                      |  10 ++
 src/conf/domain_conf.c                             | 124 +++++++++++++++++++--
 src/conf/domain_conf.h                             |   2 +
 src/libvirt_private.syms                           |   1 +
 src/qemu/qemu_cgroup.c                             | 104 +++++++++++++++++
 src/qemu/qemu_cgroup.h                             |   5 +
 src/qemu/qemu_domain.c                             |  36 ++++++
 src/qemu/qemu_domain.h                             |   3 +
 src/qemu/qemu_driver.c                             |   8 ++
 src/qemu/qemu_monitor.c                            |  41 +++++++
 src/qemu/qemu_monitor.h                            |  12 ++
 src/qemu/qemu_monitor_json.c                       |  91 +++++++++++++++
 src/qemu/qemu_monitor_json.h                       |   4 +
 src/qemu/qemu_process.c                            |  96 ++++++++++++++++
 src/util/vircgroup.c                               |  43 +++++++
 src/util/vircgroup.h                               |   6 +
 tests/qemumonitorjsontest.c                        |  71 ++++++++++++
 .../qemuxml2argv-cputune-iothreads.xml             |  38 +++++++
 tests/qemuxml2xmltest.c                            |   1 +
 tools/virsh-domain.c                               |  13 ++-
 tools/virsh.pod                                    |   5 +-
 22 files changed, 717 insertions(+), 15 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cputune-iothreads.xml
Thanks for the reviews - along with the VIR_FREE(tmp) in 7, I also squashed
and pushed the following into patches 6 & 7 respectively, due to a
dogfooding experiment with my Coverity scan.

John

Patch 6: (would have been a NEGATIVE_RETURNS due to niothreads)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index cd1b431..6c412db 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2131,9 +2131,11 @@ qemuProcessDetectIOThreadPIDs(virQEMUDriverPtr driver,
     ret = 0;

 cleanup:
-    for (i = 0; i < niothreads; i++)
-        qemuMonitorIOThreadsInfoFree(iothreads[i]);
-    VIR_FREE(iothreads);
+    if (iothreads) {
+        for (i = 0; i < niothreads; i++)
+            qemuMonitorIOThreadsInfoFree(iothreads[i]);
+        VIR_FREE(iothreads);
+    }
     return ret;
 }

Patch 7: (iothreadid was uninitialized if tmp was NULL)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index bb09720..9cb3ebd 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -11513,9 +11513,8 @@ virDomainVcpuPinDefParseXML(xmlNodePtr node,
         def->vcpuid = vcpuid;
     }

-    if (iothreads) {
-        tmp = virXPathString("string(./@iothread)", ctxt);
-        if (tmp && virStrToLong_uip(tmp, NULL, 10, &iothreadid) < 0) {
+    if (iothreads && (tmp = virXPathString("string(./@iothread)", ctxt))) {
+        if (virStrToLong_uip(tmp, NULL, 10, &iothreadid) < 0) {
             virReportError(VIR_ERR_XML_ERROR,
                            _("invalid setting for iothread '%s'"), tmp);
             goto error;

On 09/03/2014 10:15 AM, John Ferlan wrote:
The following patches add more IOThreads support that was not completed in the initial patches. These changes support the remaining elements of bz https://bugzilla.redhat.com/show_bug.cgi?id=1101574 (working through unsetting the private bits - as there's nothing in there that should necessarily be private).
Changes:
1. Add "--iothread" option to virsh attach-disk to allow using a specific iothread when attaching a disk
2. Add the ability to set CPU affinity for an IOThread. This involves multiple steps (patches 2-6) of adding the infrastructure to support setting scheduler affinity for the IOThread including cgroup setup. For the most part it's a "copy" of the vCPU pinning code, but without the external interfaces - those will come at a later time after RHEL7.1.
3. Add to <cputune/> a new element <iothreadpin iothread='#' cpuset='string'/>
NOTE: I can combine any/all of patches 2-6 - I just kept them separate so it wasn't a larger single patch.
Although future changes will support API's and virsh commands to modify the iothreadpin, this set of changes at least is backportable to RHEL7.1 since there are no external API changes.
John Ferlan (7):
  virsh: Add iothread to 'attach-disk'
  qemu: Issue query-iothreads to get list of active IOThreads
  vircgroup: Introduce virCgroupNewIOThread
  qemu_domain: Add niothreadpids and iothreadpids
Up to this patch, things worked for me,
  qemu_cgroup: Introduce cgroup functions for IOThreads
  qemu: Allow pinning specific IOThreads to a CPU
but with these two patches, I get a compilation failure:

qemu/qemu_cgroup.c: In function 'qemuSetupCgroupForIOThreads':
qemu/qemu_cgroup.c:1160:41: error: 'struct <anonymous>' has no member named 'niothreadspin'
             for (j = 0; j < def->cputune.niothreadspin; j++) {
                                         ^
  domain_conf: Add iothreadpin to cputune
and with this patch, I'm now unable to start a transient domain, with a
less-than-stellar message:

$ qemu-img create -f qcow2 -o compat=0.10 base.img 10M
$ virsh create /dev/stdin <<EOF
<domain type='kvm'>
  <name>testvm1</name>
  <memory unit='MiB'>256</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='$PWD/base.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <graphics type='vnc'/>
  </devices>
</domain>
EOF
error: Failed to create domain from /dev/stdin
error: An error occurred, but the cause is unknown

Normally, that command starts up a guest (yeah, the guest is doing nothing
but wasting CPU cycles, because it has a blank disk, but it's a great
little smoke test for my recent work in block jobs)

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 09/15/2014 07:06 PM, Eric Blake wrote:
On 09/03/2014 10:15 AM, John Ferlan wrote:
The following patches add more IOThreads support that was not completed in the initial patches. These changes support the remaining elements of bz https://bugzilla.redhat.com/show_bug.cgi?id=1101574 (working through unsetting the private bits - as there's nothing in there that should necessarily be private).
Changes:
1. Add "--iothread" option to virsh attach-disk to allow using a specific iothread when attaching a disk
2. Add the ability to set CPU affinity for an IOThread. This involves multiple steps (patches 2-6) of adding the infrastructure to support setting scheduler affinity for the IOThread including cgroup setup. For the most part it's a "copy" of the vCPU pinning code, but without the external interfaces - those will come at a later time after RHEL7.1.
3. Add to <cputune/> a new element <iothreadpin iothread='#' cpuset='string'/>
NOTE: I can combine any/all of patches 2-6 - I just kept them separate so it wasn't a larger single patch.
Although future changes will support API's and virsh commands to modify the iothreadpin, this set of changes at least is backportable to RHEL7.1 since there are no external API changes.
John Ferlan (7):
  virsh: Add iothread to 'attach-disk'
  qemu: Issue query-iothreads to get list of active IOThreads
  vircgroup: Introduce virCgroupNewIOThread
  qemu_domain: Add niothreadpids and iothreadpids
Up to this patch, things worked for me,
Hmm... not sure what happened, because I usually do go through these one at
a time before sending them out - of course that was Sep 3, before my brain
was flooded with Coverity issues, so I don't quite remember everything I
did. I think adding the cgroup code was something I did last, and I figured
I could rearrange things, but I guess I failed to check whether the
rearranging worked. In any case, the domain_conf.h change from patch 7
should have been added before/with "qemu_cgroup: Introduce cgroup functions
for IOThreads". Doing that resolves the build issue... and if I
build/install from there I can run the code below for the first patch, but
not the second one. The second one fails because of:

    if (niothreads <= 0)
        goto cleanup;

However, changing it to:

    if (niothreads < 0)
        goto cleanup;
    if (niothreads == 0)
        return 0;

resolves that. Now this is even more perplexing because I'm quite certain I
tested the path of starting a guest without iothreads configured, but
apparently that doesn't seem to be the case.

Patches coming... <sigh>

John
  qemu_cgroup: Introduce cgroup functions for IOThreads
  qemu: Allow pinning specific IOThreads to a CPU
but with these two patches, I get a compilation failure:
qemu/qemu_cgroup.c: In function 'qemuSetupCgroupForIOThreads':
qemu/qemu_cgroup.c:1160:41: error: 'struct <anonymous>' has no member named 'niothreadspin'
             for (j = 0; j < def->cputune.niothreadspin; j++) {
                                         ^
  domain_conf: Add iothreadpin to cputune
and with this patch, I'm now unable to start a transient domain, with a less-than-stellar message:
$ qemu-img create -f qcow2 -o compat=0.10 base.img 10M
$ virsh create /dev/stdin <<EOF
<domain type='kvm'>
  <name>testvm1</name>
  <memory unit='MiB'>256</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='$PWD/base.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <graphics type='vnc'/>
  </devices>
</domain>
EOF
error: Failed to create domain from /dev/stdin
error: An error occurred, but the cause is unknown
Normally, that command starts up a guest (yeah, the guest is doing nothing but wasting CPU cycles, because it has a blank disk, but it's a great little smoke test for my recent work in block jobs)
Sounds like this smoke test needs to be added to the list of make check tests somehow...
participants (4):
- Eric Blake
- John Ferlan
- Ján Tomko
- Stefan Hajnoczi