[libvirt] [PATCH 0/6 v3] Add blkio cgroup support

Hi,

This patchset adds blkio cgroup support for qemu and lxc.

[PATCH 1/6] cgroup: Enable cgroup hierarchy for blkio cgroup
[PATCH 2/6 v3] cgroup: Implement blkio.weight tuning API.
[PATCH 3/6] Update XML Schema for new entries.
[PATCH 4/6 v3] qemu: Implement blkio tunable XML configuration and parsing.
[PATCH 5/6 v3] LXC: LXC Blkio weight configuration support.
[PATCH 6/6] Add documentation for blkiotune elements.

Will post a patchset to implement a virsh command "blkiotune" to tune blkio cgroup parameters later on.

v2 -> v3 changes:
 o Remove an unused local variable
 o Rename virCgroup(Set/Get)Weight to virCgroup(Set/Get)BlkioWeight
 o Add documentation in docs/formatdomain.html.in
 o Update XML Schema for new entries.

 docs/formatdomain.html.in |   10 ++++++++++
 docs/schemas/domain.rng   |   20 ++++++++++++++++++++
 src/conf/domain_conf.c    |   13 +++++++++++++
 src/conf/domain_conf.h    |    4 ++++
 src/libvirt_private.syms  |    2 ++
 src/lxc/lxc_controller.c  |   10 ++++++++++
 src/qemu/qemu_cgroup.c    |   16 +++++++++++++++-
 src/qemu/qemu_conf.c      |    3 ++-
 src/util/cgroup.c         |   41 ++++++++++++++++++++++++++++++++++++++++-
 src/util/cgroup.h         |    4 ++++
 10 files changed, 120 insertions(+), 3 deletions(-)

Thanks,
Gui

Enable cgroup hierarchy for blkio cgroup

Acked-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/util/cgroup.c |    2 +-
 src/util/cgroup.h |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/src/util/cgroup.c b/src/util/cgroup.c
index cd9caba..309f4e9 100644
--- a/src/util/cgroup.c
+++ b/src/util/cgroup.c
@@ -37,7 +37,7 @@ VIR_ENUM_IMPL(virCgroupController, VIR_CGROUP_CONTROLLER_LAST,
               "cpu", "cpuacct", "cpuset", "memory", "devices",
-              "freezer");
+              "freezer", "blkio");

 struct virCgroupController {
     int type;

diff --git a/src/util/cgroup.h b/src/util/cgroup.h
index 964da7a..67b1299 100644
--- a/src/util/cgroup.h
+++ b/src/util/cgroup.h
@@ -22,6 +22,7 @@ enum {
     VIR_CGROUP_CONTROLLER_MEMORY,
     VIR_CGROUP_CONTROLLER_DEVICES,
     VIR_CGROUP_CONTROLLER_FREEZER,
+    VIR_CGROUP_CONTROLLER_BLKIO,

     VIR_CGROUP_CONTROLLER_LAST
 };
-- 
1.7.1

On 02/07/2011 11:50 PM, Gui Jianfeng wrote:
Enable cgroup hierarchy for blkio cgroup
Acked-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/util/cgroup.c |    2 +-
 src/util/cgroup.h |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)
ACK.

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Implement blkio.weight tuning API.

Acked-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/libvirt_private.syms |    2 ++
 src/util/cgroup.c        |   39 +++++++++++++++++++++++++++++++++++++++
 src/util/cgroup.h        |    3 +++
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 88e270c..490901e 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -77,6 +77,8 @@ virCgroupMounted;
 virCgroupRemove;
 virCgroupSetCpuShares;
 virCgroupSetFreezerState;
+virCgroupSetBlkioWeight;
+virCgroupGetBlkioWeight;
 virCgroupSetMemory;
 virCgroupSetMemoryHardLimit;
 virCgroupSetMemorySoftLimit;

diff --git a/src/util/cgroup.c b/src/util/cgroup.c
index 309f4e9..bb3f334 100644
--- a/src/util/cgroup.c
+++ b/src/util/cgroup.c
@@ -851,6 +851,45 @@ int virCgroupForDomain(virCgroupPtr driver ATTRIBUTE_UNUSED,
 #endif

 /**
+ * virCgroupSetBlkioWeight:
+ *
+ * @group: The cgroup to change io weight for
+ * @weight: The Weight for this cgroup
+ *
+ * Returns: 0 on success
+ */
+int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned long weight)
+{
+    if (weight > 1000 || weight < 100)
+        return -EINVAL;
+
+    return virCgroupSetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_BLKIO,
+                                "blkio.weight",
+                                weight);
+}
+
+/**
+ * virCgroupGetBlkioWeight:
+ *
+ * @group: The cgroup to get weight for
+ * @Weight: Pointer to returned weight
+ *
+ * Returns: 0 on success
+ */
+int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned long *weight)
+{
+    long long unsigned int __weight;
+    int ret;
+    ret = virCgroupGetValueU64(group,
+                               VIR_CGROUP_CONTROLLER_BLKIO,
+                               "blkio.weight", &__weight);
+    if (ret == 0)
+        *weight = (unsigned long) __weight;
+    return ret;
+}
+
+/**
  * virCgroupSetMemory:
  *
  * @group: The cgroup to change memory for

diff --git a/src/util/cgroup.h b/src/util/cgroup.h
index 67b1299..f1a47dc 100644
--- a/src/util/cgroup.h
+++ b/src/util/cgroup.h
@@ -41,6 +41,9 @@ int virCgroupForDomain(virCgroupPtr driver,

 int virCgroupAddTask(virCgroupPtr group, pid_t pid);

+int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned long weight);
+int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned long *weight);
+
 int virCgroupSetMemory(virCgroupPtr group, unsigned long long kb);
 int virCgroupGetMemoryUsage(virCgroupPtr group, unsigned long *kb);
-- 
1.7.1
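To make the valid range concrete, here is a minimal, self-contained sketch of just the [100, 1000] validation that virCgroupSetBlkioWeight performs above. The helper name `check_blkio_weight` is my own; this is an illustration, not libvirt code, and it does not touch a real cgroup filesystem:

```c
/* Standalone sketch of the blkio.weight range check (hypothetical helper,
 * mirroring the validation in virCgroupSetBlkioWeight; not libvirt code). */
#include <assert.h>
#include <errno.h>

static int check_blkio_weight(unsigned int weight)
{
    /* The CFQ-based blkio controller accepts weights in [100, 1000] only. */
    if (weight > 1000 || weight < 100)
        return -EINVAL;
    return 0;
}
```

Callers then propagate the negative errno upward, which is why the qemu and LXC patches later in this series report errors with `virReportSystemError(-rc, ...)`.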

On 02/07/2011 11:56 PM, Gui Jianfeng wrote:
Implement blkio.weight tuning API.
Acked-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/libvirt_private.syms |    2 ++
 src/util/cgroup.c        |   39 +++++++++++++++++++++++++++++++++++++++
 src/util/cgroup.h        |    3 +++
 3 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 88e270c..490901e 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -77,6 +77,8 @@ virCgroupMounted;
 virCgroupRemove;
 virCgroupSetCpuShares;
 virCgroupSetFreezerState;
+virCgroupSetBlkioWeight;
+virCgroupGetBlkioWeight;
Not in sorted order, but that's easy enough to fix.
+int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned long *weight)
+{
+    long long unsigned int __weight;
Use of double underscore risks collision with system headers, not to mention it looks ugly. I'd rather s/__weight/tmp/.
+    int ret;
+    ret = virCgroupGetValueU64(group,
+                               VIR_CGROUP_CONTROLLER_BLKIO,
+                               "blkio.weight", &__weight);
+    if (ret == 0)
+        *weight = (unsigned long) __weight;
The cast is not strictly necessary. ACK with this squashed in:

diff --git i/src/libvirt_private.syms w/src/libvirt_private.syms
index 490901e..1bbd44e 100644
--- i/src/libvirt_private.syms
+++ w/src/libvirt_private.syms
@@ -66,6 +66,7 @@ virCgroupDenyDevicePath;
 virCgroupForDomain;
 virCgroupForDriver;
 virCgroupFree;
+virCgroupGetBlkioWeight;
 virCgroupGetCpuShares;
 virCgroupGetCpuacctUsage;
 virCgroupGetFreezerState;
@@ -75,10 +76,9 @@ virCgroupGetMemoryUsage;
 virCgroupGetSwapHardLimit;
 virCgroupMounted;
 virCgroupRemove;
+virCgroupSetBlkioWeight;
 virCgroupSetCpuShares;
 virCgroupSetFreezerState;
-virCgroupSetBlkioWeight;
-virCgroupGetBlkioWeight;
 virCgroupSetMemory;
 virCgroupSetMemoryHardLimit;
 virCgroupSetMemorySoftLimit;

diff --git i/src/util/cgroup.c w/src/util/cgroup.c
index bb3f334..9cdfc6e 100644
--- i/src/util/cgroup.c
+++ w/src/util/cgroup.c
@@ -1,7 +1,7 @@
 /*
  * cgroup.c: Tools for managing cgroups
  *
- * Copyright (C) 2010 Red Hat, Inc.
+ * Copyright (C) 2010-2011 Red Hat, Inc.
  * Copyright IBM Corp. 2008
  *
  * See COPYING.LIB for the License of this software
@@ -879,13 +879,13 @@ int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned long weight)
  */
 int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned long *weight)
 {
-    long long unsigned int __weight;
+    long long unsigned int tmp;
     int ret;
     ret = virCgroupGetValueU64(group,
                                VIR_CGROUP_CONTROLLER_BLKIO,
-                               "blkio.weight", &__weight);
+                               "blkio.weight", &tmp);
     if (ret == 0)
-        *weight = (unsigned long) __weight;
+        *weight = tmp;
     return ret;
 }

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Update XML Schema for new entries.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 docs/schemas/domain.rng |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng
index 53c07fd..a9391d3 100644
--- a/docs/schemas/domain.rng
+++ b/docs/schemas/domain.rng
@@ -306,6 +306,18 @@
       </element>
     </optional>

+    <!-- The Blkio cgroup related tunables would go in the blkiotune -->
+    <optional>
+      <element name="blkiotune">
+        <!-- I/O weight the VM can use -->
+        <optional>
+          <element name="weight">
+            <ref name="WEIGHT"/>
+          </element>
+        </optional>
+      </element>
+    </optional>
+
     <!-- All the memory/swap related tunables would go in the memtune -->
     <optional>
       <element name="memtune">
@@ -2150,6 +2162,7 @@
      A domain name shoul be made of ascii, numbers, _-+ and is non-empty
      UUID currently allows only the 32 characters strict syntax
      memoryKB request at least 4Mbytes though Xen will grow bigger if too low
+     WEIGHT currently is in range [100, 1000]
   -->
   <define name="unsignedInt">
     <data type="unsignedInt">
@@ -2182,6 +2195,13 @@
       <param name="minInclusive">-1</param>
     </data>
   </define>
+  <define name="WEIGHT">
+    <data type="unsignedInt">
+      <param name="pattern">[0-9]+</param>
+      <param name="minInclusive">100</param>
+      <param name="maxInclusive">1000</param>
+    </data>
+  </define>
   <define name="memoryKB">
     <data type="unsignedInt">
       <param name="pattern">[0-9]+</param>
-- 
1.7.1

On 02/07/2011 11:58 PM, Gui Jianfeng wrote:
Update XML Schema for new entries.
Subject line could use 'cgroups:' lead-in.
+    <!-- The Blkio cgroup related tunables would go in the blkiotune -->
+    <optional>
+      <element name="blkiotune">
+        <!-- I/O weight the VM can use -->
+        <optional>
+          <element name="weight">
+            <ref name="WEIGHT"/>
Why all caps?
@@ -2150,6 +2162,7 @@ A domain name shoul be made of ascii, numbers, _-+ and is non-empty
As long as we're touching this area of the file, s/shoul/should/

ACK with this squashed in:

diff --git i/docs/schemas/domain.rng w/docs/schemas/domain.rng
index a9391d3..7fe66e3 100644
--- i/docs/schemas/domain.rng
+++ w/docs/schemas/domain.rng
@@ -312,7 +312,7 @@
         <!-- I/O weight the VM can use -->
         <optional>
           <element name="weight">
-            <ref name="WEIGHT"/>
+            <ref name="weight"/>
           </element>
         </optional>
       </element>
@@ -2159,10 +2159,10 @@
      Type library

      Our unsignedInt doesn't allow a leading '+' in its lexical form
-     A domain name shoul be made of ascii, numbers, _-+ and is non-empty
+     A domain name should be made of ascii, numbers, _-+ and is non-empty
      UUID currently allows only the 32 characters strict syntax
      memoryKB request at least 4Mbytes though Xen will grow bigger if too low
-     WEIGHT currently is in range [100, 1000]
+     weight currently is in range [100, 1000]
   -->
   <define name="unsignedInt">
     <data type="unsignedInt">
@@ -2195,7 +2195,7 @@
       <param name="minInclusive">-1</param>
     </data>
   </define>
-  <define name="WEIGHT">
+  <define name="weight">
     <data type="unsignedInt">
       <param name="pattern">[0-9]+</param>
       <param name="minInclusive">100</param>
-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org
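For reference, a domain definition validated by this schema (with the element renamed to lower-case `weight` as discussed above) would carry a fragment like the following; the surrounding elements and the value 800 are purely illustrative:

```xml
<domain type='qemu'>
  <name>guest1</name>
  <memory>219136</memory>
  <blkiotune>
    <!-- proportional I/O weight, must be in [100, 1000] -->
    <weight>800</weight>
  </blkiotune>
  <!-- ... remaining domain configuration ... -->
</domain>
```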

Implement blkio tunable XML configuration and parsing.

Reviewed-by: "Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/conf/domain_conf.c |   13 +++++++++++++
 src/conf/domain_conf.h |    4 ++++
 src/qemu/qemu_cgroup.c |   16 +++++++++++++++-
 src/qemu/qemu_conf.c   |    3 ++-
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9369ed4..94369e2 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -5149,6 +5149,11 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
     if (node)
         def->mem.hugepage_backed = 1;

+    /* Extract blkio cgroup tunables */
+    if (virXPathULong("string(./blkiotune/weight)", ctxt,
+                      &def->blkio.weight) < 0)
+        def->blkio.weight = 0;
+
     /* Extract other memory tunables */
     if (virXPathULong("string(./memtune/hard_limit)", ctxt,
                       &def->mem.hard_limit) < 0)
@@ -7682,6 +7687,14 @@ char *virDomainDefFormat(virDomainDefPtr def,
     virBufferVSprintf(&buf, "  <currentMemory>%lu</currentMemory>\n",
                       def->mem.cur_balloon);

+    /* add blkiotune only if there are any */
+    if (def->blkio.weight) {
+        virBufferVSprintf(&buf, "  <blkiotune>\n");
+        virBufferVSprintf(&buf, "    <weight>%lu</weight>\n",
+                          def->blkio.weight);
+        virBufferVSprintf(&buf, "  </blkiotune>\n");
+    }
+
     /* add memtune only if there are any */
     if (def->mem.hard_limit || def->mem.soft_limit ||
         def->mem.min_guarantee || def->mem.swap_hard_limit)

diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 5d35e43..80d58a0 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1029,6 +1029,10 @@ struct _virDomainDef {
     char *description;

     struct {
+        unsigned long weight;
+    } blkio;
+
+    struct {
         unsigned long max_balloon;
         unsigned long cur_balloon;
         unsigned long hugepage_backed;

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 82d3695..0622c9e 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -54,7 +54,6 @@ int qemuCgroupControllerActive(struct qemud_driver *driver,
     return 0;
 }

-
 int qemuSetupDiskPathAllow(virDomainDiskDefPtr disk ATTRIBUTE_UNUSED,
                            const char *path,
                            size_t depth ATTRIBUTE_UNUSED,
@@ -270,6 +269,21 @@ int qemuSetupCgroup(struct qemud_driver *driver,
         }
     }

+    if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_BLKIO)) {
+        if (vm->def->blkio.weight != 0) {
+            rc = virCgroupSetBlkioWeight(cgroup, vm->def->blkio.weight);
+            if(rc != 0) {
+                virReportSystemError(-rc,
+                                     _("Unable to set io weight for domain %s"),
+                                     vm->def->name);
+                goto cleanup;
+            }
+        }
+    } else {
+        qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                        _("Block I/O tuning is not available on this host"));
+    }
+
     if ((rc = qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_MEMORY))) {
         if (vm->def->mem.hard_limit != 0) {
             rc = virCgroupSetMemoryHardLimit(cgroup, vm->def->mem.hard_limit);

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 9f9e99e..9ba60b1 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -303,7 +303,8 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
         driver->cgroupControllers =
             (1 << VIR_CGROUP_CONTROLLER_CPU) |
             (1 << VIR_CGROUP_CONTROLLER_DEVICES) |
-            (1 << VIR_CGROUP_CONTROLLER_MEMORY);
+            (1 << VIR_CGROUP_CONTROLLER_MEMORY) |
+            (1 << VIR_CGROUP_CONTROLLER_BLKIO);
     }
     for (i = 0 ; i < VIR_CGROUP_CONTROLLER_LAST ; i++) {
         if (driver->cgroupControllers & (1 << i)) {
-- 
1.7.1

On 02/07/2011 11:59 PM, Gui Jianfeng wrote:
Implement blkio tunable XML configuration and parsing.
Reviewed-by: "Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/conf/domain_conf.c |   13 +++++++++++++
 src/conf/domain_conf.h |    4 ++++
 src/qemu/qemu_cgroup.c |   16 +++++++++++++++-
 src/qemu/qemu_conf.c   |    3 ++-
 4 files changed, 34 insertions(+), 2 deletions(-)
+    /* Extract blkio cgroup tunables */
+    if (virXPathULong("string(./blkiotune/weight)", ctxt,
+                      &def->blkio.weight) < 0)
+        def->blkio.weight = 0;
No range validation here. But since it fails down the road if it is outside [100,1000], use of a ULong is overkill (wastes four bytes); technically, a short would work, but we don't have helper functions for short, so I'm swapping this to unsigned int.
     struct {
+        unsigned long weight;
+    } blkio;
+

+++ b/src/qemu/qemu_cgroup.c
@@ -54,7 +54,6 @@ int qemuCgroupControllerActive(struct qemud_driver *driver,
     return 0;
 }

-
 int qemuSetupDiskPathAllow(virDomainDiskDefPtr disk ATTRIBUTE_UNUSED,
Spurious whitespace change.
                           const char *path,
                           size_t depth ATTRIBUTE_UNUSED,
@@ -270,6 +269,21 @@ int qemuSetupCgroup(struct qemud_driver *driver,
         }
     }

+    if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_BLKIO)) {
+        if (vm->def->blkio.weight != 0) {
+            rc = virCgroupSetBlkioWeight(cgroup, vm->def->blkio.weight);
Oh, if I change types, that means that virCgroupSetBlkioWeight in patch 2/6 needs to handle unsigned int, not unsigned long. ACK with this squashed in (part of it into patch 2):

diff --git c/src/conf/domain_conf.c w/src/conf/domain_conf.c
index 94369e2..c299c03 100644
--- c/src/conf/domain_conf.c
+++ w/src/conf/domain_conf.c
@@ -5150,7 +5150,7 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
         def->mem.hugepage_backed = 1;

     /* Extract blkio cgroup tunables */
-    if (virXPathULong("string(./blkiotune/weight)", ctxt,
+    if (virXPathUInt("string(./blkiotune/weight)", ctxt,
                       &def->blkio.weight) < 0)
         def->blkio.weight = 0;
@@ -7690,7 +7690,7 @@ char *virDomainDefFormat(virDomainDefPtr def,
     /* add blkiotune only if there are any */
     if (def->blkio.weight) {
         virBufferVSprintf(&buf, "  <blkiotune>\n");
-        virBufferVSprintf(&buf, "    <weight>%lu</weight>\n",
+        virBufferVSprintf(&buf, "    <weight>%u</weight>\n",
                           def->blkio.weight);
         virBufferVSprintf(&buf, "  </blkiotune>\n");
     }

diff --git c/src/conf/domain_conf.h w/src/conf/domain_conf.h
index 80d58a0..491301f 100644
--- c/src/conf/domain_conf.h
+++ w/src/conf/domain_conf.h
@@ -1029,7 +1029,7 @@ struct _virDomainDef {
     char *description;

     struct {
-        unsigned long weight;
+        unsigned int weight;
     } blkio;

     struct {

diff --git c/src/qemu/qemu_cgroup.c w/src/qemu/qemu_cgroup.c
index 0622c9e..8cd6ce9 100644
--- c/src/qemu/qemu_cgroup.c
+++ w/src/qemu/qemu_cgroup.c
@@ -54,6 +54,7 @@ int qemuCgroupControllerActive(struct qemud_driver *driver,
     return 0;
 }

+
 int qemuSetupDiskPathAllow(virDomainDiskDefPtr disk ATTRIBUTE_UNUSED,
                            const char *path,
                            size_t depth ATTRIBUTE_UNUSED,

diff --git c/src/util/cgroup.c w/src/util/cgroup.c
index 9cdfc6e..de1fd8e 100644
--- c/src/util/cgroup.c
+++ w/src/util/cgroup.c
@@ -858,7 +858,7 @@ int virCgroupForDomain(virCgroupPtr driver ATTRIBUTE_UNUSED,
  *
  * Returns: 0 on success
  */
-int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned long weight)
+int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned int weight)
 {
     if (weight > 1000 || weight < 100)
         return -EINVAL;
@@ -877,9 +877,9 @@ int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned long weight)
  *
  * Returns: 0 on success
  */
-int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned long *weight)
+int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned int *weight)
 {
-    long long unsigned int tmp;
+    unsigned long long tmp;
     int ret;
     ret = virCgroupGetValueU64(group,
                                VIR_CGROUP_CONTROLLER_BLKIO,

diff --git c/src/util/cgroup.h w/src/util/cgroup.h
index f1a47dc..f1bdd0f 100644
--- c/src/util/cgroup.h
+++ w/src/util/cgroup.h
@@ -1,6 +1,7 @@
 /*
  * cgroup.h: Interface to tools for managing cgroups
  *
+ * Copyright (C) 2011 Red Hat, Inc.
  * Copyright IBM Corp. 2008
  *
  * See COPYING.LIB for the License of this software
@@ -41,8 +42,8 @@ int virCgroupForDomain(virCgroupPtr driver,

 int virCgroupAddTask(virCgroupPtr group, pid_t pid);

-int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned long weight);
-int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned long *weight);
+int virCgroupSetBlkioWeight(virCgroupPtr group, unsigned int weight);
+int virCgroupGetBlkioWeight(virCgroupPtr group, unsigned int *weight);

 int virCgroupSetMemory(virCgroupPtr group, unsigned long long kb);
 int virCgroupGetMemoryUsage(virCgroupPtr group, unsigned long *kb);
-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

LXC Blkio weight configuration support.

Reviewed-by: "Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/lxc/lxc_controller.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index af0b70c..0db6673 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -104,6 +104,16 @@ static int lxcSetContainerResources(virDomainDefPtr def)
         goto cleanup;
     }

+    if (def->blkio.weight) {
+        rc = virCgroupSetBlkioWeight(cgroup, def->blkio.weight);
+        if (rc != 0) {
+            virReportSystemError(-rc,
+                                 _("Unable to set Blkio weight for domain %s"),
+                                 def->name);
+            goto cleanup;
+        }
+    }
+
     rc = virCgroupSetMemory(cgroup, def->mem.max_balloon);
     if (rc != 0) {
         virReportSystemError(-rc,
-- 
1.7.1

On 02/08/2011 12:00 AM, Gui Jianfeng wrote:
LXC Blkio weight configuration support.
Reviewed-by: "Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 src/lxc/lxc_controller.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index af0b70c..0db6673 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -104,6 +104,16 @@ static int lxcSetContainerResources(virDomainDefPtr def)
         goto cleanup;
     }

+    if (def->blkio.weight) {
+        rc = virCgroupSetBlkioWeight(cgroup, def->blkio.weight);
+        if (rc != 0) {
+            virReportSystemError(-rc,
+                                 _("Unable to set Blkio weight for domain %s"),
+                                 def->name);
+            goto cleanup;
+        }
+    }
+
ACK; no change needed (even with the type changes introduced in commits 2 and 4). However, the fact that you are implementing things for two hypervisors implies that, for backporting purposes, it might have been better to split patch 4 into two patches (one for src/conf, and the other for src/lxc), so that someone could cherry-pick just the src/conf and src/lxc changes. Not a big enough deal to worry about, though.

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Add documentation for blkiotune elements.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 docs/formatdomain.html.in |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 43c78fc..407e5f4 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -275,6 +275,9 @@
   <memoryBacking>
     <hugepages/>
   </memoryBacking>
+  <blkiotune>
+    <weight>800</weight>
+  </blkiotune>
   <memtune>
     <hard_limit>1048576</hard_limit>
     <soft_limit>131072</soft_limit>
@@ -298,6 +301,13 @@
         <code>hugepages</code> element set within it. This tells the
         hypervisor that the guest should have its memory allocated using
         hugepages instead of the normal native page size.</dd>
+      <dt><code>blkiotune</code></dt>
+      <dd> The optional <code>blkiotune</code> element provides the ability
+        to tune Blkio cgroup tuneable parameters for the domain. If this is
+        omitted, OS will provides the default values.</dd>
+      <dt><code>weight</code></dt>
+      <dd> The optional <code>weight</code> element is the I/O weight of the
+        guest. The value should be in range [100, 1000].</dd>
       <dt><code>memtune</code></dt>
       <dd> The optional <code>memtune</code> element provides details
         regarding the memory tuneable parameters for the domain. If this is
-- 
1.7.1

On 02/08/2011 12:02 AM, Gui Jianfeng wrote:
Add documentation for blkiotune elements.
Adding 'cgroup:' to the subject.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
 docs/formatdomain.html.in |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
@@ -298,6 +301,13 @@
         <code>hugepages</code> element set within it. This tells the
         hypervisor that the guest should have its memory allocated using
         hugepages instead of the normal native page size.</dd>
+      <dt><code>blkiotune</code></dt>
+      <dd> The optional <code>blkiotune</code> element provides the ability
+        to tune Blkio cgroup tuneable parameters for the domain. If this is
Pre-existing, and you copied it, but indentation wasn't consistent; that's easy to fix.

Pre-existing, and you copied it, but I'm more familiar with 'tunable' than 'tuneable'. Both forms are listed in the dictionary, but tuneable is listed second, and
http://www.googlefight.com/index.php?word1=tunable&word2=tuneable
confirms that it is less common.

/me wonders what the world has come to, when I settle spelling questions via google fight

ACK with this squashed in:

diff --git i/docs/formatdomain.html.in w/docs/formatdomain.html.in
index 407e5f4..9130767 100644
--- i/docs/formatdomain.html.in
+++ w/docs/formatdomain.html.in
@@ -303,15 +303,15 @@
         hugepages instead of the normal native page size.</dd>
       <dt><code>blkiotune</code></dt>
       <dd> The optional <code>blkiotune</code> element provides the ability
-      to tune Blkio cgroup tuneable parameters for the domain. If this is
-      omitted, OS will provides the default values.</dd>
+        to tune Blkio cgroup tunable parameters for the domain. If this is
+        omitted, OS will provides the default values.</dd>
       <dt><code>weight</code></dt>
       <dd> The optional <code>weight</code> element is the I/O weight of the
-      guest. The value should be in range [100, 1000].</dd>
+        guest. The value should be in range [100, 1000].</dd>
       <dt><code>memtune</code></dt>
       <dd> The optional <code>memtune</code> element provides details
-        regarding the memory tuneable parameters for the domain. If this is
-        omitted, it defaults to the OS provided defaults.</dd>
+        regarding the memory tunable parameters for the domain. If this is
+        omitted, it defaults to the OS provided defaults.</dd>
       <dt><code>hard_limit</code></dt>
       <dd> The optional <code>hard_limit</code> element is the maximum
         memory the guest can use. The units for this value are kilobytes
         (i.e. blocks

[Oh, and that makes for a minor conflict with my patch to remove all TABs, which is still pending review:
https://www.redhat.com/archives/libvir-list/2011-February/msg00129.html]

Oops; I also realized you missed one review comment:
https://www.redhat.com/archives/libvir-list/2011-January/msg01081.html
which asked for a new testcase in qemuxml2argv. I'm squashing this in to patch 4:

diff --git c/tests/qemuxml2argvdata/qemuxml2argv-blkiotune.args i/tests/qemuxml2argvdata/qemuxml2argv-blkiotune.args
new file mode 100644
index 0000000..651793d
--- /dev/null
+++ i/tests/qemuxml2argvdata/qemuxml2argv-blkiotune.args
@@ -0,0 +1,4 @@
+LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu -S -M \
+pc -m 214 -smp 1 -name QEMUGuest1 -nographic -monitor unix:/tmp/test-monitor,\
+server,nowait -no-acpi -boot c -hda /dev/HostVG/QEMUGuest1 -net none -serial \
+none -parallel none -usb

diff --git c/tests/qemuxml2argvdata/qemuxml2argv-blkiotune.xml i/tests/qemuxml2argvdata/qemuxml2argv-blkiotune.xml
new file mode 100644
index 0000000..4fa03ef
--- /dev/null
+++ i/tests/qemuxml2argvdata/qemuxml2argv-blkiotune.xml
@@ -0,0 +1,28 @@
+<domain type='qemu'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory>219136</memory>
+  <currentMemory>219136</currentMemory>
+  <blkiotune>
+    <weight>800</weight>
+  </blkiotune>
+  <vcpu>1</vcpu>
+  <os>
+    <type arch='i686' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu</emulator>
+    <disk type='block' device='disk'>
+      <source dev='/dev/HostVG/QEMUGuest1'/>
+      <target dev='hda' bus='ide'/>
+      <address type='drive' controller='0' bus='0' unit='0'/>
+    </disk>
+    <controller type='ide' index='0'/>
+    <memballoon model='virtio'/>
+  </devices>
+</domain>

diff --git c/tests/qemuxml2argvtest.c i/tests/qemuxml2argvtest.c
index 9512bdc..52808b5 100644
--- c/tests/qemuxml2argvtest.c
+++ i/tests/qemuxml2argvtest.c
@@ -480,6 +480,9 @@ mymain(int argc, char **argv)
     DO_TEST("cpu-exact2", 0, false);
     DO_TEST("cpu-strict1", 0, false);

+    DO_TEST("memtune", QEMUD_CMD_FLAG_NAME, false);
+    DO_TEST("blkiotune", QEMUD_CMD_FLAG_NAME, false);
+
     free(driver.stateDir);
     virCapabilitiesFree(driver.caps);

diff --git c/tests/qemuxml2xmltest.c i/tests/qemuxml2xmltest.c
index dad91d4..15d94b7 100644
--- c/tests/qemuxml2xmltest.c
+++ i/tests/qemuxml2xmltest.c
@@ -182,6 +182,7 @@ mymain(int argc, char **argv)
     DO_TEST("encrypted-disk");
     DO_TEST("memtune");
+    DO_TEST("blkiotune");
     DO_TEST("smp");

So, since I've [h]ack'd your series, I've now applied the amended results. Thanks again for the contribution!

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

On 02/07/2011 11:38 PM, Gui Jianfeng wrote:
Hi
This patchset adds blkio cgroup support for qemu and lxc.
[PATCH 1/6] cgroup: Enable cgroup hierarchy for blkio cgroup
[PATCH 2/6 v3] cgroup: Implement blkio.weight tuning API.
[PATCH 3/6] Update XML Schema for new entries.
[PATCH 4/6 v3] qemu: Implement blkio tunable XML configuration and parsing.
[PATCH 5/6 v3] LXC: LXC Blkio weight configuration support.
[PATCH 6/6] Add documentation for blkiotune elements.
Will post a patchset to implement virsh command "blkiotune" to tune blkio cgroup parameter later on.
Please do :)

It's not a showstopper (which is why I pushed the series), since 'dumpxml' and 'edit' can do the same thing. But without the new virsh command, getting at blkiotune is awkward, and you already have memtune to copy from. Remember to update virsh.pod in the process.

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org
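For context, the workaround Eric mentions looks like this today, and the promised blkiotune command would presumably mirror memtune's shape. The last invocation below is hypothetical until that follow-up series lands:

```
# Today: set the weight via the domain XML (works once this series is applied)
virsh edit guest1          # add <blkiotune><weight>800</weight></blkiotune>
virsh dumpxml guest1       # verify the blkiotune element round-trips

# Later (hypothetical, mirroring 'virsh memtune'):
virsh blkiotune guest1 --weight 800
```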

Eric Blake wrote:
On 02/07/2011 11:38 PM, Gui Jianfeng wrote:
Hi
This patchset adds blkio cgroup support for qemu and lxc.
[PATCH 1/6] cgroup: Enable cgroup hierarchy for blkio cgroup
[PATCH 2/6 v3] cgroup: Implement blkio.weight tuning API.
[PATCH 3/6] Update XML Schema for new entries.
[PATCH 4/6 v3] qemu: Implement blkio tunable XML configuration and parsing.
[PATCH 5/6 v3] LXC: LXC Blkio weight configuration support.
[PATCH 6/6] Add documentation for blkiotune elements.
Will post a patchset to implement virsh command "blkiotune" to tune blkio cgroup parameter later on.
Please do :)
It's not a showstopper (which is why I pushed the series), since 'dumpxml' and 'edit' can do the same thing. But without the new virsh command, getting at blkiotune is awkward, and you already have memtune to copy from. Remember to update virsh.pod in the process.
Eric,

Thanks for reviewing and committing the series; will work on the blkiotune command later.

Thanks,
Gui

Hi,

Not that I tested this patch or the patch that followed (adding the blkiotune command), but the way I understand it, the cgroup blkio controller only works for sync I/O queues.

Quoting blkio-controller.txt from 2.6.37:

  Currently only sync IO queues are support. All the buffered writes are
  still system wide and not per group.

When I last tested this (using manual cgroup commands), this seemed true.

So may I ask how you use this, and how this helps in which setup?

Regards
Dominik

On 02/08/2011 07:38 AM, Gui Jianfeng wrote:
Hi
This patchset adds blkio cgroup support for qemu and lxc.
[PATCH 1/6] cgroup: Enable cgroup hierarchy for blkio cgroup
[PATCH 2/6 v3] cgroup: Implement blkio.weight tuning API.
[PATCH 3/6] Update XML Schema for new entries.
[PATCH 4/6 v3] qemu: Implement blkio tunable XML configuration and parsing.
[PATCH 5/6 v3] LXC: LXC Blkio weight configuration support.
[PATCH 6/6] Add documentation for blkiotune elements.
Will post a patchset to implement virsh command "blkiotune" to tune blkio cgroup parameter later on.
v2 -> v3 changes:
 o Remove an unused local variable
 o Rename virCgroup(Set/Get)Weight to virCgroup(Set/Get)BlkioWeight
 o Add documentation in docs/formatdomain.html.in
 o Update XML Schema for new entries.
 docs/formatdomain.html.in |   10 ++++++++++
 docs/schemas/domain.rng   |   20 ++++++++++++++++++++
 src/conf/domain_conf.c    |   13 +++++++++++++
 src/conf/domain_conf.h    |    4 ++++
 src/libvirt_private.syms  |    2 ++
 src/lxc/lxc_controller.c  |   10 ++++++++++
 src/qemu/qemu_cgroup.c    |   16 +++++++++++++++-
 src/qemu/qemu_conf.c      |    3 ++-
 src/util/cgroup.c         |   41 ++++++++++++++++++++++++++++++++++++++++-
 src/util/cgroup.h         |    4 ++++
 10 files changed, 120 insertions(+), 3 deletions(-)
Thanks Gui
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Dominik Klein wrote:
Hi
not that I tested this patch or the patch that followed (adding blkiotune command) ... but the way I understand, the cgroup blkio controller does only work for sync'd IO/queues.
quote blkio-controller.txt from 2.6.37: Currently only sync IO queues are support. All the buffered writes are still system wide and not per group.
When I last tested this (using manual cgroups commands), this seemed true.
So may I ask how you use this, and in which setup it helps?
Hi Dominik,

Actually, these two series don't care how the blkio cgroup works. They just provide the ability to put a Guest into a given cgroup and tune the tunables. Currently, you can only control the "blkio.weight" tunable. For example, if you'd like to give more I/O bandwidth to a given Guest, you can assign a bigger value to "blkio.weight" by running the blkiotune command.

Thanks, Gui
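Based on the series outline, the tunable would presumably appear in the domain XML along these lines (a sketch; the exact element layout is inferred from the patch subjects, and the 100-1000 value range comes from the kernel blkio controller, so treat both as assumptions):

```xml
<blkiotune>
  <!-- relative weight; the cgroup blkio controller accepts values in 100-1000 -->
  <weight>500</weight>
</blkiotune>
```

The weight is relative, not an absolute rate: a guest with weight 1000 competing with one of weight 100 is offered roughly ten times the disk time, but either guest alone can still use the whole disk.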
-- Regards Gui Jianfeng

Hi Gui, thanks for your reply.
Actually, these two series don't care how the blkio cgroup works. They just provide the ability to put a Guest into a given cgroup and tune the tunables. Currently, you can only control the "blkio.weight" tunable. For example, if you'd like to give more I/O bandwidth to a given Guest, you can assign a bigger value to "blkio.weight" by running the blkiotune command.
Okay, so then basically, if I understand the current state of the blkio controller correctly (see my original message), this is useless at this point.

I don't mean to offend you in any way; I just wondered how you used (or intended to use) this, since I myself have not yet found a good way to actually limit a guest's I/O hunger.

Regards Dominik

Dominik Klein wrote:
Hi Gui,
thanks for your reply.
Actually, these two series don't care how the blkio cgroup works. They just provide the ability to put a Guest into a given cgroup and tune the tunables. Currently, you can only control the "blkio.weight" tunable. For example, if you'd like to give more I/O bandwidth to a given Guest, you can assign a bigger value to "blkio.weight" by running the blkiotune command.
Okay, so then basically, if I understand the current state of the blkio controller correctly (see my original message), this is useless at this point.
I don't mean to offend you in any way; I just wondered how you used (or intended to use) this, since I myself have not yet found a good way to actually limit a guest's I/O hunger.
Hi Dominik,

Hmm.. "blkio.weight" is used to control the minimal (proportional) bandwidth share. If you'd like to cap the maximum bandwidth, that is, deliberately let your guests go I/O hungry, "blkio.throttle.*" should help. But these tunables aren't supported by blkiotune for the time being. I'm considering implementing them in the future.

Thanks, Gui
Regards Dominik

Hmm.. "blkio.weight" is used to control the minimal (proportional) bandwidth share. If you'd like to cap the maximum bandwidth, that is, deliberately let your guests go I/O hungry, "blkio.throttle.*" should help. But these tunables aren't supported by blkiotune for the time being. I'm considering implementing them in the future.
Hm, maybe I was not clear then. Let me try to rephrase.

As far as I understand, and as far as I saw during my tests, the blkio controller only works for sync I/O requests (e.g. dd with oflag=direct). Buffered I/O is not part of the control. And since a VM's I/O is most likely buffered in some fashion, this does not have any effect. That's what my tests showed.

Did your tests show something different? Maybe I did things wrong then.

Regards Dominik
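The sync-vs-buffered distinction can be seen from the host with plain dd (a sketch; the file names are illustrative):

```shell
# Buffered write: data lands in the page cache first, so (as of 2.6.37)
# the blkio controller cannot attribute the eventual writeback to a group.
dd if=/dev/zero of=./buffered.img bs=1M count=4 2>/dev/null

# Direct write: O_DIRECT bypasses the page cache, so the I/O is issued
# synchronously by this process and the blkio controller can account it.
# Note: O_DIRECT is not supported on all filesystems (e.g. tmpfs), so
# this command may fail where the buffered one succeeds.
dd if=/dev/zero of=./direct.img bs=1M count=4 oflag=direct 2>/dev/null || true
```

Only the second kind of traffic is subject to blkio.weight scheduling in this kernel generation.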

Dominik Klein wrote:
Hmm.. "blkio.weight" is used to control the minimal (proportional) bandwidth share. If you'd like to cap the maximum bandwidth, that is, deliberately let your guests go I/O hungry, "blkio.throttle.*" should help. But these tunables aren't supported by blkiotune for the time being. I'm considering implementing them in the future.
Hm, maybe I was not clear then. Let me try to rephrase.
As far as I understand, and as far as I saw during my tests, the blkio controller only works for sync I/O requests (e.g. dd with oflag=direct). Buffered I/O is not part of the control.
Yes, you are right.
And since a VM's I/O is most likely buffered in some fashion, this does not have any effect. That's what my tests showed.
How about starting the Guest with the option "cache=none" to bypass the page cache? This should help, I think.
Did your tests show something different? Maybe I did things wrong then.
I think the blkio cgroup should work if "cache=none" is set. But I didn't try it; I will try it later.

Thanks, Gui
Regards Dominik
-- Regards Gui Jianfeng

As far as I understand, and as far as I saw during my tests, the blkio controller only works for sync I/O requests (e.g. dd with oflag=direct). Buffered I/O is not part of the control.
Yes, you are right.
whew :)
And since a VM's I/O is most likely buffered in some fashion, this does not have any effect. That's what my tests showed.
How about starting the Guest with the option "cache=none" to bypass the page cache? This should help, I think.
I will read up on where to set that and give it a try. Thanks for the hint.
Did your tests show something different? Maybe I did things wrong then.
I think the blkio cgroup should work if "cache=none" is set. But I didn't try it; I will try it later.
Please let us know what you found out during your tests. Regards Dominik

Hi, back with some testing results.
How about starting the Guest with the option "cache=none" to bypass the page cache? This should help, I think.
I will read up on where to set that and give it a try. Thanks for the hint.
So here's what I did and found out:

The host system has 2 12-core CPUs and 128 GB of RAM. I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of RAM and one disk, which is an LV on the host. Cache mode is "none":

for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    virsh dumpxml $vm | grep cache
done
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>

My goal is to give more I/O time to kernel1 and kernel2 than to the rest of the VMs.

mount -t cgroup -o blkio none /mnt
cd /mnt
mkdir important
mkdir notimportant
echo 1000 > important/blkio.weight
echo 100 > notimportant/blkio.weight

for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
    for task in *; do
        /bin/echo $task > /mnt/notimportant/tasks
    done
done

for vm in kernel1 kernel2; do
    cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
    for task in *; do
        /bin/echo $task > /mnt/important/tasks
    done
done

Then I used cssh to connect to all 8 VMs and execute

dd if=/dev/zero of=testfile bs=1M count=1500

in all VMs simultaneously. Results are:

kernel1: 47.5593 s, 33.1 MB/s
kernel2: 60.1464 s, 26.2 MB/s
kernel3: 74.204 s, 21.2 MB/s
kernel4: 77.0759 s, 20.4 MB/s
kernel5: 65.6309 s, 24.0 MB/s
kernel6: 81.1402 s, 19.4 MB/s
kernel7: 70.3881 s, 22.3 MB/s
kernel8: 77.4475 s, 20.3 MB/s

Results vary a little from run to run, but it is nothing spectacular, as weights of 1000 vs. 100 would suggest.

So I went and tried to throttle I/O of kernel3-8 to 10 MB/s instead of weighting I/O. First I rebooted everything so that no old cgroup configuration was left in place, and then set up everything except the 100 and 1000 weight configuration.
Quote from blkio.txt:
------------
- blkio.throttle.write_bps_device
- Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per deivce. Following is the format.
echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.write_bps_device
-------------

for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    ls -lH /dev/vdisks/$vm
done
brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8

/bin/echo 254:25 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:26 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:27 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:28 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:29 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:30 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device

Then I ran the previous test again. This resulted in an ever-increasing load on the host system (last I checked was ~300). (This is perfectly reproducible.)

uptime
Fri Feb 18 14:42:17 2011
14:42:17 up 12 min, 9 users, load average: 286.51, 142.22, 56.71

So, at least in my case, it does not seem to work too well (yet).

Regards Dominik

Dominik Klein wrote:
Then I used cssh to connect to all 8 VMs and execute dd if=/dev/zero of=testfile bs=1M count=1500 in all VMs simultaneously.
Hi Dominik, Please add "oflag=direct" to see how it goes. Thanks, Gui

Ahh, sure. Dominik Klein wrote:
Please add "oflag=direct" to see how it goes.
Any objections on continuing this discussion in thread "blkio cgroup" where Vivek Goyal, the blkio controller author, replied to my posting?
I'm in the middle of answering his questions there.
Regards Dominik
-- Regards Gui Jianfeng

On Fri, Feb 18, 2011 at 04:12:28PM +0800, Gui Jianfeng wrote:
"blkio.weight" is used to control the minimal (proportional) bandwidth share. If you'd like to cap the maximum bandwidth, that is, deliberately let your guests go I/O hungry, "blkio.throttle.*" should help. But these tunables aren't supported by blkiotune for the time being. I'm considering implementing them in the future.
I'm still a bit confused. The "weight" controls the 'percentage' of resources the VM gets compared to other VMs (or other processes participating in the cgroup)? If I have 5 processes running on the system, and 4 of them are VMs using cgroups, then the VMs get 4/5s of all CPU time for IO tasks, and each VM gets an amount of that 4/5th proportional to its weight? Yes?

Then, blkio.throttle can be used to set actual hard limits on IO consumption -- like, no more than 10Mbps for this VM. I can already do this with blkio cgroups, but just not from libvirt -- yes?

Like many people using libvirt, I only use my hosts for running VMs, and I am most interested in limiting IO operations per host to make sure that no host monopolizes our SAN. Is there a way for me to just set a hard IO throttle limit for libvirt as a whole, which would apply to all child VMs created by libvirt? I saw a discussion on this list regarding group hierarchy, but it went over my head.

--Igor

Igor Serebryany wrote:
On Fri, Feb 18, 2011 at 04:12:28PM +0800, Gui Jianfeng wrote:
"blkio.weight" is used to control the minimal (proportional) bandwidth share. If you'd like to cap the maximum bandwidth, that is, deliberately let your guests go I/O hungry, "blkio.throttle.*" should help. But these tunables aren't supported by blkiotune for the time being. I'm considering implementing them in the future.
I'm still a bit confused. The "weight" controls the 'percentage' of resources the VM gets compared to other VMs (or other processes participating in the cgroup)? If I have 5 processes running on the system, and 4 of them are VMs using cgroups, then the VMs get 4/5s of all CPU time for IO tasks, and each VM gets an amount of that 4/5th proportional to its weight? Yes?
Yes, if all weights are the same.
Then, blkio.throttle can be used to set actual hard limits on IO consumption -- like, no more than 10Mbps for this VM. I can already do this with blkio cgroups, but just not from libvirt -- yes?
yes
Like many people using libvirt, I only use my hosts for running VMs, and I am most interested in limiting IO operations per host to make sure that no host monopolizes our SAN. Is there a way for me to just set a hard IO throttle limit for libvirt as a whole which would apply to all child VMs created by libvirt? I saw a discussion on this list regarding group hierarchy, but it went over my head.
There's no way to set this limit via libvirt. But I think you can set the limit by hand in the cgroup directory. ;)

Thanks, Gui
--Igor

Gui Jianfeng wrote:
Igor Serebryany wrote:
On Fri, Feb 18, 2011 at 04:12:28PM +0800, Gui Jianfeng wrote:

"blkio.weight" is used to control the minimal (proportional) bandwidth share. If you'd like to cap the maximum bandwidth, that is, deliberately let your guests go I/O hungry, "blkio.throttle.*" should help. But these tunables aren't supported by blkiotune for the time being. I'm considering implementing them in the future.

I'm still a bit confused. The "weight" controls the 'percentage' of resources the VM gets compared to other VMs (or other processes participating in the cgroup)? If I have 5 processes running on the system, and 4 of them are VMs using cgroups, then the VMs get 4/5s of all CPU time for IO tasks, and each VM gets an amount of that 4/5th proportional to its weight? Yes?
Yes, if all weights are the same.
I think my earlier answer may be misleading. What I mean is: for the VMs to get 4/5 of the bandwidth, the ratio of the total weight of the VM groups to the weight of the group holding the other task has to be 4:1.

Thanks, Gui
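To make that ratio arithmetic concrete, here is a small worked calculation using the weights from Dominik's test (a sketch; it assumes just two sibling groups competing for one disk, with disk time split in proportion to weight):

```shell
w_important=1000    # group holding kernel1 and kernel2
w_notimportant=100  # group holding kernel3..kernel8
total=$((w_important + w_notimportant))

# Each group's share of disk time is its weight over the sum of all
# sibling weights, so 1000 vs. 100 is roughly a 10:1 split of disk
# time between the groups, not a hard bandwidth cap on either one.
awk -v w="$w_important" -v t="$total" \
    'BEGIN { printf "important:    %.1f%%\n", 100 * w / t }'   # prints 90.9%
awk -v w="$w_notimportant" -v t="$total" \
    'BEGIN { printf "notimportant: %.1f%%\n", 100 * w / t }'   # prints 9.1%
```

Note the share is per group, not per VM: the six VMs in "notimportant" divide their group's 9.1% among themselves, which is one reason the per-VM dd numbers in the test look closer together than the raw 10:1 weights suggest.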

Hi, I can only comment on some of your questions, as I can't claim to understand this completely myself.
Then, blkio.throttle can be used to set actual hard limits on IO consumption -- like, no more than 10Mbps for this VM. I can already do this with blkio cgroups, but just not from libvirt -- yes?
Yes. But bear in mind that this only works for sync I/O, not for buffered I/O (read my discussion with Gui).
Like many people using libvirt, I only use my hosts for running VMs, and I am most interested in limiting IO operations per host to make sure that no host monopolizes our SAN. Is there a way for me to just set a hard IO throttle limit for libvirt as a whole which would apply to all child VMs created by libvirt? I saw a discussion on this list regarding group hierarchy, but it went over my head.
I did not read the other discussion, but this comes to my mind when thinking about your question: you could try to put the libvirtd process into a cgroup that has certain limits configured. Every spawned child of this process will inherit those limits.

Not sure if a VM is actually a libvirtd child though, since at least "ps" shows them as init's children on my systems.

If you try this, please report back what you did and how it worked.

Regards Dominik
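Following the manual-cgroup commands used earlier in the thread, that idea might look like this (an untested sketch requiring root; the mount point, group name, and device numbers are illustrative, and as noted above the qemu processes may not actually remain in libvirtd's group):

```shell
mount -t cgroup -o blkio none /mnt
mkdir /mnt/libvirt

# Hard-cap writes from everything in this group to 10 MB/s on one device;
# repeat the echo per <major>:<minor> pair for each device on the SAN.
/bin/echo "254:23 10000000" > /mnt/libvirt/blkio.throttle.write_bps_device

# Move libvirtd into the group; children forked after this inherit it.
/bin/echo $(pidof libvirtd) > /mnt/libvirt/tasks
```

Whether already-running guests end up covered depends on whether they are really forked from libvirtd, which is exactly the open question above.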
participants (4): Dominik Klein, Eric Blake, Gui Jianfeng, Igor Serebryany