[libvirt] RFC: APIs for managing resource groups

Historically for the QEMU/LXC drivers we've simply put each virtual instance in a dedicated cgroup, under the path

  $LIBVIRT_CGROUP_LOCATION
    |
    +- libvirt
        |
        +- qemu
        |   |
        |   +- vm1
        |   +- vm2
        |   +- vm3
        |
        +- lxc
            |
            +- cont1
            +- cont2
            +- cont3

For a variety of reasons this nesting sucks. It is too deep, causing kernel performance problems; its structure does not easily allow for calculating fixed % shares; and it does not allow for grouping of VMs.

We need to simplify our layout and also introduce some APIs for the grouping of VMs. I won't go into specifics of a new cgroups layout here, just focus on the question of defining a set of APIs that are generic to any hypervisor, for the purpose of setting up VM resource groups.

I'm calling the resource cgroup a "partition", since this is all about partitioning workloads.

I anticipate a new top level object and APIs for creating/defining it in the usual manner:

  typedef struct _virPartition virPartition;
  typedef virPartition *virPartitionPtr;

  int virConnectListAllPartitions(virConnectPtr conn,
                                  virPartitionPtr **partitions,
                                  unsigned int flags);

  virPartitionPtr virPartitionDefineXML(virConnectPtr conn,
                                        const char *xml,
                                        unsigned int flags);

  int virPartitionCreate(virPartitionPtr partition,
                         unsigned int flags);

  int virPartitionCreateXML(virPartitionPtr partition,
                            const char *xml,
                            unsigned int flags);

  int virPartitionDestroy(virPartitionPtr partition,
                          unsigned int flags);

  int virPartitionUndefine(virPartitionPtr partition,
                           unsigned int flags);

Then I think we'll duplicate all the APIs for setting resource tunables from virDomainPtr against the new object, so we get

  int virPartitionGetSchedulerParameters(virPartitionPtr partition,
                                         virTypedParameterPtr params,
                                         int *nparams,
                                         unsigned int flags);
  int virPartitionSetSchedulerParameters(virPartitionPtr partition,
                                         virTypedParameterPtr params,
                                         int nparams,
                                         unsigned int flags);

  int virDomainSetBlkioParameters(virDomainPtr domain,
                                  virTypedParameterPtr params,
                                  int nparams,
                                  unsigned int flags);
  int virDomainGetBlkioParameters(virDomainPtr domain,
                                  virTypedParameterPtr params,
                                  int *nparams,
                                  unsigned int flags);

  int virDomainSetMemoryParameters(virDomainPtr domain,
                                   virTypedParameterPtr params,
                                   int nparams,
                                   unsigned int flags);
  int virDomainGetMemoryParameters(virDomainPtr domain,
                                   virTypedParameterPtr params,
                                   int *nparams,
                                   unsigned int flags);

  int virDomainSetNumaParameters(virDomainPtr domain,
                                 virTypedParameterPtr params,
                                 int nparams,
                                 unsigned int flags);
  int virDomainGetNumaParameters(virDomainPtr domain,
                                 virTypedParameterPtr params,
                                 int *nparams,
                                 unsigned int flags);

Finally we need a way to associate a domain with a partition

  virPartitionPtr virDomainGetPartition(virDomainPtr dom,
                                        unsigned int flags);
  void virDomainSetPartition(virDomainPtr dom,
                             unsigned int flags);

There'd also likely be a new VM XML element

  <partition name="..partition name..."/>

which is what the Get/SetPartition methods would be touching.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
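
To make the depth of the historical layout concrete, here is a minimal sketch in C. It is not libvirt code: the helper names legacy_cgroup_path() and levels_below() are hypothetical, and the caller supplies whatever directory stands in for $LIBVIRT_CGROUP_LOCATION. It only shows that every instance ends up three directory levels below that root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libvirt function): build the historical
 * per-instance path "<root>/libvirt/<driver>/<name>" into buf.
 * Returns 0 on success, -1 if the result did not fit. */
int legacy_cgroup_path(char *buf, size_t buflen, const char *root,
                       const char *driver, const char *name)
{
    assert(buf != NULL && buflen > 0);
    int n = snprintf(buf, buflen, "%s/libvirt/%s/%s", root, driver, name);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Hypothetical helper: count how many directory levels 'path' sits
 * below 'root'. Returns -1 if 'path' does not start with 'root'. */
int levels_below(const char *path, const char *root)
{
    size_t rootlen = strlen(root);
    int levels = 0;

    if (strncmp(path, root, rootlen) != 0)
        return -1;
    for (const char *p = path + rootlen; *p != '\0'; p++) {
        if (*p == '/')
            levels++;
    }
    return levels;
}
```

Calling legacy_cgroup_path() with driver "qemu" and name "vm1" and then passing the result to levels_below() with the same root reports three levels (libvirt, qemu, vm1), which is the nesting the post argues is too deep.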

On 02/25/2013 05:41 AM, Daniel P. Berrange wrote:
> Historically for the QEMU/LXC drivers we've simply put each virtual instance in a dedicated cgroup, under the path
>
> We need to simplify our layout and also introduce some APIs for the grouping of VMs. I won't go into specifics of a new cgroups layout here, just focus on the question of defining a set of APIs that are generic to any hypervisor, for the purpose of setting up VM resource groups.

I'm very much in favor of VM resource groups. In fact, this RFC has come up in the past, if it gives you any ideas of what you replied back then: https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html

> I'm calling the resource cgroup a "partition", since this is all about partitioning workloads.

Yes, that naming is workable, and a bit better than what I tried last time.

> I anticipate a new top level object and APIs for creating/defining it in the usual manner:

<snip good API>

> There'd also likely be a new VM XML element
>
>   <partition name="..partition name..."/>
>
> which is what the Get/SetPartition methods would be touching.

Earlier, you pointed out that it might make sense to have multiple partitions per domain - that is, have one partitioning that controls only memory usage, and another partitioning that controls only cpu usage, then have a domain that belongs to two orthogonal partitions to cap both memory and cpu. Your proposal today doesn't seem to deal with the idea of having multiple partitions per domain.

Also, while you proposed having a domain belong to a partition, you didn't cover whether it makes sense to have a hierarchy of partitions, where one partition can provide further constraints on top of a parent partition.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Mon, Feb 25, 2013 at 09:48:16AM -0700, Eric Blake wrote:
> On 02/25/2013 05:41 AM, Daniel P. Berrange wrote:
>> Historically for the QEMU/LXC drivers we've simply put each virtual instance in a dedicated cgroup, under the path
>>
>> We need to simplify our layout and also introduce some APIs for the grouping of VMs. I won't go into specifics of a new cgroups layout here, just focus on the question of defining a set of APIs that are generic to any hypervisor, for the purpose of setting up VM resource groups.
>
> I'm very much in favor of VM resource groups. In fact, this RFC has come up in the past, if it gives you any ideas of what you replied back then: https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html
>
>> I'm calling the resource cgroup a "partition", since this is all about partitioning workloads.
>
> Yes, that naming is workable, and a bit better than what I tried last time.
>
>> I anticipate a new top level object and APIs for creating/defining it in the usual manner:
>
> <snip good API>
>
>> There'd also likely be a new VM XML element
>>
>>   <partition name="..partition name..."/>
>>
>> which is what the Get/SetPartition methods would be touching.
>
> Earlier, you pointed out that it might make sense to have multiple partitions per domain - that is, have one partitioning that controls only memory usage, and another partitioning that controls only cpu usage, then have a domain that belongs to two orthogonal partitions to cap both memory and cpu. Your proposal today doesn't seem to deal with the idea of having multiple partitions per domain.
>
> Also, while you proposed having a domain belong to a partition, you didn't cover whether it makes sense to have a hierarchy of partitions, where one partition can provide further constraints on top of a parent partition.

While cgroups currently allows you to set up different hierarchies for memory, cpu, blockio, etc. controls, these days it is agreed that this is an anti-feature causing no end of trouble for all involved. In the future I expect the kernel will enforce that we use the same hierarchy for all cgroup controllers. In other words, we only want a single partition for all resources.

I do anticipate that we'll be able to create partition hierarchies, though it may be discouraged since it has performance implications for the kernel that are currently somewhat unacceptable.

Daniel
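
As a rough illustration of a partition hierarchy in which a child "provides further constraints on top of a parent", the sketch below models partitions as a parent-linked chain. Everything here is an assumption made for illustration: struct demo_partition and both functions are hypothetical and not part of libvirt, and the rule that the effective memory cap is the smallest limit found on the path to the top is one plausible reading of hierarchical limits, not something the thread specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of a partition; NOT a libvirt type. */
struct demo_partition {
    const char *name;
    unsigned long long mem_limit_kib;     /* 0 means no limit set here */
    const struct demo_partition *parent;  /* NULL for a top-level partition */
};

/* Assumed rule: a partition is bounded by the tightest limit found on
 * the path from itself up to the top. Returns 0 if nothing is limited. */
unsigned long long demo_effective_mem_limit(const struct demo_partition *p)
{
    unsigned long long limit = 0;

    assert(p != NULL);
    for (; p != NULL; p = p->parent) {
        if (p->mem_limit_kib != 0 &&
            (limit == 0 || p->mem_limit_kib < limit))
            limit = p->mem_limit_kib;
    }
    return limit;
}

/* Number of levels from this partition up to and including the top one.
 * Deeper chains mean more levels for the kernel to account against. */
int demo_partition_depth(const struct demo_partition *p)
{
    int depth = 0;

    for (; p != NULL; p = p->parent)
        depth++;
    return depth;
}
```

With a 4 GiB top-level partition and a child that asks for 8 GiB, demo_effective_mem_limit() on the child still yields the parent's 4 GiB under this assumed rule. demo_partition_depth() is included only because depth is what the performance concern above is about.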

On Mon, Feb 25, 2013 at 12:41:06 +0000, Daniel P. Berrange wrote: ...
> Then I think we'll duplicate all the APIs for setting resource tunables from virDomainPtr against the new object, so we get
>
>   int virPartitionGetSchedulerParameters(virPartitionPtr partition,
>                                          virTypedParameterPtr params,
>                                          int *nparams,
>                                          unsigned int flags);

My comment is not really specific to resource groups, but since you suggest copying APIs that return typed parameters, could you make them autoallocate the params array (see virDomainGetJobStats)? That is

  int virPartitionGetSchedulerParameters(virPartitionPtr partition,
                                         virTypedParameterPtr *params,
                                         int *nparams,
                                         unsigned int flags);

Having to call each API twice, first to get the number of parameters it can return and then to get the actual parameters, is a horrible design.

Jirka
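
The difference between the two calling conventions can be sketched with a small self-contained mock. Nothing below is libvirt code: demo_param is a hypothetical stand-in for virTypedParameter, the two getters are invented for illustration, and the parameter names and values are placeholder data. The first getter follows the caller-allocates pattern (query the count, allocate, call again); the second allocates the array itself, as suggested above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for virTypedParameter; NOT the libvirt type. */
typedef struct {
    char field[32];
    unsigned long long value;
} demo_param;

/* Placeholder data that both mock getters report. */
static const demo_param demo_source[] = {
    { "cpu_shares", 1024 },
    { "vcpu_period", 100000 },
};
#define DEMO_NPARAMS ((int)(sizeof(demo_source) / sizeof(demo_source[0])))

/* Caller-allocates style: with params == NULL only the count is stored
 * in *nparams; otherwise up to *nparams entries are copied out. */
int demo_get_params(demo_param *params, int *nparams)
{
    assert(nparams != NULL);
    if (params == NULL) {
        *nparams = DEMO_NPARAMS;
        return 0;
    }
    if (*nparams < 0)
        return -1;
    if (*nparams > DEMO_NPARAMS)
        *nparams = DEMO_NPARAMS;
    memcpy(params, demo_source, (size_t)*nparams * sizeof(demo_param));
    return 0;
}

/* Auto-allocating style: the callee allocates the array and the caller
 * releases it with free(). One call is enough. */
int demo_get_params_alloc(demo_param **params, int *nparams)
{
    assert(params != NULL && nparams != NULL);
    demo_param *out = malloc(sizeof(demo_source));
    if (out == NULL)
        return -1;
    memcpy(out, demo_source, sizeof(demo_source));
    *params = out;
    *nparams = DEMO_NPARAMS;
    return 0;
}
```

A caller of demo_get_params() needs two round trips plus its own allocation in between, while a caller of demo_get_params_alloc() makes one call and one free(). The trade-off in the second style is that ownership of the array passes to the caller, so the API has to document how it is released.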
participants (3)
- Daniel P. Berrange
- Eric Blake
- Jiri Denemark