[libvirt] Globally Reserve Resources for Host

Hi,

I am interested in the capability to globally reserve resources (CPU and memory) for a KVM host. I know you can configure memory limits for each guest (http://libvirt.org/formatdomain.html#elementsMemoryTuning), but I would like the ability to reserve host CPU and memory without having to actively do it by modifying each guest's XML.

For clarity, what I mean by "reserve resources" is that there are certain CPUs and a certain amount of memory that guests will never have access to. This can be achieved using cgroups.

Does anyone think this functionality would be useful? This is primarily to prevent the host from being starved when the allocation of guests has the host overcommitted/oversubscribed.

Note: I think VMware has similar functionality, but I am not sure, as I don't really use VMware: http://blogs.technet.com/b/virtualpfe/archive/2011/08/29/hyper-v-dynamic-mem...

Thanks for any thoughts,
Dusty

On 11/01/2012 07:16 AM, Dusty Mabe wrote:
> Hi,
> I am interested in the capability to globally reserve resources (CPU and memory) for a KVM host. I know you can configure memory limits for each guest (http://libvirt.org/formatdomain.html#elementsMemoryTuning), but would like the ability to reserve host CPU and memory without having to actively do it by modifying each guest's XML.
> For clarity, what I mean by "reserve resources" is that there are certain CPUs and a certain amount of memory that guests will never have access to. This can be achieved using cgroups.
> Does anyone think this functionality would be useful? This is primarily to prevent the host from being starved when the allocation of guests has the host overcommitted/oversubscribed.
Yes, this functionality would be useful. In fact, I even proposed a possible design ages ago, and Dan chimed in with some improvements (but no one has coded anything towards that design): https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html

Basically, you would create a virGroupPtr that describes the entire set of resources you are willing to allow to VMs, then ensure that all VMs are members of that virGroupPtr.

The idea of a virGroupPtr also makes it possible to isolate host-specific management out of per-guest XML, making it easier to migrate guests. For example, instead of hard-coding that a guest is pinned to host CPUs 4-7, you would state that a guest's CPU usage is determined by a named virGroup policy. As long as that named policy provides 4 CPUs on both source and destination, the actual decision of WHICH 4 host CPUs are in the group can differ between the hosts, so that you don't have to have the same CPUs available on the migration destination, and don't have to write a migration hook to rewrite the XML to redo the pinning.

-- 
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
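Nobody has written XML for this design, so purely as a hypothetical sketch (no such schema exists in libvirt today), a named group policy and a guest referencing it might look something like:

```xml
<!-- Hypothetical host-side group definition; element names are invented -->
<group name="vm-pool">
  <cpus>4</cpus>                    <!-- how many host CPUs the policy provides -->
  <cpuset>4-7</cpuset>              <!-- which ones, chosen per host -->
  <memory unit="GiB">24</memory>    <!-- total memory the pool may use -->
</group>

<!-- Guest XML would then reference the policy instead of pinning CPUs -->
<cputune>
  <grouppolicy name="vm-pool"/>
</cputune>
```

The point of the sketch is only that the guest names a policy, and each host decides independently which physical CPUs satisfy it.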

On Thu, Nov 1, 2012 at 11:09 AM, Eric Blake <eblake@redhat.com> wrote:
> On 11/01/2012 07:16 AM, Dusty Mabe wrote:
>> Does anyone think this functionality would be useful? This is primarily to prevent the host from being starved when the allocation of guests has the host overcommitted/oversubscribed.
> Yes, this functionality would be useful. In fact, I even proposed a possible design ages ago, and Dan chimed in with some improvements (but no one has coded anything towards that design): https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html
> Basically, you would create a virGroupPtr that describes the entire set of resources you are willing to allow to VMs, then ensure that all VMs are members of that virGroupPtr.
Thanks for the response and the pointers to relevant design information. If I get some free cycles to implement such a feature, I will get back to you with some more targeted design questions or maybe even a possible implementation.

Dusty

On Thu, Nov 1, 2012 at 11:32 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 11:09 AM, Eric Blake <eblake@redhat.com> wrote:
>> Basically, you would create a virGroupPtr that describes the entire set of resources you are willing to allow to VMs, then ensure that all VMs are members of that virGroupPtr.
> Thanks for the response and the pointers to relevant design information. If I get some free cycles to implement such a feature, I will get back to you with some more targeted design questions or maybe even a possible implementation.
I think I have a very minimal implementation of what I proposed in my original email ("reserving resources for host"). It is not quite as featureful as what you discussed with danpb (https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html), but it is a small amount of work and should be worth the effort.

As of right now this is specifically for qemu. Basically, the idea is that right after the cgroup gets created for the qemu driver, we set memory and CPU restrictions for the group, like so, from qemu_driver.c:

    rc = virCgroupForDriver("qemu", &qemu_driver->cgroup, privileged, 1);
    rc = virCgroupSetMemory(&qemu_driver->cgroup, availableMem);
    rc = virCgroupSetCpus(&qemu_driver->cgroup, availableCpus);

The user will provide values in qemu.conf for "reservedHostMem" and "reservedHostCpus", and then availableMem and availableCpus would be calculated from that. If no values are provided in the conf, then simply act as normal and don't enforce any "restrictions".

We may also want to expose this setting in virsh so that we could change the value once up and running.

Does this seem trivial to implement as I suggest? Are there any flaws with this idea?

Thanks,
Dusty

On Wed, Nov 14, 2012 at 11:22 AM, Dusty Mabe <dustymabe@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 11:32 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
> I think I have a very minimal implementation of what I proposed in my original email ("reserving resources for host"). It is not quite as featureful as what you discussed with danpb (https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html), but it is a small amount of work and should be worth the effort.
> As of right now this is specifically for qemu. Basically, the idea is that right after the cgroup gets created for the qemu driver, we set memory and CPU restrictions for the group, like so, from qemu_driver.c:
>
>     rc = virCgroupForDriver("qemu", &qemu_driver->cgroup, privileged, 1);
>     rc = virCgroupSetMemory(&qemu_driver->cgroup, availableMem);
>     rc = virCgroupSetCpus(&qemu_driver->cgroup, availableCpus);
>
> The user will provide values in qemu.conf for "reservedHostMem" and "reservedHostCpus", and then availableMem and availableCpus would be calculated from that. If no values are provided in the conf, then simply act as normal and don't enforce any "restrictions".
> We may also want to expose this setting in virsh so that we could change the value once up and running.
> Does this seem trivial to implement as I suggest? Are there any flaws with this idea?
Hey hey,

Just thought I would ping this thread to see if anyone had any input. I may try to code up a minimal implementation and send a patch if anyone thinks that would be useful in evaluating this feature.

Thanks again!
Dusty Mabe

On Wed, Nov 28, 2012 at 02:27:53PM -0500, Dusty Mabe wrote:
> On Wed, Nov 14, 2012 at 11:22 AM, Dusty Mabe <dustymabe@gmail.com> wrote:
>> On Thu, Nov 1, 2012 at 11:32 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
>> I think I have a very minimal implementation of what I proposed in my original email ("reserving resources for host"). It is not quite as featureful as what you discussed with danpb (https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html), but it is a small amount of work and should be worth the effort.
>> As of right now this is specifically for qemu. Basically, the idea is that right after the cgroup gets created for the qemu driver, we set memory and CPU restrictions for the group, like so, from qemu_driver.c:
>>
>>     rc = virCgroupForDriver("qemu", &qemu_driver->cgroup, privileged, 1);
>>     rc = virCgroupSetMemory(&qemu_driver->cgroup, availableMem);
>>     rc = virCgroupSetCpus(&qemu_driver->cgroup, availableCpus);
>>
>> The user will provide values in qemu.conf for "reservedHostMem" and "reservedHostCpus", and then availableMem and availableCpus would be calculated from that. If no values are provided in the conf, then simply act as normal and don't enforce any "restrictions".
>> We may also want to expose this setting in virsh so that we could change the value once up and running.
>> Does this seem trivial to implement as I suggest? Are there any flaws with this idea?
> Hey hey,
> Just thought I would ping this thread to see if anyone had any input. I may try to code up a minimal implementation and send a patch if anyone thinks that would be useful in evaluating this feature.
Sorry for not replying before. I've been thinking about this today, and am specifically wondering about the possible implications for a change we need to make to cgroups setup in libvirt.

The core issue is that it has become apparent that nesting cgroups can cause some very significant performance / scalability problems for the kernel. The kernel developers have requested / recommended that libvirt make its cgroup hierarchy as flat as possible to avoid this problem.

Libvirt currently creates a hierarchy 3 to 4 levels deep below the cgroup that libvirtd itself is placed in:

    $ROOT/$LIBVIRTD/libvirt/$DRIVERNAME/$VMNAME/{vcpu$VCPUNUM or emulator}

eg with systemd you might get

    /libvirtd/libvirt/qemu/myvmname/vcpu0
    /libvirtd/libvirt/qemu/myvmname/emulator
    /libvirtd/libvirt/lxc/mycontainer

The second level is clearly redundant if systemd is already placing libvirtd in a private cgroup. The third and fourth levels could optionally be combined into '$DRIVERNAME-$VMNAME'. The last levels must remain unchanged. This would result in, for example:

    /libvirtd/qemu-myvmname/vcpu0
    /libvirtd/qemu-myvmname/emulator
    /libvirtd/lxc-mycontainer

We want this change to apply out of the box. In fact, my expectation is that we'll not actually hardcode this layout, but instead introduce some level of configurability, by having a 'cgroup_layout' config param for qemu.conf:

1. Match current hardcoded layout:

   cgroup_layout="/::PROCESS::/libvirt/::DRIVER::/::VMNAME::"

2. Remove first level when used with systemd:

   cgroup_layout="/::PROCESS::/::DRIVER::/::VMNAME::"

3. Combine 3rd/4th levels too:

   cgroup_layout="/::PROCESS::/::DRIVER::-::VMNAME::"

4. Ignore current libvirtd placement completely and create in the root:

   cgroup_layout="/libvirt/::DRIVER::-::VMNAME::"

5. Use UUID instead of VM name:

   cgroup_layout="/libvirt/::DRIVER::-::VMUUID::"

I'm sure you've noticed that this plan doesn't leave much scope for the approach you outlined above, since the 'qemu' level will disappear by default.
On the plus side, though, since we will have the flexibility to put VMs in a cgroup that is unrelated to the libvirtd cgroup (options 4/5 above), you will be able to isolate VMs from the host that way, without needing explicit libvirt support.

It is on my plate to get these changes done for the Fedora 19 / RHEL-7 releases.

Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Nov 28, 2012 at 3:26 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
> On Wed, Nov 28, 2012 at 02:27:53PM -0500, Dusty Mabe wrote:
>> On Wed, Nov 14, 2012 at 11:22 AM, Dusty Mabe <dustymabe@gmail.com> wrote:
>>> On Thu, Nov 1, 2012 at 11:32 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
> Sorry for not replying before. I've been thinking about this today, and am specifically wondering about the possible implications for a change we need to make to cgroups setup in libvirt.
No worries, thank you for responding!
> The core issue is that it has become apparent that nesting cgroups can cause some very significant performance / scalability problems for the kernel. The kernel developers have requested / recommended that libvirt make its cgroup hierarchy as flat as possible to avoid this problem.
That makes sense. Does this affect how you feel about https://www.redhat.com/archives/libvir-list/2011-March/msg01546.html?
> 1. Match current hardcoded layout:
>
>    cgroup_layout="/::PROCESS::/libvirt/::DRIVER::/::VMNAME::"
>
> 2. Remove first level when used with systemd:
>
>    cgroup_layout="/::PROCESS::/::DRIVER::/::VMNAME::"
>
> 3. Combine 3rd/4th levels too:
>
>    cgroup_layout="/::PROCESS::/::DRIVER::-::VMNAME::"
>
> 4. Ignore current libvirtd placement completely and create in the root:
>
>    cgroup_layout="/libvirt/::DRIVER::-::VMNAME::"
>
> 5. Use UUID instead of VM name:
>
>    cgroup_layout="/libvirt/::DRIVER::-::VMUUID::"
A couple more options:

6. cgroup_layout="/libvirt/::DRIVER::/::VMNAME::"
7. cgroup_layout="/libvirt/::DRIVER::/::VMUUID::"
> I'm sure you've noticed that this plan doesn't leave much scope for the approach you outlined above, since the 'qemu' level will disappear by default.
It looks like you are right: by default there would be no driver-level cgroup. What if we still had settings in qemu.conf for something like "reservedHostMem" and "reservedHostCpus" (maybe those names don't make sense, but I think you know what I mean) that would only be enforced if the cgroup_layout had a /::DRIVER::/ component (cases 1, 2, 6, 7)? In those cases there would be a 'qemu' level. We could explain this dependency in the conf file and print an error/warning in the logs if it wasn't followed.
> It is on my plate to get these changes done for the Fedora 19 / RHEL-7 releases.
>
> Daniel
Thanks again for responding! I appreciate your input, and it is helpful to know the path that libvirt is going down. I know what I am suggesting above may be a long shot, but I figure it's worth at least airing the idea. I like it because too often my hosts get oversubscribed by other people creating guests, which can cause wacky behavior (my problem, not yours).

Dusty Mabe
participants (3)
- Daniel P. Berrange
- Dusty Mabe
- Eric Blake