On Thu, Aug 14, 2014 at 04:25:05PM +0200, Radim Krčmář wrote:
Hello,
by default, libvirt with KVM creates a cgroup hierarchy in 'cpu,cpuacct'
[1], with 'cpu.shares' set to 1024 at every level. This raises two points:
1) Every VM is given an equal amount of CPU time. [2]
($CG/machine.slice/*/cpu.shares = 1024)
This means that smaller / less loaded guests are given an advantage.
This is a default we do nothing with unless the user (or mgmt
app) wants something else. What you say is true only when there is no
spare time (the machines need more time than is available). Such
overcommit is the user's problem, I'd say.
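(As a workaround today, the per-VM weight can be set by hand; a rough
sketch, assuming a v1 'cpu,cpuacct' mount under /sys/fs/cgroup and two
hypothetical domains 'big' and 'small':)

    # see what each running machine currently gets
    cat /sys/fs/cgroup/cpu,cpuacct/machine.slice/*/cpu.shares
    # give the larger guest twice the default weight; cpu_shares is the
    # scheduler parameter that libvirt maps onto cpu.shares
    virsh schedinfo big --set cpu_shares=2048
    virsh schedinfo small --set cpu_shares=1024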
2) All VMs combined are given 1024 shares. [3]
($CG/machine.slice/cpu.shares)
This is a problem even on systems without slices (systemd), because
there is /machine/cpu.shares == 1024 anyway. Is there a way to
disable the hierarchy in this case (to say cpu.shares = -1, for
example)? If not, then it has only limited use (we cannot prepare the
hierarchy and just write a number into some file when we want to
start using it). That's a pity, but there are probably fewer use
cases than the hundreds of lines of kernel code that would need to be
changed to support this.
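(Until there is such a knob, the parent group's weight can at least be
raised by hand; a sketch, with 10240 as an arbitrary example value:)

    # directly in the cgroupfs (systemd layout); not persistent
    echo 10240 > /sys/fs/cgroup/cpu,cpuacct/machine.slice/cpu.shares
    # or let systemd own the setting
    systemctl set-property machine.slice CPUShares=10240
    # without systemd the group is /machine instead
    echo 10240 > $CG/machine/cpu.shares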
This is made even worse on RHEL7 by sched_autogroup_enabled = 0, so
every other process in the system is given the same amount of CPU as
all VMs combined.
But sched_autogroup_enabled = 1 wouldn't make it much better, since it
would group the machines together anyway, right?
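(For completeness, autogrouping can be flipped at runtime, so both
behaviors are easy to compare; a quick sketch:)

    # 0 = disabled (the RHEL7 default mentioned above), 1 = enabled
    cat /proc/sys/kernel/sched_autogroup_enabled
    sysctl -w kernel.sched_autogroup_enabled=1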
It does not seem to be possible to tune shares and get good general
behavior, so the best solution I can see is to disable the cpu cgroup
and let users set it up themselves when needed. (Keeping all tasks in
$CG/tasks.)
I agree with you that it's not the best default we could provide,
and not using these cgroups until they are needed might be a real
benefit. That applies only to controllers like cpu and blkio, I
think.
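(A host admin can already opt out of the cpu controller by trimming the
controller list in /etc/libvirt/qemu.conf, if I read that file right; a
sketch that keeps everything except cpu:)

    # /etc/libvirt/qemu.conf -- leave "cpu" out so libvirt stops placing
    # new domains into cpu-controller groups; restart libvirtd afterwards
    cgroup_controllers = [ "cpuacct", "cpuset", "memory", "devices", "blkio" ]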
Do we want cgroups by default at all?
(Is OpenStack dealing with these quirks?)
Thanks.
---
1: machine.slice/
machine-qemu\x2d${name}.scope/
{emulator,vcpu*}/
2: To reproduce, run two guests with > 1 VCPU and execute two spinners
   on the first and one on the second (see the spinner sketch below
   footnote 3). The result will be a 50%/50% CPU split between the
   guests; 66%/33% seems more natural, but it could still be considered
   a feature.
3: Run a guest with $n VCPUs and $n spinners in it, and $n spinners in
   the host.
   - RHEL7: the guest gets 1/($n + 1) of the CPU -- I'd expect 50%/50%.
   - Upstream: 50%/50% between guest and host because of autogrouping;
     if you run $n more spinners in the host, it will still be 50%/50%
     instead of the seemingly fairer 33%/66%. (And you can run spinners
     from different groups, so it would then be the same as on RHEL7.)
     It also works the other way: if the host has $n CPUs, then $n/2
     tasks in the host suffice to minimize the VMs' performance,
     regardless of the number of running VCPUs.
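(A concrete way to run the spinners from footnotes 2 and 3; $n is the
VCPU count as above and the accounting path assumes the usual v1 mount:)

    # a spinner is just a busy loop; start $n of them inside a guest
    # (over ssh / the console) or on the host
    for i in $(seq "$n"); do ( while :; do :; done ) & done
    # watch how the time is split between guests and host tasks
    top -d 1
    cat /sys/fs/cgroup/cpu,cpuacct/machine.slice/cpuacct.usage
    # with autogrouping off, every host spinner is a root-level entity
    # with weight 1024, the same as all of machine.slice combined, so the
    # guest ends up with 1024 / (($n + 1) * 1024) = 1/($n + 1) of the CPU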
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list