
2014-08-14 13:55-0400, Andrew Theurer:
----- Original Message -----
From: "Radim Krčmář" <rkrcmar@redhat.com>
To: libvir-list@redhat.com
Cc: "Daniel P. Berrange" <berrange@redhat.com>, "Andrew Theurer" <atheurer@redhat.com>
Sent: Thursday, August 14, 2014 9:25:05 AM
Subject: Suboptimal default cpu Cgroup
Hello,
by default, libvirt with KVM creates a Cgroup hierarchy in 'cpu,cpuacct' [1], with 'shares' set to 1024 on every level. This raises two points:
1) Every VM is given an equal amount of CPU time. [2] ($CG/machine.slice/*/shares = 1024)
Which means that smaller / less loaded guests are given an advantage.
2) All VMs combined are given 1024 shares. [3] ($CG/machine.slice/shares)
This is made even worse on RHEL7 by sched_autogroup_enabled = 0: every other process in the system is then given the same amount of CPU as all VMs combined.
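To make the second point concrete, here is a small sketch (my own model, not libvirt code) of a weight-proportional split at the top level: machine.slice holds 1024 shares, the same as any single busy host process, so all VMs together receive one process's worth of CPU.

```python
def split(weights):
    """Return each entity's fraction of CPU, proportional to its weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# machine.slice competes with, say, 3 busy host processes, all at 1024 shares.
top = split({"machine.slice": 1024, "proc1": 1024, "proc2": 1024, "proc3": 1024})
print(top["machine.slice"])  # 0.25 -- all VMs combined get as much as one process
```

With N busy host processes, the whole of machine.slice gets only 1/(N+1) of the CPU, regardless of how many guests or VCPUs it contains.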
It does not seem to be possible to tune shares and get a good general behavior, so the best solution I can see is to disable the cpu cgroup and let users do it when needed. (Keeping all tasks in $CG/tasks.)
Could we have each VM's shares be nr_vcpu * 1024, and the shares for $CG/machine.slice be the sum of all VMs' shares?
That would be unfair in a different way ... some examples:

VM's shares = nr_vcpu * 1024:
- 1 and 10 VCPU guests both running only one task in overcommit: the larger guest gets 10 times more CPU. (Feature?)

$CG/machine.slice = sum of VMs' shares:
- 'shares' is bounded by 262144 right now, so it wouldn't scale beyond one large guest. (Not a big problem, but has ugly solutions.)
- Default system tasks still have 1024, so their share would get unfairly small if we had some idle guests as well. On a 10 CPU machine with 10 guests of 10 VCPUs each, only one of which is actively running, a non-vm task would get just ~1% of the CPU, not ~10%, like we would expect with 11 running tasks. And it would be even worse with autogrouping.
---
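A back-of-the-envelope check of that last example (the weights are assumptions matching the proposal, not values libvirt actually sets): machine.slice would hold the sum of VM shares, 100 VCPUs * 1024, while a host task holds 1024.

```python
machine_slice = 100 * 1024   # 10 guests x 10 VCPUs, shares summed
host_task = 1024

# Weight-proportional fraction the lone host task would receive.
task_fraction = host_task / (host_task + machine_slice)
print(round(task_fraction * 100, 2))  # 0.99 -- i.e. ~1% of the CPU

# With 11 equally weighted runnable tasks (10 busy VCPUs + the host task),
# one would instead expect roughly 1/11 of the CPU:
print(round(100 / 11, 1))  # 9.1 -- i.e. ~10%
```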
[...]
2: To reproduce, run two guests with > 1 VCPU and execute two spinners on the first and one on the second. The result will be 50%/50% CPU assignment between guests; 66%/33% seems more natural, but it could still be considered as a feature.
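The reproduction above can be modeled with a simple two-level split (a hypothetical model, not scheduler code): CPU is first divided between the guests by their equal 1024 shares, then evenly among each guest's spinners.

```python
def per_task_cpu(guest_shares, tasks_per_guest):
    """Split CPU between guests by shares, then evenly among each guest's tasks."""
    total = sum(guest_shares.values())
    return {g: {"guest": s / total, "per_task": s / total / tasks_per_guest[g]}
            for g, s in guest_shares.items()}

# Guest A runs two spinners, guest B one; both have the default 1024 shares.
res = per_task_cpu({"A": 1024, "B": 1024}, {"A": 2, "B": 1})
print(res["A"]["guest"], res["B"]["guest"])  # 0.5 0.5 -- the observed 50%/50%

# A flat split among 3 equal spinners (no per-guest cgroup) would give 66%/33%:
print(round(2 / 3, 2), round(1 / 3, 2))  # 0.67 0.33
```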
(Please note a mistake here: the host is implied to have 1-2 CPUs. It would have been better to use nr_cpus as well ...)