On Wed, Jun 17, 2015 at 10:55:35PM +0300, Andrey Korolyov wrote:
Sorry for the delay. The 'perf numa numa-mem -p 8 -t 2 -P 384 -C 0 -M 0
-s 200 -zZq --thp 1 --no-data_rand_walk' run shows a performance ratio of
roughly 0.96 to 1. The trick I did (and successfully forgot about) before is
setting the value of cfs_quota in a machine-wide group, one level up from
the individual vCPUs.
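A minimal sketch of what I mean by the machine-wide placement (the directory
names are only illustrative, and I assume cgroup v1 semantics where quota=-1
means unlimited):
/cgroup/cpu/machine/vmxx: period=100000, quota=200000 (one 2-core cap shared by the whole VM)
/cgroup/cpu/machine/vmxx/vcpu0: quota=-1
/cgroup/cpu/machine/vmxx/vcpu1: quota=-1
/cgroup/cpu/machine/vmxx/vcpu2: quota=-1
/cgroup/cpu/machine/vmxx/vcpu3: quota=-1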
Right now, libvirt sets values from
<cputune>
<period>100000</period>
<quota>200000</quota>
</cputune>
for each vCPU thread cgroup, which by my understanding is a bit wrong, i.e.:
/cgroup/cpu/machine/vmxx/vcpu0: period=100000, quota=200000
/cgroup/cpu/machine/vmxx/vcpu1: period=100000, quota=200000
/cgroup/cpu/machine/vmxx/vcpu2: period=100000, quota=200000
/cgroup/cpu/machine/vmxx/vcpu3: period=100000, quota=200000
In other words, the user (me) assumed that he had limited the total
consumption of the VM to two cores, whereas in fact every vCPU thread can
consume up to a single CPU, resulting in four-core consumption instead.
With different guest CPU count / quota / host CPU count ratios, the same
period-to-quota ratio gives different practical limits, whereas a single
total quota results in a much more predictable upper bound on consumption.
I had put the same quota-to-period ratio in the VM-level directory to match
what the config setting suggests, and that is where the performance drop
mentioned above can be observed.
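To spell out the arithmetic with the numbers above: a quota of 200000
against a period of 100000 reads as 2 cores, but a vCPU is a single thread
and can never use more than 1 core anyway, so four such per-thread caps add
up to a 4-core ceiling; the same 200000 applied once at the VM level caps
the sum of all four threads at 2 cores.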
With default placement there is no difference in the performance numbers,
but libvirt's behaviour there is somewhat controversial. The documentation
says this is correct behaviour as well, but I think that limiting the vCPU
group with a total quota is far more flexible than per-vCPU limits, which
can negatively impact single-threaded processes in the guest; in addition,
the overall consumption has to be recalculated every time the host core
count or the guest core count changes. Sorry for not mentioning the custom
scheme before; if my assumption about execution flexibility is plainly
wrong, I'll withdraw the concerns above. I have been using 'my' scheme in
production for a couple of years, and it has proved (for me) to be far less
complex for workload balancing on a CPU-congested hypervisor than the
generic one.
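As a concrete illustration of the recalculation point (the numbers are only
an example): to hold a 4-vCPU guest to 2 cores in total using per-vCPU
quotas, each vCPU needs quota = 2 * 100000 / 4 = 50000; if the guest grows
to 8 vCPUs, every value has to be rewritten to 25000, whereas a single
VM-level quota of 200000 would stay unchanged.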
As you say, there are two possible directions libvirt was able to take when
implementing the scheduler tunables: either apply them to the VM as a
whole, or apply them to the individual vCPUs. We debated this a fair bit,
but in the end we took the per-vCPU approach. There were two really
compelling reasons. First, if users have 2 guests with identical
configurations, but give one of the guests 2 vCPUs and the other guest
4 vCPUs, the general expectation is that the one with 4 vCPUs will have
twice the performance. If we applied the CFS tuning at the VM level, then
adding vCPUs would give no increase in performance. The second reason was
that people wanted to be able to control the performance of the emulator
threads separately from the vCPU threads. Now we also have dedicated I/O
threads that can have different tuning set. This would be impossible if we
were always setting stuff at the VM level.
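For reference, a sketch of how that split already looks in the XML (the
values are illustrative; emulator_period/emulator_quota are the separate
cputune knobs for the emulator thread, while <period>/<quota> are applied
to each vCPU thread cgroup as above):
<cputune>
<period>100000</period>
<quota>200000</quota>
<emulator_period>100000</emulator_period>
<emulator_quota>100000</emulator_quota>
</cputune>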
It would in theory be possible for us to add a further tunable to the
VM config which allowed VM level tuning. eg we could define something
like
<vmtune>
<period>100000</period>
<quota>200000</quota>
</vmtune>
Semantically, if <vmtune> was set, we would then forbid use of the
<cputune> and <emulatortune> configurations, as they'd be mutually
exclusive. In such a case we'd avoid creating the sub-cgroups for
vCPUs and emulator threads, etc.
The question is whether the benefit would outweigh the extra code
complexity to deal with this. I appreciate that you would like this kind of
setup, but I think we'd probably need more than one person requesting it in
order to justify the work involved.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|