Re: [libvirt] Overhead for a default cpu cg placement scheme

11 Jun 2015

      On Thu, Jun 11, 2015 at 02:16:50PM +0300, Andrey Korolyov wrote:
...
On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
...
On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
...
Hi Daniel,
would it possible to adopt an optional tunable for a virCgroup
mechanism which targets to a disablement of a nested (per-thread)
cgroup creation? Those are bringing visible overhead for many-threaded
guest workloads, almost 5% in non-congested host CPU state, primarily
because the host scheduler should make a much more decisions with
those cgroups than without them. We also experienced a lot of host
lockups with currently exploited cgroup placement and disabled nested
behavior a couple of years ago. Though the current patch is simply
carves out the mentioned behavior, leaving only top-level per-machine
cgroups, it can serve for an upstream after some adaptation, that`s
why I`m asking about a chance of its acceptance. This message is a
kind of 'request of a feature', it either can be accepted/dropped from
our side or someone may give a hand and redo it from scratch. The
detailed benchmarks are related to a host 3.10.y, if anyone is
interested in the numbers for latest stable, I can update those.
When you say nested cgroup creation, as you referring to the modern
libvirt hierarchy, or the legacy hierarchy - as described here:
http://libvirt.org/cgroups.html
The current libvirt setup used for a year or so now is much shallower
than previously, to the extent that we'd consider performance problems
with it to be the job of the kernel to fix.
Thanks, I`m referring to a 'new nested' hiearchy for an overhead
mentioned above. The host crashes I mentioned happened with old
hierarchy back ago, forgot to mention this. Despite the flattening of
the topo for the current scheme it should be possible to disable fine
group creation for the VM threads for some users who don`t need
per-vcpu cpu pinning/accounting (though overhead caused by a placement
for cpu cgroup, not by accounting/pinning ones, I`m assuming equal
distribution with such disablement for all nested-aware cgroup types),
that`s the point for now.
Ok, so the per-vCPU cgroups are used for a couple of things

 - Setting scheduler tunables - period/quota/shares/etc
 - Setting CPU pinning
 - Setting NUMA memory pinning

In addition to the per-VCPU cgroup, we have one cgroup fr each
I/O thread, and also one more for general QEMU emulator threads.

In the case of CPU pinning we already have automatic fallback to
sched_setaffinity if the CPUSET controller isn't available.

We could in theory start off without the per-vCPU/emulator/I/O
cgroups and only create them as & when the feature is actually
used. The concern I would have though is that changing the cgroups
layout on the fly may cause unexpected sideeffects in behaviour of
the VM. More critically, there would be alot of places in the code
where we would need to deal with this which could hurt maintainability.

How confident are you that the performance problems you see are inherant
to the actual use of the cgroups, and not instead as a result of some
particular bad choice of default parameters we might have left in the
cgroups ?  In general I'd have a desire to try to work to eliminate the
perf impact before we consider the complexity of disabling this feature

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|