
On Thu, Jul 02, 2015 at 04:42:47PM +0200, Henning Schild wrote:
On Thu, 2 Jul 2015 15:18:46 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Thu, Jul 02, 2015 at 04:02:58PM +0200, Henning Schild wrote:
Hi,
I am currently looking into realtime VMs using libvirt. My starting point was reserving a couple of cores with isolcpus and then tuning the affinity to place my vCPUs on the reserved pCPUs.
My first observation was that libvirt ignores isolcpus. The affinity masks of newly started QEMU processes default to all CPUs and are not inherited from libvirtd. A comment in the code suggests that this is done on purpose.
Ignore realtime + isolcpus for a minute. It is not unreasonable for the system admin to decide that system services should be restricted to run on a certain subset of CPUs. If we let VMs inherit the CPU pinning from libvirtd, we'd be accidentally confining VMs to that subset of CPUs too. With the new cgroups layout, libvirtd lives in a separate cgroups tree, /system.slice, while VMs live in /machine.slice. So for both these reasons, when starting VMs we explicitly ignore any affinity libvirtd has and set the VMs' mask to allow any CPU.
Sure, that was my first guess as well. Still, I wanted to raise the topic again from the realtime POV. I am using a pretty recent libvirt from git but have not come across the system.slice yet. That might be a matter of how libvirtd is configured/invoked.
Oh, I should mention that I'm referring to OSes that use systemd for their init system here, not legacy sysvinit. FWIW, our cgroups layout is described here: http://libvirt.org/cgroups.html
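As a minimal illustration of that layout (a sketch only; the PID is a placeholder), you can check which slice a process has been placed in by reading its cgroup file:

    # Sketch: show which systemd slice a process lives in.
    # The PID is a placeholder for libvirtd or a qemu process.
    pid = 1234

    # On systemd hosts this typically shows /system.slice/libvirtd.service
    # for libvirtd and a path under /machine.slice for qemu guests.
    with open("/proc/%d/cgroup" % pid) as f:
        print(f.read())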
After that I changed the code to use only the available CPUs by default. But taskset was still showing all 'f's on my QEMU processes. Then I traced my change down to sched_setaffinity, assuming that some other mechanism might have reverted my hack, but it is still in place.
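For reference, a minimal sketch (placeholder PID and CPU set) of what such a hack boils down to at the syscall level, using Python's wrappers around sched_setaffinity/sched_getaffinity:

    import os

    pid = 1234           # placeholder: a qemu process id
    reserved = {2, 3}    # placeholder: the pCPUs reserved via isolcpus

    # Restrict the process to the reserved CPUs, then read the mask back;
    # this is the same mask that taskset -p reports.
    os.sched_setaffinity(pid, reserved)
    print(os.sched_getaffinity(pid))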
From the libvirt POV, we can't tell whether the admin set isolcpus because they want to reserve those CPUs only for VMs, or because they want to stop VMs using those CPUs by default. As such, libvirt does not try to interpret isolcpus at all; it leaves it up to a higher-level app to decide on this policy.
I know, you have to tell libvirt that the reservation is actually for libvirt. My idea was to introduce a config option in libvirt and maybe sanity-check it by looking at whether the pCPUs are actually reserved. Rik recently posted a patch that allows easy programmatic checking of isolcpus via sysfs.
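A minimal sketch of such a sanity check, assuming the patch exposes the isolated CPUs as a cpulist string under /sys/devices/system/cpu/isolated (the path is an assumption here):

    # Sketch: parse the isolated-CPU list from sysfs (e.g. "2-3,6").
    def parse_cpulist(s):
        cpus = set()
        for part in s.strip().split(","):
            if not part:
                continue
            if "-" in part:
                lo, hi = part.split("-")
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
        return cpus

    with open("/sys/devices/system/cpu/isolated") as f:
        isolated = parse_cpulist(f.read())
    print("isolated pCPUs:", sorted(isolated))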
In libvirt we try to follow a general principle that libvirt provides the mechanism but does not implement usage policy. So if we follow a strict interpretation here, then applying a CPU mask based on isolcpus would be out of scope for libvirt, since we expose a sufficiently flexible mechanism to implement any desired policy at a higher level.
In the case of OpenStack, the /etc/nova/nova.conf allows a config setting 'vcpu_pin_set' to say what set of CPUs VMs should be allowed to run on, and nova will then update the libvirt XML when starting each guest.
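As a rough sketch of what that ends up looking like (placeholder values, not nova's actual code): expand the vcpu_pin_set string and emit <vcpupin> elements for the guest's <cputune>:

    # Sketch: turn a vcpu_pin_set string into <vcpupin> entries for guest XML.
    vcpu_pin_set = "2-3,6"      # example value from /etc/nova/nova.conf
    allowed = []
    for part in vcpu_pin_set.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            allowed.extend(range(int(lo), int(hi) + 1))
        else:
            allowed.append(int(part))

    cpuset = ",".join(str(c) for c in allowed)
    vcpus = 2                   # guest vCPU count (placeholder)
    print("<cputune>")
    for v in range(vcpus):
        print('  <vcpupin vcpu="%d" cpuset="%s"/>' % (v, cpuset))
    print("</cputune>")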
I see. Would it not still make sense to have that setting centrally in libvirt? I am thinking about people who use virsh or virt-manager rather than nova.
virsh aims to be a completely plain passthrough where the user is in total control of their setup. To a large extent that is true of virt-manager too. So I'd tend to expect users of both those apps to manually configure the CPU affinity of their VMs as & when they use isolcpus. The place to put policies around isolcpus would be in apps like OpenStack and RHEV/oVirt, which define specific usage policies for the system as a whole.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|