
On Thu, Jul 02, 2015 at 04:42:47PM +0200, Henning Schild wrote:
On Thu, 2 Jul 2015 15:18:46 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Thu, Jul 02, 2015 at 04:02:58PM +0200, Henning Schild wrote:
Hi,
I am currently looking into realtime VMs using libvirt. My starting point was reserving a couple of cores with isolcpus and then tuning the affinity to place my vCPUs on the reserved pCPUs.
My first observation was that libvirt ignores isolcpus. The affinity masks of newly started QEMU processes default to all CPUs and are not inherited from libvirtd. A comment in the code suggests that this is done on purpose.
Ignore realtime + isolcpus for a minute. It is not unreasonable for the system admin to decide that system services should be restricted to run on a certain subset of CPUs. If we let VMs inherit the CPU pinning from libvirtd, we'd be accidentally confining VMs to that subset of CPUs too. With the new cgroups layout, libvirtd lives in a separate cgroups tree, /system.slice, while VMs live in /machine.slice. So for both these reasons, when starting VMs we explicitly ignore any affinity libvirtd has and set the VMs' mask to allow any CPU.
Sure, that was my first guess as well. Still, I wanted to raise the topic again from the realtime POV. I am using a pretty recent libvirt from git but have not come across the system.slice yet. That might be a matter of how libvirtd is configured/invoked.
Oh, I should mention that I'm referring to OSes that use systemd for their init system here, not legacy sysvinit. FWIW, our cgroups layout is described here: http://libvirt.org/cgroups.html
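As a minimal illustration of that layout (a sketch only; the PID is a placeholder), you can check which slice a process has been placed in by reading its cgroup file:

    # Sketch: show which systemd slice a process lives in.
    # The PID is a placeholder for libvirtd or a qemu process.
    pid = 1234

    # On systemd hosts this typically shows /system.slice/libvirtd.service
    # for libvirtd and a path under /machine.slice for qemu guests.
    with open("/proc/%d/cgroup" % pid) as f:
        print(f.read())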
After that I changed the code to use only the available CPUs by default. But taskset was still showing all 'f's on my QEMU processes. Then I traced my change down to sched_setaffinity, assuming that some other mechanism might have reverted my hack, but it is still in place.
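For reference, a minimal sketch (placeholder PID and CPU set) of what such a hack boils down to at the syscall level, using Python's wrappers around sched_setaffinity/sched_getaffinity:

    import os

    pid = 1234           # placeholder: a qemu process id
    reserved = {2, 3}    # placeholder: the pCPUs reserved via isolcpus

    # Restrict the process to the reserved CPUs, then read the mask back;
    # this is the same mask that taskset -p reports.
    os.sched_setaffinity(pid, reserved)
    print(os.sched_getaffinity(pid))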
From the libvirt POV, we can't tell whether the admin set isolcpus because they want to reserve those CPUs only for VMs, or because they want to stop VMs using those CPUs by default. As such, libvirt does not try to interpret isolcpus at all; it leaves it up to a higher-level app to decide on this policy.
I know, you have to tell libvirt that the reservation is actually for libvirt. My idea was to introduce a config option in libvirt and maybe sanity-check it by looking at whether the pCPUs are actually reserved. Rik recently posted a patch that allows easy programmatic checking of isolcpus via sysfs.
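A minimal sketch of such a sanity check, assuming the patch exposes the isolated CPUs as a cpulist string under /sys/devices/system/cpu/isolated (the path is an assumption here):

    # Sketch: parse the isolated-CPU list from sysfs (e.g. "2-3,6").
    def parse_cpulist(s):
        cpus = set()
        for part in s.strip().split(","):
            if not part:
                continue
            if "-" in part:
                lo, hi = part.split("-")
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
        return cpus

    with open("/sys/devices/system/cpu/isolated") as f:
        isolated = parse_cpulist(f.read())
    print("isolated pCPUs:", sorted(isolated))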
In libvirt we try to follow a general principle that libvirt provides the mechanism but does not implement usage policy. So if we follow a strict interpretation here, then applying a CPU mask based on isolcpus would be out of scope for libvirt, since we expose a sufficiently flexible mechanism to implement any desired policy at a higher level.
In the case of OpenStack, the /etc/nova/nova.conf allows a config setting 'vcpu_pin_set' to say what set of CPUs VMs should be allowed to run on, and nova will then update the libvirt XML when starting each guest.
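As a rough sketch of what that ends up looking like (placeholder values, not nova's actual code): expand the vcpu_pin_set string and emit <vcpupin> elements for the guest's <cputune>:

    # Sketch: turn a vcpu_pin_set string into <vcpupin> entries for guest XML.
    vcpu_pin_set = "2-3,6"      # example value from /etc/nova/nova.conf
    allowed = []
    for part in vcpu_pin_set.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            allowed.extend(range(int(lo), int(hi) + 1))
        else:
            allowed.append(int(part))

    cpuset = ",".join(str(c) for c in allowed)
    vcpus = 2                   # guest vCPU count (placeholder)
    print("<cputune>")
    for v in range(vcpus):
        print('  <vcpupin vcpu="%d" cpuset="%s"/>' % (v, cpuset))
    print("</cputune>")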
I see. Would it not still make sense to have that setting centrally in libvirt? I am thinking about people who use virsh or virt-manager rather than nova.
virsh aims to be a completely plain passthrough where the user is in total control of their setup. To a large extent that is true of virt-manager too. So I'd tend to expect users of both those apps to manually configure the CPU affinity of their VMs as & when they use isolcpus. The place to put policies around isolcpus would be in apps like OpenStack and RHEV/oVirt, which define specific usage policies for the system as a whole.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|