Hi folks,
i already started a discussion on the interaction of cgroups and
isolcpus a while ago. But now i believe i have got a better
understanding of how the two interact and i can describe problems that
arise from that.
The scenario: A machine that runs realtime tasks on pcpus reserved with
isolcpus. It also runs VMs, possibly including realtime VMs, with the
help of libvirt.
Moving a task into a new cgroup/cpuset and some modifications of the
cpus in that set imply a setaffinity by the kernel. That affinity
setting will ignore isolcpus. The result is possible "interference by"
or "starvation of" these tasks.
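A minimal sketch of how one could spot such a widened affinity, assuming a Linux kernel that exposes the isolcpus list in /sys/devices/system/cpu/isolated and that the file only contains plain ranges such as "2-3,6". The helper names are made up for illustration.

```python
import os


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Return the isolated cpus, or an empty set if the file is not there."""
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except OSError:
        return set()


def leaked_onto_isolated(pid=0):
    """Return the isolated cpus that the affinity mask of pid allows.

    pid 0 means the calling process. A non-empty result for a
    non-realtime task means its affinity ignores isolcpus.
    """
    return os.sched_getaffinity(pid) & isolated_cpus()
```

Calling leaked_onto_isolated(pid) for a task before and after it is moved into a cpuset should show whether the move widened its affinity onto isolated cpus.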
Now let me describe one scenario where that implicit setaffinity
becomes a problem for our realtime system.
libvirt creates a superset called machine.slice and subsets called
emulator and vcpuX. By default machine.slice inherits from the root,
which contains all pcpus, including the isolated ones. Moving a task
into that superset therefore allows it to run on isolcpus, where it
might interfere or simply starve.
Turns out that a fresh qemu actually is put into that superset. That is
a bug that should be fixed but let me address that one in another mail.
My current point of view is that we need a strong mechanism to isolate
cpus. isolcpus just is not good enough. The measure of choice probably
is cpusets as well, and this time with the exclusive flag turned on.
That will stop every other cpuset user from messing around with
those cpus by accident.
I am thinking of one or more cpusets where isolated cpus are
parked and not used within this cgroup. Anyone wanting to use one of
them will have to take it out of there and explicitly put it into a new
set of their own. Now if libvirt makes the mistake of having tasks
running in supersets, these tasks will spread to the newly added rt-cpu.
Or new tasks that run in supersets will end up on rt-cpus already in
use. But at least we have containment in libvirt and the VMs it spawned.
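The parking idea could be sketched roughly as follows. This is only an illustration under several assumptions: a cgroup v1 cpuset hierarchy (the files cpuset.cpus, cpuset.mems and cpuset.cpu_exclusive), a single memory node "0", and hypothetical set names such as "rt-parked". The root directory is a parameter, typically something like /sys/fs/cgroup/cpuset, so the code can also be tried against a scratch directory.

```python
import os


def parse_cpulist(text):
    """Parse "2-3,6" into {2, 3, 6}; an empty string gives an empty set."""
    cpus = set()
    for part in filter(None, text.strip().split(",")):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def format_cpulist(cpus):
    return ",".join(str(c) for c in sorted(cpus))


def _write(cgroup_dir, name, value):
    with open(os.path.join(cgroup_dir, name), "w") as f:
        f.write(value + "\n")


def _read_cpus(cgroup_dir):
    with open(os.path.join(cgroup_dir, "cpuset.cpus")) as f:
        return parse_cpulist(f.read())


def make_exclusive_set(root, name, cpus, mems="0"):
    """Create a cpuset holding cpus and mark it cpu_exclusive."""
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    _write(path, "cpuset.cpus", format_cpulist(cpus))
    _write(path, "cpuset.mems", mems)  # assumption: one memory node
    # On a real cpuset hierarchy this write is expected to fail if a
    # sibling set overlaps, which is the "assert" behaviour wanted here.
    _write(path, "cpuset.cpu_exclusive", "1")
    return path


def alloc_rt_cpu(root, cpu, owner, parked="rt-parked"):
    """Take cpu out of the parked set and give it to a new exclusive set."""
    parked_dir = os.path.join(root, parked)
    remaining = _read_cpus(parked_dir)
    if cpu not in remaining:
        raise ValueError("cpu %d is not parked" % cpu)
    # Shrink the parked set first so the two exclusive sets never overlap.
    _write(parked_dir, "cpuset.cpus", format_cpulist(remaining - {cpu}))
    return make_exclusive_set(root, owner, {cpu})


def free_rt_cpu(root, cpu, owner, parked="rt-parked"):
    """Return cpu from the owner set to the parked set."""
    owner_dir = os.path.join(root, owner)
    parked_dir = os.path.join(root, parked)
    _write(owner_dir, "cpuset.cpus", format_cpulist(_read_cpus(owner_dir) - {cpu}))
    _write(parked_dir, "cpuset.cpus", format_cpulist(_read_cpus(parked_dir) | {cpu}))
```

The ordering in alloc_rt_cpu and free_rt_cpu is the design point: a cpu leaves one exclusive set before it enters another. Removing the emptied owner directory afterwards is left to the caller.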
For alloc and free of rt-cpus i am planning to use libvirt hooks to
begin with, from what i read they should enable me to do what i need.
What do you guys think about the general idea to address the described
problem?
I will implement a prototype of the alloc-free of rt-cpus. My current
hope is that libvirt hooks can be abused for that.
I am thinking that at some point libvirt should be able to do that
without hooks. It should get a notion of reserved resources that are
currently parked in other cgroups. My current suspicion is that the
cpusets might just be the tip of the iceberg. For now i am running
libvirt without cgroups to keep my isolcpus free.
cgroups/cpusets offer a switch to make a cpu exclusive to a set. That
switch is great because it will act as an assert, a second line of
defense. Having seen how cpusets and migration mess around with
affinities, i guess that for realtime, people have to insist on that
second line of defense. Especially in times where cgroups are all over
the place.
In openstack one would actually say that a pcpu should be "dedicated".
That will result in a vcpupin on exactly one pcpu. Unfortunately one
meaning of "dedicated" gets lost in translation. It could otherwise be
used by libvirtd to set cpuset.cpu_exclusive in the vcpu-cgroup.
And i am bringing that up here because i do not think libvirt allows
me to influence the cpu_exclusive flag for my vcpu cgroups.
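To see what the flag currently looks like for the vcpu cgroups, something along these lines might do. It is a sketch that assumes a cgroup v1 cpuset hierarchy with vcpu directories somewhere below machine.slice; the exact directory layout varies between setups, so the function simply walks the tree.

```python
import os


def vcpu_exclusive_flags(machine_slice):
    """Map each vcpu* cpuset below machine_slice to its cpu_exclusive flag.

    machine_slice is a directory such as
    /sys/fs/cgroup/cpuset/machine.slice (an assumed, typical path).
    """
    flags = {}
    for dirpath, _dirnames, filenames in os.walk(machine_slice):
        is_vcpu = os.path.basename(dirpath).startswith("vcpu")
        if is_vcpu and "cpuset.cpu_exclusive" in filenames:
            with open(os.path.join(dirpath, "cpuset.cpu_exclusive")) as f:
                rel = os.path.relpath(dirpath, machine_slice)
                flags[rel] = f.read().strip() == "1"
    return flags
```

Any vcpu cgroup reported as False is pinned without the kernel enforcing that nobody else shares its pcpu.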
Henning