On 12/21/2015 03:36 AM, Henning Schild wrote:
On Mon, 14 Dec 2015 16:27:54 -0500
John Ferlan <jferlan(a)redhat.com> wrote:
>
>
> On 11/13/2015 11:56 AM, Henning Schild wrote:
>> Hi,
>>
>> i already explained some of the cgroup problems in some detail so i
>> will not do that again.
>>
>> https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html
>>
>> I managed to solve some of the problems in the current codebase, and
>> am now sharing the patches. But they are really just half of what i
>> had to change to get libvirt to behave in a system with isolated
>> cpus.
>>
>> Other changes/hacks i am not sending here because they do not work
>> for the general case:
>> - create machine.slice before starting libvirtd (smaller than root)
>> ... and hope it won't grow
>> - disabling cpuset.cpus inheritance in libvirtd
>> - allowing only xml with fully specified cputune
>> - set machine cpuset to (vcpupins | emulatorpin)
>>
>> I am not sure how useful the individual fixes are, i am sending them
>> as concrete examples for the problems i described earlier. And i am
>> hoping that will start a discussion.
>>
>> Henning
>>
>> Henning Schild (3):
>> util: cgroups do not implicitly add task to new machine cgroup
>> qemu: do not put a task into machine cgroup
>> qemu cgroups: move new threads to new cgroup after cpuset is set up
>>
>> src/lxc/lxc_cgroup.c | 6 ++++++
>> src/qemu/qemu_cgroup.c | 23 ++++++++++++++---------
>> src/util/vircgroup.c | 22 ----------------------
>> 3 files changed, 20 insertions(+), 31 deletions(-)
>>
>
>
> The updated code looks fine to me - although it didn't directly git am
> -3 to top of tree - I was able to make a few adjustments to get things
> merged... Since no one has objected to this ordering change - I've
> pushed.
Sorry, the patches were still based on v1.2.19. Thanks for the merge
and accepting them!
No problem - although it seems they've generated a regression in the
virttest memtune test suite. I'm 'technically' on vacation for the next
couple of weeks; however, I think the problem is a result of this patch,
which moved adding the task to the cgroup to the end of the for loop.
When there is no cpumap, the following code causes control to jump back
to the top of the loop:
    if (!cpumap)
        continue;

    if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
        goto cleanup;
not allowing the
    /* move the thread for vcpu to sub dir */
    if (virCgroupAddTask(cgroup_vcpu,
                         qemuDomainGetVcpuPid(vm, i)) < 0)
        goto cleanup;
to be executed.
The code should probably change to be (like IOThreads):
    if (cpumap &&
        qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
        goto cleanup;
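
To make the difference between the two forms concrete, here is a minimal standalone sketch. It is not libvirt code: struct vcpu_state, setup_with_continue(), setup_with_guard() and count_unmoved() are hypothetical names, and the two boolean fields merely stand in for the qemuSetupCgroupCpusetCpus() and virCgroupAddTask() calls quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what the loop body did for one vcpu. */
struct vcpu_state {
    bool cpuset_written; /* stands in for qemuSetupCgroupCpusetCpus() */
    bool task_added;     /* stands in for virCgroupAddTask() */
};

/* "continue" form: a vcpu without a cpumap skips the rest of the loop
 * body, so its thread is never moved into the vcpu cgroup. */
static void setup_with_continue(const bool *has_cpumap,
                                struct vcpu_state *out, size_t n)
{
    assert(has_cpumap != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (struct vcpu_state){ false, false };
        if (!has_cpumap[i])
            continue;
        out[i].cpuset_written = true;
        out[i].task_added = true;
    }
}

/* Guarded form: only the cpuset write depends on the cpumap; the task
 * move is reached on every iteration. */
static void setup_with_guard(const bool *has_cpumap,
                             struct vcpu_state *out, size_t n)
{
    assert(has_cpumap != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (struct vcpu_state){ false, false };
        if (has_cpumap[i])
            out[i].cpuset_written = true;
        out[i].task_added = true;
    }
}

/* Number of vcpus whose thread was not moved. */
static size_t count_unmoved(const struct vcpu_state *s, size_t n)
{
    size_t unmoved = 0;
    for (size_t i = 0; i < n; i++)
        if (!s[i].task_added)
            unmoved++;
    return unmoved;
}
```

With one unpinned vcpu among three, the "continue" form leaves that vcpu's thread unmoved, while the guarded form moves all three and only skips the cpuset write for the unpinned one.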
As for the rest, I suspect things will be quite quiet around here over
the next couple of weeks, so that is perhaps a discussion to start in
the new year.
John
Wrong operation ordering within libvirt cgroups (like the ones
fixed by the patches) could still push tasks onto dedicated cpus. And
more importantly other cgroups users can still grab the dedicated cpus
as well. The only reliable solution to prevent that seems to be making
use of the "exclusive" feature of cpusets. And that would imply
changing the cgroups layout of libvirt again, because sets cannot be
partially exclusive and libvirt deals with both dedicated cpus and
shared ones.
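
A rough sketch of why the layout matters, under the assumption that an exclusive cpuset may not share cpus with any sibling cpuset (my reading of the cgroup v1 cpuset rule). This is neither kernel nor libvirt code; struct cpuset, cpu_bit() and siblings_valid() are made-up names, and cpus are modelled as bits in a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one sibling cpuset. */
struct cpuset {
    uint64_t cpus;   /* bit N set means cpu N belongs to the set */
    bool exclusive;  /* models the "exclusive" flag */
};

/* Mask with only the bit for the given cpu set (cpu must be < 64). */
static uint64_t cpu_bit(unsigned cpu)
{
    assert(cpu < 64);
    return UINT64_C(1) << cpu;
}

/* Assumed rule: two siblings may overlap only if neither of them is
 * exclusive. Returns true if the whole sibling list obeys that. */
static bool siblings_valid(const struct cpuset *sets, size_t n)
{
    assert(sets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            bool overlap = (sets[i].cpus & sets[j].cpus) != 0;
            if (overlap && (sets[i].exclusive || sets[j].exclusive))
                return false;
        }
    }
    return true;
}
```

In this model, one exclusive machine set holding both the dedicated cpus 2-3 and the shared cpus 0-1 is rejected as soon as any sibling also uses cpus 0-1. Splitting it into an exclusive set for 2-3 and a non-exclusive set for 0-1 passes the check, which is the kind of layout change meant above.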
How to deal with these problems is a discussion that i wanted to get
started with this patch-series. It would be nice to receive general
comments on that. How should we proceed here? I could maybe write an
RFC mail describing the problems again and suggesting changes to
libvirt on a conceptual basis.
But until then maybe people responsible for cgroups in libvirt (Paul
and Martin?) can again look at
https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html
There i described how naive use of cgroups can place tasks on cpus that
are supposed to be isolated/dedicated/exclusive. Even if libvirt does
not make these mistakes it should protect itself against docker,
systemd, ...
Henning