On 01/07/2016 02:01 PM, Henning Schild wrote:
> On Thu, 7 Jan 2016 11:20:23 -0500
> John Ferlan <jferlan(a)redhat.com> wrote:
>
> [...]
>
>>> No problem - although it seems they've generated a regression in
>>> the virttest memtune test suite. I'm 'technically' on vacation
>>> for the next couple of weeks; however, I think the problem is a
>>> result of this patch, which moves adding the task to the cgroup
>>> to the end of the for loop. The following code can cause control
>>> to jump back to the top of the loop:
>>>
>>>     if (!cpumap)
>>>         continue;
>>>
>>>     if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
>>>         goto cleanup;
>>>
>>> not allowing the
>>>
>>>
>>>     /* move the thread for vcpu to sub dir */
>>>     if (virCgroupAddTask(cgroup_vcpu,
>>>                          qemuDomainGetVcpuPid(vm, i)) < 0)
>>>         goto cleanup;
>>>
>>> to be executed.
>>>
>>> The code should probably change to be (like IOThreads):
>>>
>>>     if (cpumap &&
>>>         qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
>>>         goto cleanup;
>>>
>>>
>>> As for the rest, I suspect things will be quite quiet around here
>>> over the next couple of weeks. A discussion to perhaps start in
>>> the new year.
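
The control-flow concern in the quoted message can be illustrated with a small stand-alone C model. This is only a sketch, not libvirt code: `struct vcpu_model`, `setup_with_continue` and `setup_with_guard` are hypothetical names, and the cgroup calls are replaced by a boolean flag so that the effect of `continue` is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one vcpu; not a libvirt type. */
struct vcpu_model {
    bool has_cpumap;  /* models "cpumap != NULL" */
    bool task_added;  /* models "thread was moved to the sub-cgroup" */
};

/* Loop shaped like the quoted code: `continue` skips the add-task step. */
size_t setup_with_continue(struct vcpu_model *vcpus, size_t n)
{
    size_t added = 0;

    assert(vcpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!vcpus[i].has_cpumap)
            continue;               /* jumps to the next iteration */

        /* cpuset pinning would happen here */

        vcpus[i].task_added = true; /* never reached without a cpumap */
        added++;
    }
    return added;
}

/* Loop shaped like the suggested change: only the pinning is conditional. */
size_t setup_with_guard(struct vcpu_model *vcpus, size_t n)
{
    size_t added = 0;

    assert(vcpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].has_cpumap) {
            /* cpuset pinning would happen here */
        }

        vcpus[i].task_added = true; /* always reached */
        added++;
    }
    return added;
}
```

In this model, a vcpu without a cpumap is left out by the first variant and included by the second, which is the behaviour difference the message describes.
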
>>
>> Same here. I will have a look at that regression after my vacation,
>> should it still be there.
>>
>> Henning
>>
>
> More data from the issue... While the above mentioned path is an
> issue, I don't believe it's what's causing the test failure.
>
> I haven't quite figured out why yet, but it seems the /proc/#/cgroup
> file isn't getting the proper path for the 'memory' slice and thus the
> test fails because it's looking at the:
>
> /sys/fs/cgroup/memory/machine.slice/memory.*
>
> files instead of the
>
> /sys/fs/cgroup/memory/machine.slice/$path/memory.*

To be honest, I had only looked at the cgroup/cpuset/ hierarchy, but I
have now browsed cgroup/memory/ as well.

The target of my patch series was to get
cgroup/cpuset/machine.slice/tasks to be empty: all tasks should be in
their sub-cgroup under machine.slice. The ordering patches make sure
the file is always empty.

In the memory cgroups all tasks are in the parent group (all in
machine.slice/tasks); machine.slice/*/tasks are empty. I am not sure
whether that is intended; I can only assume it is a bug in the memory
cgroup subsystem. Why are the groups created and tuned when the tasks
stay in the big superset?

TBH - there's quite a bit of this that mystifies me... Use of cgroups is
not something I've spent a whole lot of time looking at...

I guess I've been working under the assumption that when the
machine.slice/$path is created, the domain would use that for all cgroup
specific file adjustments for that domain. Not sure how
/proc/$pid/cgroup is related to this.

My f23 system seems to generate /proc/$pid/cgroup with the
machine.slice/$path/ for each of the cgroups libvirt cares about, while
the f20 system with the test only has that path for cpuset and
cpu,cpuacct. Since that file is what the test uses to find the memory
path for validation, that's why it fails.

I've been looking through the libvirtd debug logs to see if anything
jumps out at me, but it seems both the systems I've looked at will build
the path for the domain using the machine.slice/$path as seen during
domain startup.

Very odd - perhaps looking at it too long right now though!

/proc/#/cgroup is showing the correct path; libvirt seems to fail to
migrate tasks into memory subgroups. (I am talking about a patched
1.2.19 where VMs do not have any special memory tuning.)

I'm using latest upstream 1.3.1 - it seems to set the
machine.slice/$path for blkio, cpu,cpuacct, cpuset, memory, and devices
entries.

Without my patches the first qemu thread was in
"2:cpuset:/machine.slice" and the name did match
"4:memory:/machine.slice". Now if the test wants matching names, the
test might just be wrong. Or, as indicated before, there might be a bug
in the memory cgroups.

I'm leaning towards something in the test. I'll check if reverting these
changes alters the results. I don't imagine it will.

John

> Where $path is "machine-qemu\x2dvirt\x2dtests\x2dvm1.scope"
>
> This affects the virsh memtune $dom command test suite which uses the
> /proc/$pid/cgroup file in order to find the path for the 'memory' or
> 'cpuset' or 'cpu,cpuacct' cgroup paths.
>
> Seems to be some interaction with systemd that I haven't quite figured
> out.
>
> I'm assuming this is essentially the issue you were trying to fix -
> that is changes to values should be done to the machine-qemu*
> specific files rather than the machine.slice files.
>
> The good news is I can see the changes occurring in the machine-qemu*
> specific files, so it seems libvirt is doing the right thing.
>
> However, there's something strange with perhaps previously
> existing/running domains where that /proc/$pid/cgroup file doesn't get
> the $path for the memory entry, thus causing the test validation to
> look in the wrong place.
>
> Hopefully this makes sense. What's really strange (for me at least) is
> that it's only occurring on one test system. I can set up the same
> test on another system and things work just fine. I'm not quite sure
> what interaction generates that /proc/$pid/cgroup file - hopefully
> someone else understands it and can help me make sense of it.
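
For reference, the lookup the test is described as doing (finding a controller's path from /proc/$pid/cgroup) can be sketched in C. This is a minimal illustration and not the test suite's actual code: `cgroup_path_for` is a hypothetical helper, and it assumes the cgroup v1 line format `hierarchy-ID:controller-list:path`, which matches the "2:cpuset:/machine.slice" and "4:memory:/machine.slice" entries quoted in this thread. It works on the file contents passed in as a string, so it does not touch /proc itself.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: search the text of a /proc/<pid>/cgroup file for
 * `controller` and copy the matching cgroup path into `out`.
 * Returns 0 on success, -1 if the controller is not listed or the path
 * does not fit in `out`.
 */
int cgroup_path_for(const char *text, const char *controller,
                    char *out, size_t outlen)
{
    assert(text != NULL && controller != NULL && out != NULL && outlen > 0);

    size_t clen = strlen(controller);
    const char *line = text;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Split "ID:controllers:path" at the first two colons. */
        const char *c1 = memchr(line, ':', len);
        const char *c2 = c1 ? memchr(c1 + 1, ':', len - (size_t)(c1 + 1 - line))
                            : NULL;

        if (c2) {
            /* Walk the comma-separated controller list, e.g. "cpu,cpuacct". */
            const char *p = c1 + 1;
            while (p < c2) {
                const char *comma = memchr(p, ',', (size_t)(c2 - p));
                const char *end = comma ? comma : c2;

                if ((size_t)(end - p) == clen && memcmp(p, controller, clen) == 0) {
                    size_t plen = len - (size_t)(c2 + 1 - line);
                    if (plen >= outlen)
                        return -1;
                    memcpy(out, c2 + 1, plen);
                    out[plen] = '\0';
                    return 0;
                }
                p = end + 1;
            }
        }

        if (!eol)
            break;
        line = eol + 1;
    }
    return -1;
}
```

With input where the memory entry reads "4:memory:/machine.slice", the helper returns "/machine.slice" for "memory". That is the situation described above, in which the validation then looks at the machine.slice files rather than the machine.slice/$path ones.
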