>
> I'm leaning towards something in the test. I'll check if reverting
> these changes alters the results. I don't imagine it will.
The real question is which thread it fails on and at what point in
time. My patches only changed the order of operations, so threads enter
the cpuset cgroups at slightly different times. And the qemu main
thread never enters the parent group; it becomes an emulator thread.
Maybe you can point to exactly which assertion fails, including a
link to the test code. And yes, if you can confirm that the patches are
to blame, that would be a good first step ;).
Thanks,
Henning
Not quite sure how to answer your question about which thread - I'm
still at the point of figuring out the symptoms.
At startup, a priv->cgroup is created. Then, when the vcpu, emulator,
and iothread calls are made, each seems to create its own thread cgroup
via virCgroupNewThread using priv->cgroup, make its adjustment, and
free the cgroup.
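For context, here is a minimal sketch of what I understand that per-thread
setup to boil down to on a cgroup v1 host. The paths, the controller, and the
cpu mask below are assumptions for illustration only; the real code goes
through virCgroupNewThread and the related helpers rather than writing to the
filesystem directly.

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sketch: create a per-thread sub-cgroup (e.g. "emulator" or "vcpu0") under
 * the machine scope in the cpuset hierarchy, apply a tuning value, and move
 * the thread into it.  All paths and values here are illustrative. */
static int setup_thread_cgroup(const char *name, pid_t tid, const char *cpus)
{
    char dir[512], path[640];
    FILE *fp;

    snprintf(dir, sizeof(dir),
             "/sys/fs/cgroup/cpuset/machine.slice/"
             "machine-qemu\\x2dvirt\\x2dtests\\x2dvm1.scope/%s", name);
    if (mkdir(dir, 0755) < 0 && errno != EEXIST)    /* ~ virCgroupNewThread */
        return -1;

    snprintf(path, sizeof(path), "%s/cpuset.cpus", dir);
    if ((fp = fopen(path, "w"))) {                  /* ~ the "adjustment" */
        fprintf(fp, "%s\n", cpus);
        fclose(fp);
    }

    snprintf(path, sizeof(path), "%s/tasks", dir);
    if (!(fp = fopen(path, "w")))                   /* ~ attaching the thread */
        return -1;
    fprintf(fp, "%d\n", (int)tid);
    fclose(fp);
    return 0;                          /* caller then frees its cgroup handle */
}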
As for which test - it's part of the 'virt-test' suite. It's run on a
Red Hat internal system every night in order to help determine whether
any changes made during the work day have caused a regression. You can
look up 'virt-test' on GitHub, but it's being replaced by something
known as Avocado.
A test run with all the patches reverted passed, so the problem is
something in the way things were moved. What's "interesting" (to me at
least) is that if I start the VM on the system used for the test, the
/proc/$pid/cgroup file is as follows:
10:hugetlb:/
9:perf_event:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
8:blkio:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
7:net_cls,net_prio:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
6:freezer:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
5:devices:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
4:memory:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
3:cpu,cpuacct:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator
2:cpuset:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator
1:name=systemd:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
If I then "unrevert" the patches one by one (a poor man's git
bisect), I find that patch 2/3 results in the following adjustment to
the /proc/$pid/cgroup file:
10:hugetlb:/
9:perf_event:/
8:blkio:/machine.slice
7:net_cls,net_prio:/
6:freezer:/
5:devices:/machine.slice
4:memory:/machine.slice
3:cpu,cpuacct:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator
2:cpuset:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator
1:name=systemd:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
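In case it helps anyone reproduce the comparison, this is roughly how I am
looking at the two states; a minimal, standalone parser for /proc/<pid>/cgroup
(nothing libvirt-specific, the pid argument is whatever qemu process is being
inspected):

#include <stdio.h>

/* Print controller -> path pairs from /proc/<pid>/cgroup so two runs can be
 * diffed easily.  Each line of that file is "id:controllers:path". */
int main(int argc, char **argv)
{
    char path[64], line[1024];
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/cgroup", argv[1]);
    if (!(fp = fopen(path, "r"))) {
        perror(path);
        return 1;
    }
    while (fgets(line, sizeof(line), fp)) {
        int id;
        char ctrl[128], cgpath[768];

        if (sscanf(line, "%d:%127[^:]:%767s", &id, ctrl, cgpath) == 3)
            printf("%-20s %s\n", ctrl, cgpath);
    }
    fclose(fp);
    return 0;
}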
So, I have a candidate... It seems that by changing the AddTask call from
using 'priv->cgroup' to the copy of the cgroup created by
virCgroupNewThread in qemuSetupCgroupForEmulator, only the 'cpuset' and
'cpu,cpuacct' entries of the /proc/$pid/cgroup file get modified, which
leaves the other entries back at /machine.slice. I'm not clear why that
happens (yet).
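To make the hypothesis concrete, here is a minimal sketch of what I think the
before/after difference amounts to at the cgroup v1 filesystem level. The
controller lists and paths are assumptions for illustration, not the actual
libvirt code, which does this through the AddTask call on either priv->cgroup
or the emulator sub-group.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a pid into the tasks file of one controller; subdir is either ""
 * (the scope itself) or "/emulator".  Paths are illustrative. */
static void attach(const char *controller, const char *subdir, pid_t pid)
{
    char tasks[512];
    FILE *fp;

    snprintf(tasks, sizeof(tasks),
             "/sys/fs/cgroup/%s/machine.slice/"
             "machine-qemu\\x2dvirt\\x2dtests\\x2dvm1.scope%s/tasks",
             controller, subdir);
    if ((fp = fopen(tasks, "w"))) {
        fprintf(fp, "%d\n", (int)pid);
        fclose(fp);
    }
}

/* Before patch 2/3 (AddTask on priv->cgroup): the pid lands in the scope
 * directory of every controller, and the emulator move for cpuset and
 * cpu,cpuacct happens on top of that - matching the first file above. */
void attach_old_way(pid_t pid)
{
    const char *all[] = { "hugetlb", "perf_event", "blkio",
                          "net_cls,net_prio", "freezer", "devices",
                          "memory", "cpu,cpuacct", "cpuset" };
    for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++)
        attach(all[i], "", pid);
}

/* After patch 2/3 (AddTask on the cgroup from virCgroupNewThread): only the
 * controllers with an emulator sub-directory are touched, so the remaining
 * entries stay wherever systemd last placed them - matching the second file. */
void attach_new_way(pid_t pid)
{
    attach("cpuset", "/emulator", pid);
    attach("cpu,cpuacct", "/emulator", pid);
}

int main(int argc, char **argv)
{
    pid_t pid = argc > 1 ? (pid_t)atoi(argv[1]) : getpid();

    attach_new_way(pid);    /* swap in attach_old_way(pid) to compare */
    return 0;
}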
BTW:
What's interesting about the file changes is that they differ from my
f23 system, where the same revert processing produces the following
/proc/$pid/cgroup file when patch 2 is re-applied:
10:devices:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
9:memory:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
8:freezer:/
7:cpuset:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator
6:net_cls,net_prio:/
5:cpu,cpuacct:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope/emulator
4:blkio:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
3:hugetlb:/
2:perf_event:/
1:name=systemd:/machine.slice/machine-qemu\x2dvirt\x2dtests\x2dvm1.scope
This does seem similar in a way to something I found while searching:
https://bugzilla.redhat.com/show_bug.cgi?id=1139223. It's not
completely the same, but the symptom of systemd overwriting controller
entries that were not being changed feels similar.
John