On Fri, Jul 17, 2015 at 15:40:31 +0100, Daniel Berrange wrote:
On Thu, Jul 16, 2015 at 04:18:08PM +0200, Peter Krempa wrote:
> The scope name, even according to our docs is
> "machine-$DRIVER\x2d$VMNAME.scope" virSystemdMakeScopeName would use the
> resource partition name instead of "machine-" if it was specified thus
> creating invalid scope paths.
>
> This makes libvirt drop cgroups for a VM that uses custom resource
> partition upon reconnecting since the detected scope name would not
> match the expected name generated by virSystemdMakeScopeName.
>
> The error is exposed by the following log entry:
>
> debug : virCgroupValidateMachineGroup:302 : Name
'machine-qemu\x2dtestvm.scope' for controller 'cpu' does not match
'testvm', 'testvm.libvirt-qemu' or
'machine-test-qemu\x2dtestvm.scope'
>
> for a "/machine/test" resource and "testvm" vm.
This doesn't look right - the whole idea of the resource partition
name is that it overrides the default /machine based path. So
ignoring the resource partition is not right, unless I'm mis-
understanding what this patch is doing.
Perhaps I wasn't clear enough in the explanation. The code path that
calls the function where I've removed the partition is called on daemon
restart, where the paths to cgroups are re-detected upon startup.
At first virCgroupNewDetect() is called on a VM's pid and the function
detects cgroup paths for the given process.
The detected data for a VM with
<partition>/machine/asdf/test</partiotion> look like:
(gdb) p **group
$2 = {path = 0x7fff94000ec0 "",
controllers = {{type = 0,
mountPoint = 0x7fff94001330 "/sys/fs/cgroup/cpu,cpuacct",
linkPoint = 0x7fff94001470 "/sys/fs/cgroup/cpu",
placement = 0x7fff94001fb0
"/machine.slice/machine-asdf.slice/machine-asdf-test.slice/machine-qemu\\x2df20live2.scope/emulator"},
{type = 0,
mountPoint = 0x7fff94001200 "/sys/fs/cgroup/cpu,cpuacct",
linkPoint = 0x7fff94001450 "/sys/fs/cgroup/cpuacct",
placement = 0x7fff94002020
"/machine.slice/machine-asdf.slice/machine-asdf-test.slice/machine-qemu\\x2df20live2.scope/emulator"},
{type = 0,
mountPoint = 0x7fff94001410 "/sys/fs/cgroup/cpuset",
linkPoint = 0x0,
placement = 0x7fff94001ad0
"/machine.slice/machine-asdf.slice/machine-asdf-test.slice/machine-qemu\\x2df20live2.scope/emulator"},
... and so on for other cgroups.
After the detection virCgroupValidateMachineGroup() is called on the
detected cgroup array to validate them with the following algorithm:
FOREACH entry IN 'cgroup' array:
Find the last directory name in entry.placement (strrchr(..., '/')).
IF the cgroup is one of CPU* cgroups AND
the last directory component is '/emulator':
find next previous directory component and strip '/emuator' from it
# This equals to 'machine-qemu\\x2df20live2.scope' in the example above.
# validate the name
IF directory name IS EQUAL TO 'MACHINE_NAME' OR
'MACHINE_NAME.libvirt-DRIVER_NAME' OR
virSystemdMakeScopeName(...):
cgroup is valid
Since the algorithm does not extract the partition name or validate it
in any way in non-systemd scenarios and in scenarios where systemd is
used the generated name wouldn't ever match since the scope name as
extracted in the code does not include the slice generated from
<partition> the cgroups code does not work on systemd machines after
libvirt restart.
The patch as-is removes the partition name from the code that generates
the systemd SCOPE name since that does not contain it. The partition is
used to generate SLICE name which is contained in the cgroup path before
the SCOPE.
The question that comes into my mind is whether we should actually make
sure that the <partition> corresponds to the SLICE in the detected path,
or just matching the machine SCOPE is good enough in this case.
Peter