lxc container startup error and RFC patch

Hi all,

I'm seeing LXC container startup failures. This is with libvirt git, fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with other versions). Reproducer:

  sudo virt-install --connect lxc:/// --name test-container --memory 128 --boot init=/bin/sh
  Starting install...
  ERROR error from service: GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does not belong to any known machine

libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate that, showing up in syslog, but commit 9c1693eff made it fatal:

  commit 9c1693eff427661616ce1bd2795688f87288a412
  Author: Pavel Hrdina <phrdina@redhat.com>
  Date:   Fri Feb 5 16:17:35 2021 +0100

      vircgroup: use DBus call to systemd for some APIs

The error comes from virSystemdGetMachineByPID. The PID that shows up in the above error message does not match the leader PID as reported by machinectl. This change fixes the error but I don't know if it's correct or if it has other implications:

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 066e013ed4..54ecb1316b 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
     nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);

     if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
-                                            ctrl->initpid,
+                                            getpid(),
                                             ctrl->nnicindexes,
                                             ctrl->nicindexes)))
         goto cleanup;

-    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
+    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
         goto cleanup;

     /* Add all qemu-nbd tasks to the cgroup */

Maybe something else isn't working elsewhere. Clearly we try to add both pids to the systemd machine, but virSystemdGetMachineByPID is not working to match the non-leader pid, which is the one that the LXC driver knows about.

Thoughts? Can anyone else reproduce?

Thanks,
Cole
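One way to investigate a NoMachineForPID error like the one above is to check which cgroup the kernel reports for each PID involved (the controller process and the container init). The following is a minimal sketch, not libvirt code: the names cgroup_entry, parse_cgroup_line and print_pid_cgroups are made up for illustration. It relies only on the "hierarchy-ID:controller-list:cgroup-path" line format of /proc/<pid>/cgroup described in cgroups(7).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one parsed line of /proc/<pid>/cgroup. */
struct cgroup_entry {
    int hierarchy_id;       /* 0 for the cgroup v2 (unified) hierarchy */
    char controllers[128];  /* empty for v2, e.g. "name=systemd" for v1 */
    char path[512];         /* cgroup path relative to the hierarchy root */
};

/* Parse "hierarchy-ID:controller-list:cgroup-path".
 * Returns 0 on success, -1 if the line is malformed or too long. */
int parse_cgroup_line(const char *line, struct cgroup_entry *out)
{
    const char *c1, *c2;
    char *end;
    long id;
    size_t clen, plen;

    assert(line != NULL && out != NULL);

    c1 = strchr(line, ':');
    c2 = c1 ? strchr(c1 + 1, ':') : NULL;
    if (!c2)
        return -1;

    /* The hierarchy ID must be a number that ends exactly at the first ':' */
    id = strtol(line, &end, 10);
    if (end == line || end != c1)
        return -1;

    clen = (size_t)(c2 - c1 - 1);
    plen = strcspn(c2 + 1, "\n");   /* drop the trailing newline, if any */
    if (clen >= sizeof(out->controllers) || plen >= sizeof(out->path))
        return -1;

    out->hierarchy_id = (int)id;
    memcpy(out->controllers, c1 + 1, clen);
    out->controllers[clen] = '\0';
    memcpy(out->path, c2 + 1, plen);
    out->path[plen] = '\0';
    return 0;
}

/* Print every cgroup membership of a PID. Returns -1 if the PID is gone
 * or /proc is not readable. */
int print_pid_cgroups(long pid)
{
    char fname[64];
    char line[1024];
    struct cgroup_entry e;
    FILE *fp;

    snprintf(fname, sizeof(fname), "/proc/%ld/cgroup", pid);
    if (!(fp = fopen(fname, "r")))
        return -1;

    while (fgets(line, sizeof(line), fp)) {
        if (parse_cgroup_line(line, &e) == 0)
            printf("%d [%s] %s\n", e.hierarchy_id, e.controllers, e.path);
    }
    fclose(fp);
    return 0;
}
```

Calling print_pid_cgroups() once with the PID from the error message and once with the leader PID shown by machinectl should make it visible whether both processes sit in the same scope (machines typically live under machine.slice) or whether one of them was left outside it in some hierarchy.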

On 7/29/21 3:28 PM, Cole Robinson wrote:
> Hi all,
>
> I'm seeing LXC container startup failures. This is with libvirt git, fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with other versions). Reproducer:

From my experience it's more related to cgroups. It works with V2-only but doesn't work with V1 or hybrid.

>   sudo virt-install --connect lxc:/// --name test-container --memory 128 --boot init=/bin/sh
>   Starting install...
>   ERROR error from service: GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does not belong to any known machine
>
> libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate that, showing up in syslog, but commit 9c1693eff made it fatal:
>
>   commit 9c1693eff427661616ce1bd2795688f87288a412
>   Author: Pavel Hrdina <phrdina@redhat.com>
>   Date:   Fri Feb 5 16:17:35 2021 +0100
>
>       vircgroup: use DBus call to systemd for some APIs
>
> The error comes from virSystemdGetMachineByPID. The PID that shows up in the above error message does not match the leader PID as reported by machinectl. This change fixes the error but I don't know if it's correct or if it has other implications:

I'm not familiar enough with the driver to review your change with confidence.

> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
> index 066e013ed4..54ecb1316b 100644
> --- a/src/lxc/lxc_controller.c
> +++ b/src/lxc/lxc_controller.c
> @@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
>      nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
>
>      if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
> -                                            ctrl->initpid,
> +                                            getpid(),
>                                              ctrl->nnicindexes,
>                                              ctrl->nicindexes)))
>          goto cleanup;
>
> -    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
> +    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
>          goto cleanup;
>
>      /* Add all qemu-nbd tasks to the cgroup */
>
> Maybe something else isn't working elsewhere. Clearly we try to add both pids to the systemd machine, but virSystemdGetMachineByPID is not working to match the non-leader pid, which is the one that the LXC driver knows about.
>
> Thoughts? Can anyone else reproduce?

https://gitlab.com/libvirt/libvirt/-/issues/182

Regards,
Jim
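Since the failure reportedly depends on whether the host runs cgroups V2-only, V1, or hybrid, it can help to state the mode explicitly when trying to reproduce. Below is a minimal sketch of one common way to tell them apart, based on the filesystem type mounted at /sys/fs/cgroup and /sys/fs/cgroup/unified. It is not taken from libvirt; the function and enum names are invented for illustration, and the mount points are assumed to be the usual systemd defaults.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/vfs.h>

/* Filesystem magic numbers as found in <linux/magic.h>. */
#ifndef CGROUP2_SUPER_MAGIC
# define CGROUP2_SUPER_MAGIC 0x63677270
#endif
#ifndef TMPFS_MAGIC
# define TMPFS_MAGIC 0x01021994
#endif

enum cgroup_mode { CG_UNKNOWN, CG_V1, CG_HYBRID, CG_V2 };

/* Pure decision step: classify the host from the two filesystem types.
 * Pass -1 for a path that could not be inspected. */
enum cgroup_mode classify_cgroup_mode(long root_type, long unified_type)
{
    if (root_type == CGROUP2_SUPER_MAGIC)
        return CG_V2;                       /* unified hierarchy only */
    if (root_type == TMPFS_MAGIC)           /* tmpfs holding v1 mounts */
        return unified_type == CGROUP2_SUPER_MAGIC ? CG_HYBRID : CG_V1;
    return CG_UNKNOWN;
}

/* Return the filesystem type of a path, or -1 if statfs() fails. */
long fs_type(const char *path)
{
    struct statfs sb;

    assert(path != NULL);
    if (statfs(path, &sb) < 0)
        return -1;
    return (long)sb.f_type;
}

/* Inspect the assumed default mount points and classify the host. */
enum cgroup_mode detect_cgroup_mode(void)
{
    return classify_cgroup_mode(fs_type("/sys/fs/cgroup"),
                                fs_type("/sys/fs/cgroup/unified"));
}

const char *cgroup_mode_name(enum cgroup_mode mode)
{
    switch (mode) {
    case CG_V1:     return "v1 (legacy)";
    case CG_HYBRID: return "hybrid";
    case CG_V2:     return "v2 (unified)";
    default:        return "unknown";
    }
}
```

Printing cgroup_mode_name(detect_cgroup_mode()) on the affected host and on a host where the container starts fine would be a quick way to check whether the two differ in cgroup mode.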

On 8/2/21 10:20 PM, Jim Fehlig wrote:
> On 7/29/21 3:28 PM, Cole Robinson wrote:
>> Hi all,
>>
>> I'm seeing LXC container startup failures. This is with libvirt git, fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with other versions). Reproducer:
>
> From my experience it's more related to cgroups. It works with V2-only but doesn't work with V1 or hybrid.

Ah, that's the missing piece! I tried to reproduce on my fedora VMs but all of them are fully switched to v2. Thanks Jim, I'll give it another try.

>>   sudo virt-install --connect lxc:/// --name test-container --memory 128 --boot init=/bin/sh
>>   Starting install...
>>   ERROR error from service: GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does not belong to any known machine
>>
>> libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate that, showing up in syslog, but commit 9c1693eff made it fatal:
>>
>>   commit 9c1693eff427661616ce1bd2795688f87288a412
>>   Author: Pavel Hrdina <phrdina@redhat.com>
>>   Date:   Fri Feb 5 16:17:35 2021 +0100
>>
>>       vircgroup: use DBus call to systemd for some APIs
>>
>> The error comes from virSystemdGetMachineByPID. The PID that shows up in the above error message does not match the leader PID as reported by machinectl. This change fixes the error but I don't know if it's correct or if it has other implications:
>
> I'm not familiar enough with the driver to review your change with confidence.

I'll do the review.

>> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
>> index 066e013ed4..54ecb1316b 100644
>> --- a/src/lxc/lxc_controller.c
>> +++ b/src/lxc/lxc_controller.c
>> @@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
>>      nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
>>
>>      if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
>> -                                            ctrl->initpid,
>> +                                            getpid(),
>>                                              ctrl->nnicindexes,
>>                                              ctrl->nicindexes)))
>>          goto cleanup;
>>
>> -    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
>> +    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
>>          goto cleanup;
>>
>>      /* Add all qemu-nbd tasks to the cgroup */
>>
>> Maybe something else isn't working elsewhere. Clearly we try to add both pids to the systemd machine, but virSystemdGetMachineByPID is not working to match the non-leader pid, which is the one that the LXC driver knows about.
>>
>> Thoughts? Can anyone else reproduce?

Thanks for filing the issue.

Michal

participants (3)
- Cole Robinson
- Jim Fehlig
- Michal Prívozník