On Mon, Sep 12, 2022 at 03:46:41PM +0200, Michal Privoznik wrote:
> When reconnecting to a running QEMU process, we construct the
> per-domain path in all hugetlbfs mounts. This is a relict from
> the past (v3.4.0-100-g5b24d25062) where we switched to a
> per-domain path and we want to create those paths when libvirtd
> restarts on upgrade.
>
> And with namespaces enabled there is one corner case where the
> path is not created. In fact an error is reported and the
> reconnect fails. Ideally, all mount events are propagated into
> the QEMU's namespace. And they probably are, except when the
> target path does not exist inside the namespace. Now, it's pretty
> common for users to mount hugetlbfs under /dev (e.g.
> /dev/hugepages), but if domain is started without hugepages (or
> more specifically - private hugetlbfs path wasn't created on
> domain startup), then the reconnect code tries to create it.
> But it fails to do so, well, it fails to set seclabels on the
> path because, because the path does not exist in the private
s/because, // at least =)
> namespace. And it doesn't exist because we specifically create
> only a subset of all possible /dev nodes. Therefore, the mount
> event, whilst propagated, is not successful and hence the
> filesystem is not mounted. We have to do it ourselves.
>
> If hugetlbfs is mount anywhere else there's no problem and this
> is effectively a dead code.
>
> Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=2123196
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> docs/kbase/qemu-passthrough-security.rst | 6 ------
> src/qemu/qemu_process.c | 3 +++
> 2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/docs/kbase/qemu-passthrough-security.rst
> b/docs/kbase/qemu-passthrough-security.rst
> index 106c3cc5b9..ef10d8af9b 100644
> --- a/docs/kbase/qemu-passthrough-security.rst
> +++ b/docs/kbase/qemu-passthrough-security.rst
> @@ -172,9 +172,3 @@ command before any guest is started:
> ::
>
> # mount --make-rshared /
> -
> -Another requirement for dynamic mount point propagation is to not place
> -``hugetlbfs`` mount points under ``/dev`` because these won't be
> propagated as
> -corresponding directories do not exist in the private namespace. Or
> just use
> -``memfd`` memory backend instead which does not require ``hugetlbfs``
> mount
> -points.
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index cbfdd3bda5..b05ad059c3 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -3976,6 +3976,9 @@
> qemuProcessBuildDestroyMemoryPathsImpl(virQEMUDriver *driver,
> return -1;
> }
>
> + if (qemuDomainNamespaceSetupPath(vm, path, NULL) < 0)
> + return -1;
> +
What's not visible here is that this branch is accessible only if the
domain has any hugepage backing configured. And it is only created on
startup and reconnect. That means if you start a VM without any
hugepages mounted, then mount them somewhere under /dev and restart the
daemon (which was always required) then attaching a hugepage-backed
dimm, for example, will work without namespaces, but will fail with
namespaces because we skipped the creation on restart. I think just
recreating the structure unconditionally would provide more benefit than
doubly isolating qemu from hugepages (once with permissions and once
with namespaces).
Actually, the path is created also on memory hotplug:
src/qemu/qemu_hotplug.c=2231=qemuDomainAttachMemory(virQEMUDriver *driver,
--
src/qemu/qemu_hotplug.c-2275-
src/qemu/qemu_hotplug.c:2276: if (qemuProcessBuildDestroyMemoryPaths(driver, vm, mem,
true) < 0)
src/qemu/qemu_hotplug.c-2277- goto cleanup;
So the per-domain dir should be created in all three cases (startup, reconnect &
hotplug).
Since this patch solves the issue of us killing perfectly fine qemu I
think it's fine if that's a follow'up patch:
Reviewed-by: Martin Kletzander <mkletzan(a)redhat.com>