[PATCH 0/2] qemu: Two improvements wrt mount namespaces

While investigating a bug (which I believe is just a
misconfiguration; linked in 2/2) I've found a problem with memfd
(patch 1/2).

Michal Prívozník (2):
  qemu_process: Don't require a hugetlbfs mount for memfd
  kbase: Document QEMU private mount NS limitations

 docs/kbase/qemu-passthrough-security.rst | 22 ++++++++++++++++++++++
 src/qemu/qemu_process.c                  | 12 +++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

-- 
2.35.1

The aim of qemuProcessNeedHugepagesPath() is to determine whether
a hugetlbfs mount point is required for a given domain (that is,
whether qemuBuildMemoryBackendProps() picks memory-backend-file
pointing to a hugetlbfs mount point). When a domain is configured
to use the memfd backend, that condition can never be true.
Therefore, skip creating the domain's private path under
hugetlbfs mount points.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 32f03ff79a..8102e689fb 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3888,8 +3888,18 @@ qemuProcessNeedHugepagesPath(virDomainDef *def,
     const long system_pagesize = virGetSystemPageSizeKB();
     size_t i;
 
-    if (def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE)
+    switch ((virDomainMemorySource)def->mem.source) {
+    case VIR_DOMAIN_MEMORY_SOURCE_FILE:
+        /* This needs a hugetlbfs mount. */
         return true;
+    case VIR_DOMAIN_MEMORY_SOURCE_MEMFD:
+        /* memfd works without a hugetlbfs mount */
+        return false;
+    case VIR_DOMAIN_MEMORY_SOURCE_NONE:
+    case VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS:
+    case VIR_DOMAIN_MEMORY_SOURCE_LAST:
+        break;
+    }
 
     for (i = 0; i < def->mem.nhugepages; i++) {
         if (def->mem.hugepages[i].size != system_pagesize)
-- 
2.35.1
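For readers without the libvirt tree at hand, the decision made in this hunk can be modelled in isolation. The following is only a sketch: `enum mem_source`, `struct mem_config` and `need_hugepages_path()` are hypothetical stand-ins, not libvirt types or functions, and the sketch assumes that a hugepage size different from the system page size means a hugetlbfs path is needed. Checks that the real function performs after the lines shown in the hunk are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the memory source of a domain. */
enum mem_source {
    MEM_SOURCE_NONE,
    MEM_SOURCE_FILE,
    MEM_SOURCE_ANONYMOUS,
    MEM_SOURCE_MEMFD,
};

/* Hypothetical stand-in holding only the fields this sketch needs. */
struct mem_config {
    enum mem_source source;
    const unsigned long *hugepage_sizes_kb; /* configured hugepage sizes */
    size_t nhugepages;
};

/* Returns true when a per-domain path under a hugetlbfs mount is needed. */
bool
need_hugepages_path(const struct mem_config *mem,
                    unsigned long system_pagesize_kb)
{
    size_t i;

    switch (mem->source) {
    case MEM_SOURCE_FILE:
        /* File-backed memory needs a hugetlbfs mount. */
        return true;
    case MEM_SOURCE_MEMFD:
        /* memfd never uses a hugetlbfs mount, whatever the page size. */
        return false;
    case MEM_SOURCE_NONE:
    case MEM_SOURCE_ANONYMOUS:
        /* Undecided: fall through to the page size check. */
        break;
    }

    /* Assumption: any non-default page size implies hugetlbfs. */
    for (i = 0; i < mem->nhugepages; i++) {
        if (mem->hugepage_sizes_kb[i] != system_pagesize_kb)
            return true;
    }
    return false;
}
```

The point of the early `return false` is that a memfd domain with 2 MiB hugepages configured no longer reaches the page size loop, which is what previously forced a hugetlbfs path onto it.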

There are two points I've taken for granted:
  1) the mount points are set before starting a guest,
  2) the / and its submounts are marked as shared, so that mount
     events propagate into child namespaces when assumption 1)
     does not hold.

But what's obvious to me might not be obvious to our users.
Document these known limitations.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 docs/kbase/qemu-passthrough-security.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/kbase/qemu-passthrough-security.rst b/docs/kbase/qemu-passthrough-security.rst
index 4381d9f3a6..106c3cc5b9 100644
--- a/docs/kbase/qemu-passthrough-security.rst
+++ b/docs/kbase/qemu-passthrough-security.rst
@@ -156,3 +156,25 @@ will affect all virtual machines. These settings are all made in
 
 * Cgroups - set ``cgroup_device_acl`` to include the desired device node, or
   ``cgroup_controllers = [...]`` to exclude the ``devices`` controller.
+
+Private mount namespace
+----------------------------
+
+As mentioned above, libvirt launches each QEMU process in its own ``mount``
+namespace. It's recommended that all mount points are set up prior starting any
+guest. For cases when that can't be assured, mount points in the namespace are
+marked as slave so that mount events happening in the parent namespace are
+propagated into this child namespace. But this may require an additional step:
+mounts in the parent namespace need to be marked as shared (if the distribution
+doesn't do that by default). This can be achieved by running the following
+command before any guest is started:
+
+::
+
+  # mount --make-rshared /
+
+Another requirement for dynamic mount point propagation is to not place
+``hugetlbfs`` mount points under ``/dev`` because these won't be propagated as
+corresponding directories do not exist in the private namespace. Or just use
+``memfd`` memory backend instead which does not require ``hugetlbfs`` mount
+points.
-- 
2.35.1
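Whether `/` is already marked as shared can be checked before starting guests by looking at the optional fields in `/proc/self/mountinfo`, where a shared mount carries a `shared:N` tag. The following is a minimal sketch of such a check, not something libvirt ships: `mountinfo_line_is_shared()` and `root_is_shared()` are hypothetical helper names, and the parser assumes the field layout described in proc(5) (mount point in the fifth field, optional fields from the seventh field up to a lone `-`).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if a single mountinfo line describes `mount_point`
 * and one of its optional fields is a "shared:N" propagation tag.
 */
bool
mountinfo_line_is_shared(const char *line, const char *mount_point)
{
    const char *p = line;
    bool match = false;
    int field = 0;

    while (*p) {
        size_t len;

        p += strspn(p, " \n");      /* skip field separators */
        len = strcspn(p, " \n");    /* length of the current field */
        if (len == 0)
            break;
        field++;

        if (field == 5) {
            /* Fifth field: mount point relative to the process root. */
            match = strlen(mount_point) == len &&
                    strncmp(p, mount_point, len) == 0;
        } else if (field > 6) {
            if (len == 1 && *p == '-')
                break;              /* end of the optional fields */
            if (match && len > 7 && strncmp(p, "shared:", 7) == 0)
                return true;
        }
        p += len;
    }
    return false;
}

/* Returns 1 if "/" is shared, 0 if it is not, -1 if mountinfo is unreadable. */
int
root_is_shared(void)
{
    char line[4096];
    int shared = 0;
    FILE *fp = fopen("/proc/self/mountinfo", "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        if (mountinfo_line_is_shared(line, "/")) {
            shared = 1;
            break;
        }
    }
    fclose(fp);
    return shared;
}
```

If `root_is_shared()` returns 0 on the host, that suggests the `mount --make-rshared /` step from the documentation above is still needed for later mounts to propagate into guests' namespaces.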

On Mon, Sep 05, 2022 at 04:32:27PM +0200, Michal Privoznik wrote:
> While investigating a bug (which I believe is just a
> misconfiguration; linked in 2/2) I've found a problem with memfd
> (patch 1/2).
>
> Michal Prívozník (2):
>   qemu_process: Don't require a hugetlbfs mount for memfd
>   kbase: Document QEMU private mount NS limitations

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

>  docs/kbase/qemu-passthrough-security.rst | 22 ++++++++++++++++++++++
>  src/qemu/qemu_process.c                  | 12 +++++++++++-
>  2 files changed, 33 insertions(+), 1 deletion(-)
>
> -- 
> 2.35.1