[PATCH 00/10] qemu: Introduce shared_filesystems configuration option

An alternative take on [1] based on review feedback. The need to have something like this in the first place is driven by KubeVirt (see [2] and [3]). A draft version of this series has been integrated into KubeVirt and it has been confirmed that it was effective in removing the need to use LD_PRELOAD hacks in the storage provider. CC'ing Stefan so he can have a look at the TPM part and shout if I've gotten anything wrong :) [1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/MMKVR... [2] https://issues.redhat.com/browse/CNV-34322 [3] https://issues.redhat.com/browse/CNV-39370 Andrea Bolognani (10): security: Fix alignment security: Fix name for _virSecurityDACChardevCallbackData security: Drop virSecurity(DAC|SELinux)RestoreImageLabelSingle() security: Drop virSecurity(DAC|SELinux)SetImageLabelRelative() qemu: Tweak augeas schema qemu: Introduce shared_filesystems configuration option qemu: Propagate shared_filesystems utils: Use overrides in virFileIsSharedFS() qemu: Always set labels for TPM state NEWS: Document qemu shared_filesystems option NEWS.rst | 7 +++ src/lxc/lxc_controller.c | 2 +- src/lxc/lxc_driver.c | 2 +- src/lxc/lxc_process.c | 4 +- src/qemu/libvirtd_qemu.aug | 11 ++-- src/qemu/qemu.conf.in | 17 ++++++ src/qemu/qemu_conf.c | 17 ++++++ src/qemu/qemu_conf.h | 2 + src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_extdevice.c | 2 +- src/qemu/qemu_migration.c | 12 ++-- src/qemu/qemu_security.c | 14 ++++- src/qemu/qemu_tpm.c | 36 ++++++------ src/qemu/qemu_tpm.h | 8 ++- src/qemu/test_libvirtd_qemu.aug.in | 5 ++ src/security/security_apparmor.c | 2 + src/security/security_dac.c | 67 +++++++++------------- src/security/security_driver.h | 4 ++ src/security/security_manager.c | 34 +++++++----- src/security/security_manager.h | 20 ++++--- src/security/security_nop.c | 4 ++ src/security/security_selinux.c | 58 ++++++++----------- src/security/security_stack.c | 16 ++++-- src/util/virfile.c | 89 +++++++++++++++++++++++++----- src/util/virfile.h | 3 +- tests/securityselinuxlabeltest.c | 2 +- tests/virfiletest.c | 2 +- 27 files changed, 289 insertions(+), 153 deletions(-) -- 2.44.0

Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_manager.c | 14 +++++++------- src/security/security_manager.h | 16 ++++++++-------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/src/security/security_manager.c b/src/security/security_manager.c index afd41f1c20..24f2f3d3dc 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -412,9 +412,9 @@ virSecurityManagerGetPrivileged(virSecurityManager *mgr) */ int virSecurityManagerRestoreImageLabel(virSecurityManager *mgr, - virDomainDef *vm, - virStorageSource *src, - virSecurityDomainImageLabelFlags flags) + virDomainDef *vm, + virStorageSource *src, + virSecurityDomainImageLabelFlags flags) { VIR_LOCK_GUARD lock = virObjectLockGuard(mgr); @@ -1082,8 +1082,8 @@ virSecurityManagerDomainRestorePathLabel(virSecurityManager *mgr, */ int virSecurityManagerSetMemoryLabel(virSecurityManager *mgr, - virDomainDef *vm, - virDomainMemoryDef *mem) + virDomainDef *vm, + virDomainMemoryDef *mem) { VIR_LOCK_GUARD lock = virObjectLockGuard(mgr); @@ -1108,8 +1108,8 @@ virSecurityManagerSetMemoryLabel(virSecurityManager *mgr, */ int virSecurityManagerRestoreMemoryLabel(virSecurityManager *mgr, - virDomainDef *vm, - virDomainMemoryDef *mem) + virDomainDef *vm, + virDomainMemoryDef *mem) { VIR_LOCK_GUARD lock = virObjectLockGuard(mgr); diff --git a/src/security/security_manager.h b/src/security/security_manager.h index 97add3294d..a416af3215 100644 --- a/src/security/security_manager.h +++ b/src/security/security_manager.h @@ -43,8 +43,8 @@ typedef enum { VIR_SECURITY_MANAGER_PRIVILEGED) virSecurityManager *virSecurityManagerNew(const char *name, - const char *virtDriver, - unsigned int flags); + const char *virtDriver, + unsigned int flags); virSecurityManager *virSecurityManagerNewStack(virSecurityManager *primary); int virSecurityManagerStackAddNested(virSecurityManager *stack, @@ -73,10 +73,10 @@ typedef int virSecurityManager *virSecurityManagerNewDAC(const char *virtDriver, - uid_t user, - gid_t group, - unsigned int flags, - virSecurityManagerDACChownCallback chownCallback); + uid_t user, + gid_t group, + unsigned int flags, + virSecurityManagerDACChownCallback chownCallback); int virSecurityManagerPreFork(virSecurityManager *mgr); void virSecurityManagerPostFork(virSecurityManager *mgr); @@ -184,8 +184,8 @@ int virSecurityManagerSetMemoryLabel(virSecurityManager *mgr, virDomainDef *vm, virDomainMemoryDef *mem); int virSecurityManagerRestoreMemoryLabel(virSecurityManager *mgr, - virDomainDef *vm, - virDomainMemoryDef *mem); + virDomainDef *vm, + virDomainMemoryDef *mem); int virSecurityManagerSetInputLabel(virSecurityManager *mgr, virDomainDef *vm, -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:06 +0100, Andrea Bolognani wrote:
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_manager.c | 14 +++++++------- src/security/security_manager.h | 16 ++++++++--------
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

It was clearly copied over from the SELinux driver without updating its name in the process. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_dac.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 4b8130630f..7421496fc9 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1671,7 +1671,7 @@ virSecurityDACRestoreChardevLabel(virSecurityManager *mgr, } -struct _virSecuritySELinuxChardevCallbackData { +struct _virSecurityDACChardevCallbackData { virSecurityManager *mgr; bool chardevStdioLogd; }; @@ -1682,7 +1682,7 @@ virSecurityDACRestoreChardevCallback(virDomainDef *def, virDomainChrDef *dev G_GNUC_UNUSED, void *opaque) { - struct _virSecuritySELinuxChardevCallbackData *data = opaque; + struct _virSecurityDACChardevCallbackData *data = opaque; return virSecurityDACRestoreChardevLabel(data->mgr, def, dev->source, data->chardevStdioLogd); @@ -1916,7 +1916,7 @@ virSecurityDACRestoreAllLabel(virSecurityManager *mgr, size_t i; int rc = 0; - struct _virSecuritySELinuxChardevCallbackData chardevData = { + struct _virSecurityDACChardevCallbackData chardevData = { .mgr = mgr, .chardevStdioLogd = chardevStdioLogd, }; @@ -2018,7 +2018,7 @@ virSecurityDACSetChardevCallback(virDomainDef *def, virDomainChrDef *dev G_GNUC_UNUSED, void *opaque) { - struct _virSecuritySELinuxChardevCallbackData *data = opaque; + struct _virSecurityDACChardevCallbackData *data = opaque; return virSecurityDACSetChardevLabel(data->mgr, def, dev->source, data->chardevStdioLogd); @@ -2141,7 +2141,7 @@ virSecurityDACSetAllLabel(virSecurityManager *mgr, uid_t user; gid_t group; - struct _virSecuritySELinuxChardevCallbackData chardevData = { + struct _virSecurityDACChardevCallbackData chardevData = { .mgr = mgr, .chardevStdioLogd = chardevStdioLogd, }; -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:07 +0100, Andrea Bolognani wrote:
It was clearly copied over from the SELinux driver without updating its name in the process.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_dac.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

On 3/20/24 05:19, Andrea Bolognani wrote:
It was clearly copied over from the SELinux driver without updating its name in the process.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>

Each one only has a single, trivial caller. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_dac.c | 21 ++++----------------- src/security/security_selinux.c | 21 ++++----------------- 2 files changed, 8 insertions(+), 34 deletions(-) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 7421496fc9..9c24a1c4a8 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -972,10 +972,10 @@ virSecurityDACSetImageLabel(virSecurityManager *mgr, } static int -virSecurityDACRestoreImageLabelSingle(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - bool migrated) +virSecurityDACRestoreImageLabelInt(virSecurityManager *mgr, + virDomainDef *def, + virStorageSource *src, + bool migrated) { virSecurityDACData *priv = virSecurityManagerGetPrivateData(mgr); virSecurityLabelDef *secdef; @@ -1047,19 +1047,6 @@ virSecurityDACRestoreImageLabelSingle(virSecurityManager *mgr, } -static int -virSecurityDACRestoreImageLabelInt(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - bool migrated) -{ - if (virSecurityDACRestoreImageLabelSingle(mgr, def, src, migrated) < 0) - return -1; - - return 0; -} - - static int virSecurityDACRestoreImageLabel(virSecurityManager *mgr, virDomainDef *def, diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index ffad058d9a..d491435ae1 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1774,10 +1774,10 @@ virSecuritySELinuxRestoreTPMFileLabelInt(virSecurityManager *mgr, static int -virSecuritySELinuxRestoreImageLabelSingle(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - bool migrated) +virSecuritySELinuxRestoreImageLabelInt(virSecurityManager *mgr, + virDomainDef *def, + virStorageSource *src, + bool migrated) { virSecurityLabelDef *seclabel; virSecurityDeviceLabelDef *disk_seclabel; @@ -1863,19 +1863,6 @@ virSecuritySELinuxRestoreImageLabelSingle(virSecurityManager *mgr, } -static int -virSecuritySELinuxRestoreImageLabelInt(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - bool migrated) -{ - if (virSecuritySELinuxRestoreImageLabelSingle(mgr, def, src, migrated) < 0) - return -1; - - return 0; -} - - static int virSecuritySELinuxRestoreImageLabel(virSecurityManager *mgr, virDomainDef *def, -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:08 +0100, Andrea Bolognani wrote:
Each one only has a single, trivial caller.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_dac.c | 21 ++++----------------- src/security/security_selinux.c | 21 ++++----------------- 2 files changed, 8 insertions(+), 34 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

On 3/20/24 05:19, Andrea Bolognani wrote:
Each one only has a single, trivial caller.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>

The single caller for each function passes the same value for @src and @parent, which means that we don't really need the additional API. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_dac.c | 19 +++++-------------- src/security/security_selinux.c | 19 +++++-------------- 2 files changed, 10 insertions(+), 28 deletions(-) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 9c24a1c4a8..567be4bd23 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -939,12 +939,12 @@ virSecurityDACSetImageLabelInternal(virSecurityManager *mgr, static int -virSecurityDACSetImageLabelRelative(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - virStorageSource *parent, - virSecurityDomainImageLabelFlags flags) +virSecurityDACSetImageLabel(virSecurityManager *mgr, + virDomainDef *def, + virStorageSource *src, + virSecurityDomainImageLabelFlags flags) { + virStorageSource *parent = src; virStorageSource *n; for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) { @@ -962,15 +962,6 @@ virSecurityDACSetImageLabelRelative(virSecurityManager *mgr, return 0; } -static int -virSecurityDACSetImageLabel(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - virSecurityDomainImageLabelFlags flags) -{ - return virSecurityDACSetImageLabelRelative(mgr, def, src, src, flags); -} - static int virSecurityDACRestoreImageLabelInt(virSecurityManager *mgr, virDomainDef *def, diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index d491435ae1..b49af26e49 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1980,12 +1980,12 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, static int -virSecuritySELinuxSetImageLabelRelative(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - virStorageSource *parent, - virSecurityDomainImageLabelFlags flags) +virSecuritySELinuxSetImageLabel(virSecurityManager *mgr, + virDomainDef *def, + virStorageSource *src, + virSecurityDomainImageLabelFlags flags) { + virStorageSource *parent = src; virStorageSource *n; for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) { @@ -2004,15 +2004,6 @@ virSecuritySELinuxSetImageLabelRelative(virSecurityManager *mgr, } -static int -virSecuritySELinuxSetImageLabel(virSecurityManager *mgr, - virDomainDef *def, - virStorageSource *src, - virSecurityDomainImageLabelFlags flags) -{ - return virSecuritySELinuxSetImageLabelRelative(mgr, def, src, src, flags); -} - struct virSecuritySELinuxMoveImageMetadataData { virSecurityManager *mgr; const char *src; -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:09 +0100, Andrea Bolognani wrote:
The single caller for each function passes the same value for @src and @parent, which means that we don't really need the additional API.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/security/security_dac.c | 19 +++++-------------- src/security/security_selinux.c | 19 +++++-------------- 2 files changed, 10 insertions(+), 28 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

On 3/20/24 05:19, Andrea Bolognani wrote:
The single caller for each function passes the same value for @src and @parent, which means that we don't really need the additional API.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>

Current entries should always be listed before obsolete ones. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 43485b43fb..2b6526538f 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -139,16 +139,16 @@ module Libvirtd_qemu = let swtpm_entry = str_entry "swtpm_user" | str_entry "swtpm_group" + let capability_filters_entry = str_array_entry "capability_filters" + + let storage_entry = bool_entry "storage_use_nbdkit" + (* Entries that used to exist in the config which are now * deleted. We keep on parsing them so we don't break * ability to parse old configs after upgrade *) let obsolete_entry = bool_entry "clear_emulator_capabilities" - let capability_filters_entry = str_array_entry "capability_filters" - - let storage_entry = bool_entry "storage_use_nbdkit" - (* Each entry in the config is one of the following ... *) let entry = default_tls_entry | vnc_entry -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:10 +0100, Andrea Bolognani wrote:
Current entries should always be listed before obsolete ones.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

On 3/20/24 05:19, Andrea Bolognani wrote:
Current entries should always be listed before obsolete ones.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 43485b43fb..2b6526538f 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -139,16 +139,16 @@ module Libvirtd_qemu = let swtpm_entry = str_entry "swtpm_user" | str_entry "swtpm_group"
+ let capability_filters_entry = str_array_entry "capability_filters" + + let storage_entry = bool_entry "storage_use_nbdkit" + (* Entries that used to exist in the config which are now * deleted. We keep on parsing them so we don't break * ability to parse old configs after upgrade *) let obsolete_entry = bool_entry "clear_emulator_capabilities"
- let capability_filters_entry = str_array_entry "capability_filters" - - let storage_entry = bool_entry "storage_use_nbdkit" - (* Each entry in the config is one of the following ... *) let entry = default_tls_entry | vnc_entry
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>

As explained in the comment, this can help in scenarios where a shared filesystem can't be detected as such by libvirt, by giving the admin the opportunity to provide this information manually. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 3 +++ src/qemu/qemu.conf.in | 17 +++++++++++++++++ src/qemu/qemu_conf.c | 17 +++++++++++++++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/test_libvirtd_qemu.aug.in | 5 +++++ 5 files changed, 44 insertions(+) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 2b6526538f..1377fd89cc 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -143,6 +143,8 @@ module Libvirtd_qemu = let storage_entry = bool_entry "storage_use_nbdkit" + let filesystem_entry = str_array_entry "shared_filesystems" + (* Entries that used to exist in the config which are now * deleted. We keep on parsing them so we don't break * ability to parse old configs after upgrade @@ -173,6 +175,7 @@ module Libvirtd_qemu = | swtpm_entry | capability_filters_entry | storage_entry + | filesystem_entry | obsolete_entry let comment = [ label "#comment" . del /#[ \t]*/ "# " . store /([^ \t\n][^\n]*)?/ . del /\n/ "\n" ] diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in index f406df8749..db42448239 100644 --- a/src/qemu/qemu.conf.in +++ b/src/qemu/qemu.conf.in @@ -986,3 +986,20 @@ # note that the default might change in future releases. # #storage_use_nbdkit = @USE_NBDKIT_DEFAULT@ + +# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too. +# +#shared_filesystems = [ +# "/var/lib/libvirt/images", +# "/var/lib/libvirt/qemu/nvram", +# "/var/lib/libvirt/swtpm" +#] diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index 4050a82341..01c6bcc793 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -374,6 +374,8 @@ static void virQEMUDriverConfigDispose(void *obj) g_strfreev(cfg->capabilityfilters); + g_strfreev(cfg->sharedFilesystems); + g_free(cfg->deprecationBehavior); } @@ -1084,6 +1086,18 @@ virQEMUDriverConfigLoadStorageEntry(virQEMUDriverConfig *cfg, } +static int +virQEMUDriverConfigLoadFilesystemEntry(virQEMUDriverConfig *cfg, + virConf *conf) +{ + if (virConfGetValueStringList(conf, "shared_filesystems", false, + &cfg->sharedFilesystems) < 0) + return -1; + + return 0; +} + + int virQEMUDriverConfigLoadFile(virQEMUDriverConfig *cfg, const char *filename, bool privileged) @@ -1158,6 +1172,9 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfig *cfg, if (virQEMUDriverConfigLoadStorageEntry(cfg, conf) < 0) return -1; + if (virQEMUDriverConfigLoadFilesystemEntry(cfg, conf) < 0) + return -1; + return 0; } diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index 36049b4bfa..b53d56be02 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -233,6 +233,8 @@ struct _virQEMUDriverConfig { bool storageUseNbdkit; virQEMUSchedCore schedCore; + + char **sharedFilesystems; }; G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUDriverConfig, virObjectUnref); diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index b97e6de11e..f0a7a2a30e 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -119,3 +119,8 @@ module Test_libvirtd_qemu = { "deprecation_behavior" = "none" } { "sched_core" = "none" } { "storage_use_nbdkit" = "@USE_NBDKIT_DEFAULT@" } +{ "shared_filesystems" + { "1" = "/var/lib/libvirt/images" } + { "2" = "/var/lib/libvirt/qemu/nvram" } + { "3" = "/var/lib/libvirt/swtpm" } +} -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
As explained in the comment, this can help in scenarios where a shared filesystem can't be detected as such by libvirt, by giving the admin the opportunity to provide this information manually.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 3 +++ src/qemu/qemu.conf.in | 17 +++++++++++++++++ src/qemu/qemu_conf.c | 17 +++++++++++++++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/test_libvirtd_qemu.aug.in | 5 +++++ 5 files changed, 44 insertions(+)
[...]
diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in index f406df8749..db42448239 100644 --- a/src/qemu/qemu.conf.in +++ b/src/qemu/qemu.conf.in @@ -986,3 +986,20 @@ # note that the default might change in future releases. # #storage_use_nbdkit = @USE_NBDKIT_DEFAULT@ + +# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
+#shared_filesystems = [ +# "/var/lib/libvirt/images", +# "/var/lib/libvirt/qemu/nvram", +# "/var/lib/libvirt/swtpm" +#]
Do we want to give real paths as examples? Users might think that they are the 'suggested' values, while it really depends on how they've configured it. I sugest using something clearly dummy here. The rest looks reasonable.

On Wed, Mar 20, 2024 at 12:37:37 +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
As explained in the comment, this can help in scenarios where a shared filesystem can't be detected as such by libvirt, by giving the admin the opportunity to provide this information manually.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 3 +++ src/qemu/qemu.conf.in | 17 +++++++++++++++++ src/qemu/qemu_conf.c | 17 +++++++++++++++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/test_libvirtd_qemu.aug.in | 5 +++++ 5 files changed, 44 insertions(+)
[...]
diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in index f406df8749..db42448239 100644 --- a/src/qemu/qemu.conf.in +++ b/src/qemu/qemu.conf.in @@ -986,3 +986,20 @@ # note that the default might change in future releases. # #storage_use_nbdkit = @USE_NBDKIT_DEFAULT@ + +# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS.
TBH this feels a bit weird. Having the NFS server on the host you are migrating from will e.g. cause storage to start have network latency which it didn't before. Additionally you can't even take the host down for maintenance as it's still serving storage for the VM. How is this expected to be used? I don't seem to understand why would anybody want to do this in contrast to e.g. migrating also the storage fully to the destination.

On Wed, Mar 20, 2024 at 01:22:15PM +0100, Peter Krempa wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS.
TBH this feels a bit weird. Having the NFS server on the host you are migrating from will e.g. cause storage to start have network latency which it didn't before.
Additionally you can't even take the host down for maintenance as it's still serving storage for the VM.
How is this expected to be used? I don't seem to understand why would anybody want to do this in contrast to e.g. migrating also the storage fully to the destination.
The main driver for this feature is KubeVirt, and more specifically its integration with a storage provider called Portworx. From what I understand, the way this provider works is that it will initially configure the VM storage so that it's local to the machine running the pod; when migration occurs, however, the same storage will be exported to the destination machine via NFS. Since currently libvirt blocks attempt to migrate from local storage, in order to make things work they're tricking it into thinking it's NFS instead by using an LD_PRELOAD hack[1]. Since that's a... Somewhat questionable approach, the idea here is to provide a more targeted and supportable way to let libvirt know that some host paths are shared, even though a shared filesystem is not used locally. In the more general case, there's really nothing preventing people from creating a similar setup outside of KubeVirt with Portworx. As you rightfully point out, there are some drawbacks to the approach but it's not necessarily an invalid one per se. [1] https://github.com/libopenstorage/stork/pull/1117/files#diff-9aadee401eff19b... -- Andrea Bolognani / Red Hat / Virtualization

On Wed, Mar 20, 2024 at 01:22:15PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 12:37:37 +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
As explained in the comment, this can help in scenarios where a shared filesystem can't be detected as such by libvirt, by giving the admin the opportunity to provide this information manually.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/libvirtd_qemu.aug | 3 +++ src/qemu/qemu.conf.in | 17 +++++++++++++++++ src/qemu/qemu_conf.c | 17 +++++++++++++++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/test_libvirtd_qemu.aug.in | 5 +++++ 5 files changed, 44 insertions(+)
[...]
diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in index f406df8749..db42448239 100644 --- a/src/qemu/qemu.conf.in +++ b/src/qemu/qemu.conf.in @@ -986,3 +986,20 @@ # note that the default might change in future releases. # #storage_use_nbdkit = @USE_NBDKIT_DEFAULT@ + +# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS.
TBH this feels a bit weird. Having the NFS server on the host you are migrating from will e.g. cause storage to start have network latency which it didn't before.
Additionally you can't even take the host down for maintenance as it's still serving storage for the VM.
You can do a block copy / rebase after you have migrated in order to then pull the storage over to the new host. If disk is much larger than RAM, this has the advantage that you can get the live migration completed quickly, and then let the disk copy run in the background a long time. Obviously not much use for an emergency evacuation of a failing host, but reasonable for planned maint actions. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
To be quite honest I don't remember exactly why I've added that, but I can confirm that if remember_owner=0 is not used on the destination host then migration will stall for a bit and then fail with error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for metadata change: Resource temporarily unavailable Things work fine if swtpm is not involved. I'm going to dig deeper, but my guess is that, just like the situation addressed by the last patch, having an additional process complicates things compared to when we need to worry about QEMU only.
+#shared_filesystems = [ +# "/var/lib/libvirt/images", +# "/var/lib/libvirt/qemu/nvram", +# "/var/lib/libvirt/swtpm" +#]
Do we want to give real paths as examples? Users might think that they are the 'suggested' values, while it really depends on how they've configured it. I sugest using something clearly dummy here.
Hopefully people will read the comment, but I can change those to something like shared_filesystems = [ "/path/to/images", "/path/to/nvram", "/path/to/swtpm" ] if you think it's less likely to cause confusion. -- Andrea Bolognani / Red Hat / Virtualization

On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
To be quite honest I don't remember exactly why I've added that, but I can confirm that if remember_owner=0 is not used on the destination host then migration will stall for a bit and then fail with
error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for metadata change: Resource temporarily unavailable
Things work fine if swtpm is not involved. I'm going to dig deeper, but my guess is that, just like the situation addressed by the last patch, having an additional process complicates things compared to when we need to worry about QEMU only.
I've managed to track this down, and I wasn't far off. The issue is that, when remember_owner is enabled, we perform a bunch of additional locking on files around the actual relabel operation; specifically, we call virSecurityManagerTransactionCommit() with the last argument set to true. This doesn't seem to cause any issues in general, *except* when it comes to swtpm's storage lock. The swtpm process holds this lock while it's running, and only releases it once migration is triggered. So, when we're about to start the target swtpm process and want to prepare the environment by setting up labels, we try to acquire the storage lock, and can't proceed because the source swtpm process is still holding on to it. The hacky patch below makes migration work even when remember_owner is enabled. Obviously I'd rewrite it so that we'd only skip locking for incoming migration, but I wonder if there could be nasty side effects to this... Other ideas? Can we perhaps change things so that swtpm releases the lock earlier upon our request or something like that? diff --git a/src/security/security_manager.c b/src/security/security_manager.c index fc2747c45e..2aa06eaec2 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -1356,6 +1356,9 @@ virSecurityManagerMetadataLock(virSecurityManager *mgr G_GNUC_UNUSED, continue; } + if (g_str_has_suffix(p, "/.lock")) + continue; + if ((fd = open(p, O_RDWR)) < 0) { if (errno == EROFS) { /* There is nothing we can do for RO filesystem. */ -- Andrea Bolognani / Red Hat / Virtualization

On 3/26/24 11:54, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
To be quite honest I don't remember exactly why I've added that, but I can confirm that if remember_owner=0 is not used on the destination host then migration will stall for a bit and then fail with
error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for metadata change: Resource temporarily unavailable
Things work fine if swtpm is not involved. I'm going to dig deeper, but my guess is that, just like the situation addressed by the last patch, having an additional process complicates things compared to when we need to worry about QEMU only.
I've managed to track this down, and I wasn't far off.
The issue is that, when remember_owner is enabled, we perform a bunch of additional locking on files around the actual relabel operation; specifically, we call virSecurityManagerTransactionCommit() with the last argument set to true.
This doesn't seem to cause any issues in general, *except* when it comes to swtpm's storage lock. The swtpm process holds this lock while it's running, and only releases it once migration is triggered. So, when we're about to start the target swtpm process and want to prepare the environment by setting up labels, we try to acquire the storage lock, and can't proceed because the source swtpm process is still holding on to it.
Who is 'we try to acquire the storage lock'? Is libvirt trying to acquire swtpm's storage lock? I would assume that only an instance of swtpm would acquire the lock.
The hacky patch below makes migration work even when remember_owner is enabled. Obviously I'd rewrite it so that we'd only skip locking for incoming migration, but I wonder if there could be nasty side effects to this...
Other ideas? Can we perhaps change things so that swtpm releases the lock earlier upon our request or something like that?
Maybe below functiokn needs to be passed an array of exceptions, one being swtpm's lock file. I mean the lock file serves the purpose of locking via filesystem, so I don't think a third party should start interfering here and influencing the protocol ...
diff --git a/src/security/security_manager.c b/src/security/security_manager.c index fc2747c45e..2aa06eaec2 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -1356,6 +1356,9 @@ virSecurityManagerMetadataLock(virSecurityManager *mgr G_GNUC_UNUSED, continue; }
+ if (g_str_has_suffix(p, "/.lock")) + continue; + if ((fd = open(p, O_RDWR)) < 0) { if (errno == EROFS) { /* There is nothing we can do for RO filesystem. */

On Tue, Mar 26, 2024 at 12:04:21PM -0400, Stefan Berger wrote:
On 3/26/24 11:54, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
To be quite honest I don't remember exactly why I've added that, but I can confirm that if remember_owner=0 is not used on the destination host then migration will stall for a bit and then fail with
error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for metadata change: Resource temporarily unavailable
Things work fine if swtpm is not involved. I'm going to dig deeper, but my guess is that, just like the situation addressed by the last patch, having an additional process complicates things compared to when we need to worry about QEMU only.
I've managed to track this down, and I wasn't far off.
The issue is that, when remember_owner is enabled, we perform a bunch of additional locking on files around the actual relabel operation; specifically, we call virSecurityManagerTransactionCommit() with the last argument set to true.
This doesn't seem to cause any issues in general, *except* when it comes to swtpm's storage lock. The swtpm process holds this lock while it's running, and only releases it once migration is triggered. So, when we're about to start the target swtpm process and want to prepare the environment by setting up labels, we try to acquire the storage lock, and can't proceed because the source swtpm process is still holding on to it.
Who is 'we try to acquire the storage lock'? Is libvirt trying to acquire swtpm's storage lock? I would assume that only an instance of swtpm would acquire the lock.
Yes, it's libvirt doing that. The lock is only held for a brief period while labels are being applied, and released immediately afterwards. swtpm is only launched once that's done, so generally there's no conflict. In the migration case, things have worked so far because labeling in general has been skipped for shared filesystem, which means that locking was not performed either. This series changes things so that labeling is necessary.
The hacky patch below makes migration work even when remember_owner is enabled. Obviously I'd rewrite it so that we'd only skip locking for incoming migration, but I wonder if there could be nasty side effects to this...
Other ideas? Can we perhaps change things so that swtpm releases the lock earlier upon our request or something like that?
Maybe below functiokn needs to be passed an array of exceptions, one being swtpm's lock file. I mean the lock file serves the purpose of locking via filesystem, so I don't think a third party should start interfering here and influencing the protocol ...
I guess the idea behind the locking is to prevent unrelated processes (and possibly other libvirt threads?) from stepping on each other's toes. Some exceptions are carved out already, so it's not like there's no precedent for what we're suggesting. I'm just concerned about accidentally opening the door for fun race conditions or CVEs. -- Andrea Bolognani / Red Hat / Virtualization

On 3/26/24 12:38, Andrea Bolognani wrote:
On Tue, Mar 26, 2024 at 12:04:21PM -0400, Stefan Berger wrote:
On 3/26/24 11:54, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
To be quite honest I don't remember exactly why I've added that, but I can confirm that if remember_owner=0 is not used on the destination host then migration will stall for a bit and then fail with
error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for metadata change: Resource temporarily unavailable
Things work fine if swtpm is not involved. I'm going to dig deeper, but my guess is that, just like the situation addressed by the last patch, having an additional process complicates things compared to when we need to worry about QEMU only.
I've managed to track this down, and I wasn't far off.
The issue is that, when remember_owner is enabled, we perform a bunch of additional locking on files around the actual relabel operation; specifically, we call virSecurityManagerTransactionCommit() with the last argument set to true.
This doesn't seem to cause any issues in general, *except* when it comes to swtpm's storage lock. The swtpm process holds this lock while it's running, and only releases it once migration is triggered. So, when we're about to start the target swtpm process and want to prepare the environment by setting up labels, we try to acquire the storage lock, and can't proceed because the source swtpm process is still holding on to it.
Who is 'we try to acquire the storage lock'? Is libvirt trying to acquire swtpm's storage lock? I would assume that only an instance of swtpm would acquire the lock.
Yes, it's libvirt doing that. The lock is only held for a brief period while labels are being applied, and released immediately afterwards. swtpm is only launched once that's done, so generally there's no conflict.
Yes, I saw the code now. It kind of prevents lock files from being used.
In the migration case, things have worked so far because labeling in general has been skipped for shared filesystem, which means that locking was not performed either. This series changes things so that labeling is necessary.
Thanks for the re-cap. Does libvirt actually get involved in case of a migration failure and fallback to the source host so that it could again relabel all files before QEMU and swtpm (and possibly other virtual devices) again open their files?
The hacky patch below makes migration work even when remember_owner is enabled. Obviously I'd rewrite it so that we'd only skip locking for incoming migration, but I wonder if there could be nasty side effects to this...
Other ideas? Can we perhaps change things so that swtpm releases the lock earlier upon our request or something like that?
Maybe below functiokn needs to be passed an array of exceptions, one being swtpm's lock file. I mean the lock file serves the purpose of locking via filesystem, so I don't think a third party should start interfering here and influencing the protocol ...
I guess the idea behind the locking is to prevent unrelated processes (and possibly other libvirt threads?) from stepping on each other's toes. Some exceptions are carved out already, so it's not like there's no precedent for what we're suggesting. I'm just concerned about accidentally opening the door for fun race conditions or CVEs.
Hm, maybe exceptions of filenames not to lock could be configured in the config file instead of hard coded but with concrete names already given in the sample config file so that users don't need to find out. Some ideas about exceptions for files not to lock: - there is a list of exception of files - check whether a file is locked by a process with a given name in an exception list (e.g., swtpm in /proc/<flock->l_pid>/comm ).

On Tue, Mar 26, 2024 at 01:15:41PM -0400, Stefan Berger wrote:
On 3/26/24 12:38, Andrea Bolognani wrote:
On Tue, Mar 26, 2024 at 12:04:21PM -0400, Stefan Berger wrote:
On 3/26/24 11:54, Andrea Bolognani wrote:
The issue is that, when remember_owner is enabled, we perform a bunch of additional locking on files around the actual relabel operation; specifically, we call virSecurityManagerTransactionCommit() with the last argument set to true.
This doesn't seem to cause any issues in general, *except* when it comes to swtpm's storage lock. The swtpm process holds this lock while it's running, and only releases it once migration is triggered. So, when we're about to start the target swtpm process and want to prepare the environment by setting up labels, we try to acquire the storage lock, and can't proceed because the source swtpm process is still holding on to it.
Who is 'we try to acquire the storage lock'? Is libvirt trying to acquire swtpm's storage lock? I would assume that only an instance of swtpm would acquire the lock.
Yes, it's libvirt doing that. The lock is only held for a brief period while labels are being applied, and released immediately afterwards. swtpm is only launched once that's done, so generally there's no conflict.
Yes, I saw the code now. It kind of prevents lock files from being used.
In the migration case, things have worked so far because labeling in general has been skipped for shared filesystem, which means that locking was not performed either. This series changes things so that labeling is necessary.
Thanks for the re-cap. Does libvirt actually get involved in case of a migration failure and fallback to the source host so that it could again relabel all files before QEMU and swtpm (and possibly other virtual devices) again open their files?
Determining this is next in my to-do list :) As I've explained in an earlier message, I believe things will be fine on account of at least one of the hosts involved accessing the storage via NFS, which doesn't support SELinux and is thus unaffected by changes to labeling. But I still need to actually confirm that.
The hacky patch below makes migration work even when remember_owner is enabled. Obviously I'd rewrite it so that we'd only skip locking for incoming migration, but I wonder if there could be nasty side effects to this...
Other ideas? Can we perhaps change things so that swtpm releases the lock earlier upon our request or something like that?
Maybe below functiokn needs to be passed an array of exceptions, one being swtpm's lock file. I mean the lock file serves the purpose of locking via filesystem, so I don't think a third party should start interfering here and influencing the protocol ...
I guess the idea behind the locking is to prevent unrelated processes (and possibly other libvirt threads?) from stepping on each other's toes. Some exceptions are carved out already, so it's not like there's no precedent for what we're suggesting. I'm just concerned about accidentally opening the door for fun race conditions or CVEs.
Hm, maybe exceptions of filenames not to lock could be configured in the config file instead of hard coded but with concrete names already given in the sample config file so that users don't need to find out.
Some ideas about exceptions for files not to lock: - there is a list of exception of files - check whether a file is locked by a process with a given name in an exception list (e.g., swtpm in /proc/<flock->l_pid>/comm ).
Ideally users wouldn't need to configure any of that, and libvirt can instead figure things out automatically. I had the "check whether the process currently holding the lock is swtpm, ignore the failure if so" thought as well, perhaps that could be the way to go. Or keep it simple and just don't attempt to lock the swtpm files if we know we're preparing for migration, if we can determine that it's safe to do so. -- Andrea Bolognani / Red Hat / Virtualization

On Tue, Mar 26, 2024 at 08:54:03AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
+# libvirt will normally prevent migration if the storage backing the VM is not +# on a shared filesystems. Sometimes, however, the storage *is* shared despite +# not being detected as such: for example, this is the case when one of the +# hosts involved in the migration is exporting its local storage to the other +# one via NFS. +# +# Any directory listed here will be assumed to live on a shared filesystem, +# making migration possible in scenarios such as the one described above. +# +# If you need this feature, you probably want to set remember_owner=0 too.
Could you please elaborate why you'd want to disable owner remembering? With remote filesystems this works so I expect that if this makes certain paths behave as shared filesystems, they should behave such without any additional tweaks
To be quite honest I don't remember exactly why I've added that, but I can confirm that if remember_owner=0 is not used on the destination host then migration will stall for a bit and then fail with
error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for metadata change: Resource temporarily unavailable
Things work fine if swtpm is not involved. I'm going to dig deeper, but my guess is that, just like the situation addressed by the last patch, having an additional process complicates things compared to when we need to worry about QEMU only.
I've managed to track this down, and I wasn't far off.
The issue is that, when remember_owner is enabled, we perform a bunch of additional locking on files around the actual relabel operation; specifically, we call virSecurityManagerTransactionCommit() with the last argument set to true.
This doesn't seem to cause any issues in general, *except* when it comes to swtpm's storage lock.
I take this back. Further testing has confirmed that there are other scenarios in which this is problematic, for example NVRAM files. Even with the TPM locking issue out of the way, we still have to deal with an underlying issue: label remembering is implemented by setting custom XATTRs on the files involved, and NFS doesn't support XATTRs. So what will happen when remember_owner=1 and the VM getting started on the host that has local access to the filesystem is, that these XATTRs will get set and not cleared up on migration; the VM will make it safely to the other host. Then, when you attempt to migrate it back, libvirt will report Setting different SELinux label on /var/lib/libvirt/qemu/nvram/test_VARS.fd which is already in use and abort the operation. This is caused by # getfattr -dm- /var/lib/libvirt/qemu/nvram/test_VARS.fd security.selinux="system_u:object_r:svirt_image_t:s0:c704,c774" trusted.libvirt.security.dac="+0:+0" trusted.libvirt.security.ref_dac="1" trusted.libvirt.security.ref_selinux="1" trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0" trusted.libvirt.security.timestamp_dac="1713347549" trusted.libvirt.security.timestamp_selinux="1713347549" In particular, the non-zero reference counts are what convinces libvirt to bail. I don't think we can handle this nicely without opening other cans of worms. The principle of "don't touch anything on the source once the VM is running on the destination" is a good one to follow, since ignoring it might result in the destination VM suddenly being prevented from performing I/O. And in case the migration fails, which as you both pointed out earlier is a possibility that we need to take into account, retaining the existing labels instead of clearing them is actually a good thing, as it allows resuming execution on the source host without running into permission issues. In other words, I've come to the conclusion that remember_owner=0 is the least bad option here. In the v2 that I've just posted, I've updated the comment to stress further the fact that this option comes with caveats and is not to be used lightly. -- Andrea Bolognani / Red Hat / Virtualization

virFileIsSharedFS() is the function that ultimately decides whether a filesystem should be considered shared, but the list of manually configured shared filesystems is part of the QEMU driver's configuration, so we need to pass the information through several layers in order to make use of it. Note that with this change the list is propagated all the way through, but its contents are still ignored, so the behavior remains the same for now. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/lxc/lxc_controller.c | 2 +- src/lxc/lxc_driver.c | 2 +- src/lxc/lxc_process.c | 4 ++-- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_extdevice.c | 2 +- src/qemu/qemu_migration.c | 12 ++++++++---- src/qemu/qemu_security.c | 14 ++++++++++++-- src/qemu/qemu_tpm.c | 27 ++++++++++++++++++--------- src/qemu/qemu_tpm.h | 8 +++++--- src/security/security_apparmor.c | 2 ++ src/security/security_dac.c | 17 +++++++++++++---- src/security/security_driver.h | 4 ++++ src/security/security_manager.c | 20 ++++++++++++++------ src/security/security_manager.h | 4 ++++ src/security/security_nop.c | 4 ++++ src/security/security_selinux.c | 18 +++++++++++++++--- src/security/security_stack.c | 16 ++++++++++++---- src/util/virfile.c | 5 +++-- src/util/virfile.h | 3 ++- tests/securityselinuxlabeltest.c | 2 +- tests/virfiletest.c | 2 +- 21 files changed, 124 insertions(+), 46 deletions(-) diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 505b71d05e..0b82fb9624 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1919,7 +1919,7 @@ static int virLXCControllerSetupDisk(virLXCController *ctrl, /* Labelling normally operates on src, but we need * to actually label the dst here, so hack the config */ def->src->path = dst; - if (virSecurityManagerSetImageLabel(securityDriver, ctrl->def, def->src, + if (virSecurityManagerSetImageLabel(securityDriver, ctrl->def, def->src, NULL, VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN) < 0) goto cleanup; diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 39992bdf96..701d22efef 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3260,7 +3260,7 @@ lxcDomainAttachDeviceMknodHelper(pid_t pid G_GNUC_UNUSED, char *tmpsrc = def->src->path; def->src->path = data->file; if (virSecurityManagerSetImageLabel(data->driver->securityManager, - data->vm->def, def->src, + data->vm->def, def->src, NULL, VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN) < 0) { def->src->path = tmpsrc; goto cleanup; diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index bfdcefd01b..a426d915ab 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -171,7 +171,7 @@ static void virLXCProcessCleanup(virLXCDriver *driver, if (flags & VIR_LXC_PROCESS_CLEANUP_RESTORE_SECLABEL) { virSecurityManagerRestoreAllLabel(driver->securityManager, - vm->def, false, false); + vm->def, NULL, false, false); } if (flags & VIR_LXC_PROCESS_CLEANUP_RELEASE_SECLABEL) { @@ -1327,7 +1327,7 @@ int virLXCProcessStart(virLXCDriver * driver, VIR_DEBUG("Setting domain security labels"); if (virSecurityManagerSetAllLabel(driver->securityManager, - vm->def, NULL, false, false) < 0) + vm->def, NULL, NULL, false, false) < 0) goto cleanup; stopFlags |= VIR_LXC_PROCESS_CLEANUP_RESTORE_SECLABEL; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index bc6cf133d4..a2f22dafe8 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11877,7 +11877,7 @@ virQEMUFileOpenAs(uid_t fallback_uid, bool need_unlink = false; unsigned int vfoflags = 0; int fd = -1; - int path_shared = virFileIsSharedFS(path); + int path_shared = virFileIsSharedFS(path, NULL); uid_t uid = geteuid(); gid_t gid = getegid(); diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index ed5976d1f7..dc1bb56237 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -165,7 +165,7 @@ qemuExtDevicesCleanupHost(virQEMUDriver *driver, virDomainTPMDef *tpm = def->tpms[i]; if (tpm->type == VIR_DOMAIN_TPM_TYPE_EMULATOR) - qemuExtTPMCleanupHost(tpm, flags, outgoingMigration); + qemuExtTPMCleanupHost(driver, tpm, flags, outgoingMigration); } } diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 1faab5dd23..330efb069b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1429,6 +1429,8 @@ qemuMigrationSrcIsAllowed(virDomainObj *vm, unsigned int flags) { qemuDomainObjPrivate *priv = vm->privateData; + virQEMUDriver *driver = priv->driver; + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); int nsnapshots; int pauseReason; size_t i; @@ -1599,7 +1601,7 @@ qemuMigrationSrcIsAllowed(virDomainObj *vm, } } - if (qemuTPMHasSharedStorage(vm->def)&& + if (qemuTPMHasSharedStorage(vm->def, cfg->sharedFilesystems) && !qemuTPMCanMigrateSharedStorage(vm->def)) { virReportError(VIR_ERR_NO_SUPPORT, "%s", _("the running swtpm does not support migration with shared storage")); @@ -1612,6 +1614,7 @@ qemuMigrationSrcIsAllowed(virDomainObj *vm, static bool qemuMigrationSrcIsSafe(virDomainDef *def, + virQEMUDriverConfig *cfg, virQEMUCaps *qemuCaps, size_t nmigrate_disks, const char **migrate_disks, @@ -1643,7 +1646,7 @@ qemuMigrationSrcIsSafe(virDomainDef *def, /* However, disks on local FS (e.g. ext4) are not safe. */ switch (actualType) { case VIR_STORAGE_TYPE_FILE: - if ((rc = virFileIsSharedFS(src)) < 0) { + if ((rc = virFileIsSharedFS(src, cfg->sharedFilesystems)) < 0) { return false; } else if (rc == 0) { unsafe = true; @@ -2582,6 +2585,7 @@ qemuMigrationSrcBeginPhase(virQEMUDriver *driver, const char **migrate_disks, unsigned int flags) { + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); qemuDomainObjPrivate *priv = vm->privateData; unsigned int cookieFlags = QEMU_MIGRATION_COOKIE_LOCKSTATE; @@ -2604,7 +2608,7 @@ qemuMigrationSrcBeginPhase(virQEMUDriver *driver, return NULL; if (!(flags & (VIR_MIGRATE_UNSAFE | VIR_MIGRATE_OFFLINE)) && - !qemuMigrationSrcIsSafe(vm->def, priv->qemuCaps, + !qemuMigrationSrcIsSafe(vm->def, cfg, priv->qemuCaps, nmigrate_disks, migrate_disks, flags)) return NULL; @@ -6091,7 +6095,7 @@ qemuMigrationSrcPerformJob(virQEMUDriver *driver, goto endjob; if (!(flags & (VIR_MIGRATE_UNSAFE | VIR_MIGRATE_OFFLINE)) && - !qemuMigrationSrcIsSafe(vm->def, priv->qemuCaps, + !qemuMigrationSrcIsSafe(vm->def, cfg, priv->qemuCaps, nmigrate_disks, migrate_disks, flags)) goto endjob; diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c index 4aaa863ae9..3aaa93a76c 100644 --- a/src/qemu/qemu_security.c +++ b/src/qemu/qemu_security.c @@ -38,6 +38,7 @@ qemuSecuritySetAllLabel(virQEMUDriver *driver, { int ret = -1; qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); pid_t pid = -1; if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) @@ -48,6 +49,7 @@ qemuSecuritySetAllLabel(virQEMUDriver *driver, if (virSecurityManagerSetAllLabel(driver->securityManager, vm->def, + cfg->sharedFilesystems, incomingPath, priv->chardevStdioLogd, migrated) < 0) @@ -70,6 +72,7 @@ qemuSecurityRestoreAllLabel(virQEMUDriver *driver, bool migrated) { qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); bool transactionStarted = false; /* In contrast to qemuSecuritySetAllLabel, do not use vm->pid @@ -83,6 +86,7 @@ qemuSecurityRestoreAllLabel(virQEMUDriver *driver, virSecurityManagerRestoreAllLabel(driver->securityManager, vm->def, + cfg->sharedFilesystems, migrated, priv->chardevStdioLogd); @@ -103,6 +107,7 @@ qemuSecuritySetImageLabel(virQEMUDriver *driver, bool chainTop) { qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); pid_t pid = -1; int ret = -1; virSecurityDomainImageLabelFlags labelFlags = 0; @@ -120,7 +125,9 @@ qemuSecuritySetImageLabel(virQEMUDriver *driver, goto cleanup; if (virSecurityManagerSetImageLabel(driver->securityManager, - vm->def, src, labelFlags) < 0) + vm->def, src, + cfg->sharedFilesystems, + labelFlags) < 0) goto cleanup; if (virSecurityManagerTransactionCommit(driver->securityManager, @@ -141,6 +148,7 @@ qemuSecurityRestoreImageLabel(virQEMUDriver *driver, bool backingChain) { qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); pid_t pid = -1; int ret = -1; virSecurityDomainImageLabelFlags labelFlags = 0; @@ -155,7 +163,9 @@ qemuSecurityRestoreImageLabel(virQEMUDriver *driver, goto cleanup; if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm->def, src, labelFlags) < 0) + vm->def, src, + cfg->sharedFilesystems, + labelFlags) < 0) goto cleanup; if (virSecurityManagerTransactionCommit(driver->securityManager, diff --git a/src/qemu/qemu_tpm.c b/src/qemu/qemu_tpm.c index bf0c6bcb0d..f1b4283a70 100644 --- a/src/qemu/qemu_tpm.c +++ b/src/qemu/qemu_tpm.c @@ -538,6 +538,7 @@ qemuTPMEmulatorReconfigure(const char *storagepath, * @privileged: whether we are running in privileged mode * @swtpm_user: The uid for the swtpm to run as (drop privileges to from root) * @swtpm_group: The gid for the swtpm to run as + * @sharedFilesystems: list of filesystem to consider shared * @incomingMigration: whether we have an incoming migration * * Create the virCommand use for starting the emulator @@ -551,6 +552,7 @@ qemuTPMEmulatorBuildCommand(virDomainTPMDef *tpm, bool privileged, uid_t swtpm_user, gid_t swtpm_group, + char *const *sharedFilesystems, bool incomingMigration) { g_autoptr(virCommand) cmd = NULL; @@ -568,7 +570,7 @@ qemuTPMEmulatorBuildCommand(virDomainTPMDef *tpm, /* Do not create storage and run swtpm_setup on incoming migration over * shared storage */ - on_shared_storage = virFileIsSharedFS(tpm->data.emulator.storagepath) == 1; + on_shared_storage = virFileIsSharedFS(tpm->data.emulator.storagepath, sharedFilesystems) == 1; if (incomingMigration && on_shared_storage) create_storage = false; @@ -734,6 +736,7 @@ qemuTPMEmulatorInitPaths(virDomainTPMDef *tpm, /** * qemuTPMEmulatorCleanupHost: + * @driver: QEMU driver * @tpm: TPM definition * @flags: flags indicating whether to keep or remove TPM persistent state * @outgoingMigration: whether cleanup is due to an outgoing migration @@ -741,15 +744,18 @@ qemuTPMEmulatorInitPaths(virDomainTPMDef *tpm, * Clean up persistent storage for the swtpm. */ static void -qemuTPMEmulatorCleanupHost(virDomainTPMDef *tpm, +qemuTPMEmulatorCleanupHost(virQEMUDriver *driver, + virDomainTPMDef *tpm, virDomainUndefineFlagsValues flags, bool outgoingMigration) { + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + /* Never remove the state in case of outgoing migration with shared * storage. */ if (outgoingMigration && - virFileIsSharedFS(tpm->data.emulator.storagepath) == 1) + virFileIsSharedFS(tpm->data.emulator.storagepath, cfg->sharedFilesystems) == 1) return; /* @@ -935,6 +941,7 @@ qemuTPMEmulatorStart(virQEMUDriver *driver, driver->privileged, cfg->swtpm_user, cfg->swtpm_group, + cfg->sharedFilesystems, incomingMigration))) return -1; @@ -950,7 +957,7 @@ qemuTPMEmulatorStart(virQEMUDriver *driver, virCommandSetErrorFD(cmd, &errfd); if (incomingMigration && - virFileIsSharedFS(tpm->data.emulator.storagepath) == 1) { + virFileIsSharedFS(tpm->data.emulator.storagepath, cfg->sharedFilesystems) == 1) { /* security labels must have been set up on source already */ setTPMStateLabel = false; } @@ -1010,7 +1017,8 @@ qemuTPMEmulatorStart(virQEMUDriver *driver, bool -qemuTPMHasSharedStorage(virDomainDef *def) +qemuTPMHasSharedStorage(virDomainDef *def, + char *const *sharedFilesystems) { size_t i; @@ -1019,7 +1027,7 @@ qemuTPMHasSharedStorage(virDomainDef *def) switch (tpm->type) { case VIR_DOMAIN_TPM_TYPE_EMULATOR: - return virFileIsSharedFS(tpm->data.emulator.storagepath) == 1; + return virFileIsSharedFS(tpm->data.emulator.storagepath, sharedFilesystems) == 1; case VIR_DOMAIN_TPM_TYPE_PASSTHROUGH: case VIR_DOMAIN_TPM_TYPE_EXTERNAL: case VIR_DOMAIN_TPM_TYPE_LAST: @@ -1097,11 +1105,12 @@ qemuExtTPMPrepareHost(virQEMUDriver *driver, void -qemuExtTPMCleanupHost(virDomainTPMDef *tpm, +qemuExtTPMCleanupHost(virQEMUDriver *driver, + virDomainTPMDef *tpm, virDomainUndefineFlagsValues flags, bool outgoingMigration) { - qemuTPMEmulatorCleanupHost(tpm, flags, outgoingMigration); + qemuTPMEmulatorCleanupHost(driver, tpm, flags, outgoingMigration); } @@ -1133,7 +1142,7 @@ qemuExtTPMStop(virQEMUDriver *driver, return; qemuTPMEmulatorStop(cfg->swtpmStateDir, shortName); - if (outgoingMigration && qemuTPMHasSharedStorage(vm->def)) + if (outgoingMigration && qemuTPMHasSharedStorage(vm->def, cfg->sharedFilesystems)) restoreTPMStateLabel = false; if (qemuSecurityRestoreTPMLabels(driver, vm, restoreTPMStateLabel) < 0) diff --git a/src/qemu/qemu_tpm.h b/src/qemu/qemu_tpm.h index 33ba5d2268..709e956fce 100644 --- a/src/qemu/qemu_tpm.h +++ b/src/qemu/qemu_tpm.h @@ -35,10 +35,11 @@ int qemuExtTPMPrepareHost(virQEMUDriver *driver, ATTRIBUTE_NONNULL(3) G_GNUC_WARN_UNUSED_RESULT; -void qemuExtTPMCleanupHost(virDomainTPMDef *tpm, +void qemuExtTPMCleanupHost(virQEMUDriver *driver, + virDomainTPMDef *tpm, virDomainUndefineFlagsValues flags, bool outgoingMigration) - ATTRIBUTE_NONNULL(1); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int qemuExtTPMStart(virQEMUDriver *driver, virDomainObj *vm, @@ -59,7 +60,8 @@ int qemuExtTPMSetupCgroup(virQEMUDriver *driver, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) G_GNUC_WARN_UNUSED_RESULT; -bool qemuTPMHasSharedStorage(virDomainDef *def) +bool qemuTPMHasSharedStorage(virDomainDef *def, + char *const *sharedFilesystems) ATTRIBUTE_NONNULL(1) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index c1dc859751..8746c96275 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -508,6 +508,7 @@ AppArmorReleaseSecurityLabel(virSecurityManager *mgr G_GNUC_UNUSED, static int AppArmorRestoreSecurityAllLabel(virSecurityManager *mgr G_GNUC_UNUSED, virDomainDef *def, + char *const *sharedFilesystems G_GNUC_UNUSED, bool migrated G_GNUC_UNUSED, bool chardevStdioLogd G_GNUC_UNUSED) { @@ -627,6 +628,7 @@ static int AppArmorRestoreSecurityImageLabel(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems G_GNUC_UNUSED, virSecurityDomainImageLabelFlags flags G_GNUC_UNUSED) { if (!virStorageSourceIsLocalStorage(src)) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 567be4bd23..376b364beb 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -864,6 +864,7 @@ virSecurityDACSetImageLabelInternal(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, virStorageSource *parent, + char *const *sharedFilesystems G_GNUC_UNUSED, bool isChainTop) { virSecurityLabelDef *secdef; @@ -942,6 +943,7 @@ static int virSecurityDACSetImageLabel(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags) { virStorageSource *parent = src; @@ -950,7 +952,7 @@ virSecurityDACSetImageLabel(virSecurityManager *mgr, for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) { const bool isChainTop = flags & VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP; - if (virSecurityDACSetImageLabelInternal(mgr, def, n, parent, isChainTop) < 0) + if (virSecurityDACSetImageLabelInternal(mgr, def, n, parent, sharedFilesystems, isChainTop) < 0) return -1; if (!(flags & VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN)) @@ -966,6 +968,7 @@ static int virSecurityDACRestoreImageLabelInt(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, bool migrated) { virSecurityDACData *priv = virSecurityManagerGetPrivateData(mgr); @@ -1006,7 +1009,7 @@ virSecurityDACRestoreImageLabelInt(virSecurityManager *mgr, if (!src->path) return 0; - if ((rc = virFileIsSharedFS(src->path)) < 0) + if ((rc = virFileIsSharedFS(src->path, sharedFilesystems)) < 0) return -1; } @@ -1042,9 +1045,10 @@ static int virSecurityDACRestoreImageLabel(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags G_GNUC_UNUSED) { - return virSecurityDACRestoreImageLabelInt(mgr, def, src, false); + return virSecurityDACRestoreImageLabelInt(mgr, def, src, sharedFilesystems, false); } @@ -1886,6 +1890,7 @@ virSecurityDACRestoreSysinfoLabel(virSecurityManager *mgr, static int virSecurityDACRestoreAllLabel(virSecurityManager *mgr, virDomainDef *def, + char *const *sharedFilesystems, bool migrated, bool chardevStdioLogd) { @@ -1911,6 +1916,7 @@ virSecurityDACRestoreAllLabel(virSecurityManager *mgr, if (virSecurityDACRestoreImageLabelInt(mgr, def, def->disks[i]->src, + sharedFilesystems, migrated) < 0) rc = -1; } @@ -1967,7 +1973,7 @@ virSecurityDACRestoreAllLabel(virSecurityManager *mgr, if (def->os.loader && def->os.loader->nvram) { if (virSecurityDACRestoreImageLabelInt(mgr, def, def->os.loader->nvram, - migrated) < 0) + sharedFilesystems, migrated) < 0) rc = -1; } @@ -2109,6 +2115,7 @@ virSecurityDACSetSysinfoLabel(virSecurityManager *mgr, static int virSecurityDACSetAllLabel(virSecurityManager *mgr, virDomainDef *def, + char *const *sharedFilesystems, const char *incomingPath G_GNUC_UNUSED, bool chardevStdioLogd, bool migrated G_GNUC_UNUSED) @@ -2134,6 +2141,7 @@ virSecurityDACSetAllLabel(virSecurityManager *mgr, if (virDomainDiskGetType(def->disks[i]) == VIR_STORAGE_TYPE_DIR) continue; if (virSecurityDACSetImageLabel(mgr, def, def->disks[i]->src, + sharedFilesystems, VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN | VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP) < 0) return -1; @@ -2193,6 +2201,7 @@ virSecurityDACSetAllLabel(virSecurityManager *mgr, if (def->os.loader && def->os.loader->nvram) { if (virSecurityDACSetImageLabel(mgr, def, def->os.loader->nvram, + sharedFilesystems, VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN | VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP) < 0) return -1; diff --git a/src/security/security_driver.h b/src/security/security_driver.h index aa1fb2125d..ea990d7210 100644 --- a/src/security/security_driver.h +++ b/src/security/security_driver.h @@ -81,11 +81,13 @@ typedef int (*virSecurityDomainReleaseLabel) (virSecurityManager *mgr, virDomainDef *sec); typedef int (*virSecurityDomainSetAllLabel) (virSecurityManager *mgr, virDomainDef *sec, + char *const *sharedFilesystems, const char *incomingPath, bool chardevStdioLogd, bool migrated); typedef int (*virSecurityDomainRestoreAllLabel) (virSecurityManager *mgr, virDomainDef *def, + char *const *sharedFilesystems, bool migrated, bool chardevStdioLogd); typedef int (*virSecurityDomainGetProcessLabel) (virSecurityManager *mgr, @@ -115,10 +117,12 @@ typedef int (*virSecurityDomainSetHugepages) (virSecurityManager *mgr, typedef int (*virSecurityDomainSetImageLabel) (virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags); typedef int (*virSecurityDomainRestoreImageLabel) (virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags); typedef int (*virSecurityDomainMoveImageMetadata) (virSecurityManager *mgr, pid_t pid, diff --git a/src/security/security_manager.c b/src/security/security_manager.c index 24f2f3d3dc..57de40ef65 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -404,6 +404,7 @@ virSecurityManagerGetPrivileged(virSecurityManager *mgr) * @mgr: security manager object * @vm: domain definition object * @src: disk source definition to operate on + * @sharedFilesystems: list of filesystem to consider shared * @flags: bitwise or of 'virSecurityDomainImageLabelFlags' * * Removes security label from @src according to @flags. @@ -414,6 +415,7 @@ int virSecurityManagerRestoreImageLabel(virSecurityManager *mgr, virDomainDef *vm, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags) { VIR_LOCK_GUARD lock = virObjectLockGuard(mgr); @@ -423,7 +425,7 @@ virSecurityManagerRestoreImageLabel(virSecurityManager *mgr, return -1; } - return mgr->drv->domainRestoreSecurityImageLabel(mgr, vm, src, flags); + return mgr->drv->domainRestoreSecurityImageLabel(mgr, vm, src, sharedFilesystems, flags); } @@ -512,6 +514,7 @@ virSecurityManagerClearSocketLabel(virSecurityManager *mgr, * @mgr: security manager object * @vm: domain definition object * @src: disk source definition to operate on + * @sharedFilesystems: list of filesystem to consider shared * @flags: bitwise or of 'virSecurityDomainImageLabelFlags' * * Labels a storage image with the configured security label according to @flags. @@ -522,6 +525,7 @@ int virSecurityManagerSetImageLabel(virSecurityManager *mgr, virDomainDef *vm, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags) { VIR_LOCK_GUARD lock = virObjectLockGuard(mgr); @@ -531,7 +535,8 @@ virSecurityManagerSetImageLabel(virSecurityManager *mgr, return -1; } - return mgr->drv->domainSetSecurityImageLabel(mgr, vm, src, flags); + return mgr->drv->domainSetSecurityImageLabel(mgr, vm, src, + sharedFilesystems, flags); } @@ -817,6 +822,7 @@ int virSecurityManagerCheckAllLabel(virSecurityManager *mgr, int virSecurityManagerSetAllLabel(virSecurityManager *mgr, virDomainDef *vm, + char *const *sharedFilesystems, const char *incomingPath, bool chardevStdioLogd, bool migrated) @@ -828,7 +834,8 @@ virSecurityManagerSetAllLabel(virSecurityManager *mgr, return -1; } - return mgr->drv->domainSetSecurityAllLabel(mgr, vm, incomingPath, + return mgr->drv->domainSetSecurityAllLabel(mgr, vm, sharedFilesystems, + incomingPath, chardevStdioLogd, migrated); } @@ -836,6 +843,7 @@ virSecurityManagerSetAllLabel(virSecurityManager *mgr, int virSecurityManagerRestoreAllLabel(virSecurityManager *mgr, virDomainDef *vm, + char *const *sharedFilesystems, bool migrated, bool chardevStdioLogd) { @@ -846,8 +854,8 @@ virSecurityManagerRestoreAllLabel(virSecurityManager *mgr, return -1; } - return mgr->drv->domainRestoreSecurityAllLabel(mgr, vm, migrated, - chardevStdioLogd); + return mgr->drv->domainRestoreSecurityAllLabel(mgr, vm, sharedFilesystems, + migrated, chardevStdioLogd); } int @@ -1355,7 +1363,7 @@ virSecurityManagerMetadataLock(virSecurityManager *mgr G_GNUC_UNUSED, } #endif /* !WIN32 */ - if (virFileIsSharedFS(p)) { + if (virFileIsSharedFS(p, NULL)) { /* Probably a root squashed NFS. */ continue; } diff --git a/src/security/security_manager.h b/src/security/security_manager.h index a416af3215..da2ab7f584 100644 --- a/src/security/security_manager.h +++ b/src/security/security_manager.h @@ -130,11 +130,13 @@ int virSecurityManagerCheckAllLabel(virSecurityManager *mgr, virDomainDef *sec); int virSecurityManagerSetAllLabel(virSecurityManager *mgr, virDomainDef *sec, + char *const *sharedFilesystems, const char *incomingPath, bool chardevStdioLogd, bool migrated); int virSecurityManagerRestoreAllLabel(virSecurityManager *mgr, virDomainDef *def, + char *const *sharedFilesystems, bool migrated, bool chardevStdioLogd); int virSecurityManagerGetProcessLabel(virSecurityManager *mgr, @@ -170,10 +172,12 @@ typedef enum { int virSecurityManagerSetImageLabel(virSecurityManager *mgr, virDomainDef *vm, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags); int virSecurityManagerRestoreImageLabel(virSecurityManager *mgr, virDomainDef *vm, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags); int virSecurityManagerMoveImageMetadata(virSecurityManager *mgr, pid_t pid, diff --git a/src/security/security_nop.c b/src/security/security_nop.c index 1413f43d57..f9c0d3cad1 100644 --- a/src/security/security_nop.c +++ b/src/security/security_nop.c @@ -117,6 +117,7 @@ virSecurityDomainReleaseLabelNop(virSecurityManager *mgr G_GNUC_UNUSED, static int virSecurityDomainSetAllLabelNop(virSecurityManager *mgr G_GNUC_UNUSED, virDomainDef *sec G_GNUC_UNUSED, + char *const *sharedFilesystems G_GNUC_UNUSED, const char *incomingPath G_GNUC_UNUSED, bool chardevStdioLogd G_GNUC_UNUSED, bool migrated G_GNUC_UNUSED) @@ -127,6 +128,7 @@ virSecurityDomainSetAllLabelNop(virSecurityManager *mgr G_GNUC_UNUSED, static int virSecurityDomainRestoreAllLabelNop(virSecurityManager *mgr G_GNUC_UNUSED, virDomainDef *vm G_GNUC_UNUSED, + char *const *sharedFilesystems G_GNUC_UNUSED, bool migrated G_GNUC_UNUSED, bool chardevStdioLogd G_GNUC_UNUSED) { @@ -191,6 +193,7 @@ static int virSecurityDomainRestoreImageLabelNop(virSecurityManager *mgr G_GNUC_UNUSED, virDomainDef *def G_GNUC_UNUSED, virStorageSource *src G_GNUC_UNUSED, + char *const *sharedFilesystems G_GNUC_UNUSED, virSecurityDomainImageLabelFlags flags G_GNUC_UNUSED) { return 0; @@ -200,6 +203,7 @@ static int virSecurityDomainSetImageLabelNop(virSecurityManager *mgr G_GNUC_UNUSED, virDomainDef *def G_GNUC_UNUSED, virStorageSource *src G_GNUC_UNUSED, + char *const *sharedFilesystems G_GNUC_UNUSED, virSecurityDomainImageLabelFlags flags G_GNUC_UNUSED) { return 0; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index b49af26e49..a891ad5839 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1777,6 +1777,7 @@ static int virSecuritySELinuxRestoreImageLabelInt(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, bool migrated) { virSecurityLabelDef *seclabel; @@ -1833,7 +1834,7 @@ virSecuritySELinuxRestoreImageLabelInt(virSecurityManager *mgr, if (!src->path) return 0; - if ((rc = virFileIsSharedFS(src->path)) < 0) + if ((rc = virFileIsSharedFS(src->path, sharedFilesystems)) < 0) return -1; } @@ -1867,9 +1868,10 @@ static int virSecuritySELinuxRestoreImageLabel(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags G_GNUC_UNUSED) { - return virSecuritySELinuxRestoreImageLabelInt(mgr, def, src, false); + return virSecuritySELinuxRestoreImageLabelInt(mgr, def, src, sharedFilesystems, false); } @@ -1878,6 +1880,7 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, virStorageSource *parent, + char *const *sharedFilesystems G_GNUC_UNUSED, bool isChainTop) { virSecuritySELinuxData *data = virSecurityManagerGetPrivateData(mgr); @@ -1983,6 +1986,7 @@ static int virSecuritySELinuxSetImageLabel(virSecurityManager *mgr, virDomainDef *def, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags) { virStorageSource *parent = src; @@ -1991,7 +1995,9 @@ virSecuritySELinuxSetImageLabel(virSecurityManager *mgr, for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) { const bool isChainTop = flags & VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP; - if (virSecuritySELinuxSetImageLabelInternal(mgr, def, n, parent, isChainTop) < 0) + if (virSecuritySELinuxSetImageLabelInternal(mgr, def, n, parent, + sharedFilesystems, + isChainTop) < 0) return -1; if (!(flags & VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN)) @@ -2819,6 +2825,7 @@ virSecuritySELinuxRestoreSysinfoLabel(virSecurityManager *mgr, static int virSecuritySELinuxRestoreAllLabel(virSecurityManager *mgr, virDomainDef *def, + char *const *sharedFilesystems, bool migrated, bool chardevStdioLogd) { @@ -2843,6 +2850,7 @@ virSecuritySELinuxRestoreAllLabel(virSecurityManager *mgr, virDomainDiskDef *disk = def->disks[i]; if (virSecuritySELinuxRestoreImageLabelInt(mgr, def, disk->src, + sharedFilesystems, migrated) < 0) rc = -1; } @@ -2889,6 +2897,7 @@ virSecuritySELinuxRestoreAllLabel(virSecurityManager *mgr, if (def->os.loader && def->os.loader->nvram) { if (virSecuritySELinuxRestoreImageLabelInt(mgr, def, def->os.loader->nvram, + sharedFilesystems, migrated) < 0) rc = -1; } @@ -3231,6 +3240,7 @@ virSecuritySELinuxSetSysinfoLabel(virSecurityManager *mgr, static int virSecuritySELinuxSetAllLabel(virSecurityManager *mgr, virDomainDef *def, + char *const *sharedFilesystems, const char *incomingPath G_GNUC_UNUSED, bool chardevStdioLogd, bool migrated G_GNUC_UNUSED) @@ -3258,6 +3268,7 @@ virSecuritySELinuxSetAllLabel(virSecurityManager *mgr, continue; } if (virSecuritySELinuxSetImageLabel(mgr, def, def->disks[i]->src, + sharedFilesystems, VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN | VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP) < 0) return -1; @@ -3308,6 +3319,7 @@ virSecuritySELinuxSetAllLabel(virSecurityManager *mgr, if (def->os.loader && def->os.loader->nvram) { if (virSecuritySELinuxSetImageLabel(mgr, def, def->os.loader->nvram, + sharedFilesystems, VIR_SECURITY_DOMAIN_IMAGE_LABEL_BACKING_CHAIN | VIR_SECURITY_DOMAIN_IMAGE_PARENT_CHAIN_TOP) < 0) return -1; diff --git a/src/security/security_stack.c b/src/security/security_stack.c index 369b5dd3a6..dc52df0bff 100644 --- a/src/security/security_stack.c +++ b/src/security/security_stack.c @@ -338,6 +338,7 @@ virSecurityStackRestoreHostdevLabel(virSecurityManager *mgr, static int virSecurityStackSetAllLabel(virSecurityManager *mgr, virDomainDef *vm, + char *const *sharedFilesystems, const char *incomingPath, bool chardevStdioLogd, bool migrated) @@ -347,8 +348,8 @@ virSecurityStackSetAllLabel(virSecurityManager *mgr, for (; item; item = item->next) { if (virSecurityManagerSetAllLabel(item->securityManager, vm, - incomingPath, chardevStdioLogd, - migrated) < 0) + sharedFilesystems, incomingPath, + chardevStdioLogd, migrated) < 0) goto rollback; } @@ -358,6 +359,7 @@ virSecurityStackSetAllLabel(virSecurityManager *mgr, for (item = item->prev; item; item = item->prev) { if (virSecurityManagerRestoreAllLabel(item->securityManager, vm, + sharedFilesystems, migrated, chardevStdioLogd) < 0) { VIR_WARN("Unable to restore all labels after failed set label call " @@ -374,6 +376,7 @@ virSecurityStackSetAllLabel(virSecurityManager *mgr, static int virSecurityStackRestoreAllLabel(virSecurityManager *mgr, virDomainDef *vm, + char *const *sharedFilesystems, bool migrated, bool chardevStdioLogd) { @@ -383,6 +386,7 @@ virSecurityStackRestoreAllLabel(virSecurityManager *mgr, for (; item; item = item->next) { if (virSecurityManagerRestoreAllLabel(item->securityManager, vm, + sharedFilesystems, migrated, chardevStdioLogd) < 0) rc = -1; } @@ -640,6 +644,7 @@ static int virSecurityStackSetImageLabel(virSecurityManager *mgr, virDomainDef *vm, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags) { virSecurityStackData *priv = virSecurityManagerGetPrivateData(mgr); @@ -647,7 +652,7 @@ virSecurityStackSetImageLabel(virSecurityManager *mgr, for (; item; item = item->next) { if (virSecurityManagerSetImageLabel(item->securityManager, vm, src, - flags) < 0) + sharedFilesystems, flags) < 0) goto rollback; } @@ -658,6 +663,7 @@ virSecurityStackSetImageLabel(virSecurityManager *mgr, if (virSecurityManagerRestoreImageLabel(item->securityManager, vm, src, + sharedFilesystems, flags) < 0) { VIR_WARN("Unable to restore image label after failed set label " "call virDriver=%s driver=%s domain=%s src=%p (path=%s) " @@ -674,6 +680,7 @@ static int virSecurityStackRestoreImageLabel(virSecurityManager *mgr, virDomainDef *vm, virStorageSource *src, + char *const *sharedFilesystems, virSecurityDomainImageLabelFlags flags) { virSecurityStackData *priv = virSecurityManagerGetPrivateData(mgr); @@ -682,7 +689,8 @@ virSecurityStackRestoreImageLabel(virSecurityManager *mgr, for (; item; item = item->next) { if (virSecurityManagerRestoreImageLabel(item->securityManager, - vm, src, flags) < 0) + vm, src, sharedFilesystems, + flags) < 0) rc = -1; } diff --git a/src/util/virfile.c b/src/util/virfile.c index deaf4555fd..a6a7de9829 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -2598,7 +2598,7 @@ virFileOpenAs(const char *path, int openflags, mode_t mode, /* On Linux we can also verify the FS-type of the * directory. (this is a NOP on other platforms). */ - if (virFileIsSharedFS(path) <= 0) + if (virFileIsSharedFS(path, NULL) <= 0) goto error; } @@ -3795,7 +3795,8 @@ virFileGetDefaultHugepage(virHugeTLBFS *fs, return NULL; } -int virFileIsSharedFS(const char *path) +int virFileIsSharedFS(const char *path, + char *const *overrides G_GNUC_UNUSED) { return virFileIsSharedFSType(path, VIR_FILE_SHFS_NFS | diff --git a/src/util/virfile.h b/src/util/virfile.h index 56fe309bce..3fdd7f526c 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -235,7 +235,8 @@ enum { }; int virFileIsSharedFSType(const char *path, unsigned int fstypes) ATTRIBUTE_NONNULL(1); -int virFileIsSharedFS(const char *path) ATTRIBUTE_NONNULL(1); +int virFileIsSharedFS(const char *path, + char *const *overrides) ATTRIBUTE_NONNULL(1); int virFileIsClusterFS(const char *path) ATTRIBUTE_NONNULL(1); int virFileIsMountPoint(const char *file) ATTRIBUTE_NONNULL(1); int virFileIsCDROM(const char *path) diff --git a/tests/securityselinuxlabeltest.c b/tests/securityselinuxlabeltest.c index 04bffe4356..f23772dcde 100644 --- a/tests/securityselinuxlabeltest.c +++ b/tests/securityselinuxlabeltest.c @@ -270,7 +270,7 @@ testSELinuxLabeling(const void *opaque) if (!(def = testSELinuxLoadDef(testname))) goto cleanup; - if (virSecurityManagerSetAllLabel(mgr, def, NULL, false, false) < 0) + if (virSecurityManagerSetAllLabel(mgr, def, NULL, NULL, false, false) < 0) goto cleanup; if (testSELinuxCheckLabels(files, nfiles) < 0) diff --git a/tests/virfiletest.c b/tests/virfiletest.c index 9fbfc37e56..e05925a321 100644 --- a/tests/virfiletest.c +++ b/tests/virfiletest.c @@ -313,7 +313,7 @@ testFileIsSharedFSType(const void *opaque G_GNUC_UNUSED) goto cleanup; } - actual = virFileIsSharedFS(data->filename); + actual = virFileIsSharedFS(data->filename, NULL); if (actual != data->expected) { fprintf(stderr, "Unexpected FS type. Expected %d got %d\n", -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:12 +0100, Andrea Bolognani wrote:
virFileIsSharedFS() is the function that ultimately decides whether a filesystem should be considered shared, but the list of manually configured shared filesystems is part of the QEMU driver's configuration, so we need to pass the information through several layers in order to make use of it.
Note that with this change the list is propagated all the way through, but its contents are still ignored, so the behavior remains the same for now.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/lxc/lxc_controller.c | 2 +- src/lxc/lxc_driver.c | 2 +- src/lxc/lxc_process.c | 4 ++-- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_extdevice.c | 2 +- src/qemu/qemu_migration.c | 12 ++++++++---- src/qemu/qemu_security.c | 14 ++++++++++++-- src/qemu/qemu_tpm.c | 27 ++++++++++++++++++--------- src/qemu/qemu_tpm.h | 8 +++++--- src/security/security_apparmor.c | 2 ++ src/security/security_dac.c | 17 +++++++++++++---- src/security/security_driver.h | 4 ++++ src/security/security_manager.c | 20 ++++++++++++++------ src/security/security_manager.h | 4 ++++ src/security/security_nop.c | 4 ++++ src/security/security_selinux.c | 18 +++++++++++++++--- src/security/security_stack.c | 16 ++++++++++++---- src/util/virfile.c | 5 +++-- src/util/virfile.h | 3 ++- tests/securityselinuxlabeltest.c | 2 +- tests/virfiletest.c | 2 +- 21 files changed, 124 insertions(+), 46 deletions(-)
[...]
@@ -1355,7 +1363,7 @@ virSecurityManagerMetadataLock(virSecurityManager *mgr G_GNUC_UNUSED, } #endif /* !WIN32 */
- if (virFileIsSharedFS(p)) { + if (virFileIsSharedFS(p, NULL)) {
As virSecurityManagerMetadataLock is passed all of the functions that security labelling is happening on it feels weird to have one instance where it's not. Any reason you didn't pass it in? It should not matter as locally the files shouldn't return an error. If you have a reason, explain it in a comment please.
/* Probably a root squashed NFS. */ continue; }
Rest looks good

On Wed, Mar 20, 2024 at 01:10:00PM +0100, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:12 +0100, Andrea Bolognani wrote:
@@ -1355,7 +1363,7 @@ virSecurityManagerMetadataLock(virSecurityManager *mgr G_GNUC_UNUSED, } #endif /* !WIN32 */
- if (virFileIsSharedFS(p)) { + if (virFileIsSharedFS(p, NULL)) {
As virSecurityManagerMetadataLock is passed all of the functions that security labelling is happening on it feels weird to have one instance where it's not. Any reason you didn't pass it in? It should not matter as locally the files shouldn't return an error.
If you have a reason, explain it in a comment please.
No reason at all, it was just an oversight O:-) Thanks a lot for catching that! I've already fixed it locally. -- Andrea Bolognani / Red Hat / Virtualization

If the filesystem wasn't determined to be a shared one via the type check, try comparing it with the additional paths that have been configured by the local admin. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 86 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 72 insertions(+), 14 deletions(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index a6a7de9829..ac9b5a77a6 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -3795,22 +3795,80 @@ virFileGetDefaultHugepage(virHugeTLBFS *fs, return NULL; } +static int +virFileIsSharedFSOverrideCompare(const char *path, + char *const *overrides) +{ + char *const *iter = overrides; + + while (*iter != NULL) { + if (STREQ(path, *iter)) + return 1; + iter++; + } + + return 0; +} + +static int +virFileIsSharedFSOverride(const char *path, + char *const *overrides) +{ + g_autofree char *dirpath = NULL; + char *p = NULL; + int ret = 0; + + if (!path || path[0] != '/' || !overrides) + return ret; + + dirpath = g_strdup(path); + + ret = virFileIsSharedFSOverrideCompare(dirpath, overrides); + + /* Continue until we've scanned the entire path or found a match */ + while (p != dirpath && ret == 0) { + + /* Find the last slash */ + if ((p = strrchr(dirpath, '/')) == NULL) + break; + + /* Truncate the path by overwriting the slash that we've just + * found with a null byte. If it is the very first slash in + * the path, we need to handle things slightly differently */ + if (p == dirpath) + *(p+1) = '\0'; + else + *p = '\0'; + + ret = virFileIsSharedFSOverrideCompare(dirpath, overrides); + } + + return ret; +} + int virFileIsSharedFS(const char *path, - char *const *overrides G_GNUC_UNUSED) + char *const *overrides) { - return virFileIsSharedFSType(path, - VIR_FILE_SHFS_NFS | - VIR_FILE_SHFS_GFS2 | - VIR_FILE_SHFS_OCFS | - VIR_FILE_SHFS_AFS | - VIR_FILE_SHFS_SMB | - VIR_FILE_SHFS_CIFS | - VIR_FILE_SHFS_CEPH | - VIR_FILE_SHFS_GPFS| - VIR_FILE_SHFS_QB | - VIR_FILE_SHFS_ACFS | - VIR_FILE_SHFS_GLUSTERFS | - VIR_FILE_SHFS_BEEGFS); + int ret; + + ret = virFileIsSharedFSType(path, + VIR_FILE_SHFS_NFS | + VIR_FILE_SHFS_GFS2 | + VIR_FILE_SHFS_OCFS | + VIR_FILE_SHFS_AFS | + VIR_FILE_SHFS_SMB | + VIR_FILE_SHFS_CIFS | + VIR_FILE_SHFS_CEPH | + VIR_FILE_SHFS_GPFS| + VIR_FILE_SHFS_QB | + VIR_FILE_SHFS_ACFS | + VIR_FILE_SHFS_GLUSTERFS | + VIR_FILE_SHFS_BEEGFS); + + if (ret == 0) + ret = virFileIsSharedFSOverride(path, overrides); + + return ret; } -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:13 +0100, Andrea Bolognani wrote:
If the filesystem wasn't determined to be a shared one via the type check, try comparing it with the additional paths that have been configured by the local admin.
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/util/virfile.c | 86 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 72 insertions(+), 14 deletions(-)
diff --git a/src/util/virfile.c b/src/util/virfile.c index a6a7de9829..ac9b5a77a6 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -3795,22 +3795,80 @@ virFileGetDefaultHugepage(virHugeTLBFS *fs, return NULL; }
+static int +virFileIsSharedFSOverrideCompare(const char *path, + char *const *overrides) +{ + char *const *iter = overrides; + + while (*iter != NULL) { + if (STREQ(path, *iter)) + return 1; + iter++;
This is g_strv_contains in disguise.
+ } + + return 0; +} + +static int +virFileIsSharedFSOverride(const char *path, + char *const *overrides) +{ + g_autofree char *dirpath = NULL; + char *p = NULL; + int ret = 0; + + if (!path || path[0] != '/' || !overrides) + return ret;
Return 0 directly.
+ + dirpath = g_strdup(path); + + ret = virFileIsSharedFSOverrideCompare(dirpath, overrides);
Same here, return success directly rather than having the reader look what's happening.
+ + /* Continue until we've scanned the entire path or found a match */ + while (p != dirpath && ret == 0) { + + /* Find the last slash */ + if ((p = strrchr(dirpath, '/')) == NULL) + break; + + /* Truncate the path by overwriting the slash that we've just + * found with a null byte. If it is the very first slash in + * the path, we need to handle things slightly differently */ + if (p == dirpath) + *(p+1) = '\0'; + else + *p = '\0'; + + ret = virFileIsSharedFSOverrideCompare(dirpath, overrides);
Here too.
+ } + + return ret;
And return failure/negative answer here also directly. Also consider returning boolean as it's returning just 1 and 0, which would make the first question I had (What do the return values mean?!?!) auto-answered.
+} +

Up until this point, we have avoided setting labels for incoming migration when the TPM state is stored on a shared filesystem. This seems to make sense, because since the underlying storage is shared surely the labels will be as well. There's one problem, though: when a guest is migrated, the SELinux context for the destination process is different from the one of the source process. We haven't hit any issues with the current approach so far because NFS doesn't support SELinux, so effectively it doesn't matter whether relabeling happens or not: even if the SELinux contexts of the source and target processes are different, both will be able to access the storage. Now that it's possible for the local admin to manually mark exported directories as shared filesystems, however, things can get problematic. Consider the case in which one host (mig-one) exports its local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the same time bind-mounts it to /var/lib/libvirt/swtpm; another host (mig-two) mounts the same filesystem to the same location, this time via NFS. Additionally, in order to allow migration in both directions, on mig-one the /var/lib/libvirt/swtpm directory is listed in the shared_filesystems qemu.conf option. When migrating from mig-one to mig-two, things work just fine; going in the opposite direction, however, results in an error: # virsh migrate cirros qemu+ssh://mig-one/system error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'): qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator' qemu-system-x86_64: load of migration failed: Input/output error This is because the directory on mig-one is considered a shared filesystem and thus labeling is skipped, resulting in a SELinux denial. The solution is quite simple: remove the check and always relabel. We know that it's okay to do so not just because it makes the error seen above go away, but also because no such check currently exists for disks and other types of persistent storage such as NVRAM files, which always get relabeled. Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- src/qemu/qemu_tpm.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_tpm.c b/src/qemu/qemu_tpm.c index f1b4283a70..e522c460aa 100644 --- a/src/qemu/qemu_tpm.c +++ b/src/qemu/qemu_tpm.c @@ -929,7 +929,6 @@ qemuTPMEmulatorStart(virQEMUDriver *driver, g_autofree char *pidfile = NULL; virTimeBackOffVar timebackoff; const unsigned long long timeout = 1000; /* ms */ - bool setTPMStateLabel = true; pid_t pid = -1; cfg = virQEMUDriverGetConfig(driver); @@ -956,13 +955,7 @@ qemuTPMEmulatorStart(virQEMUDriver *driver, virCommandSetPidFile(cmd, pidfile); virCommandSetErrorFD(cmd, &errfd); - if (incomingMigration && - virFileIsSharedFS(tpm->data.emulator.storagepath, cfg->sharedFilesystems) == 1) { - /* security labels must have been set up on source already */ - setTPMStateLabel = false; - } - - if (qemuSecuritySetTPMLabels(driver, vm, setTPMStateLabel) < 0) + if (qemuSecuritySetTPMLabels(driver, vm, true) < 0) return -1; if (qemuSecurityCommandRun(driver, vm, cmd, cfg->swtpm_user, @@ -1011,7 +1004,7 @@ qemuTPMEmulatorStart(virQEMUDriver *driver, virProcessKillPainfully(pid, true); if (pidfile) unlink(pidfile); - qemuSecurityRestoreTPMLabels(driver, vm, setTPMStateLabel); + qemuSecurityRestoreTPMLabels(driver, vm, true); return -1; } -- 2.44.0

On Wed, Mar 20, 2024 at 10:19:14 +0100, Andrea Bolognani wrote:
Up until this point, we have avoided setting labels for incoming migration when the TPM state is stored on a shared filesystem. This seems to make sense, because since the underlying storage is shared surely the labels will be as well.
There's one problem, though: when a guest is migrated, the SELinux context for the destination process is different from the one of the source process.
We haven't hit any issues with the current approach so far because NFS doesn't support SELinux, so effectively it doesn't matter whether relabeling happens or not: even if the SELinux contexts of the source and target processes are different, both will be able to access the storage.
Now that it's possible for the local admin to manually mark exported directories as shared filesystems, however, things can get problematic.
Consider the case in which one host (mig-one) exports its local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the same time bind-mounts it to /var/lib/libvirt/swtpm; another host (mig-two) mounts the same filesystem to the same location, this time via NFS. Additionally, in order to allow migration in both directions, on mig-one the /var/lib/libvirt/swtpm directory is listed in the shared_filesystems qemu.conf option.
When migrating from mig-one to mig-two, things work just fine; going in the opposite direction, however, results in an error:
# virsh migrate cirros qemu+ssh://mig-one/system error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'): qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator' qemu-system-x86_64: load of migration failed: Input/output error
This is because the directory on mig-one is considered a shared filesystem and thus labeling is skipped, resulting in a SELinux denial.
The solution is quite simple: remove the check and always relabel. We know that it's okay to do so not just because it makes the error seen above go away, but also because no such check currently exists for disks and other types of persistent storage such as NVRAM files, which always get relabeled.
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.

On 3/20/24 08:23, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:14 +0100, Andrea Bolognani wrote:
Up until this point, we have avoided setting labels for incoming migration when the TPM state is stored on a shared filesystem. This seems to make sense, because since the underlying storage is shared surely the labels will be as well.
There's one problem, though: when a guest is migrated, the SELinux context for the destination process is different from the one of the source process.
We haven't hit any issues with the current approach so far because NFS doesn't support SELinux, so effectively it doesn't matter whether relabeling happens or not: even if the SELinux contexts of the source and target processes are different, both will be able to access the storage.
Now that it's possible for the local admin to manually mark exported directories as shared filesystems, however, things can get problematic.
Consider the case in which one host (mig-one) exports its local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the same time bind-mounts it to /var/lib/libvirt/swtpm; another host (mig-two) mounts the same filesystem to the same location, this time via NFS. Additionally, in order to allow migration in both directions, on mig-one the /var/lib/libvirt/swtpm directory is listed in the shared_filesystems qemu.conf option.
When migrating from mig-one to mig-two, things work just fine; going in the opposite direction, however, results in an error:
# virsh migrate cirros qemu+ssh://mig-one/system error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'): qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator' qemu-system-x86_64: load of migration failed: Input/output error
This is because the directory on mig-one is considered a shared filesystem and thus labeling is skipped, resulting in a SELinux denial.
The solution is quite simple: remove the check and always relabel. We know that it's okay to do so not just because it makes the error seen above go away, but also because no such check currently exists for disks and other types of persistent storage such as NVRAM files, which always get relabeled.
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed file access and wants to open them (lockfile) again and so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels. Stefan

On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
On 3/20/24 08:23, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:14 +0100, Andrea Bolognani wrote:
Consider the case in which one host (mig-one) exports its local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the same time bind-mounts it to /var/lib/libvirt/swtpm; another host (mig-two) mounts the same filesystem to the same location, this time via NFS. Additionally, in order to allow migration in both directions, on mig-one the /var/lib/libvirt/swtpm directory is listed in the shared_filesystems qemu.conf option.
When migrating from mig-one to mig-two, things work just fine; going in the opposite direction, however, results in an error:
# virsh migrate cirros qemu+ssh://mig-one/system error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'): qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator' qemu-system-x86_64: load of migration failed: Input/output error
This is because the directory on mig-one is considered a shared filesystem and thus labeling is skipped, resulting in a SELinux denial.
The solution is quite simple: remove the check and always relabel. We know that it's okay to do so not just because it makes the error seen above go away, but also because no such check currently exists for disks and other types of persistent storage such as NVRAM files, which always get relabeled.
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed file access and wants to open them (lockfile) again and so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels.
I didn't really consider the failure scenario, so thank you for bringing that up. I think it would be still fine. If the source has NFS storage, then access will keep working regardless of what relabeling the destination has been up to in the meantime. And if the source has local storage, then the relabeling on the destination (via NFS) will not actually have touched the SELinux labels. The only concern I have is that, when going from local to NFS, labels might have been restored on the source side. But I assume that restoring only happens once the migration has been confirmed as successful? I'll check. Once again, as far as I can tell (please let me know if I'm wrong!) there is no special casing when it comes to disks and other types of persistent storage, so if this approach was problematic I would have expected many issues to have been reported by now. -- Andrea Bolognani / Red Hat / Virtualization

On Wed, Mar 20, 2024 at 09:10:48AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
On 3/20/24 08:23, Peter Krempa wrote:
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed file access and wants to open them (lockfile) again and so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels.
I didn't really consider the failure scenario, so thank you for bringing that up.
I think it would be still fine. If the source has NFS storage, then access will keep working regardless of what relabeling the destination has been up to in the meantime. And if the source has local storage, then the relabeling on the destination (via NFS) will not actually have touched the SELinux labels.
The only concern I have is that, when going from local to NFS, labels might have been restored on the source side. But I assume that restoring only happens once the migration has been confirmed as successful? I'll check.
Once again, as far as I can tell (please let me know if I'm wrong!) there is no special casing when it comes to disks and other types of persistent storage, so if this approach was problematic I would have expected many issues to have been reported by now.
I've tested this to confirm. My trick to simulating a migration failure was to add <qemu:commandline> <qemu:arg value='-machine'/> <qemu:arg value='pc-i440fx-6.0'/> </qemu:commandline> to the migration XML, where the VM uses a different machine type in its configuration. This results in something like process exited while connecting to monitor: qemu-system-x86_64: -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1",\ "bus":"pcie.0","multifunction":true,"addr":"0x1"}: Bus 'pcie.0' not found which should be a decent enough proxy for the kind of "we went as far as attempting to start the QEMU process on the destination host, then things went sideways" failure that we'd experience as a consequence of a permission error. Suggestions on how to improve the test methodology are very much appreciated :) Anyway, based on what I've seen I think I can confirm my initial intuition as reported above. Things work the way you expect, in that upon migration failure the VM keeps happily chugging along on the source host. -- Andrea Bolognani / Red Hat / Virtualization

On 4/17/24 11:20, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 09:10:48AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
On 3/20/24 08:23, Peter Krempa wrote:
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed file access and wants to open them (lockfile) again and so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels.
I didn't really consider the failure scenario, so thank you for bringing that up.
I think it would be still fine. If the source has NFS storage, then access will keep working regardless of what relabeling the destination has been up to in the meantime. And if the source has local storage, then the relabeling on the destination (via NFS) will not actually have touched the SELinux labels.
The only concern I have is that, when going from local to NFS, labels might have been restored on the source side. But I assume that restoring only happens once the migration has been confirmed as successful? I'll check.
Once again, as far as I can tell (please let me know if I'm wrong!) there is no special casing when it comes to disks and other types of persistent storage, so if this approach was problematic I would have expected many issues to have been reported by now.
I've tested this to confirm. My trick to simulating a migration failure was to add
<qemu:commandline> <qemu:arg value='-machine'/> <qemu:arg value='pc-i440fx-6.0'/> </qemu:commandline>
to the migration XML, where the VM uses a different machine type in its configuration. This results in something like
process exited while connecting to monitor: qemu-system-x86_64: -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1",\ "bus":"pcie.0","multifunction":true,"addr":"0x1"}: Bus 'pcie.0' not found
which should be a decent enough proxy for the kind of "we went as far as attempting to start the QEMU process on the destination host, then things went sideways" failure that we'd experience as a consequence of a permission error. Suggestions on how to improve the test methodology are very much appreciated :)
Anyway, based on what I've seen I think I can confirm my initial intuition as reported above. Things work the way you expect, in that upon migration failure the VM keeps happily chugging along on the source host.
If a file is labeled on the NFS server side then the current code doesn't relabel the state file on server side when we have an outgoing migration due to possible fall-back in case of error. With NFS at least not supporting SElinux labels (and security xattrs) we are fine on the server side if the fallback happens whatever was done with SELinux labeling on the client side (gets EOPNOTSUPP) because the label will not be changed on the server side. If there existed a shared filesystem that supports SELinux labels and propagated them we'd be in trouble at least for the fall-back scenario, right? I don't know whether any shared filesystem would ever share security xattrs across the network, though, so that this could become a problem.

On Wed, Mar 20, 2024 at 09:10:48AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
On 3/20/24 08:23, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:14 +0100, Andrea Bolognani wrote:
Consider the case in which one host (mig-one) exports its local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the same time bind-mounts it to /var/lib/libvirt/swtpm; another host (mig-two) mounts the same filesystem to the same location, this time via NFS. Additionally, in order to allow migration in both directions, on mig-one the /var/lib/libvirt/swtpm directory is listed in the shared_filesystems qemu.conf option.
When migrating from mig-one to mig-two, things work just fine; going in the opposite direction, however, results in an error:
# virsh migrate cirros qemu+ssh://mig-one/system error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'): qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator' qemu-system-x86_64: load of migration failed: Input/output error
This is because the directory on mig-one is considered a shared filesystem and thus labeling is skipped, resulting in a SELinux denial.
The solution is quite simple: remove the check and always relabel. We know that it's okay to do so not just because it makes the error seen above go away, but also because no such check currently exists for disks and other types of persistent storage such as NVRAM files, which always get relabeled.
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed file access and wants to open them (lockfile) again and so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels.
I didn't really consider the failure scenario, so thank you for bringing that up.
I think it would be still fine. If the source has NFS storage, then access will keep working regardless of what relabeling the destination has been up to in the meantime. And if the source has local storage, then the relabeling on the destination (via NFS) will not actually have touched the SELinux labels.
The only concern I have is that, when going from local to NFS, labels might have been restored on the source side. But I assume that restoring only happens once the migration has been confirmed as successful? I'll check.
Once again, as far as I can tell (please let me know if I'm wrong!) there is no special casing when it comes to disks and other types of persistent storage, so if this approach was problematic I would have expected many issues to have been reported by now.
The Labelling and shared filesystems combination has some major caveats. By default NFS doesn't enable use of SELinux labels, so if both sides are on NFS, a generic label will be used for everything on the NFS volume and everything will be fine. If you enable use of labelling on NFS though, you can no longer use dynamic label assignment. static label assignment must be used, so ensure the same SELinux label is used on both sides of the migration. This is because both QEMU's need to be able to open the shared files, even if only 1 is actively writing. In this case we don't need to relabel the files on the shared FS, but it is a no-op if we try. If one side is ext4 and the other side is NFS, then the same applies if NFS has labelling enabled. If one side is ext4 and the other side is NFS, without labelling enabled, then I think we're probably in a world of hurt. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Apr 18, 2024 at 06:29:53PM +0100, Daniel P. Berrangé wrote:
On Wed, Mar 20, 2024 at 09:10:48AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
On 3/20/24 08:23, Peter Krempa wrote:
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such case doin the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed file access and wants to open them (lockfile) again and so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels.
I didn't really consider the failure scenario, so thank you for bringing that up.
I think it would be still fine. If the source has NFS storage, then access will keep working regardless of what relabeling the destination has been up to in the meantime. And if the source has local storage, then the relabeling on the destination (via NFS) will not actually have touched the SELinux labels.
The only concern I have is that, when going from local to NFS, labels might have been restored on the source side. But I assume that restoring only happens once the migration has been confirmed as successful? I'll check.
Once again, as far as I can tell (please let me know if I'm wrong!) there is no special casing when it comes to disks and other types of persistent storage, so if this approach was problematic I would have expected many issues to have been reported by now.
The Labelling and shared filesystems combination has some major caveats.
By default NFS doesn't enable use of SELinux labels, so if both sides are on NFS, a generic label will be used for everything on the NFS volume and everything will be fine.
If you enable use of labelling on NFS though, you can no longer use dynamic label assignment. static label assignment must be used, so ensure the same SELinux label is used on both sides of the migration. This is because both QEMU's need to be able to open the shared files, even if only 1 is actively writing. In this case we don't need to relabel the files on the shared FS, but it is a no-op if we try.
If one side is ext4 and the other side is NFS, then the same applies if NFS has labelling enabled.
If one side is ext4 and the other side is NFS, without labelling enabled, then I think we're probably in a world of hurt.
Can you please point me to documentation on how to enable SELinux for NFS? Based on what I found I *think* you need NFS v4.2, but I'm pretty sure I'm currently using that (how to make sure?) and I'm still seeing nfs_t for everything on the client side. Given the lack of information around this, I'm still not entirely convinced that SELinux can be made to work at all over NFS. But you're right that dynamic label assignment can't really work in general over shared filesystems, and the only reason we currently get away with it is that NFS and most (all?) other shared filesystems don't do SELinux. Mixing ext4 and NFS the way these patches allow you to do doesn't make the situation any worse or better in that regard. -- Andrea Bolognani / Red Hat / Virtualization

Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 489201d3fc..7e17043c2a 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -17,6 +17,13 @@ v10.2.0 (unreleased) * **New features** + * qemu: Add ``shared_filesystems`` configuration option + + This option can be used to configure libvirt so that migration between two + hosts, one of which exports a shared filesystem via NFS and the other one + which mounts it, is allowed in both directions. Without it, libvirt would + block migration from the host that is accessing the data locally. + * **Improvements** * **Bug fixes** -- 2.44.0

On 3/20/24 05:19, Andrea Bolognani wrote:
Signed-off-by: Andrea Bolognani <abologna@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst index 489201d3fc..7e17043c2a 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -17,6 +17,13 @@ v10.2.0 (unreleased)
* **New features**
+ * qemu: Add ``shared_filesystems`` configuration option + + This option can be used to configure libvirt so that migration between two + hosts, one of which exports a shared filesystem via NFS and the other one
Is NFS the only shared filesystem to consider or is NFS only an example?
+ which mounts it, is allowed in both directions. Without it, libvirt would + block migration from the host that is accessing the data locally. + * **Improvements**
* **Bug fixes**

On Wed, Mar 20, 2024 at 10:07:11AM -0400, Stefan Berger wrote:
On 3/20/24 05:19, Andrea Bolognani wrote:
+ * qemu: Add ``shared_filesystems`` configuration option + + This option can be used to configure libvirt so that migration between two + hosts, one of which exports a shared filesystem via NFS and the other one
Is NFS the only shared filesystem to consider or is NFS only an example?
Theoretically, just an example. Possibly the most common scenario? Certainly the only one I've actually tested :) -- Andrea Bolognani / Red Hat / Virtualization

On 3/20/24 11:59, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 10:07:11AM -0400, Stefan Berger wrote:
On 3/20/24 05:19, Andrea Bolognani wrote:
+ * qemu: Add ``shared_filesystems`` configuration option + + This option can be used to configure libvirt so that migration between two + hosts, one of which exports a shared filesystem via NFS and the other one
Is NFS the only shared filesystem to consider or is NFS only an example?
Theoretically, just an example. Possibly the most common scenario? Certainly the only one I've actually tested :)
Then I would write ... exports a shared filesystem, such as NFS, and the other one ...
participants (4)
-
Andrea Bolognani
-
Daniel P. Berrangé
-
Peter Krempa
-
Stefan Berger