On 3/20/24 08:23, Peter Krempa wrote:
On Wed, Mar 20, 2024 at 10:19:14 +0100, Andrea Bolognani wrote:
> Up until this point, we have avoided setting labels for
> incoming migration when the TPM state is stored on a shared
> filesystem. This seems to make sense: since the
> underlying storage is shared, surely the labels will be as
> well.
>
> There's one problem, though: when a guest is migrated, the
> SELinux context for the destination process is different from
> the one of the source process.
>
> We haven't hit any issues with the current approach so far
> because NFS doesn't support SELinux, so effectively it doesn't
> matter whether relabeling happens or not: even if the SELinux
> contexts of the source and target processes are different,
> both will be able to access the storage.
>
> Now that it's possible for the local admin to manually mark
> exported directories as shared filesystems, however, things
> can get problematic.
>
> Consider the case in which one host (mig-one) exports its
> local filesystem /srv/nfs/libvirt/swtpm via NFS, and at the
> same time bind-mounts it to /var/lib/libvirt/swtpm; another
> host (mig-two) mounts the same filesystem to the same
> location, this time via NFS. Additionally, in order to
> allow migration in both directions, on mig-one the
> /var/lib/libvirt/swtpm directory is listed in the
> shared_filesystems qemu.conf option.
>
> When migrating from mig-one to mig-two, things work just fine;
> going in the opposite direction, however, results in an error:
>
> # virsh migrate cirros qemu+ssh://mig-one/system
> error: internal error: QEMU unexpectedly closed the monitor (vm='cirros'):
> qemu-system-x86_64: tpm-emulator: Setting the stateblob (type 1) failed with a TPM error 0x1f
> qemu-system-x86_64: error while loading state for instance 0x0 of device 'tpm-emulator'
> qemu-system-x86_64: load of migration failed: Input/output error
>
> This is because the directory on mig-one is considered a
> shared filesystem and thus labeling is skipped, resulting in
> a SELinux denial.
>
> The solution is quite simple: remove the check and always
> relabel. We know that it's okay to do so not just because it
> makes the error seen above go away, but also because no such
> check currently exists for disks and other types of persistent
> storage such as NVRAM files, which always get relabeled.

Did you consider the case where the migration fails and the VM is
restored to run on the source host again? In such a case, doing the
relabelling might break the source host.

Right. I seem to remember testing such scenarios. I had to put an exit()
(or something like it) into swtpm on the destination side to trigger the
fallback to the source side. The swtpm on the source side had closed its
files and wanted to open them (lockfile) again, so the files needed to
be labeled correctly if the storage on the source side was on the local
disk and exported via NFS from there (iirc). If the storage is
NFS-exported from a 3rd host, it probably would not require the labels.
Stefan
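
For readers following the thread, the relabeling decision described in the patch can be modeled roughly as below. This is only a sketch under stated assumptions: it is not libvirt code, every function and variable name is made up for illustration, the filesystem type "xfs" for mig-one's local filesystem is an assumption (the thread only says "local filesystem"), and the set of network filesystem types is deliberately simplified. It covers only the incoming-migration decision, not the failed-migration fallback discussed above.

```python
from pathlib import PurePosixPath

# Hypothetical model of the decision discussed in the thread; NOT libvirt code.
NETWORK_FS_TYPES = {"nfs", "nfs4"}  # assumption: simplified list


def is_shared_fs(path, fs_type, shared_filesystems):
    """Shared if on a network filesystem or under an admin-listed directory."""
    if fs_type in NETWORK_FS_TYPES:
        return True
    p = PurePosixPath(path)
    for listed in shared_filesystems:
        d = PurePosixPath(listed)
        if p == d or d in p.parents:
            return True
    return False


def relabel_on_incoming(path, fs_type, shared_filesystems, always_relabel):
    """Decide whether TPM state gets labeled for an incoming migration.

    always_relabel=False models the behavior before the patch (skip shared
    filesystems); always_relabel=True models the behavior the patch proposes.
    """
    if always_relabel:
        return True
    return not is_shared_fs(path, fs_type, shared_filesystems)


STATE_DIR = "/var/lib/libvirt/swtpm"

# mig-one: local filesystem (type assumed), listed in shared_filesystems.
mig_one = {"fs_type": "xfs", "shared_filesystems": [STATE_DIR]}
# mig-two: the same directory mounted via NFS.
mig_two = {"fs_type": "nfs", "shared_filesystems": []}

for name, host in (("mig-one", mig_one), ("mig-two", mig_two)):
    before = relabel_on_incoming(STATE_DIR, always_relabel=False, **host)
    after = relabel_on_incoming(STATE_DIR, always_relabel=True, **host)
    print(f"{name}: relabel before patch={before}, after patch={after}")
```

In this model, labeling is skipped on both hosts before the patch. Per the commit message, that is harmless on mig-two because NFS does not carry SELinux labels, but on mig-one the storage is local, so the skipped relabel leads to the denial shown in the error output.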