On Tue, Mar 26, 2024 at 08:54:03AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
> On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
> > On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
> > > +# libvirt will normally prevent migration if the storage backing the VM is not
> > > +# on a shared filesystems. Sometimes, however, the storage *is* shared despite
> > > +# not being detected as such: for example, this is the case when one of the
> > > +# hosts involved in the migration is exporting its local storage to the other
> > > +# one via NFS.
> > > +#
> > > +# Any directory listed here will be assumed to live on a shared filesystem,
> > > +# making migration possible in scenarios such as the one described above.
> > > +#
> > > +# If you need this feature, you probably want to set remember_owner=0 too.
> >
> > Could you please elaborate why you'd want to disable owner remembering?
> > With remote filesystems this works so I expect that if this makes
> > certain paths behave as shared filesystems, they should behave such
> > without any additional tweaks
>
> To be quite honest I don't remember exactly why I've added that, but
> I can confirm that if remember_owner=0 is not used on the destination
> host then migration will stall for a bit and then fail with
>
> error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for
> metadata change: Resource temporarily unavailable
>
> Things work fine if swtpm is not involved. I'm going to dig deeper,
> but my guess is that, just like the situation addressed by the last
> patch, having an additional process complicates things compared to
> when we need to worry about QEMU only.

I've managed to track this down, and I wasn't far off.

The issue is that, when remember_owner is enabled, we perform a bunch
of additional locking on files around the actual relabel operation;
specifically, we call virSecurityManagerTransactionCommit() with the
last argument set to true.

This doesn't seem to cause any issues in general, *except* when it
comes to swtpm's storage lock.

I take this back. Further testing has confirmed that there are other
scenarios in which this is problematic, for example NVRAM files.
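
For reference, the "Resource temporarily unavailable" text in the error
quoted earlier is the strerror() string for EAGAIN, which is what a
non-blocking lock attempt returns when somebody else already holds the
lock. The sketch below only illustrates that failure mode. It uses
flock() on a temporary file as a stand-in, which is an assumption made
for the sake of a single-process demo and not necessarily the locking
primitive that libvirt or swtpm use; the helper names are made up.

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try a non-blocking exclusive lock; return (fd, None) or (None, errno)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, None


def demo():
    """Take the lock once, then show that a second attempt is refused."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, ".lock")  # hypothetical lock file
        holder_fd, _ = try_exclusive_lock(lock_path)
        # A second open() creates a new open file description, so this
        # attempt conflicts with the lock held through holder_fd.
        contender_fd, err = try_exclusive_lock(lock_path)
        os.close(holder_fd)
        return contender_fd, err
```

Calling demo() returns no file descriptor for the second attempt, along
with an errno of EAGAIN; on a glibc-based Linux host, passing that value
to os.strerror() yields the same text seen in the migration error.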

Even with the TPM locking issue out of the way, we still have to deal
with an underlying issue: label remembering is implemented by setting
custom XATTRs on the files involved, and NFS doesn't support XATTRs.

So when remember_owner=1 and the VM is started on the host that has
local access to the filesystem, these XATTRs will get set and will not
be cleared on migration; the VM will still make it safely to the other
host.

Then, when you attempt to migrate it back, libvirt will report

  Setting different SELinux label on
  /var/lib/libvirt/qemu/nvram/test_VARS.fd which is already in use

and abort the operation. This is caused by

  # getfattr -dm- /var/lib/libvirt/qemu/nvram/test_VARS.fd
  security.selinux="system_u:object_r:svirt_image_t:s0:c704,c774"
  trusted.libvirt.security.dac="+0:+0"
  trusted.libvirt.security.ref_dac="1"
  trusted.libvirt.security.ref_selinux="1"
  trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0"
  trusted.libvirt.security.timestamp_dac="1713347549"
  trusted.libvirt.security.timestamp_selinux="1713347549"

In particular, the non-zero reference counts are what convinces
libvirt to bail.
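
If you want to spot leftover reference counts on a host without reading
the dump by eye, something along these lines should do. It is only a
rough sketch that parses the getfattr output shown above; the function
names are made up for illustration, and it is not how libvirt itself
performs the check.

```python
import re

# getfattr output copied from the dump above.
SAMPLE = '''\
security.selinux="system_u:object_r:svirt_image_t:s0:c704,c774"
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1713347549"
trusted.libvirt.security.timestamp_selinux="1713347549"
'''

REF_PREFIX = "trusted.libvirt.security.ref_"


def parse_getfattr(text):
    """Turn name="value" lines from getfattr -d into a dict."""
    attrs = {}
    for line in text.splitlines():
        match = re.fullmatch(r'([^=#\s]+)="(.*)"', line.strip())
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def nonzero_refs(attrs):
    """Return {driver: count} for every ref_* attribute greater than zero."""
    return {
        name[len(REF_PREFIX):]: int(value)
        for name, value in attrs.items()
        if name.startswith(REF_PREFIX) and int(value) > 0
    }
```

For the dump above, nonzero_refs(parse_getfattr(SAMPLE)) reports a count
of 1 for both "dac" and "selinux", which matches the two ref_* entries
that make libvirt refuse the relabel.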

I don't think we can handle this nicely without opening other cans of
worms. The principle of "don't touch anything on the source once the
VM is running on the destination" is a good one to follow, since
ignoring it might result in the destination VM suddenly being
prevented from performing I/O. And in case the migration fails, which
as you both pointed out earlier is a possibility that we need to take
into account, retaining the existing labels instead of clearing them
is actually a good thing, as it allows resuming execution on the
source host without running into permission issues.

In other words, I've come to the conclusion that remember_owner=0 is
the least bad option here.

In the v2 that I've just posted, I've updated the comment to stress
further the fact that this option comes with caveats and is not to be
used lightly.

--
Andrea Bolognani / Red Hat / Virtualization