On Mon, Jul 29, 2024 at 03:09:55PM GMT, Peter Krempa wrote:
On Fri, Jul 26, 2024 at 07:47:57 -0700, Andrea Bolognani wrote:
> For example, if the domain uses UEFI, local -> remote migration works
> but when attempting to migrate back I get
>
> error: Requested operation is not valid: Setting different SELinux
> label on /var/lib/libvirt/qemu/nvram/cirros-efi_VARS.fd which is
> already in use
So I've kind of forgot about this one. That's mainly as sharing this
path is not actually needed for migration to work, as the contents are
shared inside the migration stream automatically.
Given though that users might want to share this one, so that they can
start the VM withouth migration on a different host and also we allow
setting the path to a more reasonable directory for sharing
(users should not share /var/lib/libvirt/qemu/nvram/, or anything
belonging to libvirt in the first place)
What's wrong with sharing this? It's just another type of storage
after all. And it's true that in some cases you might be able to
start the domain even with the NVRAM file being missing, but that's
not true in the general case.
Or are you saying that it's okay to share the directory where NVRAM
files are stored, as long as it's not under /var/lib/libvirt?
> If I add TPM into the mix, local -> remote migration fails
with
>
> error: unable to lock
> /var/lib/libvirt/swtpm/50601047-47f7-4f45-9b41-9112dfaa3539/tpm2/.lock
> for metadata change: Resource temporarily unavailable
This makes it look like the file is still locked, that might be
indication of a different problem as libvirt will not keep the lock
after updating the XATTRs.
I think there were some interesting timing issues with swtpm
attempting to lock the file while libvirt was trying to perform its
own metadata bookkeeping. I need to revisit that because honestly
I've forgotten almost everything about it.
> One thing that concerns me is that the changes appear to only be
on
> the destination side of the migration. So only the remote -> local
> path is affected. But what about the local -> remote path?
On the source side, at least for disks we simply nuke the
metadata(xattrs). In fact we do that in most cases when stopping the VM,
not only for migration.
See qemuProcessStop(), in the loop which calls
qemuBlockRemoveImageMetadata(). That call removes the metadata from all
disk top level sources.
I see, thanks for the pointer.
> If we assume remember_owner=1, then obviously the various
XATTRs
> related to remembering will be set on domain startup. What will take
> care of collecting that garbage once local -> remote migration has
> been performed? I think things might be fine based on the fact that
> the XATTRs seems to be dropped for disks even in the current state,
> but I thought I'd mention this specifically just in case.
So apart from the code I've mentioned above which explicitly drops the
xattrs, they are also not shared via NFS. Thus the remote side itself
would not get them.
If we'd leak them though it would mean that a re-start or incoming
migration on the source of
the migration would no longer work, so we need the same treatment for
the other resources as well.
Right, that's exactly my concern. I have encountered at least one
case in which XATTRs not being cleaned up correctly on (failed?)
migration resulted in having to perform manual fixup tasks before the
domain would start again.
I'll post another version adding the missing treatment for uefi,
but I
don't think I care enough about TPMs to go digging how that stuff is
plumbed in or why it complains about locks being held which don't seem
to be held by libvirt.
I will try to look into it myself on top of your updated patches. I'm
pretty sure TPM is a requirement for KubeVirt, so we definitely need
it to be handled correctly or the only recourse from their side will
be to set remember_owner=0 - and we've already established that we
don't want that to happen :)
--
Andrea Bolognani / Red Hat / Virtualization