Re: [PATCH 06/10] qemu: Introduce shared_filesystems configuration option

26 Mar 2024


On 3/26/24 12:38, Andrea Bolognani wrote:
...
On Tue, Mar 26, 2024 at 12:04:21PM -0400, Stefan Berger wrote:
...
On 3/26/24 11:54, Andrea Bolognani wrote:
...
On Wed, Mar 20, 2024 at 08:43:24AM -0700, Andrea Bolognani wrote:
...
On Wed, Mar 20, 2024 at 12:37:37PM +0100, Peter Krempa wrote:
...
On Wed, Mar 20, 2024 at 10:19:11 +0100, Andrea Bolognani wrote:
...
+# libvirt will normally prevent migration if the storage backing the VM is not
+# on a shared filesystem. Sometimes, however, the storage *is* shared despite
+# not being detected as such: for example, this is the case when one of the
+# hosts involved in the migration is exporting its local storage to the other
+# one via NFS.
+#
+# Any directory listed here will be assumed to live on a shared filesystem,
+# making migration possible in scenarios such as the one described above.
+#
+# If you need this feature, you probably want to set remember_owner=0 too.
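
For readers following along, a configuration using this option might look
like the fragment below. This is only a sketch: the list-of-strings syntax is
an assumption based on how other list options in qemu.conf are written, and
the paths are placeholders rather than values taken from the patch.

```
# Hypothetical qemu.conf fragment; paths are placeholders.
shared_filesystems = [
  "/path/to/images",
  "/path/to/swtpm"
]

# Suggested alongside it by the comment above.
remember_owner = 0
```
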
Could you please elaborate why you'd want to disable owner remembering?
With remote filesystems this works so I expect that if this makes
certain paths behave as shared filesystems, they should behave such
without any additional tweaks.
To be quite honest I don't remember exactly why I've added that, but
I can confirm that if remember_owner=0 is not used on the destination
host then migration will stall for a bit and then fail with
error: unable to lock /var/lib/libvirt/swtpm/.../tpm2/.lock for
    metadata change: Resource temporarily unavailable
Things work fine if swtpm is not involved. I'm going to dig deeper,
but my guess is that, just like the situation addressed by the last
patch, having an additional process complicates things compared to
when we need to worry about QEMU only.
I've managed to track this down, and I wasn't far off.
The issue is that, when remember_owner is enabled, we perform a bunch
of additional locking on files around the actual relabel operation;
specifically, we call virSecurityManagerTransactionCommit() with the
last argument set to true.
This doesn't seem to cause any issues in general, *except* when it
comes to swtpm's storage lock. The swtpm process holds this lock
while it's running, and only releases it once migration is triggered.
So, when we're about to start the target swtpm process and want to
prepare the environment by setting up labels, we try to acquire the
storage lock, and can't proceed because the source swtpm process is
still holding on to it.
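
The conflict described above can be reproduced outside of libvirt. The sketch
below assumes POSIX advisory locks taken in non-blocking mode, which is what
would produce the "Resource temporarily unavailable" (EAGAIN) text in the
error; it is not libvirt's or swtpm's actual code. The parent process stands
in for the source swtpm and the forked child for whoever tries to lock the
same file afterwards. The helper names are made up for illustration.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Try a non-blocking exclusive lock; return None on success, else errno."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return None
    except OSError as e:
        return e.errno
    finally:
        os.close(fd)  # closing the descriptor also drops the lock, if taken


def blocked_while_held(path):
    """Return True if a second process cannot lock `path` while we hold it."""
    holder_fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(holder_fd, fcntl.LOCK_EX)  # long-lived holder ("swtpm")
        pid = os.fork()
        if pid == 0:
            # Child: the second party attempting the same lock.
            code = 2
            try:
                err = try_lock(path)
                code = 1 if err in (errno.EAGAIN, errno.EACCES) else 0
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) == 1
    finally:
        os.close(holder_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock_path = os.path.join(d, ".lock")
        print("uncontended:", try_lock(lock_path))
        print("blocked while held:", blocked_while_held(lock_path))
```

A forked child is used because POSIX locks belong to the process, so a second
attempt from the holder itself would not conflict.
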
Who is 'we try to acquire the storage lock'? Is libvirt trying to acquire
swtpm's storage lock? I would assume that only an instance of swtpm would
acquire the lock.
Yes, it's libvirt doing that. The lock is only held for a brief
period while labels are being applied, and released immediately
afterwards. swtpm is only launched once that's done, so generally
there's no conflict.
Yes, I saw the code now. It kind of prevents lock files from being used.
...
In the migration case, things have worked so far because labeling in
general has been skipped for shared filesystems, which means that
locking was not performed either. This series changes things so that
labeling is necessary.
Thanks for the re-cap.
Does libvirt actually get involved when a migration fails and falls back to
the source host, so that it can relabel all files again before QEMU and swtpm
(and possibly other virtual devices) reopen their files?
...
The hacky patch below makes migration work even when remember_owner
is enabled. Obviously I'd rewrite it so that we'd only skip locking
for incoming migration, but I wonder if there could be nasty side
effects to this...
Other ideas? Can we perhaps change things so that swtpm releases the
lock earlier upon our request or something like that?
Maybe the function below needs to be passed an array of exceptions, one being
swtpm's lock file. I mean the lock file serves the purpose of locking via the
filesystem, so I don't think a third party should start interfering here and
influencing the protocol ...
I guess the idea behind the locking is to prevent unrelated processes
(and possibly other libvirt threads?) from stepping on each other's
toes. Some exceptions are carved out already, so it's not like
there's no precedent for what we're suggesting. I'm just concerned
about accidentally opening the door for fun race conditions or CVEs.
Hm, maybe the filenames exempt from locking could be set in the config file
instead of being hard-coded, with concrete names already given in the sample
config file so that users don't need to find them out.

Some ideas about exceptions for files not to lock:
- keep a list of files that are exempt from locking
- check whether a file is locked by a process whose name is in an
exception list (e.g., swtpm in /proc/<flock->l_pid>/comm ).
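
To make the two ideas concrete, here is a rough sketch of both checks. The
pattern list, the holder list and all function names are hypothetical;
nothing like them exists in libvirt as far as this thread shows. The PID
passed to the second check is assumed to be the one reported in l_pid by an
F_GETLK query, which is not performed here.

```python
import fnmatch
import os
from pathlib import Path

# Hypothetical exception lists, e.g. as they might be read from a config file.
LOCK_EXEMPT_PATTERNS = ["*/tpm2/.lock"]
LOCK_EXEMPT_HOLDERS = {"swtpm"}


def path_is_exempt(path, patterns=LOCK_EXEMPT_PATTERNS):
    """Idea 1: skip locking for files matching a configured pattern."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


def holder_name(pid, proc_root="/proc"):
    """Return the command name of `pid` from <proc_root>/<pid>/comm, or None."""
    try:
        return Path(proc_root, str(pid), "comm").read_text().strip()
    except OSError:
        return None  # process gone, or no procfs available


def holder_is_exempt(pid, holders=LOCK_EXEMPT_HOLDERS, proc_root="/proc"):
    """Idea 2: tolerate a held lock if the holder's name is in the list."""
    return holder_name(pid, proc_root) in holders


if __name__ == "__main__":
    print("this process is called:", holder_name(os.getpid()))
    print("exempt holder:", holder_is_exempt(os.getpid()))
```

Note that the second check would only work if the lock holder runs on the
same host as the process doing the lookup, so it may not help when the holder
is the swtpm instance on the other side of a migration.
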