
On 4/17/24 11:20, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 09:10:48AM -0700, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
On 3/20/24 08:23, Peter Krempa wrote:
Did you consider the case when the migration fails and the VM will be restored to run on the source host again? In such a case doing the relabelling might break the source host.
Right. I seem to remember testing such scenarios. I had to put an exit() (or something like it) into swtpm on the destination side to trigger the fallback to the source side. The swtpm on the source side had closed its files and wanted to open them (lockfile) again, so the files needed to be labeled correctly if the storage on the source side is on the disk and exported via NFS from there (iirc). If the storage is NFS-exported from a 3rd host it probably would not require the labels.
I didn't really consider the failure scenario, so thank you for bringing that up.
I think it would still be fine. If the source has NFS storage, then access will keep working regardless of what relabeling the destination has been up to in the meantime. And if the source has local storage, then the relabeling on the destination (via NFS) will not actually have touched the SELinux labels.
The only concern I have is that, when going from local to NFS, labels might have been restored on the source side. But I assume that restoring only happens once the migration has been confirmed as successful? I'll check.
Once again, as far as I can tell (please let me know if I'm wrong!) there is no special casing when it comes to disks and other types of persistent storage, so if this approach was problematic I would have expected many issues to have been reported by now.
I've tested this to confirm. My trick to simulating a migration failure was to add
  <qemu:commandline>
    <qemu:arg value='-machine'/>
    <qemu:arg value='pc-i440fx-6.0'/>
  </qemu:commandline>
to the migration XML, where the VM uses a different machine type in its configuration. This results in something like
  process exited while connecting to monitor:
  qemu-system-x86_64: -device {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1",\
  "bus":"pcie.0","multifunction":true,"addr":"0x1"}: Bus 'pcie.0' not found
which should be a decent enough proxy for the kind of "we went as far as attempting to start the QEMU process on the destination host, then things went sideways" failure that we'd experience as a consequence of a permission error. Suggestions on how to improve the test methodology are very much appreciated :)
Anyway, based on what I've seen I think I can confirm my initial intuition as reported above. Things work the way you expect, in that upon migration failure the VM keeps happily chugging along on the source host.
If a file is labeled on the NFS server side, then the current code doesn't relabel the state file on the server side when we have an outgoing migration, because of the possible fall-back in case of error. With NFS, at least, not supporting SELinux labels (and security xattrs), we are fine on the server side if the fallback happens, whatever was done with SELinux labeling on the client side (which gets EOPNOTSUPP), because the label will not be changed on the server side. If there existed a shared filesystem that supported SELinux labels and propagated them, we'd be in trouble, at least for the fall-back scenario, right? I don't know whether any shared filesystem would ever share security xattrs across the network, though, such that this could become a problem.
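To make the EOPNOTSUPP point concrete, here is a minimal Python sketch of a relabel attempt that treats "filesystem cannot store the label" as a harmless no-op. This is not libvirt's actual code: the function name try_relabel and the injectable setxattr parameter are hypothetical, and the label string in the usage note is only an example.

```python
import errno
import os

# errno values meaning "this filesystem cannot store the attribute"
# (on Linux the two names share one value).
UNSUPPORTED = {errno.EOPNOTSUPP, errno.ENOTSUP}


def try_relabel(path, label, setxattr=None):
    """Try to set the SELinux label on path.

    Returns True if the label was written and False if the filesystem
    rejected it as unsupported, in which case the label stored on the
    server side is left untouched. Any other error is re-raised.
    The setxattr parameter is a hypothetical hook for testing only.
    """
    if setxattr is None:
        setxattr = os.setxattr  # only available on Linux
    try:
        setxattr(path, "security.selinux", label.encode())
    except OSError as exc:
        if exc.errno in UNSUPPORTED:
            return False
        raise
    return True
```

For example, try_relabel(path, "system_u:object_r:svirt_image_t:s0") returning False would correspond to the case described above, where the client-side labeling has no effect on the server and a fall-back to the source keeps working. A shared filesystem that did propagate security xattrs would return True here, which is exactly the case that would need more care.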