On 4/17/24 11:20, Andrea Bolognani wrote:
On Wed, Mar 20, 2024 at 09:10:48AM -0700, Andrea Bolognani wrote:
> On Wed, Mar 20, 2024 at 10:18:39AM -0400, Stefan Berger wrote:
>> On 3/20/24 08:23, Peter Krempa wrote:
>>> Did you consider the case when the migration fails and the VM will be
>>> restored to run on the source host again? In such a case doing the
>>> relabelling might break the source host.
>>
>> Right. I seem to remember testing such scenarios. I had to put an exit() (or
>> something like it) into swtpm on the destination side to trigger the
>> fallback to the source side. The swtpm on the source side had closed its
>> files and wanted to open them (lockfile) again, so the files needed to be
>> labeled correctly if the storage on the source side is on the disk and
>> exported via NFS from there (iirc). If the storage is NFS-exported from a
>> 3rd host it probably would not require the labels.
>
> I didn't really consider the failure scenario, so thank you for
> bringing that up.
>
> I think it would still be fine. If the source has NFS storage, then
> access will keep working regardless of what relabeling the
> destination has been up to in the meantime. And if the source has
> local storage, then the relabeling on the destination (via NFS) will
> not actually have touched the SELinux labels.
>
> The only concern I have is that, when going from local to NFS, labels
> might have been restored on the source side. But I assume that
> restoring only happens once the migration has been confirmed as
> successful? I'll check.
>
> Once again, as far as I can tell (please let me know if I'm wrong!)
> there is no special casing when it comes to disks and other types of
> persistent storage, so if this approach was problematic I would have
> expected many issues to have been reported by now.
I've tested this to confirm. My trick for simulating a migration
failure was to add
  <qemu:commandline>
    <qemu:arg value='-machine'/>
    <qemu:arg value='pc-i440fx-6.0'/>
  </qemu:commandline>
to the migration XML, where the VM uses a different machine type in
its configuration. This results in something like
  process exited while connecting to monitor: qemu-system-x86_64:
  -device
  {"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1",\
  "bus":"pcie.0","multifunction":true,"addr":"0x1"}:
  Bus 'pcie.0' not found
which should be a decent enough proxy for the kind of "we went as far
as attempting to start the QEMU process on the destination host, then
things went sideways" failure that we'd experience as a consequence
of a permission error. Suggestions on how to improve the test
methodology are very much appreciated :)
Anyway, based on what I've seen I think I can confirm my initial
intuition as reported above. Things work the way you expect, in that
upon migration failure the VM keeps happily chugging along on the
source host.
If a file is labeled on the NFS server side, the current code doesn't
relabel the state file there during an outgoing migration, because we
may have to fall back to the source in case of error. Since NFS, at
least, doesn't support SELinux labels (or security xattrs), the server
side is fine if the fallback happens, whatever SELinux labeling was
attempted on the client side: the client gets EOPNOTSUPP and the label
on the server side is not changed. If there were a shared filesystem
that supported SELinux labels and propagated them, we'd be in trouble,
at least for the fall-back scenario, right? I don't know whether any
shared filesystem would ever share security xattrs across the network,
though, such that this could become a problem.
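
To make the EOPNOTSUPP point concrete, here is a minimal sketch in
Python of what "relabeling that doesn't actually touch the label" looks
like from the client side. This is not libvirt's code; the function
name try_relabel, the return strings and the injectable setxattr
parameter are made up for illustration. It assumes a Linux host, where
os.setxattr() is available, and a mount that rejects security xattrs.

```python
import errno
import os

# Name of the extended attribute that stores the SELinux label.
SELINUX_XATTR = "security.selinux"


def try_relabel(path, label, setxattr=None):
    """Try to set an SELinux label and report what the filesystem did.

    Returns "changed" if the xattr was written, or "unsupported" if the
    filesystem refused it with EOPNOTSUPP, in which case whatever label
    exists on the server side is left untouched. Other errors propagate.

    The setxattr parameter is a hypothetical hook so the logic can be
    exercised without a real NFS mount; it defaults to os.setxattr.
    """
    if setxattr is None:
        setxattr = os.setxattr  # Linux-only
    try:
        setxattr(path, SELINUX_XATTR, label.encode())
    except OSError as e:
        if e.errno == errno.EOPNOTSUPP:
            return "unsupported"
        raise
    return "changed"
```

On a filesystem that behaves as described above, calling try_relabel()
on the state file from the destination host would return "unsupported",
which is why the source can still open its files after a fallback. A
shared filesystem that accepted and propagated the xattr would return
"changed" instead, and that is the case that would be a problem.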