Setting SELinux label on TPM state during migration

Stefan,

as you saw, I'm trying to implement support for migration with TPM state on a shared volume. I mean, it is working when the shared volume is an NFS mount point, because NFS does not really propagate SELinux labels, but rather has this 'virt_use_nfs' sebool which effectively allows all svirt_t processes to access NFS (thus including swtpm). But things get trickier when a distributed FS that knows SELinux properly (e.g. ceph) is used instead.

What I am currently struggling with is finding the sweet spot where the source swtpm has let go of the state and the destination has not yet accessed it (because if it did, it would get EPERM).

Bottom line - the SELinux label is generated dynamically on each guest startup (to ensure its uniqueness on the system). Therefore, the label on the destination is different to the one on the source.

The behavior I'm seeing now is:

1) the source starts migration:

{"execute":"migrate","arguments":{"detach":true,"resume":false,"uri":"fd:migrate"},"id":"libvirt-428"}

2) the destination does not touch the swtpm state right away, but when the TPM state comes in the migration stream, it is touched. Partial logs from the destination:

-> {"execute":"migrate-incoming","arguments":{"uri":"tcp:[::]:49152"},"id":"libvirt-415"}
<- {"timestamp": {"seconds": 1671449778, "microseconds": 500164}, "event": "MIGRATION", "data": {"status": "setup"}}]
<- {"return": {}, "id": "libvirt-415"}
<- {"timestamp": {"seconds": 1671449778, "microseconds": 732358}, "event": "MIGRATION", "data": {"status": "active"}}

Now, before QEMU sends MIGRATION status:completed, I can see QEMU accessing the TPM state:

Thread 1 "qemu-kvm" hit Breakpoint 1, tpm_emulator_set_state_blob (tpm_emu=0x5572af389cb0, type=1, tsb=0x5572af389db0, flags=0) at ../backends/tpm/tpm_emulator.c:796
796     {
(gdb) bt
#0  tpm_emulator_set_state_blob (tpm_emu=0x5572af389cb0, type=1, tsb=0x5572af389db0, flags=0) at ../backends/tpm/tpm_emulator.c:796
#1  0x00005572acf21fe5 in tpm_emulator_post_load (opaque=0x5572af389cb0, version_id=<optimized out>) at ../backends/tpm/tpm_emulator.c:868
#2  0x00005572acf25497 in vmstate_load_state (f=0x5572af512a10, vmsd=0x5572ad7743b8 <vmstate_tpm_emulator>, opaque=0x5572af389cb0, version_id=1) at ../migration/vmstate.c:162
#3  0x00005572acf45753 in qemu_loadvm_state_main (f=0x5572af512a10, mis=0x5572af39b4e0) at ../migration/savevm.c:876
#4  0x00005572acf47591 in qemu_loadvm_state (f=0x5572af512a10) at ../migration/savevm.c:2712
#5  0x00005572acf301f6 in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:591
#6  0x00005572ad400976 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
#7  0x00007f64ca22a360 in ?? () from target:/lib64/libc.so.6
#8  0x00007f64c948cbc0 in ?? ()
#9  0x0000000000000000 in ?? ()

This in turn means that swtpm on the destination is going to CMD_INIT itself while the source is still using it.

I wonder what we can do about this. Perhaps postpone init until the vCPUs on the destination are resumed? That way libvirt on the source could restore labels (effectively cutting the source swtpm process off from the TPM state), then libvirtd on the destination could set the label and 'cont'. If 'cont' fails for whatever reason, the source libvirtd would just set the label on the TPM state and everything is back to normal.

Corresponding BZ link:

https://bugzilla.redhat.com/show_bug.cgi?id=2130192

Michal
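[To make the proposal concrete, here is a minimal sketch of the kind of QEMU-side change being suggested: instead of pushing the migrated state blobs to swtpm from the vmstate post_load callback (which is what the backtrace above shows happening today), register a VM change-state handler and only hand the blobs over once the destination actually resumes. qemu_add_vm_change_state_handler() is existing QEMU API and TPMEmulator is the real backend type, but tpm_emulator_push_state_blobs() and the overall restructuring are illustrative assumptions, not the current backends/tpm/tpm_emulator.c code.]

/* Sketch only -- not the current tpm_emulator.c implementation. The idea
 * is to defer touching the swtpm state until the VM resumes on the
 * destination, so libvirt can switch the SELinux label over in between. */
#include "qemu/osdep.h"
#include "sysemu/runstate.h"

static void tpm_emulator_vm_state_change(void *opaque, bool running,
                                         RunState state)
{
    TPMEmulator *tpm_emu = opaque;

    if (!running) {
        return;
    }

    /* Only now push the migrated state blobs and let swtpm CMD_INIT
     * itself; tpm_emulator_push_state_blobs() is a hypothetical helper
     * that would replace the direct tpm_emulator_set_state_blob() calls
     * currently made from post_load. */
    tpm_emulator_push_state_blobs(tpm_emu);
}

static int tpm_emulator_post_load(void *opaque, int version_id)
{
    /* Keep the received blobs in memory, but do not contact swtpm yet. */
    qemu_add_vm_change_state_handler(tpm_emulator_vm_state_change, opaque);
    return 0;
}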

On Mon, Dec 19, 2022 at 12:57:05PM +0100, Michal Prívozník wrote:
Stefan,
as you saw, I'm trying to implement support for migration with TPM state on a shared volume. I mean, it is working when the shared volume is an NFS mount point, because NFS does not really propagate SELinux labels, but rather has this 'virt_use_nfs' sebool which effectively allows all svirt_t processes to access NFS (thus including swtpm). But things get trickier when a distributed FS that knows SELinux properly (e.g. ceph) is used instead.
What I am currently struggling with is finding the sweet spot where the source swtpm has let go of the state and the destination has not yet accessed it (because if it did, it would get EPERM).
Bottom line - the SELinux label is generated dynamically on each guest startup (to ensure its uniqueness on the system). Therefore, the label on the destination is different to the one on the source.
AFAIR, that's not a supported migration scenario, even without swtpm.

If using dynamic label assignment, the VM label assignment MUST be scoped to the same realm which can access the filesystem volume(s) in use.

For non-shared filesystems, this means label assignment only needs to be tracked per host.

For shared filesystems, which do NOT have SELinux labelling, again label assignment only needs to be tracked per host.

For shared filesystems, which DO have SELinux labelling, label assignment needs to be tracked across all hosts which can use that filesystem for VM storage.

Libvirt is NOT capable of doing this tracking itself. So mgmt apps need to tell libvirt to use a static label which they've figured out uniqueness for globally. Libvirt can still do relabelling of files. IOW, we'd recommend this:

<seclabel type='static' model='selinux' relabel='yes'>
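[For completeness, the full static-label element in the domain XML would look roughly like this; the MCS category pair below is purely illustrative and would have to be whatever globally unique value the mgmt app has assigned:]

<seclabel type='static' model='selinux' relabel='yes'>
  <label>system_u:system_r:svirt_t:s0:c392,c662</label>
</seclabel>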
I wonder what we can do about this.
Nothing. It is not a supportable scenario.

It is in fact a security flaw for a mgmt app to configure use of dynamic labelling when using shared storage that honours per-file SELinux labelling. It risks having multiple VMs on different hosts all using the same SELinux label, and losing isolation of their storage which is on the same shared filesystem.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 12/19/22 07:07, Daniel P. Berrangé wrote:
On Mon, Dec 19, 2022 at 12:57:05PM +0100, Michal Prívozník wrote:
Stefan,
as you saw, I'm trying to implement support for migration with TPM state on a shared volume. I mean, it is working when the shared volume is an NFS mount point, because NFS does not really propagate SELinux labels, but rather has this 'virt_use_nfs' sebool which effectively allows all svirt_t processes to access NFS (thus including swtpm). But things get trickier when a distributed FS that knows SELinux properly (e.g. ceph) is used instead.
What I am currently struggling with is finding the sweet spot where the source swtpm has let go of the state and the destination has not yet accessed it (because if it did, it would get EPERM).
Bottom line - the SELinux label is generated dynamically on each guest startup (to ensure its uniqueness on the system). Therefore, the label on the destination is different to the one on the source.
AFAIR, that's not a supported migration scenario, even without swtpm.
If using dynamic label assignment, the VM label assignment MUST be scoped to the same realm which can access the filesystem volume(s) in use.
For non-shared filesystems, this means label assignment only needs to be tracked per host.
For shared filesystems, which do NOT have SELinux labelling, again label assignment only needs to be tracked per host.
For shared filesystems, which DO have SELinux labelling, label assignment needs to be tracked across all hosts which can use that filesystem for VM storage.
Libvirt is NOT capable of doing this tracking itself. So mgmt apps need to tell libvirt to use a static label which they've figured out uniqueness for globally. Libvirt can still do relabelling of files. IOW, we'd recommend this:
<seclabel type='static' model='selinux' relabel='yes'>
I wonder what we can do about this.
Nothing. It is not a supportable scenario.
It is in fact a security flaw for a mgmt app to configure use of dynamic labelling when using shared storage that honours per-file SELinux labelling. It risks having multiple VMs on different hosts all using the same SELinux label, and losing isolation of their storage which is on the same shared filesystem.
Other than bailing out on dynamically assigned security labels completely... maybe we need a table of shared filesystem types for which dynamic label assignment (for TPM migration purposes) at least 'works.'

Stefan
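[The kind of per-filesystem check Stefan describes could be sketched on the libvirt side roughly as follows. virFileIsSharedFSType() and the VIR_FILE_SHARED_FS_* flags are assumed to be the existing helpers from libvirt's virfile module; the function itself and the (incomplete) classification it encodes are illustrative only, not existing code.]

/* Illustrative sketch of the "table" idea: classify the filesystem
 * backing the TPM state path by whether it propagates SELinux labels. */
#include "virfile.h"

static bool
qemuTPMSharedFSPropagatesLabels(const char *statePath)
{
    /* NFS does not propagate SELinux labels (access is governed by the
     * virt_use_nfs boolean instead), so dynamic labelling keeps working. */
    if (virFileIsSharedFSType(statePath, VIR_FILE_SHARED_FS_NFS) == 1)
        return false;

    /* Ceph honours per-file labels; a real implementation would need the
     * full table of label-aware shared filesystems here. */
    return virFileIsSharedFSType(statePath, VIR_FILE_SHARED_FS_CEPH) == 1;
}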
With regards, Daniel