[PATCH 0/2] security_selinux: Don't ignore NVMe disks when setting image label

The first patch is the important one. But since I could not decide which code looks better I've posted 2/2 too which moves a check. Michal Prívozník (2): security_selinux: Don't ignore NVMe disks when setting image label security_selinux: Move shortcut in virSecuritySELinuxSetImageLabelInternal() later src/security/security_selinux.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) -- 2.35.1

For NVMe disks we skip setting SELinux label on corresponding VFIO group (/dev/vfio/X). This bug is the best visible with namespaces and goes as follows: 1) libvirt assigns NVMe disk to vfio-pci driver, 2) kernel creates /dev/vfio/X node with generic device_t SELinux label, 3) our namespace code creates the exact copy of the node in domain's private /dev, 4) SELinux policy kicks in an changes the label on the node to vfio_device_t (in the top most namespace), 5) libvirt tells QEMU to attach the NVMe disk, which is denied by SELinux policy. While one can argue that kernel should have created the /dev/vfio/X node with the correct SELinux label from the beginning (step 2), libvirt can't rely on that and needs to set label on its own. Surprisingly, I already wrote the code that aims on this specific case (v6.0.0-rc1~241), but because of a shortcut we take earlier it is never ran. The reason is that virStorageSourceIsLocalStorage() considers NVMe disks as non-local because their source is not accessible via src->path (or even if it is, it's not a local path). Therefore, do not exit early for NVMe disks and let the function continue. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441 Fixes: 284a12bae0e4cf93ea72797965d6c12e3a103f40 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_selinux.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 9f2872decc..a296cb7613 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1818,7 +1818,11 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, const char *path = src->path; int ret; - if (!src->path || !virStorageSourceIsLocalStorage(src)) + /* Special case NVMe. Per virStorageSourceIsLocalStorage() it's + * considered not local, but we still want the code below to set + * label on VFIO group. */ + if (src->type != VIR_STORAGE_TYPE_NVME && + (!src->path || !virStorageSourceIsLocalStorage(src))) return 0; secdef = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); -- 2.35.1

On Thu, Sep 22, 2022 at 13:40:43 +0200, Michal Privoznik wrote:
For NVMe disks we skip setting SELinux label on corresponding VFIO group (/dev/vfio/X). This bug is the best visible with
best visible? or only visible?
namespaces and goes as follows:
1) libvirt assigns NVMe disk to vfio-pci driver, 2) kernel creates /dev/vfio/X node with generic device_t SELinux label, 3) our namespace code creates the exact copy of the node in domain's private /dev, 4) SELinux policy kicks in an changes the label on the node to vfio_device_t (in the top most namespace),
Seems like this would prevent that bug without namespaces.
5) libvirt tells QEMU to attach the NVMe disk, which is denied by SELinux policy.
While one can argue that kernel should have created the /dev/vfio/X node with the correct SELinux label from the beginning (step 2), libvirt can't rely on that and needs to set label on its own.
Surprisingly, I already wrote the code that aims on this specific case (v6.0.0-rc1~241), but because of a shortcut we take earlier it is never ran. The reason is that virStorageSourceIsLocalStorage() considers NVMe disks as non-local because their source is not accessible via src->path (or even if it is, it's not a local path).
Therefore, do not exit early for NVMe disks and let the function continue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441 Fixes: 284a12bae0e4cf93ea72797965d6c12e3a103f40 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_selinux.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 9f2872decc..a296cb7613 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1818,7 +1818,11 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, const char *path = src->path; int ret;
- if (!src->path || !virStorageSourceIsLocalStorage(src)) + /* Special case NVMe. Per virStorageSourceIsLocalStorage() it's + * considered not local, but we still want the code below to set + * label on VFIO group. */
This comment gets deleted right in the next patch. Consider just omitting it completely.
+ if (src->type != VIR_STORAGE_TYPE_NVME && + (!src->path || !virStorageSourceIsLocalStorage(src))) return 0;
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

On 9/22/22 13:57, Peter Krempa wrote:
On Thu, Sep 22, 2022 at 13:40:43 +0200, Michal Privoznik wrote:
For NVMe disks we skip setting SELinux label on corresponding VFIO group (/dev/vfio/X). This bug is the best visible with
best visible? or only visible?
Yeah, I think the former because if SELinux policy wakes up AFTER we instruct QEMU you'd hit the same issue. Although that is almost impossible because plenty of stuff happens in libvirt after we've detached NVMe from the host and before we instruct QEMU, so it's rather theoretical. But okay, I'll change it to the latter to avoid confusion.
namespaces and goes as follows:
1) libvirt assigns NVMe disk to vfio-pci driver, 2) kernel creates /dev/vfio/X node with generic device_t SELinux label, 3) our namespace code creates the exact copy of the node in domain's private /dev, 4) SELinux policy kicks in an changes the label on the node to vfio_device_t (in the top most namespace),
Seems like this would prevent that bug without namespaces.
Yeah, almost. I'm still puzzled how this worked in RHEL-8 era, because I'm almost certain that NVMe hotplug was tested there too (I mean this code I'm fixing was released in libvirt-6.0.0). Perhaps kernel creates the vfio node with the correct context from the beginning? Or maybe, the policy wasn't that strict.
5) libvirt tells QEMU to attach the NVMe disk, which is denied by SELinux policy.
While one can argue that kernel should have created the /dev/vfio/X node with the correct SELinux label from the beginning (step 2), libvirt can't rely on that and needs to set label on its own.
Surprisingly, I already wrote the code that aims on this specific case (v6.0.0-rc1~241), but because of a shortcut we take earlier it is never ran. The reason is that virStorageSourceIsLocalStorage() considers NVMe disks as non-local because their source is not accessible via src->path (or even if it is, it's not a local path).
Therefore, do not exit early for NVMe disks and let the function continue.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2121441 Fixes: 284a12bae0e4cf93ea72797965d6c12e3a103f40 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_selinux.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 9f2872decc..a296cb7613 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1818,7 +1818,11 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, const char *path = src->path; int ret;
- if (!src->path || !virStorageSourceIsLocalStorage(src)) + /* Special case NVMe. Per virStorageSourceIsLocalStorage() it's + * considered not local, but we still want the code below to set + * label on VFIO group. */
This comment gets deleted right in the next patch. Consider just omitting it completely.
Per your review, it's not going to be merged, so I'll keep it.
+ if (src->type != VIR_STORAGE_TYPE_NVME && + (!src->path || !virStorageSourceIsLocalStorage(src))) return 0;
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Pushed, thanks! Michal

At the beginning of virSecuritySELinuxSetImageLabelInternal() there's a check that allows the function return early. In previous patch the check was extended to not return early for NVMe disks. However, there's no such check in other drivers (DAC, AppArmor). Therefore, move the check a couple of line down so that the resulting code is at least somewhat similar to the rest of secdrivers. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_selinux.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index a296cb7613..26c6b281cc 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1818,13 +1818,6 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, const char *path = src->path; int ret; - /* Special case NVMe. Per virStorageSourceIsLocalStorage() it's - * considered not local, but we still want the code below to set - * label on VFIO group. */ - if (src->type != VIR_STORAGE_TYPE_NVME && - (!src->path || !virStorageSourceIsLocalStorage(src))) - return 0; - secdef = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); if (!secdef || !secdef->relabel) return 0; @@ -1882,6 +1875,8 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, return -1; path = vfioGroupDev; + } else if (!path || !virStorageSourceIsLocalStorage(src)) { + return 0; } ret = virSecuritySELinuxSetFilecon(mgr, path, use_label, remember); -- 2.35.1

On Thu, Sep 22, 2022 at 13:40:44 +0200, Michal Privoznik wrote:
At the beginning of virSecuritySELinuxSetImageLabelInternal() there's a check that allows the function return early. In previous patch the check was extended to not return early for NVMe disks. However, there's no such check in other drivers (DAC, AppArmor). Therefore, move the check a couple of line down so that the resulting code is at least somewhat similar to the rest of secdrivers.
In the DAC driver the (almost) exact check is inside virSecurityDACSetOwnership, which has an additional argument to override anything in virStorageSource, thus the commit message is slightly misleading.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_selinux.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index a296cb7613..26c6b281cc 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1818,13 +1818,6 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, const char *path = src->path; int ret;
- /* Special case NVMe. Per virStorageSourceIsLocalStorage() it's - * considered not local, but we still want the code below to set - * label on VFIO group. */ - if (src->type != VIR_STORAGE_TYPE_NVME && - (!src->path || !virStorageSourceIsLocalStorage(src))) - return 0; - secdef = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); if (!secdef || !secdef->relabel) return 0; @@ -1882,6 +1875,8 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManager *mgr, return -1;
path = vfioGroupDev; + } else if (!path || !virStorageSourceIsLocalStorage(src)) { + return 0;
The full context of the block related to NVME is: /* This is not very clean. But so far we don't have NVMe * storage pool backend so that its chownCallback would be * called. And this place looks least offensive. */ if (src->type == VIR_STORAGE_TYPE_NVME) { const virStorageSourceNVMeDef *nvme = src->nvme; if (!(vfioGroupDev = virPCIDeviceAddressGetIOMMUGroupDev(&nvme->pciAddr))) return -1; path = vfioGroupDev; } ret = virSecuritySELinuxSetFilecon(mgr, path, use_label, remember); if (ret == 1 && !disk_seclabel) { /* If we failed to set a label, but virt_use_nfs let us * proceed anyway, then we don't need to relabel later. */ disk_seclabel = virSecurityDeviceLabelDefNew(SECURITY_SELINUX_NAME); if (!disk_seclabel) return -1; disk_seclabel->labelskip = true; VIR_APPEND_ELEMENT(src->seclabels, src->nseclabels, disk_seclabel); ret = 0; } There are two odd things here: 1) The comment above the block is the same as in the DAC driver, but in fact it makes no sense in the selinux driver as there's no 'chown_callback' in the selinux driver. 2) The code which follows 'virSecuritySELinuxSetFilecon' seems to be dead as virSecuritySELinuxSetFilecon can't ever return '1'. This seems to have changed when the security labelling remembering was added. Specifically 'virSecuritySELinuxSetFileconImpl' does return 1 when the label was not set, but only one of the caller cares about the return value to skip relabelling. If that is desired you can: 1) clean up the code to remove the dead code block 2) call virSecuritySELinuxSetFilecon directly from the block special-casing NVMe disks and return right away. 3) move the check for local storage below that block 4) remove the misleading comment above the NVME special case block That way it will resemble the code in other storage drivers more.
participants (3)
-
Michal Privoznik
-
Michal Prívozník
-
Peter Krempa