Re: Virtqemud wants to unlink /dev/urandom

[adding back libvir-list to the Cc]
On Fri, Mar 11, 2022 at 03:55:03PM +0100, Nikola Knazekova wrote:
Hey Martin,
thanks for your response.
I don't know if it is happening in the mount namespace. Can you look at the attached logs?
It was happening on a clean install on F35 and F36, and probably on older versions too. But it is only an issue with the new SELinux policy for libvirt. The old SELinux policy allows virtd to unlink /dev/urandom char files. I just wanted to be sure that it is OK to allow it for virtqemud.
That might actually be the case: it sets the context on /dev/urandom correctly, and then the unlink fails for virtqemud because the SELinux policy only accounts for libvirtd, even though we switched to modular daemons, which makes virtqemud the one doing the work.
@Michal, can you confirm my guess here? You did a lot of the mount namespace work, which I presume is what contributes to the issue.
In the meantime, would you mind trying this with the mount namespace feature turned off in /etc/libvirt/qemu.conf, like this:
namespaces = []
Thanks.
Regards, Nikola
On Thu, Feb 24, 2022 at 3:00 PM Martin Kletzander <mkletzan@redhat.com> wrote:
On Thu, Feb 24, 2022 at 01:41:50PM +0100, Nikola Knazekova wrote:
Hi,
When I create a virtual machine on a system with the new SELinux policy for libvirt, I get this error message:
Unable to complete install: 'Unable to create device /dev/urandom: File exists'
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/createvm.py", line 2001, in _do_async_install
    installer.start_install(guest, meter=meter)
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 701, in start_install
    domain = self._create_guest(
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 649, in _create_guest
    domain = self.conn.createXML(install_xml or final_xml, 0)
  File "/usr/lib64/python3.10/site-packages/libvirt.py", line 4393, in createXML
    raise libvirtError('virDomainCreateXML() failed')
libvirt.libvirtError: Unable to create device /dev/urandom: File exists
And this SELinux denial, where SELinux prevents virtqemud from unlinking the character device /dev/urandom:
time->Wed Feb 23 19:30:33 2022
type=PROCTITLE msg=audit(1645662633.819:930): proctitle=2F7573722F7362696E2F7669727471656D7564002D2D74696D656F757400313230
type=PATH msg=audit(1645662633.819:930): item=1 name="/dev/urandom" inode=6 dev=00:44 mode=020666 ouid=0 ogid=0 rdev=01:09 obj=system_u:object_r:urandom_device_t:s0 nametype=DELETE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1645662633.819:930): item=0 name="/dev/" inode=1 dev=00:44 mode=040755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmpfs_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1645662633.819:930): cwd="/"
type=SYSCALL msg=audit(1645662633.819:930): arch=c000003e syscall=87 success=no exit=-13 a0=7f9418064f50 a1=7f943909c930 a2=7f941d0ef6d4 a3=0 items=2 ppid=6722 pid=7196 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="rpc-worker" exe="/usr/sbin/virtqemud" subj=system_u:system_r:virtqemud_t:s0 key=(null)
type=AVC msg=audit(1645662633.819:930): avc: denied { unlink } for pid=7196 comm="rpc-worker" name="urandom" dev="tmpfs" ino=6 scontext=system_u:system_r:virtqemud_t:s0 tcontext=system_u:object_r:urandom_device_t:s0 tclass=chr_file permissive=0
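For anyone reading the audit record by hand: the proctitle field is the hex-encoded command line with NUL bytes between the arguments, and the AVC line is mostly key=value pairs. Below is a minimal Python sketch that decodes both. The helper names decode_proctitle and parse_avc are made up for this illustration and are not part of any audit tooling; tools such as ausearch can do the same interpretation.

```python
import re


def decode_proctitle(hex_title: str) -> list[str]:
    """Decode an audit PROCTITLE hex string into its list of arguments."""
    raw = bytes.fromhex(hex_title)
    # Arguments are separated by NUL bytes in the recorded command line.
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\0") if arg]


def parse_avc(line: str) -> dict:
    """Extract the key=value fields and the denied permissions from an AVC line."""
    fields = {
        key: value.strip('"')
        for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    }
    # The permissions are listed between braces, e.g. "{ unlink }".
    perms = re.search(r"\{([^}]*)\}", line)
    fields["perms"] = perms.group(1).split() if perms else []
    return fields


# Values copied from the audit record quoted above.
PROCTITLE = "2F7573722F7362696E2F7669727471656D7564002D2D74696D656F757400313230"
AVC = (
    'type=AVC msg=audit(1645662633.819:930): avc: denied { unlink } for pid=7196 '
    'comm="rpc-worker" name="urandom" dev="tmpfs" ino=6 '
    'scontext=system_u:system_r:virtqemud_t:s0 '
    'tcontext=system_u:object_r:urandom_device_t:s0 tclass=chr_file permissive=0'
)

print(decode_proctitle(PROCTITLE))  # ['/usr/sbin/virtqemud', '--timeout', '120']
avc = parse_avc(AVC)
print(avc["perms"], avc["scontext"], avc["tcontext"], avc["tclass"])
```

Decoded this way, the record says that virtqemud (running with --timeout 120) was denied unlink on a chr_file labelled urandom_device_t on a tmpfs, which matches the error seen in virt-manager.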
Is this expected behavior?
The error is not, but creating and removing /dev/urandom is fine, as long as it happens in the mount namespace of the domain. We create that namespace ourselves, so we also need to create a basic /dev structure in there.
Unfortunately this error does not show whether it is happening in the mount namespace, although it should definitely _not_ happen outside of it.
Does this happen on clean install? What is the version of libvirt and the selinux policy? What's the distro+version of the system? Would you mind capturing the debug logs and attaching them?
How to capture debug logs: https://libvirt.org/kbase/debuglogs.html
Thanks, Nikola
2022-03-04 03:08:28.053+0000: starting up libvirt version: 8.0.0, package: 2.fc36 (Fedora Project, 2022-01-20-17:44:09, ), qemu version: 6.2.0qemu-6.2.0-5.fc36, kernel: 5.17.0-0.rc5.102.fc36.x86_64, hostname: fedora
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-4-fedora35-3 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-fedora35-3/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-fedora35-3/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-fedora35-3/.config \
/usr/bin/qemu-system-x86_64 \
-name guest=fedora35-3,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-fedora35-3/master-key.aes"}' \
-machine pc-q35-6.2,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram \
-accel kvm \
-cpu host,migratable=on \
-m 2048 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 818068b5-c72b-475c-a960-231f29f60464 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=28,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
-device pcie-root-port,port=24,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=25,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=26,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 \
-device pcie-root-port,port=27,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 \
-device pcie-root-port,port=28,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 \
-device pcie-root-port,port=29,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/fedora35-3.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}' \
-device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=virtio-disk0,bootindex=2 \
-blockdev '{"driver":"file","filename":"/home/n/Downloads/Fedora-Workstation-Live-x86_64-35-1.2.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
-device ide-cd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=1 \
-netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:81:d4:90,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=27,server=on,wait=off \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-chardev spicevmc,id=charchannel1,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-audiodev '{"id":"audio1","driver":"spice"}' \
-spice port=5900,addr=127.0.0.1,disable-ticketing=on,image-compression=off,seamless-migration=on \
-device virtio-vga,id=video0,max_outputs=1,bus=pcie.0,addr=0x1 \
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0,audiodev=audio1 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
2022-03-04T03:09:24.261105Z qemu-system-x86_64: terminating on signal 15 from pid 8179 (/usr/sbin/virtqemud)
2022-03-04 03:09:24.461+0000: shutting down, reason=destroyed

On 3/14/22 12:45, Martin Kletzander wrote:
[adding back libvir-list to the Cc]
On Fri, Mar 11, 2022 at 03:55:03PM +0100, Nikola Knazekova wrote:
Hey Martin,
thanks for your response.
I don't know if it is happening in the mount namespace. Can you look at the attached logs?
It was happening on a clean install on F35 and F36, and probably on older versions too. But it is only an issue with the new SELinux policy for libvirt. The old SELinux policy allows virtd to unlink /dev/urandom char files. I just wanted to be sure that it is OK to allow it for virtqemud.
That might actually be the case: it sets the context on /dev/urandom correctly, and then the unlink fails for virtqemud because the SELinux policy only accounts for libvirtd, even though we switched to modular daemons, which makes virtqemud the one doing the work.
@Michal, can you confirm my guess here? You did a lot of the mount namespace work, which I presume is what contributes to the issue.
In the meantime, would you mind trying this with the mount namespace feature turned off in /etc/libvirt/qemu.conf like this:
namespaces = []
Yeah, this will definitely help. So, a short introduction to how libvirt starts a QEMU guest. It creates a mount namespace so that QEMU doesn't have access to all the files in the system. In this namespace (there is one per QEMU process), a few paths are populated first, independent of the guest configuration (/dev/null, /dev/random, /dev/urandom, etc.). The full list is here:
https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu.conf#L565
(yes, it's the cgroup_device_acl list, because what you want to enable in CGroups you also want to expose in the namespace)
Then, the paths from the domain XML are created using the following function:
https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_namespace.c#L...
This function is written in a way that allows files to already exist: if needed [1], it simply unlink()-s the existing file and creates it from scratch again. Now, since you configured an RNG device for your guest with /dev/urandom as its backend (see the rng-random object in the QEMU command line), this node is created twice: the first time along with the other cgroup_device_acl files, the second time because of the RNG device in your domain config.
1: "needed" is probably a bad word, and in fact we can be more clever about it. We could check whether the given device already exists and whether it has the same MAJ:MIN, and act accordingly. The same applies to symlinks.
Let me see if I can cook up a patch that implements this idea.
Michal
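The idea in footnote [1] can be pictured with a short Python sketch. This is only an illustration of the check, not libvirt's code (libvirt is written in C, and the actual patch may differ); the function names device_node_matches and plan_node are hypothetical. The 1:9 device numbers come from the rdev=01:09 field in the audit record for /dev/urandom.

```python
import os
import stat
from types import SimpleNamespace


def device_node_matches(st, major: int, minor: int) -> bool:
    """Return True if `st` describes a character device with the wanted MAJ:MIN."""
    return (
        stat.S_ISCHR(st.st_mode)
        and os.major(st.st_rdev) == major
        and os.minor(st.st_rdev) == minor
    )


def plan_node(path: str, major: int, minor: int, lstat=os.lstat) -> str:
    """Decide what to do with `path`: 'create', 'keep', or 'recreate'.

    `lstat` is injectable so the decision can be exercised without
    touching real device nodes (a choice made for this sketch only).
    """
    try:
        st = lstat(path)
    except FileNotFoundError:
        return "create"      # nothing there yet, a plain mknod() is enough
    if device_node_matches(st, major, minor):
        return "keep"        # same device already present, no unlink() needed
    return "recreate"        # something else is there: unlink() and mknod()


# Simulate a namespace where /dev/urandom (1:9) was already populated.
def fake_lstat(_path):
    return SimpleNamespace(st_mode=stat.S_IFCHR | 0o666, st_rdev=os.makedev(1, 9))


print(plan_node("/dev/urandom", 1, 9, lstat=fake_lstat))
```

With a check like this, the second request for /dev/urandom would fall into the "keep" branch, so the unlink that SELinux denied would not be attempted in this particular case.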

Hi guys,
Thank you very much for the detailed explanation.
With the mount namespace feature turned off, there were no SELinux denials.
Michal, I saw your commit <https://gitlab.com/libvirt/libvirt/-/commit/22188790cad490f51e73dabcac65736c3b8871a7>, which first checks whether the devices exist. I assume that when some correction is required, virtqemud will still need the unlink permission, right?
Nikola
On Mon, Mar 14, 2022 at 1:12 PM Michal Prívozník <mprivozn@redhat.com> wrote:

On 3/16/22 12:40, Nikola Knazekova wrote:
Hi guys,
Thank you very much for the detailed explanation.
With the mount namespace feature turned off, there were no SELinux denials.
Michal, I saw your commit <https://gitlab.com/libvirt/libvirt/-/commit/22188790cad490f51e73dabcac65736c3b8871a7>, which first checks whether the devices exist. I assume that when some correction is required, virtqemud will still need the unlink permission, right?
Correct. Users can still hotplug and hot-unplug devices from running guests. On hot-unplug, libvirt removes the corresponding /dev node. For instance, PCI devices need /dev/vfio/vfio, but if you hot-unplug the last PCI device from your guest, libvirt also removes /dev/vfio/vfio from the namespace. Therefore, we still need libvirt/virtqemud/virtlxcd to be able to remove files from under /dev.
Michal
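To picture why a shared node such as /dev/vfio/vfio is only removed with the last device that needs it, here is a small Python sketch based on a usage counter. It is an assumption made for illustration: the class SharedNodeTracker is hypothetical, and nothing in this thread says libvirt tracks the node this way internally.

```python
from collections import Counter


class SharedNodeTracker:
    """Count how many guest devices still rely on each shared /dev node."""

    def __init__(self) -> None:
        self._users = Counter()

    def attach(self, node: str) -> bool:
        """Record one more user; return True if the node has to be created."""
        self._users[node] += 1
        return self._users[node] == 1

    def detach(self, node: str) -> bool:
        """Drop one user; return True if the node should now be unlinked."""
        if self._users[node] == 0:
            raise ValueError(f"{node} has no users")
        self._users[node] -= 1
        if self._users[node] == 0:
            del self._users[node]
            return True
        return False


tracker = SharedNodeTracker()
tracker.attach("/dev/vfio/vfio")          # first PCI device: create the node
tracker.attach("/dev/vfio/vfio")          # second PCI device: node already there
print(tracker.detach("/dev/vfio/vfio"))   # one device left, keep the node
print(tracker.detach("/dev/vfio/vfio"))   # last device gone, unlink the node
```

The last detach is the point where the daemon has to unlink a file under /dev, which is why the policy still needs to allow it.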
participants (3)
- Martin Kletzander
- Michal Prívozník
- Nikola Knazekova