Problem with a disk device of type 'volume'

Hi,

I need some help to debug a problem with libvirt and a disk device of type 'volume'.

I have a VM failing to start with the following error:

    $ virsh -c qemu:///system start server
    error: Failed to start domain 'server'
    error: internal error: process exited while connecting to monitor: 2022-08-13T09:26:50.121259Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/mnt/images/debian-11-genericcloud-amd64.qcow2","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}: Could not open '/mnt/images/debian-11-genericcloud-amd64.qcow2': Permission denied

I checked the file access permissions and they are correct. I tried setting everything to 777 and running qemu as root, but the problem persists.

    $ ll -d /mnt /mnt/images /mnt/images/*
    drwxr-xr-x 9 root         root         4,0K 31 déc.  2021 /mnt
    drwxr-xr-x 2 root         root         4,0K 13 août 11:31 /mnt/images
    -rw-r--r-- 1 libvirt-qemu libvirt-qemu 242M 13 août 11:31 /mnt/images/debian-11-genericcloud-amd64.qcow2
    -rw-r--r-- 1 libvirt-qemu libvirt-qemu 366K 13 août 11:31 /mnt/images/server_cloudinit.iso
    -rw-r--r-- 1 libvirt-qemu libvirt-qemu 593M 13 août 11:59 /mnt/images/server_image.qcow2

After a lot of searching and testing, I found out that the disk device definition is linked to the source of the problem.

The disk device is defined like this:

    <disk type="volume" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source pool="TERRAFORM" volume="server_image.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
    </disk>

The image 'server_image.qcow2' uses a backing file:

    $ qemu-img info /mnt/images/server_image.qcow2 --backing-chain
    image: /mnt/stockage_rapide/VMs/terraform/puppetdev_server_image.qcow2
    file format: qcow2
    virtual size: 6 GiB (6442450944 bytes)
    disk size: 475 MiB
    cluster_size: 65536
    backing file: /mnt/images/debian-11-genericcloud-amd64.qcow2
    backing file format: qcow2
    Format specific information:
        compat: 0.10
        compression type: zlib
        refcount bits: 16

    image: /mnt/images/debian-11-genericcloud-amd64.qcow2
    file format: qcow2
    virtual size: 2 GiB (2147483648 bytes)
    disk size: 242 MiB
    cluster_size: 65536
    Format specific information:
        compat: 1.1
        compression type: zlib
        lazy refcounts: false
        refcount bits: 16
        corrupt: false
        extended l2: false

And here is the definition of the associated storage pool:

    <pool type="dir">
      <name>TERRAFORM</name>
      <uuid>dae00836-db4d-49ba-9d32-1f0278055516</uuid>
      <capacity unit="bytes">155674652672</capacity>
      <allocation unit="bytes">74396299264</allocation>
      <available unit="bytes">81278353408</available>
      <source>
      </source>
      <target>
        <path>/mnt/images</path>
        <permissions>
          <mode>0755</mode>
          <owner>0</owner>
          <group>0</group>
        </permissions>
      </target>
    </pool>

If I change the disk device definition to the following (and change nothing else), the domain starts and works fine (no permission problem!):

    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/mnt/images/server_image.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
    </disk>

Could you help me find the reason why the domain doesn't work when the disk device is of type 'volume'?

Thanks in advance for your help.

Regards,
Fred

Additional information:
- Running this on Debian 11 with libvirt 8.0.0 (from backports) and qemu 7.0 (from backports).
- Vanilla configuration of libvirt. I have just added my regular user to the libvirt group.
- The problem exists even if AppArmor is disabled.

PS: I want to use a disk device of type 'volume' because this domain is created by Terraform using the libvirt provider, which uses this kind of disk since it has some advantages. See the details here: https://github.com/dmacvicar/terraform-provider-libvirt/issues/126#issuecomm...

Hi,

I have progressed in my research.

I created a minimal test case to reproduce the problem (see below).

I ran the test on 3 (physical) machines under Debian 11.4: the problem is present on 2 machines but not on the third.

I booted one of the machines where the problem is present into a Debian 11.4 live OS and ran the test: it works, no problem.

So far, all my tests lead me to the following conclusions:
- The problem is tied to the configuration of the system.
- It is not a 'file permission' problem. The directory structure of the storage pool, the file permissions on this structure, the configuration of libvirt and qemu, and the user under which the daemon runs are the same on all systems.
- I ran the test with libvirt 7.0.0 and qemu 5.2, and with libvirt 8.0.0 and qemu 7.0 (from Debian 11 backports). Both combinations behave the same way.
- AppArmor is not the culprit (no errors in the logs). I have also disabled it and the behavior is still the same.

I would appreciate any hint about what I should check to find the difference between the working systems and the failing ones.

Regards,
Fred

How to run the test (as root):

1/ Install libvirt and qemu if needed

    apt install libvirt-daemon-system qemu-system-x86 virtinst

2/ Start the libvirt daemon if needed

    systemctl start libvirtd

3/ Create the default storage pool (if it is not created automatically)

    virsh pool-define-as default dir - - - - /var/lib/libvirt/images/
    virsh pool-build default
    virsh pool-start default

4/ Download the Debian 11.4 generic cloud image and put it in the default storage pool

    wget -O /var/lib/libvirt/images/debian.qcow2 https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-genericcloud...

5/ Refresh the default storage pool and check that the Debian image is visible

    virsh pool-refresh default
    virsh vol-list --pool default

6/ Start the default network

    virsh net-start default

7/ Create a VM based on the Debian 11.4 generic cloud image

    virt-install -n TESTBUG --disk vol=default/debian.qcow2 --memory 1024 --import --noreboot --graphics none

8/ Start the VM; it should start and work fine

    virsh start TESTBUG

9/ Stop the VM

    virsh shutdown TESTBUG

10/ Change the disk definition to switch the disk type from 'file' to 'volume' and adapt the 'source' attributes accordingly

    virsh edit --domain TESTBUG

Change this section:

    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/debian.qcow2"/>
      <target dev="hda" bus="ide"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

to:

    <disk type="volume" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source pool="default" volume="debian.qcow2"/>
      <target dev="hda" bus="ide"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

11/ Start the VM again. It will either succeed or fail with the following error:

    error creating libvirt domain: internal error: qemu unexpectedly closed the monitor: 2022-08-11T16:12:22.987252Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/debian.qcow2","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images/debian.qcow2': Permission denied

On 13/08/2022 at 12:39, Frédéric Lespez wrote:
> Hi,
> I need some help to debug a problem with libvirt and a disk device of type 'volume'.
> [...]
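The two disk definitions compared in this thread differ only in the disk's type attribute and in the attributes of its source element. As a minimal sketch of that edit, assuming the XML is handled with Python's standard library rather than with "virsh edit", the change could look like the following. The helper name file_disk_to_volume is hypothetical, and the pool and volume names are supplied by the caller and are not checked against libvirt.

```python
import xml.etree.ElementTree as ET

# Disk definition of type "file", as shown in the reproduction steps.
FILE_DISK = """
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/var/lib/libvirt/images/debian.qcow2"/>
  <target dev="hda" bus="ide"/>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
"""


def file_disk_to_volume(disk_xml: str, pool: str, volume: str) -> str:
    """Rewrite a type="file" disk so that it references a pool volume.

    Hypothetical helper for illustration only: it edits the XML text and
    does not verify that the pool or the volume exists in libvirt.
    """
    disk = ET.fromstring(disk_xml)
    if disk.get("type") != "file":
        raise ValueError("expected a disk of type 'file'")
    source = disk.find("source")
    if source is None:
        raise ValueError("disk has no <source> element")

    # Switch the disk type and replace file="..." with pool/volume.
    disk.set("type", "volume")
    source.attrib.clear()
    source.set("pool", pool)
    source.set("volume", volume)
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    print(file_disk_to_volume(FILE_DISK, "default", "debian.qcow2"))
```

All other child elements (driver, target, address) are left untouched, which matches the "change only that" condition described in the messages.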

On Tue, Aug 16, 2022 at 13:51:20 +0200, Frédéric Lespez wrote:
> Hi,
> I have progressed in my research.
> I created a minimal test case in order to reproduce the problem (see below).
> [...]

Hi,

I'm fairly certain that the above is because of AppArmor. Specifically, the AppArmor labelling code does not translate the pool/volume name to the path of the image, while for other security drivers we use the existing definition and thus do translate it.

I'm not familiar enough with AppArmor to point you to how to configure logging properly, though.

The issue originates from the fact that the AppArmor driver uses a helper process to set up the labelling, and the helper process itself is not able to access libvirt's storage driver and is thus unable to do the translation.

I'll try to think of a way to pass the path, though.
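To illustrate the translation described above, here is a minimal sketch in Python (standard library only). It is not libvirt's code. It assumes a pool of type "dir", where a volume is a file in the pool's target path. The function name resolve_volume_disk and the dictionary of pool definitions are hypothetical. The XML snippets are shortened copies of the definitions posted earlier in the thread.

```python
import posixpath
import xml.etree.ElementTree as ET

# Shortened copies of the definitions posted in the thread.
DISK_XML = """
<disk type="volume" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source pool="TERRAFORM" volume="server_image.qcow2"/>
  <target dev="vda" bus="virtio"/>
</disk>
"""

POOL_XML = """
<pool type="dir">
  <name>TERRAFORM</name>
  <target>
    <path>/mnt/images</path>
  </target>
</pool>
"""


def resolve_volume_disk(disk_xml, pools):
    """Map a type="volume" disk to a file path.

    pools is a dict of pool name -> pool XML (an assumption of this
    sketch). Returns None when the pool definition is not available,
    which is the position of a helper that only sees the domain XML.
    """
    source = ET.fromstring(disk_xml).find("source")
    pool_xml = pools.get(source.get("pool"))
    if pool_xml is None:
        return None
    pool = ET.fromstring(pool_xml)
    if pool.get("type") != "dir":
        raise NotImplementedError("this sketch only handles 'dir' pools")
    # Assumption: in a 'dir' pool the volume is a file under <target><path>.
    return posixpath.join(pool.findtext("target/path"), source.get("volume"))


if __name__ == "__main__":
    print(resolve_volume_disk(DISK_XML, {"TERRAFORM": POOL_XML}))
    print(resolve_volume_disk(DISK_XML, {}))
```

With the pool definition available, the disk resolves to a path under /mnt/images. With only the disk definition, no path can be derived, so no path-based rule can be written for the image.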

On 18/08/2022 at 14:48, Peter Krempa wrote:
> Hi,
> I'm fairly certain that the above is because of Apparmor.
> [...]

Hi Peter,

I was about to answer that my tests had exonerated AppArmor. For example, I tweaked /etc/apparmor.d/libvirt/TEMPLATE.qemu to work around the fact that virt-aa-helper cannot dynamically generate correct profiles when pool/volume names are used (since it only has the domain's XML definition, it cannot generate correct rules without the storage pool's XML definition).

But I decided to run some tests again, since I had made these at the beginning of my research. And indeed, AppArmor is the culprit!

I discovered that:
- You need to reboot between tests (reloading the profiles, or AppArmor itself, without a reboot is not sufficient, even though the docs I have read say a reboot is not needed).
- AppArmor can do "something" without logging a message (at least with the default configuration in Debian 11).

I will do more tests in order to pinpoint the precise cause and report my findings.

Thanks a lot for drawing my attention to AppArmor again. It was driving me nuts!

Regards,
Fred
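As a rough illustration of the kind of workaround mentioned above, the sketch below formats AppArmor file rules for image paths that the generated profile would otherwise miss. It is only a sketch under assumptions. The function name apparmor_file_rules is hypothetical. The permission letters ("rk" for a read-only backing file, "rwk" for a writable image) are an assumption, and the thread does not say which rules were actually added to TEMPLATE.qemu. The paths are the ones posted in the first message.

```python
def apparmor_file_rules(paths, mode="rk"):
    """Return one AppArmor file rule line per path.

    The quoted-path form followed by a permission string and a trailing
    comma is standard AppArmor file rule syntax; the choice of
    permissions is an assumption of this sketch.
    """
    rules = []
    for path in paths:
        if '"' in path or "\n" in path:
            raise ValueError(f"unsupported character in path: {path!r}")
        rules.append(f'"{path}" {mode},')
    return rules


if __name__ == "__main__":
    # Backing file: read and lock access only.
    for rule in apparmor_file_rules(["/mnt/images/debian-11-genericcloud-amd64.qcow2"]):
        print(rule)
    # Top image: read, write and lock access.
    for rule in apparmor_file_rules(["/mnt/images/server_image.qcow2"], mode="rwk"):
        print(rule)
```

The script only prints the rule lines. Where they are placed and how the profile is reloaded is left to the administrator, keeping in mind the observation above that a reboot was needed between tests.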
Participants (2): Frédéric Lespez, Peter Krempa