RBD pool not starting "An error occurred, but the cause is unknown"
Hi all,

I've been battling this for about a year now. I've deployed virtual machine hosts running Alpine Linux 3.23 (recently updated from 3.21), which ships `libvirtd` 11.10.0, Ceph 19.2.3 and QEMU 10.1.3.

(For reference: the previous thread is archived at https://www.spinics.net/linux/fedora/libvirt-users/msg14421.html)

I have two RBD pools defined: one is a "high availability" pool which replicates a copy of each block on every node; the other (previously used with OpenNebula) just makes sure there are at least 2 copies of each block across the nodes (preferably 3). They're called "ha-images" and "opennebula-images" respectively.

"ha-images" starts without issue. It has a configuration like the following:

```
<pool type="rbd">
  <name>ha-images</name>
  <uuid>6beab982-52b3-495b-a4a7-ab7ebb522ef5</uuid>
  <capacity unit="bytes">16003182362624</capacity>
  <allocation unit="bytes">184657756160</allocation>
  <available unit="bytes">6658209792000</available>
  <source>
    <host name="172.31.252.1" port="6789"/>
    <host name="172.31.252.2" port="6789"/>
    <host name="172.31.252.5" port="6789"/>
    <host name="172.31.252.6" port="6789"/>
    <host name="172.31.252.7" port="6789"/>
    <name>ha</name>
    <auth type="ceph" username="libvirt">
      <secret uuid="c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3"/>
    </auth>
  </source>
</pool>
```

No issues with it; it starts just fine.
"opennebula-images" never starts. Its configuration:

```
<pool type="rbd">
  <name>opennebula-images</name>
  <uuid>fcaa2fa8-f0d2-4919-9168-756a9f4ad7ee</uuid>
  <capacity unit="bytes">16003182362624</capacity>
  <allocation unit="bytes">7893286499025</allocation>
  <available unit="bytes">6658213388288</available>
  <source>
    <host name="172.31.252.1" port="6789"/>
    <host name="172.31.252.2" port="6789"/>
    <host name="172.31.252.5" port="6789"/>
    <host name="172.31.252.6" port="6789"/>
    <host name="172.31.252.7" port="6789"/>
    <name>one</name>
    <auth type="ceph" username="libvirt">
      <secret uuid="c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3"/>
    </auth>
  </source>
</pool>
```

The `libvirt` user I created in Ceph is able to list the RBD pool:

```
lithium:~# rbd --id libvirt ls -p one | head
gapmx-old-vda
gapmx-old-vdb
gapmx-testing-vda
gapmx-testing-vdb
gapmx-vda
gapmx-vdb
mastodon-vda
mastodon-vdb
mastodon-vdd
mastodon-vde
```

When I try to start it, `libvirt` merely tells me: "An error occurred, but the cause is unknown". No logs written anywhere, no explanation. Just "computer says no".

Yet, if I hand-configure my virtual machines, I can still attach RBD volumes from either pool as virtual disks. I just can't browse them via `virsh` or `virt-manager` on the `opennebula-images` pool like I can for the `ha-images` pool.

Where should I be looking for the error? What did I do wrong that makes it refuse to start?
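One thing that can be ruled out mechanically is a typo in the failing pool's definition. The following is a minimal sketch, not part of the original setup: it assumes the two pool definitions are pasted in verbatim from the post, and the helper names `flatten` and `differences` are hypothetical. It only compares the XML; it does not talk to libvirt or Ceph.

```python
import xml.etree.ElementTree as ET

# Pool definitions copied from the post.
HA_XML = """<pool type="rbd">
  <name>ha-images</name>
  <uuid>6beab982-52b3-495b-a4a7-ab7ebb522ef5</uuid>
  <capacity unit="bytes">16003182362624</capacity>
  <allocation unit="bytes">184657756160</allocation>
  <available unit="bytes">6658209792000</available>
  <source>
    <host name="172.31.252.1" port="6789"/>
    <host name="172.31.252.2" port="6789"/>
    <host name="172.31.252.5" port="6789"/>
    <host name="172.31.252.6" port="6789"/>
    <host name="172.31.252.7" port="6789"/>
    <name>ha</name>
    <auth type="ceph" username="libvirt">
      <secret uuid="c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3"/>
    </auth>
  </source>
</pool>"""

ONE_XML = """<pool type="rbd">
  <name>opennebula-images</name>
  <uuid>fcaa2fa8-f0d2-4919-9168-756a9f4ad7ee</uuid>
  <capacity unit="bytes">16003182362624</capacity>
  <allocation unit="bytes">7893286499025</allocation>
  <available unit="bytes">6658213388288</available>
  <source>
    <host name="172.31.252.1" port="6789"/>
    <host name="172.31.252.2" port="6789"/>
    <host name="172.31.252.5" port="6789"/>
    <host name="172.31.252.6" port="6789"/>
    <host name="172.31.252.7" port="6789"/>
    <name>one</name>
    <auth type="ceph" username="libvirt">
      <secret uuid="c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3"/>
    </auth>
  </source>
</pool>"""


def flatten(xml_text):
    """Map each element path to its (sorted attributes, stripped text)."""
    rows = {}

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Disambiguate repeated siblings such as the five <host> entries.
        key, n = here, 1
        while key in rows:
            n += 1
            key = f"{here}[{n}]"
        attrs = tuple(sorted(elem.attrib.items()))
        rows[key] = (attrs, (elem.text or "").strip())
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return rows


def differences(a_xml, b_xml):
    """Return {path: (value_in_a, value_in_b)} for every path that differs."""
    a, b = flatten(a_xml), flatten(b_xml)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    for path, (ha_value, one_value) in differences(HA_XML, ONE_XML).items():
        print(f"{path}: {ha_value} != {one_value}")
```

Run against the two definitions in this post, it reports differences only in `name`, `uuid`, `allocation`, `available`, and the Ceph pool name under `source`. The monitor list, Ceph user, and secret UUID are identical, so the definition itself does not obviously explain why one pool starts and the other does not.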
me@vk4msl.com