On Sun, Jan 19, 2025 at 04:26:36PM +1000, Stuart Longland VK4MSL via Users wrote:
Hi all,
I have an issue getting an RBD pool going on a newly deployed compute node.
The storage back-end is a Ceph storage cluster running Ceph 14
(Nautilus… yes I know this is old, an update to 18 is planned soon). I
have an existing node, running Debian 10 (again, updating this is
planned, but I'd like to deploy new nodes to migrate the instances to
whilst this node is updated), which runs about a dozen VMs with disks on
this back-end.
I've loaded a new machine (an MSI Cubi 5 mini PC) with Alpine Linux
3.21. The boot disk is a 240 GB SATA SSD, and there's a 1 TB NVMe drive
for local VM storage. My intent is to allow VMs to mount RBDs for
back-up purposes. The machine has two Ethernet interfaces (a 2.5 Gbps
and a 1 Gbps link): one will be the "front-end" used by the VMs, the
other a "back-end" link to talk to Ceph and administer the host.
- Open vSwitch 2.17.11 is deployed with two bridges
- libvirtd 10.9.0 is installed
- an LVM pool called 'data' has been created on the NVMe drive
- Ceph 19.2.0 is installed (libvirtd is linked against this version of librbd; see the linkage check sketched after this list)
- /etc/ceph has been cloned from my existing working compute node
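(For what it's worth, the librbd linkage can be double-checked with
something like the following; the module path is illustrative and will
vary by distro, since on many builds the RBD backend is a loadable
storage-driver module rather than linked into libvirtd itself:

~ # ldd /usr/lib/libvirt/storage-backend/libvirt_storage_backend_rbd.so | grep librbd

On builds without modular storage drivers, `ldd $(which libvirtd) |
grep librbd` would be the equivalent check.)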
I have two RBD pools: 'one' and 'ha'. 'one' holds most of my virtual
machine images (it is from a former OpenNebula install); 'ha' holds the
core routers' root disk images ('ha' for high availability; it has
stronger replication settings than 'one' to guarantee better
reliability).
I've created a `libvirt` user in Ceph, and on the intended node, this works:
> ~ # rbd --id libvirt ls -p one | head
> mastodon-vda
> mastodon-vdb
> mastodon-vdd
> mastodon-vde
> one-14
> one-15
> one-19
> one-20
> one-22
> one-23
Could you try running:
rbd --id libvirt ls -p one | sort | uniq -d
please?
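Comparing the raw count against the de-duplicated count would catch the
same thing, e.g.:

rbd --id libvirt ls -p one | wc -l
rbd --id libvirt ls -p one | sort -u | wc -l

If those two numbers differ, some image name occurs more than once in
the listing, which is exactly the sort of thing that could trip up the
duplicate check described further down.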
> ~ # rbd --id libvirt ls -p ha | head
> core-router-obsd75-vda
> core-router-obsd76-vda
I can also access RBD images just fine:
> ~ # rbd --id libvirt map one/shares-vda
> /dev/rbd0
> ~ # fdisk -l /dev/rbd0
> Disk /dev/rbd0: 20 GB, 21474836480 bytes, 41943040 sectors
> 2610 cylinders, 255 heads, 63 sectors/track
> Units: sectors of 1 * 512 = 512 bytes
>
> Device      Boot StartCHS   EndCHS     StartLBA   EndLBA  Sectors  Size Id Type
> /dev/rbd0p1 *    2,0,33     611,8,56       2048   616447   614400  300M 83 Linux
> /dev/rbd0p2      611,8,57   1023,15,63   616448  2584575  1968128  961M 82 Linux swap
> /dev/rbd0p3      1023,15,63 1023,15,63  2584576 41943039 39358464 18.7G 83 Linux
> ~ # rbd unmap one/shares-vda
This is registered in libvirtd:
> ~ # virsh secret-list
> UUID Usage
> --------------------------------------------------------------------
> c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3 ceph client.libvirt secret
> ~ # virsh secret-dumpxml c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3
> <secret ephemeral='no' private='no'>
> <uuid>c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3</uuid>
> <usage type='ceph'>
> <name>client.libvirt secret</name>
> </usage>
> </secret>
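(As a sanity check, the stored secret value can be compared with the
Ceph key; assuming the secret was loaded from `ceph auth get-key` in
the usual way, these two commands should print the same string:

~ # virsh secret-get-value c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3
~ # ceph auth get-key client.libvirt

Given the 'ha' pool below authenticates fine with this same secret,
that part is evidently in order.)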
I have defined four pools: 'temp', 'local', 'ha-images' and
'opennebula-images':
> ~ # virsh pool-list --all
> Name State Autostart
> -------------------------------------------
> default active yes
> ha-images active yes
> local active yes
> opennebula-images inactive yes
> temp active yes
'ha-images' works just fine; this is its config:
> ~ # virsh pool-dumpxml ha-images
> <pool type='rbd'>
> <name>ha-images</name>
> <uuid>6beab982-52b3-495b-a4a7-ab7ebb522ef5</uuid>
> <capacity unit='bytes'>20003977953280</capacity>
> <allocation unit='bytes'>159339114496</allocation>
> <available unit='bytes'>13142248669184</available>
> <source>
> <host name='172.31.252.1' port='6789'/>
> <host name='172.31.252.2' port='6789'/>
> <host name='172.31.252.5' port='6789'/>
> <host name='172.31.252.6' port='6789'/>
> <host name='172.31.252.7' port='6789'/>
> <host name='172.31.252.8' port='6789'/>
> <host name='172.31.252.9' port='6789'/>
> <host name='172.31.252.10' port='6789'/>
> <name>ha</name>
> <auth type='ceph' username='libvirt'>
> <secret uuid='c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3'/>
> </auth>
> </source>
> </pool>
'opennebula-images' does not; this is its config:
> ~ # virsh pool-dumpxml opennebula-images
> <pool type='rbd'>
> <name>opennebula-images</name>
> <uuid>fcaa2fa8-f0d2-4919-9168-756a9f4ad7ee</uuid>
> <capacity unit='bytes'>20003977953280</capacity>
> <allocation unit='bytes'>5454371495936</allocation>
> <available unit='bytes'>13142254759936</available>
> <source>
> <host name='172.31.252.1' port='6789'/>
> <host name='172.31.252.2' port='6789'/>
> <host name='172.31.252.5' port='6789'/>
> <host name='172.31.252.6' port='6789'/>
> <host name='172.31.252.7' port='6789'/>
> <host name='172.31.252.8' port='6789'/>
> <host name='172.31.252.9' port='6789'/>
> <host name='172.31.252.10' port='6789'/>
> <name>one</name>
> <auth type='ceph' username='libvirt'>
> <secret uuid='c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3'/>
> </auth>
> </source>
> </pool>
It's not obvious what the differences are. `name`, `uuid`,
`allocation`, `available` and `source/name` are expected to differ;
everything else matches 100%. I've tried removing and zeroing out the
`capacity`, `allocation` and `available` tags, to no effect.
> ~ # virsh pool-dumpxml ha-images > /tmp/ha-images.xml
> ~ # virsh pool-dumpxml opennebula-images > /tmp/opennebula-images.xml
> ~ # diff -u /tmp/ha-images.xml /tmp/opennebula-images.xml
> --- /tmp/ha-images.xml
> +++ /tmp/opennebula-images.xml
> @@ -1,9 +1,9 @@
> <pool type='rbd'>
> - <name>ha-images</name>
> - <uuid>6beab982-52b3-495b-a4a7-ab7ebb522ef5</uuid>
> + <name>opennebula-images</name>
> + <uuid>fcaa2fa8-f0d2-4919-9168-756a9f4ad7ee</uuid>
> <capacity unit='bytes'>20003977953280</capacity>
> - <allocation unit='bytes'>159339114496</allocation>
> - <available unit='bytes'>13142248669184</available>
> + <allocation unit='bytes'>5454371495936</allocation>
> + <available unit='bytes'>13142254759936</available>
> <source>
> <host name='172.31.252.1' port='6789'/>
> <host name='172.31.252.2' port='6789'/>
> @@ -13,7 +13,7 @@
> <host name='172.31.252.8' port='6789'/>
> <host name='172.31.252.9' port='6789'/>
> <host name='172.31.252.10' port='6789'/>
> - <name>ha</name>
> + <name>one</name>
> <auth type='ceph' username='libvirt'>
> <secret uuid='c14a16b5-bba5-473a-ae9b-53a9a6b0a4e3'/>
> </auth>
> ~ # diff -y /tmp/ha-images.xml /tmp/opennebula-images.xml
When I start this errant pool, I get this:
> ~ # virsh pool-start opennebula-images
> error: Failed to start pool opennebula-images
> error: An error occurred, but the cause is unknown
Is this related to whether the other pool is started or not?
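One way to check, as a sketch: stop the working RBD pool and try the
failing one on its own (`virsh pool-destroy` merely stops a pool, it
does not delete anything):

~ # virsh pool-destroy ha-images
~ # virsh pool-start opennebula-images
~ # virsh pool-start ha-images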
If I crank debugging up in `libvirtd` (via the not-recommended
`log_level` setting, directing all output to a file), I see it
successfully connect to the pool, spend about 15 seconds listing the
sizes of about a dozen disk images, then seemingly give up and
disconnect.
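For reference, that sort of debug setup in /etc/libvirt/libvirtd.conf
looks roughly like the following (the log path here is illustrative):

log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd-debug.log"

with libvirtd restarted afterwards. Around the pool refresh, the log
shows: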
> 2025-01-19 05:16:55.176+0000: 3609: info : vir_object_finalize:319 : OBJECT_DISPOSE: obj=0x7f975fc816a0
> 2025-01-19 05:16:55.177+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fc816a0
> 2025-01-19 05:16:55.183+0000: 3609: debug : virStorageBackendRBDRefreshPool:693 : Utilization of RBD pool one: (kb: 19535134720 kb_avail: 12800438616 num_bytes: 5489355030528)
> 2025-01-19 05:16:55.988+0000: 3609: debug : volStorageBackendRBDRefreshVolInfo:569 : Refreshed RBD image one/mastodon-vda (capacity: 21474836480 allocation: 21474836480 obj_size: 4194304 num_objs: 5120)
> 2025-01-19 05:16:55.993+0000: 3609: info : virObjectNew:256 : OBJECT_NEW: obj=0x7f975fa6bba0 classname=virStorageVolObj
> 2025-01-19 05:16:55.993+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975fa6bba0
> 2025-01-19 05:16:55.993+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975fa6bba0
> 2025-01-19 05:16:55.993+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975fa6bba0
> 2025-01-19 05:16:55.993+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fa6bba0
> 2025-01-19 05:16:56.011+0000: 3609: debug : volStorageBackendRBDRefreshVolInfo:569 : Refreshed RBD image one/mastodon-vdb (capacity: 536870912000 allocation: 536870912000 obj_size: 4194304 num_objs: 128000)
…snip…
> 2025-01-19 05:17:03.756+0000: 3609: debug : volStorageBackendRBDRefreshVolInfo:569 : Refreshed RBD image one/wsmail-vdb (capacity: 21474836480 allocation: 21474836480 obj_size: 4194304 num_objs: 5120)
> 2025-01-19 05:17:03.758+0000: 3609: info : virObjectNew:256 : OBJECT_NEW: obj=0x7f975f9cf250 classname=virStorageVolObj
> 2025-01-19 05:17:03.758+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975f9cf250
> 2025-01-19 05:17:03.758+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975f9cf250
> 2025-01-19 05:17:03.758+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975f9cf250
> 2025-01-19 05:17:03.758+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975f9cf250
> 2025-01-19 05:17:03.777+0000: 3609: debug : volStorageBackendRBDRefreshVolInfo:569 : Refreshed RBD image one/sjl-router-obsd76-vda (capacity: 34359738368 allocation: 34359738368 obj_size: 4194304 num_objs: 8192)
So this line is logged for every volume that libvirt iterates over.
The next line I'd expect is "Found X images in RBD pool one", but that
line never shows up here. Looking at current HEAD, the only place where
this can error out between those two debug messages is in
virStoragePoolObjAddVol(), since that is the only called function which
can return an error without setting a proper error message. That
function errors out if:
- key, name or target.path is missing -- this cannot be the case, since
the key is the same as target.path, it is printed in the debug
messages, and the name is part of that string
- a volume with the same key, name or path already exists in the list --
whether this could happen or not I am not sure, as I don't know ceph
at all. But it is the only place where I can see the code erroring out
without setting a proper error.
> 2025-01-19 05:17:03.778+0000: 3609: debug : virStorageBackendRBDCloseRADOSConn:369 : Closing RADOS IoCTX
> 2025-01-19 05:17:03.778+0000: 3609: debug : virStorageBackendRBDCloseRADOSConn:374 : Closing RADOS connection
> 2025-01-19 05:17:03.783+0000: 3609: debug : virStorageBackendRBDCloseRADOSConn:378 : RADOS connection existed for 15 seconds
> 2025-01-19 05:17:03.783+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975f9cf2b0
> 2025-01-19 05:17:03.783+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975ef90ac0
> 2025-01-19 05:17:03.783+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fa6bd20
> 2025-01-19 05:17:03.783+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fa6ecc0
…snip…
> 2025-01-19 05:17:03.785+0000: 3609: info : vir_object_finalize:319 : OBJECT_DISPOSE: obj=0x7f975f9cee90
> 2025-01-19 05:17:03.785+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975f9cee90
> 2025-01-19 05:17:03.785+0000: 3609: info : vir_object_finalize:319 : OBJECT_DISPOSE: obj=0x7f975fa6e960
> 2025-01-19 05:17:03.785+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fa6e960
> 2025-01-19 05:17:03.785+0000: 3609: error : storageDriverAutostartCallback:213 : internal error: Failed to autostart storage pool 'opennebula-images': no error
> 2025-01-19 05:17:03.785+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fcd0490
> 2025-01-19 05:17:03.785+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975fcd27c0
> 2025-01-19 05:17:03.785+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fcd27c0
> 2025-01-19 05:17:03.786+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975fcd06d0
> 2025-01-19 05:17:03.786+0000: 3609: info : virObjectUnref:378 : OBJECT_UNREF: obj=0x7f975fcd06d0
> 2025-01-19 05:17:03.786+0000: 3609: info : virObjectRef:400 : OBJECT_REF: obj=0x7f975fcd0130
If there's no cause for the error, it should not fail. If it fails,
there should be a cause listed; there's no excuse for it being
"unknown" -- just because Microsoft's OSes make up error codes that
their own help system can't explain is no excuse for the open-source
world to follow their example.

Well, there is. In C code the error codes are usually numbers and the
error messages are set separately, to save extra copying and all that.
Most of the time the two are kept separate, and in that case it can
happen that we return an error but never set the error string. In such
a case the NULL message is presented as "no error" or "the cause is
unknown" rather than segfaulting during the print or showing something
along the lines of "error: <null>". And oh come on, let's not get
unreasonably heated here.
I'd happily provide more information, if someone can provide guidance
on how to locate it.
Have you tried setting up some Ceph debugging, as explained in
https://github.com/ceph/ceph/blob/f0023689d421373e058132fe3019530749afcb5...
under Configuring Ceph / Tip?
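A sketch of what that usually amounts to on the client side (option
names per the Ceph docs; the log path is illustrative) -- in
/etc/ceph/ceph.conf:

[client]
    debug rbd = 20
    debug rados = 20
    log file = /var/log/ceph/$name.$pid.log

and then retry the pool start, so librbd logs what it is doing.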
--
Stuart Longland (aka Redhatter, VK4MSL)
I haven't lost my mind...
...it's backed up on a tape somewhere.