
[...]
Could you provide a bit more context...
Why does calling rados_conf_read_file with a NULL resolve the issue?
Is this something "new" or "expected"? And if expected, why are we only seeing it now?
What is the other thread that "has" the lock doing?
It seems that the ceph server side does not respond to our request.
So when libvirt calls rbd_open, rbd_list, etc., the call never returns.
But qemu works fine, so I took qemu's code as a reference: https://github.com/qemu/qemu/blob/master/block/rbd.c#L365
Calling rados_conf_read_file with NULL makes librados look for the ceph conf file in /etc/ceph and the other default paths.
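To make the NULL case concrete, here is a minimal sketch of that lookup in plain C. It does not call librados at all: pick_default_conf is a hypothetical helper, and the candidate order ($CEPH_CONF, /etc/ceph/ceph.conf, ~/.ceph/config, then ceph.conf in the current directory) is my reading of the librados documentation, so treat it as an assumption rather than the actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT part of librados. It emulates the default
 * search that rados_conf_read_file(cluster, NULL) is documented to do:
 * the first readable candidate wins. The candidate order is an
 * assumption taken from the librados docs.
 *
 * Copies the chosen path into buf and returns buf, or returns NULL
 * when no candidate is readable.
 */
static const char *pick_default_conf(char *buf, size_t len)
{
    const char *env = getenv("CEPH_CONF");
    const char *home = getenv("HOME");
    char home_conf[4096] = "";
    const char *candidates[4];
    size_t n = 0;
    size_t i;

    assert(buf != NULL && len > 0);

    /* 1. Explicit override through the environment. */
    if (env && *env)
        candidates[n++] = env;

    /* 2. System-wide configuration. */
    candidates[n++] = "/etc/ceph/ceph.conf";

    /* 3. Per-user configuration, only if HOME is known. */
    if (home && *home) {
        snprintf(home_conf, sizeof(home_conf), "%s/.ceph/config", home);
        candidates[n++] = home_conf;
    }

    /* 4. ceph.conf in the current working directory. */
    candidates[n++] = "ceph.conf";

    for (i = 0; i < n; i++) {
        if (access(candidates[i], R_OK) == 0) {
            snprintf(buf, len, "%s", candidates[i]);
            return buf;
        }
    }
    return NULL;
}
```

If this order is right, a host with a readable /etc/ceph/ceph.conf picks up its settings even when the caller only passes options through rados_conf_set, which may be why the qemu code path behaves differently from ours.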
Although we call rados_conf_set in the code that follows, without rados_conf_read_file the ceph-0.94.7-1.el7 server does not answer our rbd_open.
Some older and newer ceph servers do not have this issue, so I think this may be a server-side bug in ceph-0.94.7-1.el7.
Thus a bug should be filed against ceph to fix their 0.94 version, rather than adding what would seemingly be an unnecessary change into libvirt to work around a problem that appears to be fixed in some future version of ceph.
John
Doing rados_conf_read_file(cluster, NULL) will make our code more robust.
Regards,
- Chen