[...]
>
> Could you provide a bit more context...
>
> Why does calling rados_conf_read_file with a NULL resolve the issue?
>
> Is this something "new" or "expected"? And if expected, why are we only
> seeing it now?
>
> What is the other thread that "has" the lock doing?
It seems that the Ceph server side does not respond to our requests.
So when libvirt calls rbd_open, rbd_list, etc., the call never returns.
But qemu works fine.
So I took qemu's code as a reference.
https://github.com/qemu/qemu/blob/master/block/rbd.c#L365
Calling rados_conf_read_file with a NULL path makes librados look for
the Ceph conf file in /etc/ceph and the other default paths.
Although we call rados_conf_set in the code that follows,
without rados_conf_read_file,
ceph-0.94.7-1.el7 does not answer our rbd_open.
Some older and newer Ceph servers do not have this issue.
I think this may be a Ceph server bug in ceph-0.94.7-1.el7.
Thus a bug should be filed against ceph to fix their 0.94 version rather
than adding what would seemingly be an unnecessary change into libvirt
to work around a problem that appears to be fixed in some future version
of ceph.
John
Doing rados_conf_read_file(cluster, NULL)
will make our code more robust.
Regards,
- Chen