On 12/30/2016 03:39 AM, Chen Hanxiao wrote:
From: Chen Hanxiao <chenhanxiao@gmail.com>
This patch fixes a deadlock when trying to read an RBD image.
When trying to connect to an RBD server
(ceph-0.94.7-1.el7.centos.x86_64),
rbd_list()/rbd_open() enter a deadlocked state.
Backtrace:
Thread 30 (Thread 0x7fdb342d0700 (LWP 12105)):
#0 0x00007fdb40b16705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fdb294273f1 in librados::IoCtxImpl::operate_read(object_t const&,
ObjectOperation*, ceph::buffer::list*, int) () from /lib64/librados.so.2
#2 0x00007fdb29429fcc in librados::IoCtxImpl::read(object_t const&,
ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
#3 0x00007fdb293e850c in librados::IoCtx::read(std::string const&,
ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
#4 0x00007fdb2b9dd15e in librbd::list(librados::IoCtx&, std::vector<std::string,
std::allocator<std::string> >&) () from /lib64/librbd.so.1
#5 0x00007fdb2b98c089 in rbd_list () from /lib64/librbd.so.1
#6 0x00007fdb2e1a8052 in virStorageBackendRBDRefreshPool (conn=<optimized out>,
pool=0x7fdafc002d50) at storage/storage_backend_rbd.c:366
#7 0x00007fdb2e193833 in storagePoolCreate (obj=0x7fdb1c1fd5a0, flags=<optimized
out>) at storage/storage_driver.c:876
#8 0x00007fdb43790ea1 in virStoragePoolCreate (pool=pool@entry=0x7fdb1c1fd5a0, flags=0)
at libvirt-storage.c:695
#9 0x00007fdb443becdf in remoteDispatchStoragePoolCreate (server=0x7fdb45fb2ab0,
msg=0x7fdb45fb3db0, args=0x7fdb1c0037d0, rerr=0x7fdb342cfc30, client=<optimized
out>) at remote_dispatch.h:14383
#10 remoteDispatchStoragePoolCreateHelper (server=0x7fdb45fb2ab0, client=<optimized
out>, msg=0x7fdb45fb3db0, rerr=0x7fdb342cfc30, args=0x7fdb1c0037d0, ret=0x7fdb1c1b3260)
at remote_dispatch.h:14359
#11 0x00007fdb437d9c42 in virNetServerProgramDispatchCall (msg=0x7fdb45fb3db0,
client=0x7fdb45fd1a80, server=0x7fdb45fb2ab0, prog=0x7fdb45fcd670) at
rpc/virnetserverprogram.c:437
#12 virNetServerProgramDispatch (prog=0x7fdb45fcd670, server=server@entry=0x7fdb45fb2ab0,
client=0x7fdb45fd1a80, msg=0x7fdb45fb3db0) at rpc/virnetserverprogram.c:307
#13 0x00007fdb437d4ebd in virNetServerProcessMsg (msg=<optimized out>,
prog=<optimized out>, client=<optimized out>, srv=0x7fdb45fb2ab0) at
rpc/virnetserver.c:135
#14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fdb45fb2ab0) at
rpc/virnetserver.c:156
#15 0x00007fdb436cfb35 in virThreadPoolWorker (opaque=opaque@entry=0x7fdb45fa7650) at
util/virthreadpool.c:145
#16 0x00007fdb436cf058 in virThreadHelper (data=<optimized out>) at
util/virthread.c:206
#17 0x00007fdb40b12df5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007fdb408401ad in clone () from /lib64/libc.so.6
366 len = rbd_list(ptr.ioctx, names, &max_size);
(gdb) n
[New Thread 0x7fdb20758700 (LWP 22458)]
[New Thread 0x7fdb20556700 (LWP 22459)]
[Thread 0x7fdb20758700 (LWP 22458) exited]
[New Thread 0x7fdb20455700 (LWP 22460)]
[Thread 0x7fdb20556700 (LWP 22459) exited]
[New Thread 0x7fdb20556700 (LWP 22461)]
infinite loop...
Signed-off-by: Chen Hanxiao <chenhanxiao@gmail.com>
---
src/storage/storage_backend_rbd.c | 7 +++++++
1 file changed, 7 insertions(+)
Could you provide a bit more context...
Why does calling rados_conf_read_file with a NULL resolve the issue?
Is this something "new" or "expected"? And if expected, why are we
only seeing it now?
What is the other thread that "has" the lock doing?
From my cursory/quick read of :
http://docs.ceph.com/docs/master/rados/api/librados/
...
"Then you configure your rados_t to connect to your cluster, either by
setting individual values (rados_conf_set()), using a configuration file
(rados_conf_read_file()), using command line options
(rados_conf_parse_argv()), or an environment variable
(rados_conf_parse_env()):"
Since we use rados_conf_set, that would seem to indicate we're OK. It's
not clear from just what's posted why eventually calling rbd_list is
causing a hang.
I don't have the cycles or environment to do the research right now and
it really isn't clear why a read_file would resolve the issue.
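For reference, passing NULL as the path makes rados_conf_read_file() search
librados' default locations ($CEPH_CONF, /etc/ceph/ceph.conf, ~/.ceph/config,
and ./ceph.conf). A minimal config it might pick up could look like the sketch
below; the monitor address and timeout values are illustrative assumptions,
not taken from the reporter's setup:

```ini
# Hypothetical /etc/ceph/ceph.conf that would be picked up by
# rados_conf_read_file(ptr->cluster, NULL)
[global]
mon_host = 192.168.122.10:6789
# Timeouts that turn an indefinite wait into a reported error
client_mount_timeout = 30
rados_mon_op_timeout = 30
rados_osd_op_timeout = 30
```

If such a file sets mon/osd op timeouts, a blocked rbd_list() would fail with
a timeout instead of hanging forever, which might explain why reading the
default config appears to resolve the issue.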
John
diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
index b1c51ab..233737b 100644
--- a/src/storage/storage_backend_rbd.c
+++ b/src/storage/storage_backend_rbd.c
@@ -95,6 +95,9 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
         goto cleanup;
     }

+    /* try default location, but ignore failure */
+    rados_conf_read_file(ptr->cluster, NULL);
+
     if (!conn) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                        _("'ceph' authentication not supported "
@@ -124,6 +127,10 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
                        _("failed to create the RADOS cluster"));
         goto cleanup;
     }
+
+    /* try default location, but ignore failure */
+    rados_conf_read_file(ptr->cluster, NULL);
+
     if (virStorageBackendRBDRADOSConfSet(ptr->cluster,
                                          "auth_supported",
                                          "none") < 0)
         goto cleanup;