On 04/04/2018 02:59 PM, Vincent Bernat wrote:
❦ 4 April 2018 11:17 +0200, Michal Privoznik <mprivozn@redhat.com> :
Dunno, this is the first time I've heard about this issue. Maybe you can try setting a breakpoint on virHashIterationError() and, when it's hit, get a stack trace of all threads with 't a a bt'. That might shed more light on the issue. Smells like we are not locking properly somewhere.
So, we have two threads iterating. Both of them originate from virNetServerProgramDispatchCall. So, it seems there is a lock missing?
No. As suspected, both threads are reading the internal list of domains (actually a hash table). They are not modifying the list, so they grab a read lock.
Thread 10 (Thread 0x7f931f814700 (LWP 126453)):
#0  virHashForEach (table=0x7f92fc69a480, iter=iter@entry=0x7f932ea8fbf0 <virDomainObjListCollectIterator>, data=data@entry=0x7f931f813a20) at ../../../src/util/virhash.c:597
#1  0x00007f932ea911c3 in virDomainObjListCollect (domlist=0x7f92fc82dd50, conn=conn@entry=0x7f92e0000f80, vms=vms@entry=0x7f931f813a80, nvms=nvms@entry=0x7f931f813a90, filter=0x7f932ebdc6b0 <virConnectListAllDomainsCheckACL>, flags=48) at ../../../src/conf/virdomainobjlist.c:935
#2  0x00007f932ea91482 in virDomainObjListExport (domlist=<optimized out>, conn=0x7f92e0000f80, domains=0x7f931f813af0, filter=<optimized out>, flags=<optimized out>) at ../../../src/conf/virdomainobjlist.c:1019
#3  0x00007f932eaf3b75 in virConnectListAllDomains (conn=0x7f92e0000f80, domains=0x7f931f813af0, flags=48) at ../../../src/libvirt-domain.c:6491
#4  0x0000559f219f6dca in remoteDispatchConnectListAllDomains (server=0x559f23795530, msg=0x559f237b78c0, ret=0x7f92d0000920, args=0x7f92d0000ad0, rerr=0x7f931f813bc0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:1244
#5  remoteDispatchConnectListAllDomainsHelper (server=0x559f23795530, client=<optimized out>, msg=0x559f237b78c0, rerr=0x7f931f813bc0, args=0x7f92d0000ad0, ret=0x7f92d0000920) at ../../../daemon/remote_dispatch.h:1220
#6  0x00007f932eb5c5b9 in virNetServerProgramDispatchCall (msg=0x559f237b78c0, client=0x559f237b7bd0, server=0x559f23795530, prog=0x559f237b2a50) at ../../../src/rpc/virnetserverprogram.c:437
#7  virNetServerProgramDispatch (prog=0x559f237b2a50, server=server@entry=0x559f23795530, client=0x559f237b7bd0, msg=0x559f237b78c0) at ../../../src/rpc/virnetserverprogram.c:307
#8  0x0000559f21a20268 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x559f23795530) at ../../../src/rpc/virnetserver.c:148
#9  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x559f23795530) at ../../../src/rpc/virnetserver.c:169
#10 0x00007f932ea3871b in virThreadPoolWorker (opaque=opaque@entry=0x559f2377da20) at ../../../src/util/virthreadpool.c:167
#11 0x00007f932ea37ac8 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#12 0x00007f932e0ed6ba in start_thread (arg=0x7f931f814700) at pthread_create.c:333
#13 0x00007f932de2341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7f9321818700 (LWP 126449)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f932e0efdbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f92ec011c80) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f932ea37c45 in virMutexLock (m=m@entry=0x7f92ec011c80) at ../../../src/util/virthread.c:89
#3  0x00007f932ea180ca in virObjectLock (anyobj=anyobj@entry=0x7f92ec011c70) at ../../../src/util/virobject.c:388
#4  0x00007f932ea8fd81 in virDomainObjListCopyActiveIDs (payload=0x7f92ec011c70, name=<optimized out>, opaque=0x7f9321817aa0) at ../../../src/conf/virdomainobjlist.c:679
#5  0x00007f932e9ee6cd in virHashForEach (table=0x7f92fc69a480, iter=iter@entry=0x7f932ea8fd70 <virDomainObjListCopyActiveIDs>, data=data@entry=0x7f9321817aa0) at ../../../src/util/virhash.c:606
#6  0x00007f932ea90fba in virDomainObjListGetActiveIDs (doms=0x7f92fc82dd50, ids=<optimized out>, maxids=<optimized out>, filter=<optimized out>, conn=<optimized out>) at ../../../src/conf/virdomainobjlist.c:701
#7  0x00007f932eae1c6e in virConnectListDomains (conn=0x7f92e40018b0, ids=0x7f92f4015ae0, maxids=14) at ../../../src/libvirt-domain.c:66
#8  0x0000559f21a0d3c1 in remoteDispatchConnectListDomains (server=<optimized out>, msg=0x559f237beea0, ret=0x7f92f4009eb0, args=0x7f92f4016290, rerr=0x7f9321817bc0, client=<optimized out>) at ../../../daemon/remote_dispatch.h:2108
#9  remoteDispatchConnectListDomainsHelper (server=<optimized out>, client=<optimized out>, msg=0x559f237beea0, rerr=0x7f9321817bc0, args=0x7f92f4016290, ret=0x7f92f4009eb0) at ../../../daemon/remote_dispatch.h:2076
#10 0x00007f932eb5c5b9 in virNetServerProgramDispatchCall (msg=0x559f237beea0, client=0x559f237c0ce0, server=0x559f23795530, prog=0x559f237b2a50) at ../../../src/rpc/virnetserverprogram.c:437
#11 virNetServerProgramDispatch (prog=0x559f237b2a50, server=server@entry=0x559f23795530, client=0x559f237c0ce0, msg=0x559f237beea0) at ../../../src/rpc/virnetserverprogram.c:307
#12 0x0000559f21a20268 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x559f23795530) at ../../../src/rpc/virnetserver.c:148
#13 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x559f23795530) at ../../../src/rpc/virnetserver.c:169
#14 0x00007f932ea3871b in virThreadPoolWorker (opaque=opaque@entry=0x559f2377d860) at ../../../src/util/virthreadpool.c:167
#15 0x00007f932ea37ac8 in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:206
#16 0x00007f932e0ed6ba in start_thread (arg=0x7f9321818700) at pthread_create.c:333
#17 0x00007f932de2341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Both threads call virHashForEach(table=0x7f92fc69a480). Thread 6 got there first, so it started iterating and set table->iterating. When thread 10 then entered the function, it found the flag already set and an error was reported.

I guess we can go with what Dan suggested, and after some rework we can just drop ->iterating completely.

Michal