On 04/04/2018 02:59 PM, Vincent Bernat wrote:
> ❦ 4 April 2018 11:17 +0200, Michal Privoznik <mprivozn(a)redhat.com>:
>
>> Dunno, this is the first time I hear about this issue. Maybe you can try
>> to set a breakpoint on virHashIterationError() and, when it's hit, get a
>> stack trace of all threads with 't a a bt' (gdb shorthand for
>> 'thread apply all bt'). That might shed more light on the issue. Smells
>> like we are not locking somewhere properly.
>
> So, we have two threads iterating. Both of them originate from
> virNetServerProgramDispatchCall. So, it seems there is a lock missing?
No. As suspected, both threads are reading the internal list of domains
(actually a hash table). They are not modifying the list, so each of them
grabs only a read lock, and a read lock can be held by several readers at
the same time.
Thread 10 (Thread 0x7f931f814700 (LWP 126453)):
#0 virHashForEach (table=0x7f92fc69a480, iter=iter@entry=0x7f932ea8fbf0
<virDomainObjListCollectIterator>, data=data@entry=0x7f931f813a20) at
../../../src/util/virhash.c:597
#1 0x00007f932ea911c3 in virDomainObjListCollect (domlist=0x7f92fc82dd50,
conn=conn@entry=0x7f92e0000f80, vms=vms@entry=0x7f931f813a80,
nvms=nvms@entry=0x7f931f813a90, filter=0x7f932ebdc6b0
<virConnectListAllDomainsCheckACL>, flags=48)
at ../../../src/conf/virdomainobjlist.c:935
#2 0x00007f932ea91482 in virDomainObjListExport (domlist=<optimized out>,
conn=0x7f92e0000f80, domains=0x7f931f813af0, filter=<optimized out>,
flags=<optimized out>) at ../../../src/conf/virdomainobjlist.c:1019
#3 0x00007f932eaf3b75 in virConnectListAllDomains (conn=0x7f92e0000f80,
domains=0x7f931f813af0, flags=48) at ../../../src/libvirt-domain.c:6491
#4 0x0000559f219f6dca in remoteDispatchConnectListAllDomains (server=0x559f23795530,
msg=0x559f237b78c0, ret=0x7f92d0000920, args=0x7f92d0000ad0, rerr=0x7f931f813bc0,
client=<optimized out>) at ../../../daemon/remote_dispatch.h:1244
#5 remoteDispatchConnectListAllDomainsHelper (server=0x559f23795530,
client=<optimized out>, msg=0x559f237b78c0, rerr=0x7f931f813bc0,
args=0x7f92d0000ad0, ret=0x7f92d0000920) at ../../../daemon/remote_dispatch.h:1220
#6 0x00007f932eb5c5b9 in virNetServerProgramDispatchCall (msg=0x559f237b78c0,
client=0x559f237b7bd0, server=0x559f23795530, prog=0x559f237b2a50) at
../../../src/rpc/virnetserverprogram.c:437
#7 virNetServerProgramDispatch (prog=0x559f237b2a50, server=server@entry=0x559f23795530,
client=0x559f237b7bd0, msg=0x559f237b78c0) at ../../../src/rpc/virnetserverprogram.c:307
#8 0x0000559f21a20268 in virNetServerProcessMsg (msg=<optimized out>,
prog=<optimized out>, client=<optimized out>, srv=0x559f23795530) at
../../../src/rpc/virnetserver.c:148
#9 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x559f23795530) at
../../../src/rpc/virnetserver.c:169
#10 0x00007f932ea3871b in virThreadPoolWorker (opaque=opaque@entry=0x559f2377da20) at
../../../src/util/virthreadpool.c:167
#11 0x00007f932ea37ac8 in virThreadHelper (data=<optimized out>) at
../../../src/util/virthread.c:206
#12 0x00007f932e0ed6ba in start_thread (arg=0x7f931f814700) at pthread_create.c:333
#13 0x00007f932de2341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 6 (Thread 0x7f9321818700 (LWP 126449)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f932e0efdbd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f92ec011c80) at
../nptl/pthread_mutex_lock.c:80
#2 0x00007f932ea37c45 in virMutexLock (m=m@entry=0x7f92ec011c80) at
../../../src/util/virthread.c:89
#3 0x00007f932ea180ca in virObjectLock (anyobj=anyobj@entry=0x7f92ec011c70) at
../../../src/util/virobject.c:388
#4 0x00007f932ea8fd81 in virDomainObjListCopyActiveIDs (payload=0x7f92ec011c70,
name=<optimized out>, opaque=0x7f9321817aa0) at
../../../src/conf/virdomainobjlist.c:679
#5 0x00007f932e9ee6cd in virHashForEach (table=0x7f92fc69a480,
iter=iter@entry=0x7f932ea8fd70 <virDomainObjListCopyActiveIDs>,
data=data@entry=0x7f9321817aa0) at ../../../src/util/virhash.c:606
#6 0x00007f932ea90fba in virDomainObjListGetActiveIDs (doms=0x7f92fc82dd50,
ids=<optimized out>, maxids=<optimized out>, filter=<optimized out>,
conn=<optimized out>) at ../../../src/conf/virdomainobjlist.c:701
#7 0x00007f932eae1c6e in virConnectListDomains (conn=0x7f92e40018b0, ids=0x7f92f4015ae0,
maxids=14) at ../../../src/libvirt-domain.c:66
#8 0x0000559f21a0d3c1 in remoteDispatchConnectListDomains (server=<optimized out>,
msg=0x559f237beea0, ret=0x7f92f4009eb0, args=0x7f92f4016290, rerr=0x7f9321817bc0,
client=<optimized out>) at ../../../daemon/remote_dispatch.h:2108
#9 remoteDispatchConnectListDomainsHelper (server=<optimized out>,
client=<optimized out>, msg=0x559f237beea0, rerr=0x7f9321817bc0,
args=0x7f92f4016290, ret=0x7f92f4009eb0) at ../../../daemon/remote_dispatch.h:2076
#10 0x00007f932eb5c5b9 in virNetServerProgramDispatchCall (msg=0x559f237beea0,
client=0x559f237c0ce0, server=0x559f23795530, prog=0x559f237b2a50) at
../../../src/rpc/virnetserverprogram.c:437
#11 virNetServerProgramDispatch (prog=0x559f237b2a50, server=server@entry=0x559f23795530,
client=0x559f237c0ce0, msg=0x559f237beea0) at ../../../src/rpc/virnetserverprogram.c:307
#12 0x0000559f21a20268 in virNetServerProcessMsg (msg=<optimized out>,
prog=<optimized out>, client=<optimized out>, srv=0x559f23795530) at
../../../src/rpc/virnetserver.c:148
#13 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x559f23795530) at
../../../src/rpc/virnetserver.c:169
#14 0x00007f932ea3871b in virThreadPoolWorker (opaque=opaque@entry=0x559f2377d860) at
../../../src/util/virthreadpool.c:167
#15 0x00007f932ea37ac8 in virThreadHelper (data=<optimized out>) at
../../../src/util/virthread.c:206
#16 0x00007f932e0ed6ba in start_thread (arg=0x7f9321818700) at pthread_create.c:333
#17 0x00007f932de2341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Both threads call virHashForEach(table=0x7f92fc69a480). Thread 6 got there
first, so it started iterating and set table->iterating. When thread 10
later entered the function, it found the flag already set and an error
was reported.
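
To make that sequence concrete, here is a minimal sketch of how an
"iterating" flag rejects a second read-only pass. It is a toy model, not
the libvirt code: toy_table, toy_for_each and the callbacks are all
hypothetical names, and the second reader is simulated on a single thread
by starting a nested pass from inside the first one, which gives the same
interleaving deterministically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table; NOT libvirt's virHashTable. */
struct toy_table {
    int values[4];
    size_t nvalues;
    bool iterating;   /* set while a for-each pass is in progress */
};

typedef int (*toy_iter)(struct toy_table *t, int value, void *opaque);

/* Visit every value. Returns the number of visited items, or -1 if
 * another pass over the same table is already in progress. */
static int toy_for_each(struct toy_table *t, toy_iter iter, void *opaque)
{
    assert(t != NULL && iter != NULL);

    if (t->iterating)
        return -1;    /* the situation reported as an iteration error */

    t->iterating = true;
    int visited = 0;
    for (size_t i = 0; i < t->nvalues; i++) {
        if (iter(t, t->values[i], opaque) < 0)
            break;
        visited++;
    }
    t->iterating = false;
    return visited;
}

/* A harmless read-only callback: it only counts items. */
static int count_item(struct toy_table *t, int value, void *opaque)
{
    (void)t;
    (void)value;
    (*(int *)opaque)++;
    return 0;
}

/* Plays the role of the first reader: while its pass is still running,
 * a second read-only pass over the same table is attempted. */
static int first_reader(struct toy_table *t, int value, void *opaque)
{
    (void)value;
    int seen = 0;
    int *second_result = opaque;
    *second_result = toy_for_each(t, count_item, &seen);
    return 0;
}

/* Returns what the second reader got back from toy_for_each(). */
static int demo_second_reader_result(void)
{
    struct toy_table t = { {1, 2, 3, 4}, 4, false };
    int second = 0;
    toy_for_each(&t, first_reader, &second);
    return second;
}
```

Neither pass modifies the table, yet the second one is refused. That is
why the flag gets in the way once readers only take a read lock.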
I guess we can go with what Dan suggested and after some rework we can
just drop ->iterating completely.
Michal