Eric Blake <eblake(a)redhat.com> wrote on 03/22/2012 06:54:31 PM:
On 03/22/2012 04:49 PM, David Stevens wrote:
> Stefan Berger/Watson/IBM wrote on 03/22/2012 03:04:53 PM:
>
>>
>> I have some concerns about the cancelation of the thread. It can
>> hold the snoop lock and get cancelled while holding it. Next time
>> that lock is grabbed we will get a deadlock...
>>
>
> The snoop lock is acquired in virNWFilterDHCPSnoopEnd(), which
> is where the pthread_cancel() directly (for valid leases) or the
> freeReq()/pthread_cancel() is done. So, the canceler has the lock
> and the canceling thread can't have it, then.
>
> The only other case I see is when the config goes invalid and
> we exit the snooper loop; he frees the snoop_lock() before removing
> his own hash entry, which will cancel the same thread doing
> the remove. Again, it can't have the snoop lock when canceled.
>
> Is there some other case you think I'm missing?
pthread_cancel() tends to imply that you are properly managing signal
blocking across threads; we haven't used it anywhere else in libvirt,
and I'm extremely wary of pulling it in now, as there's probably a lot
of subtle bugs that it would expose. Are you sure you can't do this in
some other manner without dragging in pthread_cancel()?
From the patch:
+ if (pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &tmp) < 0)
+ virNWFilterReportError(VIR_ERR_INTERNAL_ERROR,
"pthread_setcanceltype "
+ "failed; ifname \"%s\"",
req->ifname ?
+ req->ifname : "<NULL>");
from the man page of above function:
PTHREAD_CANCEL_ASYNCHRONOUS
The thread can be canceled at any time. (Typically, it
will be
canceled immediately upon receiving a cancellation request,
but
the system doesn't guarantee this.)
It seems implementation dependent on when the thread actually die...
So I can construct a scenario where the canceller holds the lock and the
thread doesn't, the thread gets cancelled but not immediately and can
still grab the lock and then dies. This is what I concluded to have seen
from below stack trace when the snooping thread was running and in rapid
succession I was shooting two SIGHUPs at libvirt:
Thread 11 (Thread 0x7f3c29ce8700 (LWP 12800)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f3c294e7700 (LWP 12801)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f3c28ce6700 (LWP 12802)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f3c284e5700 (LWP 12803)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
---Type <return> to continue, or q <return> to quit---
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f3c27ce4700 (LWP 12804)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f3c274e3700 (LWP 12805)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
---Type <return> to continue, or q <return> to quit---
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f3c26ce2700 (LWP 12806)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f3c264e1700 (LWP 12807)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
---Type <return> to continue, or q <return> to quit---
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f3c25ce0700 (LWP 12808)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f3c254df700 (LWP 12809)):
#0 0x00000038ec60b3b4 in pthread_cond_wait@(a)GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
#1 0x00007f3c2fe2229a in virCondWait (c=<value optimized out>,
m=<value optimized out>) at util/threads-pthread.c:117
#2 0x00007f3c2fe2295b in virThreadPoolWorker (opaque=<value optimized
out>)
---Type <return> to continue, or q <return> to quit---
at util/threadpool.c:103
#3 0x00007f3c2fe21f36 in virThreadHelper (data=<value optimized out>)
at util/threads-pthread.c:161
#4 0x00000038ec606ccb in start_thread () from /lib64/libpthread.so.0
#5 0x00000038ec2e0c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f3c2fda2820 (LWP 12799)):
#0 0x00000038ec60dc9c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000038ec609125 in _L_lock_880 () from /lib64/libpthread.so.0
#2 0x00000038ec609008 in pthread_mutex_lock () from
/lib64/libpthread.so.0
#3 0x00007f3c2fe73818 in virNWFilterObjFindByUUID
(nwfilters=0x7f3c1c015008,
uuid=<value optimized out>) at conf/nwfilter_conf.c:2662
#4 0x00007f3c2fe738ad in virNWFilterObjAssignDef (conn=0x2055190,
nwfilters=0x7f3c1c015008, def=0x201ad60) at conf/nwfilter_conf.c:2947
#5 0x00007f3c2fe73d6f in virNWFilterObjLoad (conn=0x2055190,
nwfilters=0x7f3c1c015008, configDir=0x7f3c1c089460
"/etc/libvirt/nwfilter")
at conf/nwfilter_conf.c:3045
#6 virNWFilterLoadAllConfigs (conn=0x2055190, nwfilters=0x7f3c1c015008,
configDir=0x7f3c1c089460 "/etc/libvirt/nwfilter")
at conf/nwfilter_conf.c:3092
#7 0x00000000004e9084 in nwfilterDriverReload ()
at nwfilter/nwfilter_driver.c:163
#8 0x00007f3c2fea2c8f in virStateReload () at libvirt.c:891
---Type <return> to continue, or q <return> to quit---
#9 0x0000000000420572 in daemonReloadHandler (srv=<value optimized out>,
sig=<value optimized out>, opaque=<value optimized out>) at
libvirtd.c:1153
#10 0x00007f3c2fefd4ba in virNetServerSignalEvent (
watch=<value optimized out>, fd=<value optimized out>,
events=<value optimized out>, opaque=0x200fe50) at
rpc/virnetserver.c:500
#11 0x00007f3c2fe0f651 in virEventPollDispatchHandles ()
at util/event_poll.c:490
#12 virEventPollRunOnce () at util/event_poll.c:637
#13 0x00007f3c2fe0dd15 in virEventRunDefaultImpl () at util/event.c:247
#14 0x00007f3c2fefdffd in virNetServerRun (srv=0x200fe50)
at rpc/virnetserver.c:736
#15 0x000000000042235d in main (argc=<value optimized out>,
argv=<value optimized out>) at libvirtd.c:1609
I bet thread 1 was trying to grab a lock that wasn't properly unlocked due
to the async cancel. All the other threads look pretty much the same and I
doubt either one holds the lock.
Maybe we should go with the previous code from a while ago which was
setting a flag for the thread to die. It caused other work-arounds to
become necessary but at least we don't have to deal with possibly async.
deaths of threads holding locks.
Stefan