Hi Matthias,
This can't be reproduced 100% of the time. I reproduced the crash twice. But after I set
CURLOPT_NOSIGNAL to 1, I did not see a similar core dump again, and everything seems to
work well. What do you mean by "stuck in a DNS lookup"?
B.R.
Benjamin Wang
-----Original Message-----
From: Matthias Bolte [mailto:matthias.bolte@googlemail.com]
Sent: September 30, 2012 4:20
To: Benjamin Wang (gendwang)
Cc: libvir-list(a)redhat.com; Yang Zhou (yangzho)
Subject: Re: Two core dumps are generated in multi-thread scenarios
2012/9/23 Benjamin Wang (gendwang) <gendwang(a)cisco.com>:
Hi,
I found two core dumps generated in multi-threaded scenarios in the ESX part.
Case 1: libcurl multi-thread support
core dump:
#12 0x00002aaabea89712 in addbyter () from /usr/local/lib/libcurl.so.4
#13 0x00002aaabea89b86 in dprintf_formatf () from /usr/local/lib/libcurl.so.4
#14 0x00002aaabea8b055 in curl_mvsnprintf () from /usr/local/lib/libcurl.so.4
#15 0x00002aaabea7678f in Curl_failf () from /usr/local/lib/libcurl.so.4
#16 0x00002aaabea6d871 in Curl_resolv_timeout () from /usr/local/lib/libcurl.so.4
#17 0x00000006e8a8f230 in ?? ()
Fix code:
esxVI_CURL_Connect() in esx_vi.c:
I added a new line as follows:
curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);
It took me a moment of reading the libcurl code to figure out what might be happening here.
The problem is that Curl_resolv_timeout uses SIGALRM + sigsetjmp/siglongjmp to implement
the timeout logic. This implementation is not thread-safe, as the SIGALRM handler might run
on a different thread than the one that started the call to Curl_resolv_timeout. This in
turn results in the call to Curl_resolv_timeout being continued via siglongjmp (called from
the SIGALRM handler) on a different thread. Setting CURLOPT_NOSIGNAL to 1 makes libcurl
avoid the SIGALRM + sigsetjmp/siglongjmp implementation.
This solves the problem, but at the cost of losing the DNS lookup timeout capability.
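As a rough illustration of the SIGALRM + sigsetjmp/siglongjmp pattern, here is a minimal
single-threaded sketch. It is not libcurl's actual code; call_with_timeout, slow_lookup and
fast_lookup are hypothetical names, and sleep() merely stands in for a slow DNS lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Jump target shared by the caller and the signal handler. It is global to
 * the process, so the handler cannot tell which thread armed the alarm. */
static sigjmp_buf timeout_jmp;

static void on_alarm(int signo)
{
    (void)signo;
    /* Abort the blocking call by jumping back to sigsetjmp(). */
    siglongjmp(timeout_jmp, 1);
}

/* Hypothetical helper: run blocking_call() for at most `seconds` seconds.
 * Returns 0 if the call finished, -1 if the alarm fired first. */
int call_with_timeout(void (*blocking_call)(void), unsigned int seconds)
{
    struct sigaction sa, old_sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    if (sigsetjmp(timeout_jmp, 1) != 0) {
        /* Reached via siglongjmp() from the handler: the call timed out. */
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    alarm(seconds);          /* arm the timeout */
    blocking_call();
    alarm(0);                /* finished in time: cancel the pending alarm */
    sigaction(SIGALRM, &old_sa, NULL);
    return 0;
}

/* Stand-ins for a DNS lookup that hangs and one that returns at once. */
void slow_lookup(void) { sleep(5); }
void fast_lookup(void) { }
```

With a single thread, call_with_timeout(slow_lookup, 1) returns -1 after about one second.
With several threads the kernel may deliver SIGALRM to any thread that has it unblocked, so
siglongjmp() can run on a thread that never called sigsetjmp(), which matches the failure
described above.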
In your case a DNS lookup took longer than libcurl was willing to wait and a timeout
aborted it. But the call to Curl_failf (as part of the timeout error handling) was made on
the wrong thread (I think), making it segfault. IMHO there is no ideal solution here:
with CURLOPT_NOSIGNAL set to 0 (the default) libcurl can perform DNS lookups with a
timeout, but the error handling might occur on the wrong thread. With CURLOPT_NOSIGNAL set
to 1 the segfault is avoided, but libcurl might get stuck in a DNS lookup.
Are you able to reproduce this problem and can you confirm that setting CURLOPT_NOSIGNAL
to 1 fixes it?
--
Matthias Bolte
http://photron.blogspot.com