[libvirt] Core dump caused by misusing openssl in multithread scenario!

Hi,

I am running libvirt with the ESXi driver in a multithreaded scenario, accessing ESXi over HTTPS. Sometimes a core dump is generated, with the following backtrace:

#0  0x0000003f9b030265 in raise () from /lib64/libc.so.6
#1  0x0000003f9b031d10 in abort () from /lib64/libc.so.6
#2  0x0000003f9b06a84b in __libc_message () from /lib64/libc.so.6
#3  0x0000003f9b072fae in _int_malloc () from /lib64/libc.so.6
#4  0x0000003f9b074cde in malloc () from /lib64/libc.so.6
#5  0x0000003f9b07963b in strerror () from /lib64/libc.so.6
#6  0x0000003fa188032a in ERR_load_ERR_strings () from /lib64/libcrypto.so.6
#7  0x0000003fa187fde9 in ERR_load_crypto_strings () from /lib64/libcrypto.so.6
#8  0x0000003fa48309d9 in SSL_load_error_strings () from /lib64/libssl.so.6
#9  0x00002aaaba8e612e in Curl_ossl_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#10 0x00002aaaba8ee6c1 in curl_global_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#11 0x00002aaaba8ee6f8 in curl_easy_init () from /opt/CSCOppm-unit/hypervisor/libcurl/lib/libcurl.so.4
#12 0x00002aaaba0d932b in esxVI_SessionIsActive (ctx=0x2aaac093ca80, sessionID=0x2aaac06932a0 "`3i\300\252*", userName=0x2aaac0ae6e80 "root", output=0xffffffffffffffff) at esx/esx_vi_methods.generated.c:599
#13 0x00002aaaba0c7a60 in esxStorageVolumeLookupByKey (conn=0x7412, key=0x76c1 <Address 0x76c1 out of bounds>) at esx/esx_storage_driver.c:825

I checked, and the ESXi driver currently does not initialize OpenSSL, and libcurl does not set up OpenSSL for multithreaded use. According to the OpenSSL API, libvirt should register two callbacks to support multiple threads. The detailed description is here:

http://www.openssl.org/docs/crypto/threads.html

I have changed the code as follows:

1. virInitialize() in libvirt.c

Old code:

    int virInitialize(void)
    {
        ...
        virLogSetFromEnv();
        virNetTLSInit();
        ...
    }

New code:

    int virInitialize(void)
    {
        ...
        virLogSetFromEnv();
        virNetTLSInit();
        virOpenSSLInit();
        ...
    }

2. In virnetServer.c

New code:

    #include <pthread.h>
    #include <openssl/crypto.h>

    pthread_mutex_t *lock_cs;
    long *lock_count;

    /* Locking callback: OpenSSL asks us to lock or unlock lock number 'type'. */
    void virOpenSSLLockCallback(int mode, int type,
                                const char *file ATTRIBUTE_UNUSED,
                                int line ATTRIBUTE_UNUSED)
    {
        if (mode & CRYPTO_LOCK) {
            pthread_mutex_lock(&(lock_cs[type]));
            lock_count[type]++;
        } else {
            pthread_mutex_unlock(&(lock_cs[type]));
        }
    }

    /* Thread ID callback: identifies the calling thread to OpenSSL. */
    unsigned long virOpenSSLIdCallback(void)
    {
        unsigned long ret;

        ret = (unsigned long)pthread_self();
        return ret;
    }

    /* Allocate one mutex per OpenSSL lock and register both callbacks. */
    void virOpenSSLInit(void)
    {
        int i;

        lock_cs = OPENSSL_malloc(CRYPTO_num_locks() * sizeof(pthread_mutex_t));
        lock_count = OPENSSL_malloc(CRYPTO_num_locks() * sizeof(long));
        for (i = 0; i < CRYPTO_num_locks(); i++) {
            lock_count[i] = 0;
            pthread_mutex_init(&(lock_cs[i]), NULL);
        }

        CRYPTO_set_id_callback(virOpenSSLIdCallback);
        CRYPTO_set_locking_callback(virOpenSSLLockCallback);
    }

To be honest, virOpenSSLInit, virOpenSSLIdCallback and virOpenSSLLockCallback should not be defined in this file, but it seems that the Makefile generated by autoconf cannot pick up a new file automatically.

What do you think of this solution? If you have any comments, please feel free to contact me.

BTW: If I add a new source/header file, is there a simple way to change the Makefile?

B.R.
Benjamin Wang

On Sat, Sep 29, 2012 at 01:31:07PM +0000, Benjamin Wang (gendwang) wrote:
I checked that currently ESXi driver didn't initialize openssl. Because libcurl will not handle openssl for multi-thread. According to openssl API, libvirt should
No code in libvirt should assume curl uses openssl - it may well have been compiled with gnutls, or nss instead. The actual flaw here is that libvirt does not invoke 'curl_global_init' from virInitialize.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Hi Daniel,

My comments are as follows:

1. Currently curl_easy_init is called from esxVI_CURL_Connect in esx_vi.c, and curl_global_init is called by curl_easy_init. If we move curl_global_init to virInitialize, do we still need to call curl_easy_init from esxVI_CURL_Connect? Does the latest version fix this problem?

2. If we need to use OpenSSL from multiple threads, we must register the two callbacks. Currently libcurl does not do this. If we do not register these two callbacks in libvirt, what should we do instead?

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: October 1, 2012 16:24
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

On Sat, Sep 29, 2012 at 01:31:07PM +0000, Benjamin Wang (gendwang) wrote:
I checked that currently ESXi driver didn't initialize openssl. Because libcurl will not handle openssl for multi-thread. According to openssl API, libvirt should
No code in libvirt should assume curl uses openssl - it may well have been compiled with gnutls, or nss instead. The actual flaw here is that libvirt does not invoke 'curl_global_init' from virInitialize.

Daniel

On Tue, Oct 02, 2012 at 02:57:46AM +0000, Benjamin Wang (gendwang) wrote:
Hi Daniel, My comments are as follows: 1. Currently curl_easy_init is called from esxVI_CURL_Connect in esx_vi.c, and curl_global_init is called by curl_easy_init. If we move curl_global_init to virInitialize, do we still need to call curl_easy_init from esxVI_CURL_Connect? Does the latest version fix this problem?
That is actually the problem. The CURL docs explicitly tell you that it is *not* safe to rely on curl_easy_init in a multithreaded program. You must call curl_global_init explicitly.

Daniel
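The requirement to run the global initialization explicitly, exactly once, can be illustrated with a run-once guard. This is only a minimal sketch under stated assumptions, not the fix that went into libvirt: `fake_curl_global_init` is a hypothetical stand-in for `curl_global_init(CURL_GLOBAL_DEFAULT)` so that the example builds without libcurl, and `lib_initialize`, `worker` and `run_workers` are invented names.

```c
#include <pthread.h>

#define MAX_WORKERS 16

/* Counts how often the global init ran, so the pattern can be checked. */
static int global_init_calls = 0;

/* Hypothetical stand-in for curl_global_init(CURL_GLOBAL_DEFAULT). */
static void fake_curl_global_init(void)
{
    global_init_calls++;
}

static pthread_once_t init_once = PTHREAD_ONCE_INIT;

/* Safe to call from any thread; the init routine runs exactly once. */
static int lib_initialize(void)
{
    return pthread_once(&init_once, fake_curl_global_init);
}

static void *worker(void *arg)
{
    (void)arg;
    lib_initialize();
    /* A real worker would only now create its easy handle. */
    return NULL;
}

/* Starts up to n threads that all initialize concurrently and
 * returns how many times the global init actually ran. */
static int run_workers(int n)
{
    pthread_t threads[MAX_WORKERS];
    int started = 0;
    int i;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    for (i = 0; i < n; i++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return global_init_calls;
}
```

Calling the real `curl_global_init` once from virInitialize, before any worker thread exists, achieves the same effect without a guard; the guard only matters if initialization can be reached from several threads.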

Hi Daniel,

Is this problem fixed in the latest version? What about question 2, which relates to the OpenSSL callbacks for multithreading?

B.R.
Benjamin Wang

-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: October 2, 2012 16:02
To: Benjamin Wang (gendwang)
Cc: libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

On Tue, Oct 02, 2012 at 02:57:46AM +0000, Benjamin Wang (gendwang) wrote:
Hi Daniel, My comments are as follows: 1. Currently curl_easy_init is called from esxVI_CURL_Connect in esx_vi.c, and curl_global_init is called by curl_easy_init. If we move curl_global_init to virInitialize, do we still need to call curl_easy_init from esxVI_CURL_Connect? Does the latest version fix this problem?
That is actually the problem. The CURL docs explicitly tell you that it is *not* safe to rely on curl_easy_init in a multithreaded program. You must call curl_global_init explicitly.

Daniel

2012/10/2 Benjamin Wang (gendwang) <gendwang@cisco.com>:
Hi Daniel, Is this problem fixed in the latest version? What about question 2, which relates to the OpenSSL callbacks for multithreading?
As Daniel said, we cannot assume that libcurl was built with the OpenSSL backend. We would need some way to detect this first.

Also, wasn't there a license problem with OpenSSL and the (L)GPL? Can libvirt legally be used with a libcurl that is linked with OpenSSL?

--
Matthias Bolte
http://photron.blogspot.com

-----Original Message-----
From: Matthias Bolte [mailto:matthias.bolte@googlemail.com]
Sent: October 7, 2012 2:14
To: Benjamin Wang (gendwang)
Cc: Daniel P. Berrange; libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

2012/10/2 Benjamin Wang (gendwang) <gendwang@cisco.com>:
Hi Daniel, Is this problem fixed in the latest version? What about question 2, which relates to the OpenSSL callbacks for multithreading?

As Daniel said, we cannot assume that libcurl was built with the OpenSSL backend. We would need some way to detect this first.

[Benjamin]: I agree. But if libcurl is used to access ESXi over HTTPS, OpenSSL will be used, and libvirt must call CRYPTO_set_id_callback/CRYPTO_set_locking_callback to support multiple threads.

Also, wasn't there a license problem with OpenSSL and the (L)GPL? Can libvirt legally be used with a libcurl that is linked with OpenSSL?

[Benjamin]: I think there is no open source license issue. We will not change the libcurl or OpenSSL source code. All we need is to call the OpenSSL API (CRYPTO_set_id_callback/CRYPTO_set_locking_callback) to support multiple threads.

--
Matthias Bolte
http://photron.blogspot.com

2012/10/8 Benjamin Wang (gendwang) <gendwang@cisco.com>:
-----Original Message-----
From: Matthias Bolte [mailto:matthias.bolte@googlemail.com]
Sent: October 7, 2012 2:14
To: Benjamin Wang (gendwang)
Cc: Daniel P. Berrange; libvir-list@redhat.com; Yang Zhou (yangzho)
Subject: Re: [libvirt] Core dump caused by misusing openssl in multithread scenario!

2012/10/2 Benjamin Wang (gendwang) <gendwang@cisco.com>:
Hi Daniel, Is this problem fixed in the latest version? What about question 2, which relates to the OpenSSL callbacks for multithreading?

As Daniel said, we cannot assume that libcurl was built with the OpenSSL backend. We would need some way to detect this first.

[Benjamin]: I agree. But if libcurl is used to access ESXi over HTTPS, OpenSSL will be used, and libvirt must call CRYPTO_set_id_callback/CRYPTO_set_locking_callback to support multiple threads.
This is not completely true. libcurl can be compiled with different SSL backends, and OpenSSL is only one of them. Current libcurl supports OpenSSL, GnuTLS, NSS, qssl, CyaSSL, PolarSSL and axTLS as SSL backends. That's why we cannot make assumptions about which SSL backend is used by a specific instance of libcurl. You're probably right that if OpenSSL is used, libvirt should provide the required multithreading functions. But this requires that we can detect which SSL backend is used by the libcurl library that is linked to libvirt.

Also, wasn't there a license problem with OpenSSL and the (L)GPL? Can libvirt legally be used with a libcurl that is linked with OpenSSL?

[Benjamin]: I think there is no open source license issue. We will not change the libcurl or OpenSSL source code. All we need is to call the OpenSSL API (CRYPTO_set_id_callback/CRYPTO_set_locking_callback) to support multiple threads.

If OpenSSL is actually incompatible with the (L)GPL then it cannot be used in combination with libvirt. This statement (if it is true) is independent of changing libcurl or OpenSSL code. The problem here is that libcurl itself uses the MIT license, which is compatible with the (L)GPL, but libcurl linked to OpenSSL might not be, because OpenSSL's license might not be (L)GPL compatible. This whole point is only about which licenses can legally be mixed in a single project, and it doesn't depend on whether any code is changed. libvirt calling OpenSSL functions directly raises the same problem.

--
Matthias Bolte
http://photron.blogspot.com
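The backend detection discussed in this message could look roughly like the sketch below. It relies on one fact about libcurl, namely that `curl_version_info(CURLVERSION_NOW)` returns a structure whose `ssl_version` field is a string describing the SSL library, for example "OpenSSL/1.0.1c" or "GnuTLS/2.12.20". Everything else is an assumption for illustration: the helper name `backend_is_openssl` is hypothetical, and the check is kept as a pure string test so that it builds without libcurl.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the backend description names OpenSSL,
 * 0 otherwise. 'ssl_version' is expected to look like "OpenSSL/1.0.1c";
 * it may be NULL when libcurl was built without SSL support. */
static int backend_is_openssl(const char *ssl_version)
{
    static const char prefix[] = "OpenSSL/";

    if (ssl_version == NULL)
        return 0;
    /* Compare only the leading backend name, not the version number. */
    return strncmp(ssl_version, prefix, sizeof(prefix) - 1) == 0;
}
```

In a real program the argument would come from `curl_version_info(CURLVERSION_NOW)->ssl_version`. A prefix match is a simplification; whether it is sufficient for every libcurl build is not something this thread establishes.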

On Sat, Oct 06, 2012 at 08:14:09PM +0200, Matthias Bolte wrote:
2012/10/2 Benjamin Wang (gendwang) <gendwang@cisco.com>:
Hi Daniel, Is this problem fixed in the latest version? What about question 2, which relates to the OpenSSL callbacks for multithreading?
As Daniel said, we cannot assume that libcurl was built with the OpenSSL backend. We would need some way to detect this first.
Also, wasn't there a license problem with OpenSSL and the (L)GPL? Can libvirt legally be used with a libcurl that is linked with OpenSSL?
The OpenSSL vs GPL license compatibility is a subject of much debate amongst lawyers, so the answer you get depends on who you ask. IANAL, but my personal reading is that the OpenSSL license is *not* compatible with the (L)GPL. Thus as a libvirt copyright holder my position is that libvirt must only be used with a libcurl that links to either NSS or GNUTLS, and *not* OpenSSL.

Regards,
Daniel

On Mon, Oct 08, 2012 at 03:22:07PM +0100, Daniel P. Berrange wrote:
The OpenSSL vs GPL license compatibility is a subject of much debate amongst lawyers, so the answer you get depends on who you ask. IANAL, but my personal reading is that the OpenSSL license is *not* compatible with the (L)GPL. Thus as a libvirt copyright holder my position is that libvirt must only be used with a libcurl that links to either NSS or GNUTLS, and *not* OpenSSL.
Oops, and this time with the links that I meant to provide:

https://en.wikipedia.org/wiki/OpenSSL#Licensing
http://people.gnome.org/~markmc/openssl-and-the-gpl.html
http://lists.debian.org/debian-legal/2002/10/msg00113.html

Daniel

2012/10/2 Daniel P. Berrange <berrange@redhat.com>:
On Tue, Oct 02, 2012 at 02:57:46AM +0000, Benjamin Wang (gendwang) wrote:
Hi Daniel, My comments are as follows: 1. Currently curl_easy_init is called from esxVI_CURL_Connect in esx_vi.c, and curl_global_init is called by curl_easy_init. If we move curl_global_init to virInitialize, do we still need to call curl_easy_init from esxVI_CURL_Connect? Does the latest version fix this problem?
That is actually the problem. The CURL docs explicitly tell you that it is *not* safe to rely on curl_easy_init in a multithreaded program. You must call curl_global_init explicitly.
And here is a patch that does this:

https://www.redhat.com/archives/libvir-list/2012-October/msg00199.html

--
Matthias Bolte
http://photron.blogspot.com

participants (3)
- Benjamin Wang (gendwang)
- Daniel P. Berrange
- Matthias Bolte