[libvirt] Application using libvirt crashes when having concurrent TLS connections (gnutls problem)

Hi list, I was wondering about the status of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=512367 Is it correct that this is a bug in the libvirt client? I ran into this today, as I've written a kind of a VM scheduler (ressource requirements, placement etc.) in Java (libvirt-java 0.3.0, libvirt 0.6.5) and this is a show-stopper for me right now. I had some troubles with the newest version of libvirt (it couldn't connect to Xen IIRC), so I don't want to mess my setup again for nothing. What is actually causing this problem resp. in which situation is libvirt broken? When a client uses more than one connection at the same time? Or when a client uses more than one connection to the same server at the same time? Are there any recommeded workarounds? TIA & kr, thomas

On Fri, Oct 02, 2009 at 09:59:27PM +0200, Thomas Treutner wrote:
Hi list,
I was wondering about the status of this bug:
Patch proposed but the original reporter never indicated whether it actually fixed the problem or not. If someone can confirm ....
Is it correct that this is a bug in the libvirt client? I ran into this today, as I've written a kind of a VM scheduler (ressource requirements, placement etc.) in Java (libvirt-java 0.3.0, libvirt 0.6.5) and this is a show-stopper for me right now. I had some troubles with the newest version of libvirt (it couldn't connect to Xen IIRC), so I don't want to mess my setup again for nothing.
What is actually causing this problem resp. in which situation is libvirt broken? When a client uses more than one connection at the same time? Or when a client uses more than one connection to the same server at the same time? Are there any recommeded workarounds?
I believe the problem is when you use multiple different virConnectPtr objects. If using one connection from multiple threads libvirt itself would have provided sufficient locking. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Monday 05 October 2009 14:43:05 Daniel P. Berrange wrote:
On Fri, Oct 02, 2009 at 09:59:27PM +0200, Thomas Treutner wrote:
Hi list,
I was wondering about the status of this bug:
Patch proposed but the original reporter never indicated whether it actually fixed the problem or not. If someone can confirm ....
Now I get: remote_internal.c:1092: error: 'GCRY_THREAD_OPTION_VERSION' undeclared here (not in a function) /usr/include/gcrypt.h is there. I did a ./configure, make clean, make. System is Ubuntu 8.04. When I close my connections properly, I run into this, most probably whent he Java garbage collector runs: https://www.redhat.com/archives/libvir-list/2009-March/msg00453.html ..although I didn't see a segfault yet. From what I read this bug is fixed in the Python bindings. Is the error message I get noisy but harmless or do the Java bindings need a patch too? kr, thomas

On 10/05/2009 04:48 PM, Thomas Treutner wrote:
On Monday 05 October 2009 14:43:05 Daniel P. Berrange wrote:
On Fri, Oct 02, 2009 at 09:59:27PM +0200, Thomas Treutner wrote:
Hi list,
I was wondering about the status of this bug:
Patch proposed but the original reporter never indicated whether it actually fixed the problem or not. If someone can confirm ....
Now I get:
remote_internal.c:1092: error: 'GCRY_THREAD_OPTION_VERSION' undeclared here (not in a function)
/usr/include/gcrypt.h is there. I did a ./configure, make clean, make. System is Ubuntu 8.04.
When I close my connections properly, I run into this, most probably whent he Java garbage collector runs:
https://www.redhat.com/archives/libvir-list/2009-March/msg00453.html
..although I didn't see a segfault yet. From what I read this bug is fixed in the Python bindings. Is the error message I get noisy but harmless or do the Java bindings need a patch too?
The Java bindings should now be pretty light weight. Are you seeing this in them? If so, spin up a bug and I will take a look. -- bk

On Friday 09 October 2009 19:38:34 Bryan Kearney wrote:
The Java bindings should now be pretty light weight. Are you seeing this in them? If so, spin up a bug and I will take a look.
It took me quite some time to get to the bottom of this, as I'm not a professional full time dev. From what I see now, the gnutls problem just stems from the fact that libvirt connections somewhat tend to pile up. When does that happen? I looked into the libvirt-java and libvirt source, and to me (again, I'm just coding for fun), there are two obvious bugs in libvirt-java (I'm even ashamed I didn't see them for so long ;)), both in org.libvirt.Connect: 1) Connect.close() The return value of libvirt.virConnectClose(VCP) is never checked, so VCP is NULLed in *any* case. That means: If a connection couldn't be shut down because there are still references to it, it can *never* be shut down in the future, even when there are no more references, because the VCP is then NULL. 2) Connect.finalize() This is just a minor annoyance: The status of the VCP is again not checked. Means if the connection was closed properly, and some time later the garbage collector runs, an ugly and more importantly unneccesary error message is printed to the console. There should be a check if the VCP != null or something more appropriate. Regarding finalize(), I read in "Effective Java" that one should never rely on finalize() for tearing down things, as it is completely uncertain *when* the GC may run. This thing went through my brain in random directions, but right now, I think it's just the API user's responsibility to call close(). kr, tom

On 10/28/2009 05:00 PM, Thomas Treutner wrote:
On Friday 09 October 2009 19:38:34 Bryan Kearney wrote:
The Java bindings should now be pretty light weight. Are you seeing this in them? If so, spin up a bug and I will take a look.
It took me quite some time to get to the bottom of this, as I'm not a professional full time dev. From what I see now, the gnutls problem just stems from the fact that libvirt connections somewhat tend to pile up. When does that happen? I looked into the libvirt-java and libvirt source, and to me (again, I'm just coding for fun), there are two obvious bugs in libvirt-java (I'm even ashamed I didn't see them for so long ;)), both in org.libvirt.Connect:
1) Connect.close()
The return value of libvirt.virConnectClose(VCP) is never checked, so VCP is NULLed in *any* case. That means: If a connection couldn't be shut down because there are still references to it, it can *never* be shut down in the future, even when there are no more references, because the VCP is then NULL.
I am not checking it. I will take a look and send you a patch.
2) Connect.finalize()
This is just a minor annoyance: The status of the VCP is again not checked. Means if the connection was closed properly, and some time later the garbage collector runs, an ugly and more importantly unneccesary error message is printed to the console. There should be a check if the VCP != null or something more appropriate.
Agree.. I should check that as well.
Regarding finalize(), I read in "Effective Java" that one should never rely on finalize() for tearing down things, as it is completely uncertain *when* the GC may run. This thing went through my brain in random directions, but right now, I think it's just the API user's responsibility to call close().
Let me work on a patch and send it to you. -- bk

On 10/28/2009 08:18 PM, Bryan Kearney wrote:
On 10/28/2009 05:00 PM, Thomas Treutner wrote:
On Friday 09 October 2009 19:38:34 Bryan Kearney wrote:
Let me work on a patch and send it to you.
Ok.. patch is attached. You can see a test build at: http://bkearney.fedorapeople.org/libvirt-0.3.1-test.jar Let me know if this works for you. If so, I will push a new build. -- bk

On Thursday 29 October 2009 01:30:34 Bryan Kearney wrote:
Ok.. patch is attached. You can see a test build at:
http://bkearney.fedorapeople.org/libvirt-0.3.1-test.jar
Let me know if this works for you. If so, I will push a new build.
Yes, works for me, thank you! A minor improvement: finalize() should have an @override annotation, just to be sure it actually overrides a method. In case it doesn't (due to f.e. wrong return type) a warning (or even error?) is thrown at compile time, avoiding hard to find bugs. A more important issue: In org.libvirt.Domain.free(), the VDP is not checked for being NULL, and return type is void, hardening debugging. I did a copy and paste from your new Connect.close() methods and s/VCP/VDP/, works fine for me. kr, tom

On 10/29/2009 10:39 AM, Thomas Treutner wrote:
On Thursday 29 October 2009 01:30:34 Bryan Kearney wrote:
Ok.. patch is attached. You can see a test build at:
http://bkearney.fedorapeople.org/libvirt-0.3.1-test.jar
Let me know if this works for you. If so, I will push a new build.
Yes, works for me, thank you!
A minor improvement: finalize() should have an @override annotation, just to be sure it actually overrides a method. In case it doesn't (due to f.e. wrong return type) a warning (or even error?) is thrown at compile time, avoiding hard to find bugs.
A more important issue: In org.libvirt.Domain.free(), the VDP is not checked for being NULL, and return type is void, hardening debugging. I did a copy and paste from your new Connect.close() methods and s/VCP/VDP/, works fine for me.
kr, tom Thanks.. i will make those changes and push a 0.3.1 release. I will send out email when done.. if you would be willing to vote up the karma that would be great.
-- bk
participants (3)
-
Bryan Kearney
-
Daniel P. Berrange
-
Thomas Treutner