[libvirt] ruby-libvirt issue

Good evening,
could anybody explain this to me, please?
divinus:~ # virsh
error: cannot recv data: : Connection reset by peer
error: failed to connect to the hypervisor
divinus:~ #
I am trying to integrate Chris's ruby-libvirt into my Ruby on Rails application. The first few connections to the libvirt daemon (through the ruby-libvirt API) from my app are fine, but then something happens and it is no longer possible to connect until I restart the libvirt daemon manually with the init.d script or until I restart Apache (or WEBrick). It is not possible to connect with virsh from the shell either! See the messages above.
When I use ruby-libvirt from a bare Ruby script, everything is fine, no matter how many times I connect. It's strange.
Does anybody have experience with ruby-libvirt in a RoR application?
Thank you.
Jaromir.

On 07/17/10 - 09:30:29PM, Jaromír Červenka wrote:
> Good evening,
> could anybody explain this to me, please?
> divinus:~ # virsh
> error: cannot recv data: : Connection reset by peer
> error: failed to connect to the hypervisor
> divinus:~ #
> I am trying to integrate Chris's ruby-libvirt into my Ruby on Rails application. The first few connections to the libvirt daemon (through the ruby-libvirt API) from my app are fine, but then something happens and it is no longer possible to connect until I restart the libvirt daemon manually with the init.d script or until I restart Apache (or WEBrick). It is not possible to connect with virsh from the shell either! See the messages above.
> When I use ruby-libvirt from a bare Ruby script, everything is fine, no matter how many times I connect. It's strange.
> Does anybody have experience with ruby-libvirt in a RoR application?
That is pretty strange. One way that I could imagine this happening is if libvirtd runs out of file descriptors (or some other resource), and then can't accept any new connections. There have been some bugs in libvirtd that cause that, though it could equally be that your objects aren't being garbage collected by the Ruby VM.
What version of libvirtd are you testing against? If it's possible, I wonder if you can force a garbage collection run in ruby and see if that frees up any resources?
-- Chris Lalancette
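One way to try the garbage-collection experiment suggested above is sketched here. It is only an illustration: `live_instances` is a hypothetical helper (not part of ruby-libvirt), and `FakeConnect` is a stand-in class so the snippet runs without libvirt installed. In the Rails application one would presumably pass `Libvirt::Connect` instead.

```ruby
# Hypothetical helper (not part of ruby-libvirt): force a GC run and
# report how many instances of a class are still alive afterwards.
def live_instances(klass)
  GC.start
  ObjectSpace.each_object(klass).count
end

# Stand-in class so the sketch runs without libvirt installed; in the
# Rails app the argument would be Libvirt::Connect instead.
class FakeConnect; end

# Two objects that are still referenced, so they must survive the GC run.
held = [FakeConnect.new, FakeConnect.new]
puts live_instances(FakeConnect)
```

If the count keeps growing from request to request even after a forced GC run, something in the application is still holding references to the connection objects.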

Hi,
I think I know where the problem is. My "messages" log says:
Jul 29 19:36:41 divinus libvirtd: 19:36:41.032: error : qemudDispatchServer:1315 : Too many active clients (100), dropping connection
It looks like ruby-libvirt is not closing connections when it is running under passenger/rails/apache. The "100" is the limit I defined in /etc/libvirt/libvirtd.conf.
Kind regards,
Jaromir.
On 19 July 2010 at 20:34, Chris Lalancette <clalance@redhat.com> wrote:
> That is pretty strange. One way that I could imagine this happening is if libvirtd runs out of file descriptors (or some other resource), and then can't accept any new connections. There have been some bugs in libvirtd that cause that, though it could equally be that your objects aren't being garbage collected by the Ruby VM.
> What version of libvirtd are you testing against? If it's possible, I wonder if you can force a garbage collection run in ruby and see if that frees up any resources?
> -- Chris Lalancette

On 07/29/10 - 07:41:26PM, Jaromír Červenka wrote:
> Hi,
> I think I know where the problem is. My "messages" log says:
> Jul 29 19:36:41 divinus libvirtd: 19:36:41.032: error : qemudDispatchServer:1315 : Too many active clients (100), dropping connection
> It looks like ruby-libvirt is not closing connections when it is running under passenger/rails/apache. The "100" is the limit I defined in /etc/libvirt/libvirtd.conf.
Ah, OK. So then the question becomes whether the problem is in the ruby bindings not releasing the object in certain circumstances, or in rails holding onto the object too long. Unfortunately I'm not really that familiar with rails or passenger, so I'm not entirely sure how to figure out where the problem is.
I'll see if I can do a bit of testing and look at object lifetimes from the point-of-view of ruby-libvirt to try and eliminate that from the equation.
-- Chris Lalancette

On Thu, Jul 29, 2010 at 02:32:06PM -0400, Chris Lalancette wrote:
> Ah, OK. So then the question becomes whether the problem is in the ruby bindings not releasing the object in certain circumstances, or in rails holding onto the object too long. Unfortunately I'm not really that familiar with rails or passenger, so I'm not entirely sure how to figure out where the problem is.
> I'll see if I can do a bit of testing and look at object lifetimes from the point-of-view of ruby-libvirt to try and eliminate that from the equation.
Does ruby-libvirt rely on garbage collection to close the connection, or is there an explicit 'close' method on the virConnect binding? It is generally not workable to rely on garbage collection for closing connections because of the unpredictability of when that may run.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On 08/05/10 - 12:35:03PM, Daniel P. Berrange wrote:
> Does ruby-libvirt rely on garbage collection to close the connection, or is there an explicit 'close' method on the virConnect binding? It is generally not workable to rely on garbage collection for closing connections because of the unpredictability of when that may run.
It's an explicit close method. So my guess is that somewhere a close is being missed in Jaromír's application.
-- Chris Lalancette
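If a close is being missed somewhere in the application, one common guard is to wrap every use of a connection in a block that always closes it. The following is a minimal sketch under stated assumptions: `with_connection` is a hypothetical helper, and `FakeConnection` stands in for the real connection object so the snippet runs without libvirt. Only `close` and `closed?` are assumed on the connection, both of which appear on `Libvirt::Connect` elsewhere in this thread.

```ruby
# Hypothetical helper: open a connection, hand it to the block, and
# always close it afterwards, even if the block raises.
def with_connection(opener)
  conn = opener.call
  begin
    yield conn
  ensure
    conn.close unless conn.closed?
  end
end

# Stand-in for Libvirt::Connect so the sketch runs without libvirt.
class FakeConnection
  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

fake = FakeConnection.new
begin
  with_connection(-> { fake }) { |_conn| raise 'request failed' }
rescue RuntimeError
  # The error still propagates to the caller; only the cleanup is guaranteed.
end
puts fake.closed? # prints "true"
```

With the real bindings the opener would presumably be something like `-> { Libvirt::open('qemu:///system') }`, so a controller action cannot return without the close being attempted.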

On 29.07.2010 20:32, Chris Lalancette wrote:
> Ah, OK. So then the question becomes whether the problem is in the ruby bindings not releasing the object in certain circumstances, or in rails holding onto the object too long. Unfortunately I'm not really that familiar with rails or passenger, so I'm not entirely sure how to figure out where the problem is.
> I'll see if I can do a bit of testing and look at object lifetimes from the point-of-view of ruby-libvirt to try and eliminate that from the equation.
The following test case confirms that the problem is in the bindings themselves:
ree-1.8.7-2010.02 > require 'libvirt'
 => true
ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
0
 => nil
ree-1.8.7-2010.02 > c = Libvirt::open 'qemu:///system'
 => #<Libvirt::Connect:0x265b718>
ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
1
 => nil
ree-1.8.7-2010.02 > d = c.lookup_domain_by_name 'abc'
 => #<Libvirt::Domain:0x2705128>
ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
1
 => nil
ree-1.8.7-2010.02 > c.close
 => nil
ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
1
 => nil
ree-1.8.7-2010.02 > c.closed?
 => true
ree-1.8.7-2010.02 > d.connection.closed?
 => true
ree-1.8.7-2010.02 > d.create
ArgumentError: Connection has been closed
    from (irb):28:in `create'
    from (irb):28
    from :0
ree-1.8.7-2010.02 > d=nil
 => nil
ree-1.8.7-2010.02 > c=nil
 => nil
ree-1.8.7-2010.02 > GC.start
 => nil
ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
1
 => nil
Tested with MRI 1.8.7-p302 and ree-1.8.7-2010.02.
My question is: how do I close the connection to libvirtd?
regards,
-- Pawel

On 11/21/10 - 11:25:09PM, Pawel Krzesniak wrote:
> The following test case confirms that the problem is in the bindings themselves:
OK, I see what the problem is. I'm not entirely sure if it is actually a problem in the bindings, but the behavior is a bit surprising in a garbage-collected language. See below.
> ree-1.8.7-2010.02 > require 'libvirt'
>  => true
> ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
> 0
>  => nil
> ree-1.8.7-2010.02 > c = Libvirt::open 'qemu:///system'
>  => #<Libvirt::Connect:0x265b718>
> ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
> 1
After this point, we have a single connection.
>  => nil
> ree-1.8.7-2010.02 > d = c.lookup_domain_by_name 'abc'
>  => #<Libvirt::Domain:0x2705128>
> ree-1.8.7-2010.02 > puts `netstat -na|grep -v LISTENING |grep -c libvirt-sock`
> 1
Still the same connection.
>  => nil
> ree-1.8.7-2010.02 > c.close
Here's where the problem is. Because you are doing c.close before cleaning up the domain object, the libvirt library keeps a connection open. If you do either:
  d = nil
  GC.start
or
  d.free
before the c.close, then everything would be cleaned up correctly.
In terms of making this automatically happen during connection closing, I'm not entirely sure what we can (and should) do. I guess we could keep some sort of list of objects that "depend" on this connection object, and then during connection close free them all up. Does anyone know how the python bindings handle this?
-- Chris Lalancette
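The idea of keeping a list of objects that "depend" on a connection and freeing them at close time could look roughly like the sketch below. This is not how ruby-libvirt is implemented. `TrackedConnection` and its `track` method are hypothetical names, and `FakeDomain` and `FakeConn` are stand-ins so the snippet runs without libvirt. It assumes only that dependent objects respond to `free` and the connection to `close`, as in the session above.

```ruby
# Hypothetical wrapper: remembers objects obtained through a connection
# and frees them before the connection itself is closed.
class TrackedConnection
  def initialize(conn)
    @conn = conn
    @dependents = []
  end

  # Register an object (for example a domain) and return it unchanged.
  def track(obj)
    @dependents << obj
    obj
  end

  # Free every tracked object first, then close the connection.
  def close
    @dependents.each(&:free)
    @dependents.clear
    @conn.close
  end
end

# Stand-ins that record the order of cleanup calls.
LOG = []

class FakeDomain
  def free
    LOG << :domain_free
  end
end

class FakeConn
  def close
    LOG << :conn_close
  end
end

tracked = TrackedConnection.new(FakeConn.new)
tracked.track(FakeDomain.new)
tracked.close
p LOG # prints [:domain_free, :conn_close]
```

A real version would also have to cope with objects the caller has already freed by hand, which this sketch does not attempt.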

On Tue, Nov 23, 2010 at 08:28:54AM -0500, Chris Lalancette wrote:
> Here's where the problem is. Because you are doing c.close before cleaning up the domain object, the libvirt library keeps a connection open. If you do either:
>   d = nil
>   GC.start
> or
>   d.free
> before the c.close, then everything would be cleaned up correctly.
> In terms of making this automatically happen during connection closing, I'm not entirely sure what we can (and should) do. I guess we could keep some sort of list of objects that "depend" on this connection object, and then during connection close free them all up. Does anyone know how the python bindings handle this?
They don't do anything. IMHO this is just expected behaviour. The domain objects reference the connections, so it is only natural that the connection remains around for as long as any domain (or other object) remains in scope.
Daniel

On Tue, Nov 23, 2010 at 14:28, Chris Lalancette <clalance@redhat.com> wrote:
> In terms of making this automatically happen during connection closing, I'm not entirely sure what we can (and should) do. I guess we could keep some sort of list of objects that "depend" on this connection object, and then during connection close free them all up. Does anyone know how the python bindings handle this?
The Python bindings work more or less the same way (see attachment).
So the conclusion is that all objects must be freed with free() before closing the connection. That is not very intuitive, so maybe it should be mentioned somewhere in the docs?
-- Pawel

2010/11/24 Paweł Krześniak <pawel.krzesniak@gmail.com>:
> The Python bindings work more or less the same way (see attachment).
> So the conclusion is that all objects must be freed with free() before closing the connection. That is not very intuitive, so maybe it should be mentioned somewhere in the docs?
Actually it is supposed to work correctly as long as you match every virConnectOpen* call with a virConnectClose call and each call that returns a virDomainPtr, virStorageVolPtr etc. with the corresponding free call. It should not matter in which order you call the close/free functions; the internal refcounting should make it work.
However, the order clearly does matter for the output of "netstat -na | grep -v LISTENING | grep -c libvirt-sock". The problem is not in the Python or Ruby bindings, as I can reproduce it using the C API (current git version). This might indicate a problem in libvirt.
I attached the test program. With NORMAL_ORDER = 1 the initial value stays the same, with NORMAL_ORDER = 0 it grows by 1 per iteration.
Matthias

2010/11/24 Matthias Bolte <matthias.bolte@googlemail.com>:
> Actually it is supposed to work correctly as long as you match every virConnectOpen* call with a virConnectClose call and each call that returns a virDomainPtr, virStorageVolPtr etc. with the corresponding free call. It should not matter in which order you call the close/free functions; the internal refcounting should make it work.
> However, the order clearly does matter for the output of "netstat -na | grep -v LISTENING | grep -c libvirt-sock". The problem is not in the Python or Ruby bindings, as I can reproduce it using the C API (current git version). This might indicate a problem in libvirt.
> I attached the test program. With NORMAL_ORDER = 1 the initial value stays the same, with NORMAL_ORDER = 0 it grows by 1 per iteration.
> Matthias
Okay, I found it. The problem with the "non-normal" order is that libvirt fails to close the open drivers when virConnectClose isn't the call that removes the last reference from the connection. This results in the remote driver keeping the handle to the libvirt-sock open.
I just posted a patch that fixes this problem:
https://www.redhat.com/archives/libvir-list/2010-November/msg01105.html
Matthias
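The failure mode described above can be illustrated with a toy reference-counting model. This is not libvirt code and does not reflect its actual data structures. `ToyConnection`, the `fixed` flag, and `leaked?` are invented for the illustration. The only behaviour modelled is the one stated in the message: the socket is released only when the close call happens to drop the last reference.

```ruby
# Toy model of the described bug: the "socket" is only released when the
# close call is the one that drops the last reference (unless fixed).
class ToyConnection
  attr_reader :socket_open

  def initialize(fixed)
    @fixed = fixed
    @refs = 1          # reference held by the caller of "open"
    @socket_open = true
  end

  # Looking up a domain takes an extra reference on the connection.
  def lookup_domain
    @refs += 1
  end

  def close
    release(true)
  end

  def free_domain
    release(false)
  end

  private

  def release(from_close)
    @refs -= 1
    return unless @refs.zero?
    # Buggy variant: cleanup is skipped when a free call drops the last ref.
    @socket_open = false if from_close || @fixed
  end
end

# Returns true if the socket is still open after all references are dropped.
def leaked?(fixed, close_first)
  conn = ToyConnection.new(fixed)
  conn.lookup_domain
  if close_first
    conn.close
    conn.free_domain
  else
    conn.free_domain
    conn.close
  end
  conn.socket_open
end

puts leaked?(false, false) # normal order, buggy model: prints "false"
puts leaked?(false, true)  # close first, buggy model:  prints "true"
puts leaked?(true, true)   # close first, fixed model:  prints "false"
```

In the model, the fix amounts to running the cleanup whenever the last reference goes away, regardless of which call dropped it.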
participants (6)
- Chris Lalancette
- Daniel P. Berrange
- Jaromír Červenka
- Matthias Bolte
- Pawel Krzesniak
- Paweł Krześniak