[libvirt] Python stream callback removal

I'm trying to write an example serial console implementation in python (attached), but I'm having some trouble getting stream events to do what I want. The console itself works fine as long as the domain stays up, but as soon as the domain shuts down the python script goes into a tight loop repeatedly calling the stream event callback. Debugging indicates that the stream event callback is being requested to be removed, but it never actually is removed which makes me think I am not properly releasing some resource, but I was under the impression that an error on a stream resulting in the stream aborting was supposed to free all the resources for me. Is that not correct? Thanks for any help anybody can provide, Dave

On 08/09/2011 06:04 AM, Dave Allan wrote:
I'm trying to write an example serial console implementation in python (attached), but I'm having some trouble getting stream events to do what I want. The console itself works fine as long as the domain stays up, but as soon as the domain shuts down the python script goes into a tight loop repeatedly calling the stream event callback. Debugging indicates that the stream event callback is being requested to be removed, but it never actually is removed which makes me think I am not properly releasing some resource, but I was under the impression that an error on a stream resulting in the stream aborting was supposed to free all the resources for me. Is that not correct?
Thanks for any help anybody can provide, Dave
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list I tried it with rhel6u1 guest on rhel6u1 host, there are two problems that causes the problems.
One is the position of console=tty0 and console=ttyS0, 9600n8 in grub.conf It's ok for me using as follows: kernel /vmlinuz-2.6.32-131.0.15.el6.x86_64 ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_LVM_LV=VolGroup/lv_swap console=tty0 rhgb quiet console=ttyS0,9600n8 The other one is to add a check to the value of "receivedData", if it is "" string, stop the while loop by set run_console to False like it in stdin_callback function The tight loop caused by the continuous calling to virEventRunDefaultImpl() when no data left in stream. I attached a modified copy of your scripts and tested it.

On Mon, Aug 08, 2011 at 06:04:50PM -0400, Dave Allan wrote:
I'm trying to write an example serial console implementation in python (attached), but I'm having some trouble getting stream events to do what I want. The console itself works fine as long as the domain stays up, but as soon as the domain shuts down the python script goes into a tight loop repeatedly calling the stream event callback. Debugging indicates that the stream event callback is being requested to be removed, but it never actually is removed which makes me think I am not properly releasing some resource, but I was under the impression that an error on a stream resulting in the stream aborting was supposed to free all the resources for me. Is that not correct?
No where in your code do you ever invoke eventRemoveCallback. When the stream is "aborted" this just means that libvirtd has released server side resource & reported the error back to the client. You still have to remove your event callbacks otherwise you'll just be invoked forever. See tools/console.c for example code doing what you're attempting, but in C. In particular the places which call virConsoleShutdown. Your code in Python should basically be a straight conversion of tools/console.c from C into Python. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Aug 09, 2011 at 10:59:02AM +0100, Daniel P. Berrange wrote:
On Mon, Aug 08, 2011 at 06:04:50PM -0400, Dave Allan wrote:
I'm trying to write an example serial console implementation in python (attached), but I'm having some trouble getting stream events to do what I want. The console itself works fine as long as the domain stays up, but as soon as the domain shuts down the python script goes into a tight loop repeatedly calling the stream event callback. Debugging indicates that the stream event callback is being requested to be removed, but it never actually is removed which makes me think I am not properly releasing some resource, but I was under the impression that an error on a stream resulting in the stream aborting was supposed to free all the resources for me. Is that not correct?
No where in your code do you ever invoke eventRemoveCallback. When the stream is "aborted" this just means that libvirtd has released server side resource & reported the error back to the client. You still have to remove your event callbacks otherwise you'll just be invoked forever. See tools/console.c for example code doing what you're attempting, but in C. In particular the places which call virConsoleShutdown. Your code in Python should basically be a straight conversion of tools/console.c from C into Python.
Thanks, that was the problem; calling remove fixed it. What I'm trying to do is to write a console that does not exit when the domain is down, and the code is now working, at least for a short while. However, I am seeing a strange behavior. After the domain has been powered off twice--regardless of whether the domain was started or stopped when the console program is started--when starting the domain the next time the console hangs and no callbacks are called. I attached to the process with gdb and the backtraces are very different when the process is responsive vs. when it is ok. Any ideas on what's going wrong? Dave

On Tue, Aug 09, 2011 at 10:28:26PM -0400, Dave Allan wrote:
On Tue, Aug 09, 2011 at 10:59:02AM +0100, Daniel P. Berrange wrote:
On Mon, Aug 08, 2011 at 06:04:50PM -0400, Dave Allan wrote:
I'm trying to write an example serial console implementation in python (attached), but I'm having some trouble getting stream events to do what I want. The console itself works fine as long as the domain stays up, but as soon as the domain shuts down the python script goes into a tight loop repeatedly calling the stream event callback. Debugging indicates that the stream event callback is being requested to be removed, but it never actually is removed which makes me think I am not properly releasing some resource, but I was under the impression that an error on a stream resulting in the stream aborting was supposed to free all the resources for me. Is that not correct?
No where in your code do you ever invoke eventRemoveCallback. When the stream is "aborted" this just means that libvirtd has released server side resource & reported the error back to the client. You still have to remove your event callbacks otherwise you'll just be invoked forever. See tools/console.c for example code doing what you're attempting, but in C. In particular the places which call virConsoleShutdown. Your code in Python should basically be a straight conversion of tools/console.c from C into Python.
Thanks, that was the problem; calling remove fixed it.
What I'm trying to do is to write a console that does not exit when the domain is down, and the code is now working, at least for a short while. However, I am seeing a strange behavior. After the domain has been powered off twice--regardless of whether the domain was started or stopped when the console program is started--when starting the domain the next time the console hangs and no callbacks are called. I attached to the process with gdb and the backtraces are very different when the process is responsive vs. when it is ok.
Any ideas on what's going wrong?
So, after your patches which have greatly improved the console behavior, I find that I'm back to this hang, which by its nature I can't reproduce with virsh console, as it only appears when I've shutdown and started a domain several times within the same connection. The hang is 100% reproducible. Per our IRC conversation, I'm attaching the RPC logs, as well as the python code for reference and a backtrace of the python process at the time that it was hung. Dave

On 08/18/2011 05:55 AM, Dave Allan wrote:
On Tue, Aug 09, 2011 at 10:28:26PM -0400, Dave Allan wrote:
On Tue, Aug 09, 2011 at 10:59:02AM +0100, Daniel P. Berrange wrote:
On Mon, Aug 08, 2011 at 06:04:50PM -0400, Dave Allan wrote:
I'm trying to write an example serial console implementation in python (attached), but I'm having some trouble getting stream events to do what I want. The console itself works fine as long as the domain stays up, but as soon as the domain shuts down the python script goes into a tight loop repeatedly calling the stream event callback. Debugging indicates that the stream event callback is being requested to be removed, but it never actually is removed which makes me think I am not properly releasing some resource, but I was under the impression that an error on a stream resulting in the stream aborting was supposed to free all the resources for me. Is that not correct? No where in your code do you ever invoke eventRemoveCallback. When the stream is "aborted" this just means that libvirtd has released server side resource& reported the error back to the client. You still have to remove your event callbacks otherwise you'll just be invoked forever. See tools/console.c for example code doing what you're attempting, but in C. In particular the places which call virConsoleShutdown. Your code in Python should basically be a straight conversion of tools/console.c from C into Python. Thanks, that was the problem; calling remove fixed it.
What I'm trying to do is to write a console that does not exit when the domain is down, and the code is now working, at least for a short while. However, I am seeing a strange behavior. After the domain has been powered off twice--regardless of whether the domain was started or stopped when the console program is started--when starting the domain the next time the console hangs and no callbacks are called. I attached to the process with gdb and the backtraces are very different when the process is responsive vs. when it is ok.
Any ideas on what's going wrong? So, after your patches which have greatly improved the console behavior, I find that I'm back to this hang, which by its nature I can't reproduce with virsh console, as it only appears when I've shutdown and started a domain several times within the same connection. The hang is 100% reproducible. Per our IRC conversation, I'm attaching the RPC logs, as well as the python code for reference and a backtrace of the python process at the time that it was hung.
Dave
I can produce the problem, so I did an research on this. According to the libvirtd log, it hangs because when the domain boot up at the second time, the libvirtd send a message to python scripts due to the lifecycle_callback setting, meanwhile setting the socket fd of the client to "mode=0", that means neither readable or writable on the libvirtd side. So when the python scripts got the lifecycle event and trys to call virDomainGetState() in the command of openning console, after it sent the message to libvirtd, it hanged and never get the response. libvirtd log <snippet> 22:37:47.795: 8856: debug : remoteRelayDomainEventLifecycle:129 : Relaying domain lifecycle event 2 0 22:37:47.795: 8856: debug : virNetMessageNew:44 : msg=0x2306f20 22:37:47.795: 8856: debug : virNetMessageEncodePayload:256 : Encode length as 68 22:37:47.795: 8856: debug : remoteDispatchDomainEventSend:2516 : Queue event 107 68 22:37:47.795: 8856: debug : virNetMessageFree:57 : msg=0x2306f20 22:37:47.795: 8856: debug : virNetServerClientCalculateHandleMode:130 : tls=(nil) hs=-1, rx=(nil) tx=(nil) 22:37:47.795: 8856: debug : virNetServerClientCalculateHandleMode:160 : mode=0 <snippet/>

On Wed, Aug 24, 2011 at 11:58:29PM +0800, Guannan Ren wrote:
On 08/18/2011 05:55 AM, Dave Allan wrote:
So, after your patches which have greatly improved the console behavior, I find that I'm back to this hang, which by its nature I can't reproduce with virsh console, as it only appears when I've shutdown and started a domain several times within the same connection. The hang is 100% reproducible. Per our IRC conversation, I'm attaching the RPC logs, as well as the python code for reference and a backtrace of the python process at the time that it was hung.
Dave
I can produce the problem, so I did an research on this. According to the libvirtd log, it hangs because when the domain boot up at the second time, the libvirtd send a message to python scripts due to the lifecycle_callback setting, meanwhile setting the socket fd of the client to "mode=0", that means neither readable or writable on the libvirtd side. So when the python scripts got the lifecycle event and trys to call virDomainGetState() in the command of openning console, after it sent the message to libvirtd, it hanged and never get the response.
The problem is that before decrementing 'client->nrequests' we check what the message type/status is. The check is incorrect for streams, because it failed to take account of the fact that some stream errors may be asynchronous and thus untracked. This in turn caused the 'nrequests' variable to go negative. A fix which worked for me is here https://www.redhat.com/archives/libvir-list/2011-August/msg01518.html Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The value of nrequests should always be zero or positive but in the case of stream, when error occurrs, it could be negative and makes the client socket fd neither writable nor readable. The case will lead to hang on libvirt client side. This patch aim to fix the problem. --- src/rpc/virnetserverclient.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/src/rpc/virnetserverclient.c b/src/rpc/virnetserverclient.c index a73b06d..882ba77 100644 --- a/src/rpc/virnetserverclient.c +++ b/src/rpc/virnetserverclient.c @@ -78,8 +78,8 @@ struct _virNetServerClient * ie RPC calls in progress. Does not count * async events which are not used for * throttling calculations */ - size_t nrequests; - size_t nrequests_max; + ssize_t nrequests; + ssize_t nrequests_max; /* Zero or one messages being received. Zero if * nrequests >= max_clients and throttling */ virNetMessagePtr rx; @@ -889,6 +889,10 @@ virNetServerClientDispatchWrite(virNetServerClientPtr client) (msg->header.type == VIR_NET_STREAM && msg->header.status != VIR_NET_CONTINUE)) { client->nrequests--; + + if (client->nrequests < 0) + client->nrequests = 0; + /* See if the recv queue is currently throttled */ if (!client->rx && client->nrequests < client->nrequests_max) { -- 1.7.1

On Thu, Aug 25, 2011 at 10:07:36PM +0800, Guannan Ren wrote:
The value of nrequests should always be zero or positive but in the case of stream, when error occurrs, it could be negative and makes the client socket fd neither writable nor readable. The case will lead to hang on libvirt client side. This patch aim to fix the problem.
What is the scenario you use to demonstrate the original problem ? IMHO the fix below is not right, because it is merely patching up the problems caused by the bug, rather than actually fixing the bug.
--- src/rpc/virnetserverclient.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/rpc/virnetserverclient.c b/src/rpc/virnetserverclient.c index a73b06d..882ba77 100644 --- a/src/rpc/virnetserverclient.c +++ b/src/rpc/virnetserverclient.c @@ -78,8 +78,8 @@ struct _virNetServerClient * ie RPC calls in progress. Does not count * async events which are not used for * throttling calculations */ - size_t nrequests; - size_t nrequests_max; + ssize_t nrequests; + ssize_t nrequests_max; /* Zero or one messages being received. Zero if * nrequests >= max_clients and throttling */ virNetMessagePtr rx; @@ -889,6 +889,10 @@ virNetServerClientDispatchWrite(virNetServerClientPtr client) (msg->header.type == VIR_NET_STREAM && msg->header.status != VIR_NET_CONTINUE)) { client->nrequests--; + + if (client->nrequests < 0) + client->nrequests = 0; + /* See if the recv queue is currently throttled */ if (!client->rx && client->nrequests < client->nrequests_max) {
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/25/2011 11:36 PM, Daniel P. Berrange wrote:
On Thu, Aug 25, 2011 at 10:07:36PM +0800, Guannan Ren wrote:
The value of nrequests should always be zero or positive but in the case of stream, when error occurrs, it could be negative and makes the client socket fd neither writable nor readable. The case will lead to hang on libvirt client side. This patch aim to fix the problem. What is the scenario you use to demonstrate the original problem ?
IMHO the fix below is not right, because it is merely patching up the problems caused by the bug, rather than actually fixing the bug.
The scenarios is just like the python scripts did, sometimes maybe we need to monitor and control the stream all the time without quit on the client side, other than quit like what libvirt-client did when guest was off. So I think libvirtd should support it. According the log and code, I noticed when the guest is shutting down, on the server side, the libvirtd will receive two messages from the pty opened fd, one is the null string, the other is an error message.(sometime, only null string, without error message the case happened on one of boxes ). The null string is with type VIR_NET_STREAM and status VIR_NET_CONTINUE, but the error message is of type VIR_NET_STREAM and status VIR_NET_ERROR that this will lead to client->nrequests-- in virNetServerClientDispatchWrite(). so when the guest shut down at the second time, the value of nrequests will become -1 on the assumption that only one thread connecting on client side at that time. The negative value will pass over the value assignment to client->rx and lead the mode of socket fd to zero, the client side hangs. One the libvirt-client side, when it meets the null string, it calls virConsoleShutdown() then quit and ignore the error message. I don't know if it is reasonable to fix the problem, any idea is appreciated. Guannan Ren
participants (3)
-
Daniel P. Berrange
-
Dave Allan
-
Guannan Ren