On 08/18/2011 05:55 AM, Dave Allan wrote:
On Tue, Aug 09, 2011 at 10:28:26PM -0400, Dave Allan wrote:
> On Tue, Aug 09, 2011 at 10:59:02AM +0100, Daniel P. Berrange wrote:
>> On Mon, Aug 08, 2011 at 06:04:50PM -0400, Dave Allan wrote:
>>> I'm trying to write an example serial console implementation in python
>>> (attached), but I'm having some trouble getting stream events to do
>>> what I want. The console itself works fine as long as the domain
>>> stays up, but as soon as the domain shuts down the python script goes
>>> into a tight loop repeatedly calling the stream event callback.
>>> Debugging indicates that the stream event callback is being requested
>>> to be removed, but it never actually is removed which makes me think I
>>> am not properly releasing some resource, but I was under the
>>> impression that an error on a stream resulting in the stream aborting
>>> was supposed to free all the resources for me. Is that not correct?
>> Nowhere in your code do you ever invoke eventRemoveCallback.
>> When the stream is "aborted" this just means that libvirtd
>> has released server-side resources and reported the error back
>> to the client. You still have to remove your event callbacks
>> otherwise you'll just be invoked forever. See tools/console.c
>> for example code doing what you're attempting, but in C.
>> In particular the places which call virConsoleShutdown.
>> Your code in Python should basically be a straight conversion
>> of tools/console.c from C into Python.
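The sketch below shows this pattern in Python. It assumes the libvirt-python stream methods eventAddCallback(), eventRemoveCallback(), recv() and abort(), and a callback of the form cb(stream, events, opaque). Console, FakeStream and run_fake_loop are hypothetical stand-ins, not libvirt API. They let the sketch run without libvirtd. The fake stream keeps reporting HANGUP once its data runs out, so the loop stops only because the callback removes itself.

```python
# Minimal sketch, assuming the libvirt-python stream API
# (eventAddCallback / eventRemoveCallback / recv / abort).
# Console, FakeStream and run_fake_loop are hypothetical helpers.

# Bit values of libvirt's virStreamEventType (libvirt.VIR_STREAM_EVENT_*).
VIR_STREAM_EVENT_READABLE = 1 << 0
VIR_STREAM_EVENT_ERROR = 1 << 2
VIR_STREAM_EVENT_HANGUP = 1 << 3


class Console:
    """Hypothetical console state, loosely modelled on tools/console.c."""

    def __init__(self, stream):
        self.stream = stream
        self.output = bytearray()
        self.active = True

    def shutdown(self):
        # Same idea as virConsoleShutdown(): remove the event callback,
        # then abort the stream. Skipping eventRemoveCallback() leaves
        # the callback registered, so the loop keeps invoking it.
        if self.stream is not None:
            self.stream.eventRemoveCallback()
            self.stream.abort()
            self.stream = None
        self.active = False


def stream_event_cb(stream, events, console):
    """Stream callback with the cb(stream, events, opaque) shape."""
    if events & (VIR_STREAM_EVENT_ERROR | VIR_STREAM_EVENT_HANGUP):
        console.shutdown()
        return
    if events & VIR_STREAM_EVENT_READABLE:
        data = stream.recv(1024)
        if data == -2:  # libvirt-python's "no data yet" on non-blocking streams
            return
        if not data:  # end of stream
            console.shutdown()
            return
        console.output += data


class FakeStream:
    """Reports READABLE while it has data, then HANGUP, like a dead domain."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.callback = None
        self.aborted = False

    def eventAddCallback(self, events, cb, opaque):
        self.callback = (cb, opaque)

    def eventRemoveCallback(self):
        self.callback = None

    def abort(self):
        self.aborted = True

    def recv(self, nbytes):
        return self.chunks.pop(0)[:nbytes]

    def pending_events(self):
        if self.chunks:
            return VIR_STREAM_EVENT_READABLE
        return VIR_STREAM_EVENT_HANGUP


def run_fake_loop(stream, max_dispatches=100):
    """Dispatch while a callback is registered; return the dispatch count."""
    dispatches = 0
    while stream.callback is not None and dispatches < max_dispatches:
        cb, opaque = stream.callback
        cb(stream, stream.pending_events(), opaque)
        dispatches += 1
    return dispatches


stream = FakeStream([b"login: ", b"\r\n"])
console = Console(stream)
stream.eventAddCallback(VIR_STREAM_EVENT_READABLE, stream_event_cb, console)
dispatches = run_fake_loop(stream)
print(dispatches, bytes(console.output))
```

Now comment out the eventRemoveCallback() call in shutdown(). run_fake_loop() then spins until max_dispatches, which is the tight loop described at the start of the thread.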
> Thanks, that was the problem; calling remove fixed it.
>
> What I'm trying to do is to write a console that does not exit when
> the domain is down, and the code is now working, at least for a short
> while. However, I am seeing a strange behavior. After the domain has
> been powered off twice--regardless of whether the domain was started
> or stopped when the console program is started--when starting the
> domain the next time the console hangs and no callbacks are called. I
> attached to the process with gdb and the backtraces are very different
> when the process is unresponsive vs. when it is ok.
>
> Any ideas on what's going wrong?
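A console that does not exit when the domain is down can be sketched as a small state machine driven by lifecycle events. This is a hypothetical outline, not the attached script. PersistentConsole, open_stream and close_stream are illustrative names. Only the event numbers (STARTED = 2, STOPPED = 5) come from libvirt's virDomainEventType. In libvirt-python, on_lifecycle() would typically be called from a callback registered with conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, cb, opaque). That callback has the signature cb(conn, dom, event, detail, opaque).

```python
# Minimal sketch of a console that stays up across domain restarts.
# Event IDs follow libvirt's virDomainEventType (libvirt.VIR_DOMAIN_EVENT_*).
import itertools

VIR_DOMAIN_EVENT_STARTED = 2
VIR_DOMAIN_EVENT_STOPPED = 5


class PersistentConsole:
    """Hypothetical helper; open_stream/close_stream are placeholders.

    In a real script open_stream would create a stream and attach it to
    the domain console, and close_stream would call eventRemoveCallback()
    and abort() on it.
    """

    def __init__(self, open_stream, close_stream):
        self.open_stream = open_stream
        self.close_stream = close_stream
        self.stream = None

    def on_lifecycle(self, event, detail):
        # The connection and this object outlive any single domain run;
        # only the console stream is torn down and recreated.
        if event == VIR_DOMAIN_EVENT_STARTED and self.stream is None:
            self.stream = self.open_stream()
        elif event == VIR_DOMAIN_EVENT_STOPPED and self.stream is not None:
            self.close_stream(self.stream)
            self.stream = None


actions = []
ids = itertools.count()


def open_stream():
    stream_id = next(ids)
    actions.append(f"open {stream_id}")
    return stream_id


def close_stream(stream_id):
    actions.append(f"close {stream_id}")


console = PersistentConsole(open_stream, close_stream)
# Two power cycles and a third boot: the sequence after which the hang is reported.
for event in (VIR_DOMAIN_EVENT_STARTED, VIR_DOMAIN_EVENT_STOPPED,
              VIR_DOMAIN_EVENT_STARTED, VIR_DOMAIN_EVENT_STOPPED,
              VIR_DOMAIN_EVENT_STARTED):
    console.on_lifecycle(event, 0)
print(actions)
```

Here the stubs never block. In the real script, the third open_stream() (the console-opening RPCs) is where the process stops responding.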
So, after your patches which have greatly improved the console
behavior, I find that I'm back to this hang, which by its nature I
can't reproduce with virsh console, as it only appears when I've
shut down and started a domain several times within the same
connection. The hang is 100% reproducible. Per our IRC conversation,
I'm attaching the RPC logs, as well as the python code for reference
and a backtrace of the python process at the time that it was hung.
Dave
I can reproduce the problem, so I did some research on it.
According to the libvirtd log, it hangs because, when the domain
boots up for the second time, libvirtd sends a message to the python
script (due to the lifecycle_callback setting) and at the same time
sets the client's socket fd to "mode=0", which means it is neither
readable nor writable on the libvirtd side.
So when the python script gets the lifecycle event and tries to call
virDomainGetState() while opening the console, it sends the request
to libvirtd, then hangs and never gets a response.
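The effect of mode=0 can be illustrated with a simplified Python model of the value that virNetServerClientCalculateHandleMode reports in the log excerpt below. This is an assumption-laden sketch, not libvirt's code. It is built only from the fields that function logs (tls, hs, rx, tx). It ignores the TLS handshake path, since tls=(nil) here. It treats rx as "a buffer is posted for the next request" and tx as "something is queued to send". calculate_handle_mode and client_request_is_read are hypothetical names.

```python
# Simplified model of the handle-mode calculation, based only on the
# fields it logs; TLS is ignored because tls=(nil) in this log.
# Bit values follow libvirt's virEventHandleType.
VIR_EVENT_HANDLE_READABLE = 1 << 0
VIR_EVENT_HANDLE_WRITABLE = 1 << 1


def calculate_handle_mode(rx, tx):
    """rx: buffer for the next incoming request, or None if none is posted.
    tx: outgoing message queue (replies, events), or None if empty."""
    mode = 0
    if rx is not None:
        mode |= VIR_EVENT_HANDLE_READABLE  # willing to read a new request
    if tx is not None:
        mode |= VIR_EVENT_HANDLE_WRITABLE  # has something to send
    return mode


def client_request_is_read(mode):
    """The server only reads the client's socket when READABLE is set."""
    return bool(mode & VIR_EVENT_HANDLE_READABLE)


# State from the log: rx=(nil) tx=(nil)
stuck_mode = calculate_handle_mode(rx=None, tx=None)
# Assumed normal idle state: a receive buffer is posted, nothing to send.
idle_mode = calculate_handle_mode(rx=bytearray(), tx=None)
print(stuck_mode, client_request_is_read(stuck_mode))
print(idle_mode, client_request_is_read(idle_mode))
```

With rx and tx both empty, the mode is 0. libvirtd therefore never polls the socket for the virDomainGetState() request, and the python side waits for a reply that never arrives, as described above.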
libvirtd log
<snippet>
22:37:47.795: 8856: debug : remoteRelayDomainEventLifecycle:129 : Relaying domain lifecycle event 2 0
22:37:47.795: 8856: debug : virNetMessageNew:44 : msg=0x2306f20
22:37:47.795: 8856: debug : virNetMessageEncodePayload:256 : Encode length as 68
22:37:47.795: 8856: debug : remoteDispatchDomainEventSend:2516 : Queue event 107 68
22:37:47.795: 8856: debug : virNetMessageFree:57 : msg=0x2306f20
22:37:47.795: 8856: debug : virNetServerClientCalculateHandleMode:130 : tls=(nil) hs=-1, rx=(nil) tx=(nil)
22:37:47.795: 8856: debug : virNetServerClientCalculateHandleMode:160 : mode=0
</snippet>
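The first line of the excerpt carries the raw lifecycle event and detail codes. The small decoder below cross-checks that this is the domain boot described above. Its numeric tables are transcribed from libvirt's virDomainEventType, virDomainEventStartedDetailType and virDomainEventStoppedDetailType enums. They cover only the values relevant here. decode_lifecycle is a hypothetical helper.

```python
import re

# virDomainEventType values (VIR_DOMAIN_EVENT_* in libvirt.h).
EVENT_NAMES = {0: "DEFINED", 1: "UNDEFINED", 2: "STARTED",
               3: "SUSPENDED", 4: "RESUMED", 5: "STOPPED"}
# Detail enums for the STARTED and STOPPED events.
DETAIL_NAMES = {
    2: {0: "BOOTED", 1: "MIGRATED", 2: "RESTORED"},
    5: {0: "SHUTDOWN", 1: "DESTROYED", 2: "CRASHED",
        3: "MIGRATED", 4: "SAVED", 5: "FAILED"},
}
RELAY_RE = re.compile(r"Relaying domain lifecycle event (\d+) (\d+)")


def decode_lifecycle(line):
    """Return (event, detail) names for a relay debug line, else None."""
    match = RELAY_RE.search(line)
    if match is None:
        return None
    event, detail = int(match.group(1)), int(match.group(2))
    event_name = EVENT_NAMES.get(event, f"UNKNOWN({event})")
    detail_name = DETAIL_NAMES.get(event, {}).get(detail, str(detail))
    return event_name, detail_name


line = ("22:37:47.795: 8856: debug : remoteRelayDomainEventLifecycle:129 : "
        "Relaying domain lifecycle event 2 0")
print(decode_lifecycle(line))  # prints ('STARTED', 'BOOTED')
```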