On 01/01/2014 03:44 PM, Eric Blake wrote:
In my recent work on events, I'm running into something where
I'm
stumped on timing behavior. I'm using
virEventRegisterDefaultImpl()/virEventRunDefaultImpl() as my event loop,
and trying to test that an event is firing correctly.
Ping. I'm not seeing the delays in my 'virsh event' patches, where the
event loop is being handled in a separate thread; so the problem may
only be in situations where the event loop is being drained by a single
thread (pointing to a problem in libvirt.so and not in the server). But
I'm still stumped on how to pinpoint what the problem is.
With the test:///default (or any other local URI), if I register an
event callback function, the callback gets invoked almost right after
the event happens. This is the same behavior under test in our
testsuite (in objecteventtest.c). But with qemu:///system (or any other
remote URI), when I register an event callback function, my callback
does not get invoked until I make some other RPC call on the connection.
At first, I thought the events were not being wired up at all, and that
I was just blocking forever because I missed something, but while
stepping through gdb on the client side, I noticed that calling
virConnectDomainEventDeregisterAny() in my shutdown code then indeed
proceeded to go through the entire backlog of events that had been
generated, but at that point I was no longer registered so my callback
was not called. So I ended up using virEventAddTimeout() to set up a
periodic interrupt (at 10 seconds, slow enough to see the delays), where
in the handler I called virConnectGetLibVersion() as a way to force RPC
traffic and at least get events as a result of my timer. If it matters,
I was initially using keepalive_interval=-1 in
/etc/libvirt/libvirtd.conf in order to avoid killing my debug sessions;
but even when I switched to keepalive_interval=3, I'm still seeing the
same delayed behavior (if the keepalive were to make a difference, I
would have expected seeing the event at 3 seconds instead of my timer at
10 seconds).
I thought that the RPC protocol was supposed to allow asynchronous
events, and with minimal downtime. gdb in libvirtd says that
remoteDispatchObjectEventSend() is called in a timely manner, but that
merely queues things up, and I got lost trying to follow where
virNetSocketUpdateIOCallback() turned into something that wakes up the
actual write() to the socket. On the other side, I don't see a call to
virDomainEventDispatchDefaultFunc() which then calls my callback until
the next RPC call/response, but again, I got lost on trying to trace
whether a read() of the socket was getting data in a timely manner but
just not firing anything in my event loop, or whether things were
getting stuck client side after the read but with nothing waking up my
event loop. Any advice on how to better debug where things are getting
stuck, to determine whether the holdup is in the server not sending or
in the client not being responsive to the send? Is it something where
systemtap might give me a better picture? Or better yet, a patch to fix
the problem would be nice.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org