On Thu, Sep 06, 2012 at 03:06:41PM +0200, Christophe Fergeau wrote:
GNOME Boxes sometimes stops getting domain events from libvirtd,
even
after restarting it. Further investigation in libvirtd shows that
events are properly queued with virDomainEventStateQueue, but the
timer virDomainEventTimer which flushes the events and sends them to
the clients never gets called. Looking at the event queue in gdb
shows that it's non-empty and that its size increases with each new
events.
virDomainEventTimer is set up in virDomainEventStateRegister[ID]
when going from 0 client connecte to 1 client connected, but is
initially disabled. The timer is removed in
virDomainEventStateRegister[ID] when the last client is disconnected
(going from 1 client connected to 0).
This timer (which handles sending the events to the clients) is
enabled in virDomainEventStateQueue when queueing an event on an
empty queue (queue containing 0 events). It's disabled in
virDomainEventStateFlush after flushing the queue (ie removing all
the elements from it). This way, no extra work is done when the queue
is empty, and when the next event comes up, the timer will get
reenabled because the queue will go from 0 event to 1 event, which
triggers enabling the timer.
However, with this Boxes bug, we have a client connected (Boxes), a
non-empty queue (there are events waiting to be sent), but a disabled
timer, so something went wrong.
When Boxes connects (it's the only client connecting to the libvirtd
instance I used for debugging), the event timer is not set as expected
(state->timer == -1 when virDomainEventStateRegisterID is called),
but at the same time the event queue is not empty. In other words,
we had no clients connected, but pending events. This also explains
why the timer never gets enabled as this is only done when an event
is queued on an empty queue.
I think this can happen if an event gets queued using
virDomainEventStateQueue and the client disconnection happens before
the event timer virDomainEventTimer gets a chance to run and flush
the event. In this situation, virDomainEventStateDeregister[ID] will
get called with a non-empty event queue, the timer will be destroyed
if this was the only client connected. Then, when other clients connect
at a later time, they will never get notified about domain events as
the event timer will never get enabled because the timer is only
enabled if the event queue is empty when virDomainEventStateRegister[ID]
gets called, which will is no longer the case.
A nice long detailed explanation. I agree that this scenario you
outline is plausible as an explanation for why Boxes sometimes
stops getting events from libvirtd. It also explains why we don't
see it with VDSM - since they're a long running service, not a
frequently stoppped/started desktop app, they're much less likely
to hit the race.
To avoid this issue, this commit makes sure to remove all events
from
the event queue when the last client in unregistered. As there is
no longer anyone interested in receiving these events, these events
are stale so there is no need to keep them around. A client connecting
later will have no interest in getting events that happened before it
got connected.
Yep, makes sense.
---
src/conf/domain_event.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c
index 43ecdcf..7ab973b 100644
--- a/src/conf/domain_event.c
+++ b/src/conf/domain_event.c
@@ -525,13 +525,13 @@ void virDomainEventFree(virDomainEventPtr event)
}
/**
- * virDomainEventQueueFree:
+ * virDomainEventQueueClear:
* @queue: pointer to the queue
*
- * Free the memory in the queue. We process this like a list here
+ * Removes all elements from the queue
*/
static void
-virDomainEventQueueFree(virDomainEventQueuePtr queue)
+virDomainEventQueueClear(virDomainEventQueuePtr queue)
{
int i;
if (!queue)
@@ -541,6 +541,22 @@ virDomainEventQueueFree(virDomainEventQueuePtr queue)
virDomainEventFree(queue->events[i]);
}
VIR_FREE(queue->events);
+ queue->count = 0;
+}
+
+/**
+ * virDomainEventQueueFree:
+ * @queue: pointer to the queue
+ *
+ * Free the memory in the queue. We process this like a list here
+ */
+static void
+virDomainEventQueueFree(virDomainEventQueuePtr queue)
+{
+ if (!queue)
+ return;
+
+ virDomainEventQueueClear(queue);
VIR_FREE(queue);
}
@@ -1559,6 +1575,7 @@ virDomainEventStateDeregister(virConnectPtr conn,
state->timer != -1) {
virEventRemoveTimeout(state->timer);
state->timer = -1;
+ virDomainEventQueueClear(state->queue);
}
virDomainEventStateUnlock(state);
@@ -1596,6 +1613,7 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
state->timer != -1) {
virEventRemoveTimeout(state->timer);
state->timer = -1;
+ virDomainEventQueueClear(state->queue);
}
virDomainEventStateUnlock(state);
ACK
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|