On Fri, Sep 07, 2012 at 01:24:35PM +0100, Daniel P. Berrange wrote:
A nice long detailed explanation. I agree that this scenario you
outline is plausible as an explanation for why Boxes sometimes
stops getting events from libvirtd.
I've ran more tests in the mean time without this patch applied, but
with the one below to add some debugging:
diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c
index 43ecdcf..33d90fb 100644
--- a/src/conf/domain_event.c
+++ b/src/conf/domain_event.c
@@ -1501,7 +1501,13 @@ virDomainEventStateRegisterID(virConnectPtr conn,
int ret = -1;
virDomainEventStateLock(state);
+ VIR_WARN("RegisterID");
+ if ((state->callbacks->count == 0) && (state->timer == -1)) {
+ if (state->queue->count != 0) {
+ VIR_WARN("REG: queue's not empty: %d",
state->queue->count);
+ }
+ }
if ((state->callbacks->count == 0) &&
(state->timer == -1) &&
(state->timer = virEventAddTimeout(-1,
@@ -1584,6 +1590,7 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
{
int ret;
+ VIR_WARN("DeregisterID");
virDomainEventStateLock(state);
if (state->isDispatching)
ret = virDomainEventCallbackListMarkDeleteID(conn,
@@ -1596,6 +1603,9 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
state->timer != -1) {
virEventRemoveTimeout(state->timer);
state->timer = -1;
+ if (state->queue->count != 0) {
+ VIR_WARN("DEREG: queue's not empty: %d",
state->queue->count);
+ }
}
virDomainEventStateUnlock(state);
I've hit the event lost issue once, and right when this started happening,
the log was:
2012-09-06 11:37:06.094+0000: 30498: warning :
virDomainEventStateDeregisterID:1593 : DeregisterID
2012-09-06 11:37:06.094+0000: 30498: warning :
virDomainEventStateDeregisterID:1607 : DEREG: queue's not empty: 1
2012-09-06 11:45:42.363+0000: 30502: warning :
virDomainEventStateRegisterID:1504 : RegisterID
2012-09-06 11:45:42.363+0000: 30502: warning :
virDomainEventStateRegisterID:1508 : REG: queue's not empty: 1
and after that, no events and these warnings kept happening with an
increasing number of queued events which is consistent with the hypothesis I made
in this patch.
Christophe