
On 07.09.2012 14:47, Daniel P. Berrange wrote:
On Fri, Sep 07, 2012 at 02:44:03PM +0200, Christophe Fergeau wrote:
On Fri, Sep 07, 2012 at 01:24:35PM +0100, Daniel P. Berrange wrote:
A nice long detailed explanation. I agree that this scenario you outline is plausible as an explanation for why Boxes sometimes stops getting events from libvirtd.
I've ran more tests in the mean time without this patch applied, but with the one below to add some debugging:
diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 43ecdcf..33d90fb 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -1501,7 +1501,13 @@ virDomainEventStateRegisterID(virConnectPtr conn, int ret = -1;
virDomainEventStateLock(state); + VIR_WARN("RegisterID");
[1]
+ if ((state->callbacks->count == 0) && (state->timer == -1)) { + if (state->queue->count != 0) { + VIR_WARN("REG: queue's not empty: %d", state->queue->count);
I understand the WARN level here but not in [1]. Isn't DEBUG just enough there?
+ } + } if ((state->callbacks->count == 0) && (state->timer == -1) && (state->timer = virEventAddTimeout(-1, @@ -1584,6 +1590,7 @@ virDomainEventStateDeregisterID(virConnectPtr conn, { int ret;
+ VIR_WARN("DeregisterID");
ditto
virDomainEventStateLock(state); if (state->isDispatching) ret = virDomainEventCallbackListMarkDeleteID(conn, @@ -1596,6 +1603,9 @@ virDomainEventStateDeregisterID(virConnectPtr conn, state->timer != -1) { virEventRemoveTimeout(state->timer); state->timer = -1; + if (state->queue->count != 0) { + VIR_WARN("DEREG: queue's not empty: %d", state->queue->count); + } }
virDomainEventStateUnlock(state);
I've hit the event lost issue once, and right when this started happening, the log was: 2012-09-06 11:37:06.094+0000: 30498: warning : virDomainEventStateDeregisterID:1593 : DeregisterID 2012-09-06 11:37:06.094+0000: 30498: warning : virDomainEventStateDeregisterID:1607 : DEREG: queue's not empty: 1 2012-09-06 11:45:42.363+0000: 30502: warning : virDomainEventStateRegisterID:1504 : RegisterID 2012-09-06 11:45:42.363+0000: 30502: warning : virDomainEventStateRegisterID:1508 : REG: queue's not empty: 1
and after that, no events and these warnings kept happening with an increasing number of queued events which is consistent with the hypothesis I made in this patch.
Great, that's pretty encouraging then.
Daniel