On 07.09.2012 14:47, Daniel P. Berrange wrote:
On Fri, Sep 07, 2012 at 02:44:03PM +0200, Christophe Fergeau wrote:
> On Fri, Sep 07, 2012 at 01:24:35PM +0100, Daniel P. Berrange wrote:
>> A nice long detailed explanation. I agree that this scenario you
>> outline is plausible as an explanation for why Boxes sometimes
>> stops getting events from libvirtd.
>
> I've ran more tests in the mean time without this patch applied, but
> with the one below to add some debugging:
>
> diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c
> index 43ecdcf..33d90fb 100644
> --- a/src/conf/domain_event.c
> +++ b/src/conf/domain_event.c
> @@ -1501,7 +1501,13 @@ virDomainEventStateRegisterID(virConnectPtr conn,
> int ret = -1;
>
> virDomainEventStateLock(state);
> + VIR_WARN("RegisterID");
[1]
>
> + if ((state->callbacks->count == 0) && (state->timer == -1)) {
> + if (state->queue->count != 0) {
> + VIR_WARN("REG: queue's not empty: %d",
state->queue->count);
I understand the WARN level here but not in [1]. Isn't DEBUG just enough
there?
> + }
> + }
> if ((state->callbacks->count == 0) &&
> (state->timer == -1) &&
> (state->timer = virEventAddTimeout(-1,
> @@ -1584,6 +1590,7 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
> {
> int ret;
>
> + VIR_WARN("DeregisterID");
ditto
> virDomainEventStateLock(state);
> if (state->isDispatching)
> ret = virDomainEventCallbackListMarkDeleteID(conn,
> @@ -1596,6 +1603,9 @@ virDomainEventStateDeregisterID(virConnectPtr conn,
> state->timer != -1) {
> virEventRemoveTimeout(state->timer);
> state->timer = -1;
> + if (state->queue->count != 0) {
> + VIR_WARN("DEREG: queue's not empty: %d",
state->queue->count);
> + }
> }
>
> virDomainEventStateUnlock(state);
>
>
> I've hit the event lost issue once, and right when this started happening,
> the log was:
> 2012-09-06 11:37:06.094+0000: 30498: warning :
> virDomainEventStateDeregisterID:1593 : DeregisterID
> 2012-09-06 11:37:06.094+0000: 30498: warning :
> virDomainEventStateDeregisterID:1607 : DEREG: queue's not empty: 1
> 2012-09-06 11:45:42.363+0000: 30502: warning :
> virDomainEventStateRegisterID:1504 : RegisterID
> 2012-09-06 11:45:42.363+0000: 30502: warning :
> virDomainEventStateRegisterID:1508 : REG: queue's not empty: 1
>
> and after that, no events and these warnings kept happening with an
> increasing number of queued events which is consistent with the hypothesis I made
> in this patch.
Great, that's pretty encouraging then.
Daniel