On Mon, Oct 10, 2016 at 04:03:32PM +0800, Michal Privoznik wrote:
> On 07.10.2016 19:52, Martin Kletzander wrote:
>> I don't have any justification for this except an empirical one: with
>> this patch the code from Bug 1363628 doesn't crash after
"leaking".
>>
>> I currently don't have properly working valgrind, but I'm working on it
>> and it might shed some light into this (although it might also not
>> happen due to the slowdown).
>>
>> However in the meantime I attempted some analysis and I got even more
>> confused than before, I guess. With this patch the code doesn't crash,
>> even though virObjectEventStateQueueDispatch() properly skips callbacks
>> marked as deleted. What I feel is even more weird, that if I duplicate
>> the purgatory function (instead of moving it), it crashes again, and I
>> feel like even more often.
>>
>> Weirdly-fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1363628
>>
>> Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
>> ---
>> src/conf/object_event.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/conf/object_event.c b/src/conf/object_event.c
>> index e5af4be68a7e..4066b4673b42 100644
>> --- a/src/conf/object_event.c
>> +++ b/src/conf/object_event.c
>> @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr
>> state)
>> if (state->timer != -1)
>> virEventUpdateTimeout(state->timer, -1);
>>
>> + /* Purge any deleted callbacks */
>> + virObjectEventCallbackListPurgeMarked(state->callbacks);
>> +
>> virObjectEventStateQueueDispatch(state,
>> &tempQueue,
>> state->callbacks);
>>
>> - /* Purge any deleted callbacks */
>> - virObjectEventCallbackListPurgeMarked(state->callbacks);
>> -
>> state->isDispatching = false;
>> virObjectEventStateUnlock(state);
>> }
>
> Well, I have no idea either, because virObjectEventStateQueueDispatch()
> calls virObjectEventStateDispatchCallbacks() which in turn calls
> virObjectEventDispatchMatchCallback() which returns false if the
> callback->deleted is truem and thus the callback is skipped - just like
> it is after this patch of yours.
>
> This whole bug feels like a refcounting problem and IMO your patch is
> just making the scenario more unlikely.
>
So I've found out what it is thanks to Luayo Huang. And if I understand
it correctly, it won't happen with stateful drivers. Stateless,
however, will have this problem.
The problem is that testConnectClose() clears the driver structure,
including calling virObjectEventStateFree() on the eventState. It
clears the whole eventState, but it can happen while the eventState is
dispatching. So there's the race. And it would also explain what I was
seeing all the time. qemuCleanup() is the only function in qemu driver
that would cleanup the eventState and since that gets called from the
daemon only after other stuff is cleaned up, it won't happen there.
Ah, yeah. I can see that in the code now too. Good job!
I'll try turning virObjectEventStatePtr into virObject, we'll
have
refcounting then and it will not disappear until it's stopped being
used. I'll see if that helps or not, but it could.
Yep, this is classical example of why we need refcounted objects.
Michal