[libvirt] [PATCH 0/2] Maybe fix a crash and some unnecessary logging

First patch fixes some unnecessary loging I've found out thanks to Vincent's crash reproducer. The second one is an RFC, I would like to know what others think about and can find (and maybe help). More info in that commit message. Martin Kletzander (2): Don't update timer if there's none. Purge marked callbacks before dispatching events src/conf/object_event.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) -- 2.10.1

Sometimes virObjectEventStateFlush can be called without timer (if the last event was unregistered right when the timer fired). There is a check for timer == -1, but that triggers warning and other log messages, which is unnecessary. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/object_event.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/conf/object_event.c b/src/conf/object_event.c index b5a6a81a3a04..e5af4be68a7e 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -818,7 +818,8 @@ virObjectEventStateFlush(virObjectEventStatePtr state) tempQueue.events = state->queue->events; state->queue->count = 0; state->queue->events = NULL; - virEventUpdateTimeout(state->timer, -1); + if (state->timer != -1) + virEventUpdateTimeout(state->timer, -1); virObjectEventStateQueueDispatch(state, &tempQueue, -- 2.10.1

On 07.10.2016 19:52, Martin Kletzander wrote:
Sometimes virObjectEventStateFlush can be called without timer (if the last event was unregistered right when the timer fired). There is a check for timer == -1, but that triggers warning and other log messages, which is unnecessary.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/object_event.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/conf/object_event.c b/src/conf/object_event.c index b5a6a81a3a04..e5af4be68a7e 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -818,7 +818,8 @@ virObjectEventStateFlush(virObjectEventStatePtr state) tempQueue.events = state->queue->events; state->queue->count = 0; state->queue->events = NULL; - virEventUpdateTimeout(state->timer, -1); + if (state->timer != -1) + virEventUpdateTimeout(state->timer, -1);
virObjectEventStateQueueDispatch(state, &tempQueue,
ACK. Michal

I don't have any justification for this except an empirical one: with this patch the code from Bug 1363628 doesn't crash after "leaking". I currently don't have properly working valgrind, but I'm working on it and it might shed some light into this (although it might also not happen due to the slowdown). However in the meantime I attempted some analysis and I got even more confused than before, I guess. With this patch the code doesn't crash, even though virObjectEventStateQueueDispatch() properly skips callbacks marked as deleted. What I feel is even more weird, that if I duplicate the purgatory function (instead of moving it), it crashes again, and I feel like even more often. Weirdly-fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1363628 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/object_event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/conf/object_event.c b/src/conf/object_event.c index e5af4be68a7e..4066b4673b42 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr state) if (state->timer != -1) virEventUpdateTimeout(state->timer, -1); + /* Purge any deleted callbacks */ + virObjectEventCallbackListPurgeMarked(state->callbacks); + virObjectEventStateQueueDispatch(state, &tempQueue, state->callbacks); - /* Purge any deleted callbacks */ - virObjectEventCallbackListPurgeMarked(state->callbacks); - state->isDispatching = false; virObjectEventStateUnlock(state); } -- 2.10.1

On 07.10.2016 19:52, Martin Kletzander wrote:
I don't have any justification for this except an empirical one: with this patch the code from Bug 1363628 doesn't crash after "leaking".
I currently don't have properly working valgrind, but I'm working on it and it might shed some light into this (although it might also not happen due to the slowdown).
However in the meantime I attempted some analysis and I got even more confused than before, I guess. With this patch the code doesn't crash, even though virObjectEventStateQueueDispatch() properly skips callbacks marked as deleted. What I feel is even more weird, that if I duplicate the purgatory function (instead of moving it), it crashes again, and I feel like even more often.
Weirdly-fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1363628
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/object_event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/conf/object_event.c b/src/conf/object_event.c index e5af4be68a7e..4066b4673b42 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr state) if (state->timer != -1) virEventUpdateTimeout(state->timer, -1);
+ /* Purge any deleted callbacks */ + virObjectEventCallbackListPurgeMarked(state->callbacks); + virObjectEventStateQueueDispatch(state, &tempQueue, state->callbacks);
- /* Purge any deleted callbacks */ - virObjectEventCallbackListPurgeMarked(state->callbacks); - state->isDispatching = false; virObjectEventStateUnlock(state); }
Well, I have no idea either, because virObjectEventStateQueueDispatch() calls virObjectEventStateDispatchCallbacks() which in turn calls virObjectEventDispatchMatchCallback() which returns false if the callback->deleted is truem and thus the callback is skipped - just like it is after this patch of yours. This whole bug feels like a refcounting problem and IMO your patch is just making the scenario more unlikely. Michal

On Mon, Oct 10, 2016 at 04:03:32PM +0800, Michal Privoznik wrote:
On 07.10.2016 19:52, Martin Kletzander wrote:
I don't have any justification for this except an empirical one: with this patch the code from Bug 1363628 doesn't crash after "leaking".
I currently don't have properly working valgrind, but I'm working on it and it might shed some light into this (although it might also not happen due to the slowdown).
However in the meantime I attempted some analysis and I got even more confused than before, I guess. With this patch the code doesn't crash, even though virObjectEventStateQueueDispatch() properly skips callbacks marked as deleted. What I feel is even more weird, that if I duplicate the purgatory function (instead of moving it), it crashes again, and I feel like even more often.
Weirdly-fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1363628
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/object_event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/conf/object_event.c b/src/conf/object_event.c index e5af4be68a7e..4066b4673b42 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr state) if (state->timer != -1) virEventUpdateTimeout(state->timer, -1);
+ /* Purge any deleted callbacks */ + virObjectEventCallbackListPurgeMarked(state->callbacks); + virObjectEventStateQueueDispatch(state, &tempQueue, state->callbacks);
- /* Purge any deleted callbacks */ - virObjectEventCallbackListPurgeMarked(state->callbacks); - state->isDispatching = false; virObjectEventStateUnlock(state); }
Well, I have no idea either, because virObjectEventStateQueueDispatch() calls virObjectEventStateDispatchCallbacks() which in turn calls virObjectEventDispatchMatchCallback() which returns false if the callback->deleted is truem and thus the callback is skipped - just like it is after this patch of yours.
This whole bug feels like a refcounting problem and IMO your patch is just making the scenario more unlikely.
So I've found out what it is thanks to Luayo Huang. And if I understand it correctly, it won't happen with stateful drivers. Stateless, however, will have this problem. The problem is that testConnectClose() clears the driver structure, including calling virObjectEventStateFree() on the eventState. It clears the whole eventState, but it can happen while the eventState is dispatching. So there's the race. And it would also explain what I was seeing all the time. qemuCleanup() is the only function in qemu driver that would cleanup the eventState and since that gets called from the daemon only after other stuff is cleaned up, it won't happen there. I'll try turning virObjectEventStatePtr into virObject, we'll have refcounting then and it will not disappear until it's stopped being used. I'll see if that helps or not, but it could. Martin

On 10.10.2016 21:13, Martin Kletzander wrote:
On Mon, Oct 10, 2016 at 04:03:32PM +0800, Michal Privoznik wrote:
On 07.10.2016 19:52, Martin Kletzander wrote:
I don't have any justification for this except an empirical one: with this patch the code from Bug 1363628 doesn't crash after "leaking".
I currently don't have properly working valgrind, but I'm working on it and it might shed some light into this (although it might also not happen due to the slowdown).
However in the meantime I attempted some analysis and I got even more confused than before, I guess. With this patch the code doesn't crash, even though virObjectEventStateQueueDispatch() properly skips callbacks marked as deleted. What I feel is even more weird, that if I duplicate the purgatory function (instead of moving it), it crashes again, and I feel like even more often.
Weirdly-fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1363628
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/object_event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/conf/object_event.c b/src/conf/object_event.c index e5af4be68a7e..4066b4673b42 100644 --- a/src/conf/object_event.c +++ b/src/conf/object_event.c @@ -821,13 +821,13 @@ virObjectEventStateFlush(virObjectEventStatePtr state) if (state->timer != -1) virEventUpdateTimeout(state->timer, -1);
+ /* Purge any deleted callbacks */ + virObjectEventCallbackListPurgeMarked(state->callbacks); + virObjectEventStateQueueDispatch(state, &tempQueue, state->callbacks);
- /* Purge any deleted callbacks */ - virObjectEventCallbackListPurgeMarked(state->callbacks); - state->isDispatching = false; virObjectEventStateUnlock(state); }
Well, I have no idea either, because virObjectEventStateQueueDispatch() calls virObjectEventStateDispatchCallbacks() which in turn calls virObjectEventDispatchMatchCallback() which returns false if the callback->deleted is truem and thus the callback is skipped - just like it is after this patch of yours.
This whole bug feels like a refcounting problem and IMO your patch is just making the scenario more unlikely.
So I've found out what it is thanks to Luayo Huang. And if I understand it correctly, it won't happen with stateful drivers. Stateless, however, will have this problem.
The problem is that testConnectClose() clears the driver structure, including calling virObjectEventStateFree() on the eventState. It clears the whole eventState, but it can happen while the eventState is dispatching. So there's the race. And it would also explain what I was seeing all the time. qemuCleanup() is the only function in qemu driver that would cleanup the eventState and since that gets called from the daemon only after other stuff is cleaned up, it won't happen there.
Ah, yeah. I can see that in the code now too. Good job!
I'll try turning virObjectEventStatePtr into virObject, we'll have refcounting then and it will not disappear until it's stopped being used. I'll see if that helps or not, but it could.
Yep, this is classical example of why we need refcounted objects. Michal
participants (2)
-
Martin Kletzander
-
Michal Privoznik