Originaly, the code checked if another client is the queue and infered
ownership of the buck from that. Commit fa9595003d043df9f2efe95521c008
added a separate variable to track the buck. That caused, that a new
call might enter claiming it has the buck, while another thread was
signalled to take the buck. This ends in two threads claiming they hold
the buck and entering poll(). This happens due to a race on waking up
threads on the client lock mutex.
This caused multi-threaded clients to hang, most prominently visible and
reproducible on python based clients, like virt-manager.
This patch causes threads, that have been signalled to take the buck to
re-check if buck is held by another thread.
---
src/rpc/virnetclient.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
index 5165c8d..07d8a60 100644
--- a/src/rpc/virnetclient.c
+++ b/src/rpc/virnetclient.c
@@ -1545,6 +1545,7 @@ static int virNetClientIO(virNetClientPtr client,
virNetClientCallQueue(&client->waitDispatch, thiscall);
/* Check to see if another thread is dispatching */
+recheck:
if (client->haveTheBuck) {
char ignore = 1;
@@ -1592,7 +1593,13 @@ static int virNetClientIO(virNetClientPtr client,
goto cleanup;
}
- /* Grr, someone passed the buck onto us ... */
+ /* Grr, someone might have passed the buck onto us ... */
+
+ /* We need to re-check if the buck has been passed to this thread
+ * as this thread might have been signalled to wake up, but the another
+ * call might acquire the lock before this thread manages to wake up.
+ * This could cause that two threads claim they have the buck */
+ goto recheck;
}
VIR_DEBUG("We have the buck %p %p", client->waitDispatch, thiscall);
--
1.7.3.4