[libvirt] [PATCH 0/2] Fix RPC crash / hangs / badness

Given our history with breaking stuff in RPC layer. I suggest we might want to do an rc3 release to let people test this for a couple of days

From: "Daniel P. Berrange" <berrange@redhat.com> Commit fd066925440ba48acc95d8f31b2c98b1cc9d582d tried to fix a race condition in commit fa9595003d043df9f2efe95521c00898cef27106 Author: Daniel P. Berrange <berrange@redhat.com> Date: Fri Nov 11 15:28:41 2011 +0000 Explicitly track whether the buck is held in remote client Unfortunately there is a second race condition whereby the event loop can trigger due to incoming data to read. Revert this fix, so a complete fix for the problem can be cleanly applied * src/rpc/virnetclient.c: Revert fd066925440ba48acc95d8f31b2c98b1cc9d582d --- src/rpc/virnetclient.c | 9 +-------- 1 files changed, 1 insertions(+), 8 deletions(-) diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c index 7f9a332..e35521d 100644 --- a/src/rpc/virnetclient.c +++ b/src/rpc/virnetclient.c @@ -1545,7 +1545,6 @@ static int virNetClientIO(virNetClientPtr client, virNetClientCallQueue(&client->waitDispatch, thiscall); /* Check to see if another thread is dispatching */ -recheck: if (client->haveTheBuck) { char ignore = 1; @@ -1593,13 +1592,7 @@ recheck: goto cleanup; } - /* Grr, someone might have passed the buck onto us ... */ - - /* We need to re-check if the buck has been passed to this thread - * as this thread might have been signalled to wake up, but another - * call might acquire the lock before this thread manages to wake up. - * This could cause that two threads claim they have the buck */ - goto recheck; + /* Grr, someone passed the buck onto us ... */ } VIR_DEBUG("We have the buck %p %p", client->waitDispatch, thiscall); -- 1.7.7.3

On 12/07/2011 03:46 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange"<berrange@redhat.com>
Commit fd066925440ba48acc95d8f31b2c98b1cc9d582d tried to fix a race condition in
commit fa9595003d043df9f2efe95521c00898cef27106 Author: Daniel P. Berrange<berrange@redhat.com> Date: Fri Nov 11 15:28:41 2011 +0000
Explicitly track whether the buck is held in remote client
Unfortunately there is a second race condition whereby the event loop can trigger due to incoming data to read. Revert this fix, so a complete fix for the problem can be cleanly applied
* src/rpc/virnetclient.c: Revert fd066925440ba48acc95d8f31b2c98b1cc9d582d --- ACK.
Peter

From: "Daniel P. Berrange" <berrange@redhat.com> When one thread passes the buck to another thread, it uses virCondSignal to wake up the target thread. The variable 'haveTheBuck' is not updated in a race-free manner when this occurs. The current thread sets it to false, and the woken up thread sets it to true. There is a window where a 3rd thread can come in and grab the buck. Even if this didn't lead to crashes & deadlocks, this would still result in unfairness in the buckpassing algorithm. A better solution is to *never* set haveTheBuck to false when we're passing the buck. Only set it to false when there is no further thread waiting for the buck. * src/rpc/virnetclient.c: Only set haveTheBuck to false if no thread is waiting --- src/rpc/virnetclient.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c index e35521d..469c6a5 100644 --- a/src/rpc/virnetclient.c +++ b/src/rpc/virnetclient.c @@ -1265,6 +1265,7 @@ static void virNetClientIOEventLoopPassTheBuck(virNetClientPtr client, virNetCli } tmp = tmp->next; } + client->haveTheBuck = false; VIR_DEBUG("No thread to pass the buck to"); if (client->wantClose) { @@ -1593,10 +1594,11 @@ static int virNetClientIO(virNetClientPtr client, } /* Grr, someone passed the buck onto us ... */ + } else { + client->haveTheBuck = true; } VIR_DEBUG("We have the buck %p %p", client->waitDispatch, thiscall); - client->haveTheBuck = true; /* * The buck stops here! @@ -1624,8 +1626,6 @@ static int virNetClientIO(virNetClientPtr client, virGetLastError()) rv = -1; - client->haveTheBuck = false; - cleanup: VIR_DEBUG("All done with our call %p %p %d", client->waitDispatch, thiscall, rv); return rv; -- 1.7.7.3

On 12/07/2011 03:46 PM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange"<berrange@redhat.com>
When one thread passes the buck to another thread, it uses virCondSignal to wake up the target thread. The variable 'haveTheBuck' is not updated in a race-free manner when this occurs. The current thread sets it to false, and the woken up thread sets it to true. There is a window where a 3rd thread can come in and grab the buck.
Even if this didn't lead to crashes& deadlocks, this would still result in unfairness in the buckpassing algorithm.
A better solution is to *never* set haveTheBuck to false when we're passing the buck. Only set it to false when there is no further thread waiting for the buck.
* src/rpc/virnetclient.c: Only set haveTheBuck to false if no thread is waiting --- I've tested it with virt-manager and also the trheaded test. Works fine to me and looks good.
ACK Peter
participants (2)
-
Daniel P. Berrange
-
Peter Krempa