
On Thu, Dec 21, 2017 at 01:18 PM +0100, John Ferlan <jferlan@redhat.com> wrote:
On 12/21/2017 06:29 AM, Marc Hartmayer wrote:
On Fri, Dec 15, 2017 at 07:16 PM +0100, John Ferlan <jferlan@redhat.com> wrote:
On 12/12/2017 06:36 AM, Marc Hartmayer wrote:
There is a race between virNetServerProcessClients (main thread) and remoteDispatchAuthList/remoteDispatchAuthPolkit/remoteSASLFinish (worker thread) that can lead to decrementing srv->nclients_unauth when it's zero. Since virNetServerCheckLimits relies on the value srv->nclients_unauth the underrun causes libvirtd to stop accepting new connections forever.
Example race scenario (assuming libvirtd is using policykit and the client is privileged): 1. The client calls the RPC remoteDispatchAuthList => remoteDispatchAuthList is executed on a worker thread (Thread T1). We're assuming now the execution stops for some time before the line 'virNetServerClientSetAuth(client, 0)' 2. The client closes the connection irregularly. This causes the event loop to wake up and virNetServerProcessClient to be called (on the main thread T0). During the virNetServerProcessClients the srv lock is hold. The condition virNetServerClientNeedAuth(client) will be checked and as the authentication is not finished right now virNetServerTrackCompletedAuthLocked(srv) will be called => --srv->nclients_unauth => 0 3. The Thread T1 continues, marks the client as authenticated, and calls virNetServerTrackCompletedAuthLocked(srv) => --srv->nclients_unauth => --0 => wrap around as nclient_unauth is unsigned 4. virNetServerCheckLimits(srv) will disable the services forever
To fix it, add an auth_pending field to the client struct so that it is now possible to determine if the authentication process has already been handled for this client.
Setting the authentication method to none for the client in virNetServerProcessClients is not a proper way to indicate that the counter has been decremented, as this would imply that the client is authenticated.
Additionally, adjust the existing test cases for this new field.
In any case, I think perhaps this can be split into two patches - one that just sets/clears the auth_pending and then what's leftover as the fix.
I’m not sure it’s necessary as the correct usage of auth_pending is already the fix. But let’s see then.
OK. I understand. I assume by this point I was juggling in my head a few of the various adjustments so I figured that by adding the flag and separating the logic between just the set/clear flag and the reason it was added would be easier to review, but then again that would make the set/clear patch essentially meaningless.
diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 7fcae970484d..cef2794e1122 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -138,6 +138,7 @@ virNetServerClientGetUNIXIdentity; virNetServerClientImmediateClose;
[…snip…]
@@ -375,6 +376,7 @@ static virNetServerClientPtr virNetServerClientNewInternal(unsigned long long id, virNetSocketPtr sock, int auth, + bool auth_pending, #ifdef WITH_GNUTLS virNetTLSContextPtr tls, #endif @@ -384,6 +386,12 @@ virNetServerClientNewInternal(unsigned long long id, { virNetServerClientPtr client;
+ /* If the used authentication method implies that the new client + * is automatically authenticated, the authentication cannot be + * pending */ + if (auth_pending && virNetServerClientAuthMethodImpliesAuthenticated(auth)) ^^^^^^ Why check twice? I'm not clear on the need for this check. The setting of auth_pending is based on @auth in the caller or perhaps the value of auth_pending already. If there's some sort of check to be made perhaps it's in ExecRestart.
Normally I would prefer an assert here… The intent behind this guard was that it should prevent the wrong usage of this function.
But I’ll move the check.
I think this is perhaps why I made the separate the patches comment. I was starting to overflow my heap.
Also, if we returned NULL here without some sort of error, then couldn't we get the libvirt failed for some reason error message which is always not fun to track down.
Does it makes sense to leave the check here (and additionally add it to ExecRestart) and throw an error message here?
Adding a check, a message, and having a NULL return to guard against some future change passing the wrong parameters perhaps means that future patch submitter (and eventual reviewer) didn't clearly understand the purpose of the function...
Yep. That’s why I like asserts… (for functions used internally only). You can 'tell' other developers what are the preconditions/postconditions and help them to understand the code flow. Sorry for the off topic… Short answer, I’ll move the check.
Still this is a *NewInternal function - I would hope passing correct arguments wouldn't be problem for it.
Me too.
+ return NULL; + if (virNetServerClientInitialize() < 0) return NULL;
@@ -393,6 +401,7 @@ virNetServerClientNewInternal(unsigned long long id, client->id = id; client->sock = virObjectRef(sock); client->auth = auth; + client->auth_pending = auth_pending; client->readonly = readonly; #ifdef WITH_GNUTLS client->tlsCtxt = virObjectRef(tls); @@ -440,6 +449,7 @@ virNetServerClientPtr virNetServerClientNew(unsigned long long id, { virNetServerClientPtr client; time_t now; + bool auth_pending = !virNetServerClientAuthMethodImpliesAuthenticated(auth);
I think it's much easier for readability to have result be auth != VIR_NET_SERVER_SERVICE_AUTH_NONE instead of this call... Perhaps even easier to find all instances of AUTH_NONE.
This function/condition is used in four places. But sure, if you’re still saying it’s easier to read and to understand that a user is authenticated (and therefore the authentication is no longer pending) if auth_method == VIR_NET_SERVER_SERVICE_AUTH_NONE then I’ll replace this function at all places :)
By this time I wasn't sure what was going to happen from Dan's patch 8 comment and the virNetServerClientNeedAuth[Locked] functions vs. the name chosen (ImpliesAuthenticated) that was AFAICT trying to convey what you discovered.
Whether the comparison is hidden behind some call or a direct comparison I guess just becomes an implementation detail. I use cscope a lot to find things, so searching on NONE is perhaps easier that searching on NONE, finding some API that returning true/false based off of NONE, and then searching on callers for that too. OTOH, it's cleaner to have a helper <sigh>.
Shorter answer, your call. If you want the helper, that's fine.
I’ll think about it. […snip] -- Beste Grüße / Kind regards Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294