Re: [libvirt] [PATCH 11/14] rpc: virnetserver: Fix race on srv->nclients_unauth

Thursday, 21 December 2017

On Thu, Dec 21, 2017 at 01:18 PM +0100, John Ferlan <jferlan(a)redhat.com> wrote:
...
 On 12/21/2017 06:29 AM, Marc Hartmayer wrote:
> On Fri, Dec 15, 2017 at 07:16 PM +0100, John Ferlan <jferlan(a)redhat.com> wrote:
>> On 12/12/2017 06:36 AM, Marc Hartmayer wrote:
>>> There is a race between virNetServerProcessClients (main thread) and
>>> remoteDispatchAuthList/remoteDispatchAuthPolkit/remoteSASLFinish (worker
>>> thread) that can lead to decrementing srv->nclients_unauth when it's
>>> zero. Since virNetServerCheckLimits relies on the value
>>> srv->nclients_unauth the underrun causes libvirtd to stop accepting
>>> new connections forever.
>>>
>>> Example race scenario (assuming libvirtd is using policykit and the
>>> client is privileged):
>>>   1. The client calls the RPC remoteDispatchAuthList =>
>>>      remoteDispatchAuthList is executed on a worker thread (Thread
>>>      T1). We're assuming now the execution stops for some time before
>>>      the line 'virNetServerClientSetAuth(client, 0)'
>>>   2. The client closes the connection irregularly. This causes the
>>>      event loop to wake up and virNetServerProcessClients to be
>>>      called (on the main thread T0). During
>>>      virNetServerProcessClients the srv lock is held. The condition
>>>      virNetServerClientNeedAuth(client) will be checked and as the
>>>      authentication is not finished right now
>>>      virNetServerTrackCompletedAuthLocked(srv) will be called =>
>>>      --srv->nclients_unauth => 0
>>>   3. The Thread T1 continues, marks the client as authenticated, and
>>>      calls virNetServerTrackCompletedAuthLocked(srv) =>
>>>      --srv->nclients_unauth => --0 => wrap around as nclients_unauth
>>>      is unsigned
>>>   4. virNetServerCheckLimits(srv) will disable the services forever
>>>
>>> To fix it, add an auth_pending field to the client struct so that it
>>> is now possible to determine if the authentication process has already
>>> been handled for this client.
>>>
>>> Setting the authentication method to none for the client in
>>> virNetServerProcessClients is not a proper way to indicate that the
>>> counter has been decremented, as this would imply that the client is
>>> authenticated.
>>>
>>> Additionally, adjust the existing test cases for this new field.
>>>
>>
>> In any case, I think perhaps this can be split into two patches - one
>> that just sets/clears the auth_pending and then what's leftover as the
>> fix.
>
> I’m not sure it’s necessary as the correct usage of auth_pending is
> already the fix. But let’s see then.
>

 OK. I understand. I assume by this point I was juggling a few of the
 various adjustments in my head, so I figured that separating the patch
 that just sets/clears the flag from the reason the flag was added would
 be easier to review, but then again that would make the set/clear patch
 essentially meaningless.

>>
>>> diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms
>>> index 7fcae970484d..cef2794e1122 100644
>>> --- a/src/libvirt_remote.syms
>>> +++ b/src/libvirt_remote.syms
>>> @@ -138,6 +138,7 @@ virNetServerClientGetUNIXIdentity;
>>>  virNetServerClientImmediateClose; 
[…snip…]

...
>>> @@ -375,6 +376,7 @@ static virNetServerClientPtr
>>>  virNetServerClientNewInternal(unsigned long long id,
>>>                                virNetSocketPtr sock,
>>>                                int auth,
>>> +                              bool auth_pending,
>>>  #ifdef WITH_GNUTLS
>>>                                virNetTLSContextPtr tls,
>>>  #endif
>>> @@ -384,6 +386,12 @@ virNetServerClientNewInternal(unsigned long long id,
>>>  {
>>>      virNetServerClientPtr client;
>>>
>>> +    /* If the used authentication method implies that the new client
>>> +     * is automatically authenticated, the authentication cannot be
>>> +     * pending */
>>> +    if (auth_pending && virNetServerClientAuthMethodImpliesAuthenticated(auth))
>>                            ^^^^^^
>> Why check twice? I'm not clear on the need for this check.  The setting
>> of auth_pending is based on @auth in the caller or perhaps the value of
>> auth_pending already. If there's some sort of check to be made perhaps
>> it's in ExecRestart.
>
> Normally I would prefer an assert here… The intent behind this guard was
> that it should prevent the wrong usage of this function.
>
> But I’ll move the check.
>

 I think this is perhaps why I made the "separate the patches" comment. I
 was starting to overflow my heap.

>>
>> Also, if we returned NULL here without some sort of error, then couldn't
>> we get the "libvirt failed for some reason" error message, which is
>> never fun to track down?
>
> Does it make sense to leave the check here (and additionally add it to
> ExecRestart) and throw an error message here?
>

 Adding a check, a message, and having a NULL return to guard against
 some future change passing the wrong parameters perhaps means that
 future patch submitter (and eventual reviewer) didn't clearly understand
 the purpose of the function...

Yep. That’s why I like asserts… (for functions used internally
only). You can 'tell' other developers what the
preconditions/postconditions are and help them to understand the code
flow. Sorry for the off topic…

Short answer, I’ll move the check.
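
To illustrate what I mean by documenting a precondition with an assert, here is a minimal sketch. It is not a proposal for the patch: the `demo_*` names and the enum are hypothetical stand-ins, and it uses the standard C `assert()`, which is compiled out when `NDEBUG` is defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the auth method constants. */
enum demo_auth { DEMO_AUTH_NONE = 0, DEMO_AUTH_SASL, DEMO_AUTH_POLKIT };

struct demo_client {
    int auth;
    bool auth_pending;
};

/* True if a client using this method counts as authenticated right away. */
static bool demo_auth_implies_authenticated(int auth)
{
    return auth == DEMO_AUTH_NONE;
}

/* Internal initializer: callers are expected to pass consistent arguments. */
static void demo_client_init(struct demo_client *client,
                             int auth, bool auth_pending)
{
    /* Precondition: authentication cannot be pending for a method that
     * implies the client is already authenticated. */
    assert(!(auth_pending && demo_auth_implies_authenticated(auth)));

    client->auth = auth;
    client->auth_pending = auth_pending;
}
```

The assert states the contract at the top of the function without adding an error path and a NULL return that every caller would have to handle.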

...

 Still this is a *NewInternal function - I would hope passing correct
 arguments wouldn't be a problem for it.

Me too.

...

>>
>>> +        return NULL;
>>> +
>>>      if (virNetServerClientInitialize() < 0)
>>>          return NULL;
>>>
>>> @@ -393,6 +401,7 @@ virNetServerClientNewInternal(unsigned long long id,
>>>      client->id = id;
>>>      client->sock = virObjectRef(sock);
>>>      client->auth = auth;
>>> +    client->auth_pending = auth_pending;
>>>      client->readonly = readonly;
>>>  #ifdef WITH_GNUTLS
>>>      client->tlsCtxt = virObjectRef(tls);
>>> @@ -440,6 +449,7 @@ virNetServerClientPtr virNetServerClientNew(unsigned long long id,
>>>  {
>>>      virNetServerClientPtr client;
>>>      time_t now;
>>> +    bool auth_pending = !virNetServerClientAuthMethodImpliesAuthenticated(auth);
>>
>> I think it's much easier for readability to have result be auth !=
>> VIR_NET_SERVER_SERVICE_AUTH_NONE instead of this call... Perhaps even
>> easier to find all instances of AUTH_NONE.
>
> This function/condition is used in four places. But sure, if you’re
> still saying it’s easier to read and to understand that a user is
> authenticated (and therefore the authentication is no longer pending) if
> auth_method == VIR_NET_SERVER_SERVICE_AUTH_NONE then I’ll replace this
> function at all places :)
>

 By this time I wasn't sure what was going to happen from Dan's patch 8
 comment and the virNetServerClientNeedAuth[Locked] functions vs. the
 name chosen (ImpliesAuthenticated) that was AFAICT trying to convey what
 you discovered.

 Whether the comparison is hidden behind some call or a direct comparison
 I guess just becomes an implementation detail. I use cscope a lot to
 find things, so searching on NONE is perhaps easier than searching on
 NONE, finding some API that returns true/false based off of NONE, and
 then searching on callers for that too.  OTOH, it's cleaner to have a
 helper <sigh>.

 Shorter answer, your call. If you want the helper, that's fine.

I’ll think about it.
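
For comparison, the two spellings discussed above can be put side by side in a small sketch. The `sketch_*` names and the enum are hypothetical stand-ins, not the libvirt definitions; the sketch assumes, as in this thread, that the "none" method is the only one that implies an authenticated client.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the auth method constants. */
enum sketch_auth {
    SKETCH_AUTH_NONE = 0,
    SKETCH_AUTH_SASL,
    SKETCH_AUTH_POLKIT,
    SKETCH_AUTH_LAST
};

/* Helper variant: the comparison is hidden behind a named predicate. */
static bool sketch_auth_implies_authenticated(int auth)
{
    return auth == SKETCH_AUTH_NONE;
}

static bool sketch_pending_via_helper(int auth)
{
    return !sketch_auth_implies_authenticated(auth);
}

/* Direct variant: the constant shows up at the call site, so a search
 * for it finds every place that depends on it. */
static bool sketch_pending_direct(int auth)
{
    return auth != SKETCH_AUTH_NONE;
}

/* Both variants agree for every method in the stand-in enum. */
static void sketch_check_equivalence(void)
{
    for (int auth = SKETCH_AUTH_NONE; auth < SKETCH_AUTH_LAST; auth++)
        assert(sketch_pending_via_helper(auth) == sketch_pending_direct(auth));
}
```

Under that assumption the result is the same either way, so the choice is about readability and how easy the call sites are to find.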

[…snip]

--
Beste Grüße / Kind regards
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
