On 17.10.2017 15:34, John Ferlan wrote:
On 10/17/2017 03:40 AM, Nikolay Shirokovskiy wrote:
>
>
> On 16.10.2017 15:47, John Ferlan wrote:
>>
>>
>> On 09/27/2017 08:45 AM, Nikolay Shirokovskiy wrote:
>>> Current daemon shutdown can cause crashes. The problem is that threads
>>> serving client request are joined on daemon dispose after drivers already
>>> cleaned up. But this threads typically uses drivers and thus crashes come.
>>> We need to join threads before virStateCleanup. virNetDaemonClose is
>>> a good candidate.
>>>
>>> But it turns out that we can hang on join. The problem is that at this
>>> moment event loop is not functional and for example threads waiting for
>>> qemu response will never finish. Let's introduce extra shutdown step
>>> for drivers so that they can signal any API calls in progress to finish.
>>> ---
>>> daemon/libvirtd.c | 2 ++
>>> src/driver-state.h | 4 ++++
>>> src/libvirt.c | 18 ++++++++++++++++++
>>> src/libvirt_internal.h | 1 +
>>> src/libvirt_private.syms | 1 +
>>> src/rpc/virnetserver.c | 5 +++--
>>> 6 files changed, 29 insertions(+), 2 deletions(-)
>>>
>>
>> So - first off - this patch is doing 2 things:
>>
>> 1. Introduce a new driver State function - "Shutdown"
>>
>> 2. Moving virThreadPoolFree from virNetServerDispose to virNetServerClose
>>
>> The two do not seem to be related so, they'd need to be separated.
>>
>> It appears the motivation behind the StateShutdown is to "force" the
>> close the qemu monitor and agent, but it doesn't seem that's really
>> related to the virNet{Daemon|Server}* timing issue. So let's consider
>> that separately... and I'm not considering it now...
>>
>> Focusing more on #2 - the move of the virThreadPoolFree would seem to
>> hasten the eventual free. Or more to the point ensure it happens. While
>> that does resolve the problem, I don't think it's the best or actual fix
>> to what it appears the real problem is.
>
> The problem is related to the fact that virThreadPoolFree does 2 things.
>
> 1. finishes all worker threads
> 2. do actual free
>
> What I want is to hasten #1 not actual free.
>
Right - but because your fix just calls virThreadPoolFree and sets
srv->workers = NULL - that caused me to wonder why we had to do that
when it "should" be done once we're done using the servers hash table
entry... IOW: Something should Unref sooner.
BTW: Your 1 and 2 are all the same to me - the Unref being the more key
component because we know virObjectUnref(srv) should be the "last"
reference to @srv. It's one of those ordering things. It should be do
A, then B to allocate/start and then undo B and undo A on cleanup. In
this case we partially undo B, then undo A, and eventually complete the
undo B much later on.
>>
>> Taking it from the Start/Alloc/Ref side - @srv gets a Ref at
>> virNetServerNew and then again at virNetDaemonAddServer. So each @srv
>> has two refs, which means in order for virNetServerDispose to be called,
>> there would need to be two Unref's; however, I can only find one.
>>
>> During cleanup: @srv is Unref'd after virNetDaemonClose, but I'm not
>> finding the other one. Do you recall where you may have seen it? I'm
>
> When the daemon object is unref'd at the end of daemon main function
> servers are unrefed as part of daemon dispose.>
OK - right, virHashFree does end up making that free call, but as you
note, much too late.
>
>> assuming the answer is no, there wasn't one and hence why you moved that
>> virThreadPoolFree call.
>
> Not entirely true. I moved this call so that all client request
> are finished before we clean up drivers state. Otherwise client requests
> will see disposed objects in the middele of operation and crashes occur.
>
So while the move "works" it does so because it's bypassing the
'proper'
way to resolve this by Unref() the table elements when the decision is
made by the the code that it is done with them...
>>
>> Since at virNetDaemonAddServer we add a Ref to @srv, then during
>> virNetDaemonClose I'd expect that for each server on dmn->servers
>> there'd be the virNetServerClose and a removal from the HashTable and
>> Unref of the @srv object which I'm not seeing. If that was there, then
>> the virNetServerDispose would call virThreadPoolFree right when the
>> Unref was done on @srv. The better fix, I believe is a call to
>> virHashRemoveAll in virNetDaemonClose which does that removal of @srv
>> from the dmn->servers hash table. NB this would fix a memory leak since
>> the eventual virHashFree(dmn->servers) doesn't do free the elements of
>> the hash table when virNetDaemonDispose is called as a result of the
>> virObjectUnref(dmn) at the bottom of main() in daemon/libvirtd.c.
>
> Servers hash table created with free function virObjectFreeHashData
> which will unref servers when hash table is freed.
>
> As to clearing servers hash table at virNetDaemonClose. To me daemon
> has close and dispose function and looks like dispose function is
> more suitable for freeing servers for the sake of functional groupping.
> Why we distinct close/dispose functions - I guess one can potentially
> close/open daemon and reuse daemon object but there is no such usage
> currently.
>
If virNetDaemonNew() took @srv as a parameter, then I agree as part of
Dispose, the Unref of each @srv would be more appropriate. E.g., the
dmn->servers hash table is allocated during New and Free'd during Dispose.
Logically, since @srv is added to dmn->servers after @dmn is allocated,
then when we Close the @dmn, we should then remove the @srv's that we
find, right? The Close function will currently call for each @srv the
daemonServerClose function to call the virNetServerClose helper.
Since we have the dmn lock and for every dmn->servers hash table entry
we've closed, then we should also remove the entries from dmn->servers
at that time. That means either doing them one at a time during
daemonServerClose (remember that @srv is our entry) or all at once after
we've gone through each entry.
I think we can go any of the ways until there is no more daemon usecases.
BTW: I did actually test using the more simplified approach of calling
virHashRemoveAll after virHashForEach in virNetDaemonClose. Prior to the
adjustment, I saw the stack trace (more or less) as noted in the cover,
but with the patch in place, the client would eventually get the
keepalive timeout message as opposed to the connection reset message.
Yes and the patch series fixes this hung besides crash too )
>>
>> As an aside (and I think something else that needs to be fixed), there's
>> virNetDaemonAddServerPostExec which adds a @srv to dmn->services, but
>> never does the virObjectRef after the virHashAddEntry call. That would
>> need to be a patch before the patch that adds the virHashRemoveAll.
>
> Agree. By the way virtlogd and virtlockd treat differently
virNetDaemonAddServerPostExec.
> The first stores reference to server object and unref it on daemon dispose as if
> virNetDaemonAddServerPostExec return server with extra reference. The second just
> drops the returned server object as if there is no extra reference. So if we
> add extra reference we need to patch virtlockd as well.
I didn't dig through all that code... My viewpoint was more that we
allocate @srv, then we Add it to the dmn->servers, but don't Ref it as
we did during virNetDaemonAddServer when @srv was added to dmn->servers.
Yes virNetDaemonAddServerPostExec definetely have to take extra ref. One
ref goes to hash table and one ref goes to the caller. We can not put
taking ref to the caller because of multi threading.
But yes, I see virtlockd would need some adjustment too since that would
seem to be "more correct". To some degree the fact that Disposal of
@srv is done during virHashFree probably helps in this case. Still
makes my brain hurt thinking about it.
John
So I can split this patch to 2 and clear servers hash table as you
suggusted instead of tossing virThreadPoolFree and add 2 extra
patches to make referencing at virNetDaemonAddServerPostExec straight.
This will take almost no time thus let's move to the other parts of the series.
Nikolay
>
> The original server reference owner is servers table I mean.
>
>>
>> Make sense? This is a very interesting/good catch to a problem - let's
>> just get the right fix!
>>
>> Tks -
>>
>> John
>>
>>> diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c
>>> index 589b321..d2bbe1e 100644
>>> --- a/daemon/libvirtd.c
>>> +++ b/daemon/libvirtd.c
>>> @@ -1504,6 +1504,8 @@ int main(int argc, char **argv) {
>>> virObjectUnref(lxcProgram);
>>> virObjectUnref(qemuProgram);
>>> virObjectUnref(adminProgram);
>>> + if (driversInitialized)
>>> + virStateShutdown();
>>> virNetDaemonClose(dmn);
>>> virObjectUnref(srv);
>>> virObjectUnref(srvAdm);
>>> diff --git a/src/driver-state.h b/src/driver-state.h
>>> index 1cb3e4f..ea549a7 100644
>>> --- a/src/driver-state.h
>>> +++ b/src/driver-state.h
>>> @@ -42,6 +42,9 @@ typedef int
>>> typedef int
>>> (*virDrvStateStop)(void);
>>>
>>> +typedef void
>>> +(*virDrvStateShutdown)(void);
>>> +
>>> typedef struct _virStateDriver virStateDriver;
>>> typedef virStateDriver *virStateDriverPtr;
>>>
>>> @@ -52,6 +55,7 @@ struct _virStateDriver {
>>> virDrvStateCleanup stateCleanup;
>>> virDrvStateReload stateReload;
>>> virDrvStateStop stateStop;
>>> + virDrvStateShutdown stateShutdown;
>>> };
>>>
>>>
>>> diff --git a/src/libvirt.c b/src/libvirt.c
>>> index 6d66fa4..ff41764 100644
>>> --- a/src/libvirt.c
>>> +++ b/src/libvirt.c
>>> @@ -812,6 +812,24 @@ virStateCleanup(void)
>>>
>>>
>>> /**
>>> + * virStateShutdown:
>>> + *
>>> + * Run each virtualization driver's shutdown method.
>>> + *
>>> + */
>>> +void
>>> +virStateShutdown(void)
>>> +{
>>> + int r;
>>> +
>>> + for (r = virStateDriverTabCount - 1; r >= 0; r--) {
>>> + if (virStateDriverTab[r]->stateShutdown)
>>> + virStateDriverTab[r]->stateShutdown();
>>> + }
>>> +}
>>> +
>>> +
>>> +/**
>>> * virStateReload:
>>> *
>>> * Run each virtualization driver's reload method.
>>> diff --git a/src/libvirt_internal.h b/src/libvirt_internal.h
>>> index 62f490a..9863b07 100644
>>> --- a/src/libvirt_internal.h
>>> +++ b/src/libvirt_internal.h
>>> @@ -36,6 +36,7 @@ int virStateInitialize(bool privileged,
>>> int virStateCleanup(void);
>>> int virStateReload(void);
>>> int virStateStop(void);
>>> +void virStateShutdown(void);
>>>
>>> /* Feature detection. This is a libvirt-private interface for determining
>>> * what features are supported by the driver.
>>> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
>>> index 5b1bc5e..59f8207 100644
>>> --- a/src/libvirt_private.syms
>>> +++ b/src/libvirt_private.syms
>>> @@ -1189,6 +1189,7 @@ virSetSharedStorageDriver;
>>> virStateCleanup;
>>> virStateInitialize;
>>> virStateReload;
>>> +virStateShutdown;
>>> virStateStop;
>>> virStreamInData;
>>>
>>> diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
>>> index 2b76daa..7dc8018 100644
>>> --- a/src/rpc/virnetserver.c
>>> +++ b/src/rpc/virnetserver.c
>>> @@ -764,8 +764,6 @@ void virNetServerDispose(void *obj)
>>> for (i = 0; i < srv->nservices; i++)
>>> virNetServerServiceToggle(srv->services[i], false);
>>>
>>> - virThreadPoolFree(srv->workers);
>>> -
>>> for (i = 0; i < srv->nservices; i++)
>>> virObjectUnref(srv->services[i]);
>>> VIR_FREE(srv->services);
>>> @@ -796,6 +794,9 @@ void virNetServerClose(virNetServerPtr srv)
>>> for (i = 0; i < srv->nservices; i++)
>>> virNetServerServiceClose(srv->services[i]);
>>>
>>> + virThreadPoolFree(srv->workers);
>>> + srv->workers = NULL;
>>> +
>>> virObjectUnlock(srv);
>>> }
>>>
>>>