On Wed, Feb 14, 2018 at 11:36 PM +0100, John Ferlan <jferlan@redhat.com> wrote:
On 01/26/2018 03:47 AM, Nikolay Shirokovskiy wrote:
>
>
> On 19.01.2018 20:23, John Ferlan wrote:
>> RFC:
>>
>> https://www.redhat.com/archives/libvir-list/2018-January/msg00318.html
>>
>> Adjustments since RFC...
>>
>> Patches 1&2: No change, were already R-B'd
>> Patch 3: Removed code as noted in code review, update commit message
>> Patch 4: From old series removed, see below for more details
>> Patch 9: no change
>> NB: Patches 5-8 and 10 from Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
>> are removed as they no longer seemed necessary
>>
>> Replaced the former patch 4 with a series of patches to (slowly) provide
>> support to disable new connections, remove waiting jobs, cause the
>> waiting workers to quit, and allow any running jobs to complete.
>>
>> As it turns out, waiting for running jobs to complete cannot be done
>> from the virNetServerClose callbacks, because at that point the event
>> loop processing done during virNetServerRun would no longer give any
>> currently long running worker job thread a means to complete.
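
Roughly, that means the drain has to interleave with the event loop,
along these lines (a minimal sketch only, not the actual patches;
virNetServerWorkersDone() is a hypothetical helper, while
virEventRunDefaultImpl() is the real event loop API):

    /* Keep iterating the event loop while workers drain, so a worker
     * blocked on e.g. a QEMU monitor reply can still receive it and
     * finish.  Doing this wait inside a virNetServerClose callback
     * would deadlock: no further event loop iterations would run. */
    while (!virNetServerWorkersDone(srv)) {   /* hypothetical helper */
        if (virEventRunDefaultImpl() < 0)     /* real API */
            break;
    }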
>
> Hi, John.
Sorry - been busy in other areas lately - trying to get back to this...
>
> So you suggest a different approach. Instead of introducing means to
> help rpc threads finish after the event loop is finished (the
> stateShutdown hook in my series), you suggest blocking further new rpc
> processing and then waiting for the running rpc calls to finish,
> keeping the event loop running for that purpose.
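
For the "block further new rpc processing" half of that, the existing
virNetServerUpdateServices() API can toggle the listening sockets; the
usage below is only illustrative:

    /* Illustrative quiesce step: stop accepting new client
     * connections before starting to drain the worker pool. */
    virNetServerUpdateServices(srv, false);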
Somewhere along the way in one of the reviews it was suggested to give
the workers perhaps a few extra cycles to complete anything they might
be doing, and a mechanism for doing that was also suggested - which is
what I tried to accomplish here.
That contrasts with your mechanism, which IIRC was more of an
"immediate separation": closing the monitor connection and letting
that close handler do whatever magic happens there.
This is not to say your mechanism is the wrong way to go about it, but
it also hasn't garnered support, nor have you actively attempted to
champion it. So I presented an alternative approach.
>
> This could lead to libvirtd never finishing its termination if, for
> example, one of the qemu processes does not respond. In the case of a
> long running operation such as migration, finishing can take a long
> time. On the other hand, if we finish rpc threads abruptly, as in the
> case of the stateShutdown hook, we have to deal with all the possible
> error paths on daemon shutdown. Maybe a good approach is to abort long
> running operations, then give rpc threads some time to finish as you
> suggest, and only after that finish them abruptly, to handle a hanging
> qemu etc.
True, but it's also easy enough to extend the last and updated patch
to add the QuitTimer and force an exit without going through the rest
of the "friendly" quit (IOW: more or less what Dan has suggested); a
rough sketch follows below. An incredible number of cycles is being
spent on what amounts to trying to be nice about a SIGTERM, SIGQUIT,
or SIGINT - IOW: eventual death.
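
Such a QuitTimer might look something like this (sketch only, not the
actual patch; daemonQuitTimer and the 30 second value are
illustrative, virEventAddTimeout() is the real event API):

    /* One-shot escape hatch: if the "friendly" quit has not finished
     * within 30 seconds (arbitrary value), give up and exit. */
    static void
    daemonQuitTimer(int timer ATTRIBUTE_UNUSED,
                    void *opaque ATTRIBUTE_UNUSED)
    {
        _exit(EXIT_FAILURE);
    }

    ...

    if (virEventAddTimeout(30 * 1000, daemonQuitTimer, NULL, NULL) < 0)
        VIR_WARN("Failed to register quit timer");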
>
> Also, in this approach we keep the event loop running for rpc threads
> only, but there can be other threads that depend on the loop. For
> example, if qemu is rebooted we spawn a thread that executes
> qemuProcessFakeReboot, which sends commands to qemu (i.e. depends on
> the event loop). We need to wait for such threads to finish too while
> keeping the event loop running. In the stateShutdown hook approach we
> finish all threads that use the qemu monitor by closing the monitor.
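
As a rough sketch of that stateShutdown idea (not the actual series;
qemuDomainShutdownMonitor is a hypothetical iterator, while
virDomainObjListForEach() is a real helper):

    /* Driver shutdown hook: force-close every domain's monitor so
     * threads blocked on a monitor reply (e.g. qemuProcessFakeReboot)
     * fail fast instead of waiting on a dying event loop. */
    static int
    qemuStateShutdown(void)
    {
        virDomainObjListForEach(qemu_driver->domains,
                                qemuDomainShutdownMonitor, /* hypothetical */
                                NULL);
        return 0;
    }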
>
> As for the hack with a timeout in a loop... I have a different
> approach in mind for waiting for rpc threads to finish their work.
> First, let's make the drain function only flush the job queue and take
> a callback which will be called once all workers are free from work
> (keeping the worker threads, as Dan suggested). In this callback we
> can use the same technique as in virNetDaemonSignalHandler: that is,
> do some IO to set a flag in the event loop thread, causing
> virEventRunDefaultImpl to finish, and then test this flag in
> virNetDaemonRun.
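
A minimal sketch of that wakeup technique, assuming an illustrative
dmn->quitFD pipe and illustrative function names (safewrite()/
saferead() and the handle callback signature are the real utilities):

    /* Called by the pool once all workers are idle: wake the event
     * loop thread by writing a byte to a pipe it watches. */
    static void
    virNetDaemonDrainComplete(void *opaque)
    {
        virNetDaemonPtr dmn = opaque;
        char c = 0;
        ignore_value(safewrite(dmn->quitFD[1], &c, 1));
    }

    /* Handle callback on the read end: consume the byte and set the
     * quit flag, which virNetDaemonRun tests after each
     * virEventRunDefaultImpl() iteration. */
    static void
    virNetDaemonQuitHandler(int watch ATTRIBUTE_UNUSED,
                            int fd,
                            int events ATTRIBUTE_UNUSED,
                            void *opaque)
    {
        virNetDaemonPtr dmn = opaque;
        char c;
        ignore_value(saferead(fd, &c, 1));
        dmn->quit = true;
    }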
>
Please feel free to spend as many cycles as you can making adjustments
to what I have. I'm working through some alterations and will post a
v2 of what I have; we'll see where it takes me/us.
John
Is there any update so far? I’m asking because I’m still getting
segmentation faults and hang-ups on termination of libvirtd (using the
newest version of libvirt).
Example for a hang-up:
➤ bt
#0 0x000003fffca8df84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000003fffdac29ca in virCondWait (c=<optimized out>, m=<optimized out>) at ../../src/util/virthread.c:154
#2 0x000003fffdac381c in virThreadPoolFree (pool=<optimized out>) at ../../src/util/virthreadpool.c:290
#3 0x000003fffdbb21ae in virNetServerDispose (obj=0x1000cc640) at ../../src/rpc/virnetserver.c:803
#4 0x000003fffda97286 in virObjectUnref (anyobj=<optimized out>) at ../../src/util/virobject.c:350
#5 0x000003fffda97a5a in virObjectFreeHashData (opaque=<optimized out>, name=<optimized out>) at ../../src/util/virobject.c:591
#6 0x000003fffda66576 in virHashFree (table=<optimized out>) at ../../src/util/virhash.c:305
#7 0x000003fffdbaff82 in virNetDaemonDispose (obj=0x1000cc3c0) at ../../src/rpc/virnetdaemon.c:105
#8 0x000003fffda97286 in virObjectUnref (anyobj=<optimized out>) at ../../src/util/virobject.c:350
#9 0x0000000100026cd6 in main (argc=<optimized out>, argv=<optimized out>) at ../../src/remote/remote_daemon.c:1487
The hang is virThreadPoolFree() in virNetServerDispose() waiting
forever on the worker condition, and segmentation faults happen for
RPC jobs that are still running when the daemon tears down.
[…snip]
--
Beste Grüße / Kind regards
Marc Hartmayer
IBM Deutschland Research & Development GmbH