On 22.12.2017 17:13, John Ferlan wrote:
[...]
>>
>> Still adding the "virHashRemoveAll(dmn->servers);" into
>> virNetDaemonClose doesn't help the situation as I can still either crash
>> randomly or hang, so I'm less convinced this would really fix anything.
>> It does change the "nature" of the hung thread stack trace though, as
>> the second thread is now:
>
> virHashRemoveAll is not enough now. Due to the unref reordering, the last ref to @srv is
> dropped after virStateCleanup. So we need to virObjectUnref(srv|srvAdm) before
> virStateCleanup. Or we can call virThreadPoolFree from virNetServerClose
> (as in the first version of the patch and as Erik suggests) instead
> of virHashRemoveAll.
>
Patches w/
1. Long pause before GetAllStats (without using [u]sleep)
2. Adjustment to call virNetServerServiceToggle in
virNetServerServiceClose (instead of virNetServerDispose)
3. Call virHashRemoveAll in virNetDaemonClose
4. Call virThreadPoolFree in virNetServerClose
5. Perform Unref (adminProgram, srvAdm, qemuProgram, lxcProgram,
remoteProgram, and srv) before virNetDaemonClose
Still has the virCondWait's - so as Daniel points out there's quite a
bit more work to be done. Like most Red Hat engineers - I will not be
very active over the next week or so (until the New Year) as it's a
holiday break/vacation for us.
So unless you have the burning desire to put together some patches and
do the work yourself, more thoughts/work will need to wait.
John
I've checked what's going on after applying the patches you described above
(although it would be enough to apply only 3 (or 4) and part of 5, besides
the pause hunk). I get hangs too, and this kind of hang is fixed by the
second series, '[PATCH 0/4] libvirtd: fix hang on termination in qemu driver'.
That is, there is another hang backtrace besides the hang in the thread
freeing the thread pool that you already mentioned:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007ffff7335c58 in virCondWait (c=0x7fffc4000e18, m=0x7fffc4000df0) at
util/virthread.c:154
#2 0x00007fffd9605983 in qemuMonitorSend (mon=0x7fffc4000de0, msg=0x7fffe70bd1f0) at
qemu/qemu_monitor.c:1067
#3 0x00007fffd961b68f in qemuMonitorJSONCommandWithFd (mon=0x7fffc4000de0,
cmd=0x7fffb0005310, scm_fd=-1,
reply=0x7fffe70bd2d0) at qemu/qemu_monitor_json.c:300
#4 0x00007fffd961b7c1 in qemuMonitorJSONCommand (mon=0x7fffc4000de0, cmd=0x7fffb0005310,
reply=0x7fffe70bd2d0)
at qemu/qemu_monitor_json.c:330
#5 0x00007fffd9629f0b in qemuMonitorJSONGetObjectListPaths (mon=0x7fffc4000de0,
path=0x7fffd96a7c96 "/machine/peripheral", paths=0x7fffe70bd380) at
qemu/qemu_monitor_json.c:5715
#6 0x00007fffd962dcc4 in qemuMonitorJSONFindObjectPathByAlias (mon=0x7fffc4000de0,
name=0x7fffd969f3cd "virtio-balloon-pci", alias=0x7fffcc1e8d30
"balloon0", path=0x7fffe70bd450)
at qemu/qemu_monitor_json.c:7235
#7 0x00007fffd962e231 in qemuMonitorJSONFindLinkPath (mon=0x7fffc4000de0,
name=0x7fffd969f3cd "virtio-balloon-pci",
alias=0x7fffcc1e8d30 "balloon0", path=0x7fffe70bd450) at
qemu/qemu_monitor_json.c:7349
#8 0x00007fffd9605bf7 in qemuMonitorInitBalloonObjectPath (mon=0x7fffc4000de0,
balloon=0x7fffcc1e8e60)
at qemu/qemu_monitor.c:1157
#9 0x00007fffd96098d3 in qemuMonitorGetMemoryStats (mon=0x7fffc4000de0,
balloon=0x7fffcc1e8e60,
stats=0x7fffe70bd5b0, nr_stats=10) at qemu/qemu_monitor.c:2133
#10 0x00007fffd964e70c in qemuDomainMemoryStatsInternal (driver=0x7fffcc1872a0,
vm=0x7fffcc2737e0,
stats=0x7fffe70bd5b0, nr_stats=10) at qemu/qemu_driver.c:11453
#11 0x00007fffd9667013 in qemuDomainGetStatsBalloon (driver=0x7fffcc1872a0,
dom=0x7fffcc2737e0,
record=0x7fffb00008c0, maxparams=0x7fffe70bd6b0, privflags=1) at
qemu/qemu_driver.c:19478
#12 0x00007fffd9669597 in qemuDomainGetStats (conn=0x7fffb80030e0, dom=0x7fffcc2737e0,
stats=127,
record=0x7fffe70bd790, flags=1) at qemu/qemu_driver.c:20133
#13 0x00007fffd966997f in qemuConnectGetAllDomainStats (conn=0x7fffb80030e0,
doms=0x7fffb0005220, ndoms=1,
stats=127, retStats=0x7fffe70bd8e0, flags=0) at qemu/qemu_driver.c:20226
#14 0x00007ffff7424fd7 in virDomainListGetStats (doms=0x7fffb0005220, stats=0,
retStats=0x7fffe70bd8e0, flags=0)
at libvirt-domain.c:11595
#15 0x00005555555ac030 in remoteDispatchConnectGetAllDomainStats (server=0x55555612a3a0,
client=0x555556151d10,
msg=0x555556152540, rerr=0x7fffe70bda20, args=0x7fffb00036e0, ret=0x7fffb0002d20) at
remote.c:6538
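
To illustrate the general shape of this hang: a worker thread blocks in a
condition wait for a monitor reply, and whoever frees the thread pool then
waits on that worker forever unless something wakes it. Below is a minimal
sketch in plain pthreads, not libvirt code. The names (struct monitor,
monitor_send, monitor_close, run_shutdown_demo) are hypothetical stand-ins,
and I am not claiming this is how the second series fixes the problem; it
only shows one common way to make such a wait interruptible on shutdown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a monitor object; NOT libvirt's qemuMonitor. */
struct monitor {
    pthread_mutex_t lock;
    pthread_cond_t notify;
    bool reply_ready;   /* would be set when the peer answers */
    bool closing;       /* set once shutdown has started */
};

/* Wait for a reply. Returns 0 on reply, -1 if the monitor was closed.
 * Without the 'closing' check this loop could block forever at shutdown. */
static int monitor_send(struct monitor *mon)
{
    int ret;

    pthread_mutex_lock(&mon->lock);
    while (!mon->reply_ready && !mon->closing)
        pthread_cond_wait(&mon->notify, &mon->lock);
    ret = mon->reply_ready ? 0 : -1;
    pthread_mutex_unlock(&mon->lock);
    return ret;
}

/* Mark the monitor as closing and wake every waiter. */
static void monitor_close(struct monitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    mon->closing = true;
    pthread_cond_broadcast(&mon->notify);
    pthread_mutex_unlock(&mon->lock);
}

/* Worker thread body: passes the monitor_send() result back via the
 * thread's return value. */
static void *worker(void *opaque)
{
    struct monitor *mon = opaque;

    return (void *)(intptr_t)monitor_send(mon);
}

/* Start one worker, close the monitor, then join the worker.
 * No reply is ever delivered, so the worker reports -1. */
int run_shutdown_demo(void)
{
    struct monitor mon = { .reply_ready = false, .closing = false };
    pthread_t tid;
    void *res = NULL;
    int rc;

    pthread_mutex_init(&mon.lock, NULL);
    pthread_cond_init(&mon.notify, NULL);

    rc = pthread_create(&tid, NULL, worker, &mon);
    assert(rc == 0);

    monitor_close(&mon);            /* wake the waiter before joining */
    rc = pthread_join(tid, &res);   /* would hang without monitor_close() */
    assert(rc == 0);

    pthread_cond_destroy(&mon.notify);
    pthread_mutex_destroy(&mon.lock);
    return (int)(intptr_t)res;
}
```

If the monitor_close() call is dropped from run_shutdown_demo(), the
pthread_join() never returns, which is the same pattern as the thread pool
free waiting on a worker stuck in the backtrace above.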
I'm writing this not to pull you back into the work, and I do not expect a reply;
it's the holidays :)
This is only to document my research.
Nikolay