On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> From: Li XueLei <li.xuelei1(a)zte.com.cn>
>
> Libvirtd no longer receives external requests, after I do the following.
>
> Steps to reproduce:
> 1.Virsh tool initiates a non-p2p migrations.
> 2.After the phase of qemuDomainMigratePerform3 and before the phase of
> qemuDomainMigrateConfirm3, kill the virsh client which initiated the
> migration.
>
> What happens:
> Libvirtd will be blocked in the next call stack because it can't
> get the condition variables, and it will not be able to receive new
> requests.
>
> Thread 1 (Thread 0x7f36a7b9f8c0 (LWP 27556)):
> #0 0x00007f36a45799f5 in pthread_cond_wait@(a)GLIBC_2.3.2 () from
/lib64/libpthread.so.0
> #1 0x00007f36a6e7a97e in virCondWait () from /lib64/libvirt.so.0
> #2 0x00007f3679c6a1b3 in qemuMonitorSend () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #3 0x00007f3679c837de in qemuMonitorJSONCommandWithFd () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #4 0x00007f3679c93c5a in qemuMonitorJSONSetMigrationCapabilities () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #5 0x00007f3679c786ed in qemuMonitorSetMigrationCapabilities () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #6 0x00007f3679c67857 in qemuMigrationParamsApply () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #7 0x00007f3679c67eff in qemuMigrationParamsReset () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #8 0x00007f3679c5198a in qemuMigrationSrcCleanup () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #9 0x00007f36a6dcd862 in virCloseCallbacksRun () from /lib64/libvirt.so.0
> #10 0x00007f3679cc28f6 in qemuConnectClose () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #11 0x00007f36a70a0870 in virConnectDispose () from /lib64/libvirt.so.0
> #12 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> #13 0x00007f36a70a55c4 in virConnectClose () from /lib64/libvirt.so.0
> #14 0x000056058a206f92 in remoteClientFree ()
> #15 0x00007f36a6fa49a0 in virNetServerClientDispose () from /lib64/libvirt.so.0
> #16 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> #17 0x00007f36a6e3fd21 in virObjectFreeCallback () from /lib64/libvirt.so.0
> #18 0x00007f36a6f9942e in virNetSocketEventFree () from /lib64/libvirt.so.0
> #19 0x00007f36a6de8439 in virEventPollCleanupHandles () from /lib64/libvirt.so.0
> #20 0x00007f36a6de9ab6 in virEventPollRunOnce () from /lib64/libvirt.so.0
> #21 0x00007f36a6de817a in virEventRunDefaultImpl () from /lib64/libvirt.so.0
> #22 0x00007f36a6faadb5 in virNetDaemonRun () from /lib64/libvirt.so.0
> #23 0x000056058a1c5cc1 in main ()
Hmm, this is a bit strange - why are we trying to reset migration parameters
in a connection close callback in the first place ?
Because we are cancelling the ongoing migration because the connection
that initiated it was closed.
Libvirt allows multiple connections concurrently and changes made by one
connection are supposed to apply globally. No per-VM state should be tied
to a particular connection.
This is generally very true, except for migration.
IOW, I don't think we need a thread. We should just stop calling
qemuMigrationSrcCleanup from the connection close callback entirely
But I agree that something fishy is going on and this doesn't look like
proper solution. Yi, can you please share the backtrace of other threads
too? Why aren't we getting any reply on the monitor?
Michal