On 03/31/2017 11:30 AM, Chris Friesen wrote:
On 03/31/2017 11:21 AM, Chris Friesen wrote:
> I ran tcpdump looking for TCP traffic between the two libvirtd processes, and
> was unable to see any after several minutes. So it doesn't look like there is
> any regular keepalive messaging going on (/etc/libvirt/libvirtd.conf doesn't
> specify any keepalive settings so we'd be using the defaults I think). And yet
> the TCP connection is stuck open.
Turns out I ran tcpdump in the wrong window... oops. There is what appears to
be a keepalive exchange every 5 seconds.
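For reference, here is a rough sketch of how the keepalive defaults would
translate into a dead-peer timeout. It assumes the stock values
keepalive_interval = 5 and keepalive_count = 5, and the formula given in the
comments in libvirtd.conf. The function name is mine, purely for illustration:

```python
# Assumed defaults, taken from the commented-out entries in libvirtd.conf.
KEEPALIVE_INTERVAL = 5  # seconds of inactivity before a keepalive probe is sent
KEEPALIVE_COUNT = 5     # unanswered probes tolerated before giving up


def keepalive_timeout(interval, count):
    """Approximate seconds of silence before the peer is considered dead.

    Hypothetical helper: it only restates the interval * (count + 1)
    rule from the libvirtd.conf comments, it is not a libvirt API.
    """
    return interval * (count + 1)


if __name__ == "__main__":
    print(keepalive_timeout(KEEPALIVE_INTERVAL, KEEPALIVE_COUNT))
```

If that is right, keepalive only catches a peer that stops answering. The
destination libvirtd was still alive and responding, so keepalive by itself
would not be expected to tear the connection down.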
I still don't understand why the connection wasn't taken down when qemu exited
on the destination host.
One final update for now: I attached gdb to libvirtd on the source host and
then killed libvirtd on the destination host. I saw the TCP connection get
closed down, and gdb showed this:
[Thread 0x7f8948ab3700 (LWP 4514) exited]
At this point "virsh" commands on the source host work as expected; libvirtd
is no longer hung.
So it appears we have a number of factors contributing to the hang:
1) failure of migration in qemu
2) connection between hosts not getting torn down when migration fails
3) the libvirtd thread managing the migration on the source side appears to be
sleeping indefinitely while holding a resource of some sort, which causes the
apparent hang when we try to do other operations
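
To make factor 3 concrete, here is a toy model of the pattern I suspect. This
is not libvirt code: the class, lock, and event names are all made up, and I am
only assuming that the migration thread holds some per-domain lock while it
waits, with no timeout, on the remote side:

```python
import threading


class Domain:
    """Toy stand-in for a per-VM object guarded by one lock (hypothetical)."""

    def __init__(self):
        self.job_lock = threading.Lock()      # the "resource of some sort"
        self.holding = threading.Event()      # set once migrate() owns the lock
        self.peer_closed = threading.Event()  # set when the connection closes

    def migrate(self):
        # Take the lock, then wait on the peer with no timeout.
        with self.job_lock:
            self.holding.set()
            self.peer_closed.wait()

    def query(self, timeout=0.2):
        # Stand-in for any other operation, e.g. a virsh command.
        if not self.job_lock.acquire(timeout=timeout):
            return "hung"
        self.job_lock.release()
        return "ok"


def demo():
    dom = Domain()
    worker = threading.Thread(target=dom.migrate)
    worker.start()
    dom.holding.wait()     # make sure the worker owns the lock
    before = dom.query()   # times out while the worker sleeps
    dom.peer_closed.set()  # simulate the TCP connection closing
    worker.join()          # the worker thread exits and releases the lock
    after = dom.query()
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this model other operations stall only for as long as the waiting thread
keeps the lock, which would match what I saw: once the connection closed and
the thread exited, "virsh" started responding again.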
Chris