On 10/08/2013 10:31 AM, Michal Privoznik wrote:
I've noticed a SIGSEGV-ing libvirtd on the destination when the
qemu
died too quickly = in Prepare phase. What is happening here is:
1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
succeeds, however switching monitor to QMP mode fails as qemu died
meanwhile. That is qemuMonitorSetCapabilities() returns -1.
2013-10-08 15:54:10.629+0000: 3493: debug : qemuMonitorSetCapabilities:1356 :
mon=0x14a53da0
2013-10-08 15:54:10.630+0000: 3493: debug : qemuMonitorJSONCommandWithFd:262 : Send
command
'{"execute":"qmp_capabilities","id":"libvirt-1"}'
for write with FD -1
2013-10-08 15:54:10.630+0000: 3493: debug : virEventPollUpdateHandle:147 :
EVENT_POLL_UPDATE_HANDLE: watch=17 events=13
...
2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 : QEMU_MONITOR_SEND_MSG:
mon=0x14a53da0
msg={"execute":"qmp_capabilities","id":"libvirt-1"}
fd=-1
2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 : Poll got 1
event(s)
2) [Thread 3262] The event loop is trying to do the talking to monitor.
However, qemu is dead already, remember?
2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 : Unable to read from
monitor: Connection reset by peer
2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
...
2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send command resulted
in error internal error: early end of file from monitor: possible problem:
3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
'endjob' label and subsequently to the 'cleanup'. Since the domain is
not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
Unpleasant because the event loop which is about to trigger EOF callback
still holds a pointer to the @vm (not the reference). See the valgrind
output below.
4) [Thread 3262] So the even loop starts triggering EOF:
s/even/event/
2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 : Triggering EOF callback
2013-10-08 15:54:13.543+0000: 3262: debug : qemuProcessHandleMonitorEOF:294 : Received
EOF on 0x14549110 'migt10'
And the monitor is cleaned up. This results in calling
qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer is
kept in qemuMonitor struct.
==3262== Thread 1:
==3262== Invalid read of size 4
The mon->vm is set in qemuMonitorOpenInternal() which is the correct
place to increase @vm ref counter. The correct place to decrease the ref
counter is then qemuMonitorDispose(). However, in order to avoid cyclic
calling of dispose functions, we must use hack, and set priv->mon to
NULL. The qemuDomainObjPrivateFree won't call us again then. This
describes the last two chunks of the commit. The others are there for
handling virObjectRef() and virObjectUnref() correctly. So far it
doesn't work well with NULL virClassPtr :)
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
Yet another one of those patches, where chasing it down took a half of a day
and description what went wrong is longer than a few lines of fix.
Yeah, we've had a few of those lately.
src/qemu/qemu_capabilities.c | 14 ++++++++++----
src/qemu/qemu_monitor.c | 12 ++++++++++++
2 files changed, 22 insertions(+), 4 deletions(-)
@@ -781,6 +792,7 @@ qemuMonitorOpenInternal(virDomainObjPtr vm,
}
mon->fd = fd;
mon->hasSendFD = hasSendFD;
+ virObjectRef(vm);
mon->vm = vm;
Could be written:
mon->vm = virObjectRef(vm);
Overall, looks fine to me, but I'd feel more comfortable if you also got
Dan to review it before pushing.
ACK.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org