Thanks at first, this patch some kinda solve my problem until now. But I still have a
doubt about this patch.
-----Original Message-----
From: libvir-list-bounces(a)redhat.com
[mailto:libvir-list-bounces@redhat.com] On Behalf Of Michal Privoznik
Sent: Friday, October 11, 2013 8:15 PM
To: libvir-list(a)redhat.com
Subject: [libvirt] [PATCH v2] qemu_migration: Avoid crashing if domain dies
too quickly
I've noticed a SIGSEGV-ing libvirtd on the destination when the qemu
died too quickly = in Prepare phase. What is happening here is:
1) [Thread 3493] We are in qemuMigrationPrepareAny() and calling
qemuProcessStart() which subsequently calls qemuProcessWaitForMonitor()
and qemuConnectMonitor(). So far so good. The qemuMonitorOpen()
succeeds, however switching monitor to QMP mode fails as qemu died
meanwhile. That is qemuMonitorSetCapabilities() returns -1.
2013-10-08 15:54:10.629+0000: 3493: debug :
qemuMonitorSetCapabilities:1356 : mon=0x14a53da0
2013-10-08 15:54:10.630+0000: 3493: debug :
qemuMonitorJSONCommandWithFd:262 : Send command
'{"execute":"qmp_capabilities","id":"libvirt-1"}'
for write with FD -1
2013-10-08 15:54:10.630+0000: 3493: debug :
virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=17
events=13
...
2013-10-08 15:54:10.631+0000: 3493: debug : qemuMonitorSend:956 :
QEMU_MONITOR_SEND_MSG: mon=0x14a53da0
msg={"execute":"qmp_capabilities","id":"libvirt-1"}
fd=-1
2013-10-08 15:54:10.631+0000: 3262: debug : virEventPollRunOnce:641 :
Poll got 1 event(s)
2) [Thread 3262] The event loop is trying to do the talking to monitor.
However, qemu is dead already, remember?
2013-10-08 15:54:13.436+0000: 3262: error : qemuMonitorIORead:551 :
Unable to read from monitor: Connection reset by peer
2013-10-08 15:54:13.516+0000: 3262: debug : virFileClose:90 : Closed fd 25
...
2013-10-08 15:54:13.533+0000: 3493: debug : qemuMonitorSend:968 : Send
command resulted in error internal error: early end of file from monitor:
possible problem:
3) [Thread 3493] qemuProcessStart() failed. No big deal. Go to the
'endjob' label and subsequently to the 'cleanup'. Since the domain is
not persistent and ret is -1, the qemuDomainRemoveInactive() is called.
This has an (unpleasant) effect of virObjectUnref()-in the @vm object.
Unpleasant because the event loop which is about to trigger EOF callback
still holds a pointer to the @vm (not the reference). See the valgrind
output below.
4) [Thread 3262] So the even loop starts triggering EOF:
2013-10-08 15:54:13.542+0000: 3262: debug : qemuMonitorIO:729 :
Triggering EOF callback
2013-10-08 15:54:13.543+0000: 3262: debug :
qemuProcessHandleMonitorEOF:294 : Received EOF on 0x14549110 'migt10'
And the monitor is cleaned up. This results in calling
qemuProcessHandleMonitorEOF with the @vm pointer passed. The pointer
is
kept in qemuMonitor struct.
==3262== Thread 1:
==3262== Invalid read of size 4
==3262== at 0x77ECCAA: pthread_mutex_lock (in
/lib64/libpthread-2.15.so)
==3262== by 0x52FAA06: virMutexLock (virthreadpthread.c:85)
==3262== by 0x52E3891: virObjectLock (virobject.c:320)
==3262== by 0x11626743: qemuProcessHandleMonitorEOF
(qemu_process.c:296)
==3262== by 0x11642593: qemuMonitorIO (qemu_monitor.c:730)
==3262== by 0x52BD526: virEventPollDispatchHandles
(vireventpoll.c:501)
==3262== by 0x52BDD49: virEventPollRunOnce (vireventpoll.c:648)
==3262== by 0x52BBC68: virEventRunDefaultImpl (virevent.c:274)
==3262== by 0x542D3D9: virNetServerRun (virnetserver.c:1112)
==3262== by 0x11F368: main (libvirtd.c:1513)
==3262== Address 0x14549128 is 24 bytes inside a block of size 136 free'd
==3262== at 0x4C2AF5C: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3262== by 0x529B1FF: virFree (viralloc.c:580)
==3262== by 0x52E3703: virObjectUnref (virobject.c:270)
==3262== by 0x531557E: virDomainObjListRemove
(domain_conf.c:2355)
==3262== by 0x1160E899: qemuDomainRemoveInactive
(qemu_domain.c:2061)
==3262== by 0x1163A0C6: qemuMigrationPrepareAny
(qemu_migration.c:2450)
==3262== by 0x1163A923: qemuMigrationPrepareDirect
(qemu_migration.c:2626)
==3262== by 0x11682D71: qemuDomainMigratePrepare3Params
(qemu_driver.c:10309)
==3262== by 0x53B0976: virDomainMigratePrepare3Params
(libvirt.c:7266)
==3262== by 0x1502D3:
remoteDispatchDomainMigratePrepare3Params (remote.c:4797)
==3262== by 0x12DECA:
remoteDispatchDomainMigratePrepare3ParamsHelper
(remote_dispatch.h:5741)
==3262== by 0x54322EB: virNetServerProgramDispatchCall
(virnetserverprogram.c:435)
The mon->vm is set in qemuMonitorOpenInternal() which is the correct
place to increase @vm ref counter. The correct place to decrease the ref
counter is then qemuMonitorDispose().
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
It turned out the hack from v1 is no longer needed. In fact, my reasoning
there
flavouring the hack was flawed.
src/qemu/qemu_capabilities.c | 14 ++++++++++----
src/qemu/qemu_monitor.c | 4 +++-
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 7c39c1c..17095b4 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -2587,7 +2587,8 @@ virQEMUCapsInitQMP(virQEMUCapsPtr
qemuCaps,
char *monpath = NULL;
char *pidfile = NULL;
pid_t pid = 0;
- virDomainObj vm;
+ virDomainObjPtr vm = NULL;
+ virDomainXMLOptionPtr xmlopt = NULL;
/* the ".sock" sufix is important to avoid a possible clash with a qemu
* domain called "capabilities"
@@ -2650,10 +2651,13 @@ virQEMUCapsInitQMP(virQEMUCapsPtr
qemuCaps,
goto cleanup;
}
- memset(&vm, 0, sizeof(vm));
- vm.pid = pid;
+ if (!(xmlopt = virDomainXMLOptionNew(NULL, NULL, NULL)) ||
+ !(vm = virDomainObjNew(xmlopt)))
+ goto cleanup;
+
+ vm->pid = pid;
- if (!(mon = qemuMonitorOpen(&vm, &config, true, &callbacks, NULL)))
{
+ if (!(mon = qemuMonitorOpen(vm, &config, true, &callbacks, NULL))) {
ret = 0;
goto cleanup;
}
@@ -2673,6 +2677,8 @@ cleanup:
virCommandFree(cmd);
VIR_FREE(monarg);
VIR_FREE(monpath);
+ virObjectUnref(vm);
Is this virObjectUnref(vm) corresponding to mon->vm = virObjectRef(vm) added in
qemuMonitorOpenInternal?
If it is, how can we confirm virObjectRef(vm) has been done in qemuMonitorOpenInternal? If
an error (anyone follows)happened in qemuMonitorOpenInternal is before mon->vm =
virObjectRef(vm),
then we still goto cleanup and do virObjectUnref(vm), the refs will be wrong. Am I right?
if (!cb->eofNotify) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("EOF notify callback must be supplied"));
return NULL;
}
if (!cb->errorNotify) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("Error notify callback must be supplied"));
return NULL;
}
if (qemuMonitorInitialize() < 0)
return NULL;
if (!(mon = virObjectLockableNew(qemuMonitorClass)))
return NULL;
mon->fd = -1;
mon->logfd = -1;
if (virCondInit(&mon->notify) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("cannot initialize monitor condition"));
goto cleanup;
}
mon->fd = fd;
mon->hasSendFD = hasSendFD;
mon->vm = virObjectRef(vm);
+ virObjectUnref(xmlopt);
if (pid != 0) {
char ebuf[1024];
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index a601ee0..2bafe28 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -255,6 +255,8 @@ static void qemuMonitorDispose(void *obj)
VIR_DEBUG("mon=%p", mon);
if (mon->cb && mon->cb->destroy)
(mon->cb->destroy)(mon, mon->vm, mon->callbackOpaque);
+ virObjectUnref(mon->vm);
+
virCondDestroy(&mon->notify);
VIR_FREE(mon->buffer);
virJSONValueFree(mon->options);
@@ -781,7 +783,7 @@ qemuMonitorOpenInternal(virDomainObjPtr vm,
}
mon->fd = fd;
mon->hasSendFD = hasSendFD;
- mon->vm = vm;
+ mon->vm = virObjectRef(vm);
mon->json = json;
if (json)
mon->waitGreeting = true;
--
1.8.1.5
--
libvir-list mailing list
libvir-list(a)redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list