[libvirt] [PATCH v2 0/2] Some qemu monitor crashers fix

An improved versions of my previous attempt. The 1/2 was leaking the @vm. The 2/2 had issues which Jan pointed out. Michal Privoznik (2): qemu: Avoid double free of VM qemu: Don't access vm->priv on unlocked domain src/qemu/qemu_migration.c | 8 ++++---- src/qemu/qemu_process.c | 16 +++------------- 2 files changed, 7 insertions(+), 17 deletions(-) -- 1.8.1.5

One of my previous patches (c7ac2519b7f) did try to fix the issue when domain dies too soon during migration. However, this clumsy approach was missing removal of qemuProcessHandleMonitorDestroy resulting in double unrefing of mon->vm and hence producing the daemon crash: ==11843== Invalid read of size 4 ==11843== at 0x50C28C5: virObjectUnref (virobject.c:255) ==11843== by 0x1148F7DB: qemuMonitorDispose (qemu_monitor.c:258) ==11843== by 0x50C2991: virObjectUnref (virobject.c:262) ==11843== by 0x50C2D13: virObjectFreeCallback (virobject.c:388) ==11843== by 0x509C37B: virEventPollCleanupHandles (vireventpoll.c:583) ==11843== by 0x509C711: virEventPollRunOnce (vireventpoll.c:652) ==11843== by 0x509A620: virEventRunDefaultImpl (virevent.c:274) ==11843== by 0x520D21C: virNetServerRun (virnetserver.c:1112) ==11843== by 0x11F368: main (libvirtd.c:1513) ==11843== Address 0x13b88864 is 4 bytes inside a block of size 136 free'd ==11843== at 0x4A07F5C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==11843== by 0x5079A2F: virFree (viralloc.c:580) ==11843== by 0x50C29E3: virObjectUnref (virobject.c:270) ==11843== by 0x114770E4: qemuProcessHandleMonitorDestroy (qemu_process.c:1103) ==11843== by 0x1148F7CB: qemuMonitorDispose (qemu_monitor.c:257) ==11843== by 0x50C2991: virObjectUnref (virobject.c:262) ==11843== by 0x50C2D13: virObjectFreeCallback (virobject.c:388) ==11843== by 0x509C37B: virEventPollCleanupHandles (vireventpoll.c:583) ==11843== by 0x509C711: virEventPollRunOnce (vireventpoll.c:652) ==11843== by 0x509A620: virEventRunDefaultImpl (virevent.c:274) ==11843== by 0x520D21C: virNetServerRun (virnetserver.c:1112) ==11843== by 0x11F368: main (libvirtd.c:1513) Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 7c23eb4..09d9dbd 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1095,14 +1095,6 @@ error: return -1; } - -static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon ATTRIBUTE_UNUSED, - virDomainObjPtr vm, - void *opaque ATTRIBUTE_UNUSED) -{ - virObjectUnref(vm); -} - static int qemuProcessHandleTrayChange(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virDomainObjPtr vm, @@ -1366,7 +1358,6 @@ cleanup: static qemuMonitorCallbacks monitorCallbacks = { - .destroy = qemuProcessHandleMonitorDestroy, .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, .diskSecretLookup = qemuProcessFindVolumeQcowPassphrase, @@ -1403,7 +1394,7 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int logfd) } /* Hold an extra reference because we can't allow 'vm' to be - * deleted while the monitor is active */ + * deleted while the monitor is unlocked */ virObjectRef(vm); ignore_value(virTimeMillisNow(&priv->monStart)); @@ -1419,11 +1410,10 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int logfd) ignore_value(qemuMonitorSetDomainLog(mon, logfd)); virObjectLock(vm); + virObjectUnref(vm); priv->monStart = 0; - if (mon == NULL) { - virObjectUnref(vm); - } else if (!virDomainObjIsActive(vm)) { + if (!virDomainObjIsActive(vm)) { qemuMonitorClose(mon); mon = NULL; } -- 1.8.1.5

On 11/06/2013 07:05 PM, Michal Privoznik wrote:
One of my previous patches (c7ac2519b7f) did try to fix the issue when domain dies too soon during migration. However, this clumsy approach was missing removal of qemuProcessHandleMonitorDestroy resulting in double unrefing of mon->vm and hence producing the daemon crash:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
@@ -1403,7 +1394,7 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int logfd) }
/* Hold an extra reference because we can't allow 'vm' to be - * deleted while the monitor is active */ + * deleted while the monitor is unlocked */
This seems wrong, how about "deleted unitl the monitor gets its own reference"?
virObjectRef(vm);
ignore_value(virTimeMillisNow(&priv->monStart));
ACK

Since 86d90b3a (yes, my patch; again) we are supporting NBD storage migration. However, on error recovery path we got the steps reversed. The correct order is: return NBD port to the virPortAllocator and then either unlock the vm or remove it from the driver. Not vice versa. ==11192== Invalid write of size 4 ==11192== at 0x11488559: qemuMigrationPrepareAny (qemu_migration.c:2459) ==11192== by 0x11488EA6: qemuMigrationPrepareDirect (qemu_migration.c:2652) ==11192== by 0x114D1509: qemuDomainMigratePrepare3Params (qemu_driver.c:10332) ==11192== by 0x519075D: virDomainMigratePrepare3Params (libvirt.c:7290) ==11192== by 0x1502DA: remoteDispatchDomainMigratePrepare3Params (remote.c:4798) ==11192== by 0x12DECA: remoteDispatchDomainMigratePrepare3ParamsHelper (remote_dispatch.h:5741) ==11192== by 0x5212127: virNetServerProgramDispatchCall (virnetserverprogram.c:435) ==11192== by 0x5211C86: virNetServerProgramDispatch (virnetserverprogram.c:305) ==11192== by 0x520A8FD: virNetServerProcessMsg (virnetserver.c:165) ==11192== by 0x520A9E1: virNetServerHandleJob (virnetserver.c:186) ==11192== by 0x50DA78F: virThreadPoolWorker (virthreadpool.c:144) ==11192== by 0x50DA11C: virThreadHelper (virthreadpthread.c:161) ==11192== Address 0x1368baa0 is 576 bytes inside a block of size 688 free'd ==11192== at 0x4A07F5C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==11192== by 0x5079A2F: virFree (viralloc.c:580) ==11192== by 0x11456C34: qemuDomainObjPrivateFree (qemu_domain.c:267) ==11192== by 0x50F41B4: virDomainObjDispose (domain_conf.c:2034) ==11192== by 0x50C2991: virObjectUnref (virobject.c:262) ==11192== by 0x50F4CFC: virDomainObjListRemove (domain_conf.c:2361) ==11192== by 0x1145C125: qemuDomainRemoveInactive (qemu_domain.c:2087) ==11192== by 0x11488520: qemuMigrationPrepareAny (qemu_migration.c:2456) ==11192== by 0x11488EA6: qemuMigrationPrepareDirect (qemu_migration.c:2652) ==11192== by 0x114D1509: qemuDomainMigratePrepare3Params (qemu_driver.c:10332) ==11192== by 0x519075D: virDomainMigratePrepare3Params (libvirt.c:7290) ==11192== by 0x1502DA: remoteDispatchDomainMigratePrepare3Params (remote.c:4798) Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_migration.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7d8b727..4148954 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2450,14 +2450,14 @@ cleanup: VIR_FORCE_CLOSE(dataFD[0]); VIR_FORCE_CLOSE(dataFD[1]); if (vm) { - if (ret >= 0 || vm->persistent) - virObjectUnlock(vm); - else - qemuDomainRemoveInactive(driver, vm); if (ret < 0) { virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); priv->nbdPort = 0; } + if (ret >= 0 || vm->persistent) + virObjectUnlock(vm); + else + qemuDomainRemoveInactive(driver, vm); } if (event) qemuDomainEventQueue(driver, event); -- 1.8.1.5

On 11/06/2013 07:05 PM, Michal Privoznik wrote:
Since 86d90b3a (yes, my patch; again) we are supporting NBD storage migration. However, on error recovery path we got the steps reversed. The correct order is: return NBD port to the virPortAllocator and then either unlock the vm or remove it from the driver. Not vice versa.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_migration.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
ACK
participants (2)
-
Ján Tomko
-
Michal Privoznik