[libvirt] [PATCH 0/2] active commit fixes

In response to the valgrind trace I posted to the list last night. Valgrind is awesome :) I had a very hard time reproducing the race when run under gdb or at normal speeds, but valgrind was slow enough that I was hitting it more frequently, as well as getting information about the bug and where to go to fix it. Eric Blake (2): blockjob: avoid memory leak during block pivot blockjob: fix use-after-free in blockcopy src/qemu/qemu_driver.c | 40 +++++++++++++++++++++++++--------------- src/util/virstoragefile.c | 1 + 2 files changed, 26 insertions(+), 15 deletions(-) -- 1.9.3

Valgrind caught a memory leak: ==2018== 9 bytes in 1 blocks are definitely lost in loss record 143 of 927 ==2018== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2018== by 0x8C42369: strdup (strdup.c:42) ==2018== by 0x50EACC9: virStrdup (virstring.c:676) ==2018== by 0x50E79E5: virStorageSourceCopy (virstoragefile.c:1845) ==2018== by 0x20A3FAA7: qemuDomainBlockCommit (qemu_driver.c:15620) ==2018== by 0x51DC6B2: virDomainBlockCommit (libvirt.c:20092) I traced it to the fact that blockcopy and blockcommit end up reparsing a backing chain on pivot, but the chain parsing code doesn't gracefully handle the case where the backing file is already known. I'm not exactly sure when this was introduced, but suspect that the refactoring in commit 9944b71 and friends that moved towards probing in-place rather than into a temporary structure are part of the cause. * src/util/virstoragefile.c (virStorageFileGetMetadataInternal): Don't leak any prior value. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/util/virstoragefile.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 3da9073..5b6b2f5 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -817,6 +817,7 @@ virStorageFileGetMetadataInternal(virStorageSourcePtr meta, goto cleanup; } + VIR_FREE(meta->backingStoreRaw); if (fileTypeInfo[meta->format].getBackingStore != NULL) { int store = fileTypeInfo[meta->format].getBackingStore(&meta->backingStoreRaw, backingFormat, -- 1.9.3

On 08/06/14 23:12, Eric Blake wrote:
Valgrind caught a memory leak:
==2018== 9 bytes in 1 blocks are definitely lost in loss record 143 of 927 ==2018== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2018== by 0x8C42369: strdup (strdup.c:42) ==2018== by 0x50EACC9: virStrdup (virstring.c:676) ==2018== by 0x50E79E5: virStorageSourceCopy (virstoragefile.c:1845) ==2018== by 0x20A3FAA7: qemuDomainBlockCommit (qemu_driver.c:15620) ==2018== by 0x51DC6B2: virDomainBlockCommit (libvirt.c:20092)
I traced it to the fact that blockcopy and blockcommit end up reparsing a backing chain on pivot, but the chain parsing code doesn't gracefully handle the case where the backing file is already known.
I'm not exactly sure when this was introduced, but suspect that the refactoring in commit 9944b71 and friends that moved towards probing in-place rather than into a temporary structure are part of the cause.
* src/util/virstoragefile.c (virStorageFileGetMetadataInternal): Don't leak any prior value.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/util/virstoragefile.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 3da9073..5b6b2f5 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -817,6 +817,7 @@ virStorageFileGetMetadataInternal(virStorageSourcePtr meta, goto cleanup; }
+ VIR_FREE(meta->backingStoreRaw); if (fileTypeInfo[meta->format].getBackingStore != NULL) { int store = fileTypeInfo[meta->format].getBackingStore(&meta->backingStoreRaw, backingFormat,
ACK Peter

On 08/07/2014 12:57 AM, Peter Krempa wrote:
On 08/06/14 23:12, Eric Blake wrote:
Valgrind caught a memory leak:
I traced it to the fact that blockcopy and blockcommit end up reparsing a backing chain on pivot, but the chain parsing code doesn't gracefully handle the case where the backing file is already known.
ACK
Pushed. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Commit febf84c2 tried to delay in-memory modification of the actual domain disk structure until after the qemu event was received. However, I missed that the code for block pivot had been temporarily setting disk->src = disk->mirror prior to the qemu command, in order to label the backing chain of a reused external blockcopy disk; and calls into qemu while still in that state before finally undoing things at the cleanup label. Since the qemu event handler then does: virStorageSourceFree(disk->src); disk->src = disk->mirror; we have the sad race that a fast enough qemu event can cause a leak of the original disk->src, as well as a use-after-free of the disk->mirror contents, bad enough to crash libvirtd in some of my test runs, even though the common case of the qemu event being much later won't trip the race. I'll go wear the brown paper bag of shame, for introducing a crasher in between rc1 and rc2 of the freeze for 1.2.7 :( My only consolation is that virDomainBlockJobAbort requires the domain:write ACL, so it is not a CVE. The valgrind report when the race occurs looks like: ==25612== Invalid read of size 4 ==25612== at 0x50E7C90: virStorageSourceGetActualType (virstoragefile.c:1948) ==25612== by 0x209C0B18: qemuDomainDetermineDiskChain (qemu_domain.c:2473) ==25612== by 0x209D7F6A: qemuProcessHandleBlockJob (qemu_process.c:1087) ==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357) ... ==25612== Address 0xe4b5610 is 0 bytes inside a block of size 200 free'd ==25612== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==25612== by 0x50839E9: virFree (viralloc.c:582) ==25612== by 0x50E7E51: virStorageSourceFree (virstoragefile.c:2015) ==25612== by 0x209D7EFF: qemuProcessHandleBlockJob (qemu_process.c:1073) ==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357) * src/qemu/qemu_driver.c (qemuDomainBlockPivot): Don't corrupt disk->src, and only label chain for blockcopy. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 40 +++++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 96835bc..82a82aa 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14886,23 +14886,33 @@ qemuDomainBlockPivot(virConnectPtr conn, } } - /* We previously labeled only the top-level image; but if the - * image includes a relative backing file, the pivot may result in - * qemu needing to open the entire backing chain, so we need to - * label the entire chain. This action is safe even if the - * backing chain has already been labeled; but only necessary when - * we know for sure that there is a backing chain. */ - oldsrc = disk->src; - disk->src = disk->mirror; + /* For active commit, the mirror is part of the already labeled + * chain. For blockcopy, we previously labeled only the top-level + * image; but if the user is reusing an external image that + * includes a backing file, the pivot may result in qemu needing + * to open the entire backing chain, so we need to label the + * entire chain. This action is safe even if the backing chain + * has already been labeled; but only necessary when we know for + * sure that there is a backing chain. */ + if (disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + oldsrc = disk->src; + disk->src = disk->mirror; - if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) - goto cleanup; + if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) + goto cleanup; - if (disk->mirror->format && disk->mirror->format != VIR_STORAGE_FILE_RAW && - (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, disk) < 0 || - qemuSetupDiskCgroup(vm, disk) < 0 || - virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) - goto cleanup; + if (disk->mirror->format && + disk->mirror->format != VIR_STORAGE_FILE_RAW && + (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, + disk) < 0 || + qemuSetupDiskCgroup(vm, disk) < 0 || + virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, + disk) < 0)) + goto cleanup; + + disk->src = oldsrc; + oldsrc = NULL; + } /* Attempt the pivot. Record the attempt now, to prevent duplicate * attempts; but the actual disk change will be made when emitting -- 1.9.3

On 08/06/14 23:12, Eric Blake wrote:
Commit febf84c2 tried to delay in-memory modification of the actual domain disk structure until after the qemu event was received. However, I missed that the code for block pivot had been temporarily setting disk->src = disk->mirror prior to the qemu command, in order to label the backing chain of a reused external blockcopy disk; and calls into qemu while still in that state before finally undoing things at the cleanup label. Since the qemu event handler then does: virStorageSourceFree(disk->src); disk->src = disk->mirror; we have the sad race that a fast enough qemu event can cause a leak of the original disk->src, as well as a use-after-free of the disk->mirror contents, bad enough to crash libvirtd in some of my test runs, even though the common case of the qemu event being much later won't trip the race.
I'll go wear the brown paper bag of shame, for introducing a crasher in between rc1 and rc2 of the freeze for 1.2.7 :( My only consolation is that virDomainBlockJobAbort requires the domain:write ACL, so it is not a CVE.
The valgrind report when the race occurs looks like:
==25612== Invalid read of size 4 ==25612== at 0x50E7C90: virStorageSourceGetActualType (virstoragefile.c:1948) ==25612== by 0x209C0B18: qemuDomainDetermineDiskChain (qemu_domain.c:2473) ==25612== by 0x209D7F6A: qemuProcessHandleBlockJob (qemu_process.c:1087) ==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357) ... ==25612== Address 0xe4b5610 is 0 bytes inside a block of size 200 free'd ==25612== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==25612== by 0x50839E9: virFree (viralloc.c:582) ==25612== by 0x50E7E51: virStorageSourceFree (virstoragefile.c:2015) ==25612== by 0x209D7EFF: qemuProcessHandleBlockJob (qemu_process.c:1073) ==25612== by 0x209F40C9: qemuMonitorEmitBlockJob (qemu_monitor.c:1357)
* src/qemu/qemu_driver.c (qemuDomainBlockPivot): Don't corrupt disk->src, and only label chain for blockcopy.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 40 +++++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 15 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 96835bc..82a82aa 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14886,23 +14886,33 @@ qemuDomainBlockPivot(virConnectPtr conn, } }
- /* We previously labeled only the top-level image; but if the - * image includes a relative backing file, the pivot may result in - * qemu needing to open the entire backing chain, so we need to - * label the entire chain. This action is safe even if the - * backing chain has already been labeled; but only necessary when - * we know for sure that there is a backing chain. */ - oldsrc = disk->src; - disk->src = disk->mirror; + /* For active commit, the mirror is part of the already labeled + * chain. For blockcopy, we previously labeled only the top-level + * image; but if the user is reusing an external image that + * includes a backing file, the pivot may result in qemu needing + * to open the entire backing chain, so we need to label the + * entire chain. This action is safe even if the backing chain + * has already been labeled; but only necessary when we know for + * sure that there is a backing chain. */ + if (disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + oldsrc = disk->src; + disk->src = disk->mirror;
- if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) - goto cleanup; + if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) + goto cleanup;
- if (disk->mirror->format && disk->mirror->format != VIR_STORAGE_FILE_RAW && - (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, disk) < 0 || - qemuSetupDiskCgroup(vm, disk) < 0 || - virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) - goto cleanup; + if (disk->mirror->format && + disk->mirror->format != VIR_STORAGE_FILE_RAW && + (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, + disk) < 0 || + qemuSetupDiskCgroup(vm, disk) < 0 || + virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, + disk) < 0)) + goto cleanup; + + disk->src = oldsrc; + oldsrc = NULL; + }
/* Attempt the pivot. Record the attempt now, to prevent duplicate * attempts; but the actual disk change will be made when emitting
In the cleanup section there's the original place where oldsrc was returned back to disk->src. That would now be dead code. ACK with that part removed. Peter

On 08/07/2014 01:58 AM, Peter Krempa wrote:
On 08/06/14 23:12, Eric Blake wrote:
Commit febf84c2 tried to delay in-memory modification of the actual domain disk structure until after the qemu event was received. However, I missed that the code for block pivot had been temporarily setting disk->src = disk->mirror prior to the qemu command, in order to label the backing chain of a reused external blockcopy disk; and calls into qemu while still in that state before finally undoing things at the cleanup label. Since the qemu event handler then does: virStorageSourceFree(disk->src); disk->src = disk->mirror; we have the sad race that a fast enough qemu event can cause a leak of the original disk->src, as well as a use-after-free of the disk->mirror contents, bad enough to crash libvirtd in some of my test runs, even though the common case of the qemu event being much later won't trip the race.
- if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) - goto cleanup; + if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) + goto cleanup;
If we go to cleanup here...
- if (disk->mirror->format && disk->mirror->format != VIR_STORAGE_FILE_RAW && - (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, disk) < 0 || - qemuSetupDiskCgroup(vm, disk) < 0 || - virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) - goto cleanup; + if (disk->mirror->format && + disk->mirror->format != VIR_STORAGE_FILE_RAW && + (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, + disk) < 0 || + qemuSetupDiskCgroup(vm, disk) < 0 || + virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, + disk) < 0)) + goto cleanup; + + disk->src = oldsrc; + oldsrc = NULL;
...then we miss the restore of disk->src here...
+ }
/* Attempt the pivot. Record the attempt now, to prevent duplicate * attempts; but the actual disk change will be made when emitting
In the cleanup section there's the original place where oldsrc was returned back to disk->src. That would now be dead code.
...so the cleanup label is not dead code.
ACK with that part removed.
Is the patch okay as-is, with my explanation? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 08/07/14 18:36, Eric Blake wrote:
On 08/07/2014 01:58 AM, Peter Krempa wrote:
On 08/06/14 23:12, Eric Blake wrote:
Commit febf84c2 tried to delay in-memory modification of the actual domain disk structure until after the qemu event was received. However, I missed that the code for block pivot had been temporarily setting disk->src = disk->mirror prior to the qemu command, in order to label the backing chain of a reused external blockcopy disk; and calls into qemu while still in that state before finally undoing things at the cleanup label. Since the qemu event handler then does: virStorageSourceFree(disk->src); disk->src = disk->mirror; we have the sad race that a fast enough qemu event can cause a leak of the original disk->src, as well as a use-after-free of the disk->mirror contents, bad enough to crash libvirtd in some of my test runs, even though the common case of the qemu event being much later won't trip the race.
- if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) - goto cleanup; + if (qemuDomainDetermineDiskChain(driver, vm, disk, false) < 0) + goto cleanup;
If we go to cleanup here...
- if (disk->mirror->format && disk->mirror->format != VIR_STORAGE_FILE_RAW && - (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, disk) < 0 || - qemuSetupDiskCgroup(vm, disk) < 0 || - virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) - goto cleanup; + if (disk->mirror->format && + disk->mirror->format != VIR_STORAGE_FILE_RAW && + (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, + disk) < 0 || + qemuSetupDiskCgroup(vm, disk) < 0 || + virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, + disk) < 0)) + goto cleanup; + + disk->src = oldsrc; + oldsrc = NULL;
...then we miss the restore of disk->src here...
+ }
/* Attempt the pivot. Record the attempt now, to prevent duplicate * attempts; but the actual disk change will be made when emitting
In the cleanup section there's the original place where oldsrc was returned back to disk->src. That would now be dead code.
...so the cleanup label is not dead code.
ACK with that part removed.
Is the patch okay as-is, with my explanation?
Ah right :) Peter
participants (2)
-
Eric Blake
-
Peter Krempa