[libvirt] [PATCH v2] qemu: read backing chain names from qemu

https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that after a series of disk snapshots into existing destination images, followed by active commits of the top image, it is possible for qemu 2.2 and earlier to end up tracking a different name for the image than what it would have had when opening the chain afresh. That is, when starting with the chain 'a <- b <- c', the name associated with 'b' is how it was spelled in the metadata of 'c', but when starting with 'a', taking two snapshots into 'a <- b <- c', then committing 'c' back into 'b', the name associated with 'b' is now the name used when taking the first snapshot. Sadly, older qemu doesn't know how to treat different spellings of the same filename as identical files (it uses strcmp() instead of checking for the same inode), which means libvirt's attempt to commit an image using solely the names learned from qcow2 metadata fails with a cryptic: error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place. As a safety valve, we validate that any name returned by qemu still maps to the same local file as we have tracked it, so that a compromised qemu cannot accidentally cause us to act on an incorrect file. * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New prototype. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup): Likewise. * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup) (qemuMonitorJSONDiskNameLookupOne): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCommit) (qemuDomainBlockJobImpl): Use it. Signed-off-by: Eric Blake <eblake@redhat.com> --- v2: as suggested by Dan, add a sanity checking valve to ensure we don't use qemu's string until vetting that it resolves to the same local name we are already tracking src/qemu/qemu_driver.c | 28 ++++++------- src/qemu/qemu_monitor.c | 20 ++++++++- src/qemu/qemu_monitor.h | 8 +++- src/qemu/qemu_monitor_json.c | 97 +++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor_json.h | 9 +++- 5 files changed, 144 insertions(+), 18 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b3263ac..f0e530d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16132,9 +16132,6 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, goto endjob; if (baseSource) { - if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0) - goto endjob; - if (flags & VIR_DOMAIN_BLOCK_REBASE_RELATIVE) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource); + if (!baseSource || basePath) + ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, + speed, mode, async); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; if (ret < 0) { @@ -16903,12 +16904,6 @@ qemuDomainBlockCommit(virDomainPtr dom, VIR_DISK_CHAIN_READ_WRITE) < 0)) goto endjob; - if (qemuGetDriveSourceString(topSource, NULL, &topPath) < 0) - goto endjob; - - if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0) - goto endjob; - if (flags & VIR_DOMAIN_BLOCK_COMMIT_RELATIVE && topSource != disk->src) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) { @@ -16939,9 +16934,14 @@ qemuDomainBlockCommit(virDomainPtr dom, disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockCommit(priv->mon, device, - topPath, basePath, backingPath, - speed); + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource); + topPath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + topSource); + if (basePath && topPath) + ret = qemuMonitorBlockCommit(priv->mon, device, + topPath, basePath, backingPath, + speed); if (qemuDomainObjExitMonitor(driver, vm) < 0) { ret = -1; goto endjob; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index d869a72..cf7dc5e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1,7 +1,7 @@ /* * qemu_monitor.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -3458,6 +3458,24 @@ qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon) } +/* Determine the name that qemu is using for tracking the backing + * element TARGET within the chain starting at TOP. */ +char * +qemuMonitorDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return NULL; + } + + return qemuMonitorJSONDiskNameLookup(mon, device, top, target); +} + + /* Use the block-job-complete monitor command to pivot a block copy * job. */ int diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b30da34..e67d800 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1,7 +1,7 @@ /* * qemu_monitor.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -734,6 +734,12 @@ int qemuMonitorBlockCommit(qemuMonitorPtr mon, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); bool qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon); +char *qemuMonitorDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c16f3ca..522fd17 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1,7 +1,7 @@ /* * qemu_monitor_json.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -3883,6 +3883,101 @@ qemuMonitorJSONDrivePivot(qemuMonitorPtr mon, const char *device, } +static char * +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + virJSONValuePtr backing; + char *ret; + + if (!top) + return NULL; + if (top != target) { + backing = virJSONValueObjectGet(image, "backing-image"); + return qemuMonitorJSONDiskNameLookupOne(backing, top->backingStore, + target); + } + if (VIR_STRDUP(ret, virJSONValueObjectGetString(image, "filename")) < 0) + return NULL; + /* Sanity check - the name qemu gave us should resolve to the same + file tracked by our target description. */ + if (virStorageSourceIsLocalStorage(target) && + STRNEQ(ret, target->path) && + !virFileLinkPointsTo(ret, target->path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("qemu block name '%s' doesn't match expected '%s'"), + ret, target->path); + VIR_FREE(ret); + } + return ret; +} + + +char * +qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + char *ret = NULL; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + size_t i; + + cmd = qemuMonitorJSONMakeCommand("query-block", NULL); + if (!cmd) + return NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + devices = virJSONValueObjectGet(reply, "return"); + if (!devices || devices->type != VIR_JSON_TYPE_ARRAY) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info reply was missing device list")); + goto cleanup; + } + + for (i = 0; i < virJSONValueArraySize(devices); i++) { + virJSONValuePtr dev = virJSONValueArrayGet(devices, i); + virJSONValuePtr inserted; + virJSONValuePtr image; + const char *thisdev; + + if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if ((thisdev = virJSONValueObjectGetString(dev, "device")) == NULL) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if (STREQ(thisdev, device)) { + if ((inserted = virJSONValueObjectGet(dev, "inserted")) && + (image = virJSONValueObjectGet(inserted, "image"))) { + ret = qemuMonitorJSONDiskNameLookupOne(image, top, target); + } + break; + } + } + if (!ret && !virGetLastError()) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to find backing name for device %s"), + device); + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + + return ret; +} + + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 8ceea8a..49392b6 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -1,7 +1,7 @@ /* * qemu_monitor_json.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2009, 2011-2014 Red Hat, Inc. + * Copyright (C) 2006-2009, 2011-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -277,6 +277,13 @@ int qemuMonitorJSONBlockCommit(qemuMonitorPtr mon, unsigned long long bandwidth) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +char *qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, -- 2.1.0

I do meet libvirtd crash sometime when test this patch(I also met it when test v1 yesterday, but can not reproduce it 100%.) Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffe9d39700 (LWP 25413)] virJSONValueObjectGetString (object=0x0, key=key@entry=0x7fffe4f72429 "filename") at util/virjson.c:1074 1074 if (object->type != VIR_JSON_TYPE_OBJECT) (gdb) t a a bt Thread 6 (Thread 0x7fffe9d39700 (LWP 25413)): #0 virJSONValueObjectGetString (object=0x0, key=key@entry=0x7fffe4f72429 "filename") at util/virjson.c:1074 #1 0x00007fffe4f2a1f4 in qemuMonitorJSONDiskNameLookupOne (image=<optimized out>, top=0x7fffd40013b0, target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor_json.c:3901 #2 0x00007fffe4f2a1bc in qemuMonitorJSONDiskNameLookupOne (image=<optimized out>, top=top@entry=0x7fffdc0fc940, target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor_json.c:3898 #3 0x00007fffe4f31800 in qemuMonitorJSONDiskNameLookup (mon=<optimized out>, device=0x7fffd429cee0 "drive-virtio-disk0", top=0x7fffdc0fc940, target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor_json.c:3963 #4 0x00007fffe4f1f87e in qemuMonitorDiskNameLookup (mon=<optimized out>, device=<optimized out>, top=<optimized out>, target=target@entry=0x7fffd40013b0) at qemu/qemu_monitor.c:3475 #5 0x00007fffe4f55775 in qemuDomainBlockCommit (dom=<optimized out>, path=<optimized out>, base=<optimized out>, top=<optimized out>, bandwidth=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:16937 #6 0x00007ffff75ff933 in virDomainBlockCommit (dom=dom@entry=0x7fffd429d630, disk=0x7fffd40010a0 "vda", base=0x0, top=0x0, bandwidth=0, flags=5) at libvirt-domain.c:10218 #7 0x00005555555736fe in remoteDispatchDomainBlockCommit (server=<optimized out>, msg=<optimized out>, args=0x7fffd429d9c0, rerr=0x7fffe9d38cb0, client=<optimized out>) at remote_dispatch.h:2594 #8 remoteDispatchDomainBlockCommitHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7fffe9d38cb0, args=0x7fffd429d9c0, ret=<optimized out>) at remote_dispatch.h:2564 #9 0x00007ffff7653db9 in virNetServerProgramDispatchCall (msg=0x5555557d8240, client=0x5555557ce4a0, server=0x5555557cc820, prog=0x5555557d4a40) at rpc/virnetserverprogram.c:437 #10 virNetServerProgramDispatch (prog=0x5555557d4a40, server=server@entry=0x5555557cc820, client=0x5555557ce4a0, msg=0x5555557d8240) at rpc/virnetserverprogram.c:307 #11 0x00005555555989d8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x5555557cc820) at rpc/virnetserver.c:172 #12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5555557cc820) at rpc/virnetserver.c:193 #13 0x00007ffff755ed8e in virThreadPoolWorker (opaque=opaque@entry=0x5555557d8370) at util/virthreadpool.c:144 #14 0x00007ffff755e72e in virThreadHelper (data=<optimized out>) at util/virthread.c:197 #15 0x00007ffff5de252a in start_thread (arg=0x7fffe9d39700) at pthread_create.c:310 #16 0x00007ffff5b1e22d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 On 03/13/2015 04:23 AM, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that after a series of disk snapshots into existing destination images, followed by active commits of the top image, it is possible for qemu 2.2 and earlier to end up tracking a different name for the image than what it would have had when opening the chain afresh. That is, when starting with the chain 'a <- b <- c', the name associated with 'b' is how it was spelled in the metadata of 'c', but when starting with 'a', taking two snapshots into 'a <- b <- c', then committing 'c' back into 'b', the name associated with 'b' is now the name used when taking the first snapshot.
Sadly, older qemu doesn't know how to treat different spellings of the same filename as identical files (it uses strcmp() instead of checking for the same inode), which means libvirt's attempt to commit an image using solely the names learned from qcow2 metadata fails with a cryptic:
error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found
even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place.
As a safety valve, we validate that any name returned by qemu still maps to the same local file as we have tracked it, so that a compromised qemu cannot accidentally cause us to act on an incorrect file.
* src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New prototype. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup): Likewise. * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup) (qemuMonitorJSONDiskNameLookupOne): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCommit) (qemuDomainBlockJobImpl): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com> ---
v2: as suggested by Dan, add a sanity checking valve to ensure we don't use qemu's string until vetting that it resolves to the same local name we are already tracking
src/qemu/qemu_driver.c | 28 ++++++------- src/qemu/qemu_monitor.c | 20 ++++++++- src/qemu/qemu_monitor.h | 8 +++- src/qemu/qemu_monitor_json.c | 97 +++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor_json.h | 9 +++- 5 files changed, 144 insertions(+), 18 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b3263ac..f0e530d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16132,9 +16132,6 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, goto endjob;
if (baseSource) { - if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0) - goto endjob; - if (flags & VIR_DOMAIN_BLOCK_REBASE_RELATIVE) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, }
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource); + if (!baseSource || basePath) + ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, + speed, mode, async); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; if (ret < 0) { @@ -16903,12 +16904,6 @@ qemuDomainBlockCommit(virDomainPtr dom, VIR_DISK_CHAIN_READ_WRITE) < 0)) goto endjob;
- if (qemuGetDriveSourceString(topSource, NULL, &topPath) < 0) - goto endjob; - - if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0) - goto endjob; - if (flags & VIR_DOMAIN_BLOCK_COMMIT_RELATIVE && topSource != disk->src) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) { @@ -16939,9 +16934,14 @@ qemuDomainBlockCommit(virDomainPtr dom, disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockCommit(priv->mon, device, - topPath, basePath, backingPath, - speed); + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource); + topPath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + topSource); + if (basePath && topPath) + ret = qemuMonitorBlockCommit(priv->mon, device, + topPath, basePath, backingPath, + speed); if (qemuDomainObjExitMonitor(driver, vm) < 0) { ret = -1; goto endjob; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index d869a72..cf7dc5e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1,7 +1,7 @@ /* * qemu_monitor.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -3458,6 +3458,24 @@ qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon) }
+/* Determine the name that qemu is using for tracking the backing + * element TARGET within the chain starting at TOP. */ +char * +qemuMonitorDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return NULL; + } + + return qemuMonitorJSONDiskNameLookup(mon, device, top, target); +} + + /* Use the block-job-complete monitor command to pivot a block copy * job. */ int diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b30da34..e67d800 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1,7 +1,7 @@ /* * qemu_monitor.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -734,6 +734,12 @@ int qemuMonitorBlockCommit(qemuMonitorPtr mon, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); bool qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon); +char *qemuMonitorDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4);
int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c16f3ca..522fd17 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1,7 +1,7 @@ /* * qemu_monitor_json.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -3883,6 +3883,101 @@ qemuMonitorJSONDrivePivot(qemuMonitorPtr mon, const char *device, }
+static char * +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + virJSONValuePtr backing; + char *ret; + + if (!top) + return NULL; + if (top != target) { + backing = virJSONValueObjectGet(image, "backing-image"); + return qemuMonitorJSONDiskNameLookupOne(backing, top->backingStore, + target); + } + if (VIR_STRDUP(ret, virJSONValueObjectGetString(image, "filename")) < 0) + return NULL; + /* Sanity check - the name qemu gave us should resolve to the same + file tracked by our target description. */ + if (virStorageSourceIsLocalStorage(target) && + STRNEQ(ret, target->path) && + !virFileLinkPointsTo(ret, target->path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("qemu block name '%s' doesn't match expected '%s'"), + ret, target->path); + VIR_FREE(ret); + } + return ret; +} + + +char * +qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + char *ret = NULL; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + size_t i; + + cmd = qemuMonitorJSONMakeCommand("query-block", NULL); + if (!cmd) + return NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + devices = virJSONValueObjectGet(reply, "return"); + if (!devices || devices->type != VIR_JSON_TYPE_ARRAY) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info reply was missing device list")); + goto cleanup; + } + + for (i = 0; i < virJSONValueArraySize(devices); i++) { + virJSONValuePtr dev = virJSONValueArrayGet(devices, i); + virJSONValuePtr inserted; + virJSONValuePtr image; + const char *thisdev; + + if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if ((thisdev = virJSONValueObjectGetString(dev, "device")) == NULL) {
If dev=NULL, it will cased crash in virJSONValueObjectGetString? const char * virJSONValueObjectGetString(virJSONValuePtr object, const char *key) { virJSONValuePtr val; if (object->type != VIR_JSON_TYPE_OBJECT) return NULL;
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if (STREQ(thisdev, device)) { + if ((inserted = virJSONValueObjectGet(dev, "inserted")) && + (image = virJSONValueObjectGet(inserted, "image"))) { + ret = qemuMonitorJSONDiskNameLookupOne(image, top, target); + } + break; + } + } + if (!ret && !virGetLastError()) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to find backing name for device %s"), + device); + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + + return ret; +} + + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 8ceea8a..49392b6 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -1,7 +1,7 @@ /* * qemu_monitor_json.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2009, 2011-2014 Red Hat, Inc. + * Copyright (C) 2006-2009, 2011-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -277,6 +277,13 @@ int qemuMonitorJSONBlockCommit(qemuMonitorPtr mon, unsigned long long bandwidth) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
+char *qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str,
-- Regards shyu

On Thu, Mar 12, 2015 at 14:23:48 -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that after a series of disk snapshots into existing destination images, followed by active commits of the top image, it is possible for qemu 2.2 and earlier to end up tracking a different name for the image than what it would have had when opening the chain afresh. That is, when starting with the chain 'a <- b <- c', the name associated with 'b' is how it was spelled in the metadata of 'c', but when starting with 'a', taking two snapshots into 'a <- b <- c', then committing 'c' back into 'b', the name associated with 'b' is now the name used when taking the first snapshot.
Sadly, older qemu doesn't know how to treat different spellings of the same filename as identical files (it uses strcmp() instead of checking for the same inode), which means libvirt's attempt to commit an image using solely the names learned from qcow2 metadata fails with a cryptic:
error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found
even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place.
As a safety valve, we validate that any name returned by qemu still maps to the same local file as we have tracked it, so that a compromised qemu cannot accidentally cause us to act on an incorrect file.
It would still allow to act on remote storage though. Also if qemu is corrupted in a way that it'd lie to us correctly via monitor it would be most probably also able to act on the file itself. As the labelling is done from the internal structures it should not allow to do anything besides what the instance is already allowed. A bigger problem though would be that since we don't store the backing chain internally all the time, qemu could rewrite the metadata in the image and libvirt would happily accept those. Corrupting qemu in that way is very unprobable though IMO.
* src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New prototype. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup): Likewise. * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup) (qemuMonitorJSONDiskNameLookupOne): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCommit) (qemuDomainBlockJobImpl): Use it.
Signed-off-by: Eric Blake <eblake@redhat.com> ---
v2: as suggested by Dan, add a sanity checking valve to ensure we don't use qemu's string until vetting that it resolves to the same local name we are already tracking
src/qemu/qemu_driver.c | 28 ++++++------- src/qemu/qemu_monitor.c | 20 ++++++++- src/qemu/qemu_monitor.h | 8 +++- src/qemu/qemu_monitor_json.c | 97 +++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor_json.h | 9 +++- 5 files changed, 144 insertions(+), 18 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b3263ac..f0e530d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c
...
@@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, }
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
I remember that at some point accessing of domain definition while in the monitor was not okay for some reason, but I can't now remember why nor whether it was fixed.
+ baseSource); + if (!baseSource || basePath) + ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, + speed, mode, async); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; if (ret < 0) {
...
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index d869a72..cf7dc5e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1,7 +1,7 @@ /* * qemu_monitor.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc.
Shouldn't we employ something as in gnulib, where copyrights would be bumped at once everywhere?
* Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or
...
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b30da34..e67d800 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1,7 +1,7 @@ /* * qemu_monitor.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange
Because it's annoying when you are trying to trim uninterresting parts from a patch when you are doing a review.
* * This library is free software; you can redistribute it and/or
...
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c16f3ca..522fd17 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1,7 +1,7 @@ /* * qemu_monitor_json.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or
...
@@ -3883,6 +3883,101 @@ qemuMonitorJSONDrivePivot(qemuMonitorPtr mon, const char *device, }
+static char * +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + virJSONValuePtr backing; + char *ret; + + if (!top) + return NULL;
In case the backing chain as remembered by libvirt is shorter than what qemu sees you don't report error. Since the caller checks whether an error was set and if not then adds one, please state this fact in a comment here as it's not obvious until you follow the call chain.
+ if (top != target) { + backing = virJSONValueObjectGet(image, "backing-image"); + return qemuMonitorJSONDiskNameLookupOne(backing, top->backingStore, + target);
Also the recursion doesn't take into account that for some reason qemu might report a shorter chain than libvirt thinks, which would crash here.
+ } + if (VIR_STRDUP(ret, virJSONValueObjectGetString(image, "filename")) < 0) + return NULL; + /* Sanity check - the name qemu gave us should resolve to the same + file tracked by our target description. */ + if (virStorageSourceIsLocalStorage(target) && + STRNEQ(ret, target->path) && + !virFileLinkPointsTo(ret, target->path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("qemu block name '%s' doesn't match expected '%s'"), + ret, target->path); + VIR_FREE(ret); + } + return ret; +} + + +char * +qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + char *ret = NULL; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + size_t i; + + cmd = qemuMonitorJSONMakeCommand("query-block", NULL); + if (!cmd) + return NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + devices = virJSONValueObjectGet(reply, "return"); + if (!devices || devices->type != VIR_JSON_TYPE_ARRAY) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info reply was missing device list")); + goto cleanup; + } + + for (i = 0; i < virJSONValueArraySize(devices); i++) { + virJSONValuePtr dev = virJSONValueArrayGet(devices, i); + virJSONValuePtr inserted; + virJSONValuePtr image; + const char *thisdev; + + if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) {
[1]
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if ((thisdev = virJSONValueObjectGetString(dev, "device")) == NULL) {
You are mixing styles of cheching of the pointer to be non-null within a few lines ([1])
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if (STREQ(thisdev, device)) { + if ((inserted = virJSONValueObjectGet(dev, "inserted")) && + (image = virJSONValueObjectGet(inserted, "image"))) { + ret = qemuMonitorJSONDiskNameLookupOne(image, top, target); + } + break; + } + } + if (!ret && !virGetLastError()) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to find backing name for device %s"), + device); + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + + return ret; +} + + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str,
ACK if you add the comment and fix the potential crash. I'm currently OK with accessing domain definition while it's unlocked (but guarded via the domain job) as I don't have an counter example where it wouldn't work correctly. Peter

On 03/13/2015 02:02 AM, Peter Krempa wrote:
@@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, }
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
I remember that at some point accessing of domain definition while in the monitor was not okay for some reason, but I can't now remember why nor whether it was fixed.
Oh, right. You're thinking of CVE-2013-6458. That problem was that as soon as we enter the monitor, we drop locks. If we do not already own a block job, then some other parallel API could be hot-unplugging a disk before we regain control, freeing 'disk' before we dereference it. But we fixed that problem by guaranteeing that we always own the job early enough (no other thread can hot-unplug the disk as long as we own the job), so it is not an issue for this patch.
- * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc.
Shouldn't we employ something as in gnulib, where copyrights would be bumped at once everywhere?
Might be nice, but one wrinkle. Gnulib has a single copyright holder (FSF), so they can afford to bump all files at once (the bump is also owned by FSF, so FSF adding another year to its copyright is appropriate). But libvirt is split among multiple copyright holders - Red Hat can't claim copyright over all files, so it wouldn't be wise to bump all files, just the ones that Red Hat has already touched. Personally, I've just got an emacs hook that checks if any file I touch has an up-to-date copyright line.
+static char * +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + virJSONValuePtr backing; + char *ret; + + if (!top) + return NULL;
In case the backing chain as remembered by libvirt is shorter than what qemu sees you don't report error. Since the caller checks whether an error was set and if not then adds one, please state this fact in a comment here as it's not obvious until you follow the call chain.
Will do.
+ if (top != target) { + backing = virJSONValueObjectGet(image, "backing-image"); + return qemuMonitorJSONDiskNameLookupOne(backing, top->backingStore, + target);
Also the recursion doesn't take into account that for some reason qemu might report a shorter chain than libvirt thinks, which would crash here.
Oh, good catch (and looks like it explains what Shanzhi reported).
+ if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) {
[1]
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if ((thisdev = virJSONValueObjectGetString(dev, "device")) == NULL) {
You are mixing styles of cheching of the pointer to be non-null within a few lines ([1])
Copy-and-paste from another recursive parser of "query-block" information, but I can make it more consistent.
ACK if you add the comment and fix the potential crash. I'm currently OK with accessing domain definition while it's unlocked (but guarded via the domain job) as I don't have an counter example where it wouldn't work correctly.
I'm still a bit worried by Shanzhi's report of a crash; maybe I still have a race condition. That is, we change libvirt's notion of the chain length after a commit based on response to a qemu event rather than a user command - I was thinking that libvirt's chain and qemu's chain will always be the same length, but since Shanzhi provided a stack trace where it is not true, I'm wondering if the qemu chain being shorter than the libvirt chain might mean that we have some sort of window where a qemu event happens at the wrong moment when repeatedly hammering on consecutive commits. So I'll post a v3 after more testing rather than just blindly going on this ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Mar 13, 2015 at 07:01:06AM -0600, Eric Blake wrote:
On 03/13/2015 02:02 AM, Peter Krempa wrote:
@@ -16172,8 +16169,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, }
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src,
I remember that at some point accessing of domain definition while in the monitor was not okay for some reason, but I can't now remember why nor whether it was fixed.
Oh, right. You're thinking of CVE-2013-6458. That problem was that as soon as we enter the monitor, we drop locks. If we do not already own a block job, then some other parallel API could be hot-unplugging a disk before we regain control, freeing 'disk' before we dereference it. But we fixed that problem by guaranteeing that we always own the job early enough (no other thread can hot-unplug the disk as long as we own the job), so it is not an issue for this patch.
- * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc.
Shouldn't we employ something as in gnulib, where copyrights would be bumped at once everywhere?
Might be nice, but one wrinkle. Gnulib has a single copyright holder (FSF), so they can afford to bump all files at once (the bump is also owned by FSF, so FSF adding another year to its copyright is appropriate). But libvirt is split among multiple copyright holders - Red Hat can't claim copyright over all files, so it wouldn't be wise to bump all files, just the ones that Red Hat has already touched.
Personally, I've just got an emacs hook that checks if any file I touch has an up-to-date copyright line.
Technically there is no need to actually assert copyright over the code at all, since copyright is an automatic right you get the moment you author the code. Given that the copyright notice is not even required in the first place, asserting a year alongside the copyright notice is by implication not required either, nor is updating the year when you change code. Adding the Copyright lines is at most an informative step, to assist those reading the code in seeing its providence & ownership. Of course GIT history is much more useful for that purpose, but not everyone will receive a copy of GIT repo when they receive the code. In essence, the Copyright lines had a moderate benefit in clarifying ownership, but no legal benefit. By all means include a date when first starting a new file, but I think updating existing dates is pretty much a waste of time. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (4)
-
Daniel P. Berrange
-
Eric Blake
-
Peter Krempa
-
Shanzhi Yu