[libvirt] [PATCH] qemu: read backing chain names from qemu

https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that after a series of disk snapshots into existing destination images, followed by active commits of the top image, it is possible for qemu 2.2 and earlier to end up tracking a different name for the image than what it would have had when opening the chain afresh. That is, when starting with the chain 'a <- b <- c', the name associated with 'b' is how it was spelled in the metadata of 'c', but when starting with 'a', taking two snapshots into 'a <- b <- c', then committing 'c' back into 'b', the name associated with 'b' is now the name used when taking the first snapshot. Sadly, older qemu doesn't know how to treat different spellings of the same filename as identical files (it uses strcmp() instead of checking for the same inode), which means libvirt's attempt to commit an image using solely the names learned from qcow2 metadata fails with a cryptic: error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place. * src/qemu/qemu_monitor.h (qemuMonitorDiskNameLookup): New prototype. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDiskNameLookup): Likewise. * src/qemu/qemu_monitor.c (qemuMonitorDiskNameLookup): New function. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDiskNameLookup) (qemuMonitorJSONDiskNameLookupOne): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCommit) (qemuDomainBlockJobImpl): Use it. Signed-off-by: Eric Blake <eblake@redhat.com> --- Adam, since you reported the issue, let me know if you need a scratch binary to test this patch on. Jeff, thanks for the idea on how to solve the problem without touching qemu (of course, it would still be nice to teach qemu to respect inode equivalence rather than relying on streq, and even nicer for libvirt to use node names instead of file names, but those can be later patches without holding up VDSM now). src/qemu/qemu_driver.c | 28 +++++++------- src/qemu/qemu_monitor.c | 20 +++++++++- src/qemu/qemu_monitor.h | 8 +++- src/qemu/qemu_monitor_json.c | 87 +++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor_json.h | 9 ++++- 5 files changed, 134 insertions(+), 18 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 51b30b7..e540001 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15932,9 +15932,6 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, goto endjob; if (baseSource) { - if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0) - goto endjob; - if (flags & VIR_DOMAIN_BLOCK_REBASE_RELATIVE) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -15972,8 +15969,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource); + if (!baseSource || basePath) + ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, + speed, mode, async); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; if (ret < 0) { @@ -16703,12 +16704,6 @@ qemuDomainBlockCommit(virDomainPtr dom, VIR_DISK_CHAIN_READ_WRITE) < 0)) goto endjob; - if (qemuGetDriveSourceString(topSource, NULL, &topPath) < 0) - goto endjob; - - if (qemuGetDriveSourceString(baseSource, NULL, &basePath) < 0) - goto endjob; - if (flags & VIR_DOMAIN_BLOCK_COMMIT_RELATIVE && topSource != disk->src) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_CHANGE_BACKING_FILE)) { @@ -16739,9 +16734,14 @@ qemuDomainBlockCommit(virDomainPtr dom, disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockCommit(priv->mon, device, - topPath, basePath, backingPath, - speed); + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource); + topPath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + topSource); + if (basePath && topPath) + ret = qemuMonitorBlockCommit(priv->mon, device, + topPath, basePath, backingPath, + speed); if (qemuDomainObjExitMonitor(driver, vm) < 0) { ret = -1; goto endjob; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index d869a72..cf7dc5e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1,7 +1,7 @@ /* * qemu_monitor.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -3458,6 +3458,24 @@ qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon) } +/* Determine the name that qemu is using for tracking the backing + * element TARGET within the chain starting at TOP. */ +char * +qemuMonitorDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return NULL; + } + + return qemuMonitorJSONDiskNameLookup(mon, device, top, target); +} + + /* Use the block-job-complete monitor command to pivot a block copy * job. */ int diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b30da34..e67d800 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1,7 +1,7 @@ /* * qemu_monitor.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -734,6 +734,12 @@ int qemuMonitorBlockCommit(qemuMonitorPtr mon, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); bool qemuMonitorSupportsActiveCommit(qemuMonitorPtr mon); +char *qemuMonitorDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c16f3ca..7759365 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1,7 +1,7 @@ /* * qemu_monitor_json.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2014 Red Hat, Inc. + * Copyright (C) 2006-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -3883,6 +3883,91 @@ qemuMonitorJSONDrivePivot(qemuMonitorPtr mon, const char *device, } +static char * +qemuMonitorJSONDiskNameLookupOne(virJSONValuePtr image, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + virJSONValuePtr backing; + char *ret; + + if (!top) + return NULL; + if (top != target) { + backing = virJSONValueObjectGet(image, "backing-image"); + return qemuMonitorJSONDiskNameLookupOne(backing, top->backingStore, + target); + } + ignore_value(VIR_STRDUP(ret, + virJSONValueObjectGetString(image, "filename"))); + return ret; +} + + +char * +qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) +{ + char *ret = NULL; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr devices; + size_t i; + + cmd = qemuMonitorJSONMakeCommand("query-block", NULL); + if (!cmd) + return NULL; + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + devices = virJSONValueObjectGet(reply, "return"); + if (!devices || devices->type != VIR_JSON_TYPE_ARRAY) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info reply was missing device list")); + goto cleanup; + } + + for (i = 0; i < virJSONValueArraySize(devices); i++) { + virJSONValuePtr dev = virJSONValueArrayGet(devices, i); + virJSONValuePtr inserted; + virJSONValuePtr image; + const char *thisdev; + + if (!dev || dev->type != VIR_JSON_TYPE_OBJECT) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if ((thisdev = virJSONValueObjectGetString(dev, "device")) == NULL) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("block info device entry was not in expected format")); + goto cleanup; + } + + if (STRNEQ(thisdev, device) || + !(inserted = virJSONValueObjectGet(dev, "inserted")) || + !(image = virJSONValueObjectGet(inserted, "image"))) { + continue; + } + ret = qemuMonitorJSONDiskNameLookupOne(image, top, target); + break; + } + if (!ret) + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to find backing name for device %s"), + device); + + cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + + return ret; +} + + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 8ceea8a..49392b6 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -1,7 +1,7 @@ /* * qemu_monitor_json.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2009, 2011-2014 Red Hat, Inc. + * Copyright (C) 2006-2009, 2011-2015 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -277,6 +277,13 @@ int qemuMonitorJSONBlockCommit(qemuMonitorPtr mon, unsigned long long bandwidth) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +char *qemuMonitorJSONDiskNameLookup(qemuMonitorPtr mon, + const char *device, + virStorageSourcePtr top, + virStorageSourcePtr target) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, -- 2.1.0

On Wed, Mar 11, 2015 at 03:03:00PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1199182 documents that after a series of disk snapshots into existing destination images, followed by active commits of the top image, it is possible for qemu 2.2 and earlier to end up tracking a different name for the image than what it would have had when opening the chain afresh. That is, when starting with the chain 'a <- b <- c', the name associated with 'b' is how it was spelled in the metadata of 'c', but when starting with 'a', taking two snapshots into 'a <- b <- c', then committing 'c' back into 'b', the name associated with 'b' is now the name used when taking the first snapshot.
Sadly, older qemu doesn't know how to treat different spellings of the same filename as identical files (it uses strcmp() instead of checking for the same inode), which means libvirt's attempt to commit an image using solely the names learned from qcow2 metadata fails with a cryptic:
error: internal error: unable to execute QEMU command 'block-commit': Top image file /tmp/images/c/../b/b not found
even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place.
If you're going to ask QEMU for the names, then libvirt needs to validate that the name it gets back resolves to the same inode as the name it originally had. We cannot trust QEMU to tell the truth and cannot let it trick us into relabelling the wrong files by giving us the name of something completely different.
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource);
So this needs to valid basePath vs baseSource inodes.
@@ -16739,9 +16734,14 @@ qemuDomainBlockCommit(virDomainPtr dom, disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockCommit(priv->mon, device, - topPath, basePath, backingPath, - speed); + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource);
And here
+ topPath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + topSource);
And here Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/12/2015 03:14 AM, Daniel P. Berrange wrote:
even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place.
If you're going to ask QEMU for the names, then libvirt needs to validate that the name it gets back resolves to the same inode as the name it originally had. We cannot trust QEMU to tell the truth and cannot let it trick us into relabelling the wrong files by giving us the name of something completely different.
We are NOT relabelling files based on the information. I agree that if we act on a name returned by qemu, then we are vulnerable to mistakes if a guest manages to compromise qemu; but I don't think this is one of those situations.
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource);
So this needs to valid basePath vs baseSource inodes.
The only use of basePath is the string passed right back to qemu, to kick off the commit or pull operation. Since qemu is using strcmp() (rather than inode comparison), feeding it the name it just told us will make the commit operation work on the correct node. Everything else that libvirt does, such as relabelling files, is done solely on the information libvirt has been tracking (and not reliant on the name returned by qemu). But if it would satisfy your paranoia, I can certainly add a verification step that the string being returned by qemu resolves to the same inode being tracked by libvirt, at least in the case where the <disk> element resolves to a filename rather than a network disk. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Thu, Mar 12, 2015 at 08:09:35AM -0600, Eric Blake wrote:
On 03/12/2015 03:14 AM, Daniel P. Berrange wrote:
even though the file exists. Trying to teach libvirt the rules on which name qemu will expect is not worth the effort (besides, we'd have to remember it across libvirtd restarts, and track whether a file was opened via metadata or via snapshot creation for a given qemu process); it is easier to just always directly ask qemu what string it expects to see in the first place.
If you're going to ask QEMU for the names, then libvirt needs to validate that the name it gets back resolves to the same inode as the name it originally had. We cannot trust QEMU to tell the truth and cannot let it trick us into relabelling the wrong files by giving us the name of something completely different.
We are NOT relabelling files based on the information. I agree that if we act on a name returned by qemu, then we are vulnerable to mistakes if a guest manages to compromise qemu; but I don't think this is one of those situations.
qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, - speed, mode, async); + if (baseSource) + basePath = qemuMonitorDiskNameLookup(priv->mon, device, disk->src, + baseSource);
So this needs to valid basePath vs baseSource inodes.
The only use of basePath is the string passed right back to qemu, to kick off the commit or pull operation. Since qemu is using strcmp() (rather than inode comparison), feeding it the name it just told us will make the commit operation work on the correct node. Everything else that libvirt does, such as relabelling files, is done solely on the information libvirt has been tracking (and not reliant on the name returned by qemu).
But if it would satisfy your paranoia, I can certainly add a verification step that the string being returned by qemu resolves to the same inode being tracked by libvirt, at least in the case where the <disk> element resolves to a filename rather than a network disk.
I think it would be desirable, because while your current usage may be safe with these assumptions, if someone refactors this 6 months later they may not realize the security implications of this code. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/12/2015 08:23 AM, Daniel P. Berrange wrote:
But if it would satisfy your paranoia, I can certainly add a verification step that the string being returned by qemu resolves to the same inode being tracked by libvirt, at least in the case where the <disk> element resolves to a filename rather than a network disk.
I think it would be desirable, because while your current usage may be safe with these assumptions, if someone refactors this 6 months later they may not realize the security implications of this code.
v2 posted on those grounds. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 11/03/15 15:03 -0600, Eric Blake wrote:
Adam, since you reported the issue, let me know if you need a scratch binary to test this patch on. Jeff, thanks for the idea on how to solve the problem without touching qemu (of course, it would still be nice to teach qemu to respect inode equivalence rather than relying on streq, and even nicer for libvirt to use node names instead of file names, but those can be later patches without holding up VDSM now).
Thanks Eric. I am in the process of doing a crude port to RHEL-7.1 (libvirt-1.2.8-16) in order to verify it, but a proper build would probably be best (when you have some time). -- Adam Litke
participants (3)
-
Adam Litke
-
Daniel P. Berrange
-
Eric Blake