[libvirt] [PATCH 0/6] atomic snapshots, rounds 4 and 5

I was originally going to send this as two rounds, one for the XML additions of adding mirrors, and one for the flag addition to virDomainSnapshotDelete for mapping to drive-reopen; but it turned out that testing is easier if I finish the series. Note that this is minimally tested at the moment; I'm posting it to get review started, but I plan on hammering it tomorrow, and possibly coming up with slight alterations for a v2. This assumes that rounds 1-3 (ending here) have been applied: https://www.redhat.com/archives/libvir-list/2012-March/msg00846.html This should finish out my RFC here: https://www.redhat.com/archives/libvir-list/2012-March/msg00578.html but with one major caveat: https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html Upstream qemu has not yet committed the 'drive-mirror' or 'drive-reopen' monitor commands, which this series (rounds 4-5) depends on. There are still bugs being worked out under the hood on the qemu side, and there is a slight possibility that fixing those bugs may change the public interface. Therefore, part of the review of this series should be to determine whether we take the risk of applying the patch now, or waiting for qemu to stabilize a bit further. I will probably push rounds 1-3 tomorrow, since they have been ACK'd (well, round 3 is still awaiting review), and since qemu has committed to the 'transaction' monitor command for qemu 1.1. For reference, it is likely that RHEL 6.3 will not wait for qemu 1.1, but will instead provide RHEL-specific monitor commands named __com.redhat_drive-mirror and __com.redhat_drive-reopen; I have hopefully structured things well enough that it will be a couple lines of patching to qemu_monitor_json.c to recognize that alternate command name. Eric Blake (6): snapshot: allow for creation of mirrored snapshots snapshot: add new snapshot delete flags snapshot: make it possible to check for mirrored snapshot snapshot: expose qemu commands for mirrored storage migration snapshot: implement new snapshot delete flags in qemu snapshot: enable mirrored snapshots on transient vm docs/formatsnapshot.html.in | 31 +++ docs/schemas/domainsnapshot.rng | 8 + include/libvirt/libvirt.h.in | 6 + src/conf/domain_conf.c | 58 +++++- src/conf/domain_conf.h | 4 + src/libvirt.c | 37 +++- src/libvirt_private.syms | 2 + src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_driver.c | 242 +++++++++++++++++++- src/qemu/qemu_monitor.c | 50 ++++ src/qemu/qemu_monitor.h | 14 ++ src/qemu/qemu_monitor_json.c | 70 ++++++- src/qemu/qemu_monitor_json.h | 21 ++- .../disk_snapshot_mirror.xml | 13 + .../disk_snapshot_mirror.xml | 49 ++++ tests/domainsnapshotxml2xmltest.c | 4 +- tools/virsh.c | 13 + tools/virsh.pod | 21 ++- 19 files changed, 622 insertions(+), 26 deletions(-) create mode 100644 tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml create mode 100644 tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml -- 1.7.7.6

This extends <domainsnapshot> XML to add a new element under each <disk> of a disk snapshot: <disk name='vda'> <source file='/path/to/live'/> <mirror file='/path/to/mirror'/> </disk> For now, if a <mirror> is requested, the snapshot must be external, and assumes the same driver format (qcow2 or qed) as the <source>. qemu allows more flexibility in mirror creation, which could possibly be added by further XML enhancements, but more likely would be better served by adding a new API to specifically target mirroring, as well as XML changes to represent the entire backing chain in <domain> rather than commandeering <domainsnapshot>. Meanwhile, the changes in this patch are sufficient for the oVirt use case of using a mirror for live storage migration. The new XML is parsed but rejected until later patches update the hypervisor drivers to act on mirror requests. The testsuite additions show that we can round-trip the XML. * docs/schemas/domainsnapshot.rng (disksnapshot): Add optional mirror. * docs/formatsnapshot.html.in: Document it. * src/conf/domain_conf.h (VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR): New flag. (_virDomainSnapshotDiskDef): Add new member. * src/conf/domain_conf.c (virDomainSnapshotDiskDefParseXML): Add parameter, to parse or reject mirrors. (virDomainSnapshotDefParseString): Honor new flag. (virDomainSnapshotDefFormat): Output any mirrors. (virDomainSnapshotDiskDefClear): Clean up. * tests/domainsnapshotxml2xmltest.c (mymain): Add a test. (testCompareXMLToXMLFiles): Allow for mirrors. * tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml: New file. * tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml: Likewise. --- docs/formatsnapshot.html.in | 31 ++++++++++++ docs/schemas/domainsnapshot.rng | 8 +++ src/conf/domain_conf.c | 38 ++++++++++++++- src/conf/domain_conf.h | 2 + .../disk_snapshot_mirror.xml | 13 +++++ .../disk_snapshot_mirror.xml | 49 ++++++++++++++++++++ tests/domainsnapshotxml2xmltest.c | 4 +- 7 files changed, 141 insertions(+), 4 deletions(-) create mode 100644 tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml create mode 100644 tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml diff --git a/docs/formatsnapshot.html.in b/docs/formatsnapshot.html.in index ec5ebf3..cd39f1b 100644 --- a/docs/formatsnapshot.html.in +++ b/docs/formatsnapshot.html.in @@ -90,6 +90,30 @@ another snapshot. </p> <p> + External disk snapshots have another use of allowing live + storage migration across storage domains, while still + guaranteeing that at all points in time along the storage + migration, there exists a single storage domain with a + consistent view of the guest's data. This is done by adding the + concept of snapshot mirroring, where a snapshot request + specifies not only the new file name on the original storage + domain, but a secondary mirror file name on the target storage + domain. While the snapshot is active, reads are still tied to + the first storage domain, but all writes of changes from the + backing file are recorded in both storage domains; and the + management application can synchronize the backing files across + storage domains in parallel with the guest execution. Once the + second storage domain has all data in place, the management + application then deletes the snapshot, which stops the mirroring + and lets the hypervisor swap over to the second storage domain, + so that the first storage domain is no longer in active use. + Depending on the hypervisor, the use of a mirrored snapshots may + be restricted to transient domains, and the existence of a + mirrored snapshot can prevent migration, domain saves, disk hot + plug actions, and further snapshot manipulation until the + mirrored snapshot is deleted in order to stop the mirroring. + </p> + <p> The top-level <code>domainsnapshot</code> element may contain the following elements: </p> @@ -156,6 +180,13 @@ snapshots, the original file name becomes the read-only snapshot, and the new file name contains the read-write delta of all disk changes since the snapshot. + <span class="since">Since 0.9.11</span>, an external + snapshot may also have an optional + element <code>mirror</code>, which names a second file + alongside <code>source</code> that will mirror all changes + since the snapshot and using the same file format. A + mirror must have the <code>file</code> attribute listing + the file name to use. </dd> </dl> </dd> diff --git a/docs/schemas/domainsnapshot.rng b/docs/schemas/domainsnapshot.rng index 0ef0631..86193a7 100644 --- a/docs/schemas/domainsnapshot.rng +++ b/docs/schemas/domainsnapshot.rng @@ -121,6 +121,14 @@ <empty/> </element> </optional> + <optional> + <element name='mirror'> + <attribute name='file'> + <ref name='absFilePath'/> + </attribute> + <empty/> + </element> + </optional> </interleave> </group> </choice> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index d5def1c..f84304c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -13212,6 +13212,7 @@ virDomainSnapshotDiskDefClear(virDomainSnapshotDiskDefPtr disk) VIR_FREE(disk->name); VIR_FREE(disk->file); VIR_FREE(disk->driverType); + VIR_FREE(disk->mirror); } void virDomainSnapshotDefFree(virDomainSnapshotDefPtr def) @@ -13233,11 +13234,13 @@ void virDomainSnapshotDefFree(virDomainSnapshotDefPtr def) static int virDomainSnapshotDiskDefParseXML(xmlNodePtr node, - virDomainSnapshotDiskDefPtr def) + virDomainSnapshotDiskDefPtr def, + bool allow_mirror) { int ret = -1; char *snapshot = NULL; xmlNodePtr cur; + xmlNodePtr mirror = NULL; def->name = virXMLPropString(node, "name"); if (!def->name) { @@ -13266,14 +13269,38 @@ virDomainSnapshotDiskDefParseXML(xmlNodePtr node, } else if (!def->driverType && xmlStrEqual(cur->name, BAD_CAST "driver")) { def->driverType = virXMLPropString(cur, "type"); + } else if (!mirror && xmlStrEqual(cur->name, BAD_CAST "mirror")) { + mirror = cur; } } cur = cur->next; } - if (!def->snapshot && (def->file || def->driverType)) + if (!def->snapshot && (def->file || def->driverType || mirror)) def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL; + if (mirror) { + if (!allow_mirror) { + virDomainReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, + _("unable to handle mirroring for '%s'"), + def->name); + goto cleanup; + } + def->mirror = virXMLPropString(mirror, "file"); + if (!def->mirror) { + virDomainReportError(VIR_ERR_XML_ERROR, + _("mirror for '%s' requires file attribute"), + def->name); + goto cleanup; + } + if (def->snapshot != VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL) { + virDomainReportError(VIR_ERR_XML_ERROR, + _("mirror for '%s' requires external snapshot"), + def->name); + goto cleanup; + } + } + ret = 0; cleanup: VIR_FREE(snapshot); @@ -13303,6 +13330,7 @@ virDomainSnapshotDefParseString(const char *xmlStr, int active; char *tmp; int keepBlanksDefault = xmlKeepBlanksDefault(0); + bool allow_mirror = (flags & VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR) != 0; xml = virXMLParseCtxt(NULL, xmlStr, _("(domain_snapshot)"), &ctxt); if (!xml) { @@ -13406,7 +13434,8 @@ virDomainSnapshotDefParseString(const char *xmlStr, goto cleanup; } for (i = 0; i < def->ndisks; i++) { - if (virDomainSnapshotDiskDefParseXML(nodes[i], &def->disks[i]) < 0) + if (virDomainSnapshotDiskDefParseXML(nodes[i], &def->disks[i], + allow_mirror) < 0) goto cleanup; } VIR_FREE(nodes); @@ -13671,6 +13700,9 @@ char *virDomainSnapshotDefFormat(const char *domain_uuid, if (disk->file) virBufferEscapeString(&buf, " <source file='%s'/>\n", disk->file); + if (disk->mirror) + virBufferEscapeString(&buf, " <mirror file='%s'/>\n", + disk->mirror); virBufferAddLit(&buf, " </disk>\n"); } else { virBufferAddLit(&buf, "/>\n"); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index f471e35..01a504b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1642,6 +1642,7 @@ struct _virDomainSnapshotDiskDef { int snapshot; /* enum virDomainDiskSnapshot */ char *file; /* new source file when snapshot is external */ char *driverType; /* file format type of new file */ + char *mirror; /* external snapshot mirror, of same file format as file */ }; /* Stores the complete snapshot metadata */ @@ -1690,6 +1691,7 @@ typedef enum { VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE = 1 << 0, VIR_DOMAIN_SNAPSHOT_PARSE_DISKS = 1 << 1, VIR_DOMAIN_SNAPSHOT_PARSE_INTERNAL = 1 << 2, + VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR = 1 << 3, } virDomainSnapshotParseFlags; virDomainSnapshotDefPtr virDomainSnapshotDefParseString(const char *xmlStr, diff --git a/tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml b/tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml new file mode 100644 index 0000000..4085260 --- /dev/null +++ b/tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml @@ -0,0 +1,13 @@ +<domainsnapshot> + <name>my snap name</name> + <description>!@#$%^</description> + <disks> + <disk name='hda' snapshot='external'> + <source file='/path/to/new'/> + <mirror file='/path/to/other'/> + </disk> + <disk name='hdb'> + <mirror file='/path/to/other2'/> + </disk> + </disks> +</domainsnapshot> diff --git a/tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml b/tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml new file mode 100644 index 0000000..c99fd5b --- /dev/null +++ b/tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml @@ -0,0 +1,49 @@ +<domainsnapshot> + <name>my snap name</name> + <description>!@#$%^</description> + <state>disk-snapshot</state> + <creationTime>1272917631</creationTime> + <disks> + <disk name='hda' snapshot='external'> + <driver type='qcow2'/> + <source file='/path/to/new'/> + <mirror file='/path/to/other'/> + </disk> + <disk name='hdb' snapshot='external'> + <driver type='qcow2'/> + <source file='/path/to/generated'/> + <mirror file='/path/to/other2'/> + </disk> + </disks> + <domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static' cpuset='1-4,8-20,525'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='hdb' bus='ide'/> + <address type='drive' controller='0' bus='1' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <memballoon model='virtio'/> + </devices> + </domain> +</domainsnapshot> diff --git a/tests/domainsnapshotxml2xmltest.c b/tests/domainsnapshotxml2xmltest.c index 978a2d4..f706b8b 100644 --- a/tests/domainsnapshotxml2xmltest.c +++ b/tests/domainsnapshotxml2xmltest.c @@ -26,7 +26,8 @@ testCompareXMLToXMLFiles(const char *inxml, const char *uuid, int internal) int ret = -1; virDomainSnapshotDefPtr def = NULL; unsigned int flags = (VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE | - VIR_DOMAIN_SNAPSHOT_PARSE_DISKS); + VIR_DOMAIN_SNAPSHOT_PARSE_DISKS | + VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR); if (virtTestLoadFile(inxml, &inXmlData) < 0) goto fail; @@ -105,6 +106,7 @@ mymain(void) DO_TEST("all_parameters", "9d37b878-a7cc-9f9a-b78f-49b3abad25a8", 1); DO_TEST("disk_snapshot", "c7a5fdbd-edaf-9455-926a-d65c16db1809", 1); + DO_TEST("disk_snapshot_mirror", "c7a5fdbd-edaf-9455-926a-d65c16db1809", 0); DO_TEST("full_domain", "c7a5fdbd-edaf-9455-926a-d65c16db1809", 1); DO_TEST("noparent_nodescription_noactive", NULL, 0); DO_TEST("noparent_nodescription", NULL, 1); -- 1.7.7.6

This completes the public API for using mirrored snapshots as a means of performing live storage migration. Of course, it will take additional patches to actually provide the implementation. The idea here is that oVirt can start with a domain with 'vda' open on /path1/to/old.qcow2 with a base file of /path1/to/common, then do the following steps for a live migration to /path2 (here using virsh commands, although the underlying API would be available to oVirt through other language bindings as well). First, pre-create two files; it is important that the mirror file have a relative backing file name (if we let qemu create the entire snapshot, the backing file name would be absolute to /path1, and pivoting to the mirror would still be using the original source): $ qemu-img create -f qcow2 -o backing_file=old.qcow2 \ -o backing_fmt=qcow2 /path1/to/old.migrate $ qemu-img create -f qcow2 -o backing_file=old.qcow2 \ -o backing_fmt=qcow2 /path2/to/new.qcow2 Next, create a mirrored snapshot, while telling qemu to respect the pre-existing qcow2 header in both files: $ virsh snapshot-create-as $dom migrate --reuse-external \ --diskspec vda,file=/path1/to/old.migrate,mirror=/path2/to/new.qcow2 which means 'vda' is now open read-write on /path1/to/old.migrate and mirrored (write only) on /path2/to/new.qcow2, where qemu is using /path1/to/old.qcow2 as the logial backing for both files, but where the relative name is still intact in /path2/to/new.qcow2. Next, the backing files can be copied between locations: $ cp /path1/to/common /path1/to/old.qcow2 /path2/to When that is complete, the mirrored snapshot is no longer necessary, and requesting a pivot while deleting things will force qemu to reread the header of /path2/to/new.qcow2, notice a relative backing file name, and open /path2/to/old.qcow2 as the logical backing file: $ virsh snapshot-delete $dom migrate --mirror-pivot Now /path1 is no longer in use, but the backing chain on /path2 is longer than original. This can be cleaned up with: $ virsh blockpull $dom vda --base /path2/to/common to go back to having /path2/to/new.qcow2 directly backed by /path2/to/common. That smells like a hack, right? Well, there is a proposal for a better, more powerful API named virDomainBlockCopy: https://www.redhat.com/archives/libvir-list/2012-March/msg00585.html which would more appropriately expose the various knobs of the new qemu commands; but being a new API, it lacks the ability to be backported without causing a .so bump. So this is the compromise. * include/libvirt/libvirt.h.in (VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT) (VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT): New flags. * src/libvirt.c (virDomainSnapshotDelete): Document them. (virDomainSnapshotCreateXML, virDomainRevertToSnapshot) (virDomainSaveFlags): Mention effects of mirrored snapshots. * tools/virsh.c (vshParseSnapshotDiskspec): Add <mirror> support. (cmdSnapshotDelete): Add --mirror-abort, --mirror-pivot flags. * tools/virsh.pod (snapshot-delete, snapshot-create-as): Document new options. --- include/libvirt/libvirt.h.in | 6 ++++++ src/libvirt.c | 37 +++++++++++++++++++++++++++++++++++-- tools/virsh.c | 13 +++++++++++++ tools/virsh.pod | 21 ++++++++++++++------- 4 files changed, 68 insertions(+), 9 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 4566580..dbb6358 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -3369,6 +3369,12 @@ typedef enum { VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN = (1 << 0), /* Also delete children */ VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY = (1 << 1), /* Delete just metadata */ VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY = (1 << 2), /* Delete just children */ + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT = (1 << 3), /* Stop a mirrored + snapshot, reopening + to the source. */ + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT = (1 << 4), /* Stop a mirrored + snapshot, reopening + to the mirror. */ } virDomainSnapshotDeleteFlags; int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot, diff --git a/src/libvirt.c b/src/libvirt.c index 7df3667..800d1ee 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2702,6 +2702,9 @@ error: * @flags will override what state gets saved into the file. These * two flags are mutually exclusive. * + * Some hypervisors may prevent this call if there is a current + * mirrored snapshot. + * * A save file can be inspected or modified slightly with * virDomainSaveImageGetXMLDesc() and virDomainSaveImageDefineXML(). * @@ -17087,12 +17090,14 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) * not exist, the hypervisor may validate that reverting to the * snapshot appears to be possible (for example, disk images have * snapshot contents by the requested name). Not all hypervisors - * support these flags. + * support these flags; this might also be forbidden if there is a + * current mirrored snapshot. * * If @flags includes VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA, then the * domain's disk images are modified according to @xmlDesc, but then * the just-created snapshot has its metadata deleted. This flag is - * incompatible with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE. + * incompatible with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, and may + * also prevent any @xmlDesc that tries to create a mirrored snapshot. * * If @flags includes VIR_DOMAIN_SNAPSHOT_CREATE_HALT, then the domain * will be inactive after the snapshot completes, regardless of whether @@ -17698,6 +17703,9 @@ error: * inactive snapshots with a @flags request to start the domain after * the revert. * + * Some hypervisors may prevent reverting to snapshots if there is an + * active mirrored snapshot. + * * Returns 0 if the creation is successful, -1 on error. */ int @@ -17764,6 +17772,25 @@ error: * libvirt metadata to track snapshots, then this flag is silently * ignored. * + * If the domain has a current mirrored snapshot (that is, if the XML + * given to virDomainSnapshotCreateXML() included <mirror> elements for + * a disk), then the caller must specify how to end the mirroring before + * the snapshot can be deleted. In this situation, @snap must be the + * current mirrored snapshot, and @flags must contain either + * VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT to abandon the mirror while + * keeping the primary external file, or contain + * VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT to reopen the storage device + * using just the mirror, and abandon the primary external file. Note + * that while virDomainSnapshotCreateXML can create mirrors atomically, + * the deletion process might not be atomic; in this case, the operation + * will fail, but only after updating the snapshot object to record + * which mirrors were successfully reopened. If atomicity is important + * when using mirrored snapshots to perform live storage migration, it + * is recommended to migrate only one disk per snapshot. After using + * a mirrored snapshot to perform a storage migration, the caller may + * also want to use virDomainBlockRebase() to reduce the length of the + * backing chain that was lengthened by the snapshot process. + * * Returns 0 if the selected snapshot(s) were successfully deleted, * -1 on error. */ @@ -17797,6 +17824,12 @@ virDomainSnapshotDelete(virDomainSnapshotPtr snapshot, "mutually exclusive")); goto error; } + if ((flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT) && + (flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT)) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("delete mirror flags are mutually exclusive")); + goto error; + } if (conn->driver->domainSnapshotDelete) { int ret = conn->driver->domainSnapshotDelete(snapshot, flags); diff --git a/tools/virsh.c b/tools/virsh.c index 2548b73..9c0358e 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -15790,6 +15790,7 @@ vshParseSnapshotDiskspec(vshControl *ctl, virBufferPtr buf, const char *str) char *snapshot = NULL; char *driver = NULL; char *file = NULL; + char *mirror = NULL; char *spec = vshStrdup(ctl, str); char *tmp = spec; size_t len = strlen(str); @@ -15813,6 +15814,8 @@ vshParseSnapshotDiskspec(vshControl *ctl, virBufferPtr buf, const char *str) driver = tmp + strlen("driver="); else if (!file && STRPREFIX(tmp, "file=")) file = tmp + strlen("file="); + else if (!mirror && STRPREFIX(tmp, "mirror=")) + mirror = tmp + strlen("mirror="); else goto cleanup; } @@ -15826,6 +15829,8 @@ vshParseSnapshotDiskspec(vshControl *ctl, virBufferPtr buf, const char *str) virBufferAsprintf(buf, " <driver type='%s'/>\n", driver); if (file) virBufferEscapeString(buf, " <source file='%s'/>\n", file); + if (mirror) + virBufferEscapeString(buf, " <mirror file='%s'/>\n", mirror); virBufferAddLit(buf, " </disk>\n"); } else { virBufferAddLit(buf, "/>\n"); @@ -16861,6 +16866,10 @@ static const vshCmdOptDef opts_snapshot_delete[] = { {"children-only", VSH_OT_BOOL, 0, N_("delete children but not snapshot")}, {"metadata", VSH_OT_BOOL, 0, N_("delete only libvirt metadata, leaving snapshot contents behind")}, + {"mirror-abort", VSH_OT_BOOL, 0, + N_("abort a mirror while removing snapshot")}, + {"mirror-pivot", VSH_OT_BOOL, 0, + N_("pivot to the mirror before removing snapshot")}, {NULL, 0, 0, NULL} }; @@ -16890,6 +16899,10 @@ cmdSnapshotDelete(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY; if (vshCommandOptBool(cmd, "metadata")) flags |= VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY; + if (vshCommandOptBool(cmd, "mirror-abort")) + flags |= VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT; + if (vshCommandOptBool(cmd, "mirror-pivot")) + flags |= VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT; /* XXX If we wanted, we could emulate DELETE_CHILDREN_ONLY even on * older servers that reject the flag, by manually computing the diff --git a/tools/virsh.pod b/tools/virsh.pod index 8517fe5..5712c83 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -2355,12 +2355,12 @@ is specified, the snapshot will not include vm state. The I<--disk-only> flag is used to request a disk-only snapshot. When this flag is in use, the command can also take additional I<diskspec> arguments to add <disk> elements to the xml. Each <diskspec> is in the -form B<disk[,snapshot=type][,driver=type][,file=name]>. To include a -literal comma in B<disk> or in B<file=name>, escape it with a second -comma. A literal I<--diskspec> must preceed each B<diskspec> unless -all three of I<domain>, I<name>, and I<description> are also present. -For example, a diskspec of "vda,snapshot=external,file=/path/to,,new" -results in the following XML: +form B<disk[,snapshot=type][,driver=type][,file=name][,mirror=name]>. +To include a literal comma in B<disk>, B<file=name>, or B<mirror=name>, +escape it with a second comma. A literal I<--diskspec> must preceed +each B<diskspec> unless all three of I<domain>, I<name>, and +I<description> are also present. For example, a diskspec of +"vda,snapshot=external,file=/path/to,,new" results in the following XML: <disk name='vda' snapshot='external'> <source file='/path/to,new'/> </disk> @@ -2503,7 +2503,7 @@ with an inactive snapshot that is combined with the I<--start> or I<--pause> flag. =item B<snapshot-delete> I<domain> {I<snapshot> | I<--current>} [I<--metadata>] -[{I<--children> | I<--children-only>}] +[{I<--children> | I<--children-only>}] [{I<--mirror-abort> | I<--mirror-pivot>}] Delete the snapshot for the domain named I<snapshot>, or the current snapshot with I<--current>. If this snapshot @@ -2518,6 +2518,13 @@ maintained by libvirt, while leaving the snapshot contents intact for access by external tools; otherwise deleting a snapshot also removes the data contents from that point in time. +If the snapshot is mirrored (that is, the <domainsnapshot> XML used to +create the snapshot included an external <mirror> designation for use +in live migration of storage), then this command will fail unless the +current snapshot is being deleted, and either the I<--mirror-abort> or +I<--mirror-pivot> flag is present to control which half of the mirror +is kept active. + =back =head1 NWFILTER COMMMANDS -- 1.7.7.6

For now, disk migration via mirroring is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that there will be at most one snapshot with disk mirroring, that it will always be the current snapshot, and that the user cannot redefine that snapshot nor hot-plug or hot-unplug any domain disks - for now, the only way to delete such a snapshot is by destroying the transient domain it is attached to. A future patch will add the ability to also delete such snapshots under a condition of a flag which says how to end the mirroring with the 'drive-reopen' monitor command; and once that is in place, then we can finally implement creation of the mirroring via 'drive-mirror'. With just this patch applied, the only way you can ever get virDomainHasDiskMirror to return true is by manually playing with the libvirt internal directory and then restarting libvirtd. * src/conf/domain_conf.h (virDomainSnapshotHasDiskMirror) (virDomainHasDiskMirror): New prototypes. * src/conf/domain_conf.c (virDomainSnapshotHasDiskMirror) (virDomainHasDiskMirror): Implement them. * src/libvirt_private.syms (domain_conf.h): Export them. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainSnapshotDelete, qemuDomainAttachDeviceDiskLive) (qemuDomainDetachDeviceDiskLive): Use it. (qemuDomainSnapshotLoad): Allow libvirtd restarts with active disk mirroring, but sanity check for only a current image with mirrors. --- src/conf/domain_conf.c | 20 +++++++++++++++++ src/conf/domain_conf.h | 2 + src/libvirt_private.syms | 2 + src/qemu/qemu_driver.c | 53 +++++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 76 insertions(+), 1 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index f84304c..85abf32 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -13652,6 +13652,26 @@ cleanup: return ret; } +/* Determine if the given snapshot contains a disk mirror. */ +bool +virDomainSnapshotHasDiskMirror(virDomainSnapshotDefPtr snapshot) +{ + int i; + for (i = 0; i < snapshot->ndisks; i++) + if (snapshot->disks[i].mirror) + return true; + return false; +} + +/* Determine if the given domain has a current snapshot that contains + * a disk mirror. */ +bool virDomainHasDiskMirror(virDomainObjPtr vm) +{ + if (!vm->current_snapshot) + return false; + return virDomainSnapshotHasDiskMirror(vm->current_snapshot->def); +} + char *virDomainSnapshotDefFormat(const char *domain_uuid, virDomainSnapshotDefPtr def, unsigned int flags, diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 01a504b..1ea7332 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1706,6 +1706,7 @@ char *virDomainSnapshotDefFormat(const char *domain_uuid, int virDomainSnapshotAlignDisks(virDomainSnapshotDefPtr snapshot, int default_snapshot, bool require_match); +bool virDomainSnapshotHasDiskMirror(virDomainSnapshotDefPtr snapshot); virDomainSnapshotObjPtr virDomainSnapshotAssignDef(virDomainSnapshotObjListPtr snapshots, const virDomainSnapshotDefPtr def); @@ -1931,6 +1932,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm); int virDomainNetIndexByMac(virDomainDefPtr def, const unsigned char *mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 9a718b4..6739733 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -358,6 +358,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; @@ -450,6 +451,7 @@ virDomainSnapshotDropParent; virDomainSnapshotFindByName; virDomainSnapshotForEachChild; virDomainSnapshotForEachDescendant; +virDomainSnapshotHasDiskMirror; virDomainSnapshotObjListGetNames; virDomainSnapshotObjListGetNamesFrom; virDomainSnapshotObjListNum; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6dd1b32..89aa56c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -320,7 +320,8 @@ static void qemuDomainSnapshotLoad(void *payload, char ebuf[1024]; unsigned int flags = (VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE | VIR_DOMAIN_SNAPSHOT_PARSE_DISKS | - VIR_DOMAIN_SNAPSHOT_PARSE_INTERNAL); + VIR_DOMAIN_SNAPSHOT_PARSE_INTERNAL | + VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR); virDomainObjLock(vm); if (virAsprintf(&snapDir, "%s/%s", baseDir, vm->def->name) < 0) { @@ -373,6 +374,16 @@ static void qemuDomainSnapshotLoad(void *payload, VIR_FREE(xmlStr); continue; } + if (!def->current && virDomainSnapshotHasDiskMirror(def)) { + /* Someone must have hand-modified the directory; ignore them. */ + VIR_ERROR(_("Disk mirroring unexpected since snapshot file '%s' " + "does not claim to be the current snapshot"), + fullpath); + virDomainSnapshotDefFree(def); + VIR_FREE(fullpath); + VIR_FREE(xmlStr); + continue; + } snap = virDomainSnapshotAssignDef(&vm->snapshots, def); if (snap == NULL) { @@ -2536,6 +2547,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto cleanup; + } memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -5067,6 +5083,12 @@ qemuDomainAttachDeviceDiskLive(virConnectPtr conn, virCgroupPtr cgroup = NULL; int ret = -1; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto end; + } + if (disk->driverName != NULL && !STREQ(disk->driverName, "qemu")) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("unsupported driver name '%s' for disk '%s'"), @@ -5217,6 +5239,12 @@ qemuDomainDetachDeviceDiskLive(struct qemud_driver *driver, virDomainDiskDefPtr disk = dev->data.disk; int ret = -1; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + return -1; + } + switch (disk->device) { case VIR_DOMAIN_DISK_DEVICE_DISK: case VIR_DOMAIN_DISK_DEVICE_LUN: @@ -10239,6 +10267,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -10847,6 +10881,12 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto cleanup; + } + snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { qemuReportError(VIR_ERR_NO_DOMAIN_SNAPSHOT, @@ -11216,6 +11256,17 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + if (snap != vm->current_snapshot) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto cleanup; + } + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("deletion of active disk mirrors unimplemented")); + goto cleanup; + } + if (!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY)) { if (!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY) && snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT) -- 1.7.7.6

A snapshot-based mirrored storage migration sequence requires both the 'drive-mirror' action in 'transaction' (present if the 'drive-mirror' standalone monitor command also exists) and the 'drive-reopen' monitor command (it would be nice if that were also part of a 'transaction', but the initial qemu implementation has it standalone only). As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree, so there is a risk that qemu 1.1 will not have these commands, or will have something subtly different. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html * src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDriveMirror, qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. --- src/qemu/qemu_capabilities.c | 3 ++ src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_monitor.c | 50 ++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 14 ++++++++ src/qemu/qemu_monitor_json.c | 70 +++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_monitor_json.h | 21 +++++++++++- 6 files changed, 154 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 0e09d6d..1938ae4 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -156,6 +156,9 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "scsi-disk.channel", "scsi-block", "transaction", + + "drive-mirror", /* 90 */ + "drive-reopen", ); struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 78cdbe0..405bf2a 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -124,6 +124,8 @@ enum qemuCapsFlags { QEMU_CAPS_SCSI_DISK_CHANNEL = 87, /* Is scsi-disk.channel available? */ QEMU_CAPS_SCSI_BLOCK = 88, /* -device scsi-block */ QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */ + QEMU_CAPS_DRIVE_MIRROR = 90, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 91, /* drive-reopen monitor command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index bbe0f98..06d24c8 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2652,6 +2652,32 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } +/* Add the drive-mirror action to a transaction. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, virJSONValuePtr actions, + const char *device, const char *file, + const char *format, bool reuse) +{ + int ret; + + VIR_DEBUG("mon=%p, actions=%p, device=%s, file=%s, format=%s, reuse=%d", + mon, actions, device, file, format, reuse); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, actions, device, file, format, + reuse); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2668,6 +2694,30 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; } +/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, format); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-reopen requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 909e9ac..d213818 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -498,8 +498,22 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, const char *file, const char *format, bool reuse); +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 0e8bcce..4338456 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -888,12 +888,14 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, if (STREQ(name, "human-monitor-command")) *json_hmp = 1; - - if (STREQ(name, "system_wakeup")) + else if (STREQ(name, "system_wakeup")) qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP); - - if (STREQ(name, "transaction")) + else if (STREQ(name, "transaction")) qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } ret = 0; @@ -3106,6 +3108,39 @@ cleanup: return ret; } +int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virJSONValuePtr actions, + const char *device, const char *file, + const char *format, bool reuse) +{ + int ret = -1; + virJSONValuePtr cmd; + + cmd = qemuMonitorJSONMakeCommandRaw(true, + "drive-mirror", + "s:device", device, + "s:target", file, + "s:format", format, + "s:mode", + reuse ? "existing" : "absolute-paths", + NULL); + if (!cmd) + return -1; + + if (virJSONValueArrayAppend(actions, cmd) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = NULL; + ret = 0; + +cleanup: + virJSONValueFree(cmd); + return ret; +} + /* Note that this call frees actions regardless of whether the call * succeeds. */ int @@ -3134,6 +3169,33 @@ cleanup: return ret; } +int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + "s:format", format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a0f67aa..406836e 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,25 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4); int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, -- 1.7.7.6

Delete a mirrored snapshot by picking which of the two files in the mirror to reopen. This is not atomic, so we update the snapshot in place as we iterate through each successful disk. Since we limited mirrored snapshots to transient domains, there is no persistent configuration to update. This mode implies deleting the snapshot metadata but preserving the new file. * src/qemu/qemu_driver.c (qemuDomainSnapshotDelete): Honor new flags. (qemuDomainSnapshotDeleteMirror): New helper function. --- src/qemu/qemu_driver.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 73 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 89aa56c..428ba0f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11222,6 +11222,61 @@ qemuDomainSnapshotReparentChildren(void *payload, rep->driver->snapshotDir); } +/* Must be called while holding a job. */ +static int +qemuDomainSnapshotDeleteMirror(struct qemud_driver *driver, + virDomainObjPtr vm, + virDomainSnapshotObjPtr snap, + bool pivot) +{ + int i; + int ret = 0; + qemuDomainObjPrivatePtr priv = vm->privateData; + char *device = NULL; + char *file; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + return -1; + } + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + + for (i = 0; i < snap->def->ndisks; i++) { + if (!snap->def->disks[i].mirror) + continue; + VIR_FREE(device); + if (virAsprintf(&device, "drive-%s", + vm->def->disks[i]->info.alias) < 0) { + virReportOOMError(); + break; + } + file = pivot ? snap->def->disks[i].mirror : snap->def->disks[i].file; + if (qemuMonitorDriveReopen(priv->mon, device, file, + snap->def->disks[i].driverType) < 0) + break; + /* TODO: Release lock and reset label. */ + + VIR_FREE(vm->def->disks[i]->src); + vm->def->disks[i]->src = file; + if (pivot) + snap->def->disks[i].file = snap->def->disks[i].mirror; + snap->def->disks[i].mirror = NULL; + } + VIR_FREE(device); + if (i < snap->def->ndisks) + ret = -1; + + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (ret < 0 && + qemuDomainSnapshotWriteMetadata(vm, snap, driver->snapshotDir) < 0) + VIR_WARN("failed writing snapshot '%s' after partial mirror deletion", + snap->def->name); + return ret; +} + static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, unsigned int flags) { @@ -11234,10 +11289,14 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, struct snap_reparent rep; bool metadata_only = !!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY); int external = 0; + bool mirror_abort = (flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT) != 0; + bool mirror_pivot = (flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT) != 0; virCheckFlags(VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN | VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY | - VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY, -1); + VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY | + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT | + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT, -1); qemuDriverLock(driver); virUUIDFormat(snapshot->domain->uuid, uuidstr); @@ -11262,8 +11321,16 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, _("domain has active disk mirrors")); goto cleanup; } + if (!(mirror_abort || mirror_pivot)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("must specify whether to abort or pivot " + "mirrors before deleting snapshot")); + goto cleanup; + } + flags |= VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY; + } else if (mirror_abort || mirror_pivot) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("deletion of active disk mirrors unimplemented")); + _("snapshot has no disk mirrors")); goto cleanup; } @@ -11286,6 +11353,10 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; + if ((mirror_abort || mirror_pivot) && + qemuDomainSnapshotDeleteMirror(driver, vm, snap, mirror_pivot) < 0) + goto endjob; + if (flags & (VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN | VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY)) { rem.driver = driver; -- 1.7.7.6

The hardest part of this patch is figuring out how to provide proper security labeling and lock manager setup for the mirror, as well as rolling it all back on error. * src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Decide when mirrors are allowed. (qemuDomainSnapshotDiskPrepare): Prepare for mirrors. (qemuDomainSnapshotCreateSingleDiskActive): Turn on mirrors. (qemuDomainSnapshotUndoSingleDiskActive): Undo mirrors on failure. (qemuDomainSnapshotCreateDiskActive): Update caller. --- src/qemu/qemu_driver.c | 116 +++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 111 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 428ba0f..d03064f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9815,6 +9815,25 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, disk->name, disk->file); goto cleanup; } + if (disk->mirror) { + if (stat(disk->mirror, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, + _("unable to stat mirror for " + "disk %s: %s"), + disk->name, disk->mirror); + goto cleanup; + } + } else if (!(S_ISBLK(st.st_mode) || !st.st_size || + allow_reuse)) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external mirror file for disk %s " + "already exists and is not a block " + "device: %s"), + disk->name, disk->mirror); + goto cleanup; + } + } found = true; external++; break; @@ -9875,6 +9894,7 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, char *origsrc = NULL; char *origdriver = NULL; bool need_unlink = false; + bool need_unlink_mirror = false; if (snap->snapshot != VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL) { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -9902,13 +9922,22 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, if (fd < 0) goto cleanup; VIR_FORCE_CLOSE(fd); + + if (snap->mirror) { + fd = qemuOpenFile(driver, snap->mirror, + O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink_mirror, NULL); + if (fd < 0) + goto cleanup; + VIR_FORCE_CLOSE(fd); + } } origsrc = disk->src; - disk->src = source; origdriver = disk->driverType; disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + disk->src = source; if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) goto cleanup; if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, @@ -9918,9 +9947,21 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, goto cleanup; } + if (snap->mirror) { + disk->src = snap->mirror; + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto cleanup2; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", source); + goto cleanup2; + } + } + disk->src = origsrc; - origsrc = NULL; disk->driverType = origdriver; + origsrc = NULL; origdriver = NULL; /* create the actual snapshot */ @@ -9929,9 +9970,19 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, virDomainAuditDisk(vm, disk->src, source, "snapshot", ret >= 0); if (ret < 0) goto cleanup; + if (snap->mirror) { + ret = qemuMonitorDriveMirror(priv->mon, actions, device, snap->mirror, + driverType, reuse); + virDomainAuditDisk(vm, NULL, snap->mirror, "mirror", ret >= 0); + if (ret < 0) { + ret = -2; + goto cleanup; + } + } /* Update vm in place to match changes. */ need_unlink = false; + need_unlink_mirror = false; VIR_FREE(disk->src); disk->src = source; source = NULL; @@ -9958,12 +10009,25 @@ cleanup: } if (need_unlink && unlink(source)) VIR_WARN("unable to unlink just-created %s", source); + if (need_unlink_mirror && unlink(snap->mirror)) + VIR_WARN("unable to unlink just-created %s", snap->mirror); VIR_FREE(device); VIR_FREE(source); VIR_FREE(driverType); VIR_FREE(persistSource); VIR_FREE(persistDriverType); return ret; + +cleanup2: + /* We failed to relabel a mirror, but successfully labeled the + * destination; so we need to release the destination. */ + disk->src = source; + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", source); + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", source); + goto cleanup; } /* The domain is expected to hold monitor lock. This is the @@ -9975,7 +10039,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct qemud_driver *driver, virDomainDiskDefPtr origdisk, virDomainDiskDefPtr disk, virDomainDiskDefPtr persistDisk, - bool need_unlink) + bool need_unlink, char *mirror) { char *source = NULL; char *driverType = NULL; @@ -9993,6 +10057,23 @@ qemuDomainSnapshotUndoSingleDiskActive(struct qemud_driver *driver, goto cleanup; } + if (mirror) { + char *origsrc = disk->src; + char *origdriver = disk->driverType; + disk->src = mirror; + disk->driverType = (char *) "raw"; + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", mirror); + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", mirror); + if (need_unlink && stat(mirror, &st) == 0 && + st.st_size == 0 && S_ISREG(st.st_mode) && unlink(mirror) < 0) + VIR_WARN("Unable to remove just-created %s", mirror); + disk->src = origsrc; + disk->driverType = origdriver; + } + if (virSecurityManagerRestoreImageLabel(driver->securityManager, vm->def, disk) < 0) VIR_WARN("Unable to restore security label on %s", disk->src); @@ -10131,13 +10212,21 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, if (ret == 0) ret = qemuMonitorTransaction(priv->mon, actions); if (ret < 0) { - /* Transaction failed; undo the changes to vm. */ + /* Transaction failed; undo the changes to vm. Creating a + * mirror takes two parts; a return of -2 tells us that we + * failed half-way through. */ + if (ret == -2) + i++; bool need_unlink = !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT); while (--i >= 0) { virDomainDiskDefPtr persistDisk = NULL; + char *mirror = NULL; if (snap->def->disks[i].snapshot == VIR_DOMAIN_DISK_SNAPSHOT_NO) continue; + if (ret != -2) + mirror = snap->def->disks[i].mirror; + ret = -1; if (vm->newDef) { int indx = virDomainDiskIndexByName(vm->newDef, vm->def->disks[i]->dst, @@ -10150,7 +10239,7 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, snap->def->dom->disks[i], vm->def->disks[i], persistDisk, - need_unlink); + need_unlink, mirror); } } } @@ -10218,6 +10307,7 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, unsigned int flags) { struct qemud_driver *driver = domain->conn->privateData; + qemuDomainObjPrivatePtr priv; virDomainObjPtr vm = NULL; char *xml = NULL; virDomainSnapshotObjPtr snap = NULL; @@ -10261,6 +10351,7 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + priv = vm->privateData; if (qemuProcessAutoDestroyActive(driver, vm)) { qemuReportError(VIR_ERR_OPERATION_INVALID, @@ -10279,6 +10370,21 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, goto cleanup; } + /* Right now, we only permit disk mirroring when qemu is new + * enough, and only when creating a new snapshot on a transient + * domain; that way, we don't have to deal with complications like + * restoring mirroring when stopping and rebooting a domain, or + * altering the set of mirrors when redefining a snapshot. We + * require the snapshot to exist as long as mirroring is + * active. */ + if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN) && + !vm->persistent && + (flags & (VIR_DOMAIN_SNAPSHOT_CREATE_HALT | + VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA | + VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE)) == 0) + parse_flags |= VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR; + if (!(def = virDomainSnapshotDefParseString(xmlDesc, driver->caps, QEMU_EXPECTED_VIRT_TYPES, parse_flags))) -- 1.7.7.6

Il 23/03/2012 05:17, Eric Blake ha scritto:
I was originally going to send this as two rounds, one for the XML additions of adding mirrors, and one for the flag addition to virDomainSnapshotDelete for mapping to drive-reopen; but it turned out that testing is easier if I finish the series.
Note that this is minimally tested at the moment; I'm posting it to get review started, but I plan on hammering it tomorrow, and possibly coming up with slight alterations for a v2.
This assumes that rounds 1-3 (ending here) have been applied: https://www.redhat.com/archives/libvir-list/2012-March/msg00846.html
This should finish out my RFC here: https://www.redhat.com/archives/libvir-list/2012-March/msg00578.html but with one major caveat: https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html Upstream qemu has not yet committed the 'drive-mirror' or 'drive-reopen' monitor commands, which this series (rounds 4-5) depends on. There are still bugs being worked out under the hood on the qemu side, and there is a slight possibility that fixing those bugs may change the public interface. Therefore, part of the review of this series should be to determine whether we take the risk of applying the patch now, or waiting for qemu to stabilize a bit further.
I will probably push rounds 1-3 tomorrow, since they have been ACK'd (well, round 3 is still awaiting review), and since qemu has committed to the 'transaction' monitor command for qemu 1.1.
For reference, it is likely that RHEL 6.3 will not wait for qemu 1.1, but will instead provide RHEL-specific monitor commands named __com.redhat_drive-mirror and __com.redhat_drive-reopen; I have hopefully structured things well enough that it will be a couple lines of patching to qemu_monitor_json.c to recognize that alternate command name.
NACK. RHEL can have all the hacks you want, but upstream oVirt is just another client of libvirt and QEMU and things should be done properly there using a new API based on block_stream. Such a new API would also be extensible in the future to non-shared-storage migration with NBD + qemu-nbd. Paolo

On 03/23/2012 06:48 AM, Paolo Bonzini wrote:
Il 23/03/2012 05:17, Eric Blake ha scritto:
I was originally going to send this as two rounds, one for the XML additions of adding mirrors, and one for the flag addition to virDomainSnapshotDelete for mapping to drive-reopen; but it turned out that testing is easier if I finish the series.
I will probably push rounds 1-3 tomorrow, since they have been ACK'd (well, round 3 is still awaiting review), and since qemu has committed to the 'transaction' monitor command for qemu 1.1.
This part is still true, once I get round 3 reviewed.
For reference, it is likely that RHEL 6.3 will not wait for qemu 1.1, but will instead provide RHEL-specific monitor commands named __com.redhat_drive-mirror and __com.redhat_drive-reopen; I have hopefully structured things well enough that it will be a couple lines of patching to qemu_monitor_json.c to recognize that alternate command name.
NACK.
RHEL can have all the hacks you want, but upstream oVirt is just another client of libvirt and QEMU and things should be done properly there using a new API based on block_stream. Such a new API would also be extensible in the future to non-shared-storage migration with NBD + qemu-nbd.
I spoke more with Paolo off-line; the NACK is a valid complaint that we really don't want to commit to the hard-to-use API of round 4 (XML changes) or round 5 (snapshot delete flags) this close to the release of libvirt 0.9.11. Paolo is also complaining that my round 4/5 patches provide snapshot mirroring, but that snapshot mirroring is not as nice as streaming mirroring (that is, with snapshot mirroring, the backing chain gets longer and you have to clean it up after the fact; with streaming mirroring, the mirror has the same backing chain length as the original). My idea now is to work with Paolo at understanding how the same qemu 'transaction' and 'drive-reopen' commands can do both snapshot mirroring and streaming mirroring, then I hope to have a counter-proposal RFC posted by Monday, with an implementation sometime next week, that exposes streaming mirroring via virDomainBlockRebase. Then, because the idea of snapshot mirroring has been controversial, we do _not_ want either solution in next week's 0.9.11 release, which gives us a week or so to compare the two proposed implementations; hopefully streaming mirroring really does look better, and we can commit to either streaming mirroring or snapshot mirroring (and hopefully not both) in time for libvirt 0.9.12 at the end-of-April release. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Il 23/03/2012 16:49, Eric Blake ha scritto:
My idea now is to work with Paolo at understanding how the same qemu 'transaction' and 'drive-reopen' commands can do both snapshot mirroring and streaming mirroring, then I hope to have a counter-proposal RFC posted by Monday, with an implementation sometime next week, that exposes streaming mirroring via virDomainBlockRebase.
... of course this doesn't extend to rounds 1-3, which are a fine feature and match what will be in QEMU 1.1. An initial sketch of the mirror/stream API is at http://wiki.qemu.org/Features/SnapshotsMultipleDevices Paolo
participants (2)
-
Eric Blake
-
Paolo Bonzini