[libvirt] [PATCH v5 0/4] fix blockcopy across libvirtd restart, turn on active blockcommit

v4 was here: https://www.redhat.com/archives/libvir-list/2014-June/msg01095.html Since then: I _finally_ implemented persistent domain updates to match live changes, by reimplementing how <mirror> is tracked in XML to be more robust to libvirtd restarts. In particular, it fixes a bug in v4 that Adam Litke pointed out where a pivot commit would emit a ready event for 'active commit' but then a completed event for plain 'commit'. Patches 1 and 2 are completely local to blockcopy, and fix a long-standing bug where it is not robust to libvirtd restarts (similar to the bug fixed in 60e4944). They are pre-requisite to turning on active commit. Patches 3 and 4 are borderline on whether it is a new feature or a bug fix. But consider that commit 47549d5 turned on qemu feature probing with the intent of getting this in 1.2.7, and we already missed getting active commit into 1.2.6. As it is, patch 4 is very similar to the already-acked counterpart of v4. Therefore, even though I missed rc1, I'm arguing that this whole series should be included in rc2 in time for 1.2.7. Not done yet, but that I'd also like to have in the release if I can swing it: the new virConnectGetDomainCapabilities API needs to expose features on whether active commit will work. Also, I'd like to fix libvirtd restarts to inspect all existing <mirror> tags and see if the job completed while libvirtd was offline (that is, when we miss an event, we should still start up in the correct state when reconnecting to qemu). Eric Blake (4): blockcopy: add more XML for state tracking blockjob: properly track blockcopy xml changes on disk blockcommit: track job type in xml blockcommit: turn on active commit docs/formatdomain.html.in | 23 ++- docs/schemas/domaincommon.rng | 19 ++- src/conf/domain_conf.c | 56 ++++++- src/conf/domain_conf.h | 14 +- src/qemu/qemu_driver.c | 167 ++++++++++++++------- src/qemu/qemu_process.c | 92 ++++++++++-- .../qemuxml2argv-disk-active-commit.xml | 37 +++++ .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 8 +- .../qemuxml2xmlout-disk-mirror-old.xml | 4 +- tests/qemuxml2xmltest.c | 1 + 10 files changed, 336 insertions(+), 85 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml -- 1.9.3

Doing a blockcopy operation across a libvirtd restart is not very robust at the moment. In particular, we are clearing the <mirror> element prior to telling qemu to finish the job; and thanks to the ability to request async completion, the user can easily regain control prior to qemu actually finishing the effort, and they should be able to poll the domain XML to see if the job is still going. A future patch will fix things to actually wait until qemu is done before modifying the XML to reflect the job completion. But since qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether the job was cancelled (kept the original disk) or completed (pivoted to the new disk), we have to track which of the two operations were used to end the job. Furthermore, we'd like to avoid attempts to end a job where we are already waiting on an earlier request to qemu to end the job. Likewise, if we miss the qemu event (perhaps because it arrived during a libvirtd restart), we still need enough state recorded to be able to determine how to modify the domain XML once we reconnect to qemu and manually learn whether the job still exists. Although this patch doesn't actually fix the problem, it is a preliminary step that makes it possible to track whether a job has already begun steps towards completion. * src/conf/domain_conf.h (virDomainDiskMirror): New enum. (_virDomainDiskDef): Convert bool mirroring to new enum. * src/conf/domain_conf.c (virDomainDiskDefParseXML) (virDomainDiskDefFormat): Handle new values. * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust client. * src/qemu/qemu_driver.c (qemuDomainBlockPivot) (qemuDomainBlockJobImpl): Likewise. * docs/schemas/domaincommon.rng (diskMirror): Expose new values. * docs/formatdomain.html.in (elementsDisks): Document it. * tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/formatdomain.html.in | 10 +++++++--- docs/schemas/domaincommon.rng | 6 +++++- src/conf/domain_conf.c | 23 +++++++++++++++++++--- src/conf/domain_conf.h | 13 +++++++++++- src/qemu/qemu_driver.c | 12 +++++------ src/qemu/qemu_process.c | 4 ++-- .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 4 ++++ 7 files changed, 56 insertions(+), 16 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 8950959..4cb4289 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1907,9 +1907,13 @@ format of the source). The details of the <code>source</code> sub-element are determined by the <code>type</code> attribute of the mirror, similar to what is done for the - overall <code>disk</code> device element. If - attribute <code>ready</code> is present, then it is known the - disk is ready to pivot; otherwise, the disk is probably still + overall <code>disk</code> device element. The + attribute <code>ready</code>, if present, tracks progress of + the job: <code>yes</code> if the disk is known to be ready to + pivot, or, <span class="since">since + 1.2.7</span>, <code>abort</code> or <code>pivot</code> if the + job is in the process of completing. If <code>ready</code> is + not present, the disk is probably still copying. For now, this element only valid in output; it is ignored on input. The <code>source</code> sub-element exists for all two-phase jobs <span class="since">since 1.2.6</span>. diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f6f697c..389c8c9 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4284,7 +4284,11 @@ </choice> <optional> <attribute name='ready'> - <value>yes</value> + <choice> + <value>yes</value> + <value>abort</value> + <value>pivot</value> + </choice> </attribute> </optional> </element> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 910f6e2..1c8f8cf 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -747,6 +747,12 @@ VIR_ENUM_IMPL(virDomainDiskDiscard, VIR_DOMAIN_DISK_DISCARD_LAST, "unmap", "ignore") +VIR_ENUM_IMPL(virDomainDiskMirror, VIR_DOMAIN_DISK_MIRROR_LAST, + "default", + "yes", + "abort", + "pivot") + #define VIR_DOMAIN_XML_WRITE_FLAGS VIR_DOMAIN_XML_SECURE #define VIR_DOMAIN_XML_READ_FLAGS VIR_DOMAIN_XML_INACTIVE @@ -5482,7 +5488,14 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt, } ready = virXMLPropString(cur, "ready"); if (ready) { - def->mirroring = true; + def->mirrorState = virDomainDiskMirrorTypeFromString(ready); + if (def->mirrorState < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("unknown mirror ready state %s"), + ready); + VIR_FREE(ready); + goto error; + } VIR_FREE(ready); } } else if (xmlStrEqual(cur->name, BAD_CAST "auth")) { @@ -15273,8 +15286,12 @@ virDomainDiskDefFormat(virBufferPtr buf, virBufferEscapeString(buf, " file='%s'", def->mirror->path); virBufferEscapeString(buf, " format='%s'", formatStr); } - if (def->mirroring) - virBufferAddLit(buf, " ready='yes'"); + if (def->mirrorState) { + const char *mirror; + + mirror = virDomainDiskMirrorTypeToString(def->mirrorState); + virBufferEscapeString(buf, " ready='%s'", mirror); + } virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2); virBufferEscapeString(buf, "<format type='%s'/>\n", formatStr); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index b988b17..bc8966d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -610,6 +610,16 @@ struct _virDomainBlockIoTuneInfo { typedef virDomainBlockIoTuneInfo *virDomainBlockIoTuneInfoPtr; +typedef enum { + VIR_DOMAIN_DISK_MIRROR_DEFAULT = 0, /* No job, or job still not synced */ + VIR_DOMAIN_DISK_MIRROR_READY, /* Job in second phase */ + VIR_DOMAIN_DISK_MIRROR_ABORT, /* Job aborted, waiting for event */ + VIR_DOMAIN_DISK_MIRROR_PIVOT, /* Job pivoted, waiting for event */ + + VIR_DOMAIN_DISK_MIRROR_LAST +} virDomainDiskMirror; + + /* Stores the virtual disk configuration */ struct _virDomainDiskDef { virStorageSourcePtr src; /* non-NULL. XXX Allow NULL for empty cdrom? */ @@ -621,7 +631,7 @@ struct _virDomainDiskDef { int removable; /* enum virTristateSwitch */ virStorageSourcePtr mirror; - bool mirroring; + int mirrorState; /* enum virDomainDiskMirror */ struct { unsigned int cylinders; @@ -2556,6 +2566,7 @@ VIR_ENUM_DECL(virDomainDiskIo) VIR_ENUM_DECL(virDomainDeviceSGIO) VIR_ENUM_DECL(virDomainDiskTray) VIR_ENUM_DECL(virDomainDiskDiscard) +VIR_ENUM_DECL(virDomainDiskMirror) VIR_ENUM_DECL(virDomainController) VIR_ENUM_DECL(virDomainControllerModelPCI) VIR_ENUM_DECL(virDomainControllerModelSCSI) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..acc9ef0 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14846,7 +14846,7 @@ qemuDomainBlockPivot(virConnectPtr conn, format = virStorageFileFormatTypeToString(disk->mirror->format); /* Probe the status, if needed. */ - if (!disk->mirroring) { + if (!disk->mirrorState) { qemuDomainObjEnterMonitor(driver, vm); rc = qemuMonitorBlockJob(priv->mon, device, NULL, NULL, 0, &info, BLOCK_JOB_INFO, true); @@ -14860,10 +14860,10 @@ qemuDomainBlockPivot(virConnectPtr conn, } if (rc == 1 && info.cur == info.end && info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - disk->mirroring = true; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; } - if (!disk->mirroring) { + if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_READY) { virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, _("disk '%s' not ready for pivot yet"), disk->dst); @@ -14939,7 +14939,7 @@ qemuDomainBlockPivot(virConnectPtr conn, } disk->mirror = NULL; - disk->mirroring = false; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; cleanup: /* revert to original disk def on failure */ @@ -15096,7 +15096,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, * avoid checking if pivot is safe. */ if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - disk->mirroring = true; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; /* A successful block job cancelation stops any mirroring. */ if (mode == BLOCK_JOB_ABORT && disk->mirror) { @@ -15104,7 +15104,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, * the mirror, and audit that fact, before dropping things. */ virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirroring = false; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; } waitjob: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3d10732..73ad27d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1040,11 +1040,11 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuDomainDetermineDiskChain(driver, vm, disk, true); if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { - disk->mirroring = true; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) { virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirroring = false; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; } } } diff --git a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml index 72b03c9..fa7af31 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml @@ -42,6 +42,10 @@ <disk type='file' device='disk'> <source file='/tmp/logs.img'/> <backingStore/> + <mirror type='file' file='/tmp/logcopy.img' format='qcow2' ready='abort'> + <format type='qcow2'/> + <source file='/tmp/logcopy.img'/> + </mirror> <target dev='vdb' bus='virtio'/> </disk> <controller type='usb' index='0'/> -- 1.9.3

On 07/29/14 06:20, Eric Blake wrote:
Doing a blockcopy operation across a libvirtd restart is not very robust at the moment. In particular, we are clearing the <mirror> element prior to telling qemu to finish the job; and thanks to the ability to request async completion, the user can easily regain control prior to qemu actually finishing the effort, and they should be able to poll the domain XML to see if the job is still going.
A future patch will fix things to actually wait until qemu is done before modifying the XML to reflect the job completion. But since qemu issues identical BLOCK_JOB_COMPLETE events regardless of whether the job was cancelled (kept the original disk) or completed (pivoted to the new disk), we have to track which of the two operations were used to end the job. Furthermore, we'd like to avoid attempts to end a job where we are already waiting on an earlier request to qemu to end the job. Likewise, if we miss the qemu event (perhaps because it arrived during a libvirtd restart), we still need enough state recorded to be able to determine how to modify the domain XML once we reconnect to qemu and manually learn whether the job still exists.
Although this patch doesn't actually fix the problem, it is a preliminary step that makes it possible to track whether a job has already begun steps towards completion.
* src/conf/domain_conf.h (virDomainDiskMirror): New enum. (_virDomainDiskDef): Convert bool mirroring to new enum. * src/conf/domain_conf.c (virDomainDiskDefParseXML) (virDomainDiskDefFormat): Handle new values. * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Adjust client. * src/qemu/qemu_driver.c (qemuDomainBlockPivot) (qemuDomainBlockJobImpl): Likewise. * docs/schemas/domaincommon.rng (diskMirror): Expose new values. * docs/formatdomain.html.in (elementsDisks): Document it. * tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Test it.
Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/formatdomain.html.in | 10 +++++++--- docs/schemas/domaincommon.rng | 6 +++++- src/conf/domain_conf.c | 23 +++++++++++++++++++--- src/conf/domain_conf.h | 13 +++++++++++- src/qemu/qemu_driver.c | 12 +++++------ src/qemu/qemu_process.c | 4 ++-- .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 4 ++++ 7 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f6f697c..389c8c9 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4284,7 +4284,11 @@ </choice> <optional> <attribute name='ready'> - <value>yes</value> + <choice> + <value>yes</value> + <value>abort</value>
Seems reasonable to store the state of the final step here.
+ <value>pivot</value> + </choice> </attribute> </optional> </element> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 910f6e2..1c8f8cf 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -747,6 +747,12 @@ VIR_ENUM_IMPL(virDomainDiskDiscard, VIR_DOMAIN_DISK_DISCARD_LAST, "unmap", "ignore")
+VIR_ENUM_IMPL(virDomainDiskMirror, VIR_DOMAIN_DISK_MIRROR_LAST, + "default", + "yes", + "abort", + "pivot") + #define VIR_DOMAIN_XML_WRITE_FLAGS VIR_DOMAIN_XML_SECURE #define VIR_DOMAIN_XML_READ_FLAGS VIR_DOMAIN_XML_INACTIVE
@@ -5482,7 +5488,14 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt, } ready = virXMLPropString(cur, "ready"); if (ready) { - def->mirroring = true; + def->mirrorState = virDomainDiskMirrorTypeFromString(ready); + if (def->mirrorState < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("unknown mirror ready state %s"), + ready); + VIR_FREE(ready); + goto error; + } VIR_FREE(ready); } } else if (xmlStrEqual(cur->name, BAD_CAST "auth")) { @@ -15273,8 +15286,12 @@ virDomainDiskDefFormat(virBufferPtr buf, virBufferEscapeString(buf, " file='%s'", def->mirror->path); virBufferEscapeString(buf, " format='%s'", formatStr); } - if (def->mirroring) - virBufferAddLit(buf, " ready='yes'"); + if (def->mirrorState) { + const char *mirror; + + mirror = virDomainDiskMirrorTypeToString(def->mirrorState); + virBufferEscapeString(buf, " ready='%s'", mirror); + } virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2); virBufferEscapeString(buf, "<format type='%s'/>\n", formatStr); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index b988b17..bc8966d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -610,6 +610,16 @@ struct _virDomainBlockIoTuneInfo { typedef virDomainBlockIoTuneInfo *virDomainBlockIoTuneInfoPtr;
+typedef enum { + VIR_DOMAIN_DISK_MIRROR_DEFAULT = 0, /* No job, or job still not synced */
I'd go with _NONE here and ...
+ VIR_DOMAIN_DISK_MIRROR_READY, /* Job in second phase */ + VIR_DOMAIN_DISK_MIRROR_ABORT, /* Job aborted, waiting for event */ + VIR_DOMAIN_DISK_MIRROR_PIVOT, /* Job pivoted, waiting for event */ + + VIR_DOMAIN_DISK_MIRROR_LAST +} virDomainDiskMirror;
... as this describes the ready state of the mirroring operation I'd go with virDomainDiskMirrorState and rename those fields accordingly. Without that it's not clear what the enum describes.
+ + /* Stores the virtual disk configuration */ struct _virDomainDiskDef { virStorageSourcePtr src; /* non-NULL. XXX Allow NULL for empty cdrom? */ @@ -621,7 +631,7 @@ struct _virDomainDiskDef { int removable; /* enum virTristateSwitch */
virStorageSourcePtr mirror; - bool mirroring; + int mirrorState; /* enum virDomainDiskMirror */
struct { unsigned int cylinders;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 704ba39..acc9ef0 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14846,7 +14846,7 @@ qemuDomainBlockPivot(virConnectPtr conn, format = virStorageFileFormatTypeToString(disk->mirror->format);
/* Probe the status, if needed. */ - if (!disk->mirroring) { + if (!disk->mirrorState) { qemuDomainObjEnterMonitor(driver, vm); rc = qemuMonitorBlockJob(priv->mon, device, NULL, NULL, 0, &info, BLOCK_JOB_INFO, true); @@ -14860,10 +14860,10 @@ qemuDomainBlockPivot(virConnectPtr conn, } if (rc == 1 && info.cur == info.end && info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - disk->mirroring = true; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; }
- if (!disk->mirroring) { + if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_READY) { virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, _("disk '%s' not ready for pivot yet"), disk->dst); @@ -14939,7 +14939,7 @@ qemuDomainBlockPivot(virConnectPtr conn, }
disk->mirror = NULL; - disk->mirroring = false; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT;
cleanup: /* revert to original disk def on failure */ @@ -15096,7 +15096,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, * avoid checking if pivot is safe. */ if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - disk->mirroring = true; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY;
/* A successful block job cancelation stops any mirroring. */ if (mode == BLOCK_JOB_ABORT && disk->mirror) { @@ -15104,7 +15104,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, * the mirror, and audit that fact, before dropping things. */ virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirroring = false; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; }
waitjob: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3d10732..73ad27d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1040,11 +1040,11 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuDomainDetermineDiskChain(driver, vm, disk, true); if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { - disk->mirroring = true; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) { virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirroring = false; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; } } } diff --git a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml index 72b03c9..fa7af31 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml @@ -42,6 +42,10 @@ <disk type='file' device='disk'> <source file='/tmp/logs.img'/> <backingStore/> + <mirror type='file' file='/tmp/logcopy.img' format='qcow2' ready='abort'> + <format type='qcow2'/> + <source file='/tmp/logcopy.img'/> + </mirror> <target dev='vdb' bus='virtio'/> </disk> <controller type='usb' index='0'/>
Otherwise looks good. ACK if you change the enum name and default value name. Peter

On 07/29/2014 05:20 AM, Peter Krempa wrote:
On 07/29/14 06:20, Eric Blake wrote:
Doing a blockcopy operation across a libvirtd restart is not very robust at the moment. In particular, we are clearing the <mirror> element prior to telling qemu to finish the job; and thanks to the ability to request async completion, the user can easily regain control prior to qemu actually finishing the effort, and they should be able to poll the domain XML to see if the job is still going.
+typedef enum { + VIR_DOMAIN_DISK_MIRROR_DEFAULT = 0, /* No job, or job still not synced */
I'd go with _NONE here and ...
+ VIR_DOMAIN_DISK_MIRROR_READY, /* Job in second phase */ + VIR_DOMAIN_DISK_MIRROR_ABORT, /* Job aborted, waiting for event */ + VIR_DOMAIN_DISK_MIRROR_PIVOT, /* Job pivoted, waiting for event */ + + VIR_DOMAIN_DISK_MIRROR_LAST +} virDomainDiskMirror;
... as this describes the ready state of the mirroring operation I'd go with virDomainDiskMirrorState and rename those fields accordingly. Without that it's not clear what the enum describes.
Otherwise looks good. ACK if you change the enum name and default value name.
I'm squashing this in, and pushing. diff --git i/src/conf/domain_conf.c w/src/conf/domain_conf.c index 1c8f8cf..c2a81f1 100644 --- i/src/conf/domain_conf.c +++ w/src/conf/domain_conf.c @@ -747,8 +747,8 @@ VIR_ENUM_IMPL(virDomainDiskDiscard, VIR_DOMAIN_DISK_DISCARD_LAST, "unmap", "ignore") -VIR_ENUM_IMPL(virDomainDiskMirror, VIR_DOMAIN_DISK_MIRROR_LAST, - "default", +VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST, + "none", "yes", "abort", "pivot") @@ -5488,8 +5488,8 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt, } ready = virXMLPropString(cur, "ready"); if (ready) { - def->mirrorState = virDomainDiskMirrorTypeFromString(ready); - if (def->mirrorState < 0) { + if ((def->mirrorState = + virDomainDiskMirrorStateTypeFromString(ready)) < 0) { virReportError(VIR_ERR_XML_ERROR, _("unknown mirror ready state %s"), ready); @@ -15289,7 +15289,7 @@ virDomainDiskDefFormat(virBufferPtr buf, if (def->mirrorState) { const char *mirror; - mirror = virDomainDiskMirrorTypeToString(def->mirrorState); + mirror = virDomainDiskMirrorStateTypeToString(def->mirrorState); virBufferEscapeString(buf, " ready='%s'", mirror); } virBufferAddLit(buf, ">\n"); diff --git i/src/conf/domain_conf.h w/src/conf/domain_conf.h index bc8966d..2df3ae2 100644 --- i/src/conf/domain_conf.h +++ w/src/conf/domain_conf.h @@ -611,13 +611,13 @@ typedef virDomainBlockIoTuneInfo *virDomainBlockIoTuneInfoPtr; typedef enum { - VIR_DOMAIN_DISK_MIRROR_DEFAULT = 0, /* No job, or job still not synced */ - VIR_DOMAIN_DISK_MIRROR_READY, /* Job in second phase */ - VIR_DOMAIN_DISK_MIRROR_ABORT, /* Job aborted, waiting for event */ - VIR_DOMAIN_DISK_MIRROR_PIVOT, /* Job pivoted, waiting for event */ + VIR_DOMAIN_DISK_MIRROR_STATE_NONE = 0, /* No job, or job still not synced */ + VIR_DOMAIN_DISK_MIRROR_STATE_READY, /* Job in second phase */ + VIR_DOMAIN_DISK_MIRROR_STATE_ABORT, /* Job aborted, waiting for event */ + VIR_DOMAIN_DISK_MIRROR_STATE_PIVOT, /* Job pivoted, waiting for event */ - VIR_DOMAIN_DISK_MIRROR_LAST -} virDomainDiskMirror; + VIR_DOMAIN_DISK_MIRROR_STATE_LAST +} virDomainDiskMirrorState; /* Stores the virtual disk configuration */ @@ -631,7 +631,7 @@ struct _virDomainDiskDef { int removable; /* enum virTristateSwitch */ virStorageSourcePtr mirror; - int mirrorState; /* enum virDomainDiskMirror */ + int mirrorState; /* enum virDomainDiskMirrorState */ struct { unsigned int cylinders; @@ -2566,7 +2566,7 @@ VIR_ENUM_DECL(virDomainDiskIo) VIR_ENUM_DECL(virDomainDeviceSGIO) VIR_ENUM_DECL(virDomainDiskTray) VIR_ENUM_DECL(virDomainDiskDiscard) -VIR_ENUM_DECL(virDomainDiskMirror) +VIR_ENUM_DECL(virDomainDiskMirrorState) VIR_ENUM_DECL(virDomainController) VIR_ENUM_DECL(virDomainControllerModelPCI) VIR_ENUM_DECL(virDomainControllerModelSCSI) diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c index acc9ef0..836adba 100644 --- i/src/qemu/qemu_driver.c +++ w/src/qemu/qemu_driver.c @@ -14860,10 +14860,10 @@ qemuDomainBlockPivot(virConnectPtr conn, } if (rc == 1 && info.cur == info.end && info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; } - if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_READY) { + if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_READY) { virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, _("disk '%s' not ready for pivot yet"), disk->dst); @@ -14939,7 +14939,7 @@ qemuDomainBlockPivot(virConnectPtr conn, } disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; cleanup: /* revert to original disk def on failure */ @@ -15096,7 +15096,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, * avoid checking if pivot is safe. */ if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; /* A successful block job cancelation stops any mirroring. */ if (mode == BLOCK_JOB_ABORT && disk->mirror) { @@ -15104,7 +15104,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, * the mirror, and audit that fact, before dropping things. */ virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; } waitjob: diff --git i/src/qemu/qemu_process.c w/src/qemu/qemu_process.c index 73ad27d..378c6d3 100644 --- i/src/qemu/qemu_process.c +++ w/src/qemu/qemu_process.c @@ -1040,11 +1040,11 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuDomainDetermineDiskChain(driver, vm, disk, true); if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) { virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; } } } -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

We were not directly saving the domain XML to file after starting or finishing a blockcopy. Without the startup write, a libvirtd restart in the middle of a copy job would forget that the job was underway. Then at pivot, we were indirectly writing new XML in reaction to events that occur as we stop and restart the guest CPUs. But there was a race: since pivot is an async action, it is possible that libvirtd is restarted before the pivot completes, so if XML changes during the event, that change was not written. The original blockcopy code cleared out the <mirror> element prior to restarting the CPUs, but this is also a race, observed if a user does an async pivot and a dumpxml before the event occurs. Furthermore, this race will interfere with active commit in a future patch, because that code will rely on the <mirror> element at the time of the qemu event to determine whether to inform the user of a normal commit or an active commit. Fix things by saving state any time we modify live XML, while delaying XML disk modifications until after the event completes. We still need a to teach libvirtd restarts to examine all existing <mirror> elements to see if the job completed in the meantime (that is, if libvirtd misses the event, the updated state still needs to be updated in live XML), but that will be a later patch, in part because we also need to to start taking advantage of newer qemu's ability to keep the job around after completion rather than the current usage where the job disappears both on error and on success. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change on disk. (qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML rewrites... * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 121 ++++++++++++++++++++++++++++++------------------ src/qemu/qemu_process.c | 53 ++++++++++++++++----- 2 files changed, 117 insertions(+), 57 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index acc9ef0..180333c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14911,38 +14911,43 @@ qemuDomainBlockPivot(virConnectPtr conn, virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) goto cleanup; - /* Attempt the pivot. */ + /* Attempt the pivot. Record the attempt now, to prevent duplicate + * attempts; but the actual disk change will be made when emitting + * the event. + * XXX On libvirtd restarts, if we missed the qemu event, we need + * to double check what state qemu is in. + * XXX We should be using qemu's rerror flag to make sure the job + * remains alive until we know it's final state. + * XXX If the abort command is synchronous but the qemu event says + * that pivot failed, we need to reflect that failure into the + * overall return value. */ + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_PIVOT; qemuDomainObjEnterMonitor(driver, vm); ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror->path, format); qemuDomainObjExitMonitor(driver, vm); - if (ret == 0) { - /* XXX We want to revoke security labels and disk lease, as - * well as audit that revocation, before dropping the original - * source. But it gets tricky if both source and mirror share - * common backing files (we want to only revoke the non-shared - * portion of the chain, and is made more difficult by the - * fact that we aren't tracking the full chain ourselves; so - * for now, we leak the access to the original. */ - virStorageSourceFree(oldsrc); - oldsrc = NULL; - } else { + if (ret < 0) { /* On failure, qemu abandons the mirror, and reverts back to * the source disk (RHEL 6.3 has a bug where the revert could * cause catastrophic failure in qemu, but we don't need to * worry about it here as it is not an upstream qemu problem. */ /* XXX should we be parsing the exact qemu error, or calling * 'query-block', to see what state we really got left in - * before killing the mirroring job? And just as on the - * success case, there's security labeling to worry about. */ + * before killing the mirroring job? + * XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain); so for now, we leak the access to + * the original. */ virStorageSourceFree(disk->mirror); + disk->mirror = NULL; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; } - - disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + ret = -1; cleanup: - /* revert to original disk def on failure */ if (oldsrc) disk->src = oldsrc; @@ -14985,6 +14990,8 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, unsigned int baseIndex = 0; char *basePath = NULL; char *backingPath = NULL; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + bool save = false; if (!virDomainObjIsActive(vm)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", @@ -15038,19 +15045,30 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, disk->dst); goto endjob; } - if (mode == BLOCK_JOB_ABORT && - (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) && - !(async && disk->mirror)) { - virReportError(VIR_ERR_OPERATION_INVALID, - _("pivot of disk '%s' requires an active copy job"), - disk->dst); - goto endjob; - } + if (mode == BLOCK_JOB_ABORT) { + if ((flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) && + !(async && disk->mirror)) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); + goto endjob; + } + if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_DEFAULT && + disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_READY) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("another job on disk '%s' is still being ended"), + disk->dst); + goto endjob; + } - if (disk->mirror && mode == BLOCK_JOB_ABORT && - (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { - ret = qemuDomainBlockPivot(conn, driver, vm, device, disk); - goto waitjob; + if (disk->mirror && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(conn, driver, vm, device, disk); + goto waitjob; + } + if (disk->mirror) { + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_ABORT; + save = true; + } } if (base && @@ -15089,36 +15107,40 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, bandwidth, info, mode, async); qemuDomainObjExitMonitor(driver, vm); - if (ret < 0) + if (ret < 0) { + if (mode == BLOCK_JOB_ABORT && disk->mirror) + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; goto endjob; + } /* Snoop block copy operations, so future cancel operations can * avoid checking if pivot is safe. */ if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && - info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + info->cur == info->end && !disk->mirrorState) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; - - /* A successful block job cancelation stops any mirroring. */ - if (mode == BLOCK_JOB_ABORT && disk->mirror) { - /* XXX We should also revoke security labels and disk lease on - * the mirror, and audit that fact, before dropping things. */ - virStorageSourceFree(disk->mirror); - disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + save = true; } waitjob: + /* If we have made changes to XML due to a copy job, make a best + * effort to save it now. But we can ignore failure, since there + * will be further changes when the event marks completion. */ + if (save) + ignore_value(virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm)); + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous - * block cancel, the event will come from qemu, but without the - * ABORT_ASYNC flag, we must block to guarantee synchronous - * operation. We do the waiting while still holding the VM job, - * to prevent newly scheduled block jobs from confusing us. */ + * block cancel, the event will come from qemu and will update the + * XML as appropriate, but without the ABORT_ASYNC flag, we must + * block to guarantee synchronous operation. We do the waiting + * while still holding the VM job, to prevent newly scheduled + * block jobs from confusing us. */ if (mode == BLOCK_JOB_ABORT) { if (!async) { /* Older qemu that lacked async reporting also lacked - * active commit, so we can hardcode the event to pull. - * We have to generate two variants of the event. */ + * blockcopy and active commit, so we can hardcode the + * event to pull, and we know the XML doesn't need + * updating. We have to generate two event variants. */ int type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; int status = VIR_DOMAIN_BLOCK_JOB_CANCELED; event = virDomainEventBlockJobNewFromObj(vm, disk->src->path, type, @@ -15126,6 +15148,8 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, status); } else if (!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { + /* XXX If the event reports failure, we should reflect + * that back into the return status of this API call. */ while (1) { /* Poll every 50ms */ static struct timespec ts = { .tv_sec = 0, @@ -15163,6 +15187,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, } cleanup: + virObjectUnref(cfg); VIR_FREE(basePath); VIR_FREE(backingPath); VIR_FREE(device); @@ -15394,6 +15419,10 @@ qemuDomainBlockCopy(virDomainObjPtr vm, disk->mirror = mirror; mirror = NULL; + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + VIR_WARN("Unable to save status on vm %s after state change", + vm->def->name); + endjob: if (need_unlink && unlink(dest)) VIR_WARN("unable to unlink just-created %s", dest); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 73ad27d..e770a77 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1016,6 +1016,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virObjectEventPtr event2 = NULL; const char *path; virDomainDiskDefPtr disk; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + bool save = false; virObjectLock(vm); disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias); @@ -1027,29 +1029,58 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, event = virDomainEventBlockJobNewFromObj(vm, path, type, status); event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, status); - /* XXX If we completed a block pull or commit, then recompute - * the cached backing chain to match. Better would be storing - * the chain ourselves rather than reprobing, but this - * requires modifying domain_conf and our XML to fully track - * the chain across libvirtd restarts. For that matter, if - * qemu gains support for committing the active layer, we have - * to update disk->src. */ - if ((type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL || - type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT) && - status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) + + /* If we completed a block pull or commit, then update the XML + * to match. */ + if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) { + bool has_mirror = !!disk->mirror; + + if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + /* XXX We want to revoke security labels and disk + * lease, as well as audit that revocation, before + * dropping the original source. But it gets tricky + * if both source and mirror share common backing + * files (we want to only revoke the non-shared + * portion of the chain); so for now, we leak the + * access to the original. */ + virStorageSourceFree(disk->src); + disk->src = disk->mirror; + disk->mirror = NULL; + } + if (has_mirror) { + virStorageSourceFree(disk->mirror); + disk->mirror = NULL; + save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + } + /* Recompute the cached backing chain to match our + * updates. Better would be storing the chain ourselves + * rather than reprobing, but we haven't quite completed + * that conversion to use our XML tracking. */ qemuDomainDetermineDiskChain(driver, vm, disk, true); - if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + } + + if (disk->mirror && + type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; + save = true; } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) { virStorageSourceFree(disk->mirror); disk->mirror = NULL; disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + save = true; } } } + if (save) { + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + VIR_WARN("Unable to save status on vm %s after block job", + vm->def->name); + } virObjectUnlock(vm); + virObjectUnref(cfg); if (event) qemuDomainEventQueue(driver, event); -- 1.9.3

On 07/29/14 06:20, Eric Blake wrote:
We were not directly saving the domain XML to file after starting or finishing a blockcopy. Without the startup write, a libvirtd restart in the middle of a copy job would forget that the job was underway. Then at pivot, we were indirectly writing new XML in reaction to events that occur as we stop and restart the guest CPUs. But there was a race: since pivot is an async action, it is possible that libvirtd is restarted before the pivot completes, so if XML changes during the event, that change was not written. The original blockcopy code cleared out the <mirror> element prior to restarting the CPUs, but this is also a race, observed if a user does an async pivot and a dumpxml before the event occurs. Furthermore, this race will interfere with active commit in a future patch, because that code will rely on the <mirror> element at the time of the qemu event to determine whether to inform the user of a normal commit or an active commit.
Fix things by saving state any time we modify live XML, while delaying XML disk modifications until after the event completes. We still need a to teach libvirtd restarts to examine all existing <mirror> elements to see if the job completed in the meantime (that is, if libvirtd misses the event, the updated state still needs to be updated in live XML), but that will be a later patch, in part because we also need to to start taking advantage of newer qemu's ability to keep the job around after completion rather than the current usage where the job disappears both on error and on success.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Track XML change on disk. (qemuDomainBlockJobImpl, qemuDomainBlockPivot): Move job-end XML rewrites... * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): ...here.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 121 ++++++++++++++++++++++++++++++------------------ src/qemu/qemu_process.c | 53 ++++++++++++++++----- 2 files changed, 117 insertions(+), 57 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index acc9ef0..180333c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14911,38 +14911,43 @@ qemuDomainBlockPivot(virConnectPtr conn, virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) goto cleanup;
- /* Attempt the pivot. */ + /* Attempt the pivot. Record the attempt now, to prevent duplicate + * attempts; but the actual disk change will be made when emitting + * the event. + * XXX On libvirtd restarts, if we missed the qemu event, we need + * to double check what state qemu is in. + * XXX We should be using qemu's rerror flag to make sure the job + * remains alive until we know it's final state. + * XXX If the abort command is synchronous but the qemu event says + * that pivot failed, we need to reflect that failure into the + * overall return value. */ + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_PIVOT; qemuDomainObjEnterMonitor(driver, vm); ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror->path, format); qemuDomainObjExitMonitor(driver, vm);
- if (ret == 0) { - /* XXX We want to revoke security labels and disk lease, as - * well as audit that revocation, before dropping the original - * source. But it gets tricky if both source and mirror share - * common backing files (we want to only revoke the non-shared - * portion of the chain, and is made more difficult by the - * fact that we aren't tracking the full chain ourselves; so - * for now, we leak the access to the original. */ - virStorageSourceFree(oldsrc); - oldsrc = NULL; - } else { + if (ret < 0) { /* On failure, qemu abandons the mirror, and reverts back to * the source disk (RHEL 6.3 has a bug where the revert could * cause catastrophic failure in qemu, but we don't need to * worry about it here as it is not an upstream qemu problem. */ /* XXX should we be parsing the exact qemu error, or calling * 'query-block', to see what state we really got left in - * before killing the mirroring job? And just as on the - * success case, there's security labeling to worry about. */ + * before killing the mirroring job? + * XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain); so for now, we leak the access to + * the original. */
Libvirt has now the option to revert labeling on individual images, but that can be done later.
virStorageSourceFree(disk->mirror); + disk->mirror = NULL; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; } - - disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + ret = -1;
cleanup: - /* revert to original disk def on failure */ if (oldsrc) disk->src = oldsrc;
@@ -14985,6 +14990,8 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, unsigned int baseIndex = 0; char *basePath = NULL; char *backingPath = NULL; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + bool save = false;
if (!virDomainObjIsActive(vm)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", @@ -15038,19 +15045,30 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, disk->dst); goto endjob; } - if (mode == BLOCK_JOB_ABORT && - (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) && - !(async && disk->mirror)) { - virReportError(VIR_ERR_OPERATION_INVALID, - _("pivot of disk '%s' requires an active copy job"), - disk->dst); - goto endjob; - } + if (mode == BLOCK_JOB_ABORT) { + if ((flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) && + !(async && disk->mirror)) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); + goto endjob; + } + if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_DEFAULT && + disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_READY) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("another job on disk '%s' is still being ended"), + disk->dst); + goto endjob; + }
- if (disk->mirror && mode == BLOCK_JOB_ABORT && - (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { - ret = qemuDomainBlockPivot(conn, driver, vm, device, disk); - goto waitjob; + if (disk->mirror && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(conn, driver, vm, device, disk); + goto waitjob; + } + if (disk->mirror) { + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_ABORT; + save = true; + } }
if (base && @@ -15089,36 +15107,40 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, ret = qemuMonitorBlockJob(priv->mon, device, basePath, backingPath, bandwidth, info, mode, async); qemuDomainObjExitMonitor(driver, vm); - if (ret < 0) + if (ret < 0) { + if (mode == BLOCK_JOB_ABORT && disk->mirror) + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; goto endjob; + }
/* Snoop block copy operations, so future cancel operations can * avoid checking if pivot is safe. */ if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && - info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + info->cur == info->end && !disk->mirrorState) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; - - /* A successful block job cancelation stops any mirroring. */ - if (mode == BLOCK_JOB_ABORT && disk->mirror) { - /* XXX We should also revoke security labels and disk lease on - * the mirror, and audit that fact, before dropping things. */ - virStorageSourceFree(disk->mirror); - disk->mirror = NULL; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + save = true; }
waitjob: + /* If we have made changes to XML due to a copy job, make a best + * effort to save it now. But we can ignore failure, since there + * will be further changes when the event marks completion. */ + if (save) + ignore_value(virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm)); + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous - * block cancel, the event will come from qemu, but without the - * ABORT_ASYNC flag, we must block to guarantee synchronous - * operation. We do the waiting while still holding the VM job, - * to prevent newly scheduled block jobs from confusing us. */ + * block cancel, the event will come from qemu and will update the + * XML as appropriate, but without the ABORT_ASYNC flag, we must + * block to guarantee synchronous operation. We do the waiting + * while still holding the VM job, to prevent newly scheduled + * block jobs from confusing us. */ if (mode == BLOCK_JOB_ABORT) { if (!async) { /* Older qemu that lacked async reporting also lacked - * active commit, so we can hardcode the event to pull. - * We have to generate two variants of the event. */ + * blockcopy and active commit, so we can hardcode the + * event to pull, and we know the XML doesn't need + * updating. We have to generate two event variants. */ int type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; int status = VIR_DOMAIN_BLOCK_JOB_CANCELED; event = virDomainEventBlockJobNewFromObj(vm, disk->src->path, type, @@ -15126,6 +15148,8 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, status); } else if (!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { + /* XXX If the event reports failure, we should reflect + * that back into the return status of this API call. */ while (1) { /* Poll every 50ms */ static struct timespec ts = { .tv_sec = 0, @@ -15163,6 +15187,7 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm, }
cleanup: + virObjectUnref(cfg); VIR_FREE(basePath); VIR_FREE(backingPath); VIR_FREE(device); @@ -15394,6 +15419,10 @@ qemuDomainBlockCopy(virDomainObjPtr vm, disk->mirror = mirror; mirror = NULL;
+ if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + VIR_WARN("Unable to save status on vm %s after state change", + vm->def->name); + endjob: if (need_unlink && unlink(dest)) VIR_WARN("unable to unlink just-created %s", dest); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 73ad27d..e770a77 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1016,6 +1016,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virObjectEventPtr event2 = NULL; const char *path; virDomainDiskDefPtr disk; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + bool save = false;
virObjectLock(vm); disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias); @@ -1027,29 +1029,58 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, event = virDomainEventBlockJobNewFromObj(vm, path, type, status); event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, status); - /* XXX If we completed a block pull or commit, then recompute - * the cached backing chain to match. Better would be storing - * the chain ourselves rather than reprobing, but this - * requires modifying domain_conf and our XML to fully track - * the chain across libvirtd restarts. For that matter, if - * qemu gains support for committing the active layer, we have - * to update disk->src. */ - if ((type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL || - type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT) && - status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) + + /* If we completed a block pull or commit, then update the XML + * to match. */ + if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) { + bool has_mirror = !!disk->mirror;
The control flow is becoming less obvious in this function.
+ + if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + /* XXX We want to revoke security labels and disk + * lease, as well as audit that revocation, before + * dropping the original source. But it gets tricky + * if both source and mirror share common backing + * files (we want to only revoke the non-shared + * portion of the chain); so for now, we leak the + * access to the original. */ + virStorageSourceFree(disk->src); + disk->src = disk->mirror; + disk->mirror = NULL;
If we were here, then
+ } + if (has_mirror) {
This condition is also true and we try to free the source of the mirror again, even if we did it above. On the other hand, this is reached even if the _ABORT job is completed, where this makes sense.
+ virStorageSourceFree(disk->mirror); + disk->mirror = NULL; + save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + } + /* Recompute the cached backing chain to match our + * updates. Better would be storing the chain ourselves + * rather than reprobing, but we haven't quite completed + * that conversion to use our XML tracking. */ qemuDomainDetermineDiskChain(driver, vm, disk, true); - if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + } + + if (disk->mirror &&
Now this really is an else section to the above as both paths above clear disk->mirror.
+ type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; + save = true; } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) { virStorageSourceFree(disk->mirror); disk->mirror = NULL; disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + save = true; } } }
+ if (save) {
Save is true if this function was reached due to a mirroring operation change ...
+ if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + VIR_WARN("Unable to save status on vm %s after block job", + vm->def->name); + } virObjectUnlock(vm); + virObjectUnref(cfg);
if (event) qemuDomainEventQueue(driver, event);
ACK, the function flow isn't obvious but it's correct from my POV Peter

On 07/29/2014 05:40 AM, Peter Krempa wrote:
Fix things by saving state any time we modify live XML, while delaying XML disk modifications until after the event completes. We still need a to teach libvirtd restarts to examine all existing <mirror> elements to see if the job completed in the meantime (that is, if libvirtd misses the event, the updated state still needs to be updated in live XML), but that will be a later patch, in part because we also need to to start taking advantage of newer qemu's ability to keep the job around after completion rather than the current usage where the job disappears both on error and on success.
+ /* If we completed a block pull or commit, then update the XML + * to match. */ + if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) { + bool has_mirror = !!disk->mirror;
The control flow is becoming less obvious in this function.
Yeah, I struggled on that for a while.
+ + if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + /* XXX We want to revoke security labels and disk + * lease, as well as audit that revocation, before + * dropping the original source. But it gets tricky + * if both source and mirror share common backing + * files (we want to only revoke the non-shared + * portion of the chain); so for now, we leak the + * access to the original. */ + virStorageSourceFree(disk->src); + disk->src = disk->mirror; + disk->mirror = NULL;
If we were here, then
+ } + if (has_mirror) {
This condition is also true and we try to free the source of the mirror again, even if we did it above. On the other hand, this is reached even if the _ABORT job is completed, where this makes sense.
Basically, once a job completes, we have to clean up disk->mirror whether it was abort or pivot; but if it was pivot, we also have to adjust the source to be the former mirror contents. But I agree that a minor tweak to the flow makes it easier to read.
qemuDomainDetermineDiskChain(driver, vm, disk, true); - if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + } + + if (disk->mirror &&
Now this really is an else section to the above as both paths above clear disk->mirror.
Also correct.
ACK, the function flow isn't obvious but it's correct from my POV
After rebasing on top of 1/4, here's what I squashed in to improve the control flow, before pushing 1 and 2. I'll wait on 3 and 4 for any other opinions on whether active commit deserves to be in rc2. diff --git i/src/qemu/qemu_process.c w/src/qemu/qemu_process.c index 1340c8d..48da843 100644 --- i/src/qemu/qemu_process.c +++ w/src/qemu/qemu_process.c @@ -1033,8 +1033,6 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, /* If we completed a block pull or commit, then update the XML * to match. */ if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) { - bool has_mirror = !!disk->mirror; - if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_PIVOT) { /* XXX We want to revoke security labels and disk * lease, as well as audit that revocation, before @@ -1045,22 +1043,19 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, * access to the original. */ virStorageSourceFree(disk->src); disk->src = disk->mirror; - disk->mirror = NULL; - } - if (has_mirror) { + } else { virStorageSourceFree(disk->mirror); - disk->mirror = NULL; - save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_NONE; - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; } + /* Recompute the cached backing chain to match our * updates. Better would be storing the chain ourselves * rather than reprobing, but we haven't quite completed * that conversion to use our XML tracking. */ + disk->mirror = NULL; + save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_NONE; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; qemuDomainDetermineDiskChain(driver, vm, disk, true); - } - - if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + } else if (disk->mirror && type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; save = true; -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/28/2014 10:20 PM, Eric Blake wrote:
Fix things by saving state any time we modify live XML, while delaying XML disk modifications until after the event completes.
Except that I didn't, which means I caused a use-after-free situation which can crash libvirtd when doing blockcopy or active blockcommit:
+++ b/src/qemu/qemu_driver.c @@ -14911,38 +14911,43 @@ qemuDomainBlockPivot(virConnectPtr conn, virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, disk) < 0)) goto cleanup;
- /* Attempt the pivot. */ + /* Attempt the pivot. Record the attempt now, to prevent duplicate + * attempts; but the actual disk change will be made when emitting + * the event.
What I missed was that a couple lines before, the code did 'disk->src = disk->mirror', which means that the qemu event handler...
@@ -1027,29 +1029,58 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
+ /* If we completed a block pull or commit, then update the XML + * to match. */ + if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) { + bool has_mirror = !!disk->mirror; + + if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + /* XXX We want to revoke security labels and disk + * lease, as well as audit that revocation, before + * dropping the original source. But it gets tricky + * if both source and mirror share common backing + * files (we want to only revoke the non-shared + * portion of the chain); so for now, we leak the + * access to the original. */ + virStorageSourceFree(disk->src); + disk->src = disk->mirror;
is promptly calling virStorageSourceFree() on the same pointer that it is about to assign into disk->src, as well as leaking the original disk->src. Fix coming up. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

A future patch is going to wire up qemu active block commit jobs; but as they have similar events and are canceled/pivoted in the same way as block copy jobs, it is easiest to track all bookkeeping for the commit job by reusing the <mirror> element. This patch adds domain XML to track which job was responsible for creating a mirroring situation, and adds a job='copy' attribute to all existing uses of <mirror>. Along the way, it also massages the qemu monitor backend to read the new field in order to generate the correct type of libvirt job (even though it requires a future patch to actually cause a qemu event that can be reported as an active commit). It also prepares to update persistent XML to match changes made to live XML when a copy completes. * docs/schemas/domaincommon.rng: Enhance schema. * docs/formatdomain.html.in: Document it. * src/conf/domain_conf.h (_virDomainDiskDef): Add a field. * src/conf/domain_conf.c (virDomainBlockJobType): String conversion. (virDomainDiskDefParseXML): Parse job type. (virDomainDiskDefFormat): Output job type. * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish active from regular commit. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type. (qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type on completion. * tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml: Update tests. * tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise. * tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New file. * tests/qemuxml2xmltest.c (mymain): Drive new test. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/formatdomain.html.in | 15 +++++---- docs/schemas/domaincommon.rng | 13 ++++++++ src/conf/domain_conf.c | 33 ++++++++++++++++++- src/conf/domain_conf.h | 1 + src/qemu/qemu_driver.c | 2 ++ src/qemu/qemu_process.c | 37 +++++++++++++++++++++- .../qemuxml2argv-disk-active-commit.xml | 37 ++++++++++++++++++++++ .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 6 ++-- .../qemuxml2xmlout-disk-mirror-old.xml | 4 +-- tests/qemuxml2xmltest.c | 1 + 10 files changed, 136 insertions(+), 13 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 4cb4289..0e7fad2 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1898,16 +1898,19 @@ </dd> <dt><code>mirror</code></dt> <dd> - This element is present if the hypervisor has started a block - copy operation (via the <code>virDomainBlockRebase</code> - API), where the mirror location in the <code>source</code> - sub-element will eventually have the same contents as the - source, and with the file format in the + This element is present if the hypervisor has started a + long-running block job operation, where the mirror location in + the <code>source</code> sub-element will eventually have the + same contents as the source, and with the file format in the sub-element <code>format</code> (which might differ from the format of the source). The details of the <code>source</code> sub-element are determined by the <code>type</code> attribute of the mirror, similar to what is done for the - overall <code>disk</code> device element. The + overall <code>disk</code> device element. The <code>job</code> + attribute mentions which API started the operation ("copy" for + the <code>virDomainBlockRebase</code> API, or "active-commit" + for the <code>virDomainBlockCommit</code> + API), <span class="since">since 1.2.7</span>. The attribute <code>ready</code>, if present, tracks progress of the job: <code>yes</code> if the disk is known to be ready to pivot, or, <span class="since">since diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 389c8c9..3911a9f 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -4265,6 +4265,13 @@ </attribute> </optional> <optional> + <attribute name='job'> + <choice> + <value>copy</value> + </choice> + </attribute> + </optional> + <optional> <interleave> <ref name='diskSourceFile'/> <optional> @@ -4274,6 +4281,12 @@ </optional> </group> <group> <!-- preferred format --> + <attribute name='job'> + <choice> + <value>copy</value> + <value>active-commit</value> + </choice> + </attribute> <interleave> <ref name="diskSource"/> <optional> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1c8f8cf..1a76ce7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -753,6 +753,12 @@ VIR_ENUM_IMPL(virDomainDiskMirror, VIR_DOMAIN_DISK_MIRROR_LAST, "abort", "pivot") +/* Internal mapping: subset of block job types that can be present in + * <mirror> XML (remaining types are not two-phase). */ +VIR_ENUM_DECL(virDomainBlockJob) +VIR_ENUM_IMPL(virDomainBlockJob, VIR_DOMAIN_BLOCK_JOB_TYPE_LAST, + "", "", "copy", "", "active-commit") + #define VIR_DOMAIN_XML_WRITE_FLAGS VIR_DOMAIN_XML_SECURE #define VIR_DOMAIN_XML_READ_FLAGS VIR_DOMAIN_XML_INACTIVE @@ -5437,10 +5443,26 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt, xmlStrEqual(cur->name, BAD_CAST "mirror") && !(flags & VIR_DOMAIN_XML_INACTIVE)) { char *ready; + char *blockJob; if (VIR_ALLOC(def->mirror) < 0) goto error; + blockJob = virXMLPropString(cur, "job"); + if (blockJob) { + def->mirrorJob = virDomainBlockJobTypeFromString(blockJob); + if (def->mirrorJob <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown mirror job type '%s'"), + blockJob); + VIR_FREE(blockJob); + goto error; + } + VIR_FREE(blockJob); + } else { + def->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; + } + mirrorType = virXMLPropString(cur, "type"); if (mirrorType) { def->mirror->type = virStorageTypeFromString(mirrorType); @@ -5463,6 +5485,12 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt, _("mirror requires file name")); goto error; } + if (def->mirrorJob != VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("mirror without type only supported " + "by copy job")); + goto error; + } mirrorFormat = virXMLPropString(cur, "format"); } if (mirrorFormat) { @@ -15282,10 +15310,13 @@ virDomainDiskDefFormat(virBufferPtr buf, formatStr = virStorageFileFormatTypeToString(def->mirror->format); virBufferAsprintf(buf, "<mirror type='%s'", virStorageTypeToString(def->mirror->type)); - if (def->mirror->type == VIR_STORAGE_TYPE_FILE) { + if (def->mirror->type == VIR_STORAGE_TYPE_FILE && + def->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { virBufferEscapeString(buf, " file='%s'", def->mirror->path); virBufferEscapeString(buf, " format='%s'", formatStr); } + virBufferEscapeString(buf, " job='%s'", + virDomainBlockJobTypeToString(def->mirrorJob)); if (def->mirrorState) { const char *mirror; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index bc8966d..00ad137 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -632,6 +632,7 @@ struct _virDomainDiskDef { virStorageSourcePtr mirror; int mirrorState; /* enum virDomainDiskMirror */ + int mirrorJob; /* virDomainBlockJobType */ struct { unsigned int cylinders; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 180333c..683eaf3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14943,6 +14943,7 @@ qemuDomainBlockPivot(virConnectPtr conn, virStorageSourceFree(disk->mirror); disk->mirror = NULL; disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; } if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) ret = -1; @@ -15418,6 +15419,7 @@ qemuDomainBlockCopy(virDomainObjPtr vm, need_unlink = false; disk->mirror = mirror; mirror = NULL; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) VIR_WARN("Unable to save status on vm %s after state change", diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e770a77..4ea5493 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1017,6 +1017,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, const char *path; virDomainDiskDefPtr disk; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + virDomainDiskDefPtr persistDisk = NULL; bool save = false; virObjectLock(vm); @@ -1025,6 +1026,9 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, if (disk) { /* Have to generate two variants of the event for old vs. new * client callbacks */ + if (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && + disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) + type = disk->mirrorJob; path = virDomainDiskGetSource(disk); event = virDomainEventBlockJobNewFromObj(vm, path, type, status); event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, @@ -1036,6 +1040,31 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, bool has_mirror = !!disk->mirror; if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + if (vm->newDef) { + int indx = virDomainDiskIndexByName(vm->newDef, disk->dst, + false); + virStorageSourcePtr copy = NULL; + + if (indx >= 0) { + persistDisk = vm->newDef->disks[indx]; + copy = virStorageSourceCopy(disk->mirror, false); + if (virStorageSourceInitChainElement(copy, + persistDisk->src, + false) < 0) { + VIR_WARN("Unable to update persistent definition " + "on vm %s after block job", + vm->def->name); + virStorageSourceFree(copy); + copy = NULL; + persistDisk = NULL; + } + } + if (copy) { + virStorageSourceFree(persistDisk->src); + persistDisk->src = copy; + } + } + /* XXX We want to revoke security labels and disk * lease, as well as audit that revocation, before * dropping the original source. But it gets tricky @@ -1061,7 +1090,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, } if (disk->mirror && - type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY || + type == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; save = true; @@ -1069,6 +1099,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virStorageSourceFree(disk->mirror); disk->mirror = NULL; disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; save = true; } } @@ -1078,6 +1109,10 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) VIR_WARN("Unable to save status on vm %s after block job", vm->def->name); + if (persistDisk && virDomainSaveConfig(cfg->configDir, + vm->newDef) < 0) + VIR_WARN("Unable to update persistent definition on vm %s " + "after block job", vm->def->name); } virObjectUnlock(vm); virObjectUnref(cfg); diff --git a/tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml b/tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml new file mode 100644 index 0000000..6ba5724 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml @@ -0,0 +1,37 @@ +<domain type='qemu' id='1'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2'/> + <source file='/tmp/HostVG/QEMUGuest1-snap'/> + <backingStore type='block' index='1'> + <format type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <backingStore/> + </backingStore> + <mirror type='block' job='active-commit'> + <format type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + </mirror> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml index fa7af31..46f2a3e 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml @@ -17,7 +17,7 @@ <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest1'/> <backingStore/> - <mirror type='file' file='/dev/HostVG/QEMUGuest1Copy' ready='yes'> + <mirror type='file' file='/dev/HostVG/QEMUGuest1Copy' job='copy' ready='yes'> <source file='/dev/HostVG/QEMUGuest1Copy'/> </mirror> <target dev='hda' bus='ide'/> @@ -33,7 +33,7 @@ <disk type='file' device='disk'> <source file='/tmp/data.img'/> <backingStore/> - <mirror type='file' file='/tmp/copy.img' format='qcow2'> + <mirror type='file' file='/tmp/copy.img' format='qcow2' job='copy'> <format type='qcow2'/> <source file='/tmp/copy.img'/> </mirror> @@ -42,7 +42,7 @@ <disk type='file' device='disk'> <source file='/tmp/logs.img'/> <backingStore/> - <mirror type='file' file='/tmp/logcopy.img' format='qcow2' ready='abort'> + <mirror type='file' file='/tmp/logcopy.img' format='qcow2' job='copy' ready='abort'> <format type='qcow2'/> <source file='/tmp/logcopy.img'/> </mirror> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml index 72b03c9..abec180 100644 --- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml @@ -17,7 +17,7 @@ <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest1'/> <backingStore/> - <mirror type='file' file='/dev/HostVG/QEMUGuest1Copy' ready='yes'> + <mirror type='file' file='/dev/HostVG/QEMUGuest1Copy' job='copy' ready='yes'> <source file='/dev/HostVG/QEMUGuest1Copy'/> </mirror> <target dev='hda' bus='ide'/> @@ -33,7 +33,7 @@ <disk type='file' device='disk'> <source file='/tmp/data.img'/> <backingStore/> - <mirror type='file' file='/tmp/copy.img' format='qcow2'> + <mirror type='file' file='/tmp/copy.img' format='qcow2' job='copy'> <format type='qcow2'/> <source file='/tmp/copy.img'/> </mirror> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index cefe05b..8966b9a 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -229,6 +229,7 @@ mymain(void) DO_TEST_DIFFERENT("disk-mirror-old"); DO_TEST_FULL("disk-mirror", false, WHEN_ACTIVE); DO_TEST_FULL("disk-mirror", true, WHEN_INACTIVE); + DO_TEST_FULL("disk-active-commit", false, WHEN_ACTIVE); DO_TEST("graphics-listen-network"); DO_TEST("graphics-vnc"); DO_TEST("graphics-vnc-websocket"); -- 1.9.3

On 07/29/14 06:20, Eric Blake wrote:
A future patch is going to wire up qemu active block commit jobs; but as they have similar events and are canceled/pivoted in the same way as block copy jobs, it is easiest to track all bookkeeping for the commit job by reusing the <mirror> element. This patch adds domain XML to track which job was responsible for creating a mirroring situation, and adds a job='copy' attribute to all existing uses of <mirror>. Along the way, it also massages the qemu monitor backend to read the new field in order to generate the correct type of libvirt job (even though it requires a future patch to actually cause a qemu event that can be reported as an active commit). It also prepares to update persistent XML to match changes made to live XML when a copy completes.
* docs/schemas/domaincommon.rng: Enhance schema. * docs/formatdomain.html.in: Document it. * src/conf/domain_conf.h (_virDomainDiskDef): Add a field. * src/conf/domain_conf.c (virDomainBlockJobType): String conversion. (virDomainDiskDefParseXML): Parse job type. (virDomainDiskDefFormat): Output job type. * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Distinguish active from regular commit. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set job type. (qemuDomainBlockPivot, qemuDomainBlockJobImpl): Clean up job type on completion. * tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror-old.xml: Update tests. * tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: Likewise. * tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml: New file. * tests/qemuxml2xmltest.c (mymain): Drive new test.
Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/formatdomain.html.in | 15 +++++---- docs/schemas/domaincommon.rng | 13 ++++++++ src/conf/domain_conf.c | 33 ++++++++++++++++++- src/conf/domain_conf.h | 1 + src/qemu/qemu_driver.c | 2 ++ src/qemu/qemu_process.c | 37 +++++++++++++++++++++- .../qemuxml2argv-disk-active-commit.xml | 37 ++++++++++++++++++++++ .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 6 ++-- .../qemuxml2xmlout-disk-mirror-old.xml | 4 +-- tests/qemuxml2xmltest.c | 1 + 10 files changed, 136 insertions(+), 13 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-active-commit.xml
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e770a77..4ea5493 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1017,6 +1017,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, const char *path; virDomainDiskDefPtr disk; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + virDomainDiskDefPtr persistDisk = NULL; bool save = false;
virObjectLock(vm); @@ -1025,6 +1026,9 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, if (disk) { /* Have to generate two variants of the event for old vs. new * client callbacks */ + if (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && + disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) + type = disk->mirrorJob;
Hmm, looks like this would be better of in the next patch but one of the tests probably wouldn't work without this.
path = virDomainDiskGetSource(disk); event = virDomainEventBlockJobNewFromObj(vm, path, type, status); event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, @@ -1036,6 +1040,31 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, bool has_mirror = !!disk->mirror;
if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + if (vm->newDef) { + int indx = virDomainDiskIndexByName(vm->newDef, disk->dst, + false); + virStorageSourcePtr copy = NULL; + + if (indx >= 0) { + persistDisk = vm->newDef->disks[indx]; + copy = virStorageSourceCopy(disk->mirror, false); + if (virStorageSourceInitChainElement(copy, + persistDisk->src, + false) < 0) { + VIR_WARN("Unable to update persistent definition " + "on vm %s after block job", + vm->def->name); + virStorageSourceFree(copy); + copy = NULL; + persistDisk = NULL; + } + } + if (copy) { + virStorageSourceFree(persistDisk->src); + persistDisk->src = copy; + } + }
Won't this allow us to enable block copy on persistent domains?
+ /* XXX We want to revoke security labels and disk * lease, as well as audit that revocation, before * dropping the original source. But it gets tricky @@ -1061,7 +1090,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, }
if (disk->mirror && - type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) { + (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY || + type == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)) { if (status == VIR_DOMAIN_BLOCK_JOB_READY) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_READY; save = true; @@ -1069,6 +1099,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virStorageSourceFree(disk->mirror); disk->mirror = NULL; disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_DEFAULT; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; save = true; } } @@ -1078,6 +1109,10 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) VIR_WARN("Unable to save status on vm %s after block job", vm->def->name); + if (persistDisk && virDomainSaveConfig(cfg->configDir, + vm->newDef) < 0) + VIR_WARN("Unable to update persistent definition on vm %s " + "after block job", vm->def->name); } virObjectUnlock(vm); virObjectUnref(cfg);
ACK, although this patch an the next one are a bit borderline now that libvirt is frozen for the next release. Peter

On 07/29/2014 07:03 AM, Peter Krempa wrote:
On 07/29/14 06:20, Eric Blake wrote:
A future patch is going to wire up qemu active block commit jobs; but as they have similar events and are canceled/pivoted in the same way as block copy jobs, it is easiest to track all bookkeeping for the commit job by reusing the <mirror> element. This patch adds domain XML to track which job was responsible for creating a mirroring situation, and adds a job='copy' attribute to all existing uses of <mirror>. Along the way, it also massages the qemu monitor backend to read the new field in order to generate the correct type of libvirt job (even though it requires a future patch to actually cause a qemu event that can be reported as an active commit). It also prepares to update persistent XML to match changes made to live XML when a copy completes.
@@ -1025,6 +1026,9 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, if (disk) { /* Have to generate two variants of the event for old vs. new * client callbacks */ + if (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && + disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) + type = disk->mirrorJob;
Hmm, looks like this would be better of in the next patch but one of the tests probably wouldn't work without this.
Maybe, but then the next commit doesn't look so deceptively simple, and I _did_ mention it in the commit message :)
if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_PIVOT) { + if (vm->newDef) { + int indx = virDomainDiskIndexByName(vm->newDef, disk->dst, + false);
+ if (copy) { + virStorageSourceFree(persistDisk->src); + persistDisk->src = copy; + } + }
Won't this allow us to enable block copy on persistent domains?
It's part of the effort. The other half is actually being able to restart a blockcopy operation across qemu restarts (not available in qemu 2.1, it requires a persistent bitmap; but might be available in qemu 2.2). The reason that active commit can be done on a persistent domain is that you don't lose progress - if you spend 4 minutes getting 80% of the way, then the guest shuts down, then you restart qemu and restart the commit operation, then you only need 1 more minute for the remaining 20%. That is, aborting a commit job is not expensive. On the other hand, aborting a copy job at 80% progress, and then starting a new one, will end up copying all 100% of the file (5 more minutes, for a total of 9 instead of 5), because without a persistent bitmap, qemu loses track of how much of the copy has already been made. So for now, I'm still going to require transient domain for blockcopy. That, and I _really_ need a new API that lets you target a <disk> XML element as the copy destination (I ran out of time for 1.2.7, but 1.2.8 is looking good). Plus we need a way to expose qemu's drive-backup operation; the existing blockcopy code is crammed into virDomainBlockRebase where we don't have good exposure for pre- vs. post- snapshot; but a new virDomainBlockCopy() with an XML destination and proper flags would be better for it. For those less familiar with the difference between the two qemu commands, here's a timeline: A B C +---------------------------------------------------------- ^ ^ ^ request drive-mirror | | sync ready request complete/pivot the copy taken by drive-mirror is consistent at point C, which may be several minutes after the user actually requested the copy. So whatever the guest saw at point A may be long gone by the time the copy is actually finalized; planning for user point-in-time snapshots requires forethought to actually get a mirror up and running long before the user has a need for the snapshot. However, with that planning, the time between declaring the snapshot complete and actually having the snapshot to work with is nearly instant. A B +------------------------------------------ ^ ^ request drive-backup | backup complete the copy taken by drive-backup is not consistent until point B, but its contents match what the guest saw at point A. This is much nicer for point-in-time semantics, although it does mean that the user has to wait from the time they requested a backup until it is actually ready for them to use.
ACK, although this patch an the next one are a bit borderline now that libvirt is frozen for the next release.
I'll wait for a third-party opinion, then (maybe DV or danpb will weigh in before DV tries to cut rc2 in the next 24 hours...) -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

With this in place, I can (finally!) now do: virsh blockcommit $dom vda --shallow --verbose --pivot and watch qemu shorten the backing chain by one, followed by libvirt automatically updating the dumpxml output, effectively undoing the work of virsh snapshot-commit --no-metadata --disk-only. Commit is SOOOO much faster than blockpull, when I'm still fairly close in time to when the temporary qcow2 wrapper file was created via a snapshot operation! * src/qemu/qemu_driver.c (qemuDomainBlockCommit): Implement live commit. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 36 ++++++++++++++++++++++++++++++++---- 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 683eaf3..9cf8fe8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15517,9 +15517,11 @@ qemuDomainBlockCommit(virDomainPtr dom, char *topPath = NULL; char *basePath = NULL; char *backingPath = NULL; + virStorageSourcePtr mirror = NULL; - /* XXX Add support for COMMIT_ACTIVE, COMMIT_DELETE */ + /* XXX Add support for COMMIT_DELETE */ virCheckFlags(VIR_DOMAIN_BLOCK_COMMIT_SHALLOW | + VIR_DOMAIN_BLOCK_COMMIT_ACTIVE | VIR_DOMAIN_BLOCK_COMMIT_RELATIVE, -1); if (!(vm = qemuDomObjFromDomain(dom))) @@ -15568,9 +15570,6 @@ qemuDomainBlockCommit(virDomainPtr dom, &top_parent))) goto endjob; - /* FIXME: qemu 2.0 supports active commit, but as a two-stage - * process; qemu 2.1 is further improving active commit. We need - * to start supporting it in libvirt. */ if (topSource == disk->src) { if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_ACTIVE_COMMIT)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -15584,6 +15583,12 @@ qemuDomainBlockCommit(virDomainPtr dom, disk->dst); goto endjob; } + if (disk->mirror) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block job"), + disk->dst); + goto endjob; + } } else if (flags & VIR_DOMAIN_BLOCK_COMMIT_ACTIVE) { virReportError(VIR_ERR_INVALID_ARG, _("active commit requested but '%s' is not active"), @@ -15614,6 +15619,16 @@ qemuDomainBlockCommit(virDomainPtr dom, goto endjob; } + /* For an active commit, clone enough of the base to act as the mirror */ + if (topSource == disk->src) { + if (!(mirror = virStorageSourceCopy(baseSource, false))) + goto endjob; + if (virStorageSourceInitChainElement(mirror, + disk->src, + false) < 0) + goto endjob; + } + /* For the commit to succeed, we must allow qemu to open both the * 'base' image and the parent of 'top' as read/write; 'top' might * not have a parent, or might already be read-write. XXX It @@ -15665,6 +15680,18 @@ qemuDomainBlockCommit(virDomainPtr dom, bandwidth); qemuDomainObjExitMonitor(driver, vm); + if (ret == 0 && mirror) { + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + + disk->mirror = mirror; + mirror = NULL; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + VIR_WARN("Unable to save status on vm %s after block job", + vm->def->name); + virObjectUnref(cfg); + } + endjob: if (ret < 0 && clean_access) { /* Revert access to read-only, if possible. */ @@ -15674,6 +15701,7 @@ qemuDomainBlockCommit(virDomainPtr dom, qemuDomainPrepareDiskChainElement(driver, vm, top_parent, VIR_DISK_CHAIN_READ_ONLY); } + virStorageSourceFree(mirror); if (!qemuDomainObjEndJob(driver, vm)) vm = NULL; -- 1.9.3

On 07/29/14 06:20, Eric Blake wrote:
With this in place, I can (finally!) now do:
virsh blockcommit $dom vda --shallow --verbose --pivot
and watch qemu shorten the backing chain by one, followed by libvirt automatically updating the dumpxml output, effectively undoing the work of virsh snapshot-commit --no-metadata --disk-only. Commit is SOOOO much faster than blockpull, when I'm still fairly close in time to when the temporary qcow2 wrapper file was created via a snapshot operation!
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Implement live commit.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 36 ++++++++++++++++++++++++++++++++---- 1 file changed, 32 insertions(+), 4 deletions(-)
ACK, didn't change from my previous commit much. Peter

On 07/29/2014 07:06 AM, Peter Krempa wrote:
On 07/29/14 06:20, Eric Blake wrote:
With this in place, I can (finally!) now do:
virsh blockcommit $dom vda --shallow --verbose --pivot
and watch qemu shorten the backing chain by one, followed by libvirt automatically updating the dumpxml output, effectively undoing the work of virsh snapshot-commit --no-metadata --disk-only. Commit is SOOOO much faster than blockpull, when I'm still fairly close in time to when the temporary qcow2 wrapper file was created via a snapshot operation!
* src/qemu/qemu_driver.c (qemuDomainBlockCommit): Implement live commit.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 36 ++++++++++++++++++++++++++++++++---- 1 file changed, 32 insertions(+), 4 deletions(-)
ACK,
didn't change from my previous commit much.
Alas, it needs to change - now that I'm more careful about using the qemu events to react to state changes, I have to be prepared for a qemu event that occurs prior to regaining lock in the main code. I'll post a v6, but here's the interdiff: diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c index 6a649e5..c619fbb 100644 --- i/src/qemu/qemu_driver.c +++ w/src/qemu/qemu_driver.c @@ -15673,23 +15673,31 @@ qemuDomainBlockCommit(virDomainPtr dom, * if any, through to qemu, since qemu may behave differently * depending on whether the input was specified as relative or * absolute (that is, our absolute top_canon may do the wrong - * thing if the user specified a relative name). */ + * thing if the user specified a relative name). Be prepared for + * a ready event to occur while locks are dropped. */ + if (mirror) { + disk->mirror = mirror; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; + } qemuDomainObjEnterMonitor(driver, vm); ret = qemuMonitorBlockCommit(priv->mon, device, topPath, basePath, backingPath, bandwidth); qemuDomainObjExitMonitor(driver, vm); - if (ret == 0 && mirror) { - virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + if (mirror) { + if (ret == 0) { + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); - disk->mirror = mirror; - mirror = NULL; - disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT; - if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) - VIR_WARN("Unable to save status on vm %s after block job", - vm->def->name); - virObjectUnref(cfg); + mirror = NULL; + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) + VIR_WARN("Unable to save status on vm %s after block job", + vm->def->name); + virObjectUnref(cfg); + } else { + disk->mirror = NULL; + disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; + } } endjob: -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/28/2014 10:20 PM, Eric Blake wrote:
v4 was here: https://www.redhat.com/archives/libvir-list/2014-June/msg01095.html
Since then: I _finally_ implemented persistent domain updates to match live changes, by reimplementing how <mirror> is tracked in XML to be more robust to libvirtd restarts. In particular, it fixes a bug in v4 that Adam Litke pointed out where a pivot commit would emit a ready event for 'active commit' but then a completed event for plain 'commit'.
Eric Blake (4): blockcopy: add more XML for state tracking blockjob: properly track blockcopy xml changes on disk blockcommit: track job type in xml blockcommit: turn on active commit
Also at: http://repo.or.cz/w/libvirt/ericb.git/shortlog/refs/heads/gluster or git checkout git://repo.or.cz/libvirt/ericb.git gluster (although my testing so far has been on local files, the idea is that this series _should_ allow active commit into a gluster-based backing chain). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (2)
-
Eric Blake
-
Peter Krempa