On 04/01/2015 04:40 PM, Peter Krempa wrote:
QEMU does not abandon the mirror. The job carries on in the
synchronised
phase and it might be either pivoted again or cancelled. The commit
hints that the described behavior was happening in a downstream version.
If the command returns false there are two possible options:
1) qemu did not reach the point where it would ask the block job to
pivot
2) pivotting failed in the actual qemu coroutine
If either of those would happen we return failure and reset the
condition that waits for the block job to complete. This makes the API
fail but in case where qemu would actually abandon the mirror the fact
is notified via the event and handled asynchronously.
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1202704
---
Notes:
I've spent some time looking how the active commit and copy job actually
works in qemu, but I did not check if that behavior changed in the upstream
releases. At any rate, it makes sense thus I expect that it was there ever-since.
Version 2:
- this version resets the flag that makes libvirt wait on the event. This should
make the API as rugged as it can possibly be.
src/qemu/qemu_driver.c | 27 ++++++---------------------
1 file changed, 6 insertions(+), 21 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 2dd8ed4..52c3587 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16128,27 +16128,10 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver,
}
if (ret < 0) {
- /* On failure, qemu abandons the mirror, and reverts back to
- * the source disk (RHEL 6.3 has a bug where the revert could
- * cause catastrophic failure in qemu, but we don't need to
- * worry about it here as it is not an upstream qemu problem. */
- /* XXX should we be parsing the exact qemu error, or calling
- * 'query-block', to see what state we really got left in
- * before killing the mirroring job?
- * XXX We want to revoke security labels and disk lease, as
- * well as audit that revocation, before dropping the original
- * source. But it gets tricky if both source and mirror share
- * common backing files (we want to only revoke the non-shared
- * portion of the chain); so for now, we leak the access to
- * the original. */
- virStorageSourceFree(disk->mirror);
- disk->mirror = NULL;
- disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
- disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
- disk->blockjob = false;
+ /* The pivot failed. The block job in QEMU remains in the synchronised
+ * phase. Reset the state we changed and return the error to the user */
+ disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
}
I won't pretend to say I understand all the comments that get deleted in
this, but I assume they were reacting to various "changes" as time went
on with perhaps a final decision at some point in time to
So I'd say a weak ACK only because I don't have all the history on this
nor do I have in mind all the various states... Although since I do see
the code checks a few lines above:
if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_READY) {
error...
}
returning mirrorState back to READY does seem logical to me.
My only other comment would be you could move the comment outside the {}
and then remove the {}'s...
John
FWIW:
Depending on the dictionary one uses - synchronized or canceled may be
used, but using synchronised and cancelled seems to go against the will
of Thunderbird's spell checker ;-)
- if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm)
< 0)
- ret = -1;
cleanup:
if (oldsrc)
@@ -16354,8 +16337,10 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
if (disk->mirror && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) {
ret = qemuDomainBlockPivot(driver, vm, device, disk);
- if (ret < 0 && async)
+ if (ret < 0 && async) {
+ disk->blockJobSync = false;
goto endjob;
+ }
goto waitjob;
}
if (disk->mirror) {