Re: [libvirt] [PATCH 10/11 v2] qemu: drivePivot: Fix assumption when 'block-job-complete' fails

Wednesday, 8 April 2015

On 04/01/2015 04:40 PM, Peter Krempa wrote:
...
 QEMU does not abandon the mirror. The job carries on in the
synchronised
 phase and it might be either pivoted again or cancelled. The commit
 hints that the described behavior was happening in a downstream version.

 If the command returns false there are two possible options:
 1) qemu did not reach the point where it would ask the block job to
 pivot
 2) pivotting failed in the actual qemu coroutine

 If either of those would happen we return failure and reset the
 condition that waits for the block job to complete. This makes the API
 fail but in case where qemu would actually abandon the mirror the fact
 is notified via the event and handled asynchronously.

 Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202704
 ---

 Notes:
     I've spent some time looking how the active commit and copy  job actually
     works in qemu, but I did not check if that behavior changed in the upstream
     releases. At any rate, it makes sense  thus I expect that it was there ever-since.

     Version 2:
     - this version resets the flag that makes libvirt wait on the event. This should
     make the API as rugged as it can possibly be.

  src/qemu/qemu_driver.c | 27 ++++++---------------------
  1 file changed, 6 insertions(+), 21 deletions(-)

 diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
 index 2dd8ed4..52c3587 100644
 --- a/src/qemu/qemu_driver.c
 +++ b/src/qemu/qemu_driver.c
 @@ -16128,27 +16128,10 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver,
      }

      if (ret < 0) {
 -        /* On failure, qemu abandons the mirror, and reverts back to
 -         * the source disk (RHEL 6.3 has a bug where the revert could
 -         * cause catastrophic failure in qemu, but we don't need to
 -         * worry about it here as it is not an upstream qemu problem.  */
 -        /* XXX should we be parsing the exact qemu error, or calling
 -         * 'query-block', to see what state we really got left in
 -         * before killing the mirroring job?
 -         * XXX We want to revoke security labels and disk lease, as
 -         * well as audit that revocation, before dropping the original
 -         * source.  But it gets tricky if both source and mirror share
 -         * common backing files (we want to only revoke the non-shared
 -         * portion of the chain); so for now, we leak the access to
 -         * the original.  */
 -        virStorageSourceFree(disk->mirror);
 -        disk->mirror = NULL;
 -        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
 -        disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
 -        disk->blockjob = false;
 +        /* The pivot failed. The block job in QEMU remains in the synchronised
 +         * phase. Reset the state we changed and return the error to the user */
 +        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
      } 

I won't pretend to say I understand all the comments that get deleted in
this, but I assume they were reacting to various "changes" as time went
on with perhaps a final decision at some point in time to

So I'd say a weak ACK only because I don't have all the history on this
nor do I have in mind all the various states...  Although since I do see
the code checks a few lines above:

    if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_READY) {
         error...
    }

returning mirrorState back to READY does seem logical to me.

My only other comment would be you could move the comment outside the {}
and then remove the {}'s...

John

FWIW:
Depending on the dictionary one uses - synchronized or canceled may be
used, but using synchronised and cancelled seems to go against the will
of Thunderbird's spell checker ;-)

...
 -    if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm)
< 0)
 -        ret = -1;

   cleanup:
      if (oldsrc)
 @@ -16354,8 +16337,10 @@ qemuDomainBlockJobAbort(virDomainPtr dom,

      if (disk->mirror && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) {
          ret = qemuDomainBlockPivot(driver, vm, device, disk);
 -        if (ret < 0 && async)
 +        if (ret < 0 && async) {
 +            disk->blockJobSync = false;
              goto endjob;
 +        }
          goto waitjob;
      }
      if (disk->mirror) {

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [PATCH 10/11 v2] qemu: drivePivot: Fix assumption when 'block-job-complete' fails