On 04/13/2012 08:46 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:21 -0600, Eric Blake wrote:
> Minimal patch to wire up all the pieces in the previous patches
> to actually enable a block copy job. By minimal, I mean that
> qemu creates the file (that is, no REUSE_EXT flag support yet),
> SELinux must be disabled, a lock manager is not informed, and
> the audit logs aren't updated. But those will be added as
> improvements in future patches.
>
> * src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function.
> (qemuDomainBlockRebase): Call it when appropriate.
> ---
> src/qemu/qemu_driver.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++-
> 1 files changed, 113 insertions(+), 1 deletions(-)
> +
> + priv = vm->privateData;
> + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) &&
> + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN))) {
> + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s",
We usually use VIR_ERR_CONFIG_UNSUPPORTED in such cases.
Sure, easy to fix. Actually, looking at 'git grep -B3 -i "qemu binary"
src/qemu', I counted an INTERNAL_ERROR, an OPERATION_FAILED, and a
couple of OPERATION_INVALID uses that should all be cleaned up to match
the majority of uses, which indeed favor CONFIG_UNSUPPORTED.
I'll split the cleanup into a separate patch.
> + _("block copy is not supported with this QEMU binary"));
> + goto cleanup;
> + }
> + if (vm->persistent) {
> + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s",
> + _("domain is not transient"));
> + goto cleanup;
> + }
I guess I wasn't paying enough attention somewhere, but why do we forbid
block copy for persistent domains? I understand why we want to forbid certain
operations when block copy is active but I don't see a reason for penalizing
persistent domains.
It was in patch 8/18 where I added the restrictions, and hopefully
documented in that commit message why limiting things to transient is a
good first step:
1. the first client of live migration is oVirt, which uses transient domains
2. qemu does not (yet) provide a way to resume a mirror when restarting
a domain, so anything that would require restoring a domain from saved
state is broken: incoming migration, virsh restore, and virsh start
after a managed save. But users would be upset if they saved a domain
only to find out that they could not then restore it, so I squelched
things one step earlier in the process by preventing any save of such a
domain, so that we never have a broken save image in the first place.
My worry now comes from the fact that managedsave is on the list of
forbidden operations. If a domain is transient, managedsave is already
forbidden (it is assumed that you either don't care about the domain if
the host dies, or that you are running a higher-level app like oVirt
that knows how to rebuild the guest on a different host). But if a
guest is persistent, and you use the libvirt-guests init script, then
you have a right to assume that rebooting your host will resume your
guests in the same state that they were prior to the host going down -
because libvirt-guests uses managedsave. If we allow a mirror job on a
persistent domain, we violate this assumption (libvirt-guests will fail
to save the guest). Therefore, I forbid starting a mirror job on a
persistent domain, just as I forbid using 'virsh define' to turn a
transient domain into a persistent one while a mirror job is active.
If, at a later date, qemu comes up with a way to resume mirroring when
restarting a domain, we can relax these restrictions.
> +
> + /* Actually start the mirroring */
> + qemuDomainObjEnterMonitorWithDriver(driver, vm);
> + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags);
> + if (ret == 0 && bandwidth != 0)
> + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL,
> + BLOCK_JOB_SPEED_INTERNAL);
> + qemuDomainObjExitMonitorWithDriver(driver, vm);
Can BLOCK_JOB_SPEED_INTERNAL fail while block copy remains running? If so, we
should try to abort the job in case we fail to set the speed, otherwise we
will report an error and forget about the job while qemu will keep copying
data.
Good catch, and virDomainBlockPull() suffers from the same problem. I
think both code paths need the same fix.
> @@ -11889,6 +12000,7 @@ static int
> qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth,
> unsigned int flags)
> {
> + virCheckFlags(0, -1);
> return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags);
> }
Hmm, personally, I would make qemuDomainBlockPull call qemuDomainBlockJobImpl
directly instead of going through qemuDomainBlockRebase while adding the flags
check...
Easy enough to change.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org