On 08/25/2014 11:20 AM, Peter Krempa wrote:
On 08/24/14 05:32, Eric Blake wrote:
> This commit (finally) adds the virDomainBlockCopy API, with the
> intent that it will provide more power to the existing 'virsh
> blockcopy' command.
>
> +/**
> + * VIR_DOMAIN_BLOCK_COPY_BANDWIDTH:
> + * Macro for the virDomainBlockCopy bandwidth tunable: it represents
> + * the maximum bandwidth (in MiB/s) used while getting the copy
> + * operation into the mirrored phase, with a type of ullong (although
MiB/s and ullong? thats an awful lot of speed. Shouldn't we go with
KiB/s? This might benefit slower networks. (although it may never
converge there)
No. We're forced to use back-compat to the existing design of:
virDomainBlockRebase(virDomainPtr dom, const char *disk,
const char *base, unsigned long bandwidth,
unsigned int flags)
virDomainBlockJobSetSpeed(virDomainPtr dom, const char *disk,
unsigned long bandwidth, unsigned int flags)
which already document bandwidth as an unsigned long (32-bit maximum is
the only portable value). It's just that virTypedParameters
intentionally lack support for 'unsigned long' (precisely because it is
variable-sized between 32-bit and 64-bit), so my (still-in-progress)
patch 6 has an overflow check that errors out if the user passed in a
bandwidth using ullong that doesn't fit in the ulong handed to qemu.
The other alternative is to declare the parameter as 'uint', and that
the new API is unable to represent uses of the old API that exceed
INT_MAX, but I really want the new API to be a superset of the old API.
I tried to cover that point in the commit message, and also in the
comments to the macro stating that the hypervisor can reject
out-of-range values. Do I need to make the wording any stronger?
> +/**
> + * VIR_DOMAIN_BLOCK_COPY_GRANULARITY:
> + * Macro for the virDomainBlockCopy granularity tunable: it represents
> + * the granularity at which the copy operation recognizes dirty pages,
I'd rather say "dirty blocks". Pages might indicate RAM memory pages.
Sure. I'm just trying to expose the qemu parameters, which are rather
sparsely documented as:
# @granularity: #optional granularity of the dirty bitmap, default is 64K
# if the image format doesn't have clusters, 4K if the
clusters
# are smaller than that, else the cluster size. Must be a
# power of 2 between 512 and 64M (since 1.4).
#
# @buf-size: #optional maximum amount of data in flight from source to
# target (since 1.4).
> + * as an unsigned int, and must be a power of 2.
> + */
> +#define VIR_DOMAIN_BLOCK_COPY_GRANULARITY "granularity"
> +
> +/**
> + * VIR_DOMAIN_BLOCK_COPY_BUF_SIZE:
> + * Macro for the virDomainBlockCopy buffer size tunable: it represents
> + * how much data can be in flight between source and destination, as
> + * an unsigned int.
In bytes?
I assume so; again, see the qemu docs that I'm trying to expose.
> + *
> + * A copy job has two parts; in the first phase, the @bandwidth parameter
@bandwidth is now provided as a typed param.
Too much copy-and-paste - yeah, I'll have to adjust that.
> +int
> +virDomainBlockCopy(virDomainPtr dom, const char *disk,
> + const char *destxml,
> + virTypedParameterPtr params,
> + int nparams,
> + unsigned int flags)
Wow, XML, typed params and flags. Now that's future proof! :)
I sure hope so! Although I'm _already_ slightly worried about image
fleecing, which says to expose the state of the disk not as it is
currently evolving, but as it existed at a fixed point in time in the
past, often in order to copy out that state to backup storage. In that
case, fleecing may want to start from a known point whereas this API
kind of implies starting from the active image. Hmm, I guess we have
the 'vda[1]' notation for picking a known point that is the backing file
of vda. Then again, while fleecing is a form of copying data, it might
be distinct enough to warrant a separate API anyway.
> +LIBVIRT_1.2.8 {
> + global:
> + virDomainBlockCopy;
One of us will have to rebase. I've recently posted a series that adds
API too :)
Fortunately, the rebase is trivial.
> +} LIBVIRT_1.2.7;
> +
> # .... define new API here using predicted next version number ....
>
Apart from a few DOC problems the API looks fine to me and should be
fairly future proof.
ACK to the design (once docs are fixed).
Peter
P.S.: I've run out of time to review the rest of this, but this should
be good enough to merge the rest a bit later.
Thanks; still, I'll post a v2 with tweaks you mentioned above and with
anything else I learn today while implementing the rest, if it looks
like DV will give me enough time to still get it into rc0.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org