On 07/13/2011 08:04 PM, Daniel Veillard wrote:
On Wed, Jul 13, 2011 at 03:46:30PM -0500, Adam Litke wrote:
> Unfortunately, after committing the blockPull API to libvirt, the qemu
> community decided to change the API somewhat and we needed to revert the
> libvirt implementation. Now that the qemu API is settling out again, I would
> like to propose an updated libvirt public API which has required only a minor
> set of changes to sync back up to the qemu API.
>
> Summary of changes:
> - Qemu dropped incremental streaming so remove libvirt incremental
> BlockPull() API
> - Rename virDomainBlockPullAll() to virDomainBlockPull()
> - Changes required to qemu monitor handlers for changed command names
Okay.
> Currently, qemu block device streaming completely flattens a disk image's
> backing file chain. Consider the following chain: A->B->C where C is a leaf
> image that is backed by B and B is backed by A. The current disk streaming
> command will produce an independent image C with no backing file. Future
> versions of qemu block streaming may support an option to specify a new base
> image from the current chain. For example: stream --backing_file B C would
> pull all blocks that are only in A to produce the chain: B->C (thus
> eliminating a dependency on A but maintaining B as a backing image).
>
> Do we want to create room in the BlockPull API to support this advanced usage
> in the future? If so, then a new parameter must be added to BlockPull: const
> char *backing_path. Even if we extend the API in this manner, the initial
> implementation will not support it because qemu will not support it
> immediately, and libvirt is missing core functionality to support it (no
> public enumeration of the disk backing file chain). Thoughts?
My own preference would rather be to defer this and provide a
separate API when the QEmu implementation comes. The problem is that
a single backing store file might not be sufficient to fully support
whatever QEmu ends up adding, and as you pointed out we aren't really
ready for this on the libvirt side either.
Yes, I agree with you here.
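For illustration only, a future partial-pull entry point might look like
the sketch below. Everything here is hypothetical: the function name, the
backing_path parameter, and the stub body are not part of the proposed API,
and the typedef is a minimal stand-in so the sketch compiles outside libvirt.

```c
#include <stddef.h>

/* Minimal stand-in so this sketch compiles outside of libvirt. */
typedef struct _virDomain *virDomainPtr;

/* Hypothetical future signature (NOT part of the API proposed here):
 * a pull that stops at an intermediate base image.  With the chain
 * A->B->C, backing_path = "B" would copy only the blocks unique to A,
 * producing the chain B->C. */
int virDomainBlockPullToBase(virDomainPtr dom, const char *path,
                             const char *backing_path, unsigned int flags)
{
    /* Stub: qemu support and backing-chain enumeration in libvirt
     * would both be prerequisites, so this simply reports failure. */
    (void)dom; (void)path; (void)backing_path; (void)flags;
    return -1;
}
```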
On the other hand, I suspect that we are missing a mechanism to
control the rate of the transfer in the new API, which is something
that could be implemented in the old incremental mechanism, but not
anymore. So I wonder if virDomainBlockPull() shouldn't get an
"unsigned long bandwidth" (to stay coherent with the migration APIs)
before the flags. I don't know if the support is ready in QEmu, but
I suspect that will be an important point, so if not present it will
be added :-)
Hopefully Stefan can comment on any throttling mechanisms that might be
added to the qemu implementation. It would be nice to have this feature
built into the API to match the migration APIs, but if that is not
feasible then we could always add it later.
> --
>
> To help speed the provisioning process for large domains, new QED disks are
> created with backing to a template image. These disks are configured with
> copy on read such that blocks that are read from the backing file are copied
> to the new disk. This reduces I/O over a potentially costly path to the
> backing image.
>
> In such a configuration, there is a desire to remove the dependency on the
> backing image as the domain runs. To accomplish this, qemu will provide an
> interface to perform sequential copy on read operations during normal VM
> operation. Once all data has been copied, the disk image's link to the
> backing file is removed.
>
> The virDomainBlockPull API family brings this functionality to libvirt.
>
> virDomainBlockPull() instructs the hypervisor to stream the entire device in
> the background. Progress of this operation can be checked with the function
> virDomainGetBlockPullInfo(). An ongoing stream can be cancelled with
> virDomainBlockPullAbort().
>
> An event (VIR_DOMAIN_EVENT_ID_BLOCK_PULL) will be emitted when a disk has been
> fully populated or if a BlockPull() operation was terminated due to an error.
> This event is useful to avoid polling on virDomainGetBlockPullInfo() for
> completion and could also be used by the security driver to revoke access to
> the backing file when it is no longer needed.
>
> /*
> * BlockPull API
> */
>
> /* An iterator for initiating and monitoring block pull operations */
> typedef unsigned long long virDomainBlockPullCursor;
>
> typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
> struct _virDomainBlockPullInfo {
> /*
> * The following fields provide an indication of block pull progress. @cur
> * indicates the current position and will be between 0 and @end. @end is
> * the final cursor position for this operation and represents completion.
> * To approximate progress, divide @cur by @end.
> */
> virDomainBlockPullCursor cur;
> virDomainBlockPullCursor end;
> };
> typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;
>
> /**
> * virDomainBlockPull:
> * @dom: pointer to domain object
> * @path: Fully-qualified filename of disk
> * @flags: currently unused, for future extension
> *
> * Populate a disk image with data from its backing image. Once all data from
> * its backing image has been pulled, the disk no longer depends on a backing
> * image. This function pulls data for the entire device in the background.
> * Progress of the operation can be checked with virDomainGetBlockPullInfo() and
> * the operation can be aborted with virDomainBlockPullAbort(). When finished,
> * an asynchronous event is raised to indicate the final status.
> *
> * Returns 0 if the operation has started, -1 on failure.
> */
> int virDomainBlockPull(virDomainPtr dom,
> const char *path,
So I would just add an unsigned long bandwidth here,
> unsigned int flags);
and document it as:
@bandwidth: (optional) specify copy bandwidth limit in Mbps
* The maximum bandwidth (in Mbps) that will be used to do the copy
* can be specified with the bandwidth parameter. If set to 0,
* libvirt will choose a suitable default. Some hypervisors do
* not support this feature and will return an error if bandwidth
* is not 0.
I am concerned with the units here. I am not sure if the qemu
throttling code will permit us to throttle by Mbps. The amount of data
that has to actually be transferred depends on how many of the image's
blocks are already populated. Stefan, can you elaborate on the
possibilities wrt qemu throttling?
> /**
> * virDomainBlockPullAbort:
> * @dom: pointer to domain object
> * @path: fully-qualified filename of disk
> * @flags: currently unused, for future extension
> *
> * Cancel a pull operation previously started by virDomainBlockPull().
> *
> * Returns -1 in case of failure, 0 when successful.
> */
> int virDomainBlockPullAbort(virDomainPtr dom,
> const char *path,
> unsigned int flags);
>
> /**
> * virDomainGetBlockPullInfo:
> * @dom: pointer to domain object
> * @path: fully-qualified filename of disk
> * @info: pointer to a virDomainBlockPullInfo structure
> * @flags: currently unused, for future extension
> *
> * Request progress information on a block pull operation that has been started
> * with virDomainBlockPull(). If an operation is active for the given
> * parameters, @info will be updated with the current progress.
> *
> * Returns -1 in case of failure, 0 when successful.
> */
> int virDomainGetBlockPullInfo(virDomainPtr dom,
> const char *path,
> virDomainBlockPullInfoPtr info,
> unsigned int flags);
I also wonder about the PullInfo: is there any way we could extend it
to suggest the remaining time? An approximation might be computable by
the management code, but since libvirt will know about bandwidth better
than the upper layer, maybe it's in a better position to do that
evaluation (assuming QEmu doesn't give information about this).
We had considered this earlier in the original API discussions, but
given the non-deterministic nature of the data that will ultimately be
sent, an accurate time estimate is not possible.
> /**
> * virConnectDomainEventBlockPullStatus:
> *
> * The final status of a virDomainBlockPull() operation
> */
> typedef enum {
> VIR_DOMAIN_BLOCK_PULL_COMPLETED = 0,
> VIR_DOMAIN_BLOCK_PULL_FAILED = 1,
> } virConnectDomainEventBlockPullStatus;
>
> /**
> * virConnectDomainEventBlockPullCallback:
> * @conn: connection object
> * @dom: domain on which the event occurred
> * @path: fully-qualified filename of the affected disk
> * @status: final status of the operation (virConnectDomainEventBlockPullStatus)
> *
> * The callback signature to use when registering for an event of type
> * VIR_DOMAIN_EVENT_ID_BLOCK_PULL with virConnectDomainEventRegisterAny()
> */
> typedef void (*virConnectDomainEventBlockPullCallback)(virConnectPtr conn,
> virDomainPtr dom,
> const char *path,
> int status,
> void *opaque);
Except for the bandwidth addition and potential extension of the
BlockInfo, that looks fine to me.
Daniel
--
Adam Litke
IBM Linux Technology Center