On Thu, Jul 14, 2011 at 7:47 PM, Adam Litke <agl(a)us.ibm.com> wrote:
On 07/13/2011 08:04 PM, Daniel Veillard wrote:
> On Wed, Jul 13, 2011 at 03:46:30PM -0500, Adam Litke wrote:
>> Unfortunately, after committing the blockPull API to libvirt, the qemu
>> community decided to change the API somewhat and we needed to revert the
>> libvirt implementation. Now that the qemu API is settling out again, I would
>> like to propose an updated libvirt public API which has required only a minor
>> set of changes to sync back up to the qemu API.
>>
>> Summary of changes:
>> - Qemu dropped incremental streaming so remove libvirt incremental
>> BlockPull() API
>> - Rename virDomainBlockPullAll() to virDomainBlockPull()
>> - Changes required to qemu monitor handlers for changed command names
>
>
> Okay.
>
>> Currently, qemu block device streaming completely flattens a disk image's
>> backing file chain. Consider the following chain: A->B->C, where C is a leaf
>> image that is backed by B and B is backed by A. The current disk streaming
>> command will produce an independent image C with no backing file. Future
>> versions of qemu block streaming may support an option to specify a new base
>> image from the current chain. For example: stream --backing_file B C would
>> pull all blocks that are only in A to produce the chain B->C (thus
>> eliminating the dependency on A but maintaining B as a backing image).
>>
>> Do we want to create room in the BlockPull API to support this advanced usage
>> in the future? If so, then a new parameter must be added to BlockPull: const
>> char *backing_path. Even if we extend the API in this manner, the initial
>> implementation will not support it because qemu will not support it
>> immediately, and libvirt is missing core functionality to support it (no
>> public enumeration of the disk backing file chain). Thoughts?
>
> My own preference would be to defer this and provide a
> separate API when the QEmu implementation arrives. The problem is that
> a single backing store file might not be sufficient to fully support
> whatever QEmu adds, and as you pointed out we aren't really
> ready for this on the libvirt side either.
Yes, I agree with you here.
> On the other hand I suspect that we are missing the mechanism to
> control the rate of the transfer in the new API, which is something
> which could be implemented in the old incremental mechanism, but not
> anymore. So I wonder if the virDomainBlockPull() shouldn't get an
> "unsigned long bandwidth" (to stay coherent with migration APIs)
> before the flags. I don't know if the support is ready in QEmu but
> I suspect that will be an important point, so if not present will
> be added :-)
Hopefully Stefan can comment on any throttling mechanisms that might be
added to the qemu implementation. It would be nice to have this feature
built into the API to match the migration APIs, but if that is not
feasible then we could always add it later.
If we follow the live migration API then a block_stream_set_speed
command would be used. libvirt would issue it as a separate command
immediately after starting the streaming operation. There is the
window of time after streaming has started but before
block_stream_set_speed has been called where no throttling takes
place, but I don't think this is a practical problem.
It also opens the possibility of allowing the user to change the rate
limit value while the block streaming operation is active.
>> To help speed the provisioning process for large domains, new QED disks are
>> created with backing to a template image. These disks are configured with
>> copy on read such that blocks that are read from the backing file are copied
>> to the new disk. This reduces I/O over a potentially costly path to the
>> backing image.
>>
>> In such a configuration, there is a desire to remove the dependency on the
>> backing image as the domain runs. To accomplish this, qemu will provide an
>> interface to perform sequential copy on read operations during normal VM
>> operation. Once all data has been copied, the disk image's link to the
>> backing file is removed.
>>
>> The virDomainBlockPull API family brings this functionality to libvirt.
>>
>> virDomainBlockPull() instructs the hypervisor to stream the entire device in
>> the background. Progress of this operation can be checked with the function
>> virDomainGetBlockPullInfo(). An ongoing stream can be cancelled with
>> virDomainBlockPullAbort().
>>
>> An event (VIR_DOMAIN_EVENT_ID_BLOCK_PULL) will be emitted when a disk has been
>> fully populated or if a BlockPull() operation was terminated due to an error.
>> This event is useful to avoid polling on virDomainBlockPullInfo() for
>> completion and could also be used by the security driver to revoke access to
>> the backing file when it is no longer needed.
>>
>> /*
>>  * BlockPull API
>>  */
>>
>> /* An iterator for initiating and monitoring block pull operations */
>> typedef unsigned long long virDomainBlockPullCursor;
>>
>> typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
>> struct _virDomainBlockPullInfo {
>>     /*
>>      * The following fields provide an indication of block pull progress. @cur
>>      * indicates the current position and will be between 0 and @end. @end is
>>      * the final cursor position for this operation and represents completion.
>>      * To approximate progress, divide @cur by @end.
>>      */
>>     virDomainBlockPullCursor cur;
>>     virDomainBlockPullCursor end;
>> };
>> typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;
>>
>> /**
>>  * virDomainBlockPull:
>>  * @dom: pointer to domain object
>>  * @path: Fully-qualified filename of disk
>>  * @flags: currently unused, for future extension
>>  *
>>  * Populate a disk image with data from its backing image. Once all data from
>>  * its backing image has been pulled, the disk no longer depends on a backing
>>  * image. This function pulls data for the entire device in the background.
>>  * Progress of the operation can be checked with virDomainGetBlockPullInfo() and
>>  * the operation can be aborted with virDomainBlockPullAbort(). When finished,
>>  * an asynchronous event is raised to indicate the final status.
>>  *
>>  * Returns 0 if the operation has started, -1 on failure.
>>  */
>> int virDomainBlockPull(virDomainPtr dom,
>>                        const char *path,
>
> So I would just add an "unsigned long bandwidth" argument here,
>>                        unsigned int flags);
>
>
> and document it as:
> @bandwidth: (optional) specify copy bandwidth limit in Mbps
> * The maximum bandwidth (in Mbps) that will be used to do the copy
> * can be specified with the bandwidth parameter. If set to 0,
> * libvirt will choose a suitable default. Some hypervisors do
> * not support this feature and will return an error if bandwidth
> * is not 0.
I am concerned with the units here. I am not sure if the qemu
throttling code will permit us to throttle by Mbps. The amount of data
that has to actually be transferred depends on how many of the image's
blocks are already populated. Stefan, can you elaborate on the
possibilities wrt qemu throttling?
Migration speed can be set at byte granularity but the default units are MB/s.
Currently there is no image streaming throttling code at all, but in
terms of implementing throttling the algorithm does know when data is
transferred so it can do some accounting to rate limit itself. I
think MB/s would be fine.
Stefan