On Mon, May 02, 2011 at 04:29:49PM -0500, Adam Litke wrote:
After several long distractions, I am back to working on disk
streaming.
Before I hit the list with a new series of patches, I was hoping to
reach some middle ground on the proposed streaming API.
On Fri, 2011-04-08 at 14:31 +0100, Daniel P. Berrange wrote:
> On Thu, Apr 07, 2011 at 04:31:59PM -0500, Adam Litke wrote:
> > /*
> > + * Disk Streaming
> > + */
> > +typedef enum {
> > + VIR_STREAM_DISK_ONE = 1, /* Stream a single disk unit */
> > + VIR_STREAM_DISK_START = 2, /* Stream the entire disk */
> > + VIR_STREAM_DISK_STOP = 4, /* Stop streaming a disk */
> > +} virDomainStreamDiskFlags;
>
> Using flags to combine two separate tasks into one single API
> is rather unpleasant. As raised in the previous patch, the API
> should also be taking a offset+length in bytes, then there is
> no need for a special case transfer of an individual sector.
This is a fair point. I will work with Stefan to support an
offset/length qemu API. Since there doesn't seem to be a great way to
query device sizes, I think we do need a convenient way to request that
the entire disk be streamed. We could either do this with a flag or by
overriding (offset==0 && length==0) to mean stream the entire device.
Do you have a preference?
Since length==0 is otherwise meaningless, it is fine to use that
to indicate "until end of disk". This is consistent with
what we do for virStorageVolUpload/Download which allow length=0
to indicate "until end of disk".
> > +#define VIR_STREAM_PATH_BUFLEN 1024
> > +#define VIR_STREAM_DISK_MAX_STREAMS 10
> > +
> > +typedef struct _virStreamDiskState virStreamDiskState;
> > +struct _virStreamDiskState {
> > + char path[VIR_STREAM_PATH_BUFLEN];
> > + /*
> > + * The unit of measure for size and offset is unspecified. These fields
> > + * are meant to indicate the progress of a continuous streaming
operation.
> > + */
> > + unsigned long long offset; /* Current offset of active streaming */
> > + unsigned long long size; /* Disk size */
> > +};
> > +typedef virStreamDiskState *virStreamDiskStatePtr;
> > +
> > +unsigned long long virDomainStreamDisk(virDomainPtr dom,
> > + const char *path,
> > + unsigned long long offset,
> > + unsigned int flags);
> > +
> > +int virDomainStreamDiskInfo(virDomainPtr dom,
> > + virStreamDiskStatePtr
states,
> > + unsigned int nr_states,
> > + unsigned int flags);
>
> I would have liked it if we could use the existing JobInfo APIs for
> getting progress information, but if we need to allow concurrent
> usage for multiple disks per-VM, we can't. I think we should still
> use a similar style of API though.
The goal is to eventually support concurrent streams. Therefore, we
will probably need to roll our own Status and Abort APIs (see my
proposed API below).
> There also doesn't appear to be a precise way to determine if the
> copying of an entire disk failed part way through, and if so, how
> much was actually copied.
>
> Taking all the previous points together, I think the API needs to
> look like this:
>
> typedef enum {
> /* If set, virDomainBlockAllocate() will return immediately
> * allowing polling for operation completion & status
> */
> VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
> } virDomainBlockAllocateFlags;
>
> /*
> * @path: fully qualified filename of the virtual disk
I probably misnamed it, but path is actually the device alias, not a
path to an image file.
Hmm, I wonder whether that is a good choice or not. The other
APIs all use the disk path. Perhaps we could use that as default
and have a flag to indicate whether the path should be treated as
a device alias instead. Thus getting both options.
> * @offset: logical position in bytes, within the virtual disk
> * @length: amount of data to copy, in bytes starting from @offset
> * @flags: One of virDomainBlockAllocateFlags, or zero
> *
> * Ensure the virtual disk @path is fully allocated
> * between @offset and @offset+@length. If a backing
> * store is present, data will be filled from the
> * backing store, otherwise data will be fileld with
> * zeros
> *
> * If @flags contains VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
> * this API will return immediately after initiating the
> * copy, otherwise it will block until copying is complete
> *
> */
> int virDomainBlockAllocate(virDomainPtr dom,
> const char *path,
> unsigned long long offset,
> unsigned long long length,
> unsigned int flags);
>
>
> /*
> * @path: fully qualified filename of the virtual disk
> * @info: allocated struct to return progress info
> *
> * Query the progress of a disk allocation job. This
> * API must be used when virDomainBlockAllocate() was
> * invoked with the VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK
> * flag set.
> *
> * The @info.type field will indicate whether the job
> * was completed successfully, or failed part way
> * through.
> *
> * The @info data progress fields will contain current
> * progress information.
> *
> * The hypervisor driver may optionally chose to also
> * fillin a time estimate for completion.
> */
> int virDomainBlockGetJobInfo(virDomainPtr dom,
> const char *path,
> virDomainJobInfoPtr info);
>
> /*
> * @path: fully qualified filename of the virtual disk
> *
> * Request that a disk allocation job be aborted at
> * the soonest opportunity. This API can only be used
> * when virDomainBlockAllocate() was invoked with the
> * VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK flag set.
> */
> int virDomainBlockAbortJob(virDomainPtr dom,
> const char *path);
>
>
> typedef struct _virDomainBlockRegion virDomainBlockRegion;
> typedef virDomainBlockRegion *virDomainBlockRegionPtr;
> struct _virDomainBlockRegion {
> /* The logical offset within the file of the allocated region */
> unsigned long long offset;
> /* The length of the allocated region */
> unsigned long long length;
> };
>
> /*
> * @path: fully qualified filename of the virtual disk
> * @nregions: filled in the number of @region structs
> * @regions: filled with a list of allocated regions
> *
> * Query the extents of allocated regions within the
> * virtual disk file. The offsets in the list of regions
> * are not guarenteed to be sorted in any explicit order.
> */
> int virDomainBlockGetAllocationMap(virDomainPtr dom,
> const char *path,
> unsigned int *nregions,
> virDomainBlockRegionPtr *regions);
I am not convinced that we really need the block allocation map stuff.
It's a fun toy, but the layout of data within a device is far too
low-level of a concept to be exposing to users. In my opinion,
streaming should really be done sequentially from the start of the
device to the end (with arbitrary chunk sizes allowed for throttling
purposes). If an application wants to stream according to some special
pattern, let them maintain a data structure outside of libvirt to manage
that extra complexity. It's not an error to stream a chunk that is
already present. The populated areas will just complete immediately.
Therefore, there isn't a need to maintain a strict record of outstanding
regions.
Well we can always add something like this in at a later
date, so it is fine to drop it.
> This takes care of things for running guests. It would be
> desirable to have the same functionality available when a
> guest is not running, via the virStorageVol APIs. Indeed,
> this would allow access to the allocation functionality
> for disks not explicitly associated with any VM yet.
In light of previous discussion about the complexities around storage
formats and filesystems, I am not sure how useful such an offline API is
going to be in practice. At any rate, I would prefer to consider that
issue separately.
So, with my points taken into account I would like to counter with the
following API proposal:
==== snip ====
typedef enum {
/* If set, virDomainBlockAllocate() will return immediately
* allowing polling for operation completion & status
*/
VIR_DOMAIN_DISK_STREAM_NONBLOCK,
} virDomainBlockStreamFlags;
/*
* @device: alias of the target block device
* @offset: logical position in bytes, within the virtual disk
* @length: amount of data to copy, in bytes starting from @offset
* @flags: One of virDomainBlockAllocateFlags, or zero
*
* Ensure the virtual disk @device is fully allocated
* between @offset and @offset+@length. If a backing
* store is present, data will be filled from the
* backing store, otherwise data will be fileld with
* zeros.
*
* If @flags contains VIR_DOMAIN_DISK_STREAM_NONBLOCK,
* this API will return immediately after initiating the
* copy, otherwise it will block until copying is complete
*
*/
int virDomainBlockStream(virDomainPtr dom,
const char *device,
unsigned long long offset,
unsigned long long length,
unsigned int flags);
/*
* @device: alias of the target block device
*
* Request that a disk streaming job be aborted at
* the soonest opportunity. This API can only be used
* when virDomainBlockStream() was invoked with the
* VIR_DOMAIN_DISK_STREAM_NONBLOCK flag set.
*/
int virDomainBlockStreamAbort(virDomainPtr dom,
const char *device);
typedef enum {
VIR_BLOCK_STREAM_STATE_COMPLETED = 0,
VIR_BLOCK_STREAM_STATE_ACTIVE = 1,
VIR_BLOCK_STREAM_STATE_FAILED = 2,
} virBlockStreamStatus;
typedef struct _virBlockStreamInfo virBlockStreamInfo;
struct _virBlockStreamInfo {
virBlockStreamStatus status;
Coding guidelines don't allow use of enum types in structs
or as API parameters, instead requiring 'int'.
unsigned long long remaining; /* number of bytes remaining
*/
};
typedef virBlockStreamInfo *virBlockStreamInfoPtr;
s/virBlock/virDomainBlock/ in several places here.
/*
* @device: alias of the target block device
* @info: allocated struct to return progress info
*
* Query the progress of a disk streaming job. This
* API must be used when virDomainBlockStream() was
* invoked with the VIR_DOMAIN_DISK_STREAM_NONBLOCK
* flag set.
*
*/
int virDomainBlockStreamInfo(virDomainPtr dom,
const char *device,
virBlockStreamInfoPtr info);
Any reason you didn't just use the existing 'virDomainJobInfoPtr'
struct, with this new API ? In particular I think we want many
of the other fields from that struct beyond just 'remaining'.
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|