After several long distractions, I am back to working on disk streaming.
Before I hit the list with a new series of patches, I was hoping to
reach some middle ground on the proposed streaming API.
On Fri, 2011-04-08 at 14:31 +0100, Daniel P. Berrange wrote:
On Thu, Apr 07, 2011 at 04:31:59PM -0500, Adam Litke wrote:
> /*
> + * Disk Streaming
> + */
> +typedef enum {
> + VIR_STREAM_DISK_ONE = 1, /* Stream a single disk unit */
> + VIR_STREAM_DISK_START = 2, /* Stream the entire disk */
> + VIR_STREAM_DISK_STOP = 4, /* Stop streaming a disk */
> +} virDomainStreamDiskFlags;
Using flags to combine two separate tasks into one single API
is rather unpleasant. As raised in the previous patch, the API
should also be taking a offset+length in bytes, then there is
no need for a special case transfer of an individual sector.
This is a fair point. I will work with Stefan to support an
offset/length qemu API. Since there doesn't seem to be a great way to
query device sizes, I think we do need a convenient way to request that
the entire disk be streamed. We could either do this with a flag or by
overriding (offset==0 && length==0) to mean stream the entire device.
Do you have a preference?
> +
> +#define VIR_STREAM_PATH_BUFLEN 1024
> +#define VIR_STREAM_DISK_MAX_STREAMS 10
> +
> +typedef struct _virStreamDiskState virStreamDiskState;
> +struct _virStreamDiskState {
> + char path[VIR_STREAM_PATH_BUFLEN];
> + /*
> + * The unit of measure for size and offset is unspecified. These fields
> + * are meant to indicate the progress of a continuous streaming operation.
> + */
> + unsigned long long offset; /* Current offset of active streaming */
> + unsigned long long size; /* Disk size */
> +};
> +typedef virStreamDiskState *virStreamDiskStatePtr;
> +
> +unsigned long long virDomainStreamDisk(virDomainPtr dom,
> + const char *path,
> + unsigned long long offset,
> + unsigned int flags);
> +
> +int virDomainStreamDiskInfo(virDomainPtr dom,
> + virStreamDiskStatePtr states,
> + unsigned int nr_states,
> + unsigned int flags);
I would have liked it if we could use the existing JobInfo APIs for
getting progress information, but if we need to allow concurrent
usage for multiple disks per-VM, we can't. I think we should still
use a similar style of API though.
The goal is to eventually support concurrent streams. Therefore, we
will probably need to roll our own Status and Abort APIs (see my
proposed API below).
There also doesn't appear to be a precise way to determine if
the
copying of an entire disk failed part way through, and if so, how
much was actually copied.
Taking all the previous points together, I think the API needs to
look like this:
typedef enum {
/* If set, virDomainBlockAllocate() will return immediately
* allowing polling for operation completion & status
*/
VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
} virDomainBlockAllocateFlags;
/*
* @path: fully qualified filename of the virtual disk
I probably misnamed it, but path is actually the device alias, not a
path to an image file.
* @offset: logical position in bytes, within the virtual disk
* @length: amount of data to copy, in bytes starting from @offset
* @flags: One of virDomainBlockAllocateFlags, or zero
*
* Ensure the virtual disk @path is fully allocated
* between @offset and @offset+@length. If a backing
* store is present, data will be filled from the
* backing store, otherwise data will be fileld with
* zeros
*
* If @flags contains VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
* this API will return immediately after initiating the
* copy, otherwise it will block until copying is complete
*
*/
int virDomainBlockAllocate(virDomainPtr dom,
const char *path,
unsigned long long offset,
unsigned long long length,
unsigned int flags);
/*
* @path: fully qualified filename of the virtual disk
* @info: allocated struct to return progress info
*
* Query the progress of a disk allocation job. This
* API must be used when virDomainBlockAllocate() was
* invoked with the VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK
* flag set.
*
* The @info.type field will indicate whether the job
* was completed successfully, or failed part way
* through.
*
* The @info data progress fields will contain current
* progress information.
*
* The hypervisor driver may optionally chose to also
* fillin a time estimate for completion.
*/
int virDomainBlockGetJobInfo(virDomainPtr dom,
const char *path,
virDomainJobInfoPtr info);
/*
* @path: fully qualified filename of the virtual disk
*
* Request that a disk allocation job be aborted at
* the soonest opportunity. This API can only be used
* when virDomainBlockAllocate() was invoked with the
* VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK flag set.
*/
int virDomainBlockAbortJob(virDomainPtr dom,
const char *path);
typedef struct _virDomainBlockRegion virDomainBlockRegion;
typedef virDomainBlockRegion *virDomainBlockRegionPtr;
struct _virDomainBlockRegion {
/* The logical offset within the file of the allocated region */
unsigned long long offset;
/* The length of the allocated region */
unsigned long long length;
};
/*
* @path: fully qualified filename of the virtual disk
* @nregions: filled in the number of @region structs
* @regions: filled with a list of allocated regions
*
* Query the extents of allocated regions within the
* virtual disk file. The offsets in the list of regions
* are not guarenteed to be sorted in any explicit order.
*/
int virDomainBlockGetAllocationMap(virDomainPtr dom,
const char *path,
unsigned int *nregions,
virDomainBlockRegionPtr *regions);
I am not convinced that we really need the block allocation map stuff.
It's a fun toy, but the layout of data within a device is far too
low-level of a concept to be exposing to users. In my opinion,
streaming should really be done sequentially from the start of the
device to the end (with arbitrary chunk sizes allowed for throttling
purposes). If an application wants to stream according to some special
pattern, let them maintain a data structure outside of libvirt to manage
that extra complexity. It's not an error to stream a chunk that is
already present. The populated areas will just complete immediately.
Therefore, there isn't a need to maintain a strict record of outstanding
regions.
NB, I've used 'Block' rather than 'Disk' in the
API names,
since all the other APIs we have dealing with disks in a
guest, use 'Block' in their name. An unfortunate naming
convention, but now we have it, we should stick with it
consistently.
Sure, no problem.
This takes care of things for running guests. It would be
desirable to have the same functionality available when a
guest is not running, via the virStorageVol APIs. Indeed,
this would allow access to the allocation functionality
for disks not explicitly associated with any VM yet.
In light of previous discussion about the complexities around storage
formats and filesystems, I am not sure how useful such an offline API is
going to be in practice. At any rate, I would prefer to consider that
issue separately.
So, with my points taken into account I would like to counter with the
following API proposal:
==== snip ====
typedef enum {
/* If set, virDomainBlockAllocate() will return immediately
* allowing polling for operation completion & status
*/
VIR_DOMAIN_DISK_STREAM_NONBLOCK,
} virDomainBlockStreamFlags;
/*
* @device: alias of the target block device
* @offset: logical position in bytes, within the virtual disk
* @length: amount of data to copy, in bytes starting from @offset
* @flags: One of virDomainBlockAllocateFlags, or zero
*
* Ensure the virtual disk @device is fully allocated
* between @offset and @offset+@length. If a backing
* store is present, data will be filled from the
* backing store, otherwise data will be fileld with
* zeros.
*
* If @flags contains VIR_DOMAIN_DISK_STREAM_NONBLOCK,
* this API will return immediately after initiating the
* copy, otherwise it will block until copying is complete
*
*/
int virDomainBlockStream(virDomainPtr dom,
const char *device,
unsigned long long offset,
unsigned long long length,
unsigned int flags);
/*
* @device: alias of the target block device
*
* Request that a disk streaming job be aborted at
* the soonest opportunity. This API can only be used
* when virDomainBlockStream() was invoked with the
* VIR_DOMAIN_DISK_STREAM_NONBLOCK flag set.
*/
int virDomainBlockStreamAbort(virDomainPtr dom,
const char *device);
typedef enum {
VIR_BLOCK_STREAM_STATE_COMPLETED = 0,
VIR_BLOCK_STREAM_STATE_ACTIVE = 1,
VIR_BLOCK_STREAM_STATE_FAILED = 2,
} virBlockStreamStatus;
typedef struct _virBlockStreamInfo virBlockStreamInfo;
struct _virBlockStreamInfo {
virBlockStreamStatus status;
unsigned long long remaining; /* number of bytes remaining */
};
typedef virBlockStreamInfo *virBlockStreamInfoPtr;
/*
* @device: alias of the target block device
* @info: allocated struct to return progress info
*
* Query the progress of a disk streaming job. This
* API must be used when virDomainBlockStream() was
* invoked with the VIR_DOMAIN_DISK_STREAM_NONBLOCK
* flag set.
*
*/
int virDomainBlockStreamInfo(virDomainPtr dom,
const char *device,
virBlockStreamInfoPtr info);
--
Thanks,
Adam