On 05/09/2011 11:09 AM, Daniel P. Berrange wrote:
On Mon, May 02, 2011 at 04:29:49PM -0500, Adam Litke wrote:
> After several long distractions, I am back to working on disk streaming.
> Before I hit the list with a new series of patches, I was hoping to
> reach some middle ground on the proposed streaming API.
>
> On Fri, 2011-04-08 at 14:31 +0100, Daniel P. Berrange wrote:
>> On Thu, Apr 07, 2011 at 04:31:59PM -0500, Adam Litke wrote:
>>> /*
>>> + * Disk Streaming
>>> + */
>>> +typedef enum {
>>> + VIR_STREAM_DISK_ONE = 1, /* Stream a single disk unit */
>>> + VIR_STREAM_DISK_START = 2, /* Stream the entire disk */
>>> + VIR_STREAM_DISK_STOP = 4, /* Stop streaming a disk */
>>> +} virDomainStreamDiskFlags;
>>
>> Using flags to combine two separate tasks into one single API
>> is rather unpleasant. As raised in the previous patch, the API
>> should also be taking a offset+length in bytes, then there is
>> no need for a special case transfer of an individual sector.
>
> This is a fair point. I will work with Stefan to support an
> offset/length qemu API. Since there doesn't seem to be a great way to
> query device sizes, I think we do need a convenient way to request that
> the entire disk be streamed. We could either do this with a flag or by
> overriding (offset==0&& length==0) to mean stream the entire device.
> Do you have a preference?
Since length==0 is otherwise meaningless, it is fine to use that
to indicate "until end of disk". This is consistent with
what we do for virStorageVolUpload/Download which allow length=0
to indicate "until end of disk".
>>> +#define VIR_STREAM_PATH_BUFLEN 1024
>>> +#define VIR_STREAM_DISK_MAX_STREAMS 10
>>> +
>>> +typedef struct _virStreamDiskState virStreamDiskState;
>>> +struct _virStreamDiskState {
>>> + char path[VIR_STREAM_PATH_BUFLEN];
>>> + /*
>>> + * The unit of measure for size and offset is unspecified. These
fields
>>> + * are meant to indicate the progress of a continuous streaming
operation.
>>> + */
>>> + unsigned long long offset; /* Current offset of active streaming */
>>> + unsigned long long size; /* Disk size */
>>> +};
>>> +typedef virStreamDiskState *virStreamDiskStatePtr;
>>> +
>>> +unsigned long long virDomainStreamDisk(virDomainPtr dom,
>>> + const char *path,
>>> + unsigned long long offset,
>>> + unsigned int flags);
>>> +
>>> +int virDomainStreamDiskInfo(virDomainPtr dom,
>>> + virStreamDiskStatePtr
states,
>>> + unsigned int nr_states,
>>> + unsigned int flags);
>>
>> I would have liked it if we could use the existing JobInfo APIs for
>> getting progress information, but if we need to allow concurrent
>> usage for multiple disks per-VM, we can't. I think we should still
>> use a similar style of API though.
>
> The goal is to eventually support concurrent streams. Therefore, we
> will probably need to roll our own Status and Abort APIs (see my
> proposed API below).
>
>> There also doesn't appear to be a precise way to determine if the
>> copying of an entire disk failed part way through, and if so, how
>> much was actually copied.
>>
>> Taking all the previous points together, I think the API needs to
>> look like this:
>>
>> typedef enum {
>> /* If set, virDomainBlockAllocate() will return immediately
>> * allowing polling for operation completion& status
>> */
>> VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
>> } virDomainBlockAllocateFlags;
>>
>> /*
>> * @path: fully qualified filename of the virtual disk
>
> I probably misnamed it, but path is actually the device alias, not a
> path to an image file.
Hmm, I wonder whether that is a good choice or not. The other
APIs all use the disk path. Perhaps we could use that as default
and have a flag to indicate whether the path should be treated as
a device alias instead. Thus getting both options.
Does libvirt have an option to lookup a device alias by disk path? If
so, then I am happy to use the file path and convert it to the form that
qemu expects.
>
>> * @offset: logical position in bytes, within the virtual disk
>> * @length: amount of data to copy, in bytes starting from @offset
>> * @flags: One of virDomainBlockAllocateFlags, or zero
>> *
>> * Ensure the virtual disk @path is fully allocated
>> * between @offset and @offset+@length. If a backing
>> * store is present, data will be filled from the
>> * backing store, otherwise data will be fileld with
>> * zeros
>> *
>> * If @flags contains VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
>> * this API will return immediately after initiating the
>> * copy, otherwise it will block until copying is complete
>> *
>> */
>> int virDomainBlockAllocate(virDomainPtr dom,
>> const char *path,
>> unsigned long long offset,
>> unsigned long long length,
>> unsigned int flags);
>>
>>
>> /*
>> * @path: fully qualified filename of the virtual disk
>> * @info: allocated struct to return progress info
>> *
>> * Query the progress of a disk allocation job. This
>> * API must be used when virDomainBlockAllocate() was
>> * invoked with the VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK
>> * flag set.
>> *
>> * The @info.type field will indicate whether the job
>> * was completed successfully, or failed part way
>> * through.
>> *
>> * The @info data progress fields will contain current
>> * progress information.
>> *
>> * The hypervisor driver may optionally chose to also
>> * fillin a time estimate for completion.
>> */
>> int virDomainBlockGetJobInfo(virDomainPtr dom,
>> const char *path,
>> virDomainJobInfoPtr info);
>>
>> /*
>> * @path: fully qualified filename of the virtual disk
>> *
>> * Request that a disk allocation job be aborted at
>> * the soonest opportunity. This API can only be used
>> * when virDomainBlockAllocate() was invoked with the
>> * VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK flag set.
>> */
>> int virDomainBlockAbortJob(virDomainPtr dom,
>> const char *path);
>>
>>
>> typedef struct _virDomainBlockRegion virDomainBlockRegion;
>> typedef virDomainBlockRegion *virDomainBlockRegionPtr;
>> struct _virDomainBlockRegion {
>> /* The logical offset within the file of the allocated region */
>> unsigned long long offset;
>> /* The length of the allocated region */
>> unsigned long long length;
>> };
>>
>> /*
>> * @path: fully qualified filename of the virtual disk
>> * @nregions: filled in the number of @region structs
>> * @regions: filled with a list of allocated regions
>> *
>> * Query the extents of allocated regions within the
>> * virtual disk file. The offsets in the list of regions
>> * are not guarenteed to be sorted in any explicit order.
>> */
>> int virDomainBlockGetAllocationMap(virDomainPtr dom,
>> const char *path,
>> unsigned int *nregions,
>> virDomainBlockRegionPtr *regions);
>
> I am not convinced that we really need the block allocation map stuff.
> It's a fun toy, but the layout of data within a device is far too
> low-level of a concept to be exposing to users. In my opinion,
> streaming should really be done sequentially from the start of the
> device to the end (with arbitrary chunk sizes allowed for throttling
> purposes). If an application wants to stream according to some special
> pattern, let them maintain a data structure outside of libvirt to manage
> that extra complexity. It's not an error to stream a chunk that is
> already present. The populated areas will just complete immediately.
> Therefore, there isn't a need to maintain a strict record of outstanding
> regions.
Well we can always add something like this in at a later
date, so it is fine to drop it.
>> This takes care of things for running guests. It would be
>> desirable to have the same functionality available when a
>> guest is not running, via the virStorageVol APIs. Indeed,
>> this would allow access to the allocation functionality
>> for disks not explicitly associated with any VM yet.
>
> In light of previous discussion about the complexities around storage
> formats and filesystems, I am not sure how useful such an offline API is
> going to be in practice. At any rate, I would prefer to consider that
> issue separately.
>
> So, with my points taken into account I would like to counter with the
> following API proposal:
>
> ==== snip ====
>
> typedef enum {
> /* If set, virDomainBlockAllocate() will return immediately
> * allowing polling for operation completion& status
> */
> VIR_DOMAIN_DISK_STREAM_NONBLOCK,
> } virDomainBlockStreamFlags;
>
> /*
> * @device: alias of the target block device
> * @offset: logical position in bytes, within the virtual disk
> * @length: amount of data to copy, in bytes starting from @offset
> * @flags: One of virDomainBlockAllocateFlags, or zero
> *
> * Ensure the virtual disk @device is fully allocated
> * between @offset and @offset+@length. If a backing
> * store is present, data will be filled from the
> * backing store, otherwise data will be fileld with
> * zeros.
> *
> * If @flags contains VIR_DOMAIN_DISK_STREAM_NONBLOCK,
> * this API will return immediately after initiating the
> * copy, otherwise it will block until copying is complete
> *
> */
> int virDomainBlockStream(virDomainPtr dom,
> const char *device,
> unsigned long long offset,
> unsigned long long length,
> unsigned int flags);
>
> /*
> * @device: alias of the target block device
> *
> * Request that a disk streaming job be aborted at
> * the soonest opportunity. This API can only be used
> * when virDomainBlockStream() was invoked with the
> * VIR_DOMAIN_DISK_STREAM_NONBLOCK flag set.
> */
> int virDomainBlockStreamAbort(virDomainPtr dom,
> const char *device);
>
> typedef enum {
> VIR_BLOCK_STREAM_STATE_COMPLETED = 0,
> VIR_BLOCK_STREAM_STATE_ACTIVE = 1,
> VIR_BLOCK_STREAM_STATE_FAILED = 2,
> } virBlockStreamStatus;
>
> typedef struct _virBlockStreamInfo virBlockStreamInfo;
> struct _virBlockStreamInfo {
> virBlockStreamStatus status;
Coding guidelines don't allow use of enum types in structs
or as API parameters, instead requiring 'int'.
OK.
> unsigned long long remaining; /* number of bytes
remaining */
> };
> typedef virBlockStreamInfo *virBlockStreamInfoPtr;
s/virBlock/virDomainBlock/ in several places here.
>
> /*
> * @device: alias of the target block device
> * @info: allocated struct to return progress info
> *
> * Query the progress of a disk streaming job. This
> * API must be used when virDomainBlockStream() was
> * invoked with the VIR_DOMAIN_DISK_STREAM_NONBLOCK
> * flag set.
> *
> */
> int virDomainBlockStreamInfo(virDomainPtr dom,
> const char *device,
> virBlockStreamInfoPtr info);
Any reason you didn't just use the existing 'virDomainJobInfoPtr'
struct, with this new API ? In particular I think we want many
of the other fields from that struct beyond just 'remaining'.
It looks like I can use this structure without problems but given the
desire to eventually support simultaneous streams, we'll still need to
write our own start/stop/status routines.