[libvirt] RFC New virDomainBlockPull API family to libvirt

Unfortunately, after committing the blockPull API to libvirt, the qemu community decided to change the API somewhat and we needed to revert the libvirt implementation. Now that the qemu API is settling out again, I would like to propose an updated libvirt public API which has required only a minor set of changes to sync back up to the qemu API.

Summary of changes:
- Qemu dropped incremental streaming, so remove the libvirt incremental BlockPull() API
- Rename virDomainBlockPullAll() to virDomainBlockPull()
- Changes required to qemu monitor handlers for changed command names

Currently, qemu block device streaming completely flattens a disk image's backing file chain. Consider the following chain: A->B->C where C is a leaf image that is backed by B and B is backed by A. The current disk streaming command will produce an independent image C with no backing file. Future versions of qemu block streaming may support an option to specify a new base image from the current chain. For example: stream --backing_file B C would pull all blocks that are only in A to produce the chain: B->C (thus eliminating a dependency on A but maintaining B as a backing image).

Do we want to create room in the BlockPull API to support this advanced usage in the future? If so, then a new parameter must be added to BlockPull: const char *backing_path. Even if we extend the API in this manner, the initial implementation will not support it because qemu will not support it immediately, and libvirt is missing core functionality to support it (no public enumeration of the disk backing file chain). Thoughts?

--

To help speed the provisioning process for large domains, new QED disks are created with backing to a template image. These disks are configured with copy on read such that blocks that are read from the backing file are copied to the new disk. This reduces I/O over a potentially costly path to the backing image.
In such a configuration, there is a desire to remove the dependency on the backing image as the domain runs. To accomplish this, qemu will provide an interface to perform sequential copy on read operations during normal VM operation. Once all data has been copied, the disk image's link to the backing file is removed.

The virDomainBlockPull API family brings this functionality to libvirt. virDomainBlockPull() instructs the hypervisor to stream the entire device in the background. Progress of this operation can be checked with virDomainGetBlockPullInfo(). An ongoing stream can be cancelled with virDomainBlockPullAbort().

An event (VIR_DOMAIN_EVENT_ID_BLOCK_PULL) will be emitted when a disk has been fully populated or if a BlockPull() operation was terminated due to an error. This event is useful to avoid polling on virDomainGetBlockPullInfo() for completion and could also be used by the security driver to revoke access to the backing file when it is no longer needed.

/*
 * BlockPull API
 */

/* An iterator for initiating and monitoring block pull operations */
typedef unsigned long long virDomainBlockPullCursor;

typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
struct _virDomainBlockPullInfo {
    /*
     * The following fields provide an indication of block pull progress.  @cur
     * indicates the current position and will be between 0 and @end.  @end is
     * the final cursor position for this operation and represents completion.
     * To approximate progress, divide @cur by @end.
     */
    virDomainBlockPullCursor cur;
    virDomainBlockPullCursor end;
};
typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;

/**
 * virDomainBlockPull:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Populate a disk image with data from its backing image.  Once all data from
 * its backing image has been pulled, the disk no longer depends on a backing
 * image.  This function pulls data for the entire device in the background.
 * Progress of the operation can be checked with virDomainGetBlockPullInfo()
 * and the operation can be aborted with virDomainBlockPullAbort().  When
 * finished, an asynchronous event is raised to indicate the final status.
 *
 * Returns 0 if the operation has started, -1 on failure.
 */
int virDomainBlockPull(virDomainPtr dom,
                       const char *path,
                       unsigned int flags);

/**
 * virDomainBlockPullAbort:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Cancel a pull operation previously started by virDomainBlockPull().
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainBlockPullAbort(virDomainPtr dom,
                            const char *path,
                            unsigned int flags);

/**
 * virDomainGetBlockPullInfo:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @info: pointer to a virDomainBlockPullInfo structure
 * @flags: currently unused, for future extension
 *
 * Request progress information on a block pull operation that has been started
 * with virDomainBlockPull().  If an operation is active for the given
 * parameters, @info will be updated with the current progress.
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainGetBlockPullInfo(virDomainPtr dom,
                              const char *path,
                              virDomainBlockPullInfoPtr info,
                              unsigned int flags);

/**
 * virConnectDomainEventBlockPullStatus:
 *
 * The final status of a virDomainBlockPull() operation
 */
typedef enum {
    VIR_DOMAIN_BLOCK_PULL_COMPLETED = 0,
    VIR_DOMAIN_BLOCK_PULL_FAILED = 1,
} virConnectDomainEventBlockPullStatus;

/**
 * virConnectDomainEventBlockPullCallback:
 * @conn: connection object
 * @dom: domain on which the event occurred
 * @path: fully-qualified filename of the affected disk
 * @status: final status of the operation (virConnectDomainEventBlockPullStatus)
 *
 * The callback signature to use when registering for an event of type
 * VIR_DOMAIN_EVENT_ID_BLOCK_PULL with virConnectDomainEventRegisterAny()
 */
typedef void (*virConnectDomainEventBlockPullCallback)(virConnectPtr conn,
                                                       virDomainPtr dom,
                                                       const char *path,
                                                       int status,
                                                       void *opaque);

--
Adam Litke
IBM Linux Technology Center
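[Editor's illustration: a hypothetical client sketch of the call flow proposed above. The libvirt entry points are stubbed with simulated behavior so the example is self-contained; pull_and_wait and the stub bodies are invented, and a real client would include <libvirt/libvirt.h> and talk to an actual hypervisor.]

    /* Stand-ins for the proposed API, stubbed so the control flow can run
     * without a hypervisor. */
    typedef struct virDomain virDomain;
    typedef virDomain *virDomainPtr;

    typedef unsigned long long virDomainBlockPullCursor;
    typedef struct {
        virDomainBlockPullCursor cur;
        virDomainBlockPullCursor end;
    } virDomainBlockPullInfo;

    static virDomainBlockPullInfo job;   /* simulated per-disk job state */

    /* Stub: start a background pull of the whole device. */
    static int virDomainBlockPull(virDomainPtr dom, const char *path,
                                  unsigned int flags)
    {
        (void)dom; (void)path; (void)flags;
        job.cur = 0;
        job.end = 4;                     /* pretend the pull takes four polls */
        return 0;
    }

    /* Stub: report progress; each poll pretends more data was copied. */
    static int virDomainGetBlockPullInfo(virDomainPtr dom, const char *path,
                                         virDomainBlockPullInfo *info,
                                         unsigned int flags)
    {
        (void)dom; (void)path; (void)flags;
        if (job.cur < job.end)
            job.cur++;
        *info = job;
        return 0;
    }

    /* Start a pull and poll until the cursor reaches the end.  A real client
     * could instead register for VIR_DOMAIN_EVENT_ID_BLOCK_PULL and avoid
     * polling entirely, as the RFC suggests. */
    static int pull_and_wait(virDomainPtr dom, const char *path)
    {
        virDomainBlockPullInfo info;

        if (virDomainBlockPull(dom, path, 0) < 0)
            return -1;
        do {
            if (virDomainGetBlockPullInfo(dom, path, &info, 0) < 0)
                return -1;
        } while (info.cur < info.end);
        return 0;
    }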

On Wed, Jul 13, 2011 at 03:46:30PM -0500, Adam Litke wrote:
Unfortunately, after committing the blockPull API to libvirt, the qemu community decided to change the API somewhat and we needed to revert the libvirt implementation. Now that the qemu API is settling out again, I would like to propose an updated libvirt public API which has required only a minor set of changes to sync back up to the qemu API.
Summary of changes:
- Qemu dropped incremental streaming, so remove the libvirt incremental BlockPull() API
- Rename virDomainBlockPullAll() to virDomainBlockPull()
- Changes required to qemu monitor handlers for changed command names
Okay.
Currently, qemu block device streaming completely flattens a disk image's backing file chain. Consider the following chain: A->B->C where C is a leaf image that is backed by B and B is backed by A. The current disk streaming command will produce an independent image C with no backing file. Future versions of qemu block streaming may support an option to specify a new base image from the current chain. For example: stream --backing_file B C would pull all blocks that are only in A to produce the chain: B->C (thus eliminating a dependency on A but maintaining B as a backing image).
Do we want to create room in the BlockPull API to support this advanced usage in the future? If so, then a new parameter must be added to BlockPull: const char *backing_path. Even if we extend the API in this manner, the initial implementation will not support it because qemu will not support it immediately, and libvirt is missing core functionality to support it (no public enumeration of the disk backing file chain). Thoughts?
My own preference would rather be to defer this, and provide a separate API when QEmu implementation comes. The problem is that the single backing store file might not be sufficient to fully support what may come if QEmu adds this and as you pointed out we aren't really ready for this on libvirt side either.

On the other hand I suspect that we are missing the mechanism to control the rate of the transfer in the new API, which is something which could be implemented in the old incremental mechanism, but not anymore. So I wonder if the virDomainBlockPull() shouldn't get an "unsigned long bandwidth" (to stay coherent with migration APIs) before the flags. I don't know if the support is ready in QEmu but I suspect that will be an important point, so if not present will be added :-)
--
To help speed the provisioning process for large domains, new QED disks are created with backing to a template image. These disks are configured with copy on read such that blocks that are read from the backing file are copied to the new disk. This reduces I/O over a potentially costly path to the backing image.
In such a configuration, there is a desire to remove the dependency on the backing image as the domain runs. To accomplish this, qemu will provide an interface to perform sequential copy on read operations during normal VM operation. Once all data has been copied, the disk image's link to the backing file is removed.
The virDomainBlockPull API family brings this functionality to libvirt.
virDomainBlockPull() instructs the hypervisor to stream the entire device in the background. Progress of this operation can be checked with the function virDomainGetBlockPullInfo(). An ongoing stream can be cancelled with virDomainBlockPullAbort().

An event (VIR_DOMAIN_EVENT_ID_BLOCK_PULL) will be emitted when a disk has been fully populated or if a BlockPull() operation was terminated due to an error. This event is useful to avoid polling on virDomainGetBlockPullInfo() for completion and could also be used by the security driver to revoke access to the backing file when it is no longer needed.
/*
 * BlockPull API
 */

/* An iterator for initiating and monitoring block pull operations */
typedef unsigned long long virDomainBlockPullCursor;

typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
struct _virDomainBlockPullInfo {
    /*
     * The following fields provide an indication of block pull progress.  @cur
     * indicates the current position and will be between 0 and @end.  @end is
     * the final cursor position for this operation and represents completion.
     * To approximate progress, divide @cur by @end.
     */
    virDomainBlockPullCursor cur;
    virDomainBlockPullCursor end;
};
typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;
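[Editor's illustration: the progress approximation the struct comment describes (divide @cur by @end) works out to the following self-contained sketch. pull_progress_percent is an invented helper, not part of the proposal.]

    /* Local mirror of the proposed progress structure (names as in the RFC). */
    typedef unsigned long long virDomainBlockPullCursor;

    typedef struct {
        virDomainBlockPullCursor cur;   /* current position, between 0 and @end */
        virDomainBlockPullCursor end;   /* cursor position representing completion */
    } virDomainBlockPullInfo;

    /* Invented helper: approximate completion as a percentage by dividing
     * @cur by @end, guarding against a job that reports @end == 0. */
    static int pull_progress_percent(const virDomainBlockPullInfo *info)
    {
        if (info->end == 0)
            return 0;
        return (int)(info->cur * 100 / info->end);
    }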
/**
 * virDomainBlockPull:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Populate a disk image with data from its backing image.  Once all data from
 * its backing image has been pulled, the disk no longer depends on a backing
 * image.  This function pulls data for the entire device in the background.
 * Progress of the operation can be checked with virDomainGetBlockPullInfo()
 * and the operation can be aborted with virDomainBlockPullAbort().  When
 * finished, an asynchronous event is raised to indicate the final status.
 *
 * Returns 0 if the operation has started, -1 on failure.
 */
int virDomainBlockPull(virDomainPtr dom,
                       const char *path,
So I would just add an unsigned long bandwidth, here
unsigned int flags);
and document it as:

 * @bandwidth: (optional) specify copy bandwidth limit in Mbps
 *
 * The maximum bandwidth (in Mbps) that will be used to do the copy
 * can be specified with the bandwidth parameter.  If set to 0,
 * libvirt will choose a suitable default.  Some hypervisors do
 * not support this feature and will return an error if bandwidth
 * is not 0.
/**
 * virDomainBlockPullAbort:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Cancel a pull operation previously started by virDomainBlockPull().
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainBlockPullAbort(virDomainPtr dom,
                            const char *path,
                            unsigned int flags);
/**
 * virDomainGetBlockPullInfo:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @info: pointer to a virDomainBlockPullInfo structure
 * @flags: currently unused, for future extension
 *
 * Request progress information on a block pull operation that has been started
 * with virDomainBlockPull().  If an operation is active for the given
 * parameters, @info will be updated with the current progress.
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainGetBlockPullInfo(virDomainPtr dom,
                              const char *path,
                              virDomainBlockPullInfoPtr info,
                              unsigned int flags);
I also wonder about the PullInfo: is there any way we could extend it to suggest the remaining time? An approximation might be computable by the management code, but since libvirt will know about bandwidth better than the upper layer, maybe it's in a better situation to do that evaluation (assuming QEmu doesn't give information about this).
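[Editor's illustration: the kind of estimate being discussed could be sketched as below. pull_eta_seconds is an invented helper, not part of the proposed API; it assumes the cursor counts bytes and that the cap is in MB/s, neither of which the proposal guarantees, and any such figure is rough since the amount of data actually needing a copy depends on which blocks are already populated.]

    /* Invented helper: rough seconds-remaining estimate from cursor progress
     * and a bandwidth cap in MB/s. */
    static unsigned long long
    pull_eta_seconds(unsigned long long cur_bytes, unsigned long long end_bytes,
                     unsigned long bandwidth_mb_per_sec)
    {
        unsigned long long remaining =
            (end_bytes > cur_bytes) ? end_bytes - cur_bytes : 0;
        unsigned long long rate =
            (unsigned long long)bandwidth_mb_per_sec * 1024 * 1024;

        if (rate == 0)
            return 0;   /* unthrottled or unknown rate: no meaningful estimate */
        return remaining / rate;
    }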
/**
 * virConnectDomainEventBlockPullStatus:
 *
 * The final status of a virDomainBlockPull() operation
 */
typedef enum {
    VIR_DOMAIN_BLOCK_PULL_COMPLETED = 0,
    VIR_DOMAIN_BLOCK_PULL_FAILED = 1,
} virConnectDomainEventBlockPullStatus;

/**
 * virConnectDomainEventBlockPullCallback:
 * @conn: connection object
 * @dom: domain on which the event occurred
 * @path: fully-qualified filename of the affected disk
 * @status: final status of the operation (virConnectDomainEventBlockPullStatus)
 *
 * The callback signature to use when registering for an event of type
 * VIR_DOMAIN_EVENT_ID_BLOCK_PULL with virConnectDomainEventRegisterAny()
 */
typedef void (*virConnectDomainEventBlockPullCallback)(virConnectPtr conn,
                                                       virDomainPtr dom,
                                                       const char *path,
                                                       int status,
                                                       void *opaque);
Except for the bandwidth addition and potential extension of the BlockInfo, that looks fine to me.

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

On 07/13/2011 08:04 PM, Daniel Veillard wrote:
On Wed, Jul 13, 2011 at 03:46:30PM -0500, Adam Litke wrote:
Unfortunately, after committing the blockPull API to libvirt, the qemu community decided to change the API somewhat and we needed to revert the libvirt implementation. Now that the qemu API is settling out again, I would like to propose an updated libvirt public API which has required only a minor set of changes to sync back up to the qemu API.
Summary of changes:
- Qemu dropped incremental streaming, so remove the libvirt incremental BlockPull() API
- Rename virDomainBlockPullAll() to virDomainBlockPull()
- Changes required to qemu monitor handlers for changed command names
Okay.
Currently, qemu block device streaming completely flattens a disk image's backing file chain. Consider the following chain: A->B->C where C is a leaf image that is backed by B and B is backed by A. The current disk streaming command will produce an independent image C with no backing file. Future versions of qemu block streaming may support an option to specify a new base image from the current chain. For example: stream --backing_file B C would pull all blocks that are only in A to produce the chain: B->C (thus eliminating a dependency on A but maintaining B as a backing image).
Do we want to create room in the BlockPull API to support this advanced usage in the future? If so, then a new parameter must be added to BlockPull: const char *backing_path. Even if we extend the API in this manner, the initial implementation will not support it because qemu will not support it immediately, and libvirt is missing core functionality to support it (no public enumeration of the disk backing file chain). Thoughts?
My own preference would rather be to defer this, and provide a separate API when QEmu implementation comes. The problem is that the single backing store file might not be sufficient to fully support what may come if QEmu adds this and as you pointed out we aren't really ready for this on libvirt side either.
Yes, I agree with you here.
On the other hand I suspect that we are missing the mechanism to control the rate of the transfer in the new API, which is something which could be implemented in the old incremental mechanism, but not anymore. So I wonder if the virDomainBlockPull() shouldn't get an "unsigned long bandwidth" (to stay coherent with migration APIs) before the flags. I don't know if the support is ready in QEmu but I suspect that will be an important point, so if not present will be added :-)
Hopefully Stefan can comment on any throttling mechanisms that might be added to the qemu implementation. It would be nice to have this feature built into the API to match the migration APIs, but if that is not feasible then we could always add it later.
--
To help speed the provisioning process for large domains, new QED disks are created with backing to a template image. These disks are configured with copy on read such that blocks that are read from the backing file are copied to the new disk. This reduces I/O over a potentially costly path to the backing image.
In such a configuration, there is a desire to remove the dependency on the backing image as the domain runs. To accomplish this, qemu will provide an interface to perform sequential copy on read operations during normal VM operation. Once all data has been copied, the disk image's link to the backing file is removed.
The virDomainBlockPull API family brings this functionality to libvirt.
virDomainBlockPull() instructs the hypervisor to stream the entire device in the background. Progress of this operation can be checked with the function virDomainGetBlockPullInfo(). An ongoing stream can be cancelled with virDomainBlockPullAbort().

An event (VIR_DOMAIN_EVENT_ID_BLOCK_PULL) will be emitted when a disk has been fully populated or if a BlockPull() operation was terminated due to an error. This event is useful to avoid polling on virDomainGetBlockPullInfo() for completion and could also be used by the security driver to revoke access to the backing file when it is no longer needed.
/*
 * BlockPull API
 */

/* An iterator for initiating and monitoring block pull operations */
typedef unsigned long long virDomainBlockPullCursor;

typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
struct _virDomainBlockPullInfo {
    /*
     * The following fields provide an indication of block pull progress.  @cur
     * indicates the current position and will be between 0 and @end.  @end is
     * the final cursor position for this operation and represents completion.
     * To approximate progress, divide @cur by @end.
     */
    virDomainBlockPullCursor cur;
    virDomainBlockPullCursor end;
};
typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;
/**
 * virDomainBlockPull:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Populate a disk image with data from its backing image.  Once all data from
 * its backing image has been pulled, the disk no longer depends on a backing
 * image.  This function pulls data for the entire device in the background.
 * Progress of the operation can be checked with virDomainGetBlockPullInfo()
 * and the operation can be aborted with virDomainBlockPullAbort().  When
 * finished, an asynchronous event is raised to indicate the final status.
 *
 * Returns 0 if the operation has started, -1 on failure.
 */
int virDomainBlockPull(virDomainPtr dom,
                       const char *path,
So I would just add an unsigned long bandwidth, here
unsigned int flags);
and document it as:

 * @bandwidth: (optional) specify copy bandwidth limit in Mbps
 *
 * The maximum bandwidth (in Mbps) that will be used to do the copy
 * can be specified with the bandwidth parameter.  If set to 0,
 * libvirt will choose a suitable default.  Some hypervisors do
 * not support this feature and will return an error if bandwidth
 * is not 0.
I am concerned with the units here. I am not sure if the qemu throttling code will permit us to throttle by Mbps. The amount of data that has to actually be transferred depends on how many of the image's blocks are already populated. Stefan, can you elaborate on the possibilities wrt qemu throttling?
/**
 * virDomainBlockPullAbort:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Cancel a pull operation previously started by virDomainBlockPull().
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainBlockPullAbort(virDomainPtr dom,
                            const char *path,
                            unsigned int flags);
/**
 * virDomainGetBlockPullInfo:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @info: pointer to a virDomainBlockPullInfo structure
 * @flags: currently unused, for future extension
 *
 * Request progress information on a block pull operation that has been started
 * with virDomainBlockPull().  If an operation is active for the given
 * parameters, @info will be updated with the current progress.
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainGetBlockPullInfo(virDomainPtr dom,
                              const char *path,
                              virDomainBlockPullInfoPtr info,
                              unsigned int flags);
I also wonder about the PullInfo: is there any way we could extend it to suggest the remaining time? An approximation might be computable by the management code, but since libvirt will know about bandwidth better than the upper layer, maybe it's in a better situation to do that evaluation (assuming QEmu doesn't give information about this).
We had considered this earlier in the original API discussions, but given the non-deterministic nature of the data that will ultimately be sent, an accurate time estimate is not possible.
/**
 * virConnectDomainEventBlockPullStatus:
 *
 * The final status of a virDomainBlockPull() operation
 */
typedef enum {
    VIR_DOMAIN_BLOCK_PULL_COMPLETED = 0,
    VIR_DOMAIN_BLOCK_PULL_FAILED = 1,
} virConnectDomainEventBlockPullStatus;

/**
 * virConnectDomainEventBlockPullCallback:
 * @conn: connection object
 * @dom: domain on which the event occurred
 * @path: fully-qualified filename of the affected disk
 * @status: final status of the operation (virConnectDomainEventBlockPullStatus)
 *
 * The callback signature to use when registering for an event of type
 * VIR_DOMAIN_EVENT_ID_BLOCK_PULL with virConnectDomainEventRegisterAny()
 */
typedef void (*virConnectDomainEventBlockPullCallback)(virConnectPtr conn,
                                                       virDomainPtr dom,
                                                       const char *path,
                                                       int status,
                                                       void *opaque);
Except for the bandwidth addition and potential extension of the BlockInfo, that looks fine to me.
Daniel
--
Adam Litke
IBM Linux Technology Center

On Thu, Jul 14, 2011 at 7:47 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/13/2011 08:04 PM, Daniel Veillard wrote:
On Wed, Jul 13, 2011 at 03:46:30PM -0500, Adam Litke wrote:
Unfortunately, after committing the blockPull API to libvirt, the qemu community decided to change the API somewhat and we needed to revert the libvirt implementation. Now that the qemu API is settling out again, I would like to propose an updated libvirt public API which has required only a minor set of changes to sync back up to the qemu API.
Summary of changes:
- Qemu dropped incremental streaming, so remove the libvirt incremental BlockPull() API
- Rename virDomainBlockPullAll() to virDomainBlockPull()
- Changes required to qemu monitor handlers for changed command names
Okay.
Currently, qemu block device streaming completely flattens a disk image's backing file chain. Consider the following chain: A->B->C where C is a leaf image that is backed by B and B is backed by A. The current disk streaming command will produce an independent image C with no backing file. Future versions of qemu block streaming may support an option to specify a new base image from the current chain. For example: stream --backing_file B C would pull all blocks that are only in A to produce the chain: B->C (thus eliminating a dependency on A but maintaining B as a backing image).
Do we want to create room in the BlockPull API to support this advanced usage in the future? If so, then a new parameter must be added to BlockPull: const char *backing_path. Even if we extend the API in this manner, the initial implementation will not support it because qemu will not support it immediately, and libvirt is missing core functionality to support it (no public enumeration of the disk backing file chain). Thoughts?
My own preference would rather be to defer this, and provide a separate API when QEmu implementation comes. The problem is that the single backing store file might not be sufficient to fully support what may come if QEmu adds this and as you pointed out we aren't really ready for this on libvirt side either.
Yes, I agree with you here.
On the other hand I suspect that we are missing the mechanism to control the rate of the transfer in the new API, which is something which could be implemented in the old incremental mechanism, but not anymore. So I wonder if the virDomainBlockPull() shouldn't get an "unsigned long bandwidth" (to stay coherent with migration APIs) before the flags. I don't know if the support is ready in QEmu but I suspect that will be an important point, so if not present will be added :-)
Hopefully Stefan can comment on any throttling mechanisms that might be added to the qemu implementation. It would be nice to have this feature built into the API to match the migration APIs, but if that is not feasible then we could always add it later.
If we follow the live migration API then a block_stream_set_speed command would be used. libvirt would issue it as a separate command immediately after starting the streaming operation. There is the window of time after streaming has started but before block_stream_set_speed has been called where no throttling takes place, but I don't think this is a practical problem. It also opens the possibility of allowing the user to change the rate limit value while the block streaming operation is active.
To help speed the provisioning process for large domains, new QED disks are created with backing to a template image. These disks are configured with copy on read such that blocks that are read from the backing file are copied to the new disk. This reduces I/O over a potentially costly path to the backing image.
In such a configuration, there is a desire to remove the dependency on the backing image as the domain runs. To accomplish this, qemu will provide an interface to perform sequential copy on read operations during normal VM operation. Once all data has been copied, the disk image's link to the backing file is removed.
The virDomainBlockPull API family brings this functionality to libvirt.
virDomainBlockPull() instructs the hypervisor to stream the entire device in the background. Progress of this operation can be checked with the function virDomainGetBlockPullInfo(). An ongoing stream can be cancelled with virDomainBlockPullAbort().

An event (VIR_DOMAIN_EVENT_ID_BLOCK_PULL) will be emitted when a disk has been fully populated or if a BlockPull() operation was terminated due to an error. This event is useful to avoid polling on virDomainGetBlockPullInfo() for completion and could also be used by the security driver to revoke access to the backing file when it is no longer needed.
/*
 * BlockPull API
 */

/* An iterator for initiating and monitoring block pull operations */
typedef unsigned long long virDomainBlockPullCursor;

typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
struct _virDomainBlockPullInfo {
    /*
     * The following fields provide an indication of block pull progress.  @cur
     * indicates the current position and will be between 0 and @end.  @end is
     * the final cursor position for this operation and represents completion.
     * To approximate progress, divide @cur by @end.
     */
    virDomainBlockPullCursor cur;
    virDomainBlockPullCursor end;
};
typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;
/**
 * virDomainBlockPull:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Populate a disk image with data from its backing image.  Once all data from
 * its backing image has been pulled, the disk no longer depends on a backing
 * image.  This function pulls data for the entire device in the background.
 * Progress of the operation can be checked with virDomainGetBlockPullInfo()
 * and the operation can be aborted with virDomainBlockPullAbort().  When
 * finished, an asynchronous event is raised to indicate the final status.
 *
 * Returns 0 if the operation has started, -1 on failure.
 */
int virDomainBlockPull(virDomainPtr dom,
                       const char *path,
So I would just add an unsigned long bandwidth, here
unsigned int flags);
and document it as:

 * @bandwidth: (optional) specify copy bandwidth limit in Mbps
 *
 * The maximum bandwidth (in Mbps) that will be used to do the copy
 * can be specified with the bandwidth parameter.  If set to 0,
 * libvirt will choose a suitable default.  Some hypervisors do
 * not support this feature and will return an error if bandwidth
 * is not 0.
I am concerned with the units here. I am not sure if the qemu throttling code will permit us to throttle by Mbps. The amount of data that has to actually be transferred depends on how many of the image's blocks are already populated. Stefan, can you elaborate on the possibilities wrt qemu throttling?
Migration speed can be set at byte granularity, but the default units are MB/s. Currently there is no image streaming throttling code at all, but in terms of implementing throttling, the algorithm does know when data is transferred, so it can do some accounting to rate limit itself. I think MB/s would be fine.

Stefan
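[Editor's illustration: the accounting Stefan describes could look roughly like the sketch below. This is illustrative C only, not qemu code; RateLimit and rate_limit_account are invented names. The streaming loop records each transfer against a per-second byte budget and backs off once the budget is spent.]

    /* Invented sketch of rate-limit accounting for a streaming loop. */
    typedef struct {
        unsigned long long limit_bytes_per_sec;  /* 0 means unthrottled */
        unsigned long long window_bytes;         /* bytes copied in the current
                                                    one-second window */
    } RateLimit;

    /* Record a completed transfer.  Returns nonzero when the caller should
     * wait for the next one-second window before copying more data. */
    static int rate_limit_account(RateLimit *rl, unsigned long long bytes)
    {
        if (rl->limit_bytes_per_sec == 0)
            return 0;                            /* throttling disabled */
        rl->window_bytes += bytes;
        if (rl->window_bytes >= rl->limit_bytes_per_sec) {
            rl->window_bytes = 0;                /* fresh budget after the wait */
            return 1;
        }
        return 0;
    }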

On 07/15/2011 05:39 AM, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 7:47 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/13/2011 08:04 PM, Daniel Veillard wrote:
On Wed, Jul 13, 2011 at 03:46:30PM -0500, Adam Litke wrote:
Unfortunately, after committing the blockPull API to libvirt, the qemu community decided to change the API somewhat and we needed to revert the libvirt implementation. Now that the qemu API is settling out again, I would like to propose an updated libvirt public API which has required only a minor set of changes to sync back up to the qemu API.
Summary of changes:
- Qemu dropped incremental streaming, so remove the libvirt incremental BlockPull() API
- Rename virDomainBlockPullAll() to virDomainBlockPull()
- Changes required to qemu monitor handlers for changed command names
Okay.
<snip>
On the other hand I suspect that we are missing the mechanism to control the rate of the transfer in the new API, which is something which could be implemented in the old incremental mechanism, but not anymore. So I wonder if the virDomainBlockPull() shouldn't get an "unsigned long bandwidth" (to stay coherent with migration APIs) before the flags. I don't know if the support is ready in QEmu but I suspect that will be an important point, so if not present will be added :-)
Hopefully Stefan can comment on any throttling mechanisms that might be added to the qemu implementation. It would be nice to have this feature built into the API to match the migration APIs, but if that is not feasible then we could always add it later.
If we follow the live migration API then a block_stream_set_speed command would be used. libvirt would issue it as a separate command immediately after starting the streaming operation. There is the window of time after streaming has started but before block_stream_set_speed has been called where no throttling takes place, but I don't think this is a practical problem.
It also opens the possibility of allowing the user to change the rate limit value while the block streaming operation is active.
In light of our decision to provide a generic BlockJobAbort/BlockJobInfo interface, should SetSpeed be generic too? virDomainBlockJobSetSpeed() or virDomainBlockPullSetSpeed()?

--
Adam Litke
IBM Linux Technology Center

On Fri, Jul 15, 2011 at 3:09 PM, Adam Litke <agl@us.ibm.com> wrote:
<snip>
In light of our decision to provide a generic BlockJobAbort/BlockJobInfo interface, should SetSpeed be generic too?
virDomainBlockJobSetSpeed() or virDomainBlockPullSetSpeed() ?
The block_stream_set_speed semantics allow the command to be used when no streaming operation is active. This can be used to set the speed without a window of time after creation and before set_speed is called.

If we switch to block_job_set_speed then it is cleaner to restrict the command to work on active jobs only. This is because there is only one "speed" variable kept but there might be multiple job types (streaming, compaction, etc) which would need different settings.

What do you think about this?

block_job_set_speed
-------------------

Set maximum speed for a background block operation.

This is a per-block device command that can only be issued when there is an active block job.

Throttling can be disabled by setting the speed to 0.

Arguments:
- device: device name (json-string)
- value: maximum speed, in bytes per second (json-int)

Errors:
- NotSupported: job type does not support speed setting

Example:
-> { "execute": "block_job_set_speed",
     "arguments": { "device": "virtio0", "value": 1024 } }

Stefan

On 07/18/2011 09:35 AM, Stefan Hajnoczi wrote:
<snip>
The block_stream_set_speed semantics allow the command to be used when no streaming operation is active. This can be used to set the speed without a window of time after creation and before set_speed is called.
If we switch to block_job_set_speed then it is cleaner to restrict the command to work on active jobs only. This is because there is only one "speed" variable kept but there might be multiple job types (streaming, compaction, etc) which would need different settings.
What do you think about this?
I think the block_job_set_speed semantics seem reasonable to me. What is the method for querying the current speed setting? I would suggest extending the query-block-job command to include this information. In this case, I would extend virDomainBlockJobInfo to contain:

    /* The maximum allowable bandwidth for this job (MB/s) */
    unsigned long bandwidth;
--
Adam Litke
IBM Linux Technology Center

On Mon, Jul 18, 2011 at 9:10 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/18/2011 09:35 AM, Stefan Hajnoczi wrote:
On the other hand I suspect that we are missing the mechanism to control the rate of the transfer in the new API, which is something which could be implemented in the old incremental mechanism, but not anymore. So I wonder if the virDomainBlockPull() shouldn't get an "unsigned long bandwidth" (to stay coherent with migration APIs) before the flags. I don't know if the support is ready in QEmu but I suspect that will be an important point, so if not present will be added :-)
Hopefully Stefan can comment on any throttling mechanisms that might be added to the qemu implementation. It would be nice to have this feature built into the API to match the migration APIs, but if that is not feasible then we could always add it later.
If we follow the live migration API then a block_stream_set_speed command would be used. libvirt would issue it as a separate command immediately after starting the streaming operation. There is the window of time after streaming has started but before block_stream_set_speed has been called where no throttling takes place, but I don't think this is a practical problem.
It also opens the possibility of allowing the user to change the rate limit value while the block streaming operation is active.
In light of our decision to provide a generic BlockJobAbort/BlockJobInfo interface, should SetSpeed be generic too?
virDomainBlockJobSetSpeed() or virDomainBlockPullSetSpeed() ?
The block_stream_set_speed semantics allow the command to be used when no streaming operation is active. This can be used to set the speed without a window of time after creation and before set_speed is called.
If we switch to block_job_set_speed then it is cleaner to restrict the command to work on active jobs only. This is because there is only one "speed" variable kept but there might be multiple job types (streaming, compaction, etc) which would need different settings.
What do you think about this?
I think the block_job_set_speed semantics seem reasonable to me. What is the method for querying the current speed setting? I would suggest extending the query-block-job command to include this information. In this case, I would extend virDomainBlockJobInfo to contain:
    /* The maximum allowable bandwidth for this job (MB/s) */
    unsigned long bandwidth;
Working QEMU code is pushed to:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/stream-command

I am updating the QEMU wiki page with the latest changes.

Stefan

On Wed, Jul 20, 2011 at 2:00 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
<snip>
Working QEMU code is pushed to: http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/stream-command
I am updating the QEMU wiki page with the latest changes.
v4 posted: http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI

Adam: block_job_set_speed fully updated.

Kevin: we talked about error values earlier. Internally we really only have an errno value when streaming fails. I have converted this to a human-readable string. The semantics for clients are simply that streaming has failed and that there is an error message. There are no particular error constants to test and react to on the client side.

Stefan

On 20.07.2011 15:12, Stefan Hajnoczi wrote:
<snip>
v4 posted: http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI
Adam: block_job_set_speed fully updated.
Kevin: we talked about error values earlier. Internally we really only have an errno value when streaming fails. I have converted this to a human-readable string.
If the human can read English. :-)

And even though I do, I really don't like messages like the ones I got from the Fedora updater in earlier versions ("You have 42 Aktualisierungen"), so plain strings aren't going to work well. This is true even for localised strerror output, as the client may use a different language than the server.

I think this has been discussed multiple times in the past (we wanted to somehow expose something useful on I/O errors).

Kevin

On Wed, Jul 20, 2011 at 2:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
<snip>
If the human can read English. :-)
And even though I do, I really don't like messages like I got from the Fedora updater in earlier versions ("You have 42 Aktualisierungen"), so plain strings aren't going to work well. This is true even for localised strerror output, as the client may use a different language than the server.
I think this has been discussed multiple time in the past (we wanted to somehow expose something useful on I/O errors).
That's true, but is there a better solution?

I'm not very enthusiastic about localizing systems software because the error messages end up being gibberish in other languages anyway. The translations are often literal or babelfish-style when it comes to systems software. Segmentation fault, Bus error, Overflow - I cringe when I see them translated. It's a bit of a personal rant, I know, but wasted effort IMO. Still better than error code #1524-1352 :).

Stefan

On Wed, Jul 20, 2011 at 3:25 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
<snip>
That's true but is there a better solution?
I'm not very enthusiastic about localizing systems software because the error messages end up being gibberish in other languages anyway. The translations are often literal or babelfish-style when it comes to systems software. Segmentation fault, Bus error, Overflow - I cringe when I see them translated. It's a bit of a personal rant, I know, but wasted effort IMO. Still better than error code #1524-1352 :).
Also, there's the reverse problem of when someone has an error they need help with. They go on IRC or mailing lists, but their errors are in a language that no one able to help them can read. Way to go :).

Stefan

On Wed, Jul 13, 2011 at 15:46:30 -0500, Adam Litke wrote: ...
/*
 * BlockPull API
 */

/* An iterator for initiating and monitoring block pull operations */
typedef unsigned long long virDomainBlockPullCursor;

typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
struct _virDomainBlockPullInfo {
    /*
     * The following fields provide an indication of block pull progress. @cur
     * indicates the current position and will be between 0 and @end. @end is
     * the final cursor position for this operation and represents completion.
     * To approximate progress, divide @cur by @end.
     */
    virDomainBlockPullCursor cur;
    virDomainBlockPullCursor end;
};
typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;

...

/**
 * virDomainBlockPullAbort:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Cancel a pull operation previously started by virDomainBlockPull().
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainBlockPullAbort(virDomainPtr dom,
                            const char *path,
                            unsigned int flags);
/**
 * virDomainGetBlockPullInfo:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @info: pointer to a virDomainBlockPullInfo structure
 * @flags: currently unused, for future extension
 *
 * Request progress information on a block pull operation that has been started
 * with virDomainBlockPull(). If an operation is active for the given
 * parameters, @info will be updated with the current progress.
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainGetBlockPullInfo(virDomainPtr dom,
                              const char *path,
                              virDomainBlockPullInfoPtr info,
                              unsigned int flags);

/**
 * virConnectDomainEventBlockPullStatus:
 *
 * The final status of a virDomainBlockPull() operation
 */
typedef enum {
    VIR_DOMAIN_BLOCK_PULL_COMPLETED = 0,
    VIR_DOMAIN_BLOCK_PULL_FAILED = 1,
} virConnectDomainEventBlockPullStatus;

/**
 * virConnectDomainEventBlockPullCallback:
 * @conn: connection object
 * @dom: domain on which the event occurred
 * @path: fully-qualified filename of the affected disk
 * @status: final status of the operation (virConnectDomainEventBlockPullStatus)
 *
 * The callback signature to use when registering for an event of type
 * VIR_DOMAIN_EVENT_ID_BLOCK_PULL with virConnectDomainEventRegisterAny()
 */
typedef void (*virConnectDomainEventBlockPullCallback)(virConnectPtr conn,
                                                       virDomainPtr dom,
                                                       const char *path,
                                                       int status,
                                                       void *opaque);
Could these managing functions and event callback become general and usable by other block operations such as block copy?

Also I think it would be nice to make virDomainBlockPullInfo more similar to virDomainJobInfo. I was thinking about something along the following lines:

typedef struct _virDomainBlockJobInfo virDomainBlockJobInfo;
struct _virDomainBlockJobInfo {
    int job;    /* BLOCK_PULL, BLOCK_COPY, ... */
    int status; /* COMPLETED, FAILED, CANCELLED */
    unsigned long long timeElapsed;
    unsigned long long total;
    unsigned long long processed;
    unsigned long long remaining;
};

int virDomainBlockJobAbort(virDomainPtr dom,
                           const char *path,
                           unsigned int flags);

int virDomainGetBlockJobInfo(virDomainPtr dom,
                             const char *path,
                             virDomainBlockJobInfoPtr info,
                             unsigned int flags);

typedef void (*virConnectDomainEventBlockJobCallback)(virConnectPtr conn,
                                                      virDomainPtr dom,
                                                      const char *path,
                                                      int job,
                                                      int status);

I doubt it's practical to support more than one block operation on a single disk at a time, since that sounds to me like asking for trouble and data corruption. But if someone thinks we should keep the door open for this, we can add a 'job' parameter to the Abort and Info APIs.

What do you think about that?

Jirka

On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
<snip>
Could these managing functions and event callback become general and usable by other block operations such as block copy?
Hi Jiri,

Live block copy will certainly be possible using the streaming API. Before you start streaming you need to use the snapshot_blkdev command to create the destination file with the source file as its backing image. You can then use the streaming API to copy data from the source file into the destination file. On completion the source file is no longer needed.

Here is the QEMU image streaming API documentation:
http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI

At the bottom of that page you can see how live block copy can be implemented (thanks to Kevin Wolf and Marcelo Tosatti for driving that use case and explaining how snapshot_blkdev should be used).

Stefan

On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
<snip>
Hi Jiri, Live block copy will certainly be possible using the streaming API. Before you start streaming you need to use the snapshot_blkdev command to create the destination file with the source file as its backing image. You can then use the streaming API to copy data from the source file into the destination file. On completion the source file is no longer needed.
Well, I'm not talking about using the same API for block copy or implementing block copy internally as block streaming. I'm talking about making GetInfo, Abort and the event callback general enough to be usable not only for block streaming or block copy but also for other possible block operations in the future.

The reason is that starting a block operation (streaming, copy, whatever) may need different parameters, so those should be different APIs. But once the operation is started, we just need to know how far it got and we need the ability to abort the job. So this part can share the same APIs for all operations. It doesn't make sense to require every block operation to provide its own set of managing APIs.

It's analogous to virDomainSave, virDomainMigrate, virDomainCoreDump. They are all different jobs but all of them can be monitored and aborted using virDomainGetJobInfo and virDomainAbortJob.

Jirka

On Thu, Jul 14, 2011 at 11:19 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Wed, Jul 13, 2011 at 15:46:30 -0500, Adam Litke wrote: ...
...
Could these managing functions and event callback become general and usable by other block operations such as block copy?
Hi Jiri, Live block copy will certainly be possible using the streaming API. Before you start streaming you need to use the snapshot_blkdev command to create the destination file with the source file as its backing image. You can then use the streaming API to copy data from the source file into the destination file. On completion the source file is no longer needed.
Well, I'm not talking about using the same API for block copy or implementing block copy internally as block streaming. I'm talking about making GetInfo, Abort and event callback general to be usable not only for block streaming or block copy but also for other possible block operations in the future.
The reason is that starting a block operation (streaming, copy, whatever) may need different parameters so they should be different APIs. But once the operation is started, we just need to know how far it got and we need the ability to abort the job. So this can share the same APIs for all operations. It doesn't make sense to require any block operation to provide their own set of managing APIs.
It's analogous to virDomainSave, virDomainMigrate, virDomainCoreDump. They are all different jobs but all of them can be monitored and aborted using virDomainGetJobInfo and virDomainAbortJob.
I understand. I can't comment on libvirt API specifics but yes, there's a similarity here and a chance we'll have other operations in the future too. I'm thinking things like compacting images, compressing images, etc. Stefan

On 07/14/2011 05:33 AM, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 11:19 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Wed, Jul 13, 2011 at 15:46:30 -0500, Adam Litke wrote: ...
...
I understand. I can't comment on libvirt API specifics but yes, there's a similarity here and a chance we'll have other operations in the future too.
I'm thinking things like compacting images, compressing images, etc.
From a libvirt API perspective, I would only want to merge the abort and info commands if the underlying qemu monitor commands are also merged. Otherwise, we'll need to maintain state on which types of jobs are active on which devices and the added complexity erases any potential benefits IMO.
Stefan, any chance that block operation info and cancellation could be unified at the qemu level? -- Adam Litke IBM Linux Technology Center

On Thu, Jul 14, 2011 at 7:54 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/14/2011 05:33 AM, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 11:19 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Wed, Jul 13, 2011 at 15:46:30 -0500, Adam Litke wrote: ...
...
From a libvirt API perspective, I would only want to merge the abort and info commands if the underlying qemu monitor commands are also merged. Otherwise, we'll need to maintain state on which types of jobs are active on which devices and the added complexity erases any potential benefits IMO.
Stefan, any chance that block operation info and cancellation could be unified at the qemu level?
The commands would become "block_job_cancel" and "info block-job".

block_stream would start a job, which basically means:

BlockJob {
    BlockDriverState *bs;   /* the block device */
    BlockJobType type;      /* BLOCK_JOB_STREAMING */
    int64_t offset;         /* progress info */
    int64_t end;
};

We probably want to generalize the offset/end pair to have no units. They are simply used to indicate progress information. When the job completes, offset == end.

The BLOCK_STREAM_COMPLETED event also needs to be generalized to BLOCK_JOB_COMPLETED.

This approach is reasonable although I do worry that we don't understand the requirements for future block device background jobs. For example, jobs that perform an operation on two block devices instead of just one. With QMP we are extensible and can add fields for specific job types, so hopefully we can get by without finding the API inadequate for future block jobs.

Kevin: What do you think about providing a job interface instead of tying the cancel and info operations specifically to block_stream?

Stefan

Am 15.07.2011 12:50, schrieb Stefan Hajnoczi:
On Thu, Jul 14, 2011 at 7:54 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/14/2011 05:33 AM, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 11:19 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Wed, Jul 13, 2011 at 15:46:30 -0500, Adam Litke wrote: ...
The commands would become "block_job_cancel" and "info block-job".
block_stream would start a job, which basically means:

BlockJob {
    BlockDriverState *bs;   /* the block device */
    BlockJobType type;      /* BLOCK_JOB_STREAMING */
    int64_t offset;         /* progress info */
    int64_t end;
};
We probably want to generalize the offset/end pair to have no units. They are simply used to indicate progress information. When the job completes offset == end.
The BLOCK_STREAM_COMPLETED event also needs to be generalized to BLOCK_JOB_COMPLETED.
This approach is reasonable although I do worry that we don't understand the requirements for future block device background jobs. For example, jobs that perform an operation on two block devices instead of just one. With QMP we are extensible and can add fields for specific job types, so hopefully we can get by without finding the API inadequate for future block jobs.
Kevin: What do you think about providing a job interface instead of tying the cancel and info operations specifically to block_stream?
I guess it's a reasonable thing to do. We shouldn't overengineer here and try to solve all problems in qemu at once, but as long as we're basically talking about renaming what our current streaming/block copy concept needs anyway, I don't see a problem. Kevin

On Fri, Jul 15, 2011 at 12:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 15.07.2011 12:50, schrieb Stefan Hajnoczi:
On Thu, Jul 14, 2011 at 7:54 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/14/2011 05:33 AM, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 11:19 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 10:21 AM, Jiri Denemark <jdenemar@redhat.com> wrote: ...
This approach is reasonable although I do worry that we don't understand the requirements for future block device background jobs. For example, jobs that perform an operation on two block devices instead of just one. With QMP we are extensible and can add fields for specific job types, so hopefully we can get by without finding the API inadequate for future block jobs.
Kevin: What do you think about providing a job interface instead of tying the cancel and info operations specifically to block_stream?
I guess it's a reasonable thing to do. We shouldn't overengineer here and try to solve all problems in qemu at once, but as long as we're basically talking about renaming what our current streaming/block copy concept needs anyway, I don't see a problem.
I would leave block_stream as-is because the job start function is where most of the variability comes in. I think it's not worth doing a generic block_job_start function where you have to say type=streaming. /me is off to update the image streaming API wiki page (and have lunch). Stefan

On Fri, Jul 15, 2011 at 12:07 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Fri, Jul 15, 2011 at 12:05 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 15.07.2011 12:50, schrieb Stefan Hajnoczi:
On Thu, Jul 14, 2011 at 7:54 PM, Adam Litke <agl@us.ibm.com> wrote:
On 07/14/2011 05:33 AM, Stefan Hajnoczi wrote:
On Thu, Jul 14, 2011 at 11:19 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Thu, Jul 14, 2011 at 10:58:31 +0100, Stefan Hajnoczi wrote: ...
Well, I'm not talking about using the same API for block copy or implementing block copy internally as block streaming. I'm talking about making GetInfo, Abort and event callback general to be usable not only for block streaming or block copy but also for other possible block operations in the future.
The reason is that starting a block operation (streaming, copy, whatever) may need different parameters so they should be different APIs. But once the operation is started, we just need to know how far it got and we need the ability to abort the job. So this can share the same APIs for all operations. It doesn't make sense to require any block operation to provide their own set of managing APIs.
It's analogous to virDomainSave, virDomainMigrate, virDomainCoreDump. They are all different jobs but all of them can be monitored and aborted using virDomainGetJobInfo and virDomainAbortJob.
I understand. I can't comment on libvirt API specifics but yes, there's a similarity here and a chance we'll have other operations in the future too.
I'm thinking things like compacting images, compressing images, etc.
From a libvirt API perspective, I would only want to merge the abort and info commands if the underlying qemu monitor commands are also merged. Otherwise, we'll need to maintain state on which types of jobs are active on which devices and the added complexity erases any potential benefits IMO.
Stefan, any chance that block operation info and cancellation could be unified at the qemu level?
The commands would become "block_job_cancel" and "info block-job".
block_stream would start a job, which basically means:

BlockJob {
    BlockDriverState *bs; /* the block device */
    BlockJobType type;    /* BLOCK_JOB_STREAMING */
    int64_t offset;       /* progress info */
    int64_t end;
};
We probably want to generalize the offset/end pair to have no units. They are simply used to indicate progress information. When the job completes offset == end.
The BLOCK_STREAM_COMPLETED event also needs to be generalized to BLOCK_JOB_COMPLETED.
This approach is reasonable although I do worry that we don't understand the requirements for future block device background jobs. For example, jobs that perform an operation on two block devices instead of just one. With QMP we are extensible and can add fields for specific job types, so hopefully we can get by without finding the API inadequate for future block jobs.
Kevin: What do you think about providing a job interface instead of tying the cancel and info operations specifically to block_stream?
I guess it's a reasonable thing to do. We shouldn't overengineer here and try to solve all problems in qemu at once, but as long as we're basically talking about renaming what our current streaming/block copy concept needs anyway, I don't see a problem.
I would leave block_stream as-is because the job start function is where most of the variability comes in. I think it's not worth doing a generic block_job_start function where you have to say type=streaming.
/me is off to update the image streaming API wiki page (and have lunch).
The wiki is updated with block_job APIs and block_stream_set_speed: http://wiki.qemu.org/Features/LiveBlockMigration/ImageStreamingAPI Stefan

On 07/14/2011 04:21 AM, Jiri Denemark wrote:
On Wed, Jul 13, 2011 at 15:46:30 -0500, Adam Litke wrote: ...
/*
 * BlockPull API
 */
/* An iterator for initiating and monitoring block pull operations */
typedef unsigned long long virDomainBlockPullCursor;
typedef struct _virDomainBlockPullInfo virDomainBlockPullInfo;
struct _virDomainBlockPullInfo {
    /*
     * The following fields provide an indication of block pull progress.  @cur
     * indicates the current position and will be between 0 and @end.  @end is
     * the final cursor position for this operation and represents completion.
     * To approximate progress, divide @cur by @end.
     */
    virDomainBlockPullCursor cur;
    virDomainBlockPullCursor end;
};
typedef virDomainBlockPullInfo *virDomainBlockPullInfoPtr;

...

/**
 * virDomainBlockPullAbort:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @flags: currently unused, for future extension
 *
 * Cancel a pull operation previously started by virDomainBlockPull().
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainBlockPullAbort(virDomainPtr dom,
                            const char *path,
                            unsigned int flags);
/**
 * virDomainGetBlockPullInfo:
 * @dom: pointer to domain object
 * @path: fully-qualified filename of disk
 * @info: pointer to a virDomainBlockPullInfo structure
 * @flags: currently unused, for future extension
 *
 * Request progress information on a block pull operation that has been started
 * with virDomainBlockPull().  If an operation is active for the given
 * parameters, @info will be updated with the current progress.
 *
 * Returns -1 in case of failure, 0 when successful.
 */
int virDomainGetBlockPullInfo(virDomainPtr dom,
                              const char *path,
                              virDomainBlockPullInfoPtr info,
                              unsigned int flags);
/**
 * virConnectDomainEventBlockPullStatus:
 *
 * The final status of a virDomainBlockPull() operation
 */
typedef enum {
    VIR_DOMAIN_BLOCK_PULL_COMPLETED = 0,
    VIR_DOMAIN_BLOCK_PULL_FAILED = 1,
} virConnectDomainEventBlockPullStatus;
/**
 * virConnectDomainEventBlockPullCallback:
 * @conn: connection object
 * @dom: domain on which the event occurred
 * @path: fully-qualified filename of the affected disk
 * @status: final status of the operation (virConnectDomainEventBlockPullStatus)
 *
 * The callback signature to use when registering for an event of type
 * VIR_DOMAIN_EVENT_ID_BLOCK_PULL with virConnectDomainEventRegisterAny()
 */
typedef void (*virConnectDomainEventBlockPullCallback)(virConnectPtr conn,
                                                       virDomainPtr dom,
                                                       const char *path,
                                                       int status,
                                                       void *opaque);
Could these managing functions and event callback become general and usable by other block operations such as block copy? Also I think it would be nice to make the virDomainBlockPullInfo more similar to virDomainJobInfo. I was thinking about something along the following lines:
We've already discussed using a JobInfo-like structure in the past but the fields are not really appropriate for this operation. The number of blocks remaining is not available via the qemu API and it is a non-deterministic value that depends on other guest IO that is ongoing.
typedef struct _virDomainBlockJobInfo virDomainBlockJobInfo;
struct _virDomainBlockJobInfo {
    int job;    /* BLOCK_PULL, BLOCK_COPY, ... */
    int status; /* COMPLETED, FAILED, CANCELLED */
    unsigned long long timeElapsed;
    unsigned long long total;
    unsigned long long processed;
    unsigned long long remaining;
};
int virDomainBlockJobAbort(virDomainPtr dom,
                           const char *path,
                           unsigned int flags);

int virDomainGetBlockJobInfo(virDomainPtr dom,
                             const char *path,
                             virDomainBlockJobInfoPtr info,
                             unsigned int flags);

typedef void (*virConnectDomainEventBlockJobCallback)(virConnectPtr conn,
                                                      virDomainPtr dom,
                                                      const char *path,
                                                      int job,
                                                      int status);
I doubt it's practical to support more than one block operation on a single disk at a time, since that sounds to me like asking for troubles and data corruption. But if someone thinks we should keep the door open for this, we can add 'job' parameter to Abort and Info APIs.
I don't think we should be supporting multiple block operations at once. -- Adam Litke IBM Linux Technology Center