On Thu, Nov 24, 2011 at 09:21:42AM +0000, Stefan Hajnoczi wrote:
On Thu, Nov 24, 2011 at 5:31 AM, Daniel Veillard
<veillard(a)redhat.com> wrote:
> On Wed, Nov 23, 2011 at 09:04:50AM -0700, Eric Blake wrote:
>> On 11/23/2011 07:48 AM, Stefan Hajnoczi wrote:
>> > This means that virDomainBlockJobAbort() returns to the client without
>> > a guarantee that the job has completed. If the client enumerates jobs
>> > it may still see a job that has not finished cancelling. The client
>> > must register a handler for the BLOCK_JOB_CANCELLED event if it wants
>> > to know when the job really goes away. The BLOCK_JOB_CANCELLED event
>> > has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks
>> > the optional "error" message field.
>> >
>> > The impact on clients is that they need to add a BLOCK_JOB_CANCELLED
>> > handler if they really want to wait. Most clients today (not many
>> > exist) will be fine without waiting for cancellation.
>> >
>> > Any objections or thoughts on this?
>>
>> virDomainBlockJobAbort() thankfully has an 'unsigned int flags'
>> argument. For backwards-compatibility, I suggest we use it:
>>
>> calling virDomainBlockJobAbort(,0) maintains old blocking behavior, and
>> we document that blocking until things abort may render the rest of
>> interactions with the domain unresponsive.
>>
>> The new virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) would
>> then implement your new proposed semantics of returning immediately once
>> the cancellation has been requested, even if it hasn't been acted on yet.
>>
>> Maybe you could convince me to swap the flags: have 0 change semantics
>> to non-blocking, and a new flag to request blocking, where callers that
>> care have to try the flag, and if the flag is unsupported then they know
>> they are talking to older libvirtd where the behavior is blocking by
>> default, but that's a bit riskier.
>
> Agreed, I would rather not change the current call semantic,
> but an ASYNC flag would be a really good addition. We can document
> the risk of not using it in the function description and suggest
> new applications use ASYNC flag.
Yep, that's a nice suggestion and solves the problem.
I am almost ready to post the code that makes the above change, but before I do,
I need to ask a question about implementing synchronous behavior.
Stefan's qemu tree has a block_job_cancel command that always acts
asynchronously. In order to provide the synchronous behavior in libvirt (when
flags is 0), I need to wait for the block job to go away. I see two options:
1) Use the event:
To implement this I would create an internal event callback function and
register it to receive the block job events. When the cancel event comes in, I
can exit the qemu job context. One problem I see with this is that events are
only available when using the json monitor mode.
2) Poll the qemu monitor
To do it this way, I would write a function that repeatedly calls
virDomainGetBlockJobInfo() against the disk in question. Once the job
disappears from the list I can return with confidence that the job is gone.
This is obviously sub-optimal because I need to poll and sleep.
3) Ask Stefan to provide a synchronous mode for the qemu monitor command
This one is the nicest from a libvirt perspective, but I doubt qemu wants to add
such an interface given that it basically has broken semantics as far as qemu is
concerned.
If this is all too nasty, we could probably just change the behavior of
blockJobAbort and make it always synchronous with a 'cancelled' event.
Thoughts?
--
Adam Litke <agl(a)us.ibm.com>
IBM Linux Technology Center