[libvirt] virDomainBlockJobAbort and block_job_cancel

Block job cancellation waits until the job has been cancelled before returning. This allows clients to know that the operation has been cancelled successfully.

Unfortunately, these semantics are not really possible with today's QEMU and libvirt code. A command that waits for block I/O completion may wait for many minutes, and during this time the monitor is unavailable. While the QMP protocol may in theory support multiple in-flight commands, both QEMU's and libvirt's implementations are geared towards one command at a time, so in practice a hung cancellation command would make the monitor unavailable - we need to avoid this. This means block_job_cancel cannot wait until the job is cancelled, or it risks hanging the monitor if there is a block I/O timeout.

We need a solution that reflects this in QEMU and libvirt. Here is what I propose: block_job_cancel returns immediately upon marking the job cancelled. The job may still be finishing block I/O but will cancel itself at some point in the future. When the job actually completes it raises the new BLOCK_JOB_CANCELLED event.

This means that virDomainBlockJobAbort() returns to the client without a guarantee that the job has completed. If the client enumerates jobs it may still see a job that has not finished cancelling. The client must register a handler for the BLOCK_JOB_CANCELLED event if it wants to know when the job really goes away. The BLOCK_JOB_CANCELLED event has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks the optional "error" message field.

The impact on clients is that they need to add a BLOCK_JOB_CANCELLED handler if they really want to wait. Most clients today (not many exist) will be fine without waiting for cancellation.

Any objections or thoughts on this?

Stefan
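
As a concrete illustration, a client-side wait under this proposal might look like the sketch below. The event registration API shown already exists in libvirt; VIR_DOMAIN_BLOCK_JOB_CANCELED as a status value delivered to the handler is an assumption, since the proposal only defines the QMP-level event, not how libvirt would surface it.

/* Minimal sketch, assuming the proposal above.  The registration API is
 * real; VIR_DOMAIN_BLOCK_JOB_CANCELED is an assumed new status value
 * alongside the existing COMPLETED/FAILED ones.  Assumes
 * virEventRegisterDefaultImpl() was called before connecting. */
#include <stdio.h>
#include <libvirt/libvirt.h>

static void
blockJobCallback(virConnectPtr conn, virDomainPtr dom, const char *disk,
                 int type, int status, void *opaque)
{
    int *done = opaque;

    if (status == VIR_DOMAIN_BLOCK_JOB_CANCELED)   /* assumed value */
        *done = 1;
}

static int
abortAndWait(virConnectPtr conn, virDomainPtr dom, const char *disk)
{
    int done = 0;
    int cb = virConnectDomainEventRegisterAny(
        conn, dom, VIR_DOMAIN_EVENT_ID_BLOCK_JOB,
        VIR_DOMAIN_EVENT_CALLBACK(blockJobCallback), &done, NULL);
    if (cb < 0)
        return -1;

    if (virDomainBlockJobAbort(dom, disk, 0) < 0) {
        virConnectDomainEventDeregisterAny(conn, cb);
        return -1;
    }

    while (!done)                       /* pump the libvirt event loop */
        if (virEventRunDefaultImpl() < 0)
            break;

    virConnectDomainEventDeregisterAny(conn, cb);
    return done ? 0 : -1;
}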

On 11/23/2011 07:48 AM, Stefan Hajnoczi wrote:
This means that virDomainBlockJobAbort() returns to the client without a guarantee that the job has completed. If the client enumerates jobs it may still see a job that has not finished cancelling. The client must register a handler for the BLOCK_JOB_CANCELLED event if it wants to know when the job really goes away. The BLOCK_JOB_CANCELLED event has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks the optional "error" message field.
The impact on clients is that they need to add a BLOCK_JOB_CANCELLED handler if they really want to wait. Most clients today (not many exist) will be fine without waiting for cancellation.
Any objections or thoughts on this?
virDomainBlockJobAbort() thankfully has an 'unsigned int flags' argument. For backwards-compatibility, I suggest we use it: calling virDomainBlockJobAbort(,0) maintains old blocking behavior, and we document that blocking until things abort may render the rest of interactions with the domain unresponsive. The new virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) would then implement your new proposed semantics of returning immediately once the cancellation has been requested, even if it hasn't been acted on yet.

Maybe you could convince me to swap the flags: have 0 change semantics to non-blocking, and a new flag to request blocking, where callers that care have to try the flag, and if the flag is unsupported then they know they are talking to older libvirtd where the behavior is blocking by default, but that's a bit riskier.

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
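
For the riskier variant, the caller-side probe might look like this sketch. VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is the flag proposed above (not yet in any release), and the error check assumes the usual libvirt convention that an unrecognized flag is rejected with VIR_ERR_INVALID_ARG:

/* Sketch of the "try the flag, fall back if unsupported" pattern.
 * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is the proposed flag; the
 * VIR_ERR_INVALID_ARG check assumes the usual libvirt convention for
 * rejecting unknown flags. */
#include <libvirt/libvirt.h>
#include <libvirt/virterror.h>

static int
tryAsyncAbort(virDomainPtr dom, const char *disk)
{
    virErrorPtr err;

    if (virDomainBlockJobAbort(dom, disk,
                               VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) == 0)
        return 0;   /* cancellation requested; wait for the event */

    err = virGetLastError();
    if (err && err->code == VIR_ERR_INVALID_ARG) {
        /* Older libvirtd: the flag is unknown, so fall back to the
         * blocking call and accept that it may take a while. */
        return virDomainBlockJobAbort(dom, disk, 0);
    }
    return -1;
}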

On Wed, Nov 23, 2011 at 09:04:50AM -0700, Eric Blake wrote:
On 11/23/2011 07:48 AM, Stefan Hajnoczi wrote:
This means that virDomainBlockJobAbort() returns to the client without a guarantee that the job has completed. If the client enumerates jobs it may still see a job that has not finished cancelling. The client must register a handler for the BLOCK_JOB_CANCELLED event if it wants to know when the job really goes away. The BLOCK_JOB_CANCELLED event has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks the optional "error" message field.
The impact on clients is that they need to add a BLOCK_JOB_CANCELLED handler if they really want to wait. Most clients today (not many exist) will be fine without waiting for cancellation.
Any objections or thoughts on this?
virDomainBlockJobAbort() thankfully has an 'unsigned int flags' argument. For backwards-compatibility, I suggest we use it:
calling virDomainBlockJobAbort(,0) maintains old blocking behavior, and we document that blocking until things abort may render the rest of interactions with the domain unresponsive.
The new virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) would then implement your new proposed semantics of returning immediately once the cancellation has been requested, even if it hasn't been acted on yet.
Maybe you could convince me to swap the flags: have 0 change semantics to non-blocking, and a new flag to request blocking, where callers that care have to try the flag, and if the flag is unsupported then they know they are talking to older libvirtd where the behavior is blocking by default, but that's a bit riskier.
Agreed, I would rather not change the current call semantics, but an ASYNC flag would be a really good addition. We can document the risk of not using it in the function description and suggest that new applications use the ASYNC flag.

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

On Thu, Nov 24, 2011 at 5:31 AM, Daniel Veillard <veillard@redhat.com> wrote:
On Wed, Nov 23, 2011 at 09:04:50AM -0700, Eric Blake wrote:
On 11/23/2011 07:48 AM, Stefan Hajnoczi wrote:
This means that virDomainBlockJobAbort() returns to the client without a guarantee that the job has completed. If the client enumerates jobs it may still see a job that has not finished cancelling. The client must register a handler for the BLOCK_JOB_CANCELLED event if it wants to know when the job really goes away. The BLOCK_JOB_CANCELLED event has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks the optional "error" message field.
The impact on clients is that they need to add a BLOCK_JOB_CANCELLED handler if they really want to wait. Most clients today (not many exist) will be fine without waiting for cancellation.
Any objections or thoughts on this?
virDomainBlockJobAbort() thankfully has an 'unsigned int flags' argument. For backwards-compatibility, I suggest we use it:
calling virDomainBlockJobAbort(,0) maintains old blocking behavior, and we document that blocking until things abort may render the rest of interactions with the domain unresponsive.
The new virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) would then implement your new proposed semantics of returning immediately once the cancellation has been requested, even if it hasn't been acted on yet.
Maybe you could convince me to swap the flags: have 0 change semantics to non-blocking, and a new flag to request blocking, where callers that care have to try the flag, and if the flag is unsupported then they know they are talking to older libvirtd where the behavior is blocking by default, but that's a bit riskier.
Agreed, I would rather not change the current call semantics, but an ASYNC flag would be a really good addition. We can document the risk of not using it in the function description and suggest that new applications use the ASYNC flag.
Yep, that's a nice suggestion and solves the problem.

Stefan

On Thu, Nov 24, 2011 at 09:21:42AM +0000, Stefan Hajnoczi wrote:
On Thu, Nov 24, 2011 at 5:31 AM, Daniel Veillard <veillard@redhat.com> wrote:
On Wed, Nov 23, 2011 at 09:04:50AM -0700, Eric Blake wrote:
On 11/23/2011 07:48 AM, Stefan Hajnoczi wrote:
This means that virDomainBlockJobAbort() returns to the client without a guarantee that the job has completed. If the client enumerates jobs it may still see a job that has not finished cancelling. The client must register a handler for the BLOCK_JOB_CANCELLED event if it wants to know when the job really goes away. The BLOCK_JOB_CANCELLED event has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks the optional "error" message field.
The impact on clients is that they need to add a BLOCK_JOB_CANCELLED handler if they really want to wait. Most clients today (not many exist) will be fine without waiting for cancellation.
Any objections or thoughts on this?
virDomainBlockJobAbort() thankfully has an 'unsigned int flags' argument. For backwards-compatibility, I suggest we use it:
calling virDomainBlockJobAbort(,0) maintains old blocking behavior, and we document that blocking until things abort may render the rest of interactions with the domain unresponsive.
The new virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) would then implement your new proposed semantics of returning immediately once the cancellation has been requested, even if it hasn't been acted on yet.
Maybe you could convince me to swap the flags: have 0 change semantics to non-blocking, and a new flag to request blocking, where callers that care have to try the flag, and if the flag is unsupported then they know they are talking to older libvirtd where the behavior is blocking by default, but that's a bit riskier.
Agreed, I would rather not change the current call semantics, but an ASYNC flag would be a really good addition. We can document the risk of not using it in the function description and suggest that new applications use the ASYNC flag.
Yep, that's a nice suggestion and solves the problem.
I am almost ready to post the code that makes the above change, but before I do, I need to ask a question about implementing synchronous behavior. Stefan's qemu tree has a block_job_cancel command that always acts asynchronously. In order to provide the synchronous behavior in libvirt (when flags is 0), I need to wait for the block job to go away. I see three options:

1) Use the event: To implement this I would create an internal event callback function and register it to receive the block job events. When the cancel event comes in, I can exit the qemu job context. One problem I see with this is that events are only available when using the json monitor mode.

2) Poll the qemu monitor: To do it this way, I would write a function that repeatedly calls virDomainGetBlockJobInfo() against the disk in question. Once the job disappears from the list I can return with confidence that the job is gone. This is obviously sub-optimal because I need to poll and sleep.

3) Ask Stefan to provide a synchronous mode for the qemu monitor command: This one is the nicest from a libvirt perspective, but I doubt qemu wants to add such an interface given that it basically has broken semantics as far as qemu is concerned.

If this is all too nasty, we could probably just change the behavior of blockJobAbort and make it always synchronous with a 'cancelled' event.

Thoughts?

--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center
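
For illustration, option 2 could be sketched against the public virDomainGetBlockJobInfo() API as below (a real qemu driver implementation would query the monitor directly, but the loop shape is the same); the 500ms poll interval is an arbitrary choice:

/* Sketch of option 2: poll until the job disappears.
 * virDomainGetBlockJobInfo() returns 1 while a job exists on the disk,
 * 0 once there is none, and -1 on error. */
#include <unistd.h>
#include <libvirt/libvirt.h>

static int
waitForJobGone(virDomainPtr dom, const char *disk)
{
    virDomainBlockJobInfo info;

    for (;;) {
        int ret = virDomainGetBlockJobInfo(dom, disk, &info, 0);
        if (ret < 0)
            return -1;          /* query failed */
        if (ret == 0)
            return 0;           /* no job left: cancellation finished */
        usleep(500 * 1000);     /* job still present: sleep and retry */
    }
}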

On 12/07/2011 03:35 PM, Adam Litke wrote:
Stefan's qemu tree has a block_job_cancel command that always acts asynchronously. In order to provide the synchronous behavior in libvirt (when flags is 0), I need to wait for the block job to go away. I see three options:
1) Use the event: To implement this I would create an internal event callback function and register it to receive the block job events. When the cancel event comes in, I can exit the qemu job context. One problem I see with this is that events are only available when using the json monitor mode.
I like this idea. We have internally handled events before, and limited it to just JSON if that made life easier: for example, virDomainReboot on qemu is rejected if you only have the HMP monitor, and if you have the JSON monitor, the implementation internally handles the event to change the domain state.

Can we reliably detect whether qemu is new enough to provide the event, and if qemu was older and did not provide the event, do we reliably know that abort was blocking in that case?

It's okay to make things work that used to fail, but it is a regression to make blocking job cancel fail where it used to work, so rejecting blocking job cancel with HMP monitor is not a good idea. If we can guarantee that all qemu new enough to have async cancel also support the event, while older qemu was always blocking, and that we can always use the JSON monitor to newer qemu, then we're set - merely ensure that we use only the JSON monitor and the event. But if we can't make the guarantees, and insist on supporting newer qemu via HMP monitor, then we may need a hybrid combination of options 1 and 2, or for less code maintenance, just option 2.
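
For comparison with polling, the internal event wait in option 1 conceptually reduces to a condition variable signalled from the monitor's event dispatcher. The sketch below uses hypothetical names throughout; none of these are actual libvirt internals:

/* Conceptual sketch of option 1 inside the driver.  struct jobState and
 * both functions are hypothetical, not actual libvirt internals. */
#include <pthread.h>
#include <stdbool.h>

struct jobState {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            gone;   /* set when BLOCK_JOB_CANCELLED arrives */
};

/* Called from the JSON monitor's event dispatcher. */
static void
onBlockJobCancelled(struct jobState *js)
{
    pthread_mutex_lock(&js->lock);
    js->gone = true;
    pthread_cond_signal(&js->cond);
    pthread_mutex_unlock(&js->lock);
}

/* Called from the abort path after issuing block_job_cancel. */
static void
waitForCancelled(struct jobState *js)
{
    pthread_mutex_lock(&js->lock);
    while (!js->gone)
        pthread_cond_wait(&js->cond, &js->lock);
    pthread_mutex_unlock(&js->lock);
}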
2) Poll the qemu monitor: To do it this way, I would write a function that repeatedly calls virDomainGetBlockJobInfo() against the disk in question. Once the job disappears from the list I can return with confidence that the job is gone. This is obviously sub-optimal because I need to poll and sleep.
We've done this before, for both HMP and JSON - see qemuMigrationWaitForCompletion. I agree that an event is nicer than polling, but we may be locked into this.
3) Ask Stefan to provide a synchronous mode for the qemu monitor command: This one is the nicest from a libvirt perspective, but I doubt qemu wants to add such an interface given that it basically has broken semantics as far as qemu is concerned.
Or even:

4) Ask Stefan to make the HMP monitor command synchronous, but only expose the JSON command as asynchronous. After all, it is only HMP where we can't wait for an event.
If this is all too nasty, we could probably just change the behavior of blockJobAbort and make it always synchronous with a 'cancelled' event.
No, we can't change the behavior without breaking backwards compatibility for existing clients of job cancellation.

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Wed, Dec 7, 2011 at 11:01 PM, Eric Blake <eblake@redhat.com> wrote:
4) Ask Stefan to make the HMP monitor command synchronous, but only expose the JSON command as asynchronous. After all, it is only HMP where we can't wait for an event.
QEMU cannot do async commands, even for HMP. My QEMU patches used to work because there is a broken async flag for monitor commands. I didn't know it was broken at the time O:-). I have CCed Luiz who has been working on moving commands off the old MONITOR_CMD_ASYNC flag. My current understanding is that patches adding use of MONITOR_CMD_ASYNC will not be accepted.
2) Poll the qemu monitor: To do it this way, I would write a function that repeatedly calls virDomainGetBlockJobInfo() against the disk in question. Once the job disappears from the list I can return with confidence that the job is gone. This is obviously sub-optimal because I need to poll and sleep.
We've done this before, for both HMP and JSON - see qemuMigrationWaitForCompletion. I agree that an event is nicer than polling, but we may be locked into this.
This seems like the safest option, although it's ugly.

Stefan

On Wed, Dec 07, 2011 at 04:01:58PM -0700, Eric Blake wrote:
On 12/07/2011 03:35 PM, Adam Litke wrote:
Stefan's qemu tree has a block_job_cancel command that always acts asynchronously. In order to provide the synchronous behavior in libvirt (when flags is 0), I need to wait for the block job to go away. I see three options:
1) Use the event: To implement this I would create an internal event callback function and register it to receive the block job events. When the cancel event comes in, I can exit the qemu job context. One problem I see with this is that events are only available when using the json monitor mode.
I like this idea. We have internally handled events before, and limited it to just JSON if that made life easier: for example, virDomainReboot on qemu is rejected if you only have the HMP monitor, and if you have the JSON monitor, the implementation internally handles the event to change the domain state.
Can we reliably detect whether qemu is new enough to provide the event, and if qemu was older and did not provide the event, do we reliably know that abort was blocking in that case?
I think we can say that qemu will operate in one of two modes:

a) Blocking abort AND event is not emitted
b) Non-blocking abort AND event is emitted

The difficulty is in detecting which case the current qemu supports. I don't believe there is a way to query qemu for a list of currently-supported events. Therefore, we'd have to use version numbers. If we do this, how do we avoid breaking users of qemu git versions that fall between official qemu releases?
It's okay to make things work that used to fail, but it is a regression to make blocking job cancel fail where it used to work, so rejecting blocking job cancel with HMP monitor is not a good idea. If we can guarantee that all qemu new enough to have async cancel also support the event, while older qemu was always blocking, and that we can always use the JSON monitor to newer qemu, then we're set - merely ensure that we use only the JSON monitor and the event. But if we can't make the guarantees, and insist on supporting newer qemu via HMP monitor, then we may need a hybrid combination of options 1 and 2, or for less code maintenance, just option 2.
Is there a deprecation plan for HMP with newer qemu versions? I really hate the idea of needing two implementations for this: one polling and one event-based.
2) Poll the qemu monitor: To do it this way, I would write a function that repeatedly calls virDomainGetBlockJobInfo() against the disk in question. Once the job disappears from the list I can return with confidence that the job is gone. This is obviously sub-optimal because I need to poll and sleep.
We've done this before, for both HMP and JSON - see qemuMigrationWaitForCompletion. I agree that an event is nicer than polling, but we may be locked into this.
3) Ask Stefan to provide a synchronous mode for the qemu monitor command: This one is the nicest from a libvirt perspective, but I doubt qemu wants to add such an interface given that it basically has broken semantics as far as qemu is concerned.
Or even:
4) Ask Stefan to make the HMP monitor command synchronous, but only expose the JSON command as asynchronous. After all, it is only HMP where we can't wait for an event.
Stefan, how 'bout it?
If this is all too nasty, we could probably just change the behavior of blockJobAbort and make it always synchronous with a 'cancelled' event.
No, we can't change the behavior without breaking back-compat of existing clients of job cancellation.
--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center

On Thu, Dec 8, 2011 at 2:55 PM, Adam Litke <agl@us.ibm.com> wrote:
On Wed, Dec 07, 2011 at 04:01:58PM -0700, Eric Blake wrote:
On 12/07/2011 03:35 PM, Adam Litke wrote:
Stefan's qemu tree has a block_job_cancel command that always acts asynchronously. In order to provide the synchronous behavior in libvirt (when flags is 0), I need to wait for the block job to go away. I see three options:
1) Use the event: To implement this I would create an internal event callback function and register it to receive the block job events. When the cancel event comes in, I can exit the qemu job context. One problem I see with this is that events are only available when using the json monitor mode.
I like this idea. We have internally handled events before, and limited it to just JSON if that made life easier: for example, virDomainReboot on qemu is rejected if you only have the HMP monitor, and if you have the JSON monitor, the implementation internally handles the event to change the domain state.
Can we reliably detect whether qemu is new enough to provide the event, and if qemu was older and did not provide the event, do we reliably know that abort was blocking in that case?
I think we can say that qemu will operate in one of two modes:

a) Blocking abort AND event is not emitted
b) Non-blocking abort AND event is emitted
The difficulty is in detecting which case the current qemu supports. I don't believe there is a way to query qemu for a list of currently-supported events. Therefore, we'd have to use version numbers. If we do this, how do we avoid breaking users of qemu git versions that fall between official qemu releases?
I agree. Checking version numbers is always problematic - distros may backport features.

This isn't pretty, but how about:

1. Issue block_job_cancel.
2. Issue query-block-jobs and check if the job is there.
3. Check for QMP events (if applicable).
4. If the block job was visible then we must be async and need to expect an event if we didn't already get one.
5. If the block job was not visible then it has been stopped.

Stefan
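
Roughly, as a driver-side sketch (the monitor wrapper names below are hypothetical stand-ins; the thread does not settle on an internal API):

/* Sketch of the probe sequence above.  The mon_* helpers and
 * wait_for_cancelled_event() are hypothetical stand-ins for whatever
 * monitor wrappers the driver actually uses. */
extern int mon_block_job_cancel(const char *device);     /* step 1 */
extern int mon_job_visible(const char *device);          /* step 2: 1 = listed, 0 = gone */
extern int wait_for_cancelled_event(const char *device); /* step 3 */

static int
cancelAndDetect(const char *device)
{
    if (mon_block_job_cancel(device) < 0)       /* 1. issue the cancel */
        return -1;

    if (mon_job_visible(device)) {
        /* 4. Job still listed: this qemu is async, so expect a
         * BLOCK_JOB_CANCELLED event if we didn't already get one. */
        return wait_for_cancelled_event(device);
    }

    /* 5. Job not listed: it has already been stopped. */
    return 0;
}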