[libvirt] virDomainBlockJobAbort and block_job_cancel

23 Nov 2011

Block job cancellation waits until the job has been cancelled before
returning.  This allows clients to know that the operation has been
cancelled successfully.  Unfortunately, these semantics are not really
possible with today's QEMU and libvirt code.

A command that waits for block I/O completion may wait for many
minutes.  During this time the monitor is unavailable.  While the QMP
protocol may in theory support multiple in-flight commands, both QEMU's
and libvirt's implementations are geared towards one command at a time.
So in practice a hung cancellation command would make the monitor
unavailable, and we need to avoid this.

This means block_job_cancel cannot wait until the job is cancelled, or
it risks hanging the monitor if there is a block I/O timeout.  We need
a solution that reflects this in QEMU and libvirt.  Here is what I
propose:

block_job_cancel returns immediately upon marking the job cancelled.
The job may still be finishing block I/O but will cancel itself at
some point in the future.  When the job actually completes it raises
the new BLOCK_JOB_CANCELLED event.
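
To make the proposed ordering concrete, here is a rough sketch in
Python.  It is a toy model, not QEMU or libvirt code: ToyBlockJob, its
methods, and the device name "virtio0" are all invented for
illustration.  The only point is that cancel() sets a flag and returns,
while the event is raised later by the job itself.

```python
import threading
import time


class ToyBlockJob:
    """Toy stand-in for a block job; hypothetical, not real QEMU code."""

    def __init__(self, device, on_event):
        self.device = device
        self.on_event = on_event            # callback receiving the event name
        self.cancelled = threading.Event()
        self.thread = threading.Thread(target=self._run)

    def start(self):
        self.thread.start()

    def _run(self):
        # Each iteration stands in for one chunk of block I/O in flight.
        while not self.cancelled.is_set():
            time.sleep(0.01)
        # The job notices the flag only after its current I/O finishes,
        # and only then announces that it has gone away.
        self.on_event("BLOCK_JOB_CANCELLED")

    def cancel(self):
        # Mark the job cancelled and return at once; never wait for I/O.
        self.cancelled.set()


events = []
job = ToyBlockJob("virtio0", events.append)
job.start()
job.cancel()         # returns immediately; the job may still be running
job.thread.join()    # only this demo waits; a monitor command would not
print(events)        # ['BLOCK_JOB_CANCELLED']
```

In the real proposal the wait at the end would not exist in the monitor
path; a client that cares would learn about completion from the event.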

This means that virDomainBlockJobAbort() returns to the client without
a guarantee that the job has completed.  If the client enumerates jobs
it may still see a job that has not finished cancelling.  The client
must register a handler for the BLOCK_JOB_CANCELLED event if it wants
to know when the job really goes away.  The BLOCK_JOB_CANCELLED event
has the same fields as the BLOCK_JOB_COMPLETED event, except it lacks
the optional "error" message field.
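
For illustration, the two payloads might compare as follows, written
here as Python dicts.  The field names (type, device, len, offset,
speed, error) are an assumption based on how BLOCK_JOB_COMPLETED is
described in QEMU's QMP event documentation, and every value is made
up.

```python
# Assumed field names; all values below are invented for illustration.
completed_event = {
    "event": "BLOCK_JOB_COMPLETED",
    "data": {
        "type": "stream",
        "device": "virtio0",
        "len": 10737418240,
        "offset": 10737418240,
        "speed": 0,
        "error": "Input/output error",   # optional; present only on failure
    },
}

cancelled_event = {
    "event": "BLOCK_JOB_CANCELLED",
    "data": {
        "type": "stream",
        "device": "virtio0",
        "len": 10737418240,
        "offset": 134217728,             # job stopped partway through
        "speed": 0,
        # no "error" field: cancellation is not reported as a failure
    },
}

# The only difference in shape is the optional "error" field.
extra_fields = set(completed_event["data"]) - set(cancelled_event["data"])
print(extra_fields)   # {'error'}
```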

The impact on clients is that they need to add a BLOCK_JOB_CANCELLED
handler if they really want to wait.  Most clients today (not many
exist) will be fine without waiting for cancellation.
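
A client that does want to wait could do something along these lines.
This is only a sketch: JobWatcher and its methods are hypothetical and
not part of the libvirt API, and the timer merely simulates the event
arriving from the hypervisor.  Treating BLOCK_JOB_COMPLETED as "gone"
too is my assumption, to cover a job that finishes on its own before
the cancel takes effect.

```python
import threading


class JobWatcher:
    """Hypothetical client-side helper; not part of the libvirt API."""

    def __init__(self, device):
        self.device = device
        self._gone = threading.Event()

    def handle_event(self, name, data):
        # Called by the client's event loop for each block job event.
        if data.get("device") != self.device:
            return                       # event for some other device
        if name in ("BLOCK_JOB_CANCELLED", "BLOCK_JOB_COMPLETED"):
            self._gone.set()

    def wait_until_gone(self, timeout):
        # True if the job went away in time, False on timeout.
        return self._gone.wait(timeout)


watcher = JobWatcher("virtio0")
# A real client would call virDomainBlockJobAbort() here; the call
# returns without waiting, so the handler must already be registered.
timer = threading.Timer(
    0.05,
    watcher.handle_event,
    args=("BLOCK_JOB_CANCELLED", {"device": "virtio0"}),
)
timer.start()                            # simulates the event arriving later
went_away = watcher.wait_until_gone(timeout=5.0)
print(went_away)                         # True
```

Registering the handler before requesting the abort avoids missing an
event that arrives between the two steps.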

Any objections or thoughts on this?

Stefan

Stefan Hajnoczi