
On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote:
On 07.11.2012 23:23, Jiri Denemark wrote:
AbortJob API was never a synchronous one and I don't think we need to change it. Not to mention that this code unlocks and locks vm without holding an extra reference to it. And even if it did so, it cannot guarantee that the job it's checking in the loop is the same one it tried to cancel. It's quite unlikely but the original job could have finished and another one could have been started while this code was sleeping.
However, AbortJob is documented as synchronous. If current implementation doesn't follow docs it is a bug. On the other hand, I don't recall anybody screaming about it so far. But that means nothing, right?
IIRC the implementation was never synchronous in which case I think we may just fix the documentation to match reality :-)
@@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct qemud_driver *driver, } priv->job.info.timeElapsed -= priv->job.start;
+ if (qemuDomainObjAbortAsyncJobRequested(vm)) { + VIR_DEBUG("Migration abort requested. Translating " + "status to MIGRATION_STATUS_CANCELLED"); + status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED; + } +
This check seems to be to late and that complicates the code later. If we keep qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob, we may count on that and check priv.job.asyncAbort before entering the monitor to tell QEMU to start migrating in qemuMigrationRun. If the abort is not requested by then, it may only happen after we call qemuMonitorMigrateTo* and in that case the migration will be properly aborted by qemuMonitorMigrateCancel.
Not really. The issue I've seen was like this:
Thread A Thread B 1. start async migration out job 2. Since job is set, abort it by calling 'migrate_cancel' 3. return to user 4. issue 'migrate' on the monitor
And I think your suggestion makes it just harder to hit this race and not really avoid it. However with my code, we are guaranteed that 'migrate_cancel' will have an effect.
Oh right, we actually need to check priv.job.asyncAbort after entering the monitor (since we may be preempted while waiting for entering the monitor) but still before calling qemuMonitorMigrateTo* that actually starts the monitor. When we entered the monitor, the AbortJob must have already been there or it will come after we leave the monitor. Jirka