On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote:
On 07.11.2012 23:23, Jiri Denemark wrote:
>
> AbortJob API was never a synchronous one and I don't think we need to change
> it. Not to mention that this code unlocks and locks vm without holding an
> extra reference to it. And even if it did so, it cannot guarantee that the job
> it's checking in the loop is the same one it tried to cancel. It's quite
> unlikely but the original job could have finished and another one could have
> been started while this code was sleeping.
However, AbortJob is documented as synchronous. If current implementation doesn't
follow docs it is a bug. On the other hand, I don't recall anybody screaming about
it so far. But that means nothing, right?
IIRC the implementation was never synchronous in which case I think we may
just fix the documentation to match reality :-)
>> @@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct
qemud_driver *driver,
>> }
>> priv->job.info.timeElapsed -= priv->job.start;
>>
>> + if (qemuDomainObjAbortAsyncJobRequested(vm)) {
>> + VIR_DEBUG("Migration abort requested. Translating "
>> + "status to MIGRATION_STATUS_CANCELLED");
>> + status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED;
>> + }
>> +
>
> This check seems to be to late and that complicates the code later. If we keep
> qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob, we
> may count on that and check priv.job.asyncAbort before entering the monitor
> to tell QEMU to start migrating in qemuMigrationRun. If the abort is not
> requested by then, it may only happen after we call qemuMonitorMigrateTo* and
> in that case the migration will be properly aborted by
> qemuMonitorMigrateCancel.
>
Not really. The issue I've seen was like this:
Thread A Thread B
1. start async migration out job
2. Since job is set, abort it by calling 'migrate_cancel'
3. return to user
4. issue 'migrate' on the monitor
And I think your suggestion makes it just harder to hit this race and not really avoid
it.
However with my code, we are guaranteed that 'migrate_cancel' will have an
effect.
Oh right, we actually need to check priv.job.asyncAbort after entering the
monitor (since we may be preempted while waiting for entering the monitor) but
still before calling qemuMonitorMigrateTo* that actually starts the monitor.
When we entered the monitor, the AbortJob must have already been there or it
will come after we leave the monitor.
Jirka