On 05/16/2011 07:30 PM, Eric Blake wrote:
+ while (priv->jobSignals& QEMU_JOB_SIGNAL_BLKSTAT)
+ ignore_value(virCondWait(&priv->signalCond,&vm->lock));
+
+ priv->jobSignalsData.statDevName = disk->info.alias;
+ priv->jobSignalsData.blockStat = stats;
+ priv->jobSignalsData.statRetCode = -1;
+ priv->jobSignals |= QEMU_JOB_SIGNAL_BLKSTAT;
+
+ while (priv->jobSignals& QEMU_JOB_SIGNAL_BLKSTAT)
+ ignore_value(virCondWait(&priv->signalCond,&vm->lock));
I'm not sure that the original problem with jobSignals is fixed (the
one that you solved by splitting the retcode field). A similar race
can happen with two threads, both requesting blkstat.
Thread 1 starts a migration.
Thread 2 queries BlockStats, waits for the condition variable and
priv->jobSignals to be 0.
Thread 2 sets priv->jobSignals to non-zero, then waits for condition
variable and priv->jobSignals to clear BLKSTATS.
Thread 1 processes thread 2's query, and clears priv->jobSignals
Thread 3 sees priv->jobSignals == 0 and queries BlockStats.
Thread 1 processes thread 3's query, and clears priv->jobSignals
Thread 2 gets condition lock, but sees the answer intended for
thread 3.
At this point I'm not even sure if there is a deadlock or what.
I think the code should use two condition variables. signalCond is
broadcast when you _can start_ a job, resultCond is broadcast when a
job has finished.
while ((priv->jobSignals | priv->jobResults) & QEMU_JOB_SIGNAL_BLKSTAT)
ignore_value(virCondWait(&priv->signalCond, &vm->lock));
priv->jobSignalsData.statDevName = disk->info.alias;
priv->jobSignalsData.blockStat = stats;
priv->jobSignalsData.statRetCode = -1;
priv->jobSignals |= QEMU_JOB_SIGNAL_BLKSTAT;
while ((priv->jobResults & QEMU_JOB_SIGNAL_BLKSTAT) == 0)
ignore_value(virCondWait(&priv->resultCond, &vm->lock));
ret = priv->jobSignalsData.statRetCode;
priv->jobResults ^= QEMU_JOB_SIGNAL_BLKSTAT;
virCondBroadcast(&priv->signalCond);
...
priv->jobSignalsData.statRetCode = ret;
priv->jobResults |= QEMU_JOB_SIGNAL_BLKSTAT;
priv->jobSignals ^= QEMU_JOB_SIGNAL_BLKSTAT;
virCondBroadcast(&priv->resultCond);
...
while (priv->jobSignals & ~priv->jobResults) {
if (qemuMigrationProcessJobSignals(driver, vm, job, false) < 0)
goto cleanup;
}
...
cleanup:
while (priv->jobSignals & ~priv->jobResults) {
qemuMigrationProcessJobSignals(driver, vm, job, false);
}