On Mon, Jul 19, 2021 at 09:37:01 +0200, Michal Prívozník wrote:
On 7/16/21 5:06 PM, Jiri Denemark wrote:
> Signaling the condition before vm->def->id is reset to -1 is dangerous:
> in case a waiting thread wakes up, it does not see anything interesting
> (the domain is still marked as running) and just enters virDomainObjWait
> where it waits forever because the condition will never be signalled
> again.
>
> Originally it was impossible to get into such situation because the vm
> object was locked all the time between signaling the condition and
> resetting vm->def->id, but after commit 860a999802 released in 6.8.0,
> qemuDomainObjStopWorker called in qemuProcessStop between
> virDomainObjBroadcast and setting vm->def->id to -1 unlocks the vm
> object giving other threads a chance to wake up and possibly hang.
>
> In real world, this can be easily reproduced by killing, destroying, or
> just shutting down (from the guest OS) a domain while it is being
> migrated somewhere else. The migration job would never finish.
>
> We can't fix this by reseting vm->def->id earlier because other
> functions (such as qemuProcessKill) do nothing when the domain is
> already marked as inactive. So let's make sure we delay signaling the
Not true really. The VIR_QEMU_PROCESS_KILL_NOCHECK flag is passed to
qemuProcessKill which means the check for domain state (aka vm->def->id
== -1) is skipped.
Heh, I completely missed this flag when looking at the code :-/
I dropped this part of the commit message.
Jirka