On Mon, Jul 19, 2021 at 09:37:01 +0200, Michal Prívozník wrote:
On 7/16/21 5:06 PM, Jiri Denemark wrote:
Signaling the condition before vm->def->id is reset to -1 is dangerous: in case a waiting thread wakes up, it does not see anything interesting (the domain is still marked as running) and just enters virDomainObjWait where it waits forever because the condition will never be signalled again.
Originally it was impossible to get into such situation because the vm object was locked all the time between signaling the condition and resetting vm->def->id, but after commit 860a999802 released in 6.8.0, qemuDomainObjStopWorker called in qemuProcessStop between virDomainObjBroadcast and setting vm->def->id to -1 unlocks the vm object giving other threads a chance to wake up and possibly hang.
In real world, this can be easily reproduced by killing, destroying, or just shutting down (from the guest OS) a domain while it is being migrated somewhere else. The migration job would never finish.
We can't fix this by reseting vm->def->id earlier because other functions (such as qemuProcessKill) do nothing when the domain is already marked as inactive. So let's make sure we delay signaling the
Not true really. The VIR_QEMU_PROCESS_KILL_NOCHECK flag is passed to qemuProcessKill which means the check for domain state (aka vm->def->id == -1) is skipped.
Heh, I completely missed this flag when looking at the code :-/ I dropped this part of the commit message. Jirka