I've been investigating a lockd lock-ordering bug in a migration error-handling
path in the libxl driver. In the perform phase, the source calls
virDomainLockProcessPause to release the lock before sending the VM to the
destination. In the case I'm looking at, the send fails for an unrelated reason,
and the source attempts to reacquire the lock with virDomainLockProcessResume.
That attempt fails because the destination has not yet finished cleaning up the
failed VM and releasing the lock it acquired when it started receiving the VM.
My immediate reaction was "why not reacquire the lock in the confirm phase?",
but then I saw my older comment a few lines later in the perform phase code:

    /*
     * Confirm phase will not be executed if perform fails. End the
     * job started in begin phase.
     */

Is that just a bug in the implementation, or is it intended to skip the confirm
phase if perform fails?
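
To make the ordering problem concrete, here is a rough toy model of the
sequence described above. It is only a sketch, not libxl driver code: the
single-owner lock, the enum, and every function name in it are hypothetical
stand-ins I made up for illustration, with src_pause() and src_resume() playing
the roles of virDomainLockProcessPause and virDomainLockProcessResume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock daemon: one lock, at most one owner. */
enum lock_owner { OWNER_NONE, OWNER_SRC, OWNER_DST };
static enum lock_owner owner = OWNER_SRC;

/* Plays the role of virDomainLockProcessPause on the source. */
static void src_pause(void)
{
    assert(owner == OWNER_SRC);
    owner = OWNER_NONE;
}

/* Plays the role of virDomainLockProcessResume on the source.
 * Fails if anyone else still holds the lock. */
static bool src_resume(void)
{
    if (owner != OWNER_NONE)
        return false;
    owner = OWNER_SRC;
    return true;
}

/* Destination takes the lock when it starts receiving the VM. */
static void dst_start_receive(void)
{
    if (owner == OWNER_NONE)
        owner = OWNER_DST;
}

/* Destination tears down the failed VM and drops the lock. */
static void dst_cleanup(void)
{
    if (owner == OWNER_DST)
        owner = OWNER_NONE;
}

/* Model of a perform phase whose send fails. Returns true if the source
 * managed to reacquire the lock. The flag controls whether the destination
 * finishes its cleanup before or after the source tries to resume. */
bool perform_with_failed_send(bool dst_cleans_up_first)
{
    owner = OWNER_SRC;
    src_pause();
    dst_start_receive();

    /* ... the send fails here for a reason unrelated to locking ... */

    if (dst_cleans_up_first)
        dst_cleanup();

    bool reacquired = src_resume();

    if (!dst_cleans_up_first)
        dst_cleanup();

    return reacquired;
}
```

In this model, perform_with_failed_send(false) corresponds to the failure I'm
seeing: the source tries to resume while the destination still owns the lock.
Reacquiring only after the destination has cleaned up, which is what moving the
reacquisition to the confirm phase would presumably give us, corresponds to
perform_with_failed_send(true).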

Regards,
Jim