From: Jiri Denemark <jdenemar@redhat.com>

When a domain is in post-copy migration phase, we need to keep the job
active if something fails to protect the domain from changes.
Unfortunately, there is a race between migration code and
qemuProcessStop that can cause the job to stay active even when the
domain is gone, thus preventing the domain from being started again
(until virtqemud is restarted).

The race is caused by unlocking the vm object when calling
virConnectUnregisterCloseCallback. While the domain is unlocked,
qemuProcessStop can finish its work and the domain may no longer be
active when we get the lock back. The post-copy path does not properly
check if a domain is still active.

Instead of adding the virDomainObjIsActive check in all places where
this could happen, we can add it in virDomainObjIsPostcopy and
virDomainObjIsFailedPostcopy and let the code take the pre-copy cleanup
path. Clearly an inactive domain can never be in (failed) post-copy
migration.

https://issues.redhat.com/browse/RHEL-145179

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/conf/domain_conf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 998b333c74..3528b90742 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -30723,6 +30723,9 @@ bool
 virDomainObjIsFailedPostcopy(virDomainObj *dom,
                              virDomainJobObj *job)
 {
+    if (!virDomainObjIsActive(dom))
+        return false;
+
     if (job && job->asyncPaused &&
         (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN ||
          job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT))
@@ -30739,6 +30742,9 @@ bool
 virDomainObjIsPostcopy(virDomainObj *dom,
                        virDomainJobObj *job)
 {
+    if (!virDomainObjIsActive(dom))
+        return false;
+
     if (virDomainObjIsFailedPostcopy(dom, job))
         return true;
 
-- 
2.53.0