On Fri, Mar 05, 2021 at 09:19:52AM +0800, Hogan Wang wrote:
>From: Zhuang Shengen <zhuangshengen(a)huawei.com>
>
>When a VM is in the confirm phase of migration and is started
>concurrently, the VM ends up outside of libvirtd's control.
>
>Cause Analysis:
>1. thread1 migrates the VM out.
>2. thread2 starts the migrating VM.
>3. thread1 removes the VM from the domain list after the migration succeeds.
>4. thread2 acquires the VM job successfully and starts the VM.
>5. The VM can no longer be found with the 'virsh list' command, because
> the started VM does not exist in the domain list.
>
>Solution:
>Check the vm->removing state before starting the VM.
>
Well, this would only fix starting it, but there could be other ways
that a domain can be started, right? Like restoring it from a save.
Anyway, I think the issue here is that CreateWithFlags is even
able to get a job started; I think you should look into that.
Yes, it is: letting a removed VM begin a job may cause unanticipated
results. Therefore, I think qemuDomainObjBeginJobInternal should return
an error when a removed VM successfully acquires a job. I will push a
new patch to fix it.
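
To illustrate the idea, here is a minimal, self-contained sketch of
re-checking the removing flag right after the job is acquired. It is not
the libvirt implementation: demo_vm, demo_begin_job and demo_end_job are
hypothetical names, and the real job code would wait for a busy job
instead of failing immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the domain object; only the two fields
 * relevant to this discussion are modelled. */
typedef struct {
    bool removing;   /* set once the domain is being dropped from the list */
    bool job_active; /* true while some thread owns the job */
} demo_vm;

/* Release the job; the caller must currently own it. */
static void demo_end_job(demo_vm *vm)
{
    assert(vm->job_active);
    vm->job_active = false;
}

/* Try to acquire the job. Returns 0 on success, -1 if the job is busy
 * or the domain has been marked as removing. */
static int demo_begin_job(demo_vm *vm)
{
    if (vm->job_active)
        return -1; /* simplification: real code would wait here */
    vm->job_active = true;

    /* The domain may have been removed while we were waiting for the
     * job, so check again now that we own it and back out if so. */
    if (vm->removing) {
        demo_end_job(vm);
        fprintf(stderr, "domain is already being removed\n");
        return -1;
    }
    return 0;
}
```

With the check placed in the common job-acquisition path, every API that
begins a job (start, restore, and so on) would be rejected for a removed
domain, rather than only CreateWithFlags.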
>
>Signed-off-by: Zhuang Shengen <zhuangshengen(a)huawei.com>
>Reviewed-by: Hogan Wang <hogan.wang(a)huawei.com>
>---
> src/qemu/qemu_driver.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>index d1a3659774..a5dfea94cb 100644
>--- a/src/qemu/qemu_driver.c
>+++ b/src/qemu/qemu_driver.c
>@@ -6637,6 +6637,12 @@ qemuDomainCreateWithFlags(virDomainPtr dom, unsigned int flags)
>         goto endjob;
>     }
>
>+    if (vm->removing) {
>+        virReportError(VIR_ERR_OPERATION_INVALID,
>+                       "%s", _("domain is already removing"));
>+        goto endjob;
>+    }
>+
>     if (qemuDomainObjStart(dom->conn, driver, vm, flags,
>                            QEMU_ASYNC_JOB_START) < 0)
>         goto endjob;
>--
>2.23.0
>
>