-----Original Message-----
From: Eric Blake [mailto:eblake@redhat.com]
Sent: Tuesday, March 25, 2014 12:01 AM
To: Paolo Bonzini; Gonglei (Arei); qemu-devel(a)nongnu.org
Cc: quintela(a)redhat.com; owasserm(a)redhat.com; Yanqiangjun; Zhaoyanbin
(A); Zengjunliang; libvir-list(a)redhat.com
Subject: Re: [PATCH] migration: Fix possible bug for migrate cancel
[adding libvirt]
On 03/24/2014 09:47 AM, Paolo Bonzini wrote:
> Il 24/03/2014 14:04, arei.gonglei(a)huawei.com ha scritto:
>> From: zengjunliang <zengjunliang(a)huawei.com>
>>
>> Return error for migrate cancel, when migration status is not
>> MIG_STATE_SETUP or MIG_STATE_ACTIVE. Thus, libvirt can
>> perceive that the operation fails.
>>
>> Signed-off-by: zengjunliang <zengjunliang(a)huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei(a)huawei.com>
>
> I think this is done on purpose, because canceling migration is racy.
> Instead, libvirt should do "query-migrate" and check if the migration
> was completed or canceled.
Can you please give more details on how you are triggering the problem
with libvirt? I think Paolo is probably right - the bug is more likely
to be in libvirt not expecting the race and not recovering correctly
when the race occurs, than it is to be in changing qemu's state algorithm.
When the migration progress reaches 100%, the migration status becomes
MIG_STATE_COMPLETED in QEMU.
There is a window between the status becoming MIG_STATE_COMPLETED and the
migration thread's resources being reclaimed.
If we cancel the migration during this window, the migrate_fd_cancel
function returns immediately without reporting an error code. libvirt then
considers the cancel operation a success, which is contrary to the facts.
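To make the scenario concrete, here is a rough Python sketch of the
client-side check Paolo suggests: issue migrate_cancel, then run
query-migrate and look at the reported status. It is only an illustration
under assumptions. The qmp callable and the FakeQemu class are hypothetical
stand-ins, not libvirt or QEMU code, and a real caller may need to poll
until the status settles.

```python
def cancel_and_verify(qmp):
    """Send migrate_cancel, then ask the monitor what actually happened.

    `qmp` is a hypothetical callable: qmp(command_name) -> dict holding
    the payload of the reply. It is an assumption made for this sketch.
    """
    # The cancel may return without error even if it arrived too late.
    qmp("migrate_cancel")
    status = qmp("query-migrate").get("status")
    if status == "cancelled":
        return "cancelled"
    if status == "completed":
        # The cancel lost the race: the migration had already finished.
        return "completed"
    return "unknown:%s" % status


class FakeQemu:
    """Toy model of the monitor behavior described above (illustration only)."""

    def __init__(self, status):
        self.status = status

    def __call__(self, command):
        if command == "migrate_cancel":
            # Only an in-flight migration is really cancelled; in any
            # other state the command returns silently with no error.
            if self.status in ("setup", "active"):
                self.status = "cancelled"
            return {}
        if command == "query-migrate":
            return {"status": self.status}
        raise ValueError("unsupported command: %s" % command)


if __name__ == "__main__":
    print(cancel_and_verify(FakeQemu("active")))
    print(cancel_and_verify(FakeQemu("completed")))
```

With this kind of check the caller learns from the second result that the
migration had in fact completed, even though migrate_cancel itself
reported no error.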
Best regards,
-Gonglei