On 11.02.2015 17:42, Jiri Denemark wrote:
On Wed, Feb 11, 2015 at 13:51:11 +0100, Michal Privoznik wrote:
> When migrating with storage, libvirt iterates over domain disks and
> instruct qemu to migrate the ones we are interested in (shared, RO and
> source-less disks are skipped). The disks are migrated in series. No
> new disk is transferred until the previous one hasn't been quiesced.
> This is checked on the qemu monitor via 'query-jobs' command. If the
> disk has been quiesced, it practically went from copying its content
> to mirroring state, where all disk writes are mirrored to the other
> side of migration too. Having said that, there's one inherent error in
> the design. The monitor command we use reports only active jobs. So if
> the job fails for whatever reason, we will not see it anymore in the
> command output. And this can happen fairly simply: just try to migrate
> a domain with storage. If the storage migration fails (e.g. due to
> ENOSPC on the destination) we resume the host on the destination and
> let it run on partly copied disk.
>
> The proper fix is what even the comment in the code says: listen for
> qemu events instead of polling. If storage migration changes state an
> event is emitted and we can act accordingly: either consider disk
> copied and continue the process, or consider disk mangled and abort
> the migration.
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> src/qemu/qemu_migration.c | 37 +++++++++++++++++--------------------
> 1 file changed, 17 insertions(+), 20 deletions(-)
>
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index 4506d87..14a4ec6 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -1739,7 +1739,6 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
...
> - /* XXX Frankly speaking, we should listen to the events,
> - * instead of doing this. But this works for now and we
> - * are doing something similar in migration itself anyway */
> + /* XXX Turn this into virCond someday. */
Exactly, this is what I wanted to suggest when seeing this code. The
code should be even simpler with virCond so why not doing the change
right away? I think this could be done as a separate patch (in this
series) to make reviewing the code easier.
Exactly. It has not direct influence on this patcheset. The virCond
patch should be easy and small though, so after this is merged, I can
propose one. But I'd really love to fix the bug asap.
Michal