On Wed, Feb 11, 2015 at 13:51:11 +0100, Michal Privoznik wrote:
When migrating with storage, libvirt iterates over domain disks and
instruct qemu to migrate the ones we are interested in (shared, RO and
source-less disks are skipped). The disks are migrated in series. No
new disk is transferred until the previous one hasn't been quiesced.
Quiesce in the libvirt terminology is used in the meaning that the
filesystem writes were frozen (see snapshot creation).
In this context it's used in the meaning that the qemu block-copy job
entered the synchronized phase, but writes are still possible.
Please tweak the wording to avoid confusion.
This is checked on the qemu monitor via 'query-jobs' command.
If the
disk has been quiesced, it practically went from copying its content
to mirroring state, where all disk writes are mirrored to the other
side of migration too. Having said that, there's one inherent error in
the design. The monitor command we use reports only active jobs. So if
the job fails for whatever reason, we will not see it anymore in the
command output. And this can happen fairly simply: just try to migrate
a domain with storage. If the storage migration fails (e.g. due to
ENOSPC on the destination) we resume the host on the destination and
let it run on partly copied disk.
The proper fix is what even the comment in the code says: listen for
qemu events instead of polling. If storage migration changes state an
event is emitted and we can act accordingly: either consider disk
copied and continue the process, or consider disk mangled and abort
the migration.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_migration.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)
Peter