On 10.12.2012 20:27, Michal Privoznik wrote:
This patch set re-implements migration with storage for sufficiently new
qemu.
Currently, you can migrate a domain to a host without the need for shared
storage. This is done by setting the 'blk' or 'inc' attribute (representing the
VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively)
of the 'migrate' monitor command. However, the qemu implementation is
buggy and applications are advised to switch to the new implementation
which, moreover, offers some nice features, like migrating only explicitly
specified disks.
The new functionality is controlled via the 'nbd-server-*' and 'drive-mirror'
commands. The flow is meant to look like this:
1) User invokes libvirt's migrate functionality.
2) libvirt checks that no block jobs are active on the source.
3) libvirt starts the destination QEMU and sets up the NBD server using the
nbd-server-start and nbd-server-add commands.
4) libvirt starts drive-mirror with a destination pointing to the remote NBD
server, for example nbd:host:port:exportname=diskname (where diskname is the
-drive id specified on the destination).
5) Once all mirroring jobs reach steady state, libvirt invokes the migrate
command.
6) Once migration has completed, libvirt invokes the nbd-server-stop command on
the destination QEMU.
If we just skip the 2nd step and there is an active block-job, qemu will fail in
step 4. No big deal.
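
The flow above can be sketched as the monitor commands each side would
receive. This is only an illustration, not code from the patches: the helper
names are hypothetical, and the argument layout ('addr', 'device', 'writable',
'target', 'sync', 'mode') is assumed from QEMU's QMP documentation rather than
taken from this series.

```python
import json


def nbd_target(host, port, diskname):
    """Build the drive-mirror destination in the form quoted in step 4."""
    return "nbd:%s:%d:exportname=%s" % (host, port, diskname)


def destination_commands(host, port, disks):
    """Step 3: start the NBD server, then export every disk as writable."""
    cmds = [{
        "execute": "nbd-server-start",
        # Assumed argument layout; the port is passed as a string here.
        "arguments": {"addr": {"type": "inet",
                               "data": {"host": host, "port": str(port)}}},
    }]
    for disk in disks:
        cmds.append({"execute": "nbd-server-add",
                     "arguments": {"device": disk, "writable": True}})
    return cmds


def source_commands(host, port, disks):
    """Step 4: mirror every disk into the matching export on the destination."""
    return [{
        "execute": "drive-mirror",
        "arguments": {
            "device": disk,
            "target": nbd_target(host, port, disk),
            "sync": "full",      # assumption: full copy, as with 'blk'
            "mode": "existing",  # assumption: the target already exists
        },
    } for disk in disks]


if __name__ == "__main__":
    # Hypothetical host, port and drive id, used only for illustration.
    disks = ["drive-virtio-disk0"]
    for cmd in (destination_commands("dst.example.com", 49153, disks)
                + source_commands("dst.example.com", 49153, disks)):
        print(json.dumps(cmd))
```

Steps 5 and 6 (the 'migrate' command and 'nbd-server-stop') would follow once
the mirroring jobs report steady state; they are left out to keep the sketch
short.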
Since we try to NOT break migration and keep things compatible, this feature is
enabled iff both sides support it. Since there's an obvious need for some data
transfer between src and dst, I've put it into qemuCookieMigration:
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE)
<nbd>
<disk size='17179869184'/>
</nbd>
Hey destination, I know how to use this cool new feature. Moreover,
these are the disks I'll send you. Each one of them is X bytes big.
One of the prerequisites is that the file (disk->src) on dst exists and is
at least as large as on src.
2) dst -> src: (QEMU_MIGRATION_PHASE_PREPARE -> QEMU_MIGRATION_PHASE_PERFORM3)
<nbd port='X'/>
Okay, I (destination) support this feature as well. I've created all
files as you (src) told me to and you can start rolling data. I am listening
on port X.
3) src -> dst: (QEMU_MIGRATION_PHASE_PERFORM3 -> QEMU_MIGRATION_PHASE_FINISH3)
<nbd port='-1'/>
Migration completed, destination, you may shut the NBD server down.
If either src or dst doesn't support NBD, it is not used and the whole process
falls back to the old implementation.
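
The cookie exchange above could be modelled roughly as follows. This is a
sketch under assumptions, not the parser from the patches: the function names
and the returned action strings are hypothetical, and only the '<nbd>' element
is modelled, optionally wrapped in a '<qemu-migration>' root.

```python
import xml.etree.ElementTree as ET


def source_cookie(disk_sizes):
    """Step 1: the source advertises NBD support and lists disk sizes in order."""
    nbd = ET.Element("nbd")
    for size in disk_sizes:
        ET.SubElement(nbd, "disk", size=str(size))
    return ET.tostring(nbd, encoding="unicode")


def _find_nbd(cookie_xml):
    """Return the <nbd> element of a cookie, or None if there is none."""
    if not cookie_xml:
        return None
    root = ET.fromstring(cookie_xml)
    return root if root.tag == "nbd" else root.find(".//nbd")


def parse_sizes(cookie_xml):
    """Destination side: read back the disk sizes sent in step 1."""
    nbd = _find_nbd(cookie_xml)
    if nbd is None:
        return []
    return [int(disk.get("size")) for disk in nbd.findall("disk")]


def parse_port(cookie_xml):
    """Read the port attribute used in steps 2 and 3, or None if absent."""
    nbd = _find_nbd(cookie_xml)
    if nbd is None or nbd.get("port") is None:
        return None
    return int(nbd.get("port"))


def next_action(port):
    """Interpret the port from a step 2 or step 3 cookie."""
    if port is None:
        return "fallback"      # peer did not offer NBD: use the old code path
    if port == -1:
        return "stop-server"   # step 3: migration done, shut the server down
    return "mirror"            # step 2: destination listens on this port
```

For example, a reply of `<nbd port='-1'/>` maps to "stop-server", while a
cookie with no '<nbd>' element at all maps to "fallback".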
diff to v1:
-Eric's and Daniel's suggestions worked in. To point out the bigger ones:
don't do NBD style when TUNNELLED requested, added 'b:writable' to
'nbd-server-add'
-drop '/qemu-migration/nbd/disk/@src' attribute from migration cookie.
As pointed out by Jirka, disk->src can be changed during migration (e.g. by
a migration hook or by the passed XML). So I've tried (as suggested on the list)
passing the disk alias. However, since qemu hasn't been started on the
destination yet, the aliases haven't been generated yet. So we have to rely on
ordering completely.
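
To illustrate what relying on ordering means for the size prerequisite: the
n-th size in the cookie is matched against the n-th disk on the destination.
The function below is a hypothetical sketch; how the patches actually report a
mismatch is not stated here, so raising an error is an assumption.

```python
def check_sizes(source_sizes, destination_sizes):
    """Pair disks purely by position and verify each destination is big enough.

    Both arguments are lists of sizes in bytes, in disk order.
    """
    if len(source_sizes) != len(destination_sizes):
        raise ValueError("disk count differs: %d on src, %d on dst"
                         % (len(source_sizes), len(destination_sizes)))
    for index, (src, dst) in enumerate(zip(source_sizes, destination_sizes)):
        # The destination file must be at least as large as the source disk.
        if dst < src:
            raise ValueError("disk %d: dst has %d bytes, src needs %d"
                             % (index, dst, src))
    return True
```

Because no name or alias is compared, reordering the disks in the destination
XML would silently pair the wrong files, which is the cost of this approach.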
Patches 1, 3 and 5 have been ACKed already.
Michal Privoznik (11):
qemu: Introduce NBD_SERVER capability
Introduce NBD migration cookie
qemu: Introduce nbd-server-start command
qemu: Introduce nbd-server-add command
qemu: Introduce nbd-server-stop command
qemu_migration: Introduce qemuMigrationStartNBDServer
qemu_migration: Move port allocation to a separate func
qemu_migration: Implement qemuMigrationStartNBDServer()
qemu_migration: Implement qemuMigrationDriveMirror
qemu_migration: Check size prerequisites
qemu_migration: Stop NBD server at Finish phase
src/qemu/qemu_capabilities.c | 3 +
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_driver.c | 8 +-
src/qemu/qemu_migration.c | 620 +++++++++++++++++++++++++++++++++++++++---
src/qemu/qemu_migration.h | 6 +-
src/qemu/qemu_monitor.c | 63 +++++
src/qemu/qemu_monitor.h | 7 +
src/qemu/qemu_monitor_json.c | 95 +++++++
src/qemu/qemu_monitor_json.h | 7 +
9 files changed, 772 insertions(+), 38 deletions(-)
Now that we are post-release, it would be nice if somebody had a look
at this. Thanks.
Michal