
On 08/11/10 - 03:37:12PM, David Teigland wrote:
There are complications around migration we need to consider too. During migration, you actually need QEMU running on two hosts at once. IIRC the idea is that before starting the migration operation, we'd have to tell sync-manager to mark the lease as shared with a specific host. The destination QEMU would have to start up in shared mode, and upgrade this to an exclusive lock when migration completes, or quit when migration fails.
sync_manager leases can only be exclusive, so it's a matter of transferring ownership of the exclusive lock from source host to destination host. We have not yet added lease transfer capabilities to sync_manager, but it might look something like this:
S = source host, sm-S = sync_manager on S, ...
D = destination host, sm-D = sync_manager on D, ...
1. sm-S holds the lease, and is monitoring qemu
2. migration begins from S to D
3. libvirt-D runs sm-D: sync_manager -c qemu with the addition of a new sync_manager option --receive-lease
4. sm-D writes its hostid D to the lease area signaling sm-S that it wants to be the lease owner when S is done with it
5. sm-D begins monitoring the lease owner on disk (which is still S)
6. sm-D forks qemu-D
7. sm-S sees that D wants the lease
8. qemu-S exits with success
9. sm-S sees qemu-S exit with success
10. sm-S writes D as the lease owner into the lease area and exits (in the non-migration/transfer case, sm-S writes owner=LEASE_FREE)
11. sm-D (still monitoring the lease owner) sees that it has become the owner, and begins renewing the lease
12. qemu-D runs fully
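
The proposed handoff can be sketched as a small in-memory simulation. This is only an illustration of the ordering in the steps above, not sync_manager code: the `LeaseArea` class, its `owner` and `requester` fields, the function names, and the encoding of `LEASE_FREE` are all assumptions made for the example, and the real on-disk layout is not described in this thread.

```python
from dataclasses import dataclass

LEASE_FREE = ""  # sentinel for "no owner"; the real encoding is an assumption


@dataclass
class LeaseArea:
    """Stand-in for the on-disk lease area (field names are hypothetical)."""
    owner: str = LEASE_FREE      # current lease owner
    requester: str = LEASE_FREE  # host that asked to receive the lease


def request_lease(lease, host_id):
    """Step 4: the destination records that it wants the lease next."""
    lease.requester = host_id


def release_lease(lease, host_id):
    """Step 10: the owner hands the lease to the requester, or frees it."""
    if lease.owner != host_id:
        raise RuntimeError(f"{host_id} does not own the lease")
    lease.owner = lease.requester  # stays LEASE_FREE when nobody asked
    lease.requester = LEASE_FREE


def has_become_owner(lease, host_id):
    """Steps 5 and 11: the destination polls the owner field."""
    return lease.owner == host_id


def simulate_migration():
    """Walk the lease through the proposed S -> D transfer."""
    lease = LeaseArea(owner="S")                              # step 1
    events = []
    request_lease(lease, "D")                                 # steps 3-4
    events.append(("D polls", has_become_owner(lease, "D")))  # step 5
    release_lease(lease, "S")                                 # steps 8-10
    events.append(("D polls", has_become_owner(lease, "D")))  # step 11
    return lease, events
```

In this model the lease has exactly one owner at every point: D only becomes the owner after S has written the new owner and exited.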
Unfortunately, this is not how migration works in qemu/kvm. Using your nomenclature above, it's more like the following:
A guest is running on S. A migration is then initiated, at which point D fires up a qemu process with a -incoming argument. This is sort of a container process that will receive all of the migration data. Crucially for sync-manager, though, qemu completely starts up and "attaches" to all of the resources (including disks) *while* qemu at S is still running. Then it enters a sort of paused state (where the guest cannot run), and receives all of the migration data. Once all of the migration data has been received, the guest on S is destroyed, and the guest on D is unpaused.
That's why Dan mentioned that we need two hosts to access the disk at once.
-- 
Chris Lalancette
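
To make the overlap concrete, here is a minimal sketch that replays the qemu/kvm sequence described above and tracks which hosts have the disk open at each point. The event names (`attach`, `transfer`, `detach`, `unpause`) and the helper functions are invented for this illustration and do not correspond to any qemu or libvirt interface.

```python
# Event sequence as described in the message above; names are hypothetical.
QEMU_MIGRATION = [
    ("attach", "S"),    # guest is running on S with its disks open
    ("attach", "D"),    # qemu -incoming starts on D and opens the same disks
    ("transfer", "S"),  # migration data streams to D; the guest on D is paused
    ("detach", "S"),    # guest on S is destroyed
    ("unpause", "D"),   # guest on D starts running
]


def disk_holders_over_time(events):
    """Return (action, host, holders) after each event."""
    holders = set()
    timeline = []
    for action, host in events:
        if action == "attach":
            holders.add(host)
        elif action == "detach":
            holders.discard(host)
        # "transfer" and "unpause" do not change who has the disk open
        timeline.append((action, host, sorted(holders)))
    return timeline


def max_concurrent_holders(events):
    """Largest number of hosts that have the disk open at the same time."""
    return max(len(holders) for _, _, holders in disk_holders_over_time(events))
```

Under these assumptions `max_concurrent_holders(QEMU_MIGRATION)` is 2, which is the window a purely exclusive lease with a handoff at source exit does not cover.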