-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Thursday, April 21, 2011 4:44 AM
To: Stefan Berger
Cc: Christian Benvenuti (benve); eblake(a)redhat.com; laine(a)laine.org;
chrisw(a)redhat.com; libvir-list(a)redhat.com; David Wang (dwang2); Roopa
Prabhu (roprabhu); Gerhard Stenzel; Jens Osterkamp; Anthony Liguori
Subject: Re: [libvirt] [PATCH 3/6] Introduce yet another migration
version in API.
On Thu, Apr 21, 2011 at 07:37:30AM -0400, Stefan Berger wrote:
> On 04/20/2011 11:38 PM, Christian Benvenuti (benve) wrote:
> >>On 04/20/2011 05:28 PM, Christian Benvenuti (benve) wrote:
> >>>Daniel,
> >>> I looked at the patch-set you sent out on 2/9/11
> >>>
> >>> [libvirt] [PATCH 0/6] Introduce a new migration protocol
> >>> to QEMU driver
> >>>
> >>> http://www.mail-archive.com/libvir-list(a)redhat.com/msg33223.html
> >>>
> >>>What is the status of this new migration protocol?
> >>>Is there any pending issue blocking its integration?
> >>>
> >>>I would like to propose an RFC enhancement to the migration
> >>>algorithm.
> >>>
> >>>Here is a quick summary of the proposal/idea.
> >>>
> >>>- finer control on migration result
> >>>
> >>> - possibility of specifying what features cannot fail
> >>> their initialization on the dst host during migration.
> >>> Migration should not succeed if any of them fails.
> >>> - optional: each one of those features should be able to
> >>> provide a deinit function to clean up resources
> >>> on the dst host if migration fails.
> >>>
> >>>This functionality would be useful for the (NIC) set port
> >>>profile feature VDP (802.1Qbg/1Qbh), but what I propose is
> >>>a generic config option / API that can be used by any feature.
> >>>
> >>>And now the details.
> >>>
> >>>----------------------------------------------
> >>>enhancement: finer control on migration result
> >>>----------------------------------------------
> >>>
> >>>There are different reasons why a VM may need (or be forced) to
> >>>migrate.
> >>>Migrations can also be classified according to different
> >>>semantics.
> >>>For simplicity I'll classify them into two categories, based on
> >>>how important it is for the VM to migrate as fast as possible:
> >>>
> >>>(1) It IS important
> >>>
> >>> In this case, it is not that important if the VM is (temporarily)
> >>> unable to use certain resources (for example the network) on the
> >>> dst host, because the completion of the migration is considered
> >>> higher priority.
> >>> A possible scenario could be a server that must migrate ASAP
> >>> because of a disaster/emergency.
> >>>
> >>>(2) It IS NOT important
> >>>
> >>> I can think of a VM whose applications/servers need a network
> >>> connection in order to work properly. Losing such network
> >>> connectivity as a consequence of a migration would not be
> >>> acceptable (or would be highly undesirable).
> >>>
> >>>Given case (2) above, I have a comment about the Finish
> >>>step, with regard to the port profile (VDP) codepath.
> >>>
> >>>The call to
> >>>
> >>> qemuMigrationVPAssociatePortProfile
> >>>
> >>>in
> >>> qemuMigrationFinish
> >>>
> >>>can fail, but its result (success or failure) does not influence
> >>>the result of the migration Finish step (it was already like this
> >>>in migration V2).
> >>I *believe* the underlying problem is Qemu's switch-over. Once Qemu
> >>decides that the migration was successful, Qemu on the source side
> >>dies and continues running on the destination side. I don't think
> >>there are any further handshakes with higher layers through which
> >>this could be reversed or the switch-over delayed, but correct me
> >>if I am wrong...
> >Actually I think this is not what happens in migration V3.
> >My understanding is this:
> >
> >- the qemu cmdline built by Libvirt on the dst host during Prepare3
> > includes the "-S" option (ie no autostart)
> >
> >- the VM on the dst host does not start running until libvirt
> > calls qemuProcessStartCPUs in the Finish3 step.
> > This fn simply sends the "cont" cmd to the monitor to
> > start the VM/CPUs.
> That's correct, but it's doing this already in v2. The non-autostart
> (-S) corresponds to Qemu's autostart here (migration.c):
>
> void process_incoming_migration(QEMUFile *f)
> {
>     if (qemu_loadvm_state(f) < 0) {
>         fprintf(stderr, "load of migration failed\n");
>         exit(0);
>     }
>     qemu_announce_self();
>     DPRINTF("successfully loaded vm state\n");
>
>     incoming_expected = false;
>
>     if (autostart)
>         vm_start();
> }
>
> and simply doesn't start the VM. After this function is called all
> sockets are closed and the communication with the source host is
> cut. I don't think it allows for fall-back at this point.
Sure it does. As long as the destination QEMU CPUs have not been
started, you can fall back by simply killing the dest QEMU and
restarting the CPUs on the src QEMU.
> Rather we may need a 'wait' option for migration and before the
>
> qemu_put_byte(f, QEMU_VM_EOF);
>
> in qemu_savevm_state_complete() sync with the monitor and either
> wait for something like migrate_finish or migrate_cancel.
The real problem is that while we can tell from 'info migrate'
on the src when the src has finished sending all data, there is
no way to ask the dest QEMU when it has finished receiving all
data.

So libvirt assumes that 'src finished sending' == success, and
will attempt to start the dst QEMU CPUs. As raised many times
in the past, we need 'info migrate' to work on the destination
too, in order to query success/fail. And ideally we need async
events emitted when migration completes, so we don't have to
poll on 'info migrate' every 50ms.
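To illustrate the polling described above, here is a minimal C sketch of a loop that queries a migration status and sleeps between queries. Every name in it (mig_status, mig_query_fn, mig_poll, countdown_query) is hypothetical and is not a libvirt or QEMU identifier. The callback only stands in for whatever would issue 'info migrate' on the monitor, and countdown_query is a scripted stand-in for trying the loop out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical status values; not libvirt or QEMU identifiers. */
enum mig_status { MIG_ACTIVE, MIG_COMPLETED, MIG_FAILED, MIG_TIMEOUT };

/* Hypothetical callback standing in for one 'info migrate' query. */
typedef enum mig_status (*mig_query_fn)(void *opaque);

/*
 * Query the status every interval_ms until it is no longer MIG_ACTIVE,
 * or give up with MIG_TIMEOUT once roughly timeout_ms have been slept.
 */
enum mig_status mig_poll(mig_query_fn query, void *opaque,
                         long interval_ms, long timeout_ms)
{
    long waited = 0;

    assert(query != NULL && interval_ms > 0);
    for (;;) {
        enum mig_status st = query(opaque);
        if (st != MIG_ACTIVE)
            return st;
        if (waited >= timeout_ms)
            return MIG_TIMEOUT;

        struct timespec ts = {
            .tv_sec = interval_ms / 1000,
            .tv_nsec = (interval_ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
        waited += interval_ms;
    }
}

/* Scripted stand-in: reports MIG_ACTIVE 'remaining' times, then 'final'. */
struct countdown {
    int remaining;
    enum mig_status final;
};

enum mig_status countdown_query(void *opaque)
{
    struct countdown *c = opaque;

    if (c->remaining > 0) {
        c->remaining--;
        return MIG_ACTIVE;
    }
    return c->final;
}
```

Calling mig_poll with interval_ms set to 50 would mirror the 50ms polling mentioned above. Async completion events would make such a loop unnecessary.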
Why was this point ('info migrate' on the dst host) raised many
times in the past but never implemented?
Is there any technical reason?
Assuming the interval between the moment the src host finishes
sending and the moment the dst host finishes receiving is not too
big (which is a fair assumption, I guess), libvirt on the dst host
could block on that condition (ie wait for 'info migrate' to say
"rx all" on the dst host) at the beginning of Finish3. Is it doable?
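To make the idea more concrete, here is a rough C sketch of how a Finish step could be ordered under this proposal. It first requires that the dst has received everything, then initializes the features, and only reports "start the dst CPUs" if no mandatory feature failed. Otherwise it undoes the features already initialized and reports that the caller should fall back to the src. All names (mig_feature, finish_step, the demo_* hooks) are hypothetical and are not libvirt APIs. The dst_received_all flag assumes a way of querying the dst QEMU that, per the discussion above, does not exist today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-feature hooks; these are not libvirt APIs. */
struct mig_feature {
    const char *name;
    bool mandatory;               /* a failed init must fail the migration */
    int (*init)(void *opaque);    /* returns 0 on success */
    void (*deinit)(void *opaque); /* optional cleanup, may be NULL */
    void *opaque;
    bool initialized;             /* set by finish_step() */
};

enum finish_result { FINISH_START_DST_CPUS, FINISH_FALL_BACK_TO_SRC };

/*
 * Decide the outcome of the Finish step. Falling back is only safe
 * because the dst CPUs have not been started yet at this point.
 */
enum finish_result finish_step(bool dst_received_all,
                               struct mig_feature *features, size_t n)
{
    size_t i;

    assert(features != NULL || n == 0);
    if (!dst_received_all)
        return FINISH_FALL_BACK_TO_SRC;

    for (i = 0; i < n; i++) {
        features[i].initialized =
            features[i].init(features[i].opaque) == 0;
        if (!features[i].initialized && features[i].mandatory)
            break;
    }
    if (i == n)
        return FINISH_START_DST_CPUS;

    /* Undo, in reverse order, the features set up before the failure. */
    while (i-- > 0) {
        if (features[i].initialized && features[i].deinit != NULL)
            features[i].deinit(features[i].opaque);
    }
    return FINISH_FALL_BACK_TO_SRC;
}

/* Stand-in hooks, only for exercising the sketch. */
int demo_init_ok(void *opaque) { (void)opaque; return 0; }
int demo_init_fail(void *opaque) { (void)opaque; return -1; }
void demo_deinit(void *opaque) { ++*(int *)opaque; }
```

In this sketch the caller would kill the dst QEMU and restart the CPUs on the src when FINISH_FALL_BACK_TO_SRC is returned. A port profile association would be one entry in the feature list, marked mandatory for case (2) and optional for case (1).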
/Christian