-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Thursday, April 21, 2011 5:02 AM
To: Christian Benvenuti (benve)
Cc: eblake@redhat.com; stefanb@linux.vnet.ibm.com; laine@laine.org;
chrisw@redhat.com; libvir-list@redhat.com; David Wang (dwang2); Roopa
Prabhu (roprabhu)
Subject: Re: [libvirt] [PATCH 3/6] Introduce yet another migration
version in API.
On Wed, Apr 20, 2011 at 04:28:12PM -0500, Christian Benvenuti (benve)
wrote:
> Daniel,
> I looked at the patch-set you sent out on 2/9/11
>
> [libvirt] [PATCH 0/6] Introduce a new migration protocol
> to QEMU driver
>
> http://www.mail-archive.com/libvir-list@redhat.com/msg33223.html
>
> What is the status of this new migration protocol?
> Is there any pending issue blocking its integration?
>
> I would like to propose an RFC enhancement to the migration
> algorithm.
>
> Here is a quick summary of the proposal/idea.
>
> - finer control on migration result
>
> - possibility of specifying what features cannot fail
> their initialization on the dst host during migration.
> Migration should not succeed if any of them fails.
> - optional: each one of those features should be able to
> provide a deinit function to cleanup resources
> on the dst host if migration fails.
I'm not really very convinced that allowing things to fail
during migration is useful, not least from the POV of the
app determining just what worked vs failed. IMHO, migration
should be atomic and only succeed if everything related to
the guest succeeds.
I agree, the migration should be atomic ... in most cases.
However, the scenario I was referring to is different: there you
want the migration to complete as fast as possible. Because of
that, blocking on operations (such as net config) that may need
several seconds to complete, and that could be taken care of at a
later time, would not be desirable.
In such a scenario you would be choosing between these two
results:
1) you may lose the VM because the migration does not
complete fast enough
2) you have a better chance of successfully moving the VM, but
you may lose something like network connectivity.
This loss may be acceptable (or less of a problem compared
to the loss of the VM run-time state) in the sense that
mgmt can re-try initializing it.
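To make the original proposal concrete, here is a rough sketch of
the per-feature hooks I have in mind (every name below is made up,
this is not existing libvirt code):

  #include <stddef.h>

  /* Each migratable feature says whether its failure on the dst
   * host must abort the migration, and can provide an optional
   * deinit to clean up if the migration fails anyway. */
  typedef struct {
      const char *name;
      int  (*init)(void *opaque);    /* run on the dst host in Finish */
      void (*deinit)(void *opaque);  /* optional cleanup on failure   */
      int   must_succeed;            /* non-zero => failure aborts    */
  } MigrationFeature;

  static int finish_features(MigrationFeature *f, size_t n, void *opaque)
  {
      for (size_t i = 0; i < n; i++) {
          if (f[i].init(opaque) < 0 && f[i].must_succeed) {
              while (i--) {              /* unwind what succeeded */
                  if (f[i].deinit)
                      f[i].deinit(opaque);
              }
              return -1;                 /* Finish (migration) fails */
          }
      }
      return 0;
  }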
If we want to support a case where the dst can't connect to
the same network, then we should add an API that lets us
change the network backend on the fly.
'on the fly' when/where? On the dst host at the end of the migration?
NB, this is different
from NIC hotplug/unplug, in that the guest device never
changes. We merely change how the guest is connected to the
host.
So, if you have a guest with a NIC configured using VEPA,
then you can re-configure it to use a 'no op' (aka /dev/null)
NIC backend, and then perform the migration.
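eg, something along these lines, assuming we re-use the existing
virDomainUpdateDeviceFlags entry point rather than invent a new
one (the 'null backend' XML and the MAC address below are purely
illustrative):

  #include <libvirt/libvirt.h>

  /* Sketch: re-point the guest's VEPA NIC at an isolated backend so
   * the migration no longer depends on the dst network config. The
   * guest-visible device is untouched; only the host backend changes. */
  static int park_nic_backend(virDomainPtr dom)
  {
      const char *null_nic_xml =
          "<interface type='ethernet'>"
          "  <mac address='52:54:00:12:34:56'/>" /* must match the guest NIC */
          "</interface>";

      return virDomainUpdateDeviceFlags(dom, null_nic_xml,
                                        VIR_DOMAIN_DEVICE_MODIFY_LIVE);
  }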
Wouldn't it be better to let the migration try to migrate
the net connection too and, in case it fails, let mgmt re-try
if configured to do so, based for example on a configuration
policy (eg, "net persistent re-try 10s")?
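Such a policy could boil down to something as simple as this on
the mgmt side (sketch only, names made up):

  #include <time.h>
  #include <unistd.h>

  /* Keep re-trying the dst-host net init for a bounded window
   * ("net persistent re-try 10s") instead of failing the migration. */
  static int retry_net_init(int (*net_init)(void), int window_secs)
  {
      time_t deadline = time(NULL) + window_secs;
      while (time(NULL) < deadline) {
          if (net_init() == 0)
              return 0;     /* connectivity restored */
          sleep(1);
      }
      return -1;            /* give up and report to the admin */
  }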
> ----------------------------------------------
> enhancement: finer control on migration result
> ----------------------------------------------
>
> There are different reasons why a VM may need (or be forced) to
> migrate.
> You can classify the types of the migrations also based on
> different semantics.
> For simplicity I'll classify them into two categories, based on
> how important it is for the VM to migrate as fast as possible:
>
> (1) It IS important
>
> In this case, the fact that the VM may (temporarily) not be
> able to make use of certain resources (for example the network)
> on the dst host is not that important, because the completion
> of the migration is considered the higher priority.
> A possible scenario could be a server that must migrate ASAP
> because of a disaster/emergency.
>
> (2) It IS NOT important
>
> I can think of a VM whose applications/servers need a network
> connection in order to work properly. Losing such network
> connectivity as a consequence of a migration would not be
> acceptable (or would be highly undesirable).
>
> Given the case (2) above, I have a comment about the Finish
> step, with regards to the port profile (VDP) codepath.
>
> The call to
>
> qemuMigrationVPAssociatePortProfile
>
> in
> qemuMigrationFinish
>
> can fail, but its result (success or failure) does not influence
> the result of the migration Finish step (it was already like this
> in migration V2).
> It is therefore possible for a VM to lose its network connectivity
> after a (successful) migration.
That is a clear bug in our code - something that can fail during
migration should cause the migration to abort, leaving the guest
on the original host unchanged.
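ie, qemuMigrationFinish should propagate the failure instead of
dropping it. In toy form (simplified, not the actual libvirt code):

  #include <stdio.h>

  /* Stand-in for qemuMigrationVPAssociatePortProfile */
  static int associate_port_profile(void)
  {
      return -1; /* pretend the VDP association failed */
  }

  static int migration_finish(void)
  {
      if (associate_port_profile() < 0) {
          /* Fail Finish; the source host then resumes the guest. */
          fprintf(stderr, "port profile association failed\n");
          return -1; /* today this error is silently ignored */
      }
      return 0;
  }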
I agree.
However I believe there may be corner cases (like emergency scenarios)
where it can make sense to relax this policy.
/Christian
> BTW, would the new functionality being discussed in this thread
>
> "RFC: virInterface change transaction API"
>
> http://www.redhat.com/archives/libvir-list/2011-April/msg00499.html
>
> be able to provide the same configuration "atomicity" (ie, rollback
> in case of migration failure)?
> My understanding is that:
>
> - Such new framework would apply to (host) network config only.
> Even though it may cover the VDP (port profile) use case I
> mentioned above, it would not apply to other features that
> may need some kind of rollback after a migration failure.
Host NIC configuration from virInterface isn't really tied into
the migration at all. It is something the mgmt app has to do on
the source & dest hosts, before setting up any VMs, let alone
getting to migration.
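If that RFC lands as proposed, the mgmt app's part could look
roughly like this on each host, well before any migration (sketch
only; the final signatures may differ, and apply_interface_config
is a hypothetical helper):

  #include <libvirt/libvirt.h>

  static int apply_interface_config(virConnectPtr conn)
  {
      /* hypothetical helper: define/redefine virInterface objects */
      (void)conn;
      return 0;
  }

  static int reconfigure_host_net(virConnectPtr conn)
  {
      if (virInterfaceChangeBegin(conn, 0) < 0)
          return -1;

      if (apply_interface_config(conn) < 0) {
          virInterfaceChangeRollback(conn, 0); /* restore old config */
          return -1;
      }
      return virInterfaceChangeCommit(conn, 0);
  }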
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|