On Wed, Nov 13, 2013 at 12:15:30PM +0800, Zheng Sheng ZS Zhou wrote:
> Hi Daniel,
>
> on 2013/11/12 20:23, Daniel P. Berrange wrote:
> > On Tue, Nov 12, 2013 at 08:14:11PM +0800, Zheng Sheng ZS Zhou wrote:
> >> Hi all,
> >>
> >> Recently QEMU developers are working on a feature to allow upgrading
> >> a live QEMU instance to a new version without restarting the VM. This
> >> is implemented as live migration between the old and new QEMU process
> >> on the same host [1]. Here is the use case:
> >>
> >> 1) Guests are running QEMU release 1.6.1.
> >> 2) Admin installs QEMU release 1.6.2 via RPM or deb.
> >> 3) Admin starts a new VM using the updated QEMU binary, and asks the old
> >> QEMU process to migrate the VM to the newly started VM.
> >>
> >> I think it will be very useful to support QEMU live upgrade in libvirt.
> >> After some investigation, I found that migrating to the same host breaks
> >> the current migration code. I'd like to propose a new work flow for
> >> QEMU live migration, implementing step 3) above.
> >
> > How does it break migration code? Your patch below is effectively
> > re-implementing the multistep migration workflow, leaving out many
> > important features (seamless reconnect to SPICE clients for example)
> > which is really bad for our ongoing code support burden, so not
> > something I want to see.
> >
> > Daniel
> >
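
For concreteness, the QEMU-level mechanics behind step 3) above look
roughly like the sketch below ("-incoming" and the monitor's migrate
command are real QEMU interfaces; the socket path and the elided guest
configuration are hypothetical):

  /* Destination side: start the upgraded binary waiting for an
   * incoming local migration over a Unix socket. */
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      execlp("qemu-system-x86_64", "qemu-system-x86_64",
             /* ... exactly the same guest configuration as the old
              * process, then: */
             "-incoming", "unix:/var/run/qemu-upgrade-vm1.sock",
             (char *)NULL);
      perror("execlp");
      return 1;
  }

The old QEMU would then be told "migrate
unix:/var/run/qemu-upgrade-vm1.sock" on its monitor and be torn down
once the migration completes.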

> Actually I wrote another hacking patch to investigate how we
> can re-use the existing framework to do local migration. I found
> the following problems.
>
> (1) When migrating to a different host, the destination domain uses
> the same UUID and name as the source, and this is OK. When migrating
> to localhost, the destination domain's UUID and name conflict
> with the source's. The QEMU driver maintains a hash table of
> domain objects, keyed by the UUID of the virtual machine.
> closeCallbacks is also a hash table with the domain
> UUID as key, and maybe there are other data structures using
> the UUID as key. This implies we have to use a different name and
> UUID for the destination domain. In the migration framework, the
> Begin and Prepare stages call virDomainDefCheckABIStability,
> which prevents us from using a different UUID, and they also check
> that the hostname and host UUID differ. If we want to enable
> local migration, we have to skip these checks and generate a new
> UUID and name for the destination domain. Of course we restore the
> original UUID after migration. The UUID is used by higher level
> management software to identify virtual machines. It should
> stay the same after a QEMU live upgrade.

This point is something that needs to be solved regardless of
whether we use the migration framework or re-invent it. The QEMU
driver fundamentally assumes that there is only ever one single VM
with a given UUID, and that a VM has only one process. IMHO name +
uuid must be preserved during any live upgrade process, otherwise
mgmt will get confused. This has more problems because 'name' is
used for various resources created by QEMU on disk - eg the monitor
command path. We can't have 2 QEMUs using the same name, but at the
same time that's exactly what we'd need here.
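
To make the collision concrete, here is a minimal sketch of the
destination-side check (virDomainObjListFindByUUID/ByName are the real
internal lookup helpers; the surrounding function is made up, and the
monitor path in the comment only illustrates the usual naming pattern):

  #include "domain_conf.h"
  #include "virobject.h"

  static int
  prepareWouldConflict(virDomainObjListPtr doms, virDomainDefPtr def)
  {
      virDomainObjPtr vm;

      /* The driver's domain list is keyed by UUID, so on the same
       * host this lookup finds the *source* domain ... */
      if ((vm = virDomainObjListFindByUUID(doms, def->uuid))) {
          virObjectUnlock(vm);
          return -1;   /* ... and the incoming VM is rejected */
      }

      /* The name collides too, and 'name' is baked into on-disk
       * resources, e.g. a monitor socket like
       *     /var/lib/libvirt/qemu/<name>.monitor
       * which two QEMU processes cannot share. */
      if ((vm = virDomainObjListFindByName(doms, def->name))) {
          virObjectUnlock(vm);
          return -1;
      }

      return 0;
  }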

> (2) If I understand the code correctly, libvirt uses a thread
> pool to handle RPC requests. This means local migration may
> cause a deadlock in P2P migration mode. Suppose there are some
> concurrent local migration requests and all the worker threads
> are occupied by these requests. When the source libvirtd connects to
> the destination libvirtd on the same host to negotiate the migration,
> the negotiation request is queued, but it will never be handled,
> because the original migration request from the client is waiting
> for the negotiation request to finish before it can progress,
> while the negotiation request is queued waiting
> for the original request to end. This is one of the deadlock
> risks I can think of.
> I guess in traditional migration mode, in which the client
> opens two connections to the source and destination libvirtd,
> there is also a risk of deadlock.

Yes, it sounds like you could get a deadlock even with 2 separate
libvirtds, if both of them were migrating to the other concurrently.
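
The pool-exhaustion pattern is easy to reproduce in isolation. Below is
a self-contained sketch using plain pthreads (not libvirt's actual RPC
dispatcher): every worker handles a "migration" job that queues a
"negotiation" sub-job on the same pool and blocks until it completes,
so once all workers are busy nothing is left to run the queued work:

  #include <pthread.h>
  #include <stdio.h>

  #define WORKERS 2   /* all workers busy with "migration" jobs */

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
  static int queued;  /* "negotiation" jobs waiting for a worker */
  static int done;    /* "negotiation" jobs that actually ran */

  static void *migrationJob(void *opaque)
  {
      (void)opaque;
      pthread_mutex_lock(&lock);
      queued++;   /* source contacts the peer libvirtd ... */
      /* ... and blocks until the negotiation job has run.  That job
       * needs a free worker, but every worker is parked here. */
      while (done < queued)
          pthread_cond_wait(&cond, &lock);
      pthread_mutex_unlock(&lock);
      return NULL;
  }

  int main(void)
  {
      pthread_t workers[WORKERS];
      int i;

      for (i = 0; i < WORKERS; i++)
          pthread_create(&workers[i], NULL, migrationJob, NULL);

      /* Never returns: every worker waits for work that only an
       * (unavailable) worker could perform. */
      for (i = 0; i < WORKERS; i++)
          pthread_join(workers[i], NULL);

      printf("unreachable\n");
      return 0;
  }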

> (3) Libvirt supports Unix domain socket transport, but
> this is only used in tunnelled migration. For native
> migration, it only supports TCP. We need to enable Unix
> domain socket transport in native migration. We already
> have a hypervisor migration URI argument in the migration
> API, but there is no support for parsing and verifying a
> "unix:/full/path" URI and passing that URI transparently
> to QEMU. We could add this to the current migration framework,
> but direct Unix socket transport looks meaningless for
> normal migration.

Actually as far as QEMU is concerned libvirt uses fd: migration
only. Again though, this point seems pretty much unrelated to
the question of how we design the APIs & structure the code.
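
If native migration did learn to accept a unix: hypervisor URI, the
caller-visible side might look like the sketch below.
virDomainMigrateToURI() and VIR_MIGRATE_LIVE are the real public API;
the unix: parsing/passthrough and the socket path are the assumed,
not-yet-existing part:

  #include <libvirt/libvirt.h>

  int liveUpgradeMigrate(virDomainPtr dom)
  {
      /* Without VIR_MIGRATE_PEER2PEER the URI is hypervisor-specific
       * and would be handed through to QEMU, whose monitor already
       * accepts "migrate unix:<path>". */
      return virDomainMigrateToURI(dom,
                                   "unix:/var/run/libvirt/qemu/upgrade-vm1.sock",
                                   VIR_MIGRATE_LIVE,
                                   NULL,   /* keep the same domain name */
                                   0);     /* no bandwidth cap */
  }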

> (4) When migration fails, the source domain is resumed, and
> this may not work if we enable page-flipping in QEMU. With
> page-flipping enabled, QEMU transfers memory page ownership
> to the destination QEMU, so the source virtual machine
> should be restarted, not resumed, when the migration fails.

IMHO that is not an acceptable approach. The whole point of doing
live upgrades in place is that you consider the VMs to be
"precious". If you were OK with VMs being killed & restarted then
we'd not bother doing any of this live upgrade pain at all.
So if we're going to support live upgrades, we *must* be able to
guarantee that they will either succeed, or the existing QEMU is
left intact. Killing the VM and restarting it is not an option on
failure.
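
Spelled out as control flow, the contract looks like this (every
function here is hypothetical; the point is purely the ordering - the
old QEMU must stay runnable until the new one has fully taken over,
which page-flipping cannot honour):

  int start_new_qemu_paused(void);   /* hypothetical helpers */
  int migrate_old_to_new(void);
  void kill_new_qemu(void);
  void resume_old_qemu(void);
  void resume_new_qemu(void);
  void kill_old_qemu(void);

  int liveUpgrade(void)
  {
      if (start_new_qemu_paused() < 0)
          return -1;                 /* old QEMU never touched */

      if (migrate_old_to_new() < 0) {
          kill_new_qemu();           /* roll back ... */
          resume_old_qemu();         /* ... needs its memory intact */
          return -1;
      }

      resume_new_qemu();             /* commit ... */
      kill_old_qemu();               /* ... only after success */
      return 0;
  }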

> So I propose a new and compact work flow dedicated to QEMU
> live upgrade. After all, it's an upgrade operation based on
> tricky migration. When developing the previous RFC patch for
> the new API, I focused on the correctness of the work flow,
> so many other things are missing. I think I can add things
> like Spice seamless migration when I submit new versions.

This way lies madness. We do not want 2 impls of the internal
migration framework.

> I would also be really happy if you could give me some advice on how
> to re-use the migration framework. Re-using the current framework
> would save a lot of effort.

I consider using the internal migration framework a mandatory
requirement here, even if the public API is different.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|