On Tue, Nov 12, 2013 at 09:54:44PM -0700, Eric Blake wrote:
On 11/12/2013 05:14 AM, Zheng Sheng ZS Zhou wrote:
> From 2b659584f2cbe676c843ddeaf198c9a8368ff0ff Mon Sep 17 00:00:00 2001
> From: Zhou Zheng Sheng <zhshzhou(a)linux.vnet.ibm.com>
> Date: Wed, 30 Oct 2013 15:36:49 +0800
> Subject: [PATCH] RFC: Support QEMU live upgrade
>
> This patch is to support upgrading QEMU version without restarting the
> virtual machine.
>
> Add new API virDomainQemuLiveUpgrade(), and a new virsh command
> qemu-live-upgrade. virDomainQemuLiveUpgrade() migrates a running VM to
> the same host as a new VM with new name and new UUID. Then it shuts down
> the original VM and drops the new VM definition without shutting down the
> QEMU process of the new VM. Finally, it attaches the original VM to the
> new QEMU process.
>
> Firstly the admin installs new QEMU package, then he runs
>   virsh qemu-live-upgrade domain_name
> to trigger our virDomainQemuLiveUpgrade() upgrading flow.

In general, I agree that we need a new API (in fact, I think I helped
suggest why we need it as opposed to reusing existing migration API,
precisely for some of the deadlock reasons you called out in your reply
to Dan). But the new API should still reuse as much of the existing
migration code as possible (refactor that to be reusable, rather than
bulk copying into completely new code). High-level review below (I
didn't test whether things work or look for details like memory leaks;
this is a first impression of style problems and even some major
design problems).

I really don't like the idea of adding a new API for this - IMHO we
need to address the deadlock scenario and fit this into our existing
migration APIs. In particular calling this "live upgrades" is wrong,
as that is just a specific use case. Functionally this is "localhost
migration" and so belongs in the migration APIs.

As mentioned in my other message, I believe the deadlock scenario
mentioned could even occur in non-localhost migration, if two
libvirtds were doing concurrent migrations in opposite
directions. So this seems like something we need to look at fixing
somehow. Perhaps it needs a dedicated thread pool, or a spawn-on-demand
thread, just for doing the specific migration RPC call that
could deadlock, so we can guarantee we can always succeed in it?
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|