Hi Eric,
On 2013/11/13 12:54, Eric Blake wrote:
> On 11/12/2013 05:14 AM, Zheng Sheng ZS Zhou wrote:
>> From 2b659584f2cbe676c843ddeaf198c9a8368ff0ff Mon Sep 17 00:00:00 2001
>> From: Zhou Zheng Sheng <zhshzhou(a)linux.vnet.ibm.com>
>> Date: Wed, 30 Oct 2013 15:36:49 +0800
>> Subject: [PATCH] RFC: Support QEMU live upgrade
>>
>> This patch is to support upgrading the QEMU version without restarting
>> the virtual machine.
>>
>> Add a new API virDomainQemuLiveUpgrade(), and a new virsh command
>> qemu-live-upgrade. virDomainQemuLiveUpgrade() migrates a running VM to
>> the same host as a new VM with a new name and a new UUID. Then it shuts
>> down the original VM and drops the new VM definition without shutting
>> down the QEMU process of the new VM. At last it attaches the original
>> VM to the new QEMU process.
>>
>> First the admin installs the new QEMU package, then runs
>>     virsh qemu-live-upgrade domain_name
>> to trigger the virDomainQemuLiveUpgrade() upgrade flow.
> In general, I agree that we need a new API (in fact, I think I helped
> suggest why we need it as opposed to reusing the existing migration API,
> precisely for some of the deadlock reasons you called out in your reply
> to Dan). But the new API should still reuse as much of the existing
> migration code as possible (refactor that to be reusable, rather than
> bulk copying into completely new code). High-level review below (I
> didn't test whether things work or look for details like memory leaks,
> so much as a first impression of style problems and even some major
> design problems).
>>
>> Signed-off-by: Zhou Zheng Sheng <zhshzhou(a)linux.vnet.ibm.com>
>> ---
>>  include/libvirt/libvirt.h.in |   3 +
>>  src/driver.h                 |   4 +
>>  src/libvirt.c                |  23 +++
>>  src/libvirt_public.syms      |   1 +
> I know this is an RFC patch, but ideally the final patch will be split
> into parts. Part 1: public interface (the files above).
>>  src/qemu/qemu_driver.c       | 339 +++++++++++++++++++++++++++++++++++++++++++
>>  src/qemu/qemu_migration.c    |   2 +-
>>  src/qemu/qemu_migration.h    |   3 +
> Part 4: Qemu driver implementation (these 3 files)
>>  src/remote/remote_driver.c   |   1 +
>>  src/remote/remote_protocol.x |  19 ++-
> Part 3: RPC implementation (these 2)
>>  tools/virsh-domain.c         | 139 ++++++++++++++++++
> Part 2: Use of the public implementation (should compile and gracefully
> fail until parts 3 and 4 are also applied; doing it early means you can
> test the error paths as well as validate that the C API seems usable).
> You may also need to do a Part 5 to modify the python bindings,
> depending on whether the code generator was able to get it working on
> your behalf.
>> +++ b/include/libvirt/libvirt.h.in
>> @@ -1331,6 +1331,9 @@ int virDomainMigrateGetMaxSpeed(virDomainPtr domain,
>>                                  unsigned long *bandwidth,
>>                                  unsigned int flags);
>>
>> +virDomainPtr virDomainQemuLiveUpgrade(virDomainPtr domain,
>> +                                      unsigned int flags);
> No bandwidth parameter? Seems inconsistent with all the other migration
> APIs. Also, since this is moving from one qemu process to another, this
> _totally_ seems like a case for allowing a destination XML argument
> (lots of people have asked for the ability to point qemu to an alternate
> disk source with identical bits visible to the guest but better
> characteristics for the host; right now, their solution is to 'virsh
> save', 'virsh save-image-edit', 'virsh restore'; but your API could let
> them specify an alternate XML and do a single live-upgrade so that the
> new qemu is using the new file name).
> Anything with "Qemu" in the API name belongs in libvirt-qemu.h. But
> this API seems like it is useful to more than just qemu; it could be
> used for other hypervisors as well. You need a more generic name; maybe
> virDomainLiveUpgrade().
Thanks for reminding me of the line wrapping problem. Mail from my usual
email box gets rejected by the libvir-list server, because the server
cannot resolve linux.vnet.ibm.com. I used this email box instead and did
not notice the wrapping problem, sorry.
Thank you very much for the suggestions. I agree with you on splitting
the patch, the bandwidth and XML parameters, coding style, memcpy and
many other points. I will start to refactor the migration framework and
re-use it as much as possible.
QEMU developer Lei Li said that for a 1 GB memory guest, the downtime of
a page-flipping migration is roughly 650 ms. Kernel developers are
working on a new vmsplice implementation to improve page-flipping, so
timeout and bandwidth may not be very useful for it. But as you
suggested, we can make virDomainLiveUpgrade() a general API for all
hypervisors. Since other hypervisors may not support page-flipping, I
will still keep the timeout and bandwidth arguments in the new API, as
sketched below.
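
A possible shape for the generic entry point, combining your suggestions
(dxml, bandwidth) with the timeout argument. This is only a sketch
modeled on the existing virDomainMigrate*() conventions; the flag enum
and all names are provisional:

/* Provisional; every name here is subject to change. */
typedef enum {
    VIR_DOMAIN_LIVE_UPGRADE_PAUSED = 1 << 0, /* leave vCPUs paused */
} virDomainLiveUpgradeFlags;

/**
 * virDomainLiveUpgrade:
 * @domain:    the running domain to upgrade
 * @dxml:      optional updated XML (e.g. alternate disk paths), or NULL
 * @bandwidth: migration bandwidth limit in MiB/s, or 0 for unlimited
 * @timeout:   seconds to wait before aborting the upgrade, or 0
 * @flags:     bitwise-OR of virDomainLiveUpgradeFlags
 *
 * Returns the upgraded domain on success, or NULL on failure.
 */
virDomainPtr virDomainLiveUpgrade(virDomainPtr domain,
                                  const char *dxml,
                                  unsigned long bandwidth,
                                  unsigned int timeout,
                                  unsigned int flags);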
As I wrote in previous mails, I find the domain UUID very important in
libvirt. It causes a lot of trouble if we start the destination domain
with the same UUID. Actually I did try to hack libvirt to do this but
wasn't successful.
I discussed with Lei Li in the CC list. We can add a new QEMU monitor
command to change the guest UUID. During the migration, we first start
the destination QEMU process with all vCPUs paused, and this QEMU
process gets a different temporary UUID. After migration, we call that
monitor command to change the UUID back, then resume the vCPUs. The
guest OS should not notice this change.
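
To make this concrete, the monitor wrapper could look roughly like the
following. This is only a sketch: the "change-uuid" QMP command does not
exist yet, the helper name qemuMonitorJSONChangeUUID is made up, and I
assume the usual qemu_monitor_json.c helper conventions:

/* Hypothetical wrapper for a new QMP command (call it "change-uuid");
 * neither the command nor this helper exists yet.  Sketched in the
 * style of libvirt's qemu_monitor_json.c helpers. */
int
qemuMonitorJSONChangeUUID(qemuMonitorPtr mon,
                          const unsigned char *uuid)
{
    int ret = -1;
    char uuidstr[VIR_UUID_STRING_BUFLEN];
    virJSONValuePtr cmd = NULL;
    virJSONValuePtr reply = NULL;

    virUUIDFormat(uuid, uuidstr);

    /* Wire format would be something like:
     *   { "execute": "change-uuid", "arguments": { "uuid": "..." } } */
    if (!(cmd = qemuMonitorJSONMakeCommand("change-uuid",
                                           "s:uuid", uuidstr,
                                           NULL)))
        goto cleanup;

    if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0)
        goto cleanup;

    ret = qemuMonitorJSONCheckError(cmd, reply);

cleanup:
    virJSONValueFree(cmd);
    virJSONValueFree(reply);
    return ret;
}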
So the new flow should look like this (a rough code sketch follows the
list):
1. The original domain migrates to a new domain with a different UUID.
2. Send a monitor command to the QEMU process of the new domain to
   change the UUID back.
3. Drop the new domain object from the QEMU driver, but don't kill the
   new QEMU process.
4. Shut down the original domain, and ask it to attach to the new QEMU
   process.
5. Resume the vCPUs.
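
In terms of the QEMU driver, the flow might be wired up roughly like
this. All the qemuUpgrade* helper names are placeholders for code to be
refactored out of the existing migration framework; only the ordering
of the steps reflects the list above:

/* Hypothetical driver-level skeleton of the five steps above. */
static virDomainPtr
qemuDomainLiveUpgrade(virDomainPtr dom, unsigned int flags)
{
    virDomainPtr upgraded = NULL;
    unsigned char tmpuuid[VIR_UUID_BUFLEN];

    /* 1. Migrate to a new QEMU process on the same host, started with
     *    vCPUs paused; the helper generates and returns the temporary
     *    UUID in tmpuuid. */
    if (qemuUpgradeMigrateToSameHost(dom, tmpuuid) < 0)
        goto error;

    /* 2. Ask the new QEMU process to switch back to the real UUID. */
    if (qemuUpgradeRestoreUUID(dom, tmpuuid) < 0)
        goto error;

    /* 3. Forget the temporary domain object, keeping its QEMU process. */
    qemuUpgradeDropDomainObj(tmpuuid);

    /* 4. Stop the original domain's QEMU and re-attach the original
     *    definition to the new process. */
    if (!(upgraded = qemuUpgradeAttachToNewProcess(dom)))
        goto error;

    /* 5. Resume vCPUs in the new process. */
    if (qemuUpgradeResume(upgraded) < 0)
        goto error;

    return upgraded;

error:
    /* rollback/cleanup omitted in this sketch */
    return NULL;
}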
I doubt other hypervisors would support changing the UUID; that's why I
made the new API specific to QEMU in the RFC patch. Another option is
that we start the QEMU process with the same UUID, but in the libvirt
domain object we use a different UUID. At last we drop the temp domain
object and attach to the destination QEMU process. In this way we avoid
changing the guest UUID in the hypervisor, thus the new API is feasible
for all hypervisors. However, this makes it even tougher to re-use the
existing migration framework.
Thanks and best regards!
_____________________________
Zhou Zheng Sheng / 周征晟
Software Engineer
E-mail: zhshzhou(a)cn.ibm.com
Telephone: 86-10-82454397