Hi Daniel,
I am happy that Libvirt is pushing local migration/live patching support, but
at the same time I am wondering what changed from what you said here:
https://www.redhat.com/archives/libvir-list/2017-September/msg00489.html
To give you some background, we have had live patching enhancements in IBM's backlog
for a few years now, and one of the reasons they were postponed time and time
again was the lack of Libvirt support and the stated direction of
"Libvirt is not interested in supporting it". The message above was being
used internally as the rationale for that.
Thanks,
DHB
On 2/3/20 9:43 AM, Daniel P. Berrangé wrote:
I'm (re-)sending this patch series on behalf of Shaju Abraham
<shaju.abraham(a)nutanix.com> who has tried to send this several times
already.
Red Hat's email infrastructure is broken, accepting the mails and then
failing to deliver them to mailman, or any other Red Hat address.
Unfortunately it means that while we can send comments back to Shaju
on this thread, subscribers will then probably fail to see any responses
Shaju tries to give :-( To say this is bad is an understatement. I have
yet another ticket open tracking & escalating this awful problem but
can't give any ETA on a fix :-(
Anyway, with that out of the way, here's Shaju's original cover letter
below....
1) What is this patch series about?
Local live migration of a VM means live migrating a VM instance within the
same node. Traditional libvirt live migration moves the VM from a source node
to a remote node. Local migrations are currently forbidden in Libvirt for
a myriad of reasons. This patch series enables local migration in Libvirt.
2) Why is local migration important?
The ability to live migrate a VM locally paves the way for hypervisor upgrades
without shutting down the VM. For example, to upgrade qemu after a security
fix, we can locally migrate the VM to the new qemu instance. By utilising
capabilities like "bypass-shared-memory" in qemu, such hypervisor upgrades
become faster.
3) Why is local migration difficult in Libvirt?
Libvirt always assumes that the name/UUID pair is unique within a node. During
a local migration there would be two different VMs with the same UUID/name pair,
which would confuse the management stack. There are other path variables, such
as the monitor path and config paths, that assume the name/UUID pair is unique,
so during migration the same monitor would be used by both the source and the
target. We cannot assign a temporary UUID to the target VM either, since the
UUID is part of the machine ABI, which is immutable.
To decouple the dependency on UUID/name, a new field (the domain id) is included
in all the PATHs that Libvirt uses. This ensures that every instance of the
VM gets a unique PATH.
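As a rough illustration of the idea (this is only a sketch, not code from the
series; the helper name and the directory layout are made-up examples), a
per-instance path can be derived from the domain id as well as the name:

    /* Illustrative only: build a monitor socket path that includes the
     * domain id, so the source and target instances of the same VM can
     * never collide on the same path. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>

    static char *
    example_monitor_path(const char *lib_dir, int domid, const char *name)
    {
        char *path = NULL;

        /* e.g. <lib_dir>/domain-42-guest1/monitor.sock */
        if (asprintf(&path, "%s/domain-%d-%s/monitor.sock",
                     lib_dir, domid, name) < 0)
            return NULL;

        return path;
    }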
4) How is the local migration designed?
Libvirt manages all the VM domain objects using two hash tables, indexed by
UUID and by name. During a regular live migration the domain entry on the
source node gets deleted and a new entry gets populated on the target node,
indexed by the same name/UUID. For a local migration there is no remote node;
the source and the target node are the same. So, in order to model the remote
node, two more hash tables are introduced which represent the remote node's
hash tables during the migration.
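As a rough sketch of the shape of this (the struct and field names below are
illustrative, not the actual virDomainObjList layout from the patches):

    /* Illustrative only: the existing per-node lookup tables plus two
     * extra tables that stand in for the "remote" node while a local
     * migration is in flight. */
    struct example_domain_obj_list {
        virHashTablePtr objs;            /* UUID -> domain object      */
        virHashTablePtr objsName;        /* name -> domain object      */

        virHashTablePtr remoteObjs;      /* UUID -> target-side object */
        virHashTablePtr remoteObjsName;  /* name -> target-side object */
    };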
The Libvirt migration involves 5 stages:
1) Begin
2) Prepare
3) Perform
4) Finish
5) Confirm
Begin, Perform and Confirm are executed on the source node, whereas Prepare
and Finish are executed on the target node. In the case of a local migration,
the Perform and Finish stages use the newly introduced 'remote hash tables' and
the rest of the stages use the 'source hash tables'. Once the migration has
completed, that is after the Confirm phase, the VM domain object is moved from
the 'remote hash tables' to the 'source hash tables'. This is required so that
other Libvirt commands like 'virsh list' can display all the VMs running on the
node.
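Roughly, the final step amounts to something like the sketch below (the
function and helper names are hypothetical, only meant to show the shape of
the operation):

    /* Illustrative only: once the Confirm phase succeeds, re-home the
     * domain object from the remote tables into the regular tables so
     * that 'virsh list' and other commands see it as usual. */
    static int
    example_local_migration_complete(struct example_domain_obj_list *list,
                                     virDomainObjPtr vm)
    {
        /* Hypothetical helper: remove 'vm' from the remote tables and
         * insert it into the source tables under the same UUID/name. */
        return example_move_remote_to_source(list, vm);
    }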
5) How to test local migration?
A new flag 'local' is added to the 'virsh migrate' command to enable local
migration. The syntax is:
virsh migrate --live --local 'domain-id' qemu+ssh://ip-address/system
6) What are the known issues?
SELinux policies are known to have issues with creating the /dev/hugepages
entries during VM launch. In order to test local migration, disable SELinux
using 'setenforce 0'.
Shaju Abraham (6):
Add VIR_MIGRATE_LOCAL flag to virsh migrate command
Introduce remote hash tables and helper routines
Add local migration support in QEMU Migration framework
Modify close callback routines to handle local migration
Make PATHs unique for a VM object instance
Move the domain object from remote to source hash table
include/libvirt/libvirt-domain.h | 6 +
src/conf/virdomainobjlist.c | 232 +++++++++++++++++++++++++++++--
src/conf/virdomainobjlist.h | 10 ++
src/libvirt_private.syms | 4 +
src/qemu/qemu_conf.c | 4 +-
src/qemu/qemu_domain.c | 28 +++-
src/qemu/qemu_domain.h | 2 +
src/qemu/qemu_driver.c | 46 +++++-
src/qemu/qemu_migration.c | 59 +++++---
src/qemu/qemu_migration.h | 5 +
src/qemu/qemu_migration_cookie.c | 121 ++++++++--------
src/qemu/qemu_migration_cookie.h | 2 +
src/qemu/qemu_process.c | 3 +-
src/qemu/qemu_process.h | 2 +
src/util/virclosecallbacks.c | 48 +++++--
src/util/virclosecallbacks.h | 3 +
tools/virsh-domain.c | 7 +
17 files changed, 471 insertions(+), 111 deletions(-)