On Mon, Aug 01, 2022 at 11:03:49AM -0500, Praveen K Paladugu wrote:
> Folks,
>
> We are implementing Live Migration support in the "ch" driver of
> libvirt. I'd like to confirm whether the approach we have chosen
> would be accepted upstream once implemented.
>
> Our immediate goal is to implement the "Hypervisor Native" +
> "Managed Direct" mode of migration. "Hypervisor Native" here refers
> to the VMM (ch) being responsible for the data flow. This is in
> contrast to TUNNELED migration, where data is sent over the libvirt
> RPC.
Avoiding TUNNELLED migration is a very good idea. This was a short-term
hack to work around the lack of TLS support in QEMU. It is more
efficient to have TLS natively integrated in the hypervisor layer than
in libvirt. IOW, "Hypervisor Native" is a good choice.
"Managed Direct" referring to virsh client responsible for control flow
between source and dest hosts. The libvirtd daemons on source and
destination do not have to communicate with each other. These modes are
described further at
https://libvirt.org/migration.html#network-data-transports.
I'd caution that 'managed direct' migration leaves you with fewer nice
options for ensuring resilience of the migration. IOW, if the client
application goes away, I think it'll be harder for the libvirt CH
driver to recover from that scenario.

Also, if a client app is using the DigitalOcean 'go-libvirt' API
instead of our 'libvirt-go-module' API, things are even more limited,
since the 'go-libvirt' API speaks the RPC protocol directly, bypassing
the libvirt.so logic related to the migration process steps.

With the peer-to-peer mode, migration can carry on even if the client
app goes away, since the client app isn't a part of the control loop.

So overall, I'd encourage peer-to-peer migration as the preferable
option, unless you can hand off absolutely everything to the CH code
and not have libvirt involved in orchestrating the migration steps at
all?
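For comparison with the sketch above, the peer-to-peer flavour only
needs the source connection; the source daemon then talks to the
destination itself. Again a sketch with placeholder URIs:

    package main

    import (
        "log"

        "libvirt.org/go/libvirt"
    )

    func main() {
        src, err := libvirt.NewConnect("ch://src-host/system")
        if err != nil {
            log.Fatal(err)
        }
        defer src.Close()

        dom, err := src.LookupDomainByName("demo")
        if err != nil {
            log.Fatal(err)
        }
        defer dom.Free()

        // MIGRATE_PEER2PEER: the source libvirtd connects to the
        // destination itself, so the migration keeps running even if
        // this client process goes away mid-way.
        err = dom.MigrateToURI("ch://dst-host/system",
            libvirt.MIGRATE_LIVE|libvirt.MIGRATE_PEER2PEER, "", 0)
        if err != nil {
            log.Fatal(err)
        }
    }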
> At the moment, Cloud-Hypervisor supports receiving migration data
> only on Unix Domain Sockets. Also, Cloud-Hypervisor does not encrypt
> the VM data while sending.
Hmm, that's quite limiting.
> We are considering forking "socat" processes as documented at
> https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/live_....
> The socat processes will be forked in the "Prepare" and "Perform"
> phases on the Destination and Source hosts respectively.
>
> I couldn't find any existing implementation in libvirt to connect
> Domain Sockets on different hosts. Please let me know if you'd
> recommend a different approach from forking socat processes to
> connect Domain Sockets on the source and dest hosts to allow Live VM
> Migration.
I think building something around socat will get you going quickly,
but ultimately be harmful over the long term.

Our experience with QEMU has been that to maximise performance you
need the lowest level in full control. These days QEMU can open
multiple TCP connections concurrently from multiple threads, so that
throughput isn't limited by the data copy performance of a single CPU.
It also has the ability to take advantage of kernel features like
zerocopy. Use of a socat proxy is going to add many data copies to the
transport, which can only harm your performance.
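To make the copy overhead concrete, here is roughly what each socat
hop amounts to, sketched in Go (paths and addresses are placeholders,
and this ignores the real CH handshake):

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        // Source-side relay: bridge the VMM's Unix socket to a TCP
        // connection (the destination runs the mirror image of this).
        unixConn, err := net.Dial("unix", "/run/ch-migrate.sock")
        if err != nil {
            log.Fatal(err)
        }
        defer unixConn.Close()

        tcpConn, err := net.Dial("tcp", "dst-host:4444")
        if err != nil {
            log.Fatal(err)
        }
        defer tcpConn.Close()

        // Every byte of guest state is read into and written back out
        // of these userspace buffers -- the extra copies that a TCP
        // endpoint terminated inside the VMM would avoid.
        go func() {
            if _, err := io.Copy(tcpConn, unixConn); err != nil {
                log.Print(err)
            }
        }()
        if _, err := io.Copy(unixConn, tcpConn); err != nil {
            log.Print(err)
        }
    }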
So my recommendation would be to invest time in first extending CH so
that it natively supports opening TCP connections, and then take
advantage of that in libvirt from the start. You then have the basic
foundation right, on which to add stuff like TLS, zerocopy,
multi-connection, and more.
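As a rough sketch of the shape that buys you (in Go for brevity; the
real work would be Rust inside CH, and the port and handler are
placeholders), the VMM terminates the TCP connection itself, with no
relay process in the data path:

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":4444")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            // Hand the raw connection straight to the (hypothetical)
            // migration receiver; TLS, multiple connections and
            // zerocopy can later be layered on this same endpoint.
            go receiveMigration(conn)
        }
    }

    // receiveMigration is a stub for the VMM's migration state machine.
    func receiveMigration(conn net.Conn) {
        defer conn.Close()
        _, _ = io.Copy(io.Discard, conn) // placeholder: consume the stream
    }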
With regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|