On Thu, Jun 22, 2023 at 05:33:29PM +0100, Daniel P. Berrangé wrote:
On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> I can try to move the todo even higher. Trying to list the initial goals
> here:
>
> - One extra phase of handshake between src/dst (maybe the time to boost
> QEMU_VM_FILE_VERSION) before anything else happens.
>
> - Dest shouldn't need to apply any cap/param, it should get all from src.
> Dest still needs to be set up with a URI and that should be all it needs.
>
> - Src shouldn't need to worry about the binary version of dst anymore as long
> as dest qemu supports handshake, because src can fetch it from dest.
I'm not sure that works in general. Even if we have a handshake and
bi-directional comms for live migration, we still have the save/restore
to file codepath to deal with. The dst QEMU doesn't exist at the time
the save process is done, so we can't add logic to VMState handling that
assumes knowledge of the dst version at time of serialization.
My current thought was still based on a new cap or anything the user would
need to specify first on both sides (but hopefully the last cap to set on
dest).
E.g. with a new handshake cap, we shouldn't set it on an exec: or file:
protocol migration; qmp_migrate() should just fail, reporting that the URI
is not supported when the cap is set. A return path is definitely required
here.
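
A minimal sketch of that check, written in Python rather than QEMU's C purely
to illustrate the rule. The function name, the capability flag and the list of
one-way schemes are assumptions made for this example, not existing QEMU code:

```python
# Hypothetical sketch: refuse the handshake capability on transports that
# have no return path. All names here are illustrative, not QEMU API.
ONE_WAY_SCHEMES = ("exec:", "file:")


def check_handshake_uri(uri: str, handshake_cap: bool) -> None:
    """Raise ValueError if the handshake cap is set on a one-way URI."""
    if handshake_cap and uri.startswith(ONE_WAY_SCHEMES):
        scheme = uri.split(":", 1)[0]
        raise ValueError(
            f"URI scheme '{scheme}:' is not supported with the handshake capability"
        )
```

With the cap unset, every URI passes, so the existing save/restore to file
path would be unaffected.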
> - Handshake can always fail gracefully if anything wrong happened, it
> normally should mean dest qemu is not compatible with src's setup (either
> machine, device, or migration configs) for whatever reason. Src should
> be able to get a solid error from dest if so.
>
> - Handshake protocol should always be self-bootstrap-able, it means when we
> change the handshake protocol it should always work with old binaries.
>
> - When src is newer it should be able to know what's missing on dest and
> skip the new bits.
>
> - When dst is newer it should all rely on src (which is older) and it
> should always understand src's language.
I'm not convinced it can reliably self-bootstrap in a backwards
compatible manner, precisely because the current migration stream
has no handshake and only requires a unidirectional channel.
Yes, please see above. I meant that when we grow the handshake protocol we
should make sure we don't need anything new to be set up on either src or
dst qemu. It won't apply to before-handshake binaries.
I don't think it's possible for QEMU to validate that it has a fully
bi-directional channel without adding timeouts to its detection, which I
think we should strive to avoid.
I don't think we actually need self-bootstrapping anyway.
I think the mgmt app can just indicate the new v2 bi-directional
protocol when issuing the 'migrate' and 'migrate-incoming'
commands. This becomes trivial when Het's refactoring of the
migrate address QAPI is accepted:
https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04851.html
eg:
  { "execute": "migrate",
    "arguments": {
        "channels": [ { "channeltype": "main",
                        "addr": { "transport": "socket",
                                  "type": "inet",
                                  "host": "10.12.34.9",
                                  "port": "1050" } } ] } }
note the 'channeltype' parameter here. If we declare that 'main'
refers to the existing migration protocol, then we merely need
to define a new 'channeltype' to use as an indicator for the
v2 migration handshake protocol.
Using a new channeltype would also work at least on src qemu, but I'm not
sure on how dest qemu would know that it needs a handshake in that case,
because it knows nothing until the connection is established.
Maybe we still need QEMU_VM_FILE_VERSION to be boosted at least in this
case, so dest can read this at the very beginning: old binaries will fail
immediately, and new binaries will start to talk the v2 language.
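
Roughly, the dest side would branch on the first eight bytes of the stream. A
Python sketch of that idea follows; the magic and the current version match
what QEMU uses today, while the bumped value 4, the function name and the
returned labels are assumptions for illustration only:

```python
import struct

QEMU_VM_FILE_MAGIC = 0x5145564D        # "QEVM", as in current QEMU
QEMU_VM_FILE_VERSION = 0x00000003      # current stream version
QEMU_VM_FILE_VERSION_V2 = 0x00000004   # hypothetical bumped value for v2


def classify_stream(header: bytes, supports_v2: bool) -> str:
    """Decide how dest should treat an incoming stream from its header.

    The header is two big-endian 32-bit words: magic, then version.
    """
    if len(header) < 8:
        raise ValueError("short migration stream header")
    magic, version = struct.unpack(">II", header[:8])
    if magic != QEMU_VM_FILE_MAGIC:
        raise ValueError("not a migration stream")
    if version == QEMU_VM_FILE_VERSION:
        return "v1"                    # existing one-way protocol
    if version == QEMU_VM_FILE_VERSION_V2 and supports_v2:
        return "v2-handshake"          # start the handshake phase
    # An old binary (supports_v2=False) ends up here for a v2 stream.
    raise ValueError(f"unsupported migration stream version {version}")
```

An old dest binary needs no change for this to work, since it already rejects
any version it does not recognise.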
> - All !main channels need to be established later than the handshake - if
> we're going to do this anyway we probably should do it altogether to make
> channels named, so each channel used in migration needs to have a common
> header. Prepare to deprecate the old tricks of channel orderings.
Once the primary channel involves a bi-directional handshake,
we'll trivially ensure ordering - similar to how the existing
code worked fine in TLS mode which had a bi-directional TLS
handshake.
I'm not sure I fully get it here.
IIUC the TLS handshake was mostly transparent to QEMU in this case, since
we're relying on gnutls_handshake(). Here IIUC we need to design the
round-trip messages ourselves to sync up the two qemus well.
The round-trip messages can contain a lot of things that can be useful to
us. Besides knowing what features dest supports and what caps src uses, we
can e.g. also provide a device tree dump from dest and try to match it on
src, failing the migration very early if we see any mismatch. Right now we
fail too late, only at device load (which is the last stage).
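
To make the early-failure idea concrete, here is a small Python sketch of what
src could do with such a dump. The data layout (device id mapped to the highest
VMState version_id the side handles) and the function name are assumptions for
illustration; a real dump would carry much more than this:

```python
def find_device_mismatches(src_devices: dict[str, int],
                           dst_devices: dict[str, int]) -> list[str]:
    """Return reasons why dest could not load what src is about to send.

    Both arguments map a device id to a VMState version_id (hypothetical
    layout). Only devices that src will send are checked.
    """
    problems = []
    for name, src_ver in sorted(src_devices.items()):
        if name not in dst_devices:
            problems.append(f"{name}: missing on dest")
        elif dst_devices[name] < src_ver:
            problems.append(
                f"{name}: dest handles up to version {dst_devices[name]}, "
                f"src sends version {src_ver}"
            )
    return problems
```

An empty list would let the migration proceed; anything else can be reported
to the user before a single page of RAM is sent.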
For channel ordering, I'd expect the v2 protocol to contain a phase that
negotiates the channels, and creation of named channels should be part of
the setup phase, before anything else happens.
Thanks,
--
Peter Xu