On Fri, Jun 23, 2023 at 09:23:18AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> > On Thu, Jun 22, 2023 at 10:59:58AM +0100, Daniel P. Berrangé wrote:
> > > I've mentioned several times before that the user should never need to
> > > set this multifd-channels parameter (nor many other parameters) on the
> > > destination in the first place.
> > >
> > > The QEMU migration stream should be changed to add a full
> > > bi-directional handshake, with negotiation of most parameters.
> > > IOW, the src QEMU should be configured with 16 channels, and
> > > it should connect the primary control channel, and then directly
> > > tell the dest that it wants to use 16 multifd channels.
> > >
> > > If we're expecting the user to pass this info across to the dest
> > > manually we've already spectacularly failed wrt user friendliness.
> >
> > I can try to move the todo even higher. Trying to list the initial goals
> > here:
> >
> > - One extra phase of handshake between src/dst (maybe the time to bump
> >   QEMU_VM_FILE_VERSION) before anything else happens.
> >
> > - Dest shouldn't need to apply any cap/param, it should get all from src.
> >   Dest still needs to be set up with a URI and that should be all it needs.
>
> There are a few that the dest will still need set explicitly. Specifically
> the TLS parameters - tls-authz and tls-creds, because those are both
> related to --object parameters configured on the dst QEMU. Potentially
> there's an argument to be made for the TLS parameters to be part of the
> initial 'migrate' and 'migrate-incoming' command data, as they're
> specifically related to the connection establishment, while (most) of
> the other params are related to the migration protocol running inside
> the connection.

Ideally we can even move the TLS setup to after the main connection is
established, IOW the gnutls handshake can be part of the generic handshake.
But yeah, I agree that may involve much more work, so for now we may start
by assuming the v2 handshake just happens on the already-built TLS channel.

I think the new protocol should allow extension, so that when we want to
move the TLS handshake into it, the v2 protocol can first detect whether
both the src and dst binaries support that, and switch to it if we want.
Then we can even get a src QEMU migration failure which says "dest QEMU
forgot to set up TLS credentials in cmdline", or which reports anything
else that went wrong on dest during TLS setup.
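
To make the extension idea concrete, here is a rough sketch of feature
detection with fallback. It is Python and purely illustrative: the message
types, the feature name "tls-in-handshake" and both functions are made up
for this example and are not QEMU code or an agreed wire format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical feature name, invented for this sketch.
FEATURE_TLS_IN_HANDSHAKE = "tls-in-handshake"


@dataclass
class Hello:
    """What src offers when it opens the v2 handshake (assumed shape)."""
    version: int
    features: frozenset


@dataclass
class Reply:
    """What dest answers: the features both sides support, or an error."""
    accepted: frozenset
    error: Optional[str] = None


def dest_reply(hello, dest_features, tls_creds_configured):
    # Only features known to both binaries can be used.
    accepted = hello.features & dest_features
    if FEATURE_TLS_IN_HANDSHAKE in accepted and not tls_creds_configured:
        # Dest can report its own setup problem back to src.
        return Reply(frozenset(), "dest: TLS credentials not configured")
    return Reply(accepted)


def src_plan(reply):
    if reply.error is not None:
        raise RuntimeError(f"migration failed: {reply.error}")
    if FEATURE_TLS_IN_HANDSHAKE in reply.accepted:
        return "tls-inside-handshake"
    # Old dest binary: keep doing TLS before the handshake starts.
    return "tls-before-handshake"
```

With an old dest that does not know the feature, src_plan() falls back to
the current ordering; with a new dest that lacks credentials, src gets an
error string it can show to the user.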

> A few other parameters are also related to the connection establishment,
> most notably the enablement of multifd, postcopy and postcopy-pre-empt.

As I mentioned in the list, I plan to make this part of the default v2,
where the v2 handshake will take care of managing the connections rather
than relying on the old code. I'm not sure how complicated it'll be, but the
v2 protocol sounds like a good fit for such a major change in how we set up
the channels, and a chance to get everything right from the start.
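
As a sketch of what "dest gets everything from src" could look like once the
handshake manages the connections (Python, illustrative only): the parameter
names multifd-channels, tls-creds and tls-authz come from this thread, while
the functions and the "one main channel plus N multifd channels" accounting
are assumptions of the example.

```python
# Parameters that refer to --object instances on the local QEMU and so
# cannot be taken from the peer (per the discussion above).
LOCAL_ONLY = {"tls-creds", "tls-authz"}


def dest_effective_params(src_params, dest_local):
    """Merge what src announced with what must stay local on dest."""
    # Take everything from src except the local-only keys...
    params = {k: v for k, v in src_params.items() if k not in LOCAL_ONLY}
    # ...and keep only the local-only keys from dest's own configuration.
    params.update({k: v for k, v in dest_local.items() if k in LOCAL_ONLY})
    return params


def expected_connections(params):
    # Assumption: one main/control channel plus the multifd channels.
    return 1 + params.get("multifd-channels", 0)
```

So a src configured with 16 channels would lead dest to wait for 17
connections without the user setting anything but the TLS objects on dest.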

> I think with those ones we don't need to set them on the src either.
> With the new migration handshake we should probably use multifd
> codepaths unconditionally, with a single channel.

The v2 handshake will be beneficial to !multifd as well. Right now I tend
to make it also work for !multifd. E.g., it always makes sense to do a
device tree comparison before migration, even if someone uses special
tunneling so that multifd cannot be enabled for whatever reason; all that's
needed is a return path so that both sides can talk.
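
A minimal sketch of such a pre-migration device tree comparison (Python,
illustrative only). It assumes each side can describe its devices as a
mapping from a device id to a type string; that representation and the
function name are invented for the example.

```python
def compare_device_trees(src_devices, dst_devices):
    """Return a list of human-readable mismatches (empty means compatible).

    Both arguments map a device id to a device type string (assumed format).
    """
    problems = []
    for dev_id in sorted(src_devices.keys() | dst_devices.keys()):
        src_type = src_devices.get(dev_id)
        dst_type = dst_devices.get(dev_id)
        if src_type is None:
            problems.append(f"{dev_id}: only on dest ({dst_type})")
        elif dst_type is None:
            problems.append(f"{dev_id}: missing on dest")
        elif src_type != dst_type:
            problems.append(f"{dev_id}: src has {src_type}, dest has {dst_type}")
    return problems
```

Nothing here depends on multifd; it only needs a channel in each direction,
so the mismatch list can be sent back before any guest state is transferred.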

> By matching with the introduction of new protocol, we have a nice point
> against which to deprecate the old non-multifd codepaths. We'll need to
> keep the non-multifd code around *a lot* longer than the normal
> deprecation cycle though, as we need mig to/from very old QEMUs.

I actually had a feeling that we should always keep it.. I'm not sure
whether we must tie a new handshake to "making multifd the default". I
do think we can make the !multifd path very simple though, e.g., I'd think
we should start considering deprecating things like !multifd+compression
etc. earlier than that.

> The enablement of postcopy could be automatic too - src & dst can
> both detect if their host OS supports it. That would make all
> migrations post-copy capable. The mgmt app just needs to trigger
> the switch to post-copy mode *if* they want to use it.

Sounds doable.
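
Roughly, the automatic enablement could look like the following (Python,
illustrative only; the class and function names are hypothetical, and how
each host actually probes for postcopy support is left out on purpose).

```python
def negotiate_postcopy(src_host_supports, dst_host_supports):
    # The migration is postcopy capable only if both hosts support it.
    return src_host_supports and dst_host_supports


class Migration:
    def __init__(self, postcopy_capable):
        self.postcopy_capable = postcopy_capable
        self.mode = "precopy"  # every migration starts in precopy

    def switch_to_postcopy(self):
        # Only triggered when the mgmt app explicitly asks for it.
        if not self.postcopy_capable:
            raise RuntimeError("postcopy not supported on both hosts")
        self.mode = "postcopy"
```

The capability is decided during the handshake; the mgmt app never enables
it up front and only decides whether to call switch_to_postcopy().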

> Likewise we can just always assume postcopy-pre-empt is available.
> I think 'return-path' becomes another one we can just assume too.

Right, the handshake cap (or the new QAPI replacement of the URI) should
already imply a return path.

Thanks,
--
Peter Xu