On Fri, Jun 23, 2023 at 10:51:53AM -0400, Peter Xu wrote:
On Fri, Jun 23, 2023 at 09:23:18AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> > On Thu, Jun 22, 2023 at 10:59:58AM +0100, Daniel P. Berrangé wrote:
> > > I've mentioned several times before that the user should never need
> > > to set this multifd-channels parameter (nor many other parameters) on the
> > > destination in the first place.
> > >
> > > The QEMU migration stream should be changed to add a full
> > > bi-directional handshake, with negotiation of most parameters.
> > > IOW, the src QEMU should be configured with 16 channels, and
> > > it should connect the primary control channel, and then directly
> > > tell the dest that it wants to use 16 multifd channels.
> > >
> > > If we're expecting the user to pass this info across to the dest
> > > manually we've already spectacularly failed wrt user friendliness.
> >
> > I can try to move the todo even higher. Trying to list the initial goals
> > here:
> >
> > - One extra phase of handshake between src/dst (maybe the time to boost
> >   QEMU_VM_FILE_VERSION) before anything else happens.
> >
> > - Dest shouldn't need to apply any cap/param, it should get all from src.
> >   Dest still needs to be set up with a URI and that should be all it needs.
>
> There are a few that the dest will still need set explicitly. Specifically
> the TLS parameters - tls-authz and tls-creds, because those are both
> related to --object parameters configured on the dst QEMU. Potentially
> there's an argument to be made for the TLS parameters to be part of the
> initial 'migrate' and 'migrate-incoming' command data, as they're
> specifically related to the connection establishment, while (most) of
> the other params are related to the migration protocol running inside
> the connection.

Ideally we can even make the tls options apply after the main connection is
established, IOW the gnutls handshake can be part of the generic handshake.
But yeah I agree that may involve much more work, so we may start with
assuming the v2 handshake just happens on the tls channel built for now.

I think the new protocol should allow extension so when we want to move the
tls handshake into it, the v2 protocol should be able to first detect src/dst
binary support of that, and switch to that if we want - then we can even
get a src qemu migration failure which tells "dest qemu forgot to set up tls
credentials in cmdlines", or anything wrong on dest during tls setup.

Doing negotiated "upgrades" from plain to TLS mode is generally frowned
upon, as it opens up potentially dangerous attack routes which can prevent
the upgrade from happening.

If the user/app controlling the client and server side of a connection
both know they want TLS, the best practice is for a connection to start
in TLS mode *immediately*, never sending any data in the clear. We have
this ability in QEMU right now, with libvirt explicitly enabling TLS
mode on src + dst, and we should keep that in any v2 migration protocol.
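To make the downgrade risk concrete, here is a toy model of it. This is not QEMU code and uses no real sockets or crypto; the function names and the capability-set format are invented for illustration. The assumption is that in the "upgrade" design the capability offer travels in cleartext, so anyone on the path can rewrite it.

```python
def negotiate_upgrade(client_caps, server_caps, tamper=None):
    """Opportunistic upgrade: capabilities are exchanged in the clear first.

    `tamper` models an on-path attacker who can rewrite the cleartext offer.
    """
    offered = set(client_caps)
    if tamper is not None:
        offered = set(tamper(offered))
    # Both sides fall back to plaintext when TLS is not mutually advertised.
    return "tls" if "tls" in offered and "tls" in server_caps else "plain"


def strip_tls(caps):
    """Attacker removes the TLS capability from the cleartext offer."""
    return caps - {"tls"}


def connect_tls_only(peer_speaks_tls):
    """TLS from the first byte: there is no cleartext offer to rewrite."""
    if not peer_speaks_tls:
        raise ConnectionError("TLS handshake failed, refusing plaintext fallback")
    return "tls"
```

With the attacker in place, negotiate_upgrade(..., tamper=strip_tls) quietly ends up in "plain", while the TLS-only path can only succeed in TLS or fail loudly.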
> A few other parameters are also related to the connection establishment,
> most notably the enablement of multifd, postcopy and postcopy-pre-empt.

As I mentioned in the list, I plan to make this part of the default v2
where the v2 handshake will take care of managing the connections rather than
relying on the old code. I'm not sure how complicated it'll be, but the v2
protocol just sounds like a good fit for such a major change in how we
set up the channels, and a chance to get everything right from the start.
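As a rough illustration of the handshake discussed in this thread, where the src announces its parameters and the dest keeps only its locally configured TLS settings, a sketch might look like the following. The JSON message, the "version" field and both function names are hypothetical and not an existing QEMU wire format; only the parameter names multifd-channels, tls-creds and tls-authz come from the discussion above.

```python
import json

# Parameters tied to --object instances on each QEMU, so they stay local.
LOCAL_ONLY = {"tls-creds", "tls-authz"}


def build_hello(src_params):
    """Source side: serialise every negotiable parameter into one message."""
    negotiable = {k: v for k, v in src_params.items() if k not in LOCAL_ONLY}
    return json.dumps({"version": 2, "params": negotiable}).encode()


def apply_hello(dst_params, hello):
    """Destination side: adopt the source's parameters, keep local TLS ones."""
    msg = json.loads(hello.decode())
    if msg.get("version") != 2:
        raise ValueError("unsupported handshake version")
    merged = dict(dst_params)
    for key, value in msg["params"].items():
        if key in LOCAL_ONLY:
            raise ValueError(f"source may not set {key}")
        merged[key] = value
    return merged
```

For example, a src configured with multifd-channels=16 would leave the dest with 16 channels after apply_hello(), without the user ever setting that value on the dest, while the dest's own tls-creds and tls-authz stay untouched.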
>
> I think with those ones we don't need to set them on the src either.
> With the new migration handshake we should probably use multifd
> codepaths unconditionally, with a single channel.

The v2 handshake will be beneficial to !multifd as well. Right now I tend
to make it also work for !multifd, e.g., it always makes sense to do a
device tree comparison before migration, even if someone used special
tunneling so that multifd cannot be enabled for whatever reason, as long
as a return path is available so the two sides can talk.
> By matching with the introduction of new protocol, we have a nice point
> against which to deprecate the old non-multifd codepaths. We'll need to
> keep the non-multifd code around *a lot* longer than the normal
> deprecation cycle though, as we need mig to/from very old QEMUs.

I actually had a feeling that we should always keep it.. I'm not sure
whether we must combine a new handshake with "making multifd the default". I
do think we can make the !multifd path very simple though, e.g., I'd think
we should start considering deprecating things like !multifd+compression etc
earlier than that.

My thought is that right now QEMU effectively has two completely
distinct migration protocols (even though they share the initial
phase), and two distinct internal code paths, one for traditional
single channel and one for the multifd.

IIUC, Juan expressed a desire to get rid of non-multifd migration.
The deprecation of compression, with the message that users should
use compression on multifd, just reinforces the view that non-multifd
is a dead end.

If we go for a new v2 protocol, we're adding another network protocol
to the testing matrix. If we support the new protocol for both non-multifd
and multifd, then we've got two additions to the testing matrix instead of
just one extra. I think that's a bad idea if we're intending to get rid
of non-multifd codepaths in future.

The deprecation period for getting rid of non-multifd will be long, so
the sooner we start that the better. The deprecation period for getting
rid of the v1 protocol and moving exclusively to v2 will be similarly long.
Overall I think it is better for us to align the two and keep non-multifd
strictly v1 only.
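The testing-matrix arithmetic above can be spelled out with a small enumeration. This is only a counting aid under the assumption that each (protocol, mode) pair needs its own coverage; the labels are made up for the example.

```python
from itertools import product

PROTOCOLS_TODAY = ["v1"]
MODES = ["non-multifd", "multifd"]


def matrix(v2_modes):
    """Return the set of (protocol, mode) pairs that need test coverage."""
    today = set(product(PROTOCOLS_TODAY, MODES))
    return today | {("v2", mode) for mode in v2_modes}


today = matrix([])                      # v1 only
v2_everywhere = matrix(MODES)           # v2 for both modes
v2_multifd_only = matrix(["multifd"])   # non-multifd stays strictly v1

print(len(v2_everywhere) - len(today))    # extra combinations: 2
print(len(v2_multifd_only) - len(today))  # extra combinations: 1
```

Keeping non-multifd on v1 only adds a single new combination, and that combination is the one intended to survive once the deprecations complete.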
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|