On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
> > > On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
> > > > USO features of virtio-net device depend on kernel ability
> > > > to support them, for backward compatibility by default the
> > > > features are disabled on 8.0 and earlier.
> > > >
> > > > Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
> > > > Signed-off-by: Andrew Melnychecnko <andrew(a)daynix.com>
> > >
> > > Looks like this patch broke migration when the VM starts on a host
> > > that has USO supported, to another host that doesn't..
> >
> > This was always the case with all offloads. The answer at the moment is,
> > don't do this.
>
> May I ask for my understanding:
> "don't do this" = don't automatically enable/disable virtio features
> in QEMU depending on host kernel features, or "don't do this" = don't
> try to migrate between machines that have different host kernel
> features?
>
> > Long term, we need to start exposing management APIs
> > to discover this, and management has to disable unsupported features.
>
> Ack, this likely needs some treatments from the libvirt side, too.

When QEMU automatically toggles machine type features based on host
kernel, relying on libvirt to then disable them again is impractical,
as we cannot assume that the libvirt people are using knows about
newly introduced features. Even if libvirt is updated to know about
it, people can easily be using a previous libvirt release.

QEMU itself needs to make the machine types do what they are there
to do, which is to define a stable machine ABI.

What QEMU is missing here is a "platform ABI" concept, to encode
sets of features which are tied to specific platform generations.
As long as we don't have that we'll keep having these broken
migration problems from machine types dynamically changing instead
of providing a stable guest ABI.
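
To make the "platform ABI" idea concrete, here is a minimal sketch in
Python, purely as an illustration. Every name in it (PLATFORM_ABI,
resolve_features, the "platform-N" labels and the feature strings) is
hypothetical and does not correspond to any existing QEMU or libvirt
interface. The point it tries to show is that the guest-visible feature
set would be derived only from the chosen platform generation, and a host
that cannot provide it would refuse to start the guest instead of quietly
dropping features.

```python
from __future__ import annotations


class MissingHostFeature(Exception):
    """Raised when the host cannot provide what the platform generation requires."""


# Hypothetical table: each platform generation pins the set of
# host-dependent features a guest is allowed to see.
PLATFORM_ABI = {
    "platform-1": frozenset({"tso", "ufo"}),
    "platform-2": frozenset({"tso", "ufo", "uso"}),
}


def resolve_features(platform: str, host_features: set[str]) -> frozenset[str]:
    """Return the guest-visible features for a platform generation.

    The result depends only on `platform`. The host is consulted solely
    to check that it can satisfy the requirement, never to add or drop
    features on its own.
    """
    required = PLATFORM_ABI[platform]
    missing = required - host_features
    if missing:
        # Fail loudly rather than silently changing the guest ABI.
        raise MissingHostFeature(
            f"{platform} needs {sorted(missing)} which this host lacks"
        )
    return required


if __name__ == "__main__":
    new_host = {"tso", "ufo", "uso"}
    old_host = {"tso", "ufo"}

    # Same platform generation gives the same guest ABI on both hosts.
    print(sorted(resolve_features("platform-1", new_host)))
    print(sorted(resolve_features("platform-1", old_host)))

    # A guest pinned to platform-2 cannot start on the old host at all.
    try:
        resolve_features("platform-2", old_host)
    except MissingHostFeature as err:
        print("refused:", err)
```

Under these assumptions, a guest started with "platform-1" on the newer
host would never see the extra feature, so migrating it to the older host
would not change what the guest observes.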

Any more elaboration on this idea? Would it be easy to implement?

I'd second any sane solution that lets us avoid similar breakages in the
future.

I also wonder what else might be affected in the same way, where
migration can break because of a changed kernel or changed HW. I suppose
the CPU model is well covered by libvirt, so we're fine at least on x86
etc. And IIUC KVM always has this in mind: KVM makes sure not to break
userspace in such a way, or else it is simply a KVM bug and gets fixed.

Thanks,
--
Peter Xu