On 2024/07/30 12:17, Jason Wang wrote:
On Tue, Jul 30, 2024 at 11:12 AM Akihiko Odaki
<akihiko.odaki(a)daynix.com> wrote:
>
> On 2024/07/30 12:03, Jason Wang wrote:
>> On Tue, Jul 30, 2024 at 10:57 AM Akihiko Odaki <akihiko.odaki(a)daynix.com>
wrote:
>>>
>>> On 2024/07/30 11:04, Jason Wang wrote:
>>>> On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki
<akihiko.odaki(a)daynix.com> wrote:
>>>>>
>>>>> On 2024/07/29 23:29, Peter Xu wrote:
>>>>>> On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
>>>>>>> On 2024/07/29 12:50, Jason Wang wrote:
>>>>>>>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki
<akihiko.odaki(a)daynix.com> wrote:
>>>>>>>>>
>>>>>>>>> On 2024/07/27 5:47, Peter Xu wrote:
>>>>>>>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel
P. Berrangé wrote:
>>>>>>>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400,
Peter Xu wrote:
>>>>>>>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100,
Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM
+0200, Thomas Huth wrote:
>>>>>>>>>>>>>> On 26/07/2024 08.08, Michael S.
Tsirkin wrote:
>>>>>>>>>>>>>>> On Thu, Jul 25, 2024 at
06:18:20PM -0400, Peter Xu wrote:
>>>>>>>>>>>>>>>> On Tue, Aug 01, 2023 at
01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>>>>>>>>>> USO features of
virtio-net device depend on kernel ability
>>>>>>>>>>>>>>>>> to support them, for
backward compatibility by default the
>>>>>>>>>>>>>>>>> features are disabled
on 8.0 and earlier.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Signed-off-by: Yuri
Benditovich <yuri.benditovich(a)daynix.com>
>>>>>>>>>>>>>>>>> Signed-off-by: Andrew
Melnychecnko <andrew(a)daynix.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looks like this patch
broke migration when the VM starts on a host that has
>>>>>>>>>>>>>>>> USO supported, to another
host that doesn't..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This was always the case with
all offloads. The answer at the moment is,
>>>>>>>>>>>>>>> don't do this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> May I ask for my understanding:
>>>>>>>>>>>>>> "don't do this" =
don't automatically enable/disable virtio features in QEMU
>>>>>>>>>>>>>> depending on host kernel
features, or "don't do this" = don't try to migrate
>>>>>>>>>>>>>> between machines that have
different host kernel features?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Long term, we need to start
exposing management APIs
>>>>>>>>>>>>>>> to discover this, and
management has to disable unsupported features.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ack, this likely needs some
treatments from the libvirt side, too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When QEMU automatically toggles
machine type featuers based on host
>>>>>>>>>>>>> kernel, relying on libvirt to then
disable them again is impractical,
>>>>>>>>>>>>> as we cannot assume that the libvirt
people are using knows about
>>>>>>>>>>>>> newly introduced features. Even if
libvirt is updated to know about
>>>>>>>>>>>>> it, people can easily be using a
previous libvirt release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> QEMU itself needs to make the machine
types do that they are there
>>>>>>>>>>>>> todo, which is to define a stable
machine ABI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What QEMU is missing here is a
"platform ABI" concept, to encode
>>>>>>>>>>>>> sets of features which are tied to
specific platform generations.
>>>>>>>>>>>>> As long as we don't have that
we'll keep having these broken
>>>>>>>>>>>>> migration problems from machine types
dynamically changing instead
>>>>>>>>>>>>> of providing a stable guest ABI.
>>>>>>>>>>>>
>>>>>>>>>>>> Any more elaboration on this idea? Would
it be easily feasible in
>>>>>>>>>>>> implementation?
>>>>>>>>>>>
>>>>>>>>>>> In terms of launching QEMU I'd imagine:
>>>>>>>>>>>
>>>>>>>>>>> $QEMU -machine pc-q35-9.1 -platform
linux-6.9 ...args...
>>>>>>>>>>>
>>>>>>>>>>> Any virtual machine HW features which are
tied to host kernel features
>>>>>>>>>>> would have their defaults set based on the
requested -platform. The
>>>>>>>>>>> -machine will be fully invariant wrt the host
kernel.
>>>>>>>>>>>
>>>>>>>>>>> You would have -platform hlep to list
available platforms, and
>>>>>>>>>>> corresonding QMP "query-platforms"
command to list what platforms
>>>>>>>>>>> are supported on a given host OS.
>>>>>>>>>>>
>>>>>>>>>>> Downstream distros can provide their own
platforms definitions
>>>>>>>>>>> (eg "linux-rhel-9.5") if they have
kernels whose feature set
>>>>>>>>>>> diverges from upstream due to backports.
>>>>>>>>>>>
>>>>>>>>>>> Mgmt apps won't need to be taught about
every single little QEMU
>>>>>>>>>>> setting whose default is derived from the
kernel. Individual
>>>>>>>>>>> defaults are opaque and controlled by the
requested platform.
>>>>>>>>>>>
>>>>>>>>>>> Live migration has clearly defined semantics,
and mgmt app can
>>>>>>>>>>> use query-platforms to validate two hosts are
compatible.
>>>>>>>>>>>
>>>>>>>>>>> Omitting -platform should pick the very
latest platform that is
>>>>>>>>>>> cmpatible with the current host (not
neccessarily the latest
>>>>>>>>>>> platform built-in to QEMU).
>>>>>>>>>>
>>>>>>>>>> This seems to add one more layer to maintain, and
so far I don't know
>>>>>>>>>> whether it's a must.
>>>>>>>>>>
>>>>>>>>>> To put it simple, can we simply rely on qemu
cmdline as "the guest ABI"? I
>>>>>>>>>> thought it was mostly the case already, except
some extremely rare
>>>>>>>>>> outliers.
>>>>>>>>>>
>>>>>>>>>> When we have one host that boots up a VM using:
>>>>>>>>>>
>>>>>>>>>> $QEMU1 $cmdline
>>>>>>>>>>
>>>>>>>>>> Then another host boots up:
>>>>>>>>>>
>>>>>>>>>> $QEMU2 $cmdline -incoming XXX
>>>>>>>>>>
>>>>>>>>>> Then migration should succeed if $cmdline is
exactly the same, and the VM
>>>>>>>>>> can boot up all fine without errors on both
sides.
>>>>>>>>>>
>>>>>>>>>> AFAICT this has nothing to do with what kernel is
underneath, even not
>>>>>>>>>> Linux? I think either QEMU1 / QEMU2 has the
option to fail. But if it
>>>>>>>>>> didn't, I thought the ABI should be
guaranteed.
>>>>>>>>>>
>>>>>>>>>> That's why I think this is a migration
violation, as 99.99% of other device
>>>>>>>>>> properties should be following this rule. The
issue here is, we have the
>>>>>>>>>> same virtio-net-pci cmdline on both sides in this
case, but the ABI got
>>>>>>>>>> break.
>>>>>>>>>>
>>>>>>>>>> That's also why I was suggesting if the
property contributes to the guest
>>>>>>>>>> ABI, then AFAIU QEMU needs to:
>>>>>>>>>>
>>>>>>>>>> - Firstly, never quietly flipping any bit
that affects the ABI...
>>>>>>>>>>
>>>>>>>>>> - Have a default value of off, then QEMU
will always allow the VM to boot
>>>>>>>>>> by default, while advanced users can
opt-in on new features. We can't
>>>>>>>>>> make this ON by default otherwise some
VMs can already fail to boot,
>>>>>>>>>
>>>>>>>>> It may not be necessary the case that old features
are supported by
>>>>>>>>> every systems. In an extreme case, a user may migrate
a VM from Linux to
>>>>>>>>> Windows, which probably doesn't support any
offloading at all. A more
>>>>>>>>> convincing scenario is RSS offloading with eBPF;
using eBPF requires a
>>>>>>>>> privilege so we cannot assume it is always available
even on the latest
>>>>>>>>> version of Linux.
>>>>>>>>
>>>>>>>> I don't get why eBPF matters here. It is something
that is not noticed
>>>>>>>> by the guest and we have a fallback anyhow.
>>>>>
>>>>> It is noticeable for the guest, and the fallback is not effective
with
>>>>> vhost.
>>>>
>>>> It's a bug then. Qemu can fallback to tuntap if it sees issues in
vhost.
>>>
>>> We can certainly fallback to in-QEMU RSS by disabling vhost, but I would
>>> not say lack of such fallback is a bug.
>>
>> Such fallback is by design since the introduction of vhost.
>>
>>> We don't provide in-QEMU
>>> fallback for other offloads.
>>
>> Yes but what I want to say is that eBPF RSS is different from those
>> segmentation offloads. And technically, Qemu can do fallback for
>> offloads (as RSC did).
>
> Well, I couldn't find any code disabling vhost for the in-QEMU RSC
> implementation.
It should be a bug (and I remember we disabled vhost when the patches
were merged). Have you tested it in a guest to see if it can see RSC
when vhost is enabled?
I suspect we need to add the RSC bit into current kernel_feature_bits:
/* Features supported by host kernel. */
static const int kernel_feature_bits[] = {
VIRTIO_F_NOTIFY_ON_EMPTY,
VIRTIO_RING_F_INDIRECT_DESC,
VIRTIO_RING_F_EVENT_IDX,
VIRTIO_NET_F_MRG_RXBUF,
VIRTIO_F_VERSION_1,
VIRTIO_NET_F_MTU,
VIRTIO_F_IOMMU_PLATFORM,
VIRTIO_F_RING_PACKED,
VIRTIO_F_RING_RESET,
VIRTIO_NET_F_HASH_REPORT,
VHOST_INVALID_FEATURE_BIT
};
As RSC won't be provided by TUN/TAP anyhow.
Adding the RSC bit does not let QEMU disable vhost for RSC, but instead
it implicitly disables RSC in my understanding. It is still better than
advertising the availability of that feature while it is missing.
>
> Looking at the code, I also found the case of vhost-vdpa. vhost can be
> simply disabled if it is backed by tuntap, but it is not the case for vDPA.
True, technically, vDPA can fallback to SVQ, but it's another topic.
My point of this discussion is that we cannot enable features just
because they are sufficiently old or because the user claims QEMU runs
on Linux sufficiently new. eBPF requires privilege, and vDPA requires
hardware feature. A fallback is not a silver bullet either, and there
are situations that providing a fallback is not a trivial task.
Regards,
Akihiko Odaki