Re: [PATCH v2 4/4] virtio-net: Add support for USO features

Thursday, 1 August 2024

On 2024/08/01 11:28, Jason Wang wrote:
...
 On Wed, Jul 31, 2024 at 8:58 PM Peter Xu <peterx(a)redhat.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:41:00AM -0400, Michael S. Tsirkin wrote:
>> On Wed, Jul 31, 2024 at 08:04:24AM +0100, Daniel P. Berrangé wrote:
>>> On Tue, Jul 30, 2024 at 05:32:48PM -0400, Michael S. Tsirkin wrote:
>>>> On Tue, Jul 30, 2024 at 04:03:53PM -0400, Peter Xu wrote:
>>>>> On Tue, Jul 30, 2024 at 03:22:50PM -0400, Michael S. Tsirkin wrote:
>>>>>> This is not what we did historically. Why should we start now?
>>>>>
>>>>> It's a matter of whether we still want migration to randomly fail, like
>>>>> what this patch does.
>>>>>
>>>>> Or any better suggestions?  I'm definitely open to that.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Peter Xu
>>>>
>>>> Randomly is an overstatement. You need to switch between kernels
>>>> where this feature differs. We did it with a ton of features
>>>> in the past, donnu why we single out USO now.
>>>
>>> This has been a problem with a ton of features in the past. We've
>>> ignored the problem, but that doesn't make it the right solution
>>>
>>> With regards,
>>> Daniel
>>
>> Pushing it to domain xml does not really help,
>> migration will still fail unexpectedly (after wasting
>> a ton of resources copying memory, and getting
>> a downtime bump, I might add).
>
> Could you elaborate why it would fail if with what I proposed?
>
> Note that if this is a generic comment about "any migration can fail if we
> found a device mismatch", we have plan to fix that to some degree. It's
> just that we don't have enough people working on these topics yet. See:
>
> https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
>
> It includes:
>
>   "Check device tree on both sides, etc., to make sure the migration is
>    applicable. E.g., we should fail early and clearly on any device
>    mismatch."
>
> However I don't think it'll cover all checks, e.g. I _think_ even if we
> verify VMSDs then post_load() hooks can still fail, and there can be some
> corner cases to think.  And of course, this may not even apply to virtio
> since virtio manages migration itself, without providing a top-level vmsd.
>
>>
>> The right solution is to have a tool that can query
>> backends, and that given the results from all of the cluster,
>> generate a set of parameters that will ensure migration works.

 This seems to be very hard for vhost-users.

Can you elaborate more? I was thinking of something like the following:
1. Prepare a QEMU command line.
2. Run the command line with -dump-platform appended on all hosts, which
dumps the platform features that are automatically enabled. For virtio
devices, we can dump the "host_features" variable.
3. Run the command line with -merge-platform appended, given all the dumps.
For most virtio devices, this would be an AND operation on the
"host_features" variable.
4. Run the command line with -use-platform appended, given the merged
dump. This will run VMs with only the features available on all hosts.

I may have missed something but this seems good enough for me. Of course 
this requires changes throughout the stack (QEMU common and 
device-specific code, libvirt, and even higher layers like OpenStack).
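
A minimal sketch of the merge in step 3, assuming each per-host dump is a
mapping from a device ID to its "host_features" bitmask. The dump format,
the feature bit names, and merge_host_features() are all hypothetical and
only illustrate the AND operation; none of them is an existing QEMU
interface.

```python
from functools import reduce
from operator import and_

# Hypothetical feature bits, for illustration only (not real virtio bit numbers).
FEAT_A = 1 << 0
FEAT_B = 1 << 1
FEAT_C = 1 << 2


def merge_host_features(dumps):
    """AND together the host_features bitmask of each device across hosts.

    dumps: list of dicts, one per host, mapping device ID -> bitmask.
    Returns a dict with the features available on every host.
    """
    if not dumps:
        raise ValueError("at least one dump is required")

    devices = set(dumps[0])
    for dump in dumps[1:]:
        # The same command line is used everywhere, so the device set must match.
        if set(dump) != devices:
            raise ValueError("hosts report different device sets")

    return {dev: reduce(and_, (dump[dev] for dump in dumps)) for dev in devices}


# Host 1 offers A, B and C; host 2 lacks C (think of an older kernel).
host1 = {"net0": FEAT_A | FEAT_B | FEAT_C}
host2 = {"net0": FEAT_A | FEAT_B}
merged = merge_host_features([host1, host2])
```

Here merged["net0"] keeps only FEAT_A and FEAT_B, so a VM started with the
merged dump would not expose FEAT_C on either host.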

Regards,
Akihiko Odaki
