Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration

Tuesday, 2 June 2020

On Wed, 29 Apr 2020 20:39:50 -0400
Yan Zhao <yan.y.zhao(a)intel.com&gt; wrote:

...
 On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert
wrote:
 <snip>
 > > > > > > > > > > > > > An mdev type is meant to
define a software compatible interface, so in
 > > > > > > > > > > > > > the case of
mdev->mdev migration, doesn't migrating to a different type
 > > > > > > > > > > > > > fail the most basic of
compatibility tests that we expect userspace to
 > > > > > > > > > > > > > perform?  IOW, if two
mdev types are migration compatible, it seems a
 > > > > > > > > > > > > > prerequisite to that is
that they provide the same software interface,
 > > > > > > > > > > > > > which means they should
be the same mdev type.
 > > > > > > > > > > > > >
 > > > > > > > > > > > > > In the hybrid cases of
mdev->phys or phys->mdev, how does a  
 > > > > > > > > > > > > management  
 > > > > > > > > > > > > > tool begin to even guess
what might be compatible?  Are we expecting
 > > > > > > > > > > > > > libvirt to probe ever
device with this attribute in the system?  Is
 > > > > > > > > > > > > > there going to be a new
class hierarchy created to enumerate all
 > > > > > > > > > > > > > possible migrate-able
devices?
 > > > > > > > > > > > > >  
 > > > > > > > > > > > > yes, management tool needs to
guess and test migration compatible
 > > > > > > > > > > > > between two devices. But I
think it's not the problem only for
 > > > > > > > > > > > > mdev->phys or
phys->mdev. even for mdev->mdev, management tool needs
 > > > > > > > > > > > > to
 > > > > > > > > > > > > first assume that the two
mdevs have the same type of parent devices
 > > > > > > > > > > > > (e.g.their pciids are equal).
otherwise, it's still enumerating
 > > > > > > > > > > > > possibilities.
 > > > > > > > > > > > > 
 > > > > > > > > > > > > on the other hand, for two
mdevs,
 > > > > > > > > > > > > mdev1 from pdev1, its
mdev_type is 1/2 of pdev1;
 > > > > > > > > > > > > mdev2 from pdev2, its
mdev_type is 1/4 of pdev2;
 > > > > > > > > > > > > if pdev2 is exactly 2 times
of pdev1, why not allow migration between
 > > > > > > > > > > > > mdev1 <-> mdev2.  
 > > > > > > > > > > > 
 > > > > > > > > > > > How could the manage tool figure
out that 1/2 of pdev1 is equivalent 
 > > > > > > > > > > > to 1/4 of pdev2? If we really want
to allow such thing happen, the best
 > > > > > > > > > > > choice is to report the same mdev
type on both pdev1 and pdev2.  
 > > > > > > > > > > I think that's exactly the value of
this migration_version interface.
 > > > > > > > > > > the management tool can take advantage
of this interface to know if two
 > > > > > > > > > > devices are migration compatible, no
matter they are mdevs, non-mdevs,
 > > > > > > > > > > or mix.
 > > > > > > > > > > 
 > > > > > > > > > > as I know, (please correct me if not
right), current libvirt still
 > > > > > > > > > > requires manually generating mdev
devices, and it just duplicates src vm
 > > > > > > > > > > configuration to the target vm.
 > > > > > > > > > > for libvirt, currently it's always
phys->phys and mdev->mdev (and of the
 > > > > > > > > > > same mdev type).
 > > > > > > > > > > But it does not justify that hybrid
cases should not be allowed. otherwise,
 > > > > > > > > > > why do we need to introduce this
migration_version interface and leave
 > > > > > > > > > > the judgement of migration
compatibility to vendor driver? why not simply
 > > > > > > > > > > set the criteria to something like
"pciids of parent devices are equal,
 > > > > > > > > > > and mdev types are equal" ?
 > > > > > > > > > > 
 > > > > > > > > > >   
 > > > > > > > > > > > btw mdev<->phys just brings
trouble to upper stack as Alex pointed out.   
 > > > > > > > > > > could you help me understand why it
will bring trouble to upper stack?
 > > > > > > > > > > 
 > > > > > > > > > > I think it just needs to read src
migration_version under src dev node,
 > > > > > > > > > > and test it in target migration version
under target dev node. 
 > > > > > > > > > > 
 > > > > > > > > > > after all, through this interface we
just help the upper layer
 > > > > > > > > > > knowing available options through
reading and testing, and they decide
 > > > > > > > > > > to use it or not.
 > > > > > > > > > >   
 > > > > > > > > > > > Can we simplify the requirement by
allowing only mdev<->mdev and 
 > > > > > > > > > > > phys<->phys migration? If an
customer does want to migrate between a 
 > > > > > > > > > > > mdev and phys, he could wrap
physical device into a wrapped mdev 
 > > > > > > > > > > > instance (with the same type as
the source mdev) instead of using vendor 
 > > > > > > > > > > > ops. Doing so does add some burden
but if mdev<->phys is not dominant 
 > > > > > > > > > > > usage then such tradeoff might be
worthywhile...
 > > > > > > > > > > >  
 > > > > > > > > > > If the interfaces for phys<->phys
and mdev<->mdev are consistent, it makes no
 > > > > > > > > > > difference to phys<->mdev,
right?
 > > > > > > > > > > I think the vendor string for a mdev
device is something like:
 > > > > > > > > > > "Parent PCIID + mdev type +
software version", and
 > > > > > > > > > > that for a phys device is something
like:
 > > > > > > > > > > "PCIID + software version".
 > > > > > > > > > > as long as we don't migrate between
devices from different vendors, it's
 > > > > > > > > > > easy for vendor driver to tell if a
phys device is migration compatible
 > > > > > > > > > > to a mdev device according it supports
it or not.  
 > > > > > > > > > 
 > > > > > > > > > It surprises me that the PCIID matching is a
requirement; I'd assumed
 > > > > > > > > > with this clever mdev name setup that you
could migrate between two
 > > > > > > > > > different models in a series, or to a newer
model, as long as they
 > > > > > > > > > both supported the same mdev view.
 > > > > > > > > >   
 > > > > > > > > hi Dave
 > > > > > > > > the migration_version string is transparent to
userspace, and is
 > > > > > > > > completely defined by vendor driver.
 > > > > > > > > I put it there just as an example of how vendor
driver may implement it.
 > > > > > > > > e.g.
 > > > > > > > > the src migration_version string is "src
PCIID + src software version", 
 > > > > > > > > then when this string is write to target
migration_version node,
 > > > > > > > > the vendor driver in the target device will
compare it with its own
 > > > > > > > > device info and software version.
 > > > > > > > > If different models are allowed, the write just
succeeds even
 > > > > > > > > PCIIDs in src and target are different.
 > > > > > > > > 
 > > > > > > > > so, it is the vendor driver to define whether two
devices are able to
 > > > > > > > > migrate, no matter their PCIIDs, mdev types,
software versions..., which
 > > > > > > > > provides vendor driver full flexibility.
 > > > > > > > > 
 > > > > > > > > do you think it's good?  
 > > > > > > > 
 > > > > > > > Yeh that's OK; I guess it's going to need to
have a big table in their
 > > > > > > > with all the PCIIDs in.
 > > > > > > > The alternative would be to abstract it a little; e.g.
to say it's
 > > > > > > > an Intel-gpu-core-v4  and then it would be less
worried about the exact
 > > > > > > > clock speed etc - but yes you might be right htat
PCIIDs might be best
 > > > > > > > for checking for quirks.
 > > > > > > >  
 > > > > > > glad that you are agreed with it:)
 > > > > > > I think the vendor driver still can choose a way to
abstract a little
 > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better.
In that case, the
 > > > > > > migration_string would be something like
"Intel-gpu-core-v4 + instance
 > > > > > > number + software version".
 > > > > > > IOW, they can choose anything they think appropriate to
identify migration
 > > > > > > compatibility of a device.
 > > > > > > But Alex is right, we have to prevent namespace
overlapping. So I think
 > > > > > > we need to ensure src and target devices are from the same
vendors.
 > > > > > > or, any other ideas?  
 > > > > > 
 > > > > > That's why I kept the 'Intel' in that example; or
PCI vendor ID; I was  
 > > > > Yes, it's a good idea!
 > > > > could we add a line in the doc saying that
 > > > > it is the vendor driver to add a unique string to avoid namespace
 > > > > collision?  
 > > > 
 > > > So why don't we split the difference; lets say that it should start
with
 > > > the hex PCI Vendor ID.
 > > >  
 > > The problem is for mdev devices, if the parent devices are not PCI devices, 
 > > they don't have PCI vendor IDs.  
 > 
 > Hmm it would be best not to invent a whole new way of giving unique
 > idenitifiers for vendors if we can.
 >   
 what about leveraging the flags in vfio device info ?

 #define VFIO_DEVICE_FLAGS_RESET (1 << 0)        /* Device supports reset */
 #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)        /* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)     /* vfio-platform device */
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)        /* vfio-amba device */
 #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)        /* vfio-ccw device */
 #define VFIO_DEVICE_FLAGS_AP    (1 << 5)        /* vfio-ap device */

 Then for migration_version string,
 The first 64 bits are for device type, the second 64 bits are for device id.
 e.g.
 for PCI devices, it could be
 VFIO_DEVICE_FLAGS_PCI + PCI ID.

 Currently in the doc, we only define PCI devices to use PCI ID as the second
 64 bits. In future, if other types of devices want to support migration,
 they can define their own parts of device id. e.g. use ACPI ID as the
 second 64-bit...

 sounds good? 
[dead thread resurrection alert]

Not really.  We're deep into territory that we were trying to avoid.
We had previously defined the version string as opaque (not
transparent) specifically because we did not want userspace to make
assumptions about compatibility based on the content of the string.  It
was 100% left to the vendor driver to determine compatibility.  The
mdev type was the full extent of the first level filter that userspace
could use to narrow the set of potentially compatible devices.  If we
remove that due to physical device migration support, I'm not sure how
we simplify the problem for userspace.

We need to step away from PCI IDs and parent devices.  We're not
designing a solution that only works for PCI, there's no guarantee that
parent devices are similar or even from the same vendor.

Does the mdev type sufficiently solve the problem for mdev devices?  If
so, then what can we learn from it and how can we apply an equivalence
to physical devices?  For example, should a vfio bus driver (vfio-pci
or vfio-mdev) expose vfio_migration_type and vfio_migration_version
attributes under the device in sysfs where the _type provides the first
level, user transparent, matching string (ex. mdev type for mdev
devices) while the _version provides the user opaque, vendor known
compatibility test?

This pushes the problem out to the drivers where we can perhaps
incorporate the module name to avoid collisions.  For example Yan's
vendor extension proposal makes use of vfio-pci with extension modules
loaded via an alias incorporating the PCI vendor and device ID.  So
vfio-pci might use a type of "vfio-pci:$ALIAS".

It's still a bit messy that someone needs to go evaluate all these
types between devices that exist and mdev devices that might exist if
created, but I don't have any good ideas to resolve that (maybe a new
class hierarchy?).  Thanks,

Alex

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration