On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
* Yan Zhao (yan.y.zhao(a)intel.com) wrote:
> On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.zhao(a)intel.com) wrote:
> > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > > > From: Yan Zhao
> > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > >
> > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson
wrote:
> > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > Yan Zhao <yan.y.zhao(a)intel.com> wrote:
> > > > > >
> > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia
Huck wrote:
> > > > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > > > Yan Zhao <yan.y.zhao(a)intel.com> wrote:
> > > > > > > >
> > > > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800,
Cornelia Huck wrote:
> > > > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > > > Yan Zhao <yan.y.zhao(a)intel.com>
wrote:
> > > > > > > > > >
> > > > > > > > > > > This patchset introduces a
migration_version attribute under sysfs
> > > > > of VFIO
> > > > > > > > > > > Mediated devices.
> > > > > > > > > > >
> > > > > > > > > > > This migration_version attribute
is used to check migration
> > > > > compatibility
> > > > > > > > > > > between two mdev devices.
> > > > > > > > > > >
> > > > > > > > > > > Currently, it has two locations:
> > > > > > > > > > > (1) under mdev_type node,
> > > > > > > > > > > which can be used even before
device creation, but only for
> > > > > mdev
> > > > > > > > > > > devices of the same mdev
type.
> > > > > > > > > > > (2) under mdev device node,
> > > > > > > > > > > which can only be used after
the mdev devices are created, but
> > > > > the src
> > > > > > > > > > > and target mdev devices are
not necessarily be of the same
> > > > > mdev type
> > > > > > > > > > > (The second location is newly
added in v5, in order to keep
> > > > > consistent
> > > > > > > > > > > with the migration_version node
for migratable pass-though
> > > > > devices)
> > > > > > > > > >
> > > > > > > > > > What is the relationship between those
two attributes?
> > > > > > > > > >
> > > > > > > > > (1) is for mdev devices specifically, and
(2) is provided to keep the
> > > > > same
> > > > > > > > > sysfs interface as with non-mdev cases. so
(2) is for both mdev
> > > > > devices and
> > > > > > > > > non-mdev devices.
> > > > > > > > >
> > > > > > > > > in future, if we enable vfio-pci vendor ops,
(i.e. a non-mdev device
> > > > > > > > > is binding to vfio-pci, but is able to
register migration region and do
> > > > > > > > > migration transactions from a vendor
provided affiliate driver),
> > > > > > > > > the vendor driver would export (2) directly,
under device node.
> > > > > > > > > It is not able to provide (1) as
there're no mdev devices involved.
> > > > > > > >
> > > > > > > > Ok, creating an alternate attribute for non-mdev
devices makes sense.
> > > > > > > > However, wouldn't that rather be a case (3)?
The change here only
> > > > > > > > refers to mdev devices.
> > > > > > > >
> > > > > > > as you pointed below, (3) and (2) serve the same
purpose.
> > > > > > > and I think a possible usage is to migrate between a
non-mdev device and
> > > > > > > an mdev device. so I think it's better for them
both to use (2) rather
> > > > > > > than creating (3).
> > > > > >
> > > > > > An mdev type is meant to define a software compatible
interface, so in
> > > > > > the case of mdev->mdev migration, doesn't migrating
to a different type
> > > > > > fail the most basic of compatibility tests that we expect
userspace to
> > > > > > perform? IOW, if two mdev types are migration compatible,
it seems a
> > > > > > prerequisite to that is that they provide the same software
interface,
> > > > > > which means they should be the same mdev type.
> > > > > >
> > > > > > In the hybrid cases of mdev->phys or phys->mdev, how
does a
> > > > > management
> > > > > > tool begin to even guess what might be compatible? Are we
expecting
> > > > > > libvirt to probe ever device with this attribute in the
system? Is
> > > > > > there going to be a new class hierarchy created to
enumerate all
> > > > > > possible migrate-able devices?
> > > > > >
> > > > > yes, management tool needs to guess and test migration
compatible
> > > > > between two devices. But I think it's not the problem only
for
> > > > > mdev->phys or phys->mdev. even for mdev->mdev,
management tool needs
> > > > > to
> > > > > first assume that the two mdevs have the same type of parent
devices
> > > > > (e.g.their pciids are equal). otherwise, it's still
enumerating
> > > > > possibilities.
> > > > >
> > > > > on the other hand, for two mdevs,
> > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration
between
> > > > > mdev1 <-> mdev2.
> > > >
> > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent
> > > > to 1/4 of pdev2? If we really want to allow such thing happen, the
best
> > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > I think that's exactly the value of this migration_version interface.
> > > the management tool can take advantage of this interface to know if two
> > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > or mix.
> > >
> > > as I know, (please correct me if not right), current libvirt still
> > > requires manually generating mdev devices, and it just duplicates src vm
> > > configuration to the target vm.
> > > for libvirt, currently it's always phys->phys and mdev->mdev
(and of the
> > > same mdev type).
> > > But it does not justify that hybrid cases should not be allowed.
otherwise,
> > > why do we need to introduce this migration_version interface and leave
> > > the judgement of migration compatibility to vendor driver? why not simply
> > > set the criteria to something like "pciids of parent devices are
equal,
> > > and mdev types are equal" ?
> > >
> > >
> > > > btw mdev<->phys just brings trouble to upper stack as Alex
pointed out.
> > > could you help me understand why it will bring trouble to upper stack?
> > >
> > > I think it just needs to read src migration_version under src dev node,
> > > and test it in target migration version under target dev node.
> > >
> > > after all, through this interface we just help the upper layer
> > > knowing available options through reading and testing, and they decide
> > > to use it or not.
> > >
> > > > Can we simplify the requirement by allowing only mdev<->mdev
and
> > > > phys<->phys migration? If an customer does want to migrate
between a
> > > > mdev and phys, he could wrap physical device into a wrapped mdev
> > > > instance (with the same type as the source mdev) instead of using
vendor
> > > > ops. Doing so does add some burden but if mdev<->phys is not
dominant
> > > > usage then such tradeoff might be worthywhile...
> > > >
> > > If the interfaces for phys<->phys and mdev<->mdev are
consistent, it makes no
> > > difference to phys<->mdev, right?
> > > I think the vendor string for a mdev device is something like:
> > > "Parent PCIID + mdev type + software version", and
> > > that for a phys device is something like:
> > > "PCIID + software version".
> > > as long as we don't migrate between devices from different vendors,
it's
> > > easy for vendor driver to tell if a phys device is migration compatible
> > > to a mdev device according it supports it or not.
> >
> > It surprises me that the PCIID matching is a requirement; I'd assumed
> > with this clever mdev name setup that you could migrate between two
> > different models in a series, or to a newer model, as long as they
> > both supported the same mdev view.
> >
> hi Dave
> the migration_version string is transparent to userspace, and is
> completely defined by vendor driver.
> I put it there just as an example of how vendor driver may implement it.
> e.g.
> the src migration_version string is "src PCIID + src software version",
> then when this string is write to target migration_version node,
> the vendor driver in the target device will compare it with its own
> device info and software version.
> If different models are allowed, the write just succeeds even
> PCIIDs in src and target are different.
>
> so, it is the vendor driver to define whether two devices are able to
> migrate, no matter their PCIIDs, mdev types, software versions..., which
> provides vendor driver full flexibility.
>
> do you think it's good?
Yeh that's OK; I guess it's going to need to have a big table in their
with all the PCIIDs in.
The alternative would be to abstract it a little; e.g. to say it's
an Intel-gpu-core-v4 and then it would be less worried about the exact
clock speed etc - but yes you might be right htat PCIIDs might be best
for checking for quirks.
glad that you are agreed with it:)
I think the vendor driver still can choose a way to abstract a little
(e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
migration_string would be something like "Intel-gpu-core-v4 + instance
number + software version".
IOW, they can choose anything they think appropriate to identify migration
compatibility of a device.
But Alex is right, we have to prevent namespace overlapping. So I think
we need to ensure src and target devices are from the same vendors.
or, any other ideas?
Thanks
Yan
> > > > >
> > > > >
> > > > > > I agree that there was a gap in the previous proposal for
non-mdev
> > > > > > devices, but I think this bring a lot of questions that we
need to
> > > > > > puzzle through and libvirt will need to re-evaluate how
they might
> > > > > > decide to pick a migration target device. For example,
I'm sure
> > > > > > libvirt would reject any policy decisions regarding picking
a physical
> > > > > > device versus an mdev device. Had we previously left it
that only a
> > > > > > layer above libvirt would select a target device and
libvirt only tests
> > > > > > compatibility to that target device?
> > > > > I'm not sure if there's a layer above libvirt would
select a target
> > > > > device. but if there is such a layer (even it's human), we
need to
> > > > > provide an interface for them to know whether their decision is
suitable
> > > > > for migration. The migration_version interface provides a
potential to
> > > > > allow mdev->phys migration, even libvirt may currently reject
it.
> > > > >
> > > > >
> > > > > > We also need to consider that this expands the namespace.
If we no
> > > > > > longer require matching types as the first level of
comparison, then
> > > > > > vendor migration strings can theoretically collide. How do
we
> > > > > > coordinate that can't happen? Thanks,
> > > > > yes, it's indeed a problem.
> > > > > could only allowing migration beteen devices from the same
vendor be a
> > > > > good
> > > > > prerequisite?
> > > > >
> > > > > Thanks
> > > > > Yan
> > > > > >
> > > > > > > > > > Is existence (and compatibility) of (1)
a pre-req for possible
> > > > > > > > > > existence (and compatibility) of (2)?
> > > > > > > > > >
> > > > > > > > > no. (2) does not reply on (1).
> > > > > > > >
> > > > > > > > Hm. Non-existence of (1) seems to imply
"this type does not support
> > > > > > > > migration". If an mdev created for such a
type suddenly does support
> > > > > > > > migration, it feels a bit odd.
> > > > > > > >
> > > > > > > yes. but I think if the condition happens, it should
be reported a bug
> > > > > > > to vendor driver.
> > > > > > > should I add a line in the doc like "vendor
driver should ensure that the
> > > > > > > migration compatibility from migration_version under
mdev_type should
> > > > > be
> > > > > > > consistent with that from migration_version under
device node" ?
> > > > > > >
> > > > > > > > (It obviously cannot be a prereq for what I
called (3) above.)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Does userspace need to check (1) or can
it completely rely on (2), if
> > > > > > > > > > it so chooses?
> > > > > > > > > >
> > > > > > > > > I think it can completely reply on (2) if
compatibility check before
> > > > > > > > > mdev creation is not required.
> > > > > > > > >
> > > > > > > > > > If devices with a different mdev type
are indeed compatible, it
> > > > > seems
> > > > > > > > > > userspace can only find out after the
devices have actually been
> > > > > > > > > > created, as (1) does not apply?
> > > > > > > > > yes, I think so.
> > > > > > > >
> > > > > > > > How useful would it be for userspace to even look
at (1) in that case?
> > > > > > > > It only knows if things have a chance of working
if it actually goes
> > > > > > > > ahead and creates devices.
> > > > > > > >
> > > > > > > hmm, is it useful for userspace to test the
migration_version under mdev
> > > > > > > type before it knows what mdev device to generate ?
> > > > > > > like when the userspace wants to migrate an mdev
device in src vm,
> > > > > > > but it has not created target vm and the target mdev
device.
> > > > > > >
> > > > > > > > >
> > > > > > > > > > One of my worries is that the existence
of an attribute with the
> > > > > same
> > > > > > > > > > name in two similar locations might
lead to confusion. But maybe it
> > > > > > > > > > isn't a problem.
> > > > > > > > > >
> > > > > > > > > Yes, I have the same feeling. but as (2) is
for sysfs interface
> > > > > > > > > consistency, to make it transparent to
userspace tools like libvirt,
> > > > > > > > > I guess the same name is necessary?
> > > > > > > >
> > > > > > > > What do we actually need here, I wonder? (1) and
(2) seem to serve
> > > > > > > > slightly different purposes, while (2) and what I
called (3) have the
> > > > > > > > same purpose. Is it important to userspace that
(1) and (2) have the
> > > > > > > > same name?
> > > > > > > so change (1) to migration_type_version and (2) to
> > > > > > > migration_instance_version?
> > > > > > > But as they are under different locations, could that
location imply
> > > > > > > enough information?
> > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > > Yan
> > > > > > >
> > > > > > >
> > > > > >
> > > > > _______________________________________________
> > > > > intel-gvt-dev mailing list
> > > > > intel-gvt-dev(a)lists.freedesktop.org
> > > > >
https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK