On Wed, Apr 29, 2020 at 04:22:01PM +0800, Dr. David Alan Gilbert wrote:
* Yan Zhao (yan.y.zhao(a)intel.com) wrote:
> On Tue, Apr 28, 2020 at 10:14:37PM +0800, Dr. David Alan Gilbert wrote:
> > * Yan Zhao (yan.y.zhao(a)intel.com) wrote:
> > > On Mon, Apr 27, 2020 at 11:37:43PM +0800, Dr. David Alan Gilbert wrote:
> > > > * Yan Zhao (yan.y.zhao(a)intel.com) wrote:
> > > > > On Sat, Apr 25, 2020 at 03:10:49AM +0800, Dr. David Alan Gilbert
wrote:
> > > > > > * Yan Zhao (yan.y.zhao(a)intel.com) wrote:
> > > > > > > On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin
wrote:
> > > > > > > > > From: Yan Zhao
> > > > > > > > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > > > > > > >
> > > > > > > > > On Tue, Apr 21, 2020 at 06:56:00AM +0800,
Alex Williamson wrote:
> > > > > > > > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > > > > > > > Yan Zhao <yan.y.zhao(a)intel.com>
wrote:
> > > > > > > > > >
> > > > > > > > > > > On Fri, Apr 17, 2020 at 07:24:57PM
+0800, Cornelia Huck wrote:
> > > > > > > > > > > > On Fri, 17 Apr 2020 05:52:02
-0400
> > > > > > > > > > > > Yan Zhao
<yan.y.zhao(a)intel.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 17, 2020 at
04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > > > > > > > On Mon, 13 Apr 2020
01:52:01 -0400
> > > > > > > > > > > > > > Yan Zhao
<yan.y.zhao(a)intel.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patchset
introduces a migration_version attribute under sysfs
> > > > > > > > > of VFIO
> > > > > > > > > > > > > > > Mediated
devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This
migration_version attribute is used to check migration
> > > > > > > > > compatibility
> > > > > > > > > > > > > > > between two
mdev devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently, it
has two locations:
> > > > > > > > > > > > > > > (1) under
mdev_type node,
> > > > > > > > > > > > > > > which can
be used even before device creation, but only for
> > > > > > > > > mdev
> > > > > > > > > > > > > > > devices of
the same mdev type.
> > > > > > > > > > > > > > > (2) under mdev
device node,
> > > > > > > > > > > > > > > which can
only be used after the mdev devices are created, but
> > > > > > > > > the src
> > > > > > > > > > > > > > > and target
mdev devices are not necessarily be of the same
> > > > > > > > > mdev type
> > > > > > > > > > > > > > > (The second
location is newly added in v5, in order to keep
> > > > > > > > > consistent
> > > > > > > > > > > > > > > with the
migration_version node for migratable pass-though
> > > > > > > > > devices)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What is the
relationship between those two attributes?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > (1) is for mdev devices
specifically, and (2) is provided to keep the
> > > > > > > > > same
> > > > > > > > > > > > > sysfs interface as with
non-mdev cases. so (2) is for both mdev
> > > > > > > > > devices and
> > > > > > > > > > > > > non-mdev devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > in future, if we enable
vfio-pci vendor ops, (i.e. a non-mdev device
> > > > > > > > > > > > > is binding to vfio-pci,
but is able to register migration region and do
> > > > > > > > > > > > > migration transactions
from a vendor provided affiliate driver),
> > > > > > > > > > > > > the vendor driver would
export (2) directly, under device node.
> > > > > > > > > > > > > It is not able to
provide (1) as there're no mdev devices involved.
> > > > > > > > > > > >
> > > > > > > > > > > > Ok, creating an alternate
attribute for non-mdev devices makes sense.
> > > > > > > > > > > > However, wouldn't that
rather be a case (3)? The change here only
> > > > > > > > > > > > refers to mdev devices.
> > > > > > > > > > > >
> > > > > > > > > > > as you pointed below, (3) and (2)
serve the same purpose.
> > > > > > > > > > > and I think a possible usage is to
migrate between a non-mdev device and
> > > > > > > > > > > an mdev device. so I think
it's better for them both to use (2) rather
> > > > > > > > > > > than creating (3).
> > > > > > > > > >
> > > > > > > > > > An mdev type is meant to define a
software compatible interface, so in
> > > > > > > > > > the case of mdev->mdev migration,
doesn't migrating to a different type
> > > > > > > > > > fail the most basic of compatibility
tests that we expect userspace to
> > > > > > > > > > perform? IOW, if two mdev types are
migration compatible, it seems a
> > > > > > > > > > prerequisite to that is that they
provide the same software interface,
> > > > > > > > > > which means they should be the same
mdev type.
> > > > > > > > > >
> > > > > > > > > > In the hybrid cases of mdev->phys or
phys->mdev, how does a
> > > > > > > > > management
> > > > > > > > > > tool begin to even guess what might be
compatible? Are we expecting
> > > > > > > > > > libvirt to probe ever device with this
attribute in the system? Is
> > > > > > > > > > there going to be a new class hierarchy
created to enumerate all
> > > > > > > > > > possible migrate-able devices?
> > > > > > > > > >
> > > > > > > > > yes, management tool needs to guess and test
migration compatible
> > > > > > > > > between two devices. But I think it's
not the problem only for
> > > > > > > > > mdev->phys or phys->mdev. even for
mdev->mdev, management tool needs
> > > > > > > > > to
> > > > > > > > > first assume that the two mdevs have the
same type of parent devices
> > > > > > > > > (e.g.their pciids are equal). otherwise,
it's still enumerating
> > > > > > > > > possibilities.
> > > > > > > > >
> > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of
pdev1;
> > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of
pdev2;
> > > > > > > > > if pdev2 is exactly 2 times of pdev1, why
not allow migration between
> > > > > > > > > mdev1 <-> mdev2.
> > > > > > > >
> > > > > > > > How could the manage tool figure out that 1/2 of
pdev1 is equivalent
> > > > > > > > to 1/4 of pdev2? If we really want to allow such
thing happen, the best
> > > > > > > > choice is to report the same mdev type on both
pdev1 and pdev2.
> > > > > > > I think that's exactly the value of this
migration_version interface.
> > > > > > > the management tool can take advantage of this
interface to know if two
> > > > > > > devices are migration compatible, no matter they are
mdevs, non-mdevs,
> > > > > > > or mix.
> > > > > > >
> > > > > > > as I know, (please correct me if not right), current
libvirt still
> > > > > > > requires manually generating mdev devices, and it just
duplicates src vm
> > > > > > > configuration to the target vm.
> > > > > > > for libvirt, currently it's always phys->phys
and mdev->mdev (and of the
> > > > > > > same mdev type).
> > > > > > > But it does not justify that hybrid cases should not
be allowed. otherwise,
> > > > > > > why do we need to introduce this migration_version
interface and leave
> > > > > > > the judgement of migration compatibility to vendor
driver? why not simply
> > > > > > > set the criteria to something like "pciids of
parent devices are equal,
> > > > > > > and mdev types are equal" ?
> > > > > > >
> > > > > > >
> > > > > > > > btw mdev<->phys just brings trouble to
upper stack as Alex pointed out.
> > > > > > > could you help me understand why it will bring trouble
to upper stack?
> > > > > > >
> > > > > > > I think it just needs to read src migration_version
under src dev node,
> > > > > > > and test it in target migration version under target
dev node.
> > > > > > >
> > > > > > > after all, through this interface we just help the
upper layer
> > > > > > > knowing available options through reading and testing,
and they decide
> > > > > > > to use it or not.
> > > > > > >
> > > > > > > > Can we simplify the requirement by allowing only
mdev<->mdev and
> > > > > > > > phys<->phys migration? If an customer does
want to migrate between a
> > > > > > > > mdev and phys, he could wrap physical device into
a wrapped mdev
> > > > > > > > instance (with the same type as the source mdev)
instead of using vendor
> > > > > > > > ops. Doing so does add some burden but if
mdev<->phys is not dominant
> > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > >
> > > > > > > If the interfaces for phys<->phys and
mdev<->mdev are consistent, it makes no
> > > > > > > difference to phys<->mdev, right?
> > > > > > > I think the vendor string for a mdev device is
something like:
> > > > > > > "Parent PCIID + mdev type + software
version", and
> > > > > > > that for a phys device is something like:
> > > > > > > "PCIID + software version".
> > > > > > > as long as we don't migrate between devices from
different vendors, it's
> > > > > > > easy for vendor driver to tell if a phys device is
migration compatible
> > > > > > > to a mdev device according it supports it or not.
> > > > > >
> > > > > > It surprises me that the PCIID matching is a requirement;
I'd assumed
> > > > > > with this clever mdev name setup that you could migrate
between two
> > > > > > different models in a series, or to a newer model, as long
as they
> > > > > > both supported the same mdev view.
> > > > > >
> > > > > hi Dave
> > > > > the migration_version string is transparent to userspace, and
is
> > > > > completely defined by vendor driver.
> > > > > I put it there just as an example of how vendor driver may
implement it.
> > > > > e.g.
> > > > > the src migration_version string is "src PCIID + src
software version",
> > > > > then when this string is write to target migration_version
node,
> > > > > the vendor driver in the target device will compare it with its
own
> > > > > device info and software version.
> > > > > If different models are allowed, the write just succeeds even
> > > > > PCIIDs in src and target are different.
> > > > >
> > > > > so, it is the vendor driver to define whether two devices are
able to
> > > > > migrate, no matter their PCIIDs, mdev types, software
versions..., which
> > > > > provides vendor driver full flexibility.
> > > > >
> > > > > do you think it's good?
> > > >
> > > > Yeh that's OK; I guess it's going to need to have a big table
in their
> > > > with all the PCIIDs in.
> > > > The alternative would be to abstract it a little; e.g. to say
it's
> > > > an Intel-gpu-core-v4 and then it would be less worried about the
exact
> > > > clock speed etc - but yes you might be right htat PCIIDs might be
best
> > > > for checking for quirks.
> > > >
> > > glad that you are agreed with it:)
> > > I think the vendor driver still can choose a way to abstract a little
> > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case,
the
> > > migration_string would be something like "Intel-gpu-core-v4 +
instance
> > > number + software version".
> > > IOW, they can choose anything they think appropriate to identify
migration
> > > compatibility of a device.
> > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > we need to ensure src and target devices are from the same vendors.
> > > or, any other ideas?
> >
> > That's why I kept the 'Intel' in that example; or PCI vendor ID; I
was
> Yes, it's a good idea!
> could we add a line in the doc saying that
> it is the vendor driver to add a unique string to avoid namespace
> collision?
So why don't we split the difference; lets say that it should start with
the hex PCI Vendor ID.
The problem is for mdev devices, if the parent devices are not PCI devices,
they don't have PCI vendor IDs.
Thanks
Yan
> > only really trying to say that within one vendors range
there are often
> > a lot of PCI-IDs that have really minor variations.
> Yes. I also prefer to include PCI-IDs.
> BTW, sometimes even the same PCI-ID does not guarantee two devices are of no
> difference or are migration compatible. for example, two local NVMe
> devices may have the same PCI-ID but are configured to two different remote NVMe
> devices. the vendor driver needs to add extra info besides PCI-IDs then.
Ah, yes that's an interesting example.
Dave
>
> >
> >
> > >
> > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > I agree that there was a gap in the
previous proposal for non-mdev
> > > > > > > > > > devices, but I think this bring a lot
of questions that we need to
> > > > > > > > > > puzzle through and libvirt will need to
re-evaluate how they might
> > > > > > > > > > decide to pick a migration target
device. For example, I'm sure
> > > > > > > > > > libvirt would reject any policy
decisions regarding picking a physical
> > > > > > > > > > device versus an mdev device. Had we
previously left it that only a
> > > > > > > > > > layer above libvirt would select a
target device and libvirt only tests
> > > > > > > > > > compatibility to that target device?
> > > > > > > > > I'm not sure if there's a layer
above libvirt would select a target
> > > > > > > > > device. but if there is such a layer (even
it's human), we need to
> > > > > > > > > provide an interface for them to know
whether their decision is suitable
> > > > > > > > > for migration. The migration_version
interface provides a potential to
> > > > > > > > > allow mdev->phys migration, even libvirt
may currently reject it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > We also need to consider that this
expands the namespace. If we no
> > > > > > > > > > longer require matching types as the
first level of comparison, then
> > > > > > > > > > vendor migration strings can
theoretically collide. How do we
> > > > > > > > > > coordinate that can't happen?
Thanks,
> > > > > > > > > yes, it's indeed a problem.
> > > > > > > > > could only allowing migration beteen devices
from the same vendor be a
> > > > > > > > > good
> > > > > > > > > prerequisite?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Yan
> > > > > > > > > >
> > > > > > > > > > > > > > Is existence (and
compatibility) of (1) a pre-req for possible
> > > > > > > > > > > > > > existence (and
compatibility) of (2)?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > no. (2) does not reply
on (1).
> > > > > > > > > > > >
> > > > > > > > > > > > Hm. Non-existence of (1)
seems to imply "this type does not support
> > > > > > > > > > > > migration". If an mdev
created for such a type suddenly does support
> > > > > > > > > > > > migration, it feels a bit
odd.
> > > > > > > > > > > >
> > > > > > > > > > > yes. but I think if the condition
happens, it should be reported a bug
> > > > > > > > > > > to vendor driver.
> > > > > > > > > > > should I add a line in the doc
like "vendor driver should ensure that the
> > > > > > > > > > > migration compatibility from
migration_version under mdev_type should
> > > > > > > > > be
> > > > > > > > > > > consistent with that from
migration_version under device node" ?
> > > > > > > > > > >
> > > > > > > > > > > > (It obviously cannot be a
prereq for what I called (3) above.)
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Does userspace need
to check (1) or can it completely rely on (2), if
> > > > > > > > > > > > > > it so chooses?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I think it can
completely reply on (2) if compatibility check before
> > > > > > > > > > > > > mdev creation is not
required.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > If devices with a
different mdev type are indeed compatible, it
> > > > > > > > > seems
> > > > > > > > > > > > > > userspace can only
find out after the devices have actually been
> > > > > > > > > > > > > > created, as (1)
does not apply?
> > > > > > > > > > > > > yes, I think so.
> > > > > > > > > > > >
> > > > > > > > > > > > How useful would it be for
userspace to even look at (1) in that case?
> > > > > > > > > > > > It only knows if things have
a chance of working if it actually goes
> > > > > > > > > > > > ahead and creates devices.
> > > > > > > > > > > >
> > > > > > > > > > > hmm, is it useful for userspace to
test the migration_version under mdev
> > > > > > > > > > > type before it knows what mdev
device to generate ?
> > > > > > > > > > > like when the userspace wants to
migrate an mdev device in src vm,
> > > > > > > > > > > but it has not created target vm
and the target mdev device.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > One of my worries
is that the existence of an attribute with the
> > > > > > > > > same
> > > > > > > > > > > > > > name in two similar
locations might lead to confusion. But maybe it
> > > > > > > > > > > > > > isn't a
problem.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, I have the same
feeling. but as (2) is for sysfs interface
> > > > > > > > > > > > > consistency, to make it
transparent to userspace tools like libvirt,
> > > > > > > > > > > > > I guess the same name is
necessary?
> > > > > > > > > > > >
> > > > > > > > > > > > What do we actually need
here, I wonder? (1) and (2) seem to serve
> > > > > > > > > > > > slightly different purposes,
while (2) and what I called (3) have the
> > > > > > > > > > > > same purpose. Is it important
to userspace that (1) and (2) have the
> > > > > > > > > > > > same name?
> > > > > > > > > > > so change (1) to
migration_type_version and (2) to
> > > > > > > > > > > migration_instance_version?
> > > > > > > > > > > But as they are under different
locations, could that location imply
> > > > > > > > > > > enough information?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Yan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
_______________________________________________
> > > > > > > > > intel-gvt-dev mailing list
> > > > > > > > > intel-gvt-dev(a)lists.freedesktop.org
> > > > > > > > >
https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > > > > > >
> > > > > > --
> > > > > > Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester,
UK
> > > > > >
> > > > >
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
> > > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK