On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> On Thu, 22 Sep 2016 09:41:20 +0530
> Kirti Wankhede <kwankhede(a)nvidia.com> wrote:
>
> > >>>>> My concern is that a type id seems arbitrary but we're
specifying that
> > >>>>> it be unique. We already have something unique, the name.
So why try
> > >>>>> to make the type id unique as well? A vendor can
accidentally create
> > >>>>> their vendor driver so that a given name means something
very
> > >>>>> specific. On the other hand they need to be extremely
deliberate to
> > >>>>> coordinate that a type id means a unique thing across all
their product
> > >>>>> lines.
> > >>>>>
> > >>>>
> > >>>> Let me clarify, type id should be unique in the list of
> > >>>> mdev_supported_types. You can't have 2 directories in with
same name.
> > >>>
> > >>> Of course, but does that mean it's only unique to the machine
I'm
> > >>> currently running on? Let's say I have a Tesla P100 on my
system and
> > >>> type-id 11 is named "GRID-M60-0B". At some point in the
future I
> > >>> replace the Tesla P100 with a Q1000 (made up). Is type-id 11 on
that
> > >>> new card still going to be a "GRID-M60-0B"? If not then
we've based
> > >>> our XML on the wrong attribute. If the new device does not
support
> > >>> "GRID-M60-0B" then we should generate an error, not
simply initialize
> > >>> whatever type-id 11 happens to be on this new card.
> > >>>
> > >>
> > >> If there are 2 M60 in the system then you would find '11' type
directory
> > >> in mdev_supported_types of both M60. If you have P100, '11'
type would
> > >> not be there in its mdev_supported_types, it will have different
types.
> > >>
> > >> For example, if you replace M60 with P100, but XML is not updated.
XML
> > >> have type '11'. When libvirt would try to create mdev device,
libvirt
> > >> would have to find 'create' file in sysfs in following
directory format:
> > >>
> > >> --- mdev_supported_types
> > >> |-- 11
> > >> | |-- create
> > >>
> > >> but now for P100, '11' directory is not there, so libvirt
should throw
> > >> error on not able to find '11' directory.
> > >
> > > This really seems like an accident waiting to happen. What happens
> > > when the user replaces their M60 with an Intel XYZ device that happens
> > > to expose a type 11 mdev class gpu device? How is libvirt supposed to
> > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and removing
> > > yet another arbitrary requirement that we have some sort of globally
> > > unique type-id database make a lot of sense? The same issue applies
> > > for simple debug-ability, if I'm reviewing the XML for a domain and
the
> > > name is the primary index for the mdev device, I know what it is.
> > > Seeing type-id='11' is meaningless.
> > >
> >
> > Let me clarify again, type '11' is a string that vendor driver would
> > define (see my previous reply below) it could be "11" or
"GRID-M60-0B".
> > If 2 vendors used same string we can't control that. right?
> >
> >
> > >>>> Lets remove 'id' from type id in XML if that is the
concern. Supported
> > >>>> types is going to be defined by vendor driver, so let vendor
driver
> > >>>> decide what to use for directory name and same should be used
in device
> > >>>> xml file, it could be '11' or "GRID
M60-0B":
> > >>>>
> > >>>> <device>
> > >>>> <name>my-vgpu</name>
> > >>>> <parent>pci_0000_86_00_0</parent>
> > >>>> <capability type='mdev'>
> > >>>> <type='11'/>
> > >>>> ...
> > >>>> </capability>
> > >>>> </device>
>
> Then let's get rid of the 'name' attribute and let the sysfs directory
> simply be the name. Then we can get rid of 'type' altogether so we
> don't have this '11' vs 'GRID-M60-0B' issue. Thanks,
That sounds nice to me - we don't need two unique identifiers if
one will do.
Hi Alex and Daniel,
I just had some internal discussions here within NVIDIA and found out that
actually the name/label potentially might not be unique and the "id" will be.
So I think we still would like to keep both so the id is the programmatic id
and the name/label is a human readable string for it, which might get changed to
be non-unique by outside of engineering.
Sorry for the change.
Thanks,
Neo