On Thu, Sep 29, 2016 at 09:03:40AM +0100, Daniel P. Berrange wrote:
On Wed, Sep 28, 2016 at 12:22:35PM -0700, Neo Jia wrote:
> On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > Kirti Wankhede <kwankhede(a)nvidia.com> wrote:
> > >
> > > > >>>>> My concern is that a type id seems arbitrary but
we're specifying that
> > > > >>>>> it be unique. We already have something unique,
the name. So why try
> > > > >>>>> to make the type id unique as well? A vendor
can accidentally create
> > > > >>>>> their vendor driver so that a given name means
something very
> > > > >>>>> specific. On the other hand they need to be
extremely deliberate to
> > > > >>>>> coordinate that a type id means a unique thing
across all their product
> > > > >>>>> lines.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Let me clarify, type id should be unique in the list
of
> > > > >>>> mdev_supported_types. You can't have 2
directories in with same name.
> > > > >>>
> > > > >>> Of course, but does that mean it's only unique to
the machine I'm
> > > > >>> currently running on? Let's say I have a Tesla P100
on my system and
> > > > >>> type-id 11 is named "GRID-M60-0B". At some
point in the future I
> > > > >>> replace the Tesla P100 with a Q1000 (made up). Is
type-id 11 on that
> > > > >>> new card still going to be a "GRID-M60-0B"?
If not then we've based
> > > > >>> our XML on the wrong attribute. If the new device does
not support
> > > > >>> "GRID-M60-0B" then we should generate an
error, not simply initialize
> > > > >>> whatever type-id 11 happens to be on this new card.
> > > > >>>
> > > > >>
> > > > >> If there are 2 M60 in the system then you would find
'11' type directory
> > > > >> in mdev_supported_types of both M60. If you have P100,
'11' type would
> > > > >> not be there in its mdev_supported_types, it will have
different types.
> > > > >>
> > > > >> For example, if you replace M60 with P100, but XML is not
updated. XML
> > > > >> have type '11'. When libvirt would try to create
mdev device, libvirt
> > > > >> would have to find 'create' file in sysfs in
following directory format:
> > > > >>
> > > > >> --- mdev_supported_types
> > > > >> |-- 11
> > > > >> | |-- create
> > > > >>
> > > > >> but now for P100, '11' directory is not there, so
libvirt should throw
> > > > >> error on not able to find '11' directory.
> > > > >
> > > > > This really seems like an accident waiting to happen. What
happens
> > > > > when the user replaces their M60 with an Intel XYZ device that
happens
> > > > > to expose a type 11 mdev class gpu device? How is libvirt
supposed to
> > > > > know that the XML used to refer to a GRID-M60-0B and now
it's an
> > > > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and
removing
> > > > > yet another arbitrary requirement that we have some sort of
globally
> > > > > unique type-id database make a lot of sense? The same issue
applies
> > > > > for simple debug-ability, if I'm reviewing the XML for a
domain and the
> > > > > name is the primary index for the mdev device, I know what it
is.
> > > > > Seeing type-id='11' is meaningless.
> > > > >
> > > >
> > > > Let me clarify again, type '11' is a string that vendor
driver would
> > > > define (see my previous reply below) it could be "11" or
"GRID-M60-0B".
> > > > If 2 vendors used same string we can't control that. right?
> > > >
> > > >
> > > > >>>> Lets remove 'id' from type id in XML if that
is the concern. Supported
> > > > >>>> types is going to be defined by vendor driver, so
let vendor driver
> > > > >>>> decide what to use for directory name and same
should be used in device
> > > > >>>> xml file, it could be '11' or "GRID
M60-0B":
> > > > >>>>
> > > > >>>> <device>
> > > > >>>> <name>my-vgpu</name>
> > > > >>>> <parent>pci_0000_86_00_0</parent>
> > > > >>>> <capability type='mdev'>
> > > > >>>> <type='11'/>
> > > > >>>> ...
> > > > >>>> </capability>
> > > > >>>> </device>
> > >
> > > Then let's get rid of the 'name' attribute and let the sysfs
directory
> > > simply be the name. Then we can get rid of 'type' altogether so
we
> > > don't have this '11' vs 'GRID-M60-0B' issue. Thanks,
> >
> > That sounds nice to me - we don't need two unique identifiers if
> > one will do.
>
> Hi Alex and Daniel,
>
> I just had some internal discussions here within NVIDIA and found out that
> actually the name/label potentially might not be unique and the "id" will
be.
> So I think we still would like to keep both so the id is the programmatic id
> and the name/label is a human readable string for it, which might get changed to
> be non-unique by outside of engineering.
The name is what the end user will used to decide which type of device
to create. If the name is not unique, how is the user supposed to decide
which of the duplicates to use. Allowing duplicate names just sounds
broken to me.
Yes, the name is what the end user will use to decide and from their point of
view the virtual device is presented to them same. For example, it might come
from two different physical devices (I think Alex has mentioned the same on his
response to me). To achieve the same "virtual device", there
will be internal implementation differences, therefore the "id" will be unique.
Thanks,
Neo