On Wed, Sep 28, 2016 at 04:31:25PM -0400, Laine Stump wrote:
On 09/28/2016 03:59 PM, Neo Jia wrote:
> On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > Sent: Thursday, September 29, 2016 3:23 AM
> > >
> > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > Kirti Wankhede <kwankhede(a)nvidia.com> wrote:
> > > > >
> > > > > > > > > > > My concern is that a type id seems arbitrary but we're specifying that
> > > > > > > > > > > it be unique. We already have something unique, the name. So why try
> > > > > > > > > > > to make the type id unique as well? A vendor can accidentally create
> > > > > > > > > > > their vendor driver so that a given name means something very
> > > > > > > > > > > specific. On the other hand they need to be extremely deliberate to
> > > > > > > > > > > coordinate that a type id means a unique thing across all their product
> > > > > > > > > > > lines.
> > > > > > > > > > >
> > > > > > > > > > Let me clarify, type id should be unique in the list of
> > > > > > > > > > mdev_supported_types. You can't have 2 directories with the same name.
> > > > > > > > > Of course, but does that mean it's only unique to the machine I'm
> > > > > > > > > currently running on? Let's say I have a Tesla P100 on my system and
> > > > > > > > > type-id 11 is named "GRID-M60-0B". At some point in the future I
> > > > > > > > > replace the Tesla P100 with a Q1000 (made up). Is type-id 11 on that
> > > > > > > > > new card still going to be a "GRID-M60-0B"? If not then we've based
> > > > > > > > > our XML on the wrong attribute. If the new device does not support
> > > > > > > > > "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > > > > > whatever type-id 11 happens to be on this new card.
> > > > > > > > >
> > > > > > > > If there are 2 M60 in the system then you would find '11' type directory
> > > > > > > > in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > > > > not be there in its mdev_supported_types, it will have different types.
> > > > > > > >
> > > > > > > > For example, say you replace the M60 with a P100 but the XML is not
> > > > > > > > updated, so the XML still has type '11'. When libvirt tries to create
> > > > > > > > the mdev device, it has to find the 'create' file in sysfs in the
> > > > > > > > following directory format:
> > > > > > > >
> > > > > > > > --- mdev_supported_types
> > > > > > > > |-- 11
> > > > > > > > | |-- create
> > > > > > > >
> > > > > > > > but now for the P100 the '11' directory is not there, so libvirt should
> > > > > > > > throw an error when it cannot find the '11' directory.
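
The lookup described above can be sketched in Python. This is only an illustration assuming the directory layout shown in the tree; `find_create_node` and `MdevTypeNotFound` are hypothetical names, not libvirt functions, and the parent device directory is supplied by the caller.

```python
from pathlib import Path


class MdevTypeNotFound(LookupError):
    """Raised when the parent device does not expose the requested type."""


def find_create_node(parent_dir, type_name):
    """Return the path of mdev_supported_types/<type_name>/create.

    parent_dir is the parent device's sysfs directory (assumed layout,
    taken from the tree in the message above). type_name is whatever
    string the vendor driver used as the directory name, e.g. '11'.
    """
    create = Path(parent_dir) / "mdev_supported_types" / type_name / "create"
    if not create.exists():
        # The type named in the XML is not offered by this parent device,
        # so fail instead of falling back to some other type.
        raise MdevTypeNotFound(
            f"type {type_name!r} is not supported by {parent_dir}"
        )
    return create
```

With this shape, swapping the M60 for a P100 without updating the XML ends in `MdevTypeNotFound` rather than in a device of some unrelated type.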
> > > > > > > This really seems like an accident waiting to happen. What happens
> > > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > > to expose a type 11 mdev class gpu device? How is libvirt supposed to
> > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and removing
> > > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > > unique type-id database make a lot of sense? The same issue applies
> > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > > Seeing type-id='11' is meaningless.
> > > > > > >
> > > > > > Let me clarify again, type '11' is a string that the vendor driver would
> > > > > > define (see my previous reply below); it could be "11" or "GRID-M60-0B".
> > > > > > If 2 vendors use the same string we can't control that, right?
> > > > > >
> > > > > >
> > > > > > > > > > Let's remove 'id' from type id in XML if that is the concern. Supported
> > > > > > > > > > types are going to be defined by the vendor driver, so let the vendor driver
> > > > > > > > > > decide what to use for the directory name, and the same should be used in the
> > > > > > > > > > device xml file; it could be '11' or "GRID M60-0B":
> > > > > > > > > >
> > > > > > > > > > <device>
> > > > > > > > > >   <name>my-vgpu</name>
> > > > > > > > > >   <parent>pci_0000_86_00_0</parent>
> > > > > > > > > >   <capability type='mdev'>
> > > > > > > > > >     <type='11'/>
> > > > > > > > > >     ...
> > > > > > > > > >   </capability>
> > > > > > > > > > </device>
> > > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > > simply be the name. Then we can get rid of 'type' altogether so we
> > > > > don't have this '11' vs 'GRID-M60-0B' issue. Thanks,
> > > > That sounds nice to me - we don't need two unique identifiers if
> > > > one will do.
> > > Hi Alex and Daniel,
> > >
> > > I just had some internal discussions here within NVIDIA and found out that
> > > actually the name/label potentially might not be unique and the "id" will be.
My comment below follows the above statement^^^^.
> > > So I think we still would like to keep both so the id is the programmatic id
> > > and the name/label is a human readable string for it, which might get changed to
> > > be non-unique by outside of engineering.
> > >
> > > Sorry for the change.
> > >
> > > Thanks,
> > > Neo
> > >
> > A curious question. How do we expect such a descriptive name/label used
> > by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> > flavor just using ID (GRID-type11) or using both ID/name
> > (GRID-type11-M60-0B) for end customer to choose? If it's only for human information,
> > does it make sense e.g. providing only unique ID in sysfs while relying on
> > vendor specific documentation to describe what the ID actually means?
> Hi Kevin,
>
> The id is not visible to the upper-level stack, only the name / label will be
> shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
> the same virtual device (name/label) with some internal difference which will
> be tracked by the different unique id.
If the upper layer will only see the descriptive name/label, then that label
must be unique. It's not acceptable for a particular key used by management
software to sometimes lead to one flavor of device and sometimes another (no
matter how small the differences may be). So if only the ID is unique, then
the ID is what must be used in any configuration at any level.
Probably I should be clear about the "upper layer" as I was replying to Kevin
regarding what the end user will see from their interface.
The "key" will be unique throughout all management stacks and you should use the
"key / ID" for all configurations. But when you want to show a human readable
description of a virtual device, it will come from the "description" field.
Thanks,
Neo
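
A minimal Python sketch of the split described above, where configuration refers to the unique key and the description is used only for display. The names `VgpuType`, `build_catalog` and `describe` are hypothetical, and the example values are made up; nothing here is an existing libvirt or vendor driver interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VgpuType:
    key: str          # unique identifier, used in all configuration
    description: str  # human readable label, may repeat across types


def build_catalog(types):
    """Index the supported types by key and reject duplicate keys."""
    catalog = {}
    for t in types:
        if t.key in catalog:
            raise ValueError(f"duplicate type key: {t.key!r}")
        catalog[t.key] = t
    return catalog


def describe(catalog, key):
    """Return the display string for the type stored under key."""
    return catalog[key].description
```

Two entries may share a description such as "GRID-M60-0B" while still being distinct types, because every lookup goes through the key.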
>
> I think having the ability to allow libvirt or the upper-level stack to display a
> human readable string for a given type of vgpu will make the user's life easier.
>
> Thanks,
> Neo
>
> > Thanks,
> > Kevin