On Wed, Aug 05, 2020 at 12:53:19PM +0200, Jiri Pirko wrote:
Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao(a)intel.com wrote:
>On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote:
>>
>> On 2020/8/5 下午3:56, Jiri Pirko wrote:
>> > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang(a)redhat.com wrote:
>> > > On 2020/8/5 上午10:16, Yan Zhao wrote:
>> > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote:
>> > > > > On 2020/8/5 上午12:35, Cornelia Huck wrote:
>> > > > > > [sorry about not chiming in earlier]
>> > > > > >
>> > > > > > On Wed, 29 Jul 2020 16:05:03 +0800
>> > > > > > Yan Zhao <yan.y.zhao(a)intel.com> wrote:
>> > > > > >
>> > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex
Williamson wrote:
>> > > > > > (...)
>> > > > > >
>> > > > > > > > Based on the feedback we've received, the
previously proposed interface
>> > > > > > > > is not viable. I think there's agreement
that the user needs to be
>> > > > > > > > able to parse and interpret the version
information. Using json seems
>> > > > > > > > viable, but I don't know if it's the
best option. Is there any
>> > > > > > > > precedent of markup strings returned via
sysfs we could follow?
>> > > > > > I don't think encoding complex information in a
sysfs file is a viable
>> > > > > > approach. Quoting Documentation/filesystems/sysfs.rst:
>> > > > > >
>> > > > > > "Attributes should be ASCII text files, preferably
with only one value
>> > > > > > per file. It is noted that it may not be efficient to
contain only one
>> > > > > > value per file, so it is socially acceptable to express
an array of
>> > > > > > values of the same type.
>> > > > > > Mixing types, expressing multiple lines of data, and
doing fancy
>> > > > > > formatting of data is heavily frowned upon."
>> > > > > >
>> > > > > > Even though this is an older file, I think these
restrictions still
>> > > > > > apply.
>> > > > > +1, that's another reason why devlink(netlink) is
better.
>> > > > >
>> > > > hi Jason,
>> > > > do you have any materials or sample code about devlink, so we can
have a good
>> > > > study of it?
>> > > > I found some kernel docs about it but my preliminary study
didn't show me the
>> > > > advantage of devlink.
>> > >
>> > > CC Jiri and Parav for a better answer for this.
>> > >
>> > > My understanding is that the following advantages are obvious (as I
replied
>> > > in another thread):
>> > >
>> > > - existing users (NIC, crypto, SCSI, ib), mature and stable
>> > > - much better error reporting (ext_ack other than string or errno)
>> > > - namespace aware
>> > > - do not couple with kobject
>> > Jason, what is your use case?
>>
>>
>> I think the use case is to report device compatibility for live migration.
>> Yan proposed a simple sysfs based migration version first, but it looks not
>> sufficient and something based on JSON is discussed.
>>
>> Yan, can you help to summarize the discussion so far for Jiri as a
>> reference?
>>
>yes.
>we are currently defining an device live migration compatibility
>interface in order to let user space like openstack and libvirt knows
>which two devices are live migration compatible.
>currently the devices include mdev (a kernel emulated virtual device)
>and physical devices (e.g. a VF of a PCI SRIOV device).
>
>the attributes we want user space to compare including
>common attribues:
> device_api: vfio-pci, vfio-ccw...
> mdev_type: mdev type of mdev or similar signature for physical device
> It specifies a device's hardware capability. e.g.
> i915-GVTg_V5_4 means it's of 1/4 of a gen9 Intel graphics
> device.
> software_version: device driver's version.
> in <major>.<minor>[.bugfix] scheme, where there is no
> compatibility across major versions, minor versions have
> forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and
> bugfix version number indicates some degree of internal
> improvement that is not visible to the user in terms of
> features or compatibility,
>
>vendor specific attributes: each vendor may define different attributes
> device id : device id of a physical devices or mdev's parent pci device.
> it could be equal to pci id for pci devices
> aggregator: used together with mdev_type. e.g. aggregator=2 together
> with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> graphics device.
> remote_url: for a local NVMe VF, it may be configured with a remote
> url of a remote storage and all data is stored in the
> remote side specified by the remote url.
> ...
>
>Comparing those attributes by user space alone is not an easy job, as it
>can't simply assume an equal relationship between source attributes and
>target attributes. e.g.
>for a source device of mdev_type=i915-GVTg_V5_4,aggregator=2, (1/2 of
>gen9), it actually could find a compatible device of
>mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9),
>if mdev_type of i915-GVTg_V5_4 is not available in the target machine.
>
>So, in our current proposal, we want to create two sysfs attributes
>under a device sysfs node.
>/sys/<path to device>/migration/self
>/sys/<path to device>/migration/compatible
>
>#cat /sys/<path to device>/migration/self
>device_type=vfio_pci
>mdev_type=i915-GVTg_V5_4
>device_id=8086591d
>aggregator=2
>software_version=1.0.0
>
>#cat /sys/<path to device>/migration/compatible
>device_type=vfio_pci
>mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
>device_id=8086591d
>aggregator={val1}/2
>software_version=1.0.0
>
>The /sys/<path to device>/migration/self specifies self attributes of
>a device.
>The /sys/<path to device>/migration/compatible specifies the list of
>compatible devices of a device. as in the example, compatible devices
>could have
> device_type == vfio_pci &&
> device_id == 8086591d &&
> software_version == 1.0.0 &&
> (
> (mdev_type of i915-GVTg_V5_2 && aggregator==1) ||
> (mdev_type of i915-GVTg_V5_4 && aggregator==2) ||
> (mdev_type of i915-GVTg_V5_8 && aggregator=4)
> )
>
>by comparing whether a target device is in compatible list of source
>device, the user space can know whether a two devices are live migration
>compatible.
>
>Additional notes:
>1)software_version in the compatible list may not be necessary as it
>already has a major.minor.bugfix scheme.
>2)for vendor attribute like remote_url, it may not be statically
>assigned and could be changed with a device interface.
>
>So, as Cornelia pointed that it's not good to use complex format in
>a sysfs attribute, we'd like to know whether there're other good ways to
>our use case, e.g. splitting a single attribute to multiple simple sysfs
>attributes as what Cornelia suggested or devlink that Jason has strongly
>recommended.
Hi Yan.
Hi Jiri,
Thanks for the explanation, I'm still fuzzy about the details.
Anyway, I suggest you to check "devlink dev info" command we have
implemented for multiple drivers. You can try netdevsim to test this.
I think that the info you need to expose might be put there.
do you mean
drivers/net/netdevsim/ ?
Devlink creates instance per-device. Specific device driver calls into
devlink core to create the instance. What device do you have? What
the devlink
core is net/core/devlink.c ?
driver is it handled by?
It looks that the devlink is for network device specific, and in
devlink.h, it says
include/uapi/linux/devlink.h - Network physical device Netlink
interface, I feel like it's not very appropriate for a GPU driver to use
this interface. Is that right?
Thanks
Yan