Hi Daniel,
Thanks a lot for the quick reply - it's much appreciated.
> IIUC, the high level scenario is as follows
Yes, the high-level description matches the use-case.
In the general case, the SmartNIC DPU cores may not even use PCIe: they could
use a platform device or a different I/O specification to access the network
controller, while the hypervisor host relies on PCIe. The end result is the
same, though: hypervisor host PCI addresses cannot be relied upon to identify
the so-called "port representors" (https://lwn.net/Articles/692942/) on the
SmartNIC DPU operating system side.
Moreover, in the general case there can be multiple SmartNIC DPUs per
hypervisor, each with its own set of PFs and VFs. To determine which DPU is
going to handle representor port programming at the control plane level,
there needs to be a way to identify a DPU based on a VF selected by the
hypervisor (at least in Nova, VF selection is driven from the hypervisor
side). A board serial number can be determined independently from both the
hypervisor and the DPU, so the hypervisor services can provide the board
serial to the network control plane for discovering the relevant DPU. That's
where Libvirt comes in, helping with serial number retrieval.
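To make the discovery step concrete, here is a minimal Python sketch of how a
control plane could map a hypervisor-selected VF to a DPU by board serial.
Everything in it is hypothetical: the Dpu record and the lookup tables stand
in for whatever Nova, Neutron or OVN actually keep. The only assumption
carried over from the description above is that both sides can read the same
board serial independently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Dpu:
    """Hypothetical control-plane record for one SmartNIC DPU."""
    hostname: str
    board_serial: str  # as read by the DPU OS from its own side


def build_serial_index(dpus: Iterable[Dpu]) -> Dict[str, Dpu]:
    """Index DPUs by board serial, rejecting duplicates (serials must be unique)."""
    index: Dict[str, Dpu] = {}
    for dpu in dpus:
        if dpu.board_serial in index:
            raise ValueError(f"duplicate board serial {dpu.board_serial!r}")
        index[dpu.board_serial] = dpu
    return index


def select_dpu(vf_addr: str,
               pf_of_vf: Dict[str, str],
               serial_of_pf: Dict[str, str],
               index: Dict[str, Dpu]) -> Optional[Dpu]:
    """Map a VF chosen on the hypervisor to the DPU owning its representor.

    PCI addresses are only meaningful on the hypervisor side; the board
    serial is the one identifier both sides agree on.
    """
    pf_addr = pf_of_vf.get(vf_addr)      # e.g. from the VF's sysfs physfn link
    if pf_addr is None:
        return None
    serial = serial_of_pf.get(pf_addr)   # e.g. from libvirt node device info
    if serial is None:
        return None
    return index.get(serial)
```

The hypervisor never needs to know how the DPU addresses the NIC internally;
the serial is the only shared key.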
> This seems like a reasonable feature request to me, since there is
> a piece of info that apps using libvirt need, and libvirt does not
> expose this. Requiring the mgmt app like Nova to dig into the host
> PCI config space indicates a clear gap in libvirt functionality.
Ack, thanks for confirming.
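As an illustration of how a management app could consume this once libvirt
exposes it, the sketch below parses node device XML following the layout
proposed further down in this thread (a 'vpd' capability nested under the
'pci' capability, holding a <serial> element). That layout is only a
proposal, so the element names are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def vpd_serial_from_nodedev_xml(xml_text: str) -> Optional[str]:
    """Return the VPD serial from node device XML, or None if not present.

    Assumes the proposed layout: <capability type='vpd'><serial>...</serial>
    nested directly inside <capability type='pci'>.
    """
    root = ET.fromstring(xml_text)
    # iter() also visits root itself, so a bare <capability> document works too.
    for pci_cap in root.iter("capability"):
        if pci_cap.get("type") != "pci":
            continue
        vpd_cap = pci_cap.find("capability[@type='vpd']")
        if vpd_cap is None:
            return None
        serial = (vpd_cap.findtext("serial") or "").strip()
        return serial or None
    return None
```

With the libvirt Python bindings the XML would come from something like
conn.nodeDeviceLookupByName("pci_0000_3b_00_0").XMLDesc(0); the device name
there is only an example.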
> Is scenario (2) going to be at all common ? What would be a reason why the
> info is not exposed via the standardized VPD - is it just a legacy hardware
> issue ?
VPD is an optional capability in the PCI and PCIe specs. While there is hope
that every SmartNIC DPU vendor will implement it given the need for it, there
might be some fragmentation because the specs do not mandate its presence.
Scenario (2) is an attempt to provide an alternative source for the same
piece of information: if a serial is available via the driver (which may
query the NIC firmware instead of reading VPD), it can still be used with the
same end result. The devlink-info API does not mandate that a board serial be
exposed either, so there is no guarantee it will be available via devlink.
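A rough sketch of scenario (2) as a fallback, in Python: it runs the iproute2
devlink tool and reads board.serial_number from its JSON output. The JSON
layout assumed here ({"info": {"pci/<address>": {..., "board.serial_number":
...}}}) is what I would expect from iproute2 5.9+, but it should be checked
against the actual tool. The helper names and the source ordering are
hypothetical.

```python
import json
import subprocess
from typing import Callable, Iterable, Optional


def board_serial_from_devlink_json(output: str, handle: str) -> Optional[str]:
    """Pick board.serial_number for one device out of `devlink -j dev info` JSON.

    Assumed layout: {"info": {"pci/0000:3b:00.0": {"board.serial_number": ...}}}.
    """
    info = json.loads(output).get("info", {})
    return info.get(handle, {}).get("board.serial_number")


def devlink_board_serial(pci_addr: str) -> Optional[str]:
    """Query devlink for a PCI device; None if devlink or the field is unavailable."""
    handle = f"pci/{pci_addr}"
    try:
        proc = subprocess.run(["devlink", "-j", "dev", "info", handle],
                              capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # No devlink binary, no devlink instance for the device, or no info support.
        return None
    try:
        return board_serial_from_devlink_json(proc.stdout, handle)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return None


def board_serial(pci_addr: str,
                 sources: Iterable[Callable[[str], Optional[str]]]) -> Optional[str]:
    """Try each source in order (e.g. VPD first, then devlink) until one answers."""
    for source in sources:
        serial = source(pci_addr)
        if serial:
            return serial
    return None
```

Keeping VPD as the first source and devlink as the second means devices that
follow the spec never depend on driver support.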
It will certainly be simpler to implement scenario (1) first and add (2)
later if there is a significant need for it. The generally available hardware
I have seen exposes VPD, so I can focus on (1) while we decide whether to do
(2) at all.
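For scenario (1), a minimal Python sketch of extracting SN from the raw VPD
bytes that Linux exposes via /sys/bus/pci/devices/<address>/vpd. It walks the
PCI VPD resource list (large resource tags 0x82 for the identifier string and
0x90 for VPD-R, small end tag 0x78) and returns the read-only keywords. It
does not verify the RV checksum or parse VPD-W, reading the sysfs file
usually requires root, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# Sketch only: helper names are hypothetical; RV checksum and VPD-W are ignored.
LARGE_VPD_R = 0x10    # large resource item name for read-only VPD (tag byte 0x90)
SMALL_END = 0x0F      # small resource item name for the end tag (tag byte 0x78)


def _parse_keywords(body: bytes) -> Dict[str, bytes]:
    """Parse VPD keyword entries: 2-byte ASCII keyword, 1-byte length, data."""
    fields: Dict[str, bytes] = {}
    i = 0
    while i + 3 <= len(body):
        key = body[i:i + 2].decode("ascii", errors="replace")
        length = body[i + 2]
        fields[key] = body[i + 3:i + 3 + length]
        i += 3 + length
    return fields


def parse_vpd_readonly(data: bytes) -> Dict[str, bytes]:
    """Walk the VPD resource list and return the read-only (VPD-R) keywords."""
    fields: Dict[str, bytes] = {}
    pos = 0
    while pos < len(data):
        tag = data[pos]
        if tag & 0x80:                        # large resource: tag, 16-bit LE length
            if pos + 3 > len(data):
                raise ValueError("truncated large resource header")
            length = int.from_bytes(data[pos + 1:pos + 3], "little")
            body = data[pos + 3:pos + 3 + length]
            if len(body) != length:
                raise ValueError("truncated large resource body")
            if (tag & 0x7F) == LARGE_VPD_R:
                fields.update(_parse_keywords(body))
            pos += 3 + length
        else:                                 # small resource: name and length in tag
            if ((tag >> 3) & 0x0F) == SMALL_END:
                break                         # ignore anything after the end tag
            pos += 1 + (tag & 0x07)
    return fields


def vpd_serial(data: bytes) -> Optional[str]:
    """Return the SN keyword as text, or None if the device does not provide it."""
    sn = parse_vpd_readonly(data).get("SN")
    if not sn:
        return None
    return sn.decode("ascii", errors="replace").strip() or None


def read_vpd_serial(pci_addr: str) -> Optional[str]:
    """Read SN for an address like '0000:3b:00.0' from sysfs (usually needs root)."""
    try:
        data = (Path("/sys/bus/pci/devices") / pci_addr / "vpd").read_bytes()
        return vpd_serial(data)
    except (OSError, ValueError):
        return None                           # no VPD, unreadable, or malformed
```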
Best Regards,
Dmitrii Shcherbakov
LP: ~dmitriis
On Tue, Jun 1, 2021 at 2:32 PM Daniel P. Berrangé <berrange(a)redhat.com> wrote:
> On Fri, May 28, 2021 at 10:58:05PM +0300, Dmitrii Shcherbakov wrote:
> > Hello Libvirt Developers,
> >
> > I am looking for some feedback on a planned enhancement to Libvirt: the aim
> > is to store a portion of PCI(e) Vital Product Data (VPD) for each device
> > along with the other PCI/PCIe device information already collected.
> > Specifically, the SN (Serial Number) read-only field of a device's VPD data
> > structure is of interest; it is described in the PCI/PCIe specs (PCI Local
> > Bus 2.1+ and PCIe 4.0+).
> >
> > The context for this is the cross-project work in OpenStack (Nova,
> > Neutron), OVS and OVN to support off-path SmartNIC DPUs ([1], [2], [3],
> > [4]). The Nova specification [1] provides an overview of the relevant
> > hardware and the use-case for board serial numbers. However, VPD is a
> > standard capability in the PCI/PCIe specifications and is not tied to this
> > use-case in particular, so the suggestion from the Nova core team was to
> > introduce a means of collecting this information via Libvirt. It can then
> > be retrieved by the respective virt driver in Nova via Libvirt without
> > having to introduce this code into Nova itself.
>
> I've talked with Sean Mooney a little about the use case this morning.
>
> IIUC, the high level scenario is as follows
>
> - The main host machine has a PCI controller topology to which the
>   NICs are attached. This is how the host OS and, by extension, libvirt,
>   nova, etc. see the PCI devices.
>
> - There is a second PCI controller topology to which the NICs are
>   attached. This is only visible to the Arm cores of the offload
>   engine.
>
> - Nova/Neutron can identify the NICs based on the PCI topology
>   seen by the host OS, but need to tell the NIC mgmt software
>   which NIC to use in a way that can be understood by the offload
>   cores.
>
> IOW, the PCI address is not usable as a unique identifier because
> there are two completely independent PCI topologies with no mapping
> between them.
>
> The VPD data provides a replacement way to identify a NIC based on
> a unique serial number that is independent of PCI topology. Nova needs
> this serial number in order to configure the device offload features.
>
> This seems like a reasonable feature request to me, since there is
> a piece of info that apps using libvirt need, and libvirt does not
> expose this. Requiring the mgmt app like Nova to dig into the host
> PCI config space indicates a clear gap in libvirt functionality.
>
> > I would like to suggest the following to be done in Libvirt:
> >
> > 1) adding code to extract a serial number from VPD for PCI/PCIe devices in
> > general and store it for exposure via the Libvirt API. More specifically,
> > I propose adding a nested capability called "vpd" under
> > VIR_NODE_DEV_CAP_PCI_DEV:
> >     <capability type='pci'>
> >       <capability type='vpd'>
> >         <serial>UNIQUESERIAL</serial>
> >         <!-- ... other VPD attributes if present -->
> >       </capability>
> >       <!-- ... -->
> >     </capability>
>
> This looks like a reasonable proposal
>
> > 2) (optional) implementing functionality to obtain a board serial number
> > via devlink-info for PCIe devices that do not expose a VPD capability but
> > whose device driver can retrieve it from firmware. The board serial number
> > can be stored in the same element as suggested above.
>
> Is scenario (2) going to be at all common ? What would be a reason why the
> info is not exposed via the standardized VPD - is it just a legacy hardware
> issue ?
>
> > Not all devices expose the devlink API, and even fewer expose the board
> > serial via devlink-info:
> >
> > * devlink was added in 4.10 [11];
> > * devlink-info was introduced in 5.1 [12];
> > * querying for board.serial_number was added in kernel 5.9 [13] and
> >   iproute2 5.9.0 [14];
> > * besides the generic devlink infrastructure support above, device
> >   drivers also need to support exposing this field.
> >
> > Therefore, implementing two approaches (sysfs VPD, devlink) is preferable
> > for better compatibility.
> >
> > I would appreciate any feedback on whether this potential addition makes
> > sense. If so, I can look into implementing this.
>
> It makes sense to me.
>
> Regards,
> Daniel