[libvirt] how to know if PCI device has SR-IOV PF capability

Hi,

I was looking at the output of virsh nodedev-dumpxml for a PCI device to see whether it has the SR-IOV PF capability. It seems that if virtual functions are enabled, the XML looks like [1], but if the PCI device has no VFs enabled, the output looks like [2]. As you can see, for a PCI device with no VFs the <capability type='virt_functions'> element doesn't exist. Is this by design? I would expect a <capability type='virt_functions'/> element with no child elements to be included in that case as well.

Thanks,
Moshe Levi

[1]
root@r-ufm152:~# virsh nodedev-dumpxml pci_0000_03_00_0
<device>
  <name>pci_0000_03_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/0000:03:00.0</path>
  <parent>pci_0000_00_02_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1013'>MT27700 Family [ConnectX-4]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <capability type='virt_functions'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x3'/>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x4'/>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x5'/>
    </capability>
    <iommuGroup number='15'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='16'/>
      <link validity='sta' speed='8' width='16'/>
    </pci-express>
  </capability>
</device>

[2]
root@r-ufm152:~# virsh nodedev-dumpxml pci_0000_03_00_1
<device>
  <name>pci_0000_03_00_1</name>
  <path>/sys/devices/pci0000:00/0000:00:02.0/0000:03:00.1</path>
  <parent>pci_0000_00_02_0</parent>
  <driver>
    <name>mlx5_core</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>3</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x1013'>MT27700 Family [ConnectX-4]</product>
    <vendor id='0x15b3'>Mellanox Technologies</vendor>
    <iommuGroup number='15'>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </iommuGroup>
    <numa node='0'/>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='16'/>
      <link validity='sta' speed='8' width='16'/>
    </pci-express>
  </capability>
</device>
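For anyone scripting around this in the meantime, the check described above can be sketched with Python's standard-library ElementTree (the function name is made up for illustration; note that with current libvirt an absent element does not prove the device lacks SR-IOV PF support, since the element is omitted when zero VFs are enabled):

```python
import xml.etree.ElementTree as ET

def virt_function_addresses(nodedev_xml):
    """Return the list of VF PCI address attribute dicts if the nodedev
    XML carries <capability type='virt_functions'>, else None."""
    root = ET.fromstring(nodedev_xml)
    # ElementTree supports the [@attr='value'] predicate in its
    # limited XPath subset, so we can match the capability directly.
    cap = root.find(".//capability[@type='virt_functions']")
    if cap is None:
        return None
    return [addr.attrib for addr in cap.findall("address")]
```

Feeding it the output of `virsh nodedev-dumpxml pci_0000_03_00_0` above would return the four VF addresses; the output for pci_0000_03_00_1 would return None.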

On Sun, Nov 22, 2015 at 11:04:28AM +0000, Moshe Levi wrote:
Hi,
I was looking at the output of virsh nodedev-dumpxml for a PCI device to see whether it has the SR-IOV PF capability. It seems that if virtual functions are enabled, the XML looks like [1], but if the PCI device has no VFs enabled, the output looks like [2]. As you can see, for a PCI device with no VFs the <capability type='virt_functions'> element doesn't exist. Is this by design? I would expect a <capability type='virt_functions'/> element with no child elements to be included in that case as well.
That is a bug. The capability should be reported regardless of whether any VFs are currently enabled, so we should fix this.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 11/23/2015 05:05 AM, Daniel P. Berrange wrote:
On Sun, Nov 22, 2015 at 11:04:28AM +0000, Moshe Levi wrote:
Hi,
I was looking at the output of virsh nodedev-dumpxml for a PCI device to see whether it has the SR-IOV PF capability. It seems that if virtual functions are enabled, the XML looks like [1], but if the PCI device has no VFs enabled, the output looks like [2]. As you can see, for a PCI device with no VFs the <capability type='virt_functions'> element doesn't exist. Is this by design? I would expect a <capability type='virt_functions'/> element with no child elements to be included in that case as well.
That is a bug. The capability should be reported regardless of whether any VFs are currently enabled, so we should fix this.
Prior to libvirt 1.0.4, "<capability type='virt_functions'>" was there for every PCI device regardless of whether or not it had the ability to have virtual functions. It looks like commit 9a3ff01d changed it to only emit that element when there was at least one VF. So it used to be wrong, and now it is wrong in a different way :-)

If we're going to switch to emitting virt_functions whenever a device has the potential to provide VFs, we may as well make it worthwhile and also emit the maximum possible number of VFs for the device, maybe simply:

<capability type='virt_functions' maxCount='7'>

(The current count is implicit in the number of entries in the list that follows. I don't have an opinion on whether it is better to also include it explicitly with, e.g., "count='7'", or just leave it implicit.)

On Mon, Nov 23, 2015 at 10:34:43AM -0500, Laine Stump wrote:
On 11/23/2015 05:05 AM, Daniel P. Berrange wrote:
On Sun, Nov 22, 2015 at 11:04:28AM +0000, Moshe Levi wrote:
Hi,
I was looking at the output of virsh nodedev-dumpxml for a PCI device to see whether it has the SR-IOV PF capability. It seems that if virtual functions are enabled, the XML looks like [1], but if the PCI device has no VFs enabled, the output looks like [2]. As you can see, for a PCI device with no VFs the <capability type='virt_functions'> element doesn't exist. Is this by design? I would expect a <capability type='virt_functions'/> element with no child elements to be included in that case as well.
That is a bug. The capability should be reported regardless of whether any VFs are currently enabled, so we should fix this.
Prior to libvirt 1.0.4, "<capability type='virt_functions'>" was there for every PCI device regardless of whether or not it had the ability to have virtual functions. It looks like commit 9a3ff01d changed it to only emit that element when there was at least one VF. So it used to be wrong, and now it is wrong in a different way :-)
Awesome.
If we're going to switch to emitting virt_functions whenever a device has the potential to provide VFs, we may as well make it worthwhile and also emit the maximum possible number of VFs for the device, maybe simply:
<capability type='virt_functions' maxCount='7'>
(The current count is implicit in the number of entries in the list that follows. I don't have an opinion on whether it is better to also include it explicitly with, e.g., "count='7'", or just leave it implicit.)
Is there any way for us to actually discover the max count? If so, then it seems nice to include it.

Regards,
Daniel

On 11/23/2015 10:40 AM, Daniel P. Berrange wrote:
On Mon, Nov 23, 2015 at 10:34:43AM -0500, Laine Stump wrote:
If we're going to switch to emitting virt_functions whenever a device has the potential to provide VFs, we may as well make it worthwhile and also emit the maximum possible number of VFs for the device, maybe simply:
<capability type='virt_functions' maxCount='7'>
(The current count is implicit in the number of entries in the list that follows. I don't have an opinion on whether it is better to also include it explicitly with, e.g., "count='7'", or just leave it implicit.)
Is there any way for us to actually discover the max count? If so, then it seems nice to include it.
Yes. I don't know if it existed back when that code was originally added, but at least RHEL 6.7 (the oldest OS I have running on a machine with an SR-IOV-capable card) and later have two files in the device's sysfs directory, sriov_numvfs and sriov_totalvfs. The former is the number of VFs currently active, and the latter is the maximum possible for this PF.

(On a related topic: you can change the number of currently active VFs by writing "0" to sriov_numvfs and then writing the desired number to it. This temporarily deletes any VFs that are already active, though, so it can only be done if none are in use. I've been planning to hook libvirt networks up to this so that VFs can be enabled completely within libvirt, since the driver command-line method isn't consistent between vendors and is, I believe, now considered deprecated. Since OpenStack doesn't use libvirt networks but may want similar functionality, I'm wondering where would be a good place to do that. We could provide something via the node-device API, but that couldn't be done automatically by libvirtd at startup; alternately, OpenStack could create a network but not use it, but that seems conceptually confusing even though it would work.)

On Mon, Nov 23, 2015 at 11:26:11AM -0500, Laine Stump wrote:
On 11/23/2015 10:40 AM, Daniel P. Berrange wrote:
On Mon, Nov 23, 2015 at 10:34:43AM -0500, Laine Stump wrote:
If we're going to switch to emitting virt_functions whenever a device has the potential to provide VFs, we may as well make it worthwhile and also emit the maximum possible number of VFs for the device, maybe simply:
<capability type='virt_functions' maxCount='7'>
(The current count is implicit in the number of entries in the list that follows. I don't have an opinion on whether it is better to also include it explicitly with, e.g., "count='7'", or just leave it implicit.)
Is there any way for us to actually discover the max count? If so, then it seems nice to include it.
Yes. I don't know if it existed back when that code was originally added, but at least RHEL 6.7 (the oldest OS I have running on a machine with an SR-IOV-capable card) and later have two files in the device's sysfs directory, sriov_numvfs and sriov_totalvfs. The former is the number of VFs currently active, and the latter is the maximum possible for this PF.
(On a related topic: you can change the number of currently active VFs by writing "0" to sriov_numvfs and then writing the desired number to it. This temporarily deletes any VFs that are already active, though, so it can only be done if none are in use. I've been planning to hook libvirt networks up to this so that VFs can be enabled completely within libvirt, since the driver command-line method isn't consistent between vendors and is, I believe, now considered deprecated. Since OpenStack doesn't use libvirt networks but may want similar functionality, I'm wondering where would be a good place to do that. We could provide something via the node-device API, but that couldn't be done automatically by libvirtd at startup; alternately, OpenStack could create a network but not use it, but that seems conceptually confusing even though it would work.)
There is scope to extend the node device APIs to allow definition of persistent config for virtual devices. We have a similar scenario with NPIV devices, which are dynamically allocatable.

Regards,
Daniel

Hi Laine,

Did you file a bug for this issue? I need it so I can document my workaround in OpenStack.

Thanks,
Moshe Levi
-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Monday, November 23, 2015 6:29 PM
To: Laine Stump <laine@laine.org>
Cc: libvir-list@redhat.com; Moshe Levi <moshele@mellanox.com>
Subject: Re: [libvirt] how to know if PCI device has SR-IOV PF capability
participants (3)
- Daniel P. Berrange
- Laine Stump
- Moshe Levi