On Wed, 21 Mar 2018 15:46:01 +0000
Ciprian Barbu <Ciprian.Barbu(a)enea.com> wrote:
Hello,
In the context of running Openstack on a cluster of Cavium ThunderX cn8890 aarch64
servers, we are trying to attach virtual functions to a VM.
First, some introduction. This Cavium SoC takes a different approach to Virtual Functions
than x86 NICs do: VFs are always enabled, and there are two types of VFs and *one
single* PF, as follows:
- primary VFs - these are in fact assigned by the system to the physical ports of the
server, e.g. em2p1s0f1, em2p1s0f3 etc. below.
- secondary VFs - the main purpose of these is to provide additional HW queues under SW
control (usually DPDK applications) by automatically binding them to the needed physical
port.
- one single "physical" function, device 0002:01:00.0 below, which to the best
of my knowledge acts merely as a stub and cannot be assigned an interface name.
Below is the output of "dpdk-devbind.py -s" which provides some useful
information.
Network devices using DPDK-compatible driver
============================================
0002:01:00.2 'Device a034' drv=vfio-pci unused=nicvf
Network devices using kernel driver
===================================
0000:01:10.0 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
0000:01:10.1 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
0002:01:00.0 'THUNDERX Network Interface Controller' if= drv=thunder-nic unused=nicpf,vfio-pci
0002:01:00.1 'Device a034' if=em2p1s0f1 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.3 'Device a034' if=em2p1s0f3 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.4 'Device a034' if=em2p1s0f4 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.5 'Device a034' if=em2p1s0f5 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.6 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.7 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:01.0 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
Now for the problem. I don't have a domain definition because libvirt fails to start
the domain, though I might be able to find what nova generates. What it tries to do is
pass through em2p1s0f3, at address 0002:01:00.3:
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0002' bus='0x1'
slot='0x0' function='0x3'/>
</source>
</interface>
When you use an <interface> definition, I believe libvirt
interprets this specifically as a network device and perhaps expects
to find an interface on the PF through which it can do setup. You can
also specify assigned devices via a <hostdev> entry, such as:
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0002' bus='0x1'
slot='0x0' function='0x3'/>
</source>
</hostdev>
In that case libvirt shouldn't care that the device is a VF and
should have no dependency on a PF interface (or on the ability to
configure the VF via the PF), I think. Cc'ing libvirt experts. There's a
proposed stub driver in the upstream kernel that would act in a
similar fashion: the host PF driver is nothing more than a stub that
enables the VFs. libvirt would therefore need to handle those VFs in a
way that has no dependency on the PF being a network interface, or any
other sort of interface. Thanks,
Alex
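The <hostdev> form suggested above can be generated from a PCI address string. The following is a minimal sketch, assuming the element and attribute names shown in the XML above; the helper name `hostdev_xml` is hypothetical and is not part of libvirt or nova. It zero-pads the bus and slot values, which differs from the hand-written example only in formatting.

```python
import re
import xml.etree.ElementTree as ET


def hostdev_xml(pci_addr):
    """Build a <hostdev> element for a PCI address like '0002:01:00.3'.

    Hypothetical helper for illustration only.
    """
    m = re.fullmatch(
        r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])",
        pci_addr,
    )
    if m is None:
        raise ValueError(f"not a PCI address: {pci_addr!r}")
    domain, bus, slot, func = (int(g, 16) for g in m.groups())

    # Generic PCI passthrough: no <interface> semantics, so no PF netdev
    # lookup is implied by the XML itself.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    ET.SubElement(hostdev, "driver", name="vfio")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        type="pci",
        domain=f"0x{domain:04x}",
        bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}",
        function=f"0x{func:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0002:01:00.3"))
```

The resulting string could be placed in the <devices> section of a domain definition; whether nova can be made to emit this form is a separate question not covered here.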
You can find attached a trimmed libvirtd.log where the main error
is:
43236: error : virPCIGetVirtualFunctionInfo:2927 : internal error: The PF device for VF /sys/bus/pci/devices/0002:01:00.3 has no network device name
I have actually spent a few days trying some hacks and learning more. The main
finding is that virPCIGetVirtualFunctionInfo fails to find a network device name for the
PF of the VF at address 0002:01:00.3; as I explained in the introduction, the PF on
this Cavium SoC does not have one.
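The failing lookup can be approximated from user space. The following is a minimal sketch, assuming the standard Linux sysfs layout in which a VF's device directory contains a `physfn` symlink to its PF and a device with a network interface exposes a `net/` subdirectory. The helper name `pf_netdev_names` is hypothetical, and this is not libvirt's actual code.

```python
import os


def pf_netdev_names(vf_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the interface names exposed by the PF of the given VF.

    An empty list means the PF has no network device, which is the
    condition the libvirt error message describes.
    Raises FileNotFoundError if the device has no physfn link.
    """
    physfn = os.path.join(sysfs_root, vf_addr, "physfn")
    if not os.path.exists(physfn):
        raise FileNotFoundError(f"{vf_addr} has no physfn link; not a VF?")

    # A PF that owns a netdev lists it as <pf>/net/<ifname>.
    net_dir = os.path.join(physfn, "net")
    if not os.path.isdir(net_dir):
        return []
    return sorted(os.listdir(net_dir))


if __name__ == "__main__":
    try:
        print(pf_netdev_names("0002:01:00.3"))
    except FileNotFoundError as exc:
        print(exc)
```

On the system described here, calling it with "0002:01:00.3" would be expected to return an empty list, consistent with the error in the log.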
Looking further downstream, almost all of the helper functions need a linkdev for
the physical function, so making libvirt work on this system would require some
heavy refactoring; one option is to use the sysfs path rather than the interface name.
That will not work everywhere from what I've seen: at least virNetDevGetVfConfig uses
netlink to save the admin MAC (as part of virNetDevSaveNetConfig), and netlink needs the
ifname.
So I'm quite stuck on finding a workaround/fix for this platform that could
potentially be upstreamed, so that we at ENEA are not burdened with maintaining
an ugly hack. Right now we are using libvirt 3.5.0, but we can upgrade to something newer
if needed.
The questions, thus, are:
1. Is this problem known in the libvirt community?
2. Is there any plan to make it work?
3. Can you give some pointers on an approach to adapt libvirt to this system?
4. Maybe it's worth changing the kernel to assign a sort of dummy interface to the
physical function?
Thanks and sorry for the long email,
/Ciprian