Hello,
In the context of running OpenStack on a cluster of Cavium ThunderX cn8890 aarch64
servers, we are trying to attach virtual functions to a VM.
First, some introduction. This Cavium SoC takes a different approach to Virtual Functions
than x86 NICs do: VFs are always enabled, and there are two types of VFs and *one
single* PF, as follows:
- primary VFs - these are in fact assigned by the system to the physical ports of the
server, e.g. em2p1s0f1, em2p1s0f3 etc. below.
- secondary VFs - the main purpose of these is to provide additional HW queues under SW
control (usually DPDK applications) by automatically binding them to the needed physical
port.
- one single "physical" function, device 0002:01:00.0 below, which to the best
of my knowledge acts merely as a stub and cannot be assigned an interface name.
Below is the output of "dpdk-devbind.py -s" which provides some useful
information.
Network devices using DPDK-compatible driver
============================================
0002:01:00.2 'Device a034' drv=vfio-pci unused=nicvf
Network devices using kernel driver
===================================
0000:01:10.0 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
0000:01:10.1 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
0002:01:00.0 'THUNDERX Network Interface Controller' if= drv=thunder-nic unused=nicpf,vfio-pci
0002:01:00.1 'Device a034' if=em2p1s0f1 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.3 'Device a034' if=em2p1s0f3 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.4 'Device a034' if=em2p1s0f4 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.5 'Device a034' if=em2p1s0f5 drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.6 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:00.7 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
0002:01:01.0 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
Now for the problem. I don't have a domain definition at hand because libvirt fails to
start the domain, but I might be able to find what nova generates. What it tries to do is
pass through em2p1s0f3, address 0002:01:00.3:
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x3'/>
  </source>
</interface>
You can find attached a trimmed libvirtd.log where the main error is:
43236: error : virPCIGetVirtualFunctionInfo:2927 : internal error: The PF device for VF /sys/bus/pci/devices/0002:01:00.3 has no network device name
I have spent a few days trying out some hacks and learning more. The main
point is that virPCIGetVirtualFunctionInfo fails to find a network device name for the
physical function of the VF at address 0002:01:00.3. As I explained in the introduction,
the PF on this Cavium SoC does not have one.
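
To illustrate the lookup that fails, here is a minimal Python sketch of my understanding
of it. It is not libvirt code: the helper name pf_netdev_names is made up for this
example, and it only assumes the standard Linux sysfs layout, where a VF has a "physfn"
symlink to its PF and a PCI device with a netdev has a "net/" directory.

```python
import os


def pf_netdev_names(vf_sysfs_path):
    """Return the netdev names registered under the PF of the given VF.

    Hypothetical helper, not part of libvirt. An empty list corresponds to
    the "has no network device name" situation described above.
    """
    physfn = os.path.join(vf_sysfs_path, "physfn")
    if not os.path.islink(physfn):
        raise ValueError(f"{vf_sysfs_path} has no physfn link, so it is not a VF")
    # Follow the VF -> PF symlink, then look for the PF's net/ directory.
    pf_path = os.path.realpath(physfn)
    net_dir = os.path.join(pf_path, "net")
    if not os.path.isdir(net_dir):
        return []
    return sorted(os.listdir(net_dir))


if __name__ == "__main__":
    vf = "/sys/bus/pci/devices/0002:01:00.3"  # VF address from the log above
    try:
        print(pf_netdev_names(vf))
    except ValueError as exc:
        print(exc)
```

On this platform I would expect the PF at 0002:01:00.0 to have no entries under net/,
so the helper would return an empty list for every VF.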
Looking further downstream, almost all of the helper functions need a linkdev for the
physical function, so making libvirt work on this system would require some heavy
refactoring. One solution would be to use the sysfs path rather than the interface name.
From what I've seen, this will not work 100%: at least virNetDevGetVfConfig uses
netlink to save the admin MAC (part of virNetDevSaveNetConfig), and netlink needs the
ifname.
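
As a rough illustration of the part that can be done from sysfs alone, the sketch below
derives a VF's index from the PF's "virtfnN" symlinks without needing any interface
name. The function name vf_index is hypothetical and this is not how libvirt is
structured; it only assumes the standard sysfs SR-IOV layout. The netlink step (setting
or reading the VF MAC through the PF) is not covered, since that is where the ifname is
still required.

```python
import os
import re


def vf_index(vf_sysfs_path):
    """Return the index N such that <PF>/virtfnN points at this VF, else None.

    Hypothetical helper for illustration; uses only sysfs symlinks.
    """
    physfn = os.path.join(vf_sysfs_path, "physfn")
    if not os.path.islink(physfn):
        return None  # not a VF, or the kernel does not expose the PF link
    pf_path = os.path.realpath(physfn)
    vf_real = os.path.realpath(vf_sysfs_path)
    for entry in os.listdir(pf_path):
        match = re.fullmatch(r"virtfn(\d+)", entry)
        if match and os.path.realpath(os.path.join(pf_path, entry)) == vf_real:
            return int(match.group(1))
    return None
```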
So I'm quite stuck on finding a workaround/fix for this platform that could
potentially be upstreamed, so that we, ENEA, are not burdened with maintaining
an ugly hack. Right now we are using libvirt 3.5.0, but we can upgrade to something newer
if needed.
The questions, thus, are:
1. Is this problem known in the libvirt community?
2. Is there any plan to make it work?
3. Can you give some pointers on an approach to adapt libvirt to this system?
4. Maybe it's worth changing the kernel to assign a sort of dummy interface to the
physical function?
Thanks and sorry for the long email,
/Ciprian