So ultimately the problem was somewhere in the BIOS. A BIOS update fixed
the issue.
Riccardo
On Tue, 7 Apr 2020 at 18:05, Riccardo Ravaioli <riccardoravaioli(a)gmail.com>
wrote:
Hi,
I'm on a Dell VEP 1405 running Debian 9.11, and I'm testing various
interfaces in PCI passthrough to a qemu/KVM virtual machine that also
runs Debian 9.11.
I noticed that only one of the four I350 network controllers can be used
in PCI passthrough. The available interfaces are:
# dpdk-devbind.py --status
Network devices using kernel driver
===================================
0000:02:00.0 'I350 Gigabit Network Connection 1521' if=eth2 drv=igb unused=igb_uio,vfio-pci,uio_pci_generic
0000:02:00.1 'I350 Gigabit Network Connection 1521' if=eth3 drv=igb unused=igb_uio,vfio-pci,uio_pci_generic
0000:02:00.2 'I350 Gigabit Network Connection 1521' if=eth0 drv=igb unused=igb_uio,vfio-pci,uio_pci_generic
0000:02:00.3 'I350 Gigabit Network Connection 1521' if=eth1 drv=igb unused=igb_uio,vfio-pci,uio_pci_generic
0000:04:00.0 'QCA986x/988x 802.11ac Wireless Network Adapter 003c' if= drv=ath10k_pci unused=igb_uio,vfio-pci,uio_pci_generic
0000:05:00.0 'Device 15c4' if=eth7 drv=ixgbe unused=igb_uio,vfio-pci,uio_pci_generic
0000:05:00.1 'Device 15c4' if=eth6 drv=ixgbe unused=igb_uio,vfio-pci,uio_pci_generic
0000:07:00.0 'Device 15e5' if=eth5 drv=ixgbe unused=igb_uio,vfio-pci,uio_pci_generic
0000:07:00.1 'Device 15e5' if=eth4 drv=ixgbe unused=igb_uio,vfio-pci,uio_pci_generic
If I try PCI passthrough on 02:00.2 (eth0), it works fine. With any of the
remaining three interfaces, libvirt fails with this error:
# virsh create vnf.xml
error: Failed to create domain from vnf.xml
error: internal error: process exited while connecting to monitor: 2020-04-06T16:08:47.048266Z qemu-system-x86_64: -device vfio-pci,host=02:00.1,id=hostdev0,bus=pci.0,addr=0x5: vfio 0000:02:00.1: failed to setup INTx fd: Operation not permitted
The contents of vnf.xml are available here:
https://pastebin.com/rT3RmAi5
This is what appeared in dmesg when I tried to start the VM:
[ 7305.371730] igb 0000:02:00.1: removed PHC on eth3
[ 7307.085618] ACPI Warning: \_SB.PCI0.PEX2._PRT: Return Package has no elements (empty) (20160831/nsprepkg-130)
[ 7307.085717] pcieport 0000:00:0b.0: can't derive routing for PCI INT B
[ 7307.085719] vfio-pci 0000:02:00.1: PCI INT B: no GSI
[ 7307.369611] igb 0000:02:00.1: enabling device (0400 -> 0402)
[ 7307.369668] ACPI Warning: \_SB.PCI0.PEX2._PRT: Return Package has no elements (empty) (20160831/nsprepkg-130)
[ 7307.369764] pcieport 0000:00:0b.0: can't derive routing for PCI INT B
[ 7307.369766] igb 0000:02:00.1: PCI INT B: no GSI
[ 7307.426266] igb 0000:02:00.1: added PHC on eth3
[ 7307.426269] igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 7307.426271] igb 0000:02:00.1: eth3: (PCIe:5.0Gb/s:Width x2) 50:9a:4c:ee:9f:b1
[ 7307.426350] igb 0000:02:00.1: eth3: PBA No: 106300-000
[ 7307.426352] igb 0000:02:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
These are all the messages related to that device in dmesg before I tried
to start the VM:
# dmesg | grep 02:00.1
[ 0.185301] pci 0000:02:00.1: [8086:1521] type 00 class 0x020000
[ 0.185317] pci 0000:02:00.1: reg 0x10: [mem 0xdfd40000-0xdfd5ffff]
[ 0.185334] pci 0000:02:00.1: reg 0x18: [io 0xd040-0xd05f]
[ 0.185343] pci 0000:02:00.1: reg 0x1c: [mem 0xdfd88000-0xdfd8bfff]
[ 0.185434] pci 0000:02:00.1: PME# supported from D0 D3hot D3cold
[ 0.185464] pci 0000:02:00.1: reg 0x184: [mem 0xdeea0000-0xdeea3fff 64bit pref]
[ 0.185467] pci 0000:02:00.1: VF(n) BAR0 space: [mem 0xdeea0000-0xdeebffff 64bit pref] (contains BAR0 for 8 VFs)
[ 0.185486] pci 0000:02:00.1: reg 0x190: [mem 0xdee80000-0xdee83fff 64bit pref]
[ 0.185488] pci 0000:02:00.1: VF(n) BAR3 space: [mem 0xdee80000-0xdee9ffff 64bit pref] (contains BAR3 for 8 VFs)
[ 0.334021] DMAR: Hardware identity mapping for device 0000:02:00.1
[ 0.334463] iommu: Adding device 0000:02:00.1 to group 16
[ 0.398809] pci 0000:02:00.1: Signaling PME through PCIe PME interrupt
[ 2.588049] igb 0000:02:00.1: PCI INT B: not connected
[ 2.643900] igb 0000:02:00.1: added PHC on eth1
[ 2.643903] igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 2.643905] igb 0000:02:00.1: eth1: (PCIe:5.0Gb/s:Width x2) 50:9a:4c:ee:9f:b1
[ 2.643984] igb 0000:02:00.1: eth1: PBA No: 106300-000
[ 2.643986] igb 0000:02:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 2.873544] igb 0000:02:00.1 rename3: renamed from eth1
[ 2.939352] igb 0000:02:00.1 eth3: renamed from rename3
In particular, this line looks suspicious:
igb 0000:02:00.1: PCI INT B: not connected
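For anyone who wants to check the other functions the same way, here is a
minimal sketch of a filter for these messages. It assumes the two message
forms seen in the excerpts above ("PCI INT x: not connected" and
"PCI INT x: no GSI"); the function name and the regular expression are
only illustrative, and other kernel versions may word these messages
differently.

```python
import re

# Matches e.g. "igb 0000:02:00.1: PCI INT B: not connected".
# Only the two wordings seen in this thread are covered (assumption).
INTX_RE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCI INT (?P<pin>[A-D]): (?P<status>no GSI|not connected)"
)


def intx_problems(dmesg_text):
    """Return (driver, address, pin, status) for every INTx routing complaint."""
    problems = []
    for line in dmesg_text.splitlines():
        match = INTX_RE.search(line)
        if match:
            problems.append(match.group("driver", "addr", "pin", "status"))
    return problems


# Lines copied from the dmesg excerpts in this message.
SAMPLE_DMESG = """\
[ 2.588049] igb 0000:02:00.1: PCI INT B: not connected
[ 7307.085717] pcieport 0000:00:0b.0: can't derive routing for PCI INT B
[ 7307.085719] vfio-pci 0000:02:00.1: PCI INT B: no GSI
[ 7307.369766] igb 0000:02:00.1: PCI INT B: no GSI
[ 7307.426352] igb 0000:02:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
"""

for driver, addr, pin, status in intx_problems(SAMPLE_DMESG):
    print(f"{addr} ({driver}): INT {pin} {status}")
```

On a live system the same function could be fed the text of the full
dmesg instead of the sample string.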
The full dmesg is available here:
https://pastebin.com/kPbUAKCi
This is the PCI bus structure:
# lspci -tv
-[0000:00]-+-00.0  Intel Corporation Device 1980
           +-04.0  Intel Corporation Device 19a1
           +-05.0  Intel Corporation Device 19a2
           +-06.0-[01]----00.0  Intel Corporation Device 19e2
           +-0b.0-[02-03]--+-00.0  Intel Corporation I350 Gigabit Network Connection
           |               +-00.1  Intel Corporation I350 Gigabit Network Connection
           |               +-00.2  Intel Corporation I350 Gigabit Network Connection
           |               \-00.3  Intel Corporation I350 Gigabit Network Connection
           +-0f.0-[04]----00.0  Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter
           +-12.0  Intel Corporation DNV SMBus Contoller - Host
           +-13.0  Intel Corporation DNV SATA Controller 0
           +-15.0  Intel Corporation Device 19d0
           +-16.0-[05-06]--+-00.0  Intel Corporation Device 15c4
           |               \-00.1  Intel Corporation Device 15c4
           +-17.0-[07-08]--+-00.0  Intel Corporation Device 15e5
           |               \-00.1  Intel Corporation Device 15e5
           +-18.0  Intel Corporation Device 19d3
           +-1c.0  Intel Corporation Device 19db
           +-1f.0  Intel Corporation DNV LPC or eSPI
           +-1f.2  Intel Corporation Device 19de
           +-1f.4  Intel Corporation DNV SMBus controller
           \-1f.5  Intel Corporation DNV SPI Controller
Looking at lspci -v, the IRQ field shows a bogus value ("IRQ
-2147483648") for exactly the three devices I can't use in PCI
passthrough:
# lspci -v|grep -A1 I350
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
--
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
--
02:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ 18
--
02:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
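The value -2147483648 is -2^31, the smallest 32-bit signed integer, so it
looks like a placeholder rather than a real interrupt number. As a rough
sketch, the same check can be scripted by parsing the lspci -v text. The
helper names below are made up for illustration, and the parser assumes
the short "bus:device.function" address form shown above (no PCI domain
prefix).

```python
import re

# Header line of a record, e.g. "02:00.1 Ethernet controller: ..."
DEVICE_RE = re.compile(r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# IRQ value on the "Flags:" line; negative when no usable IRQ is reported
IRQ_RE = re.compile(r"\bIRQ (?P<irq>-?\d+)")


def irqs_by_device(lspci_output):
    """Map each PCI address to the IRQ number printed on its Flags line."""
    irqs = {}
    current = None
    for line in lspci_output.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group("addr")
            continue
        irq = IRQ_RE.search(line)
        if irq and current is not None:
            irqs[current] = int(irq.group("irq"))
    return irqs


def devices_without_irq(lspci_output):
    """Return the addresses whose reported IRQ is negative."""
    return sorted(a for a, n in irqs_by_device(lspci_output).items() if n < 0)


# Output copied from "lspci -v|grep -A1 I350" in this message.
SAMPLE_LSPCI = """\
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
--
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
--
02:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ 18
--
02:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648
"""

print(devices_without_irq(SAMPLE_LSPCI))
```

With the sample above this lists 02:00.0, 02:00.1 and 02:00.3, which are
the three functions that fail in passthrough.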
Finally, every I350 interface has its own IOMMU group in
/sys/kernel/iommu_groups/.
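For reference, a small sketch of how the grouping can be listed. It
relies on the standard sysfs layout
/sys/kernel/iommu_groups/<group>/devices/<PCI address>; the function
names are only illustrative.

```python
import os

IOMMU_ROOT = "/sys/kernel/iommu_groups"


def iommu_groups(root=IOMMU_ROOT):
    """Return {group number: sorted list of PCI addresses} read from sysfs."""
    groups = {}
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def group_of(groups, address):
    """Return the group number containing `address`, or None if not found."""
    for group, devices in groups.items():
        if address in devices:
            return group
    return None


# Only touch sysfs when the IOMMU is actually exposed on this machine.
if os.path.isdir(IOMMU_ROOT):
    for group, devices in sorted(iommu_groups().items()):
        print(group, " ".join(devices))
```

A device is alone in its group when the list for that group has a single
entry, which is the case for each I350 function here.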
The kernel I'm using in the host machine is 4.9.189 and my libvirt version
is 4.3.0.
Any thoughts on this?
Is there something I should enable in the BIOS or in the kernel to make
this work?
Thanks!
Regards,
Riccardo Ravaioli