[libvirt-users] PCI passthrough fails in virsh: iommu group is not viable

I would really appreciate some pointers on what I am doing wrong here.

I need to run multiple virtual guests, each with its own GPU and some USB controllers passed through. I am able to run one of the guests like this (assuming the vfio setup has happened elsewhere), but I would prefer to use virsh:

kvm -M q35 -m 8192 -cpu host,kvm=off \
  -smp 4,sockets=1,cores=4,threads=1 \
  -bios /usr/share/seabios/bios.bin -vga none \
  -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
  -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
  -device vfio-pci,host=02:00.1,bus=root.1,addr=00.1 \
  -device vfio-pci,host=00:1d.0,bus=pcie.0 \
  -device vfio-pci,host=00:1a.0,bus=pcie.0 \
  -nographic -boot menu=on /vm2/foo.img

I found the hardware addresses using lspci. When I invoke the same machine with virsh, using what I believe are the same addresses, I get:

virsh # start foo
error: Failed to start domain foo
error: internal error: process exited while connecting to monitor: 2015-08-12T18:24:10.651720Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x4: vfio: error, group 18 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2015-08-12T18:24:10.651752Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x4: vfio: failed to get group 18
2015-08-12T18:24:10.651766Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x4: Device initialization failed.
2015-08-12T18:24:10.651781Z qemu-system-x86_64: -device vfio-pci,host=02:00.0,id=hostdev0,bus=pci.0,addr=0x4: Device 'vfio-pci' could not be initialized

I have included dumpxml output below -- is the hostdev section wrong?
<domain type='kvm'>
  <name>foo</name>
  <uuid>51f57655-11be-41bf-b925-2e6aef01f9c4</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static' current='1'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-utopic'>hvm</type>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/vm2/foo.img'/>
      <target dev='sda' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:f0:47:f5'/>
      <source dev='p5p1' mode='bridge'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'/>
    <video>
      <model type='cirrus' vram='16384' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
</domain>

-- 
Alex Holst

On 08/12/2015 02:34 PM, Alex Holst wrote:
> I would really appreciate some pointers on what I am doing wrong here.
> I need to run multiple virtual guests, each with its own GPU and some USB controllers passed through. I am able to run one of the guests like this (assuming the vfio setup has happened elsewhere), but I would prefer to use virsh:
> kvm -M q35 -m 8192 -cpu host,kvm=off \
>   -smp 4,sockets=1,cores=4,threads=1 \
>   -bios /usr/share/seabios/bios.bin -vga none \
>   -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
>   -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
>   -device vfio-pci,host=02:00.1,bus=root.1,addr=00.1 \
>   -device vfio-pci,host=00:1d.0,bus=pcie.0 \
>   -device vfio-pci,host=00:1a.0,bus=pcie.0 \
>   -nographic -boot menu=on /vm2/foo.img
> I found the hardware addresses using lspci. When I invoke the same machine with virsh, using what I believe are the same addresses,
1) The XML that you provide at the end shows you assigning only 02:00.0, not 02:00.1, and you are using managed='yes', so you are depending on libvirt to unbind any host driver and bind vfio-pci to the devices. If 02:00.0 and 02:00.1 are both in iommu group 18 but one of them isn't bound to vfio-pci, qemu will fail with an error like the "group 18 is not viable" one you're seeing when it tries to assign any device in the group.

2) Your example qemu command line uses the q35 machine type, but your example libvirt domain uses the pc-i440fx-utopic machine type, so the bus structure is completely different. Assuming a recent enough libvirt, you should be able to create a virtual machine based on q35 and attach the devices to pcie-root (bus 0). You won't be able to add an ioh3420 controller unless you are running libvirt built from upstream master; support for that was just pushed last weekend. It is a new controller defined like this:

<controller type='pci' model='pcie-root-port'/>

3) Doing the above *may* eliminate another problem with your current definition: it has an emulated video device at 00:02.0 and then the passthrough video device at 00:04.0. Even if you do get the domain started, you'll likely end up with the passthrough VGA as a secondary display (which I'm guessing isn't what you want).

4) I don't know what "x-vga=on" is, but libvirt doesn't directly support it. If it is necessary, you would need to add it using libvirt's "qemu command line passthrough":

http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html

(I think you can add "-set", "device.hostdev0.x-vga=on".)

5) The libvirt definition you've provided adds several other devices that aren't in your qemu command line and may get in the way of what you want to do; in particular the tablet, mouse, and keyboard. You'll probably want to trim those out. (If you're passing through USB, you'll likely want to replace all of the <controller type='usb'> elements with a single <controller type='usb' model='none'/>.)

6) Your qemu command line specifies an exact BIOS location, which you haven't done in the libvirt definition. If that isn't the default BIOS, you'll want to look into how to specify a non-default BIOS file here: http://www.libvirt.org/formatdomain.html

7) If you're going to stick with the pc-i440fx machine type and your guest doesn't absolutely require a SATA disk, you'd probably be better off attaching the disk to the 440fx's IDE controller instead (or, even better, if your guest OS supports the virtio disk driver, use that, since performance will be *greatly* improved). Likewise, replace the network interface's "model type='e1000'" with "model type='virtio'" if you can.
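Pulling points 1 through 6 together, a libvirt definition closer to the working kvm command line might look roughly like the following. This is an untested sketch, not a verified configuration, and several details are assumptions. It assumes a libvirt version with q35 support, that /usr/share/seabios/bios.bin really is the BIOS the guest needs, and that 00:02.0/00:02.1 are acceptable guest addresses for the GPU. It also assumes that libvirt gives the first <hostdev> the id "hostdev0" (as the error log in the original message suggests), which the "-set" argument from point 4 depends on. The <kvm><hidden state='on'/></kvm> feature is an addition intended to mirror "kvm=off".

```xml
<!-- Untested sketch; names, paths and guest addresses are assumptions. -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>foo</name>
  <memory unit='KiB'>8388608</memory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <!-- point 2: q35, as in "-M q35" -->
    <type arch='x86_64' machine='q35'>hvm</type>
    <!-- point 6: same file as "-bios"; drop this if it is the default -->
    <loader>/usr/share/seabios/bios.bin</loader>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <!-- assumed equivalent of "kvm=off"; needs a libvirt that supports it -->
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
  <!-- "-cpu host" and "-smp 4,sockets=1,cores=4,threads=1" -->
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/vm2/foo.img'/>
      <target dev='sda' bus='sata'/>
    </disk>
    <!-- point 5: no emulated USB controller, tablet, mouse or keyboard -->
    <controller type='usb' model='none'/>
    <!-- point 3: no <graphics> or <video> element, intended to mirror
         "-vga none -nographic" so the GPU is not a secondary display -->
    <!-- point 1: BOTH functions of the GPU, on one multifunction guest slot;
         this first hostdev is expected to get the id "hostdev0" -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </hostdev>
    <!-- host USB controllers 00:1d.0 and 00:1a.0; guest slots left to libvirt -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1d' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x1a' function='0x0'/>
      </source>
    </hostdev>
  </devices>
  <!-- point 4: x-vga=on through qemu command line passthrough -->
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.x-vga=on'/>
  </qemu:commandline>
</domain>
```

If the sketch is saved as foo.xml (a hypothetical file name), "virsh domxml-to-native qemu-argv foo.xml" should print the qemu command line libvirt would generate. You can compare that output with the kvm command that already works before running "virsh define foo.xml" and "virsh start foo".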

Quoting Laine Stump (laine@laine.org):
> On 08/12/2015 02:34 PM, Alex Holst wrote:
>> I would really appreciate some pointers on what I am doing wrong here.
>> I need to run multiple virtual guests, each with its own GPU and some USB controllers passed through. I am able to run one of the guests like this (assuming the vfio setup has happened elsewhere), but I would prefer to use virsh: [..]
Thank you for your input. I have been working on this issue on and off since my original mail to this list. I have been unable to properly migrate the single VM from a shell script, much less being able to run several VMs that each have a pass-through.

As for details missing from my previous mail: This is an Ubuntu 15.04 host running several Windows 10 guests. The entire kvm command line I have running is from this guide at Puget Systems:

https://www.pugetsystems.com/labs/articles/Multiheaded-NVIDIA-Gaming-using-U...

I have discovered several problems with this guide, in particular that I can remove the pci_stub ids from /etc/initramfs-tools/modules and the virtual Windows host continues to work just fine.

So, now I'm back to scratch using virt-install and pointing to the existing img file that works with the kvm shell script:

$ virt-install --name foo --memory 8192 --machine q35 \
    --host-device 02:00.0 --host-device 02:00.1 \
    --host-device 00:1a.0 --host-device 00:1d.0 \
    --disk /vm2/foo.img --boot menu=on

Starting install...
Creating domain...
Connected to domain foo
Escape character is ^]

Even though the 02:00.0 and 02:00.1 devices are the GPU and on-board audio, the console remains in text mode and the actual guest OS is nowhere to be seen.

$ lspci -nn | grep 02:00
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1401] (rev a1)
02:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fba] (rev a1)
$ lspci -nn | egrep 00:1[a,d]
00:1a.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 [8086:1d2d] (rev 06)
00:1d.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 [8086:1d26] (rev 06)

Do you have any additional pointers for me on how to properly pass the GPU through so the guest OS detects it and is able to make use of the attached display?

Thanks,
Alex
-- Alex Holst

On 09/24/2015 03:26 PM, Alex Holst wrote:
> Quoting Laine Stump (laine@laine.org):
>> On 08/12/2015 02:34 PM, Alex Holst wrote:
>>> I would really appreciate some pointers on what I am doing wrong here.
>>> I need to run multiple virtual guests, each with its own GPU and some USB controllers passed through. I am able to run one of the guests like this (assuming the vfio setup has happened elsewhere), but I would prefer to use virsh: [..]
> Thank you for your input. I have been working on this issue on and off since my original mail to this list.
> I have been unable to properly migrate the single VM from a shell script,
You will not be able to migrate a guest that has a passthrough GPU device (or any other PCI device assigned to the guest using vfio or kvm device assignment), if that's one of the things you're trying to do.
> much less being able to run several VMs that each have a pass-through.
So you have multiple GPUs on the hardware?
> As for details missing from my previous mail: This is an Ubuntu 15.04 host running several Windows 10 guests. The entire kvm command line I have running is from this guide at Puget Systems:
> https://www.pugetsystems.com/labs/articles/Multiheaded-NVIDIA-Gaming-using-U...
I haven't looked at this page in detail (and wouldn't know what to look for if I did :-), but it appears that it was last edited over a year ago, and I think there has been substantial progress/change in GPU passthrough since then. The "new hotness" for information about GPU passthrough is here:

http://vfio.blogspot.com/

In particular, start with this article:

http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-1-hardware.html
> I have discovered several problems with this guide, in particular that I can remove the pci_stub ids from /etc/initramfs-tools/modules and the virtual Windows host continues to work just fine.
> So, now I'm back to scratch using virt-install and pointing to the existing img file that works with the kvm shell script:
>
> $ virt-install --name foo --memory 8192 --machine q35 \
>     --host-device 02:00.0 --host-device 02:00.1 \
>     --host-device 00:1a.0 --host-device 00:1d.0 \
>     --disk /vm2/foo.img --boot menu=on
>
> Starting install...
> Creating domain...
> Connected to domain foo
> Escape character is ^]
>
> Even though the 02:00.0 and 02:00.1 devices are the GPU and on-board audio, the console remains in text mode and the actual guest OS is nowhere to be seen.
>
> $ lspci -nn | grep 02:00
> 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1401] (rev a1)
> 02:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fba] (rev a1)
> $ lspci -nn | egrep 00:1[a,d]
> 00:1a.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 [8086:1d2d] (rev 06)
> 00:1d.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 [8086:1d26] (rev 06)
I'm guessing the above lspci is on the host rather than the guest (since you say the guest OS is "nowhere to be seen").
> Do you have any additional pointers for me on how to properly pass the GPU through so the guest OS detects it and is able to make use of the attached display?
I don't use virt-install enough to be intimately familiar with what is generated from the --host-device option, but I don't see that you've specified the PCI address on the *guest* anywhere, nor that you've told it not to set up an emulated graphics device. So I'm guessing that the generated guest has an emulated graphics device at the standard location, and the GPU is visible at some other address in the guest (so at best it would show up in the guest as a secondary display).

But really, I think you may have better luck by just starting over using the information at vfio.blogspot.com, since the person writing it is one of the people actually debugging problems with GPU passthrough and submitting kernel and qemu patches to fix them.
Participants (2): Alex Holst, Laine Stump