On Fri, 28 Oct 2016 17:22:41 +0200,
Laszlo Ersek <lersek(a)redhat.com> wrote:
On 10/28/16 13:28, Henning Schild wrote:
> Hey,
>
> i am running an unusual setup where i assign pci devices behind the
> back of libvirt. I have two options to do that:
> 1. a wrapper script for qemu that takes care of suid-root and
> appends arguments for pci-assign
> 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
>
> I know i should probably not be doing this, it is a workaround to
> introduce fine-grained pci-assignment in an openstack setup, where
> vendor and device id are not enough to pick the right device for a
> vm.
(1) The libvirt domain XML identifies the host PCI device to assign by
full PCI address (see the <source> element:
<http://libvirt.org/formatdomain.html#elementsHostDev>); it does not
filter with vendor/device ID.
So, I believe your comment refers to the pci-stub host kernel driver
not being flexible enough for binding vs. not binding different
instances of the same vendor/device ID.
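To make the address-based selection concrete, here is a minimal sketch that builds a <hostdev> element from a full PCI address. The helper name hostdev_xml and the example address are made up for illustration; the element and attribute names follow the libvirt domain XML format linked above.

```python
import re
import xml.etree.ElementTree as ET

# Full PCI address: domain:bus:slot.function, e.g. "0000:06:02.0"
_PCI_ADDR = re.compile(
    r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def hostdev_xml(pci_addr, managed=True):
    """Return a libvirt <hostdev> element (as a string) for one host device.

    hostdev_xml is a hypothetical helper, not part of libvirt.
    """
    m = _PCI_ADDR.fullmatch(pci_addr)
    if m is None:
        raise ValueError("not a full PCI address: %r" % pci_addr)
    domain, bus, slot, function = m.groups()

    hostdev = ET.Element(
        "hostdev",
        mode="subsystem",
        type="pci",
        managed="yes" if managed else "no",
    )
    source = ET.SubElement(hostdev, "source")
    # The device is picked by address only; no vendor/device ID appears.
    ET.SubElement(
        source,
        "address",
        domain="0x" + domain,
        bus="0x" + bus,
        slot="0x" + slot,
        function="0x" + function,
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0000:06:02.0"))
```

The resulting fragment would go into the <devices> section of the domain XML.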
My comment referred to OpenStack. The version we are using assigns PCI
devices purely by device and vendor ID. pci-stub is no problem at
all; you can always bind/unbind by address.
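A rough sketch of what binding/unbinding by address looks like through the sysfs "unbind" and "bind" files. The function names, the example address and the sysfs_root parameter (there only so the code can be tried against a fake directory tree) are made up for illustration. It needs root on a real system, and the bind step only succeeds if the target driver already accepts the device's ID.

```python
import os


def unbind(pci_addr, sysfs_root="/sys"):
    """Detach the device from its current driver, if it has one.

    Returns True if an unbind was written, False if no driver was bound.
    """
    path = os.path.join(
        sysfs_root, "bus/pci/devices", pci_addr, "driver", "unbind"
    )
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(pci_addr)
    return True


def bind(pci_addr, driver="pci-stub", sysfs_root="/sys"):
    """Ask `driver` to claim the device at exactly this address."""
    path = os.path.join(sysfs_root, "bus/pci/drivers", driver, "bind")
    with open(path, "w") as f:
        f.write(pci_addr)


if __name__ == "__main__":
    addr = "0000:06:02.0"  # made-up example address
    unbind(addr)
    bind(addr, driver="pci-stub")
```

Only the one device at that address is touched; other devices with the same vendor/device ID stay with their current driver.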
If that's the case, would you be helped by the following host kernel
patch?
[PATCH] PCI: pci-stub: accept exceptions to the ID- and class-based
matching
<http://www.spinics.net/lists/linux-pci/msg55497.html>
(2) Is there any reason (other than (1)) that you are using the
legacy / deprecated pci-assign method, rather than VFIO?
I suggest evaluating whether the "pci-stub.except=..." kernel
parameter would help your use case, and whether you could
(consequently) move to a fully libvirt + VFIO based config.
I would like to do that in the long run and will look into the options.
But for now I was hoping for a quick answer to make the hacky version
work again.
Thanks,
Henning
Thanks
Laszlo
>
> In both cases qemu will crash with the following output:
>
>> qemu: hardware error: pci read failed, ret = 0 errno = 22
>
> followed by the usual machine state dump. With strace i found it to
> be a failing read on the config space file of my device.
> /sys/bus/pci/devices/0000:xx:xx.x/config
> A few reads out of that file succeeded, as well as accesses on
> vendor etc.
>
> Manually launching a qemu with the pci-assign works without a
> problem, so i "blame" libvirt and the cgroup environment the qemu
> ends up in. So i put a bash into the exact same cgroup setup - next
> to a running qemu, expecting a dd or hexdump on the config-space
> file to fail. But from that bash i can read the file without a
> problem.
>
> Has anyone seen that problem before? Right now i do not know what i
> am missing, maybe qemu is hitting some limits configured for the
> cgroups or whatever. I can not use pci-assign from libvirt, but if i
> did would it configure cgroups in a different way or relax some
> limits?
>
> What would be a good next step to debug that? Right now i am
> looking at kernel event traces, but the machine is pretty big and
> so is the trace.
>
> That assignment used to work and i do not know how it broke, i have
> tried combinations of several kernels, versions of libvirt and qemu.
> (kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and
> 2.7) All combinations show the same problem, even the ones that
> work on other machines. So when it comes to software versions the
> problem could well be caused by a software update of another
> component, that i got with the package manager and did not compile
> myself. It is a debian 8.6 with all recent updates installed. My
> guess would be that systemd could have an influence on cgroups or
> limits causing such a problem.
>
> regards,
> Henning
>
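As one possible next debugging step for the failing config-space read described above, here is a minimal sketch that reads the config file in small chunks and records, per offset, either the bytes returned or the errno name. The function name probe_config and the chunking are assumptions for illustration; this is not how qemu itself accesses the file. Zero-length reads are kept visible on purpose, on the assumption that "ret = 0" in the error message means a read that returned no data rather than a failed syscall.

```python
import errno
import os
import sys


def probe_config(path, length=256, chunk=4):
    """Read `length` bytes from `path` in `chunk`-byte pieces.

    Returns a list of (offset, result) tuples, where result is the bytes
    read (possibly empty) or the errno name (e.g. "EINVAL") on failure.
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, length, chunk):
            try:
                results.append((offset, os.pread(fd, chunk, offset)))
            except OSError as e:
                name = errno.errorcode.get(e.errno, str(e.errno))
                results.append((offset, name))
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    # Pass the config file path, e.g. /sys/bus/pci/devices/<address>/config
    for offset, result in probe_config(sys.argv[1]):
        if isinstance(result, str):
            print("0x%03x: failed with %s" % (offset, result))
        elif len(result) == 0:
            print("0x%03x: read returned 0 bytes" % offset)
        else:
            print("0x%03x: %s" % (offset, result.hex()))
```

Running this once from a plain root shell and once from inside the same environment as the qemu process (same user, cgroups and capabilities) shows at which offset the two start to differ. On Linux, a process without CAP_SYS_ADMIN only gets the first 64 bytes of PCI config space, so a difference starting at offset 0x40 would point at dropped capabilities rather than at cgroup limits.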