"Daniel P. Berrange" <berrange(a)redhat.com> writes:
Adding Alex & Bandan, since they signed off the kernel patch
which
broke things.
On Tue, Jun 28, 2016 at 11:39:59AM -0600, Jim Fehlig wrote:
> After updating the dom0 kernel on one of my Xen test hosts, I noticed
> problems with PCI hostdev management. E.g
>
> # virsh nodedev-detach pci_0000_07_10_1
> error: Failed to detach device pci_0000_07_10_1
> error: Failed to add PCI device ID '8086 1520' to pciback: File exists
>
> It turns out there was a small interface change to new_id with the following
> commit to 3.16 kernel
>
>
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/dr...
>
> which now causes xen_pciback to fail writes of "vendorid productid" to
new_id. e.g.
>
> # echo "8086 1520" > /sys/bus/pci/drivers/pciback/new_id
> -bash: echo: write error: File exists
>
> Interestingly, vfio doesn't encounter the same error
>
> # echo "8086 1520" > /sys/bus/pci/drivers/vfio-pci/new_id
> # echo $?
> 0
>
> vfio-pci has:
> static struct pci_driver vfio_pci_driver = {
> .name = "vfio-pci",
> .id_table = NULL, /* only dynamic ids */
pci-stub also has that setup with NULL id_table.
>
> while xen-pciback has:
> static const struct pci_device_id pcistub_ids[] = {
> {
> .vendor = PCI_ANY_ID,
> .device = PCI_ANY_ID,
> .subvendor = PCI_ANY_ID,
> .subdevice = PCI_ANY_ID,
> },
> {0,},
> };
> static struct pci_driver xen_pcibk_pci_driver = {
> .name = "pciback",
> .id_table = pcistub_ids,
>
> So any vendor/device pair will match for xen-pciback, while none will match
> for vfio-pci.
>
> But after reading that commit and the associated thread, it is not clear to
> me how to best fix this. Options are
>
> 1. set .id_table to NULL for xen-pciback
> 2. drop using the new_id interface from libvirt
> 3. pass more values (subvendor, subdevice, class, etc) to the new_id interface
Atleast 3. is what I had in mind if someone deliberately wants to
add a matching entry.
> I'm not sure what problems, if any, options 1 and 2 might
cause. Option 2
> seems the best approach since new_id seems to be a rather unsafe interface.
It is indeed an unsafe interface. I am not sure what else we can do
to not make it easy to cause a driver induced crash and not break existing
scripts.
I'm thinking either pci-back should be made to work more like
vfio, or the kernel patch should be reverted or fixed to take
account of the way pci-back works.
Whichever way, I don't consider this a libvirt problem to solve. As
Linus' always says - the kernel must never break existing userspace
Agreed, but in this specific case, the usage is unsafe since unknown indexes
are potentially being passed to the driver operations. It should always have been
3. to begin with.
Bandan
app usage.
Regards,
Daniel