Adding Alex & Bandan, since they signed off the kernel patch which
broke things.
On Tue, Jun 28, 2016 at 11:39:59AM -0600, Jim Fehlig wrote:
After updating the dom0 kernel on one of my Xen test hosts, I
noticed
problems with PCI hostdev management. E.g
# virsh nodedev-detach pci_0000_07_10_1
error: Failed to detach device pci_0000_07_10_1
error: Failed to add PCI device ID '8086 1520' to pciback: File exists
It turns out there was a small interface change to new_id with the following
commit to 3.16 kernel
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/dr...
which now causes xen_pciback to fail writes of "vendorid productid" to new_id.
e.g.
# echo "8086 1520" > /sys/bus/pci/drivers/pciback/new_id
-bash: echo: write error: File exists
Interestingly, vfio doesn't encounter the same error
# echo "8086 1520" > /sys/bus/pci/drivers/vfio-pci/new_id
# echo $?
0
vfio-pci has:
static struct pci_driver vfio_pci_driver = {
.name = "vfio-pci",
.id_table = NULL, /* only dynamic ids */
pci-stub also has that setup with NULL id_table.
while xen-pciback has:
static const struct pci_device_id pcistub_ids[] = {
{
.vendor = PCI_ANY_ID,
.device = PCI_ANY_ID,
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
},
{0,},
};
static struct pci_driver xen_pcibk_pci_driver = {
.name = "pciback",
.id_table = pcistub_ids,
So any vendor/device pair will match for xen-pciback, while none will match
for vfio-pci.
But after reading that commit and the associated thread, it is not clear to
me how to best fix this. Options are
1. set .id_table to NULL for xen-pciback
2. drop using the new_id interface from libvirt
3. pass more values (subvendor, subdevice, class, etc) to the new_id interface
I'm not sure what problems, if any, options 1 and 2 might cause. Option 2
seems the best approach since new_id seems to be a rather unsafe interface.
I'm thinking either pci-back should be made to work more like
vfio, or the kernel patch should be reverted or fixed to take
account of the way pci-back works.
Whichever way, I don't consider this a libvirt problem to solve. As
Linus' always says - the kernel must never break existing userspace
app usage.
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|