On 8/8/20 11:53 PM, Daniel Black wrote:
In attempting to isolate vfio-pci problems between two different guest
instances, creating a second guest (with the existing guest shut down)
resulted in:
Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3
is already in use
Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3
is already in use
Aug 09 12:43:23 grit libvirtd[6716]: Failed to allocate PCI device list:
internal error: Device 0000:01:00.3 is already in use
Hmm. Normally the error that would be logged if a device is already in
use would say something like this:
error: Failed to start domain Win10-GPU
error: Requested operation is not valid: PCI device 0000:05:00.0 is in
use by driver QEMU, domain F30
So you're encountering this in an unexpected place.
Compiled against library: libvirt 6.1.0
Using library: libvirt 6.1.0
Using API: QEMU 6.1.0
Running hypervisor: QEMU 4.2.1
(fc32 default install)
The upstream code seems to test for definitions rather than active uses
of the PCI device.
That isn't the case. You're misunderstanding what devices are on the
list. (see below for details)
My potentially naive patch to correct this (but not the failing test
cases) would be:
diff --git a/src/util/virpci.c b/src/util/virpci.c
index 47c671daa0..a00c5e6f44 100644
--- a/src/util/virpci.c
+++ b/src/util/virpci.c
@@ -1597,7 +1597,7 @@ int
 virPCIDeviceListAdd(virPCIDeviceListPtr list,
                     virPCIDevicePtr dev)
 {
-    if (virPCIDeviceListFind(list, dev)) {
+    if (virPCIDeviceBusContainsActiveDevices(dev, list)) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("Device %s is already in use"), dev->name);
         return -1;
Is this too simplistic, or an undesirable feature request/implementation?
Only devices that are currently in use by a guest (activePCIHostdevs), or
that libvirt is in the process of detaching from the guest + vfio and
rebinding to the device's host driver (inactivePCIHostdevs), are on
either list of PCI devices maintained by libvirt. Once a device is
completely detached from the guest and (if "managed='yes'" was set in
the XML config) rebound to the natural host driver for the device, it
is removed from the list and can be used elsewhere.
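For reference, a managed <hostdev> entry of the kind described above
looks like this (the address is the one from your log; everything else
is just a typical example):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
  </source>
</hostdev>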
I just tested this with an assigned GPU + soundcard on two guests to
verify that it works properly. (I'm running the latest upstream master
though, so it's not an exact replication of your test)
I'd be more than grateful if someone carried this through, as I'm unsure
when I'll get time for it.
Can you provide the XML for your <hostdev> in the two guests, and the
exact sequence of commands that leads to this error? There is definitely
either a bug in the code or a bug in what you're doing. By seeing the
sequence of events, we can either attempt to replicate it, or let you
know what change you need to make to your workflow to eliminate the error.
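For example (guest names here are placeholders, not taken from your
report), something along these lines would give us everything we need
to replay it:

virsh dumpxml guest-a    # capture each guest's <hostdev> block
virsh dumpxml guest-b
virsh start guest-a
virsh shutdown guest-a   # waiting for it to fully stop
virsh start guest-b      # presumably where "already in use" appears?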