Hi,
I am using Fedora 33, with the following KVM, qemu and libvirt versions:
QEMU 5.1.0
libvirt 6.6.0
KVM 5.14.18
We passed a PCIe NVMe device through to a guest running on FC33, using either
virt-manager or virsh, and then hot-unplugged the device while it was still
attached to the guest.
The device then disappears from the guest's hardware device list in virt-manager.
When we hot-plug the device again, we are able to use it on the host,
but when we try to re-attach it to the guest, we get the following error message:
Requested operation is not valid, PCI device 0000:c4:00.0 is in use by driver QEMU,
Domain fedora 33.
So somehow libvirt still thinks the hot-unplugged device is attached.
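One way to confirm this from libvirt's side is to look for a leftover PCI <hostdev> entry in the live domain XML (for example the output of `virsh dumpxml`). The following is only a minimal sketch: the sample XML, the domain name and the helper name pci_hostdevs are made up for illustration, and on a real system the XML would come from virsh rather than a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical live domain XML, shaped like `virsh dumpxml <domain>` output.
# The domain name and device alias are assumptions for illustration only.
SAMPLE_XML = """
<domain type='kvm'>
  <name>guest</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0xc4' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>
  </devices>
</domain>
"""


def pci_hostdevs(domain_xml):
    """Return the host PCI addresses (dddd:bb:ss.f) of all PCI <hostdev> entries."""
    root = ET.fromstring(domain_xml)
    found = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        # Attribute values are hex strings such as '0xc4'.
        domain, bus, slot, func = (
            int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
        )
        found.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{func:x}")
    return found


print(pci_hostdevs(SAMPLE_XML))
```

If the address of the unplugged device is still listed after the hot-unplug, libvirt has not dropped it from the live domain definition.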
Tracing the flow of the hot-unplug event from guest to host:
-> The guest's PCIe hotplug support detected the NVMe device unplug (from guest kernel logs):
pciehp: Slot (0-6): Attention button pressed
pciehp: Slot (0-6): Powering off due to button press.
-> It also looks like the guest notified the host/KVM (from host kernel logs):
pcieport: 0000:c4:0000.0: pciehp: Slot (208): Card not present
-> Correspondingly, the vfio-pci module notified QEMU:
vfio-pci: 0000:c4:0000.0: Relaying device request to user (#0)
-> Then the unplugged device is reset:
vfio-pci: vfio_bar_restore: reset recovery - restoring BARs
pci 0000:c4:00.0: Removing from iommu group 105.
-> Next, we tried to verify whether libvirt received the DEVICE_DELETED event from QEMU
by running a SystemTap script that captures events between QEMU and libvirt:
stap examples/systemtap/qemu-monitor.stp
When the NVMe drive is attached to the VM, the following log output is seen from SystemTap:
execute "device-add", driver: "vfio-pci", host: "0000:c4:00.0", id: "hostdev0", bus: "pci.7", addr: "0".
When we hot-unplug the NVMe drive, the following log output is seen from SystemTap:
event: DEVICE_DELETED, device: "hostdev0", path: "/machine/peripheral/hostdev0".
So it looks like QEMU sent the "DEVICE_DELETED" event to libvirt, but
libvirt has still not removed the attached device from its bookkeeping list.
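To check this on longer traces, the SystemTap output can be scanned for device-add commands that have no matching DEVICE_DELETED event. This is only a rough sketch: the sample lines are modelled on the output quoted above (joined onto one line each), and the regular expressions and the helper name still_attached are assumptions, not part of qemu-monitor.stp.

```python
import re

# Sample lines modelled on the qemu-monitor.stp output quoted in this mail.
SAMPLE_LOG = [
    'execute "device-add", driver: "vfio-pci", host: "0000:c4:00.0", '
    'id: "hostdev0", bus: "pci.7", addr: "0".',
    'event: DEVICE_DELETED, device: "hostdev0", '
    'path: "/machine/peripheral/hostdev0".',
]

# Assumed line shapes; adjust if the real trace is formatted differently.
ADD_RE = re.compile(r'execute "device-add".*?host: "([^"]+)".*?id: "([^"]+)"')
DEL_RE = re.compile(r'event: DEVICE_DELETED, device: "([^"]+)"')


def still_attached(lines):
    """Map device alias -> host address for adds with no later DEVICE_DELETED."""
    attached = {}
    for line in lines:
        match = ADD_RE.search(line)
        if match:
            attached[match.group(2)] = match.group(1)
            continue
        match = DEL_RE.search(line)
        if match:
            # Drop the alias once QEMU reports the device as deleted.
            attached.pop(match.group(1), None)
    return attached


print(still_attached(SAMPLE_LOG))
```

An empty result for the full trace means QEMU reported every added device as deleted, so any stale entry would be on the libvirt side.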
I understand there is already a thread from 2020 discussing a similar issue:
https://www.spinics.net/linux/fedora/libvirt-users/msg12590.html
But I am not sure whether any fix or support for this has been added recently.
Any feedback on the above, and on PCI device passthrough and hotplug support in general, would be appreciated.
Thanks,
Ashish