NVMe drive PCI passthrough and surprise hotplug

[AMD Official Use Only]

Hi,

I am using Fedora 33, with the following KVM, QEMU and libvirt versions:

  QEMU 5.1.0
  libvirt 6.6.0
  KVM 5.14.18

We passed a PCIe NVMe device through to a guest running on FC33, using either virt-manager or virsh, and then hot-unplugged the device while it was attached to the guest. The device is no longer shown in the guest's hardware device list in virt-manager. We then hotplug the device again and can use it on the host, but when we try to re-attach it to the guest, we get the following error message:

  Requested operation is not valid, PCI device 0000:c4::00.0 is in use by driver QEMU, Domain fedora 33.

So somehow libvirt still thinks the hot-unplugged device is attached.

Tracing the flow of the hot-unplug event from guest to host:

-> The guest's PCIe hotplug support detected the NVMe drive unplug (from guest kernel logs):

  pciehp: Slot (0-6): Attention button pressed
  pciehp: Slot (0-6): Powering off due to button press.

-> It also looks like the guest notified the host/KVM (from host kernel logs):

  pcieport: 0000:c4:0000.0: pciehp: Slot (208): Card not present

-> Correspondingly, the vfio-pci module notified QEMU:

  vfio-pci: 0000:c4:0000.0: Relaying device request to user (#0)

-> Then the reset of the unplugged device is done:

  vfio-pci: vfio_bar_restore: reset recovery - restoring BARs
  pci 0000:c4:00.0: Removing from iommu group 105.

-> Next, we tried to verify whether libvirt detected the DEVICE_DELETED event from QEMU, by running a SystemTap script to capture events between QEMU and libvirt:

  stap examples/systemtap/qemu-monitor.stp

When the NVMe drive is attached to the VM, the following log output is seen from SystemTap:

  execute "device-add", driver: "vfio-pci", host: "0000:c4:00.0", id: "hostdev0", bus: "pci.7", addr: "0".

When we hot-unplug the NVMe drive, the following log output is seen from SystemTap:

  event: DEVICE_DELETED, device: "hostdev0", path: "/machine/peripheral/hostdev0".

So it looks like QEMU sent the "DEVICE_DELETED" event to libvirt, but libvirt has still not removed the attached device from its bookkeeping list.

I understand there is already a thread from 2020 discussing a similar issue:

  https://www.spinics.net/linux/fedora/libvirt-users/msg12590.html

But I am not sure whether any fix or support for this has been added recently. I am looking for any feedback on the above and on PCI device passthrough and hotplug support.

Thanks,
Ashish
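
One way to confirm the stale state described above is to inspect the live domain XML (for example, the output of `virsh dumpxml` for the guest) and see whether the PCI hostdev is still listed after the unplug. The following is a minimal Python sketch of that check, not libvirt's own logic. The sample XML and the helper name `hostdev_pci_addresses` are illustrative assumptions; only the `<hostdev>` element layout follows libvirt's domain XML format.

```python
import xml.etree.ElementTree as ET

# Illustrative domain XML fragment (assumed, trimmed to the relevant parts).
SAMPLE_XML = """
<domain type='kvm'>
  <name>fedora33</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0xc4' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>
  </devices>
</domain>
"""


def hostdev_pci_addresses(domain_xml):
    """Return host PCI addresses (dddd:bb:ss.f) of PCI hostdevs in the XML."""
    root = ET.fromstring(domain_xml)
    found = []
    for addr in root.findall("./devices/hostdev[@type='pci']/source/address"):
        # Attributes are hex strings such as '0xc4'; int(..., 16) accepts the prefix.
        domain = int(addr.get("domain", "0"), 16)
        bus = int(addr.get("bus", "0"), 16)
        slot = int(addr.get("slot", "0"), 16)
        function = int(addr.get("function", "0"), 16)
        found.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return found


if __name__ == "__main__":
    # If the unplugged device still appears here, libvirt still tracks it.
    print(hostdev_pci_addresses(SAMPLE_XML))
```

If the address of the unplugged NVMe device is still returned after the guest has lost the device, that matches the "in use by driver QEMU" error on re-attach.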

On Thu, Feb 03, 2022 at 23:25:05 +0000, Kalra, Ashish wrote:
> [AMD Official Use Only]
Well, I hope it's okay to use it for libvirt officially too ;)
> Hi, I am using Fedora 33, with the following KVM, qemu and libvirt versions:
Note that Fedora 33 is already end-of-life; it would be great if you could re-test with a more recent version.
> QEMU 5.1.0 libvirt 6.6.0
Specifically, this was released 1.5 years ago.
> KVM 5.14.18
> We have done pass-through of a PCIe NVMe device to the guest running on FC33 using either virt-manager or virsh and then we do the hot-unplug of the device while it is attached to the guest.
> The device is no longer seen on the guest hardware device list on virt-manager and then we hotplug the device again and we are able to use it on the Host, but when we try to re-attach it to the guest, we get the following error message:
> Requested operation is not valid, PCI device 0000:c4::00.0 is in use by driver QEMU, Domain fedora 33.
[...]
Unfortunately, the tracing you've done doesn't really help in seeing what has gone wrong in libvirt. Please attach debug logs per https://www.libvirt.org/kbase/debuglogs.html
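
Once debug logs are available, one thing worth checking in them is whether the DEVICE_DELETED event for the device alias actually reached libvirt. The sketch below scans log text for lines that embed a QMP event as JSON and collects the aliases of deleted devices. The surrounding log line layout in `SAMPLE_LOG` is a made-up assumption for illustration, and `deleted_devices` is a hypothetical helper; only the event payload shape (`event`, `data.device`, `data.path`) follows QEMU's QMP event format.

```python
import json

# Hypothetical log excerpt; the text before the JSON payload is an assumption.
SAMPLE_LOG = (
    'debug : example : event={"timestamp": {"seconds": 1, "microseconds": 0}, '
    '"event": "DEVICE_DELETED", '
    '"data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}\n'
    "debug : example : unrelated line\n"
)


def deleted_devices(log_text):
    """Return device aliases from DEVICE_DELETED events embedded in log lines."""
    decoder = json.JSONDecoder()
    aliases = []
    for line in log_text.splitlines():
        if "DEVICE_DELETED" not in line:
            continue
        start = line.find("{")
        if start == -1:
            continue
        try:
            # Parse the first JSON object on the line, ignoring trailing text.
            event, _ = decoder.raw_decode(line[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("event") == "DEVICE_DELETED":
            aliases.append(event.get("data", {}).get("device"))
    return aliases


if __name__ == "__main__":
    print(deleted_devices(SAMPLE_LOG))
```

If the alias from the attach step (here `hostdev0`) shows up, the event was delivered and the problem is more likely in how the device is removed from the host device tracking afterwards.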

[AMD Official Use Only]

Hello Peter,

Thanks for your response. We will re-test with the latest Fedora, and possibly also with the latest libvirt. If we still face the same issue after re-testing, we will reply with the debug logs attached, as mentioned below.

Thanks,
Ashish

-----Original Message-----
From: Peter Krempa <pkrempa@redhat.com>
Sent: Friday, February 4, 2022 3:21 AM
To: Kalra, Ashish <Ashish.Kalra@amd.com>
Cc: libvirt-users@redhat.com; Grimm, Jon <Jon.Grimm@amd.com>; Ma, Mang-kwan <Mang-kwan.Ma@amd.com>; Huang2, Wei <Wei.Huang2@amd.com>
Subject: Re: NVMe drive PCI passthrough and surprise hotplug

[...]
participants (2)
- Kalra, Ashish
- Peter Krempa