Adding Paolo.
Michal Privoznik <mprivozn(a)redhat.com> writes:
On 02.09.2016 01:11, Alex Williamson wrote:
> Hey,
>
> I'm out of my QOM depth, so I'll just beg for help in advance. I
> noticed in testing vfio-pci hotunplug that the host seems to be trying
> to reclaim the device before QEMU is actually done with it, there's a
> very short race where libvirt has seen the DEVICE_DELETED event and
> tries to unbind the physical device from vfio-pci, the use count is
> clearly non-zero because the host driver tries to send a device
> request, but that event channel has already been torn down. Nearly
> immediately after, QEMU finally releases the device, but we can't do a
> proper reset due to some issues with device references in the kernel.
>
> When I run gdb on QEMU with breakpoints at
> qapi_event_send_device_deleted() and vfio_instance_finalize(), the
> QAPI even happens first. Clearly this is horribly wrong, right? I
> can't unmap my references to the vfio device file until my
> instance_finalize is called, so I'm always going to have that open when
> libvirt takes the DEVICE_DELETED event as a cue to return the device to
> host drivers. The call chains look like this:
>
> #0 qapi_event_send_device_deleted (has_device=true,
> device=0x7f5ca3e36fb0 "hostdev0",
> path=0x7f5c89e84fe0 "/machine/peripheral/hostdev0",
> errp=0x7f5ca241f9e8 <error_abort>) at qapi-event.c:412
> #1 0x00007f5ca1701608 in device_unparent (obj=0x7f5ca43ffc00)
> at hw/core/qdev.c:1115
> #2 0x00007f5ca18b7891 in object_finalize_child_property (obj=0x7f5ca380f500,
> name=0x7f5ca3f21da0 "hostdev0", opaque=0x7f5ca43ffc00) at
qom/object.c:1362
> #3 0x00007f5ca18b56b2 in object_property_del_child (obj=0x7f5ca380f500,
> child=0x7f5ca43ffc00, errp=0x0) at qom/object.c:422
> #4 0x00007f5ca18b5790 in object_unparent (obj=0x7f5ca43ffc00)
> at qom/object.c:441
> #5 0x00007f5ca16c1f31 in acpi_pcihp_eject_slot (s=0x7f5ca4c41268, bsel=0,
> slots=4) at hw/acpi/pcihp.c:139
>
>
> #0 vfio_instance_finalize (obj=0x7f5ca43ffc00)
> at /net/gimli/home/alwillia/Work/qemu.git/hw/vfio/pci.c:2731
> #1 0x00007f5ca18b57c0 in object_deinit (obj=0x7f5ca43ffc00,
> type=0x7f5ca376f490) at qom/object.c:448
> #2 0x00007f5ca18b5831 in object_finalize (data=0x7f5ca43ffc00)
> at qom/object.c:462
> #3 0x00007f5ca18b6782 in object_unref (obj=0x7f5ca43ffc00) at qom/object.c:896
> #4 0x00007f5ca1550cc0 in memory_region_unref (mr=0x7f5ca43fff00)
> at /net/gimli/home/alwillia/Work/qemu.git/memory.c:1476
> #5 0x00007f5ca1553886 in do_address_space_destroy (as=0x7f5ca43ffe10)
> at /net/gimli/home/alwillia/Work/qemu.git/memory.c:2272
>
>
> It appears that DEVICE_DELETED only means the VM is done with the
> device but libvirt is interpreting it as QEMU is done with the device.
> Which is correct? Do we need a new event or do we need to fix the
> ordering of this event? An ordering fix would be more compatible with
> existing libvirt. Thanks,
What an interesting race. I think the even should be sent only after
both guest and qemu are done with the device. Having two events looks
like too much granularity to me. I mean, even if libvirt learns that
guest has detached device, it still can't do anything until qemu clears
its internal state.
On the other hand, it is clearly documented that the DEVICE_DELETED
event is sent as soon as guest acknowledges completion of device
removal. So libvirt's buggy if we'd follow documentation strictly. But
then again, I don't see much information value in "guest has detached
device but qemu hasn't yet" event. Libvirt would ignore such event.
Unless I'm missing something, libvirt needs an event that signals "Guest
and QEMU are done with this device". Current DEVICE_DELETED isn't.
Can we imagine a use for current DEVICE_DELETED, i.e. "Guest is done,
but QEMU isn't"?
Would anything break if we changed semantics of DEVICE_DELETED to what
libvirt actually needs?
If the answers are "no" and "no", let's do it.