On 2013-03-08 18:37, Osier Yang wrote:
On 2013-03-08 17:25, Jiri Denemark wrote:
> On Fri, Mar 08, 2013 at 09:50:55 +0100, Markus Armbruster wrote:
>> Osier Yang<jyang(a)redhat.com> writes:
>>
>>> I'm wondering whether device_del could take a long time to complete
>>> (AFAIK from previous bugs it can, though it should be fine in most
>>> cases). If it takes too long, that is a problem for management,
>>> because it looks like a hang. We can have a timeout for device_del
>>> in libvirt, but the problem is that device_del can still be in
>>> progress in qemu, which could cause inconsistency, unless qemu has
>>> some command to cancel the device_del.
>>
>> I'm afraid cancelling isn't possible, at least not for PCI.
>
> I don't think we need anything like that. We just need the device
> deletion API to return immediately without actually removing stuff from
> domain definition (unless the device was really removed fast enough,
> e.g., USB devices are removed before device_del returns) and then remove
> the device from domain definition when we get the event from QEMU or
> when libvirtd reconnects to a domain and sees a particular device is no
> longer present. After all, devices may be removed even if we didn't ask
> for it (when the removal is initiated by a guest OS). And we should also
> provide similar event for higher level apps.
Removing the device from the domain config only after we get the event
from qemu, or find by polling that the device has disappeared, makes
sense. That's actually the main reason we want the event and polling.
But the problem is that our APIs shouldn't hang for a long time.
If we don't change the APIs and return quickly, just as we do
currently, it's confusing for the user: when he tries to attach
the device again while the device_del is still in progress,
he will get an error like "Device ID *** is in used", even though
our detach APIs returned success before that.
I.e., if device_del needs a long time to complete in some cases,
can we live with it?
After talking with Jirka internally on IRC, we agreed that
waiting for the qemu event (or polling) before the detach
APIs return success is not workable, because the time
device_del takes to complete really varies; even worse, it may
never complete. That means we might break backward compatibility
if we went that way.
The conclusion is that we need documentation saying that the detach
APIs returning success doesn't mean the device is really removed,
and we should also expose the qemu event in libvirt so that upper-layer
management has a way to know whether the device is really gone.
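To make the intended semantics concrete, here is a minimal sketch in
Python. It is a toy model only: ToyDomain and all of its methods are
hypothetical names made up for illustration, not libvirt or qemu APIs.
It assumes detach returns as soon as the request is accepted, and that
the device leaves the definition only when the removal event arrives.

```python
# Toy model only: ToyDomain and its methods are hypothetical names,
# not libvirt or qemu APIs.

class ToyDomain:
    def __init__(self):
        self.devices = set()          # aliases in the domain definition
        self.pending_removal = set()  # removal requested, no event yet
        self.removed_callbacks = []   # hooks for upper-layer management

    def attach(self, alias):
        # The alias stays reserved until the removal is confirmed.
        if alias in self.devices:
            raise RuntimeError("device ID %s is in use" % alias)
        self.devices.add(alias)

    def detach(self, alias):
        # Returns once the request is accepted. The device stays in
        # the definition because the guest may not have released it.
        if alias not in self.devices:
            raise KeyError(alias)
        self.pending_removal.add(alias)

    def on_device_deleted(self, alias):
        # Called when the removal is reported as finished. This also
        # covers guest-initiated removal, where detach() was never
        # called.
        self.pending_removal.discard(alias)
        self.devices.discard(alias)
        for cb in self.removed_callbacks:
            cb(alias)


dom = ToyDomain()
gone = []
dom.removed_callbacks.append(gone.append)

dom.attach("net0")
dom.detach("net0")          # success, but removal is only requested
try:
    dom.attach("net0")      # fails: the removal is still in progress
    reattached_early = True
except RuntimeError:
    reattached_early = False

dom.on_device_deleted("net0")   # the event finally arrives
dom.attach("net0")              # re-attaching works now
```

The point of the sketch is that a management app should wait for the
callback, not for detach() to return, before it reuses the device ID.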
Osier