Re: [libvirt] [PATCH v2 4/5] qemu_hotplug: Fix a rare race condition when detaching a device twice

14 Mar 2019

      On Thu, Mar 14, 2019 at 15:31:48 +0100, Michal Privoznik wrote:
...
On 3/14/19 3:14 PM, Peter Krempa wrote:
...
On Thu, Mar 14, 2019 at 14:56:48 +0100, Michal Privoznik wrote:
...
On 3/14/19 2:18 PM, Peter Krempa wrote:
...
On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
[...]
...
...
How can this be considered success? Also this introduces a possible
regression. The DEVICE_DELETED event should be fired only after the
device was entirely unplugged. Claiming success before seeing the event
can lead to another race when qemu deleted the device from the internal
list so that 'device_del' does not see it any more but did not finish
cleanup fully.
We need to start the '*Remove' handler only after the DEVICE_DELETED
event was received.
I beg to differ. If we were to report error here users would see the API
failing with error "Device not found". So they'd run 'virsh dumpxml' only to
find the device there. I don't find such behaviour sane. If one API tells me
a devie is not there then another one shall not tell otherwise.
Well. The user semantics can be confusing here. What we can't allow
though is that some of the steps done in the qemuDomainRemove*Device
will fail because qemu will still have some internal reference to some
backend object.
I'm not quite sure I follow. qemuDomainRemove*Device will be run exactly
once. Not any more times. Running it more times is a problem, but I'm
failing to see how my patch allows that. Can you shed more light into that
please?
What I've meant is that qemuDomainRemove*Device shall never be called
earlier than DEVICE_DELETED is received.

What I've forgot to take into account is that we will still call
qemuDomainWaitForDeviceRemoval (as your comment mentions).

This means that the scenario I was outlining will not happen as success
from this API does not automatically mean we'll try to delete the
backends of the device (which I didn't notice at first).

I'd probably just delete the mention of dumping XML. I think the
statement that qemuDomainWaitForDeviceRemoval should be clear enough.