Hi Laine,
On 2014/3/28 23:21, Laine Stump wrote:
On 03/28/2014 01:30 PM, xiexiangyou wrote:
> Thanks for your reply.
>
> On 2014/3/27 22:14, Jiri Denemark wrote:
>
>> On Thu, Mar 27, 2014 at 20:51:24 +0800, x00221466 wrote:
>>> Hi,
>>>
>>> When live detaching a virtual net device, such as a virtio NIC,
>>> RTL8139, or E1000, there are some problems:
>>>
>>> (1) If the guest OS doesn't support PCI hot plugging and the virtual
>>> network device is detached through Libvirt, the "net device" in Qemu
>>> will still exist, but the "hostnet" (tap) in Qemu will be removed, so
>>> the net device in the guest OS will have no effect.
>>>
>>> (2) If the NIC is ejected in the guest OS, Qemu removes the "net
>>> device" and then sends DEVICE_DELETED to Libvirt. Libvirt receives the
>>> event in the event-loop thread and releases the info of the net device
>>> in qemuDomainRemoveNetDevice, but the "hostnet" in Qemu still exists.
>>> So the next live attach of a virtual net device will fail because of
>>> "Duplicate ID".
>>>
>>> #virsh attach-device win2008_st_r2_64 net.xml --live
>>> error: Failed to attach device from net.xml
>>> error: internal error: unable to execute QEMU command 'netdev_add':
>>> Duplicate ID 'hostnet0' for netdev
>>>
>>> (3) In addition, in qemuDomainDetachNetDevice, the function that
>>> detaches a net device, the "netdev_del" command is sent immediately
>>> after the "device_del" command. So the tap device is forcibly removed
>>> before the net device is completely removed.
>>>
>>> So I think it is more logical to send the Qemu command "netdev_del"
>>> after receiving the DEVICE_DELETED event. That avoids a conflict of
>>> device info between the Libvirt side and the Qemu side.
>> This sounds like it could be correct, although I'd prefer Laine to
>> express his opinion on this since he knows the corners in network device
>> assignment...
>>
>>> I create a thread in qemuDomainRemoveDevice, the handler of the
>>> DEVICE_DELETED event, to execute the QEMU command "netdev_del".
>> Hmm, it took me some time to realize why you'd need to do this. It's
>> because qemuDomainRemoveDevice is run from a DEVICE_DELETED event
>> handler and thus it cannot talk back to the monitor, right? In that
>
> Yep! Sending a Qemu monitor command from the event handler is not
> allowed, so I create a new thread to do this.
>
>> case, I suggest spawning a thread for qemuDomainRemoveDevice itself
>> within the event handler (qemuProcessHandleDeviceDeleted) so that all
>> qemuDomainRemove* methods can talk to monitor if they need to.
>
> I will modify it as you suggest.
>
>> To make the changes easier to follow, please do the change in two
>> patches. The first one to move qemuDomainRemoveDevice into a new thread
>> and the second one to move qemuMonitorRemoveNetdev and
>> qemuMonitorRemoveHostNetwork calls inside qemuDomainRemoveNetDevice.
>>
>> But first, wait for Laine's input, please.

Well, the level of my knowledge was that I noticed the problem caused by
the asynchronous nature of device_del (exactly the error message that
you're reporting) and reported this to QEMU, asking for an event to let
us know when it is okay to reuse a device ID (i.e. the DEVICE_DELETED
event). It appears that this isn't always good enough, though, so
*something* apparently needs to be done.

My understanding is that the problem is caused by the netdev_del being
executed too soon after device_del, and then the device ID is forever
lost due to the unclean "cleanup", is that correct? If so, then your
solution sounds correct.

Yes :)

But does netdev_del complete synchronously? If not, then we will also
need a completion event for that as well.

I think so. If the Qemu command "netdev_del" is executed only after the
"device_del" work has really finished, and qemuDomainRemoveNetDevice is
then called to do the cleanup, the solution completes synchronously. It
guarantees that the net device info stays consistent between the Libvirt
side and the Qemu side.
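
To make the ordering concrete, here is a toy sketch in Python. It is not libvirt or Qemu code: ToyQemu, remove_net_device and make_handler are made-up names, and the real change would be in C in the qemu driver. It only models the order of operations: device_del is issued, the DEVICE_DELETED handler hands the cleanup to a new thread instead of talking to the monitor itself, and that thread sends netdev_del, so "hostnet0" can be reused afterwards.

```python
import threading


class ToyQemu:
    """Hypothetical stand-in for Qemu: one NIC frontend and its tap backend."""

    def __init__(self):
        self.devices = {"net0"}
        self.netdevs = {"hostnet0"}
        self.log = []
        self.on_device_deleted = None  # plays the role of the event handler

    def device_del(self, dev_id):
        # Only a request: the frontend stays until the guest releases it.
        self.log.append("device_del " + dev_id)

    def guest_releases(self, dev_id):
        # The guest finished the unplug, so the frontend goes away and the
        # DEVICE_DELETED event is delivered to the handler.
        self.devices.discard(dev_id)
        self.log.append("DEVICE_DELETED " + dev_id)
        if self.on_device_deleted is not None:
            self.on_device_deleted(dev_id)

    def netdev_del(self, netdev_id):
        self.netdevs.remove(netdev_id)
        self.log.append("netdev_del " + netdev_id)

    def netdev_add(self, netdev_id):
        if netdev_id in self.netdevs:
            raise ValueError("Duplicate ID '%s' for netdev" % netdev_id)
        self.netdevs.add(netdev_id)


def remove_net_device(qemu, dev_id, backend_of):
    # Runs in its own thread, so it is free to send monitor commands.
    qemu.netdev_del(backend_of[dev_id])


def make_handler(qemu, backend_of, workers):
    def handler(dev_id):
        # The handler does not call the monitor; it only spawns a worker.
        # (The toy does not enforce that restriction, it just follows it.)
        worker = threading.Thread(
            target=remove_net_device, args=(qemu, dev_id, backend_of))
        worker.start()
        workers.append(worker)
    return handler


def demo():
    qemu = ToyQemu()
    workers = []
    qemu.on_device_deleted = make_handler(qemu, {"net0": "hostnet0"}, workers)

    qemu.device_del("net0")
    assert "hostnet0" in qemu.netdevs  # backend kept until the event arrives

    qemu.guest_releases("net0")
    for worker in workers:
        worker.join()

    qemu.netdev_add("hostnet0")  # the ID is free again, no "Duplicate ID"
    return qemu.log


if __name__ == "__main__":
    for line in demo():
        print(line)
```

If the guest never releases the device, as in case (1), no DEVICE_DELETED arrives in this model and the backend stays attached to the frontend instead of being removed underneath it.
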
Regards,
-xie