
Hi Laine,

On 2014/3/28 23:21, Laine Stump wrote:
On 03/28/2014 01:30 PM, xiexiangyou wrote:
Thanks for your reply.
On 2014/3/27 22:14, Jiri Denemark wrote:
On Thu, Mar 27, 2014 at 20:51:24 +0800, x00221466 wrote:
Hi,
When live detaching a virtual net device, such as a virtio NIC, RTL8139 or E1000, there are some problems:
(1) If the guest OS doesn't support PCI hot-plug, detaching the virtual network device through libvirt leaves the "net device" in QEMU in place, but the "hostnet" (tap) in QEMU is removed, so the net device in the guest OS no longer works.
(2) If the NIC is ejected in the guest OS, QEMU removes the "net device" and then sends DEVICE_DELETED to libvirt. Libvirt receives the event in the event-loop thread and releases the info of the net device in qemuDomainRemoveNetDevice, but the "hostnet" in QEMU still exists. So the next live attach of a virtual net device fails because of "Duplicate ID".
# virsh attach-device win2008_st_r2_64 net.xml --live
error: Failed to attach device from net.xml
error: internal error: unable to execute QEMU command 'netdev_add': Duplicate ID 'hostnet0' for netdev
(3) In addition, qemuDomainDetachNetDevice, the function that detaches a net device, sends the "netdev_del" command immediately after the "device_del" command. So the tap device is removed abruptly, before the net device has been completely removed.
So I think it is more logical to send the QEMU command "netdev_del" only after receiving the DEVICE_DELETED event. That avoids conflicting device info between the libvirt side and the QEMU side.

This sounds like it could be correct, although I'd prefer Laine to express his opinion on this since he knows the corners in network device assignment...
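To make problem (2) and the proposed ordering concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyQemu`, `handle_device_deleted` and `reattach_after_eject` are hypothetical names, the frontend ID "net0" is assumed, and nothing here is libvirt or QEMU code. It only shows that a backend ID left behind after DEVICE_DELETED blocks the next attach, while removing the backend when the event arrives frees the ID.

```python
class ToyQemu:
    """Toy stand-in for QEMU's ID bookkeeping (hypothetical, not real QEMU)."""

    def __init__(self):
        self.netdevs = set()  # backend IDs, e.g. "hostnet0"
        self.devices = set()  # frontend NIC IDs, e.g. "net0" (assumed name)

    def attach(self, netdev_id, device_id):
        # Models netdev_add followed by device_add.
        if netdev_id in self.netdevs:
            raise ValueError(f"Duplicate ID '{netdev_id}' for netdev")
        self.netdevs.add(netdev_id)
        self.devices.add(device_id)

    def guest_eject(self, device_id):
        # The guest ejects the NIC: only the frontend goes away.
        self.devices.discard(device_id)
        return {"event": "DEVICE_DELETED", "data": {"device": device_id}}

    def netdev_del(self, netdev_id):
        self.netdevs.discard(netdev_id)


def handle_device_deleted(qemu, event, backends, remove_backend):
    """Management-side cleanup once DEVICE_DELETED has arrived."""
    device_id = event["data"]["device"]
    netdev_id = backends.pop(device_id)  # forget the device locally
    if remove_backend:
        # Proposed ordering: netdev_del only after DEVICE_DELETED.
        qemu.netdev_del(netdev_id)
    return netdev_id


def reattach_after_eject(remove_backend):
    """Attach, let the guest eject the NIC, then try to attach again."""
    qemu = ToyQemu()
    backends = {"net0": "hostnet0"}
    qemu.attach("hostnet0", "net0")
    event = qemu.guest_eject("net0")
    handle_device_deleted(qemu, event, backends, remove_backend)
    try:
        qemu.attach("hostnet0", "net0")
        return "ok"
    except ValueError as err:
        return str(err)
```

In this model, `reattach_after_eject(False)` reproduces the "Duplicate ID" failure, and `reattach_after_eject(True)` succeeds because the backend is deleted as part of handling the event.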
I created a thread in qemuDomainRemoveDevice, the handler of the DEVICE_DELETED event, to execute the QEMU command "netdev_del".

Hmm, it took me some time to realize why you'd need to do this. It's because qemuDomainRemoveDevice is run from a DEVICE_DELETED event handler and thus it cannot talk back to the monitor, right? In that
Yep! Sending a QEMU monitor command from the event handler is not allowed, so I create a new thread to do this.
case, I suggest spawning a thread for qemuDomainRemoveDevice itself within the event handler (qemuProcessHandleDeviceDeleted) so that all qemuDomainRemove* methods can talk to monitor if they need to.
I will modify it as you suggest.
To make the changes easier to follow, please do the change in two patches. The first one to move qemuDomainRemoveDevice into a new thread and the second one to move qemuMonitorRemoveNetdev and qemuMonitorRemoveHostNetwork calls inside qemuDomainRemoveNetDevice.
But first, wait for Laine's input, please.
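The hand-off suggested above (the event handler spawns a thread, and the thread does the removal and talks to the monitor) can be sketched as follows. This is a toy Python model, not libvirt code: `ToyMonitor`, `remove_device`, `on_device_deleted` and `run_event_loop` are hypothetical names, and the rule "no monitor commands from the event thread" is modelled with a simple thread-identity check.

```python
import threading


class ToyMonitor:
    """Hypothetical monitor that refuses commands from the event thread."""

    def __init__(self):
        self.event_thread = None
        self.sent = []
        self.lock = threading.Lock()

    def command(self, name, dev_id):
        if threading.current_thread() is self.event_thread:
            raise RuntimeError("monitor command issued from the event handler")
        with self.lock:
            self.sent.append((name, dev_id))


def remove_device(monitor, netdev_id):
    # Stand-in for the removal work that needs the monitor.
    monitor.command("netdev_del", netdev_id)


def on_device_deleted(monitor, netdev_id):
    # Runs in the event thread: only spawn a worker, never call the monitor.
    worker = threading.Thread(target=remove_device, args=(monitor, netdev_id))
    worker.start()
    return worker


def run_event_loop(monitor, netdev_id, handler):
    """Deliver one DEVICE_DELETED-like event to `handler` on an event thread."""
    result = {}

    def loop():
        monitor.event_thread = threading.current_thread()
        try:
            result["worker"] = handler(monitor, netdev_id)
        except RuntimeError as err:
            result["error"] = str(err)

    event_thread = threading.Thread(target=loop)
    event_thread.start()
    event_thread.join()
    worker = result.get("worker")
    if worker is not None:
        worker.join()  # wait for the spawned removal to finish
    return result
```

Passing `on_device_deleted` as the handler lets the removal reach the monitor from the worker thread. Passing `remove_device` directly, which corresponds to doing the work inside the event handler, is rejected by the toy monitor.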
Well, the level of my knowledge was that I noticed the problem caused by the asynchronous nature of device_del (exactly the error message that you're reporting) and reported this to QEMU, asking for an event to let us know when it is okay to reuse a device ID (i.e. the DEVICE_DELETED event). It appears that this isn't always good enough, though, so *something* apparently needs to be done.
My understanding is that the problem is caused by the netdev_del being executed too soon after device_del, and then the device ID is forever lost due to the unclean "cleanup", is that correct? If so, then your solution sounds correct.
Yes :)
But does netdev_del complete synchronously? If not, then we will also need a completion event for that as well.
I think so. If the QEMU command "netdev_del" is executed only after the "device_del" work has really finished, and qemuDomainRemoveNetDevice is then called to do the cleanup, the solution completes synchronously. It can guarantee the consistency of net device info between the libvirt side and the QEMU side.

Regards,
-xie