On 03/28/2014 01:30 PM, xiexiangyou wrote:
Thanks for your reply.
On 2014/3/27 22:14, Jiri Denemark wrote:
> On Thu, Mar 27, 2014 at 20:51:24 +0800, x00221466 wrote:
>> Hi,
>>
>> When live detaching the virtual net device, such as virtio nic、
>> RTL8139、E1000, there are some problems:
>>
>> (1)If the Guest OS don't support the hot plugging pci device, detach
>> the virtual network device by Libvirt, the "net device" in Qemu will
>> still exist, but "hostnet"(tap) in Qemu will be removed. so the net
device
>> in Guest OS will be of no effect.
>>
>> (2)If reject the nic in Guest OS, Qemu will remove the "net device",
>> then Qemu send DEVICE_DELETED to Libvirt, Libvirt receive the event
>> in event-loop thread and release info of the net device in
>> qemuDomainRemoveNetDevice func. but "hostnet" in Qemu still exist.
>> So next live attaching virtual net device will be failed because of
>> "Duplicate ID".
>>
>> #virsh attach-device win2008_st_r2_64 net.xml --live
>> error: Failed to attach device from net.xml
>> error: internal error: unable to execute QEMU command 'netdev_add':
>> Duplicate ID 'hostnet0' for netdev
>>
>> (3)In addition, in qemuDomainDetachNetDevice, detach net device func,
>> "netdev_del" command will be sent after sending "device_del"
command
>> at once. So it is violent to remove the tap device before the net device
>> is completely removed.
>>
>> So I think it's more logical that doing the work of sending Qemu command
>> "netdev_del" after receive the DEVICE_DELETED event. It can avoid the
conflict
>> of device info between Libvirt side and Qemu side.
> This sounds like it could be correct, although I'd prefer Laine to
> express his opinion on this since he knows the corners in network device
> assignment...
>
>> I create a thread in qemuDomainRemoveDevice,the handle of DEVICE_DELETED event,
>> to execute QEMU command "netdev_del".
> Hmm, it took me some time to realize why you'd need to do this. It's
> because qemuDomainRemoveDevice is run from a DEVICE_DELETED event
> handler and thus it cannot talk back to the monitor, right? In that
Yep! Sending the Qemu monitor command in event handler is no allowed, so I create
a new thread to do this.
> case, I suggest spawning a thread for qemuDomainRemoveDevice itself
> within the event handler (qemuProcessHandleDeviceDeleted) so that all
> qemuDomainRemove* methods can talk to monitor if they need to.
I will modify it as your suggest
> To make the changes easier to follow, please do the change in two
> patches. The first one to move qemuDomainRemoveDevice into a new thread
> and the second one to move qemuMonitorRemoveNetdev and
> qemuMonitorRemoveHostNetwork calls inside qemuDomainRemoveNetDevice.
>
> But first, wait for Laine's input, please.
Well, the level of my knowledge was that I noticed the problem caused by
the asynchronous nature of device_del (exactly the error message that
you're reporting) and reported this to QEMU, asking for an event to let
us know when it is okay to reuse a device ID (i.e. the DEVICE_DELETED
event). It appears that this isn't always good enough, though, so
*something* apparently needs to be done.
My understanding is that the problem is caused by the netdev_del being
executed too soon after device_del, and then the device ID is forever
lost due to the unclean "cleanup", is that correct? If so, then your
solution sounds correct.
But does netdev_del complete synchronously? If not, then we will also
need a completion event for that as well.