
environment:
  libvirt-4.3.0
  qemu-kvm-ev-2.10.0
  kernel-3.10.0-1062
  centos7
  openvswitch-2.3.1

vm network xml:

  <interface type='bridge'>
    <mac address='52:54:00:46:45:95'/>
    <source bridge='ovsbr-mgt'/>
    <vlan>
      <tag id='0'/>
    </vlan>
    <virtualport type='openvswitch'>
      <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
    </virtualport>
    <target dev='vnet0'/>
    <model type='virtio'/>
    <link state='up'/>
    <alias name='net0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
  </interface>

When qemuProcessStart in qemu_process.c fails, cleanup happens in two steps: first the qemu process is stopped (at which point the kernel reclaims the tap device, so the same tap name can immediately be handed out to another virtual machine), and only afterwards does libvirt remove the OVS port. Since qemuProcessStart and qemuProcessStop can run concurrently for different domains, the removal step in qemuProcessStop may delete a port that by then belongs to another virtual machine using an openvswitch virtualport.

For example: vm1 fails to start, so its tap device vnet0 is reclaimed first. In that window vm2 starts, is assigned the same tap name vnet0, and adds port vnet0 to the bridge. When vm1's cleanup then removes port vnet0, it actually removes the port now belonging to vm2, and vm2 can no longer reach the network through vnet0.

reproduce:
Batch start or migrate 10 virtual machines to the same node so that one of them fails to start. The failure can be anything, e.g. storage that cannot be reached (when we reproduced this internally, one of the virtual machines was attached to an invalid storage volume so that its start deliberately failed).
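The interleaving above can be sketched with a small deterministic simulation (all names here are hypothetical and this is not libvirt code): the OVS bridge effectively keys ports by interface name only, so a delayed del-port from vm1's failed start removes whatever port currently carries that name, even if vm2 added it in the meantime.

```python
# Deterministic replay of the race described above. The dict stands
# in for the OVS bridge's port table, keyed only by interface name,
# which is exactly what "ovs-vsctl del-port <ifname>" operates on.

ports = {}  # ifname -> owning VM id


def ovs_add_port(ifname, vm):
    # models: ovs-vsctl -- --if-exists del-port IF -- add-port BR IF ...
    ports[ifname] = vm


def ovs_del_port(ifname):
    # models: ovs-vsctl -- --if-exists del-port IF   (no owner check!)
    ports.pop(ifname, None)


# vm1 starts: tap vnet0 is created and added to the bridge.
ovs_add_port("vnet0", "vm1")

# vm1's qemu fails to start; the kernel reclaims the tap device
# *before* libvirt removes the OVS port, so the name is free again,
# and vm2 starting in that window is handed the same tap name.
ovs_add_port("vnet0", "vm2")

# vm1's cleanup (qemuProcessStop) now deletes port vnet0 and
# silently removes the port that belongs to vm2.
ovs_del_port("vnet0")

print(ports.get("vnet0"))  # -> None: vm2's port is gone
```

This is why the symptom is a single VM dropping off the network after a batch start/migration: the lost port belongs to a perfectly healthy domain, deleted by the cleanup path of an unrelated failed one.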
this problem will cause: after a batch migration, one virtual machine loses network access and its service is interrupted.

libvirt ovs handling, from the logs:

Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface vnet4 "external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\"" -- set Interface vnet4 "external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface vnet4 "external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\"" -- set Interface vnet4 "external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4

Thanks

On Tue, Jun 16, 2020 at 10:01 AM, Laine Stump <laine@redhat.com> wrote:
On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
On 6/12/20 3:18 AM, gongwei@smartx.com wrote:
From: gongwei <gongwei@smartx.com>
When the qemu process fails to start, do not remove the openvswitch port; in this case, leave port recycling to openvswitch itself.
Signed-off-by: gongwei <gongwei@smartx.com>
---
Can you please elaborate on the commit message? From the commit title and the code, I'm assuming you're saying that we shouldn't remove the openvswitch port when the QEMU process failed to start (i.e. when the shutoff reason is SHUTOFF_FAILED), while still removing it for every other reason.
More importantly, what "port recycling" takes effect depending on how the qemu process was stopped (which I would think shouldn't make any difference to OVS), and why is it necessary for libvirt not to do the removal itself?
Up until now, what I have known is that ports will not be removed from an OVS switch unless they are explicitly removed with ovs-vsctl, and this attachment will persist across reboots of the host system. As a matter of fact I've had cases during development where libvirt didn't remove the OVS port for a tap device when a guest was terminated, and then many *days* (and several reboots) later the same tap device name was used for a different guest that was using a Linux host bridge, and the tap device failed to attach to the Linux host bridge because it had already been auto-attached back to the OVS switch as soon as it was created.
Can you describe how to reproduce the situation where libvirt removes the OVS port when it shouldn't, and what is the bad outcome of that happening?
The code itself looks ok.
 src/qemu/qemu_process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index d36088ba98..439bd5b396 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     if (vport) {
         if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
             ignore_value(virNetDevMidonetUnbindPort(vport));
-        } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
+        } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
+                   reason != VIR_DOMAIN_SHUTOFF_FAILED) {
             ignore_value(virNetDevOpenvswitchRemovePort(
                              virDomainNetGetActualBridgeName(net),
                              net->ifname));
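The patched condition reduces to a simple predicate. A minimal model of the guard (Python, with string stand-ins for the C enum constants; all names here are illustrative, not libvirt API):

```python
# Hypothetical model of the patched guard in qemuProcessStop:
# remove the OVS port only if the vport is openvswitch AND the
# domain is not being torn down because it failed to start.

MIDONET = "midonet"            # VIR_NETDEV_VPORT_PROFILE_MIDONET
OPENVSWITCH = "openvswitch"    # VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH
SHUTOFF_FAILED = "failed"      # VIR_DOMAIN_SHUTOFF_FAILED
SHUTOFF_SHUTDOWN = "shutdown"  # normal shutoff reason


def should_remove_ovs_port(virt_port_type, reason):
    """True when qemuProcessStop should call the OVS port removal."""
    return virt_port_type == OPENVSWITCH and reason != SHUTOFF_FAILED


# A failed start skips the removal (the tap name may already belong
# to another freshly started VM); a normal shutdown still removes it.
print(should_remove_ovs_port(OPENVSWITCH, SHUTOFF_FAILED))    # False
print(should_remove_ovs_port(OPENVSWITCH, SHUTOFF_SHUTDOWN))  # True
```

The design choice being debated above: skipping the del-port on a failed start leaves a potentially stale port entry behind, relying on the next add-port's `--if-exists del-port` (visible in the logged ovs-vsctl invocation) to clean it up when the name is reused.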
--
Gong Wei (龚伟)
Mobile: 18883262137