Environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062, CentOS 7, openvswitch-2.3.1
VM network XML:

<interface type='bridge'>
  <mac address='52:54:00:46:45:95'/>
  <source bridge='ovsbr-mgt'/>
  <vlan>
    <tag id='0'/>
  </vlan>
  <virtualport type='openvswitch'>
    <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
  </virtualport>
  <target dev='vnet0'/>
  <model type='virtio'/>
  <link state='up'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>
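
As a side note, whether the port actually exists on the bridge can be checked on the host; a minimal sketch, assuming the bridge and tap names from the XML above:

    ovs-vsctl list-ports ovsbr-mgt    # should list vnet0 while the guest is running
    ovs-vsctl list Interface vnet0    # shows the external-ids libvirt set on the port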
When qemuProcessStart in qemu_process.c fails, cleanup happens in two steps: first the qemu process is stopped, at which point the kernel recycles the tap device (and the freed name can immediately be claimed by another virtual machine); only then does libvirt run the OVS removevport step. Since qemuProcessStart and qemuProcessStop for different domains can run concurrently, qemuProcessStop's removevport may remove a port that by then belongs to another virtual machine using an openvswitch virtualport.
For example: vm1 fails to start, so its tap device vnet0 is released first. In that window vm2 starts, is assigned the same tap device name vnet0, and OVS adds port vnet0 for it. Then vm1's cleanup runs removevport on vnet0, which at this point belongs to vm2. vm1's port removal therefore deletes vm2's port, and vm2 can no longer access the network through vnet0 (see the sketch below).
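
To make the interleaving concrete, here is a minimal sketch of the race expressed as the ovs-vsctl calls libvirt issues (the same form as in the logs below; vm1/vm2/vnet0 are the names from the example above):

    # t0: vm1 fails to start; qemu exits and the kernel frees tap vnet0
    # t1: vm2 starts, is handed the freed name vnet0, and its port is added:
    ovs-vsctl --timeout=5 -- --if-exists del-port vnet0 -- add-port ovsbr-mgt vnet0 tag=0 ...
    # t2: vm1's qemuProcessStop cleanup finally runs:
    ovs-vsctl --timeout=5 -- --if-exists del-port vnet0
    # the del-port at t2 removes the port added at t1, cutting vm2 off the network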
Reproduce: batch-start or migrate 10 virtual machines to the same node such that one of them fails to start. The failure can be storage that cannot be connected, or anything similar (when we reproduced this internally, one virtual machine was deliberately given an invalid storage connection so that it would fail).
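
A rough reproducer, assuming ten defined domains vm1..vm10 where one of them (say vm1) points at unreachable storage (the domain names are hypothetical):

    for i in $(seq 1 10); do virsh start "vm$i" & done; wait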
This problem causes: after the batch migration, one virtual machine's network is unreachable and its service is interrupted.
Log of libvirt's ovs-vsctl calls:
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface vnet4 "external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\"" -- set Interface vnet4 "external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface vnet4 "external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\"" -- set Interface vnet4 "external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4
Thanks
Laine Stump <laine(a)redhat.com> wrote on Tue, Jun 16, 2020 at 10:01 AM:
On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
>
>
> On 6/12/20 3:18 AM, gongwei(a)smartx.com wrote:
>> From: gongwei <gongwei(a)smartx.com>
>>
>> when a start fails, do not remove the openvswitch port;
>> port recycling in this case is left for openvswitch to handle by itself
>>
>> Signed-off-by: gongwei <gongwei(a)smartx.com>
>> ---
>
> Can you please elaborate on the commit message? By the commit title and
> the code, I'm assuming that you're saying that we shouldn't remove the
> openvswitch port if the QEMU process failed to start, for any other
> reason aside from SHUTOFF_FAILED.
More importantly, what "port recycling" will take effect depending on
how the qemu process is stopped (which I would think wouldn't make any
difference to OVS), and why is it necessary for libvirt to not do it?
Up until now, what I have known is that ports will not be removed from
an OVS switch unless they are explicitly removed with ovs-vsctl, and
this attachment will persist across reboots of the host system. As a
matter of fact I've had cases during development where libvirt didn't
remove the OVS port for a tap device when a guest was terminated, and
then many *days* (and several reboots) later the same tap device name
was used for a different guest that was using a Linux host bridge, and
the tap device failed to attach to the Linux host bridge because it had
already been auto-attached back to the OVS switch as soon as it was
created.
Can you describe how to reproduce the situation where libvirt removes
the OVS port when it shouldn't, and what is the bad outcome of that
happening?
>
> The code itself looks ok.
>
>
>
>> src/qemu/qemu_process.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index d36088ba98..439bd5b396 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>>          if (vport) {
>>              if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
>>                  ignore_value(virNetDevMidonetUnbindPort(vport));
>> -            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
>> +            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
>> +                       reason != VIR_DOMAIN_SHUTOFF_FAILED) {
>>                  ignore_value(virNetDevOpenvswitchRemovePort(
>>                                   virDomainNetGetActualBridgeName(net),
>>                                   net->ifname));
>>
>
--
龚伟
Mobile: 18883262137