On 6/15/20 11:10 PM, Wei Gong wrote:
Environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062,
CentOS 7, openvswitch-2.3.1
VM network XML:
<interface type='bridge'>
<mac address='52:54:00:46:45:95'/>
<source bridge='ovsbr-mgt'/>
<vlan>
<tag id='0'/>
</vlan>
<virtualport type='openvswitch'>
<parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
</virtualport>
<target dev='vnet0'/>
<model type='virtio'/>
<link state='up'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>
qemuProcessStart in qemu_process.c failed to start the guest.
When that happens, the qemu process stops first (at this point the
kernel reclaims the tap device, and its name can be handed to another
virtual machine). Only afterwards is the port removed from OVS
(virNetDevOpenvswitchRemovePort).
qemuProcessStart and qemuProcessStop can run concurrently, so when the
openvswitch virtualport is in use, the port removal in qemuProcessStop
may remove a port that belongs to another virtual machine.
For example:
vm1 fails to start, and its tap device vnet0 is reclaimed first. Before
the port vnet0 is removed, vm2 starts, is given the same tap device
name vnet0, and adds port vnet0 to OVS. The removal of port vnet0 on
behalf of vm1 then deletes the port that now belongs to vm2.
vm2 can no longer reach the network through vnet0.
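The interleaving in the example above can be replayed with a small toy model. This is only a sketch: the bridge table, add_port(), del_port() and vm2_keeps_port() are hypothetical stand-ins written for illustration, not libvirt or OVS code. The one property it relies on is that ports are looked up by interface name alone, as in the "--if-exists del-port NAME" calls shown in the logs further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8
#define NAME_LEN 16

/* Toy stand-in for an OVS bridge: ports are keyed by interface name only. */
struct port {
    char ifname[NAME_LEN];
    char vm[NAME_LEN];   /* owner, recorded only so the model can report it */
    bool used;
};

static struct port bridge[MAX_PORTS];

/* Models "--if-exists del-port NAME": matches on the name, never the owner. */
static void del_port(const char *ifname)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (bridge[i].used && strcmp(bridge[i].ifname, ifname) == 0)
            bridge[i].used = false;
    }
}

/* Models "--if-exists del-port NAME -- add-port BRIDGE NAME".
 * If the toy table is full the add is silently dropped. */
static void add_port(const char *ifname, const char *vm)
{
    assert(strlen(ifname) < NAME_LEN && strlen(vm) < NAME_LEN);
    del_port(ifname);
    for (int i = 0; i < MAX_PORTS; i++) {
        if (!bridge[i].used) {
            snprintf(bridge[i].ifname, NAME_LEN, "%s", ifname);
            snprintf(bridge[i].vm, NAME_LEN, "%s", vm);
            bridge[i].used = true;
            return;
        }
    }
}

/* Returns the recorded owner of a port, or NULL if the port does not exist. */
static const char *port_owner(const char *ifname)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (bridge[i].used && strcmp(bridge[i].ifname, ifname) == 0)
            return bridge[i].vm;
    }
    return NULL;
}

/* Replays the example. vm1's qemu has already exited, so the kernel is
 * free to give the name vnet0 to vm2. Returns true if vm2 still has its
 * port once vm1's cleanup has run. */
static bool vm2_keeps_port(bool vm1_cleanup_runs_late)
{
    const char *owner;

    memset(bridge, 0, sizeof(bridge));
    add_port("vnet0", "vm1");
    if (!vm1_cleanup_runs_late)
        del_port("vnet0");          /* cleanup wins the race */
    add_port("vnet0", "vm2");       /* vm2 reuses the freed tap name */
    if (vm1_cleanup_runs_late)
        del_port("vnet0");          /* cleanup loses the race */
    owner = port_owner("vnet0");
    return owner != NULL && strcmp(owner, "vm2") == 0;
}
```

In this model vm2_keeps_port(false) holds and vm2_keeps_port(true) does not; the only difference between the two runs is whether the by-name removal happens before or after vm2 adds its port.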
To reproduce:
Batch start or migrate 10 virtual machines to the same node, with one
of them failing to start.
The failure can be unreachable storage or anything else (when we
reproduced it internally, one of the virtual machines was pointed at
invalid storage so that it would fail on purpose).
The result:
After the batch migration, the network of one virtual machine is
unreachable and the service running in that virtual machine is
interrupted.
Okay, I understand the problem now, but your patch doesn't fix it.
The problem is (as also described in
https://www.redhat.com/archives/libvir-list/2020-June/msg00481.html ) a
race condition created when the qemu process is shut down just as a new
qemu process is started - since the old tap device is deleted (and its
name made available for re-use) implicitly as a part of the old qemu
process being terminated, and since the old qemu process has terminated
before we remove the port from OVS, a new tap (with the old name, as the
kernel thinks it is now available) may have already been created by the
kernel by the time qemuProcessStop() gets around to removing the port
associated (by name) with the old tap from the OVS switch.
And we can't eliminate the race by simply moving the call to
virNetDevOpenvswitchRemovePort() up before the call to qemuProcessKill()
- it is also possible that qemu could have exited by itself, or that
some outside force other than libvirt killed it - in this case the tap
has already been deleted by the time qemuProcessStop() is reached.
As for your method of eliminating the race, there are two problems:
1) if virNetDevOpenvswitchRemovePort() isn't called, then OVS will
automatically grab the new tap device as soon as it is created and
re-attach it to the old switch. As long as the new qemu process asks to
attach it to that same switch, then there is no problem. But if the new
process tries to attach the device to a *different* switch (for example,
a Linux host bridge) then the attach will fail.
2) your method of deciding whether or not
virNetDevOpenvswitchRemovePort() should be called by libvirt is invalid
- the reason isn't always set to VIR_DOMAIN_SHUTOFF_FAILED when the qemu
process has been terminated external to libvirt. But beyond that, the
code shows that the qemu process is *always* terminated prior to the
call to virNetDevOpenvswitchRemovePort(). So at most, your patch might
be making the race window smaller in some cases, but it isn't
eliminating it.
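Point 1 above can be put into a toy model as well. Again a sketch under stated assumptions: ovs_records, ovs_add_port(), ovs_del_port() and attach_to_linux_bridge() are hypothetical names, and the only behavior modelled is the one described above, namely that an OVS port record outlives the tap device and claims any new tap that shows up with the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 8
#define NAME_LEN 16

/* Toy stand-in for the OVS database: a port record persists until it is
 * explicitly deleted, whether or not the tap device still exists. */
static char ovs_records[MAX_RECORDS][NAME_LEN];

static bool ovs_has_record(const char *ifname)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (strcmp(ovs_records[i], ifname) == 0)
            return true;
    }
    return false;
}

/* Records a port by name. If the toy table is full the add is dropped. */
static void ovs_add_port(const char *ifname)
{
    assert(ifname[0] != '\0' && strlen(ifname) < NAME_LEN);
    if (ovs_has_record(ifname))
        return;
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (ovs_records[i][0] == '\0') {
            snprintf(ovs_records[i], NAME_LEN, "%s", ifname);
            return;
        }
    }
}

/* Deletes the record for a port name, if there is one. */
static void ovs_del_port(const char *ifname)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (strcmp(ovs_records[i], ifname) == 0)
            ovs_records[i][0] = '\0';
    }
}

/* A new tap whose name still has an OVS record is taken to be claimed by
 * OVS as soon as it is created, so attaching it to a Linux host bridge
 * is modelled as failing. */
static bool attach_to_linux_bridge(const char *ifname)
{
    return !ovs_has_record(ifname);
}

/* The old guest used OVS on vnet0 and has stopped; the new guest gets
 * the same name but wants a Linux host bridge. */
static bool new_guest_attach_succeeds(bool port_removed_on_stop)
{
    memset(ovs_records, 0, sizeof(ovs_records));
    ovs_add_port("vnet0");
    if (port_removed_on_stop)
        ovs_del_port("vnet0");
    return attach_to_linux_bridge("vnet0");
}
```

With the removal skipped, new_guest_attach_succeeds(false) is false in this model, which is the failure described in point 1; with the removal in place it is true.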
Fixing this race condition requires something more than just adding an
extra clause to a conditional. It may be possible to tell OVS to
automatically delete the port as the tap is deleted (which would be
nice, but I'm actually not expecting to find a way to do that), or it
may require libvirt to name and track tap devices itself (as it already
does for macvtap devices), which *also* has problems - in particular
whether or not we need to account for the possibility of multiple
simultaneous libvirtd processes.
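To make the "libvirt names the tap devices itself" option a little more concrete, here is one hypothetical shape it could take: hand out names from a counter that never moves backwards, so a name that was just freed is not the next one issued. tap_name_alloc(), tap_name_release() and last_tap_id are invented for this sketch and are not existing libvirt functions. The counter is also per-process, which is exactly the multiple-libvirtd concern raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TAP_NAME_LEN 16

/* Hypothetical allocator state: the last numeric suffix handed out.
 * It lives in a single process, so two libvirtd instances would each
 * have their own copy and could still pick the same name. */
static int last_tap_id = -1;

/* Writes the next "vnetN" name into buf and returns N. */
static int tap_name_alloc(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen >= TAP_NAME_LEN);
    last_tap_id++;
    snprintf(buf, buflen, "vnet%d", last_tap_id);
    return last_tap_id;
}

/* Releasing a name deliberately does not rewind the counter, so a late
 * by-name cleanup for the old owner cannot hit a newly issued name. */
static void tap_name_release(const char *name)
{
    (void)name;
}

/* Allocates, releases, allocates again; reports whether the second
 * allocation got the same name as the first. */
static bool freed_name_is_reused(void)
{
    char first[TAP_NAME_LEN];
    char second[TAP_NAME_LEN];

    tap_name_alloc(first, sizeof(first));
    tap_name_release(first);
    tap_name_alloc(second, sizeof(second));
    return strcmp(first, second) == 0;
}
```

This only narrows the problem to counter wrap-around and to names created outside the allocator; it says nothing about how the state would be shared or recovered across daemon restarts.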
Logs of libvirt's ovs-vsctl calls:
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called
as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port
ovsbr-mgt vnet4 tag=0 -- set Interface vnet4
"external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface
vnet4
"external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\""
-- set Interface vnet4
"external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set
Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous
mode
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called
as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port
ovsbr-mgt vnet4 tag=0 -- set Interface vnet4
"external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface
vnet4
"external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\""
-- set Interface vnet4
"external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set
Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous
mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called
as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4
Thanks
On Tue, Jun 16, 2020 at 10:01 AM, Laine Stump <laine(a)redhat.com> wrote:
On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
>
>
> On 6/12/20 3:18 AM, gongwei(a)smartx.com wrote:
>> From: gongwei <gongwei(a)smartx.com>
>>
>> start to failed will not remove the openvswitch port,
>> the port recycling in this case lets openvswitch handle it by itself
>>
>> Signed-off-by: gongwei <gongwei(a)smartx.com>
>> ---
>
> Can you please elaborate on the commit message? By the commit title and
> the code, I'm assuming that you're saying that we shouldn't remove the
> openvswitch port if the QEMU process failed to start, for any other
> reason aside from SHUTOFF_FAILED.
More importantly, what "port recycling" takes effect depending on how
the qemu process is stopped (which I would think wouldn't make any
difference to OVS), and why is it necessary for libvirt not to remove
the port?
Up until now, what I have known is that ports will not be removed from
an OVS switch unless they are explicitly removed with ovs-vsctl, and
this attachment will persist across reboots of the host system. As a
matter of fact I've had cases during development where libvirt didn't
remove the OVS port for a tap device when a guest was terminated, and
then many *days* (and several reboots) later the same tap device name
was used for a different guest that was using a Linux host bridge, and
the tap device failed to attach to the Linux host bridge because it had
already been auto-attached back to the OVS switch as soon as it was
created.
Can you describe how to reproduce the situation where libvirt removes
the OVS port when it shouldn't, and what is the bad outcome of that
happening?
>
> The code itself looks ok.
>
>
>
>> src/qemu/qemu_process.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index d36088ba98..439bd5b396 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>> if (vport) {
>> if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
>> ignore_value(virNetDevMidonetUnbindPort(vport));
>> - } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
>> + } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
>> + reason != VIR_DOMAIN_SHUTOFF_FAILED) {
>> ignore_value(virNetDevOpenvswitchRemovePort(
>> virDomainNetGetActualBridgeName(net),
>> net->ifname));
>>
>
--
Wei Gong
Mobile: 18883262137