Thanks for your detailed analysis, I will remake a patch

 

 

 

发件人: Laine Stump
发送时间: Wednesday, June 17, 2020 7:46 AM
收件人: libvir-list@redhat.com
抄送: Daniel Henrique Barboza; Bingsong Si; Wei Gong
主题: Re: [PATCH] network: Fix a race condition when shutdown & start vm at the same time

 

(BTW, this other patch is also trying to solve the same problem:

 

 

https://www.redhat.com/archives/libvir-list/2020-June/msg00525.html

 

 

I made comments there earlier, and have learned a bit more since then:

 

 

https://www.redhat.com/archives/libvir-list/2020-June/msg00634.html

 

 

On 6/15/20 2:36 PM, Daniel Henrique Barboza wrote:

> 

> 

> On 6/11/20 6:58 AM, Bingsong Si wrote:

>> when shutdown vm, the qemuProcessStop cleanup virtual interface in

>> two steps:

> 

> s/when/When

> 

>> 1. qemuProcessKill kill qemu process, and vif disappeared

>> 2. ovs-vsctl del-port from the brige

>> 

>> if start a vm in the middle of the two steps, the new vm will reused

>> the vif,

> 

> s/if/If

> 

>> but removed from bridge by step 2

>> 

>> Signed-off-by: Bingsong Si <owen.si@ucloud.cn>

>> ---

>>   src/qemu/qemu_process.c | 8 +++++---

>>   1 file changed, 5 insertions(+), 3 deletions(-)

>> 

>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c

>> index d36088ba98..706248815a 100644

>> --- a/src/qemu/qemu_process.c

>> +++ b/src/qemu/qemu_process.c

>> @@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver,

>>               if (vport->virtPortType ==

>> VIR_NETDEV_VPORT_PROFILE_MIDONET) {

>> ignore_value(virNetDevMidonetUnbindPort(vport));

>>               } else if (vport->virtPortType ==

>> VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {

>> -                ignore_value(virNetDevOpenvswitchRemovePort(

>> - virDomainNetGetActualBridgeName(net),

>> -                                 net->ifname));

>> +                virMacAddr mac;

>> +                if (virNetDevGetMAC(net->ifname, &mac) < 0 || 

>> !virMacAddrCmp(&mac, &net->mac))

 

 

(Before anything else - virNetDevGetMAC() will actually *log* an error

in libvirt's logs if the device isn't found (which will be in nearly

100% of all cases). That will lead to people reporting it as a bug,

which gets very time consuming and expensive for anyone providing

commercial support for a product that uses libvirt. If it is really

necessary to check the MAC address of a device that legitimately may or

may not exist, then there will need to be a "Quiet" version of the

function that doesn't log any errors.)(Update after thinking about it -

I don't think we should be checking the MAC address anyway, as it

doesn't reliably differentiate "new" tap from "old" tap).

 

 

> 

> Extra space between "||" and "!virMacAddrCmp(.."

> 

> 

> With these nits fixed:

> 

> 

> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

 

 

 

This patch is attempting to solve a race condition between one guest

starting at the same time another is stopping, and the mess that results

if the new guest uses the same name for its tap device as the old guest

used. For example, lets say libvirt thread A / guest A is doing this:

 

 

A1) the QEMU process is terminated (and tap device, e.g. "vnet0", is

implicitly deleted) either by libvirtd or by some external force

(including possibly the QEMU process itself

 

A2) the tap's associated port (also "vnet0") is removed from the OVS

switch by libvirt.

 

 

While libvirt thread B is doing this:

 

B1) a new tap device is created for a new QEMU process. If A1 has

already happened, then the kernel will likely give the new tap the same

name - "vnet0".

 

B2) the new tap device is attached to an OVS switch (or possibly a Linux

host bridge).

 

 

The problem occurs when if B2 happens before A2, which could result in

B2 attaching the new tap to the OVS switch, and then A2 disconnecting it

from the switch. So libvirt thinks the new QEMU guest tap is attached to

the switch, but it isn't.

 

 

This patch attempts to eliminate the race by checking, prior to removing

"old tap"s port on the switch, that 1) the tap device doesn't exist, or

that if it does 2) that the MAC address of the tap device is unchanged.

 

 

Assuming that the two guests would not use the same MAC address for

their tap devices (which is probably the case, but isn't *required* to

be true), this does significantly narrow the potential time  for a race

condition, and in particular makes sure that we never remove a port that

hasn't just been "re-added" by the new QEMU.

 

 

However, this just creates a smaller window for the race, and different

problem for the remainder of the time.

 

1) smaller window - it would still be possible for the following to happen:

 

 

   a) old qemu terminates, tap device "vnet0" is deleted

 

   b) libvirt checks MAC address and learns that there is no device

"vnet0",

 

      so it calls virNetDevOpenvswitchRemovePort(), but before ovs-vsctl

 

      can be called...

 

   c) libvirt creates a new tap device for new QEMU, kernel names it

"vnet0"

 

   d) libvirt calls virNetDevOpenvswitchAddPort() and the new tap

 

      device "vnet0" to the switch

 

   e) the ovs-vsctl from (b) is finally able to run, and removes "vnet0"

 

      from the switch.

 

 

Granted, this is *highly* unlikely, since there is nothing extra between

checking MAC address and removing the port, but there is nothing

enforcing it.

 

2) New Problem - I think the testing here was done with two guests who

both attached their tap to the same (or another) OVS switch. It's

relying on the ability to attach the tap device to a new switch/bridge

even if it is already attached to some other switch bridge. Normally

ovs-vsctl would refuse to add a port to a switch if a port by that same

name was already on any OVS switch. I just looked it up though, and in

this case libvirt is able to make this work by including "--if-exists

del-port $ifname" in the ovs-vsctl command that *adds* the new port.

However, if you have the same situation but the new switch device is

instead a Linux bridge, the attempt to attach to the bridge fails.

 

 

So, if thread "B" has already created the new tap device by the time

thread "A" is deciding whether or not to remove the old port, the port

won't be removed, and if the new guest "B" is using a Linux host bridge,

libvirt will fail to attach the new tap to the bridge.

 

 

So, a summary of the problems with this patch:

 

 

1) The race window is reduced (and may be gone in practical terms), but

not guaranteed to be eliminated.

 

 

2) For the time during the "previous race window start" and "new race

window start" a new problem has been created - if the new guest uses a

Linux host bridge, the connection will fail

 

 

3) virNetDevGetMAC() will put an error in the system logs if the device

doesn't exist (and it almost always will *not* exist, so this will be

significant

 

 

4) Although it is almost always the case that two guests will not use

the same MAC address for their network interfaces, there is nothing

preventing it - we shouldn't assume that MAC addresses are unique. I

think that check is actually superfluous, since the qemu process has

always been terminated by the time we get to that place in the code, so

the tap device should have been auto-deleted.

 

 

What do I think should be done? Good question. Possibly we could:

 

 

A) Call virNetDevTapReattachBridge() rather than

virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This

would eliminate problem (2).

 

 

B) Instead of checking if the tap device MAC address matches, just call

virNetDevExists() - if it exists, then skip the RemovePort() - this

eliminates problems (3) and (4). (NB - this would fail if it turns out

that tap device deletion isn't completed synchronously with qemu process

termination!)

 

 

C) If we want to make it 100% sound, we need to make "check for

interface existence + removeport" an atomic operation, and mutually

exclusive with virNetDevTapCreate(). This would eliminate problem (1)

 

 

 

> 

>> + ignore_value(virNetDevOpenvswitchRemovePort(

>> + virDomainNetGetActualBridgeName(net),

>> +                                     net->ifname));

>>               }

>>           }

>> 

>