Thanks for your detailed analysis, I will remake a patch
发件人: Laine Stump
发送时间: Wednesday, June 17, 2020 7:46 AM
收件人: libvir-list@redhat.com
抄送: Daniel Henrique Barboza; Bingsong Si; Wei Gong
主题: Re: [PATCH] network: Fix a race condition when shutdown & start vm at the same time
(BTW, this other patch is also trying to solve the same problem:
https://www.redhat.com/archives/libvir-list/2020-June/msg00525.html
I made comments there earlier, and have learned a bit more since then:
https://www.redhat.com/archives/libvir-list/2020-June/msg00634.html
On 6/15/20 2:36 PM, Daniel Henrique Barboza wrote:
>
>
> On 6/11/20 6:58 AM, Bingsong Si wrote:
>> when shutdown vm, the qemuProcessStop cleanup virtual interface in
>> two steps:
>
> s/when/When
>
>> 1. qemuProcessKill kill qemu process, and vif disappeared
>> 2. ovs-vsctl del-port from the brige
>>
>> if start a vm in the middle of the two steps, the new vm will reused
>> the vif,
>
> s/if/If
>
>> but removed from bridge by step 2
>>
>> Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
>> ---
>> src/qemu/qemu_process.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index d36088ba98..706248815a 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>> if (vport->virtPortType ==
>> VIR_NETDEV_VPORT_PROFILE_MIDONET) {
>> ignore_value(virNetDevMidonetUnbindPort(vport));
>> } else if (vport->virtPortType ==
>> VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
>> - ignore_value(virNetDevOpenvswitchRemovePort(
>> - virDomainNetGetActualBridgeName(net),
>> - net->ifname));
>> + virMacAddr mac;
>> + if (virNetDevGetMAC(net->ifname, &mac) < 0 ||
>> !virMacAddrCmp(&mac, &net->mac))
(Before anything else - virNetDevGetMAC() will actually *log* an error
in libvirt's logs if the device isn't found (which will be in nearly
100% of all cases). That will lead to people reporting it as a bug,
which gets very time consuming and expensive for anyone providing
commercial support for a product that uses libvirt. If it is really
necessary to check the MAC address of a device that legitimately may or
may not exist, then there will need to be a "Quiet" version of the
function that doesn't log any errors.)(Update after thinking about it -
I don't think we should be checking the MAC address anyway, as it
doesn't reliably differentiate "new" tap from "old" tap).
>
> Extra space between "||" and "!virMacAddrCmp(.."
>
>
> With these nits fixed:
>
>
> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This patch is attempting to solve a race condition between one guest
starting at the same time another is stopping, and the mess that results
if the new guest uses the same name for its tap device as the old guest
used. For example, lets say libvirt thread A / guest A is doing this:
A1) the QEMU process is terminated (and tap device, e.g. "vnet0", is
implicitly deleted) either by libvirtd or by some external force
(including possibly the QEMU process itself
A2) the tap's associated port (also "vnet0") is removed from the OVS
switch by libvirt.
While libvirt thread B is doing this:
B1) a new tap device is created for a new QEMU process. If A1 has
already happened, then the kernel will likely give the new tap the same
name - "vnet0".
B2) the new tap device is attached to an OVS switch (or possibly a Linux
host bridge).
The problem occurs when if B2 happens before A2, which could result in
B2 attaching the new tap to the OVS switch, and then A2 disconnecting it
from the switch. So libvirt thinks the new QEMU guest tap is attached to
the switch, but it isn't.
This patch attempts to eliminate the race by checking, prior to removing
"old tap"s port on the switch, that 1) the tap device doesn't exist, or
that if it does 2) that the MAC address of the tap device is unchanged.
Assuming that the two guests would not use the same MAC address for
their tap devices (which is probably the case, but isn't *required* to
be true), this does significantly narrow the potential time for a race
condition, and in particular makes sure that we never remove a port that
hasn't just been "re-added" by the new QEMU.
However, this just creates a smaller window for the race, and different
problem for the remainder of the time.
1) smaller window - it would still be possible for the following to happen:
a) old qemu terminates, tap device "vnet0" is deleted
b) libvirt checks MAC address and learns that there is no device
"vnet0",
so it calls virNetDevOpenvswitchRemovePort(), but before ovs-vsctl
can be called...
c) libvirt creates a new tap device for new QEMU, kernel names it
"vnet0"
d) libvirt calls virNetDevOpenvswitchAddPort() and the new tap
device "vnet0" to the switch
e) the ovs-vsctl from (b) is finally able to run, and removes "vnet0"
from the switch.
Granted, this is *highly* unlikely, since there is nothing extra between
checking MAC address and removing the port, but there is nothing
enforcing it.
2) New Problem - I think the testing here was done with two guests who
both attached their tap to the same (or another) OVS switch. It's
relying on the ability to attach the tap device to a new switch/bridge
even if it is already attached to some other switch bridge. Normally
ovs-vsctl would refuse to add a port to a switch if a port by that same
name was already on any OVS switch. I just looked it up though, and in
this case libvirt is able to make this work by including "--if-exists
del-port $ifname" in the ovs-vsctl command that *adds* the new port.
However, if you have the same situation but the new switch device is
instead a Linux bridge, the attempt to attach to the bridge fails.
So, if thread "B" has already created the new tap device by the time
thread "A" is deciding whether or not to remove the old port, the port
won't be removed, and if the new guest "B" is using a Linux host bridge,
libvirt will fail to attach the new tap to the bridge.
So, a summary of the problems with this patch:
1) The race window is reduced (and may be gone in practical terms), but
not guaranteed to be eliminated.
2) For the time during the "previous race window start" and "new race
window start" a new problem has been created - if the new guest uses a
Linux host bridge, the connection will fail
3) virNetDevGetMAC() will put an error in the system logs if the device
doesn't exist (and it almost always will *not* exist, so this will be
significant
4) Although it is almost always the case that two guests will not use
the same MAC address for their network interfaces, there is nothing
preventing it - we shouldn't assume that MAC addresses are unique. I
think that check is actually superfluous, since the qemu process has
always been terminated by the time we get to that place in the code, so
the tap device should have been auto-deleted.
What do I think should be done? Good question. Possibly we could:
A) Call virNetDevTapReattachBridge() rather than
virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This
would eliminate problem (2).
B) Instead of checking if the tap device MAC address matches, just call
virNetDevExists() - if it exists, then skip the RemovePort() - this
eliminates problems (3) and (4). (NB - this would fail if it turns out
that tap device deletion isn't completed synchronously with qemu process
termination!)
C) If we want to make it 100% sound, we need to make "check for
interface existence + removeport" an atomic operation, and mutually
exclusive with virNetDevTapCreate(). This would eliminate problem (1)
>
>> + ignore_value(virNetDevOpenvswitchRemovePort(
>> + virDomainNetGetActualBridgeName(net),
>> + net->ifname));
>> }
>> }
>>
>