On 8/7/24 1:32 PM, Daniel P. Berrangé wrote:
On Wed, Aug 07, 2024 at 01:16:01PM -0400, Laine Stump wrote:
> There has been a problem for several years with libvirt's default
> virtual network conflicting with the host physical network connection
> on new installs, particularly when the "host" is actually a virtual
> machine that is, itself, connected to the libvirt default network on
> its respective host. If the two default networks use the same subnet,
> and if the nested host's libvirt happens to start its network before
> the system networking connects to the L0 host, then network
> connectivity to the L1 guest is just silently non-working.
>
> We've tried several things over the years to eliminate this problem,
> including:
>
> 1) Checking for conflicting routes/interfaces during installation of
> the libvirt-daemon-config-network package (the one containing the
> default network config file) which tries different subnets until it
> finds one that doesn't conflict. This helps in some cases, but
> fails on 2 points: a) if the installation environment is different
> from the environment where the system is actually run (e.g. a "live
> CD" image of a distro, which is built in a container by the distro
> maintainers, then could later run in any number of places, and b)
> it modifies the installed files during the rpm installation %post
> script, which is now discouraged because people building container
> images don't like dealing with that.
>
> 2) When starting a libvirt network, we now check for any route or
> interface that conflicts with the new network's IP's and
> routes. This doesn't fix the problem, but does notify the user of
> the problem *as long as libvirt starts its networks after the host
> has started its system networks*.
>
> 3) New code (in the commits immediately previous to this one) add
> support for an "autoaddr" attribute in each virtual network <ip>
> element; when autoaddr is set, the network driver goes one step
> beyond (2) and actually finds an unused subnet and sets the new
> virtual network's addresses accordingly.
>
> These are all nice in their own ways, but none of them helps in the
> situation where libvirt's networks are started first (before the
> host's own network connections are all up). This led to this patch,
> which does the following:
>
> 4) Using a NetworkManager facility (dispatcher.d pscripts, which are
> run whenever any interface is brought up or down), check for any
> libvirt networks that conflict with a newly started NetworkManager
> interface, and if a conflict is found then log a message and
> destroy the libvirt network. Most usefully, though, if this
> destroyed network has autoaddr='yes' then the script will
> immediately restart the network, which will find a new, unused
> subnet.
>
> Once this is in place, the only issues are:
>
> 1) It only works with NetworkManager. But of course almost all of the
> cases where this problem has been an issue, networking is managed
> by NetworkManager.
>
> 2) If there are guests already running and connected to the network,
> they will be disconnected, and won't be reconnected until
> libvirtd/virtqemud is restarted (one of the things the QEMU driver
> does when rereading the status of active guests is to make sure all
> their interfaces are connected to their respective networks).
>
> Signed-off-by: Laine Stump <laine(a)redhat.com>
> ---
> libvirt.spec.in | 2 +
> src/network/meson.build | 6 +
> src/network/nm-dispatcher-check-nets.py | 196 ++++++++++++++++++++++++
> 3 files changed, 204 insertions(+)
> create mode 100755 src/network/nm-dispatcher-check-nets.py
>
> diff --git a/libvirt.spec.in b/libvirt.spec.in
> index 29101e74fe..51cecfa598 100644
> --- a/libvirt.spec.in
> +++ b/libvirt.spec.in
> @@ -603,6 +603,7 @@ Network filter configuration files for cleaning guest traffic
> Summary: Network driver plugin for the libvirtd daemon
> Requires: libvirt-daemon-common = %{version}-%{release}
> Requires: libvirt-libs = %{version}-%{release}
> +Requires: python3-libxml2
> Requires: dnsmasq >= 2.41
> %if %{prefer_nftables}
> Requires: nftables
> @@ -2151,6 +2152,7 @@ exit 0
> %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/
> %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper
> %{_libdir}/libvirt/connection-driver/libvirt_driver_network.so
> +%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
> %{_mandir}/man8/virtnetworkd.8*
> %if %{with_firewalld_zone}
> %{_prefix}/lib/firewalld/zones/libvirt.xml
> diff --git a/src/network/meson.build b/src/network/meson.build
> index 8faff6eb1c..f620407759 100644
> --- a/src/network/meson.build
> +++ b/src/network/meson.build
> @@ -169,4 +169,10 @@ if conf.has('WITH_NETWORK')
> rename: [ 'libvirt-routed-in.xml' ],
> )
> endif
> +
> + install_data(
> + 'nm-dispatcher-check-nets.py',
> + install_dir: prefix / 'lib' / 'NetworkManager' /
'dispatcher.d',
> + rename: [ '50-libvirt-check-nets' ],
> + )
> endif
> diff --git a/src/network/nm-dispatcher-check-nets.py
b/src/network/nm-dispatcher-check-nets.py
> new file mode 100755
> index 0000000000..454c434c88
> --- /dev/null
> +++ b/src/network/nm-dispatcher-check-nets.py
> @@ -0,0 +1,196 @@
> +#!/usr/bin/env python3
> +#
> +# Copyright (C) 2012-2019 Red Hat, Inc.
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library. If not, see
> +# <
http://www.gnu.org/licenses/>.
> +
> +import libvirt
> +import sys
> +import os
> +import libxml2
> +from ipaddress import ip_network
> +
> +# This script should be installed in
> +# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
> +# called by NetworkManager every time a network interface is taken up
> +# or down. When a new network comes up, it checks the libvirt virtual
> +# networks to see if their IP address(es) (including any static
> +# routes) are in conflict with the IP address(es) (or static routes)
> +# of the newly added interface. If so, the libvirt network is
> +# disabled. It is assumed that the user will notice that their guests
> +# no longer have network connectvity (and/or the message logged by
> +# this script), see that the network has been disabled, and then
> +# realize the conflict when they try to restart it.
> +#
> +# set checkDefaultOnly=False to check *all* active libvirt networks
> +# for conflicts with the new interface. Set to True to check only the
> +# libvirt default network (since most networks other than the default
> +# network are added post-install at a time when all of the hosts other
> +# networks are already active, it may be overkill to check all of the
> +# libvirt networks for conflict here (and instead just add more
> +# needless overheard to bringing up a new host interface).
> +#
> +checkDefaultOnly = False
> +
> +# NB: since this file is installed in /usr/lib, it really shouldn't be
> +# modified by the user, but instead should be copied to
> +# /etc/NetworkManager/dispatcher.d, where it will override the copy in
> +# /usr/lib. Even that isn't a proper solution though - if we're going
> +# to actually have this config knob, perhaps we should check for it in
> +# the environment, and if someone wants to modify it they can put a
> +# short script in /etc that exports and environment variable and then
> +# calls this script? Just thinking out loud here.
> +
> +
> +def checkconflict(conn, netname, hostnets, hostif):
...snip complex code...
It occurrs to me that this python code is entirely duplicating
logic that already exists in virtnetworkd that checks for
conflicts.
Possibly not *exactly* what's in virtnetworkd, since I'm not certain if
NM includes all static routes in the list it puts in the environment
variables (while they *would* show up in the system routing table that
our network driver uses). Of course that sounds like another reason to
prefer re-using the existing code!
This makes me want to figure out a way to just re-use the existing
code rather than having to write it again.
Could we just send a SIGUSR2 to virtnetworkd as a way to trigger it
to re-validate all running networks ? That would be easy to trigger
from non-network manager setups too.
That sounds simpler (and probably quicker), and if we could do it that I
would prefer it ("only works with NetworkManager" is #1 on my list of
'issues' above after all :-)), but what would we do in the case that
virtnetworkd (or libvirtd) wasn't currently running? Run some virsh
command to trigger virtnetworkd to start, and then send the SIGUSR2?
(this does
But then how about ignoring network manager entirely and going right
to the source, ie the kernel. Does it send out either udev or netlink
events (probably the latter), when NICs have their link status set
online/offline ? If so, virtnetworkd could just listen to the kernel
events and "do the right thing", regardless of what tool is managing
the NICs.
I imagine *something* is sent out, and it's definitely worth me
investigating. But wouldn't virtnetworkd have to be running all of the
time in order to use something like that?