On 5/5/20 5:09 AM, Daniel P. Berrangé wrote:
On Mon, May 04, 2020 at 12:04:45PM -0400, Laine Stump wrote:
> There has been a problem for several years with libvirt's default
> network conflicting with the host network connection on new installs,
> particularly when the "host" is actually virtual machine that is,
> itself, connected to the libvirt default network on its respective
> host. If the two default networks use the same subnet, and if the
> nested host's libvirtd happens to start its network before the system
> networking connects to the toplevel host, then network connectivity to
> the guest is just silently non-working.
>
> We've tried several things over the years to eliminate this problem,
> including:
>
> 1) Checking for conflicting routes/interfaces during installation of
> the libvirt-daemon-config-network package (the one containing the
> default network config file) which tries different subnets until it
> finds one that doesn't conflict. This helps in some cases, but fails
> on 2 points: a) if the installation environment is different from the
> environment where the system is actually run (e.g. a "live CD" image
> of a distro, which is built in a container by the distro maintainers,
> then could be run in any number of places, and b) it modifies the
> installed files during the rpm installation post_install script, which
> is now discouraged because people building container images don't like
> dealing with that.
>
> 2) When starting a libvirt network, we now check for any route or
> interface that conflicts with the new network's IP's and routes. This
> doesn't fix the problem, but does notify the user of the problem *as
> long as libvirt starts its networks after the host has started its
> system networks*.
>
> Neither of these help in the situation where libvirt's networks are
> started first. The script in this patch uses a NetworkManager facility
> (dispatcher.d scripts, which are run whenever any interface is
> brought up or down) to check for any libvirt networks that conflict
> with the new NetworkManager interface, and if it finds a conflict it
> logs a message and destroys the libvirt network. So as with case (2)
> above, it doesn't solve the problem, but it does make it more obvious,
> and also makes sure the host networking isn't borked, so you can still
> ssh into the machine and fix it.
>
> There are a few caveats/misgivings with the script I've come up with,
> which cause me to post it as just an RFC:
>
> 1) It's written in python, and uses libxml2
>
> When I first sat down to make this script (following a suggestion by
> danpb during an IRC discussion), I immediately put #!/bin/bash at the
> top of the file, because it's supposed to be a script. But then I
> remembered that we're trying to narrow down the usage of languages in
> libvirt, and anyway it would be nice to be able to properly parse the
> XML to get the IP addresses/netmasks/prefixes. So I instead wrote it
> in python. This makes for a less ugly program, but it does mean that
> the daemon-config-network package is now dependent on python3-libxml
> (and python3 of course), which pulls in a bunch of other packages.
> Everybody's dev systems already have these packages and their
> dependencies installed (since both are required to build) and many
> other users systems already have them (they are required by
> virt-install and virt-manager, among others), including the Fedora
> live CD, for example. But not *all* systems have them. There has been
> a lot of work to reduce package bloat caused by pulling in more and
> more packages, so I'm reluctant to contribute to that. On the other
> hand, someone who is looking to minimize their system footprint will
> also probably not be installing the libvirt default network, so maybe
> it's acceptable (I'm leaning towards "not", but want to know if
anyone
> else has a different opinion)
I think this is a non-issue, since you've bundled the script into
the libvirt-daemon-config-network RPM. It would have been more of
an issue if in the libvirt-daemon-driver-network RPM as that would
have made it mandatory. Even then though, I really wouldn't be very
bothered by it.
Okay. Since you are the person who spent the most time analyzing the
footprint size of libvirt in various configurations and situations, (and
since I dislike the idea of writing yet another ugly hack of a bash
script) I won't dispute your opinion :-)
> 2) As delivered, it checks for conflicts with *all* libvirt
> networks.
>
> While that was fun to write, and in theory is the right
> thing to do, in practice it may be / probably is overkill. As we
> discussed in IRC, the main time when this is a problem is just after
> the first boot of a new OS / libvirt installation, and the only
> libvirt network that's going to exist at that time is the default
> network. So while having a loop that scans all libvirt networks is
> more complete, in almost all cases it is just wasting time. (I did
> provide a bool at the top of the script that can be modified to tell
> it to only check the default network, but of course the script is
> installed in /usr/lib, and it's not proper to modify files in
> /usr/lib, so that isn't a real-world solution, but more a way to allow
> people (i.e. "me") right now to test out the different behaviors.
I figure checking all networks is fine.
I guess if nothing else, we'll learn all about how efficient our API
calls are when someone inevitably has a system with hundreds of libvirt
networks and hundreds of host interfaces (NM is very thorough about this
- it even calls the scripts when *libvirt* creates a bridge device (even
though we never register it with NM in any way), and manages to get at
least the bridge device's IP address correct in the environment vars it
sends to the scripts; I haven't tried to find their code, but I guess
they must be calling the dispatch.d scripts just after they detect that
we've brought the bridge device online - fortunately we set all IP
addresses for the bridge, and associated static routes, before we bring
it online.
(One thing I just thought of - it doesn't seem quite as "tidy" if we
have a script that checks all networks for conflict, but that script is
only installed if a user chooses to install the package containing the
default network. That almost points to putting it in
libvirt-daemon-driver-network, but again that would mean *everybody*
would get python3 and python3-libxml.)
> 3) This only works if NetworkManager is enabled.
>
> I brought this up when danpb first mentioned the idea, and he rightly
> pointed out that every report of this problem we've had has been from
> the first boot of a new install on a system that used Network Manager
> - anybody using any other method of networking config on their host
> has had to manually intervene to get it setup, but NM tries to do
> everything auto-magically. So while there may still be a very
> occasional rare occurence of a silent networking failure even with
> this script installed, this should eliminate 99% of the cases.
I guess there could be another tool that is not NetworkManager, but
does the same kind of job as NetworkManager which would want this
functionality.
Yeah, but in order for us to make something generally applicable, we
would need to monitor for interface creation ourselves, and that would
require libvirtd to always be running. Maybe when a 2nd example comes
along, we can extract the host IP/prefix info part from this script and
put it into a mutually-used script somewhere. (or we might decide it's
simple enough that it's easier just to copy it).
> So, the questions I have are:
>
> 1) Do we want to allow adding the dependency on python3-libxml to the
> libvirt-daemon-config-network package in order to avoid adding a bash
> script. Or should I just redo it in bash
I think this is a non-issue
> 2) Do we want something that checks all active networks as this does,
> or just the default network (and if the latter, should we make it
> easily modifiable to check all networks, or just strip it down as much
> as possible)
We could have a config param in /etc/libvirt/something.conf. I'm
fine with checking all networks by default though.
Does "could" == "should"? Or should I just leave it as is? And if I do
add it, should it be its own file, or glommed into some existing file?
(The former increases clutter in the filesystem, the latter would
increase clutter in whichever .conf file was picked, and might create
unnecessary interconnections between packages. But I'm probably way
over-thinking it...)
> 3) I could eliminate the libxml2 dependency by just grepping the xml
> for "<address .*ip=". Is reduced correctness (which is probably 0%
> different in practice) worth the reduced package dependencies?
We're making functional changes to the host, so its important that
we are robust in our code. As such I strongly prefer that we're
correctly parsing XML instead of regex'ing data out of it.
Okay. As long as the extra dependency isn't a problem, I 100% agree.
(especially if we're checking all networks, and not just the one that
comes with the stock system)
> As a single question: on a sliding scale of "this script" ----->
> "simple bash script" where do we want to land?
The right answer to any problem is never a shell script :-)
Now you're just making me feel bad (I've written a lot of ugly hacky
bash scripts in my time)
> diff --git a/libvirt.spec.in b/libvirt.spec.in
> index 6abf97df85..e40068b2be 100644
> --- a/libvirt.spec.in
> +++ b/libvirt.spec.in
> @@ -496,6 +496,7 @@ Requires: libvirt-libs = %{version}-%{release}
> Requires: dnsmasq >= 2.41
> Requires: radvd
> Requires: iptables
> +Requires: python3-libxml2
>
> %description daemon-driver-network
> The network driver plugin for the libvirtd daemon, providing
> @@ -1630,6 +1631,7 @@ exit 0
> %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/
> %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper
> %{_libdir}/%{name}/connection-driver/libvirt_driver_network.so
> +%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
I was sceptical about /usr/lib being right, but it seems that NM
really does want this dir.
Yeah. If a user is going to add their own script, it should go in
/etc/NetworkManager/dispatcher.d, but anything installed as part of a
package should go in /usr/lib/NetworkManager/dispatcher.d (and if a user
wants to override it, they place a file of the same name in /etc/....)
>
> %if %{with_firewalld_zone}
> %{_prefix}/lib/firewalld/zones/libvirt.xml
> diff --git a/src/network/Makefile.inc.am b/src/network/Makefile.inc.am
> index 196a30e16c..d58e09f88b 100644
> --- a/src/network/Makefile.inc.am
> +++ b/src/network/Makefile.inc.am
> @@ -170,6 +170,9 @@ install-data-network:
> ( cd $(DESTDIR)$(confdir)/qemu/networks/autostart && \
> rm -f default.xml && \
> $(LN_S) ../default.xml default.xml )
> + $(MKDIR_P) "$(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d"
> + $(INSTALL_DATA) $(srcdir)/network/nm-dispatcher-check-nets.py \
> + $(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
> if WITH_FIREWALLD_ZONE
> $(MKDIR_P) "$(DESTDIR)$(prefix)/lib/firewalld/zones"
> $(INSTALL_DATA) $(srcdir)/network/libvirt.zone \
> @@ -189,7 +192,10 @@ endif WITH_FIREWALLD_ZONE
>
> endif WITH_NETWORK
>
> -EXTRA_DIST += network/default.xml network/libvirt.zone
> +EXTRA_DIST += \
> + network/default.xml \
> + network/nm-dispatcher-check-nets.py \
> + network/libvirt.zone
>
> .PHONY: \
> install-data-network \
> diff --git a/src/network/nm-dispatcher-check-nets.py
b/src/network/nm-dispatcher-check-nets.py
> new file mode 100755
> index 0000000000..9225a08ea1
> --- /dev/null
> +++ b/src/network/nm-dispatcher-check-nets.py
> @@ -0,0 +1,182 @@
> +#!/usr/bin/env python3
> +#
> +# Copyright (C) 2012-2019 Red Hat, Inc.
2020
Dang! Cut-Paste got me again!
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library. If not, see
> +# <
http://www.gnu.org/licenses/>.
> +
> +import libvirt
> +import sys
> +import os
> +import libxml2
> +from ipaddress import ip_network
> +
> +# This script should be installed in
> +# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
> +# called by NetworkManager every time a network interface is taken up
> +# or down. When a new network comes up, it checks the libvirt virtual
> +# networks to see if their IP address(es) (including any static
> +# routes) are in conflict with the IP address(es) (or static routes)
> +# of the newly added interface. If so, the libvirt network is
> +# disabled. It is assumed that the user will notice that their guests
> +# no longer have network connectvity (and/or the message logged by
> +# this script), see that the network has been disabled, and then
> +# realize the conflict when they try to restart it.
> +#
> +# set checkDefaultOnly=False to check *all* active libvirt networks
> +# for conflicts with the new interface. Set to True to check only the
> +# libvirt default network (since most networks other than the default
> +# network are added post-install at a time when all of the hosts other
> +# networks are already active, it may be overkill to check all of the
> +# libvirt networks for conflict here (and instead just add more
> +# needless overheard to bringing up a new host interface).
> +#
> +checkDefaultOnly = False
> +
> +# NB: since this file is installed in /usr/lib, it really shouldn't be
> +# modified by the user, but instead should be copied to
> +# /etc/NetworkManager/dispatcher.d, where it will override the copy in
> +# /usr/lib. Even that isn't a proper solution though - if we're going
> +# to actually have this config knob, perhaps we should check for it in
> +# the environment, and if someone wants to modify it they can put a
> +# short script in /etc that exports and environment variable and then
> +# calls this script? Just thinking out loud here...
> +
> +def checkconflict(conn, netname, hostnets, hostif):
> +
> + # ignore if the network has been brought down or removed since we
> + # got the list
> + try:
> + net = conn.networkLookupByName(netname)
> + except libvirt.libvirtError:
> + return
> +
> + if not net.isActive():
> + return
> +
> + xml = net.XMLDesc()
> + doc = libxml2.parseDoc(xml)
> + ctx = doc.xpathNewContext()
> +
> + # see if NetworkManager is informing us that this libvirt network
> + # itself is coming online
> + bridge = ctx.xpathEval("/network/bridge/@name");
> + if bridge and bridge[0].content == hostif:
> + return
Is it worth filtering based on the @type attribute too or are
we ok relying on ?
Do you mean the <forward> element mode attribute? Or were you thinking
of a domain <interface> element?
In this case, if there is a <bridge name='blah'/> in the network
definition, then the network definitely uses a bridge device named
"blah", so if that's the name NM is sending to the script, then it's a
case of them notifying us that we're starting our own network.
> +
> + # check *all* the addresses of this network
> + addrs = ctx.xpathEval("/network/*[@address]")
Shouldn't we match /network/ip/ here ? This wildcard also
matches /network/route/ - is that desirable ?
Yes. If there is a static <route> in a network definition that exactly
matches a host network, then it will obscure the host's network too. At
first I was matching them separately, but that was extra code, and I
didn't know the right xpath syntax to search for "/network/(route|ip)"
(that's not it), and I was tired, so I just switched to looking for any
element with an address attribute. This *does* erroneously match "<mac
address='blah'/>, but that ends up being weeded out because I later drop
anything that doesn't also have either @prefix or @netmask. (I still
think that's ugly though, and if someone is able to decipher the xpath
documentation and tell me how to search for <ip> and <route> elements
with the same xpath expression, I'd be really happy to hear about it!)
> + for ip in addrs:
> + ctx.setContextNode(ip)
> + address = ctx.xpathEval("@address")
> + prefix = ctx.xpathEval("@prefix")
> + netmask = ctx.xpathEval("@netmask")
> +
> + if not (address and len(address[0].content)):
> + continue
> +
> + addrstr = address[0].content
> + if not (prefix and len(prefix[0].content)):
> + # check for a netmask
> + if not (netmask and len(netmask[0].content)):
> + # this element has address, but no prefix or netmask
> + # probably it is <mac address ...> so we can ignore it
> + continue
> + # convert netmask to prefix
> + prefixstr = str(ip_network("0.0.0.0/%s" %
netmask[0].content).prefixlen)
> + else:
> + prefixstr = prefix[0].content
> +
> + virtnetaddress = ip_network("%s/%s" % (addrstr, prefixstr), strict
= False)
> + # print("network %s address %s" % (netname, str(virtnetaddress)))
> + for hostnet in hostnets:
> + if virtnetaddress == hostnet:
> + # There is a conflict with this libvirt network and the specified
> + # net, so we need to disable the libvirt network
> + print("Conflict with host net %s when starting interface %s -
bringing down libvirt network '%s'"
> + % (str(hostnet), hostif, netname))
> + try:
> + net.destroy()
> + except libvirt.libvirtError:
> + print("Failed to destroy network %s" % netname)
> + return
> + return
> +
> +
> +def addHostNets(hostnets, countenv, addrenv):
> +
> + count = os.getenv(countenv)
> + if not count or count == 0:
> + return
> +
> + for num in range(int(count)):
> + addrstr = os.getenv("%s_%d" % (addrenv, num))
> + if not addrstr or addrstr == "":
> + continue
> +
> + net = ip_network(addrstr.split()[0], strict=False)
> + if net:
> + hostnets.append(net)
> + return
> +
> +
> +############################################################
> +
> +if sys.argv[2] != "up":
> + sys.exit(0)
> +
> +hostif = sys.argv[1]
> +
> +try:
> + conn = libvirt.open(None)
> +except libvirt.libvirtError:
> + print('Failed to open connection to the hypervisor')
> + sys.exit(0)
> +
> +if checkDefaultOnly:
> + nets = []
> + net = conn.networkLookupByName("default")
> + if not (net and net.isActive()):
> + sys.exit(0)
> + nets.append(net)
> +else:
> + nets = conn.listAllNetworks(libvirt.VIR_CONNECT_LIST_NETWORKS_ACTIVE)
> + if not nets:
> + sys.exit(0)
> +
> +# We have at least one active network. Build a list of all network
> +# routes added by the new interface, and compare that list to the list
> +# of all networks used by each active libvirt network. If any are an
> +# exact match, then we have a conflict and need to shut down the
> +# libvirt network to avoid killing host networking.
> +
> +# When NetworkManager calls scripts in /etc/NetworkManager/dispatcher.d
> +# it will have all IP addresses and routes associated with the interface
> +# that is going up or down in the following environment variables:
> +#
> +# IP4_NUM_ADDRESSES - number of IPv4 addresses
> +# IP4_ADDRESS_N - one variable for each address, starting at _0
> +# IP4_NUM_ROUTES - number of IPv5 routes
> +# IP4_ROUTE_N - one for each route, starting at _0
> +# (replace "IP4" with "IP6" and repeat)
Oh that's very nice - much easier than probing for this info ourselves.
And they even seem to gather it when it's an interface created by
someone other than NM (although I don't think they try to do anything
about routes that are added outside of NM)
> +#
> +hostnets = []
> +addHostNets(hostnets, "IP4_NUM_ADDRESSES", "IP4_ADDRESS");
> +addHostNets(hostnets, "IP4_NUM_ROUTES", "IP4_ROUTE");
> +addHostNets(hostnets, "IP6_NUM_ADDRESSES", "IP6_ADDRESS");
> +addHostNets(hostnets, "IP6_NUM_ROUTES", "IP6_ROUTE");
> +
> +for net in nets:
> +
> + checkconflict(conn, net.name(), hostnets, hostif)
> --
> 2.25.4
>
Regards,
Daniel