[libvirt PATCH 0/3] network: support NAT with IPv6

The virtual network has never supported NAT with IPv6 since this feature didn't exist at the time. NAT has been available since RHEL-7 vintage though, and it is desirable to be able to use it. This series enables it with <forward mode=3D"nat"> <nat ipv6=3D"yes"/> </forward> Note that I do NOT actually change the default.xml to enable use of IPv6 because that will cause failure if the user has force disabled IPv6 on their host kernel. Of course our current default.xml is already broken if someone has done the reverse and force disabled IPv4. We've also long had a problem with guests bringing up the default network with the same subnet as the host. We'll have this same issue with IPv6 too. On my prompting Laine proposed a way to deal with the clash by tearing down a network, if we see a real host NIC get assigned the same subnet. Meanwhile we also have complaints about the fact that libvirt does anything todo with networking in the %post of the RPM. I'm thinking that we can do something entirely different by introducing a concept of "automatic subnet selection" into the virtual network. Consider if we made default.xml be able to contain only: <network> <name>default</name> <forward/> <ip family=3D"ipv4" autoaddr=3D"yes"> <dhcp/> </ip> <ip family=3D"ipv6" autoaddr=3D"yes"/> </network> Conceptually this means - Try to gimme a subnet with IPv4 and DHCP - Try to gimme a subnet with IPv6 and RAs Now when we start the virtual network - If IPv4 is not enabled on host, don't assign addr - Else - Iterate N=3D1..254 to find a free range for IPv4 - Use 192.168.N.0/24 for subnet - Use 192.168.N.1 for host IP - Use 192.168.N.2 -> 192.168.N.254 for guest DHCP - If IPv6 is not enabled on host, don't assign addr - Else - Generate NNNN:NNNN as 4 random bytes - Use fd00:add:f00d:NNNN:NNNN::0/64 for IPv6 subnet - Use fd00:add:f00d:NNNN:NNNN::1 for host IP - Use route advertizement for IPv6 zero-conf With NNNN:NNNN, even with 1000 guests running, we have just a 0.02% chance of clashing with a guest for IPv6. The "live" XML would always reflect the currently assigned addresses Proactively monitor the address allocations of the host. If we see a conflicting address appear, take down the dnsmasq intance, generate a new subnet, bring dnsmasq back online. Ideally we would have to bring the guest network links offline and then online again to force DHCP re-assignment immediately. Daniel P. Berrang=C3=A9 (3): util: add support for IPv6 masquerade rules conf: add an attribute to turn on NAT for IPv6 virtual networks network: wire up support for IPv6 NAT rules docs/formatnetwork.html.in | 14 ++ docs/schemas/network.rng | 8 + src/conf/network_conf.c | 26 +- src/conf/network_conf.h | 2 + src/network/bridge_driver_linux.c | 23 +- src/util/viriptables.c | 33 +-- .../nat-ipv6-masquerade-linux.args | 228 ++++++++++++++++++ .../nat-ipv6-masquerade.xml | 17 ++ tests/networkxml2firewalltest.c | 1 + .../nat-network-forward-nat-ipv6.xml | 10 + .../nat-network-forward-nat-ipv6.xml | 11 + tests/networkxml2xmltest.c | 1 + 12 files changed, 343 insertions(+), 31 deletions(-) create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.a= rgs create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade.xml create mode 100644 tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml create mode 100644 tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml --=20 2.26.2

IPv6 does support masquerade since Linux 3.9.0 / ip6tables 1.4.18, which is Fedora 18 / RHEL-7 vintage, which covers all our supported Linux versions. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/util/viriptables.c | 33 +++++++++++---------------------- 1 file changed, 11 insertions(+), 22 deletions(-) diff --git a/src/util/viriptables.c b/src/util/viriptables.c index e6a1ded8d5..8ccce835b2 100644 --- a/src/util/viriptables.c +++ b/src/util/viriptables.c @@ -854,29 +854,24 @@ iptablesForwardMasquerade(virFirewallPtr fw, g_autofree char *portRangeStr = NULL; g_autofree char *natRangeStr = NULL; virFirewallRulePtr rule; + int af = VIR_SOCKET_ADDR_FAMILY(netaddr); + virFirewallLayer layer = af == AF_INET ? + VIR_FIREWALL_LAYER_IPV4 : VIR_FIREWALL_LAYER_IPV6; if (!(networkstr = iptablesFormatNetwork(netaddr, prefix))) return -1; - if (!VIR_SOCKET_ADDR_IS_FAMILY(netaddr, AF_INET)) { - /* Higher level code *should* guaranteee it's impossible to get here. */ - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Attempted to NAT '%s'. NAT is only supported for IPv4."), - networkstr); - return -1; - } - - if (VIR_SOCKET_ADDR_IS_FAMILY(&addr->start, AF_INET)) { + if (VIR_SOCKET_ADDR_IS_FAMILY(&addr->start, af)) { if (!(addrStartStr = virSocketAddrFormat(&addr->start))) return -1; - if (VIR_SOCKET_ADDR_IS_FAMILY(&addr->end, AF_INET)) { + if (VIR_SOCKET_ADDR_IS_FAMILY(&addr->end, af)) { if (!(addrEndStr = virSocketAddrFormat(&addr->end))) return -1; } } if (protocol && protocol[0]) { - rule = virFirewallAddRule(fw, VIR_FIREWALL_LAYER_IPV4, + rule = virFirewallAddRule(fw, layer, "--table", "nat", action == ADD ? "--insert" : "--delete", pvt ? "LIBVIRT_PRT" : "POSTROUTING", @@ -885,7 +880,7 @@ iptablesForwardMasquerade(virFirewallPtr fw, "!", "--destination", networkstr, NULL); } else { - rule = virFirewallAddRule(fw, VIR_FIREWALL_LAYER_IPV4, + rule = virFirewallAddRule(fw, layer, "--table", "nat", action == ADD ? "--insert" : "--delete", pvt ? "LIBVIRT_PRT" : "POSTROUTING", @@ -1004,20 +999,14 @@ iptablesForwardDontMasquerade(virFirewallPtr fw, int action) { g_autofree char *networkstr = NULL; + virFirewallLayer layer = VIR_SOCKET_ADDR_FAMILY(netaddr) == AF_INET ? + VIR_FIREWALL_LAYER_IPV4 : VIR_FIREWALL_LAYER_IPV6; if (!(networkstr = iptablesFormatNetwork(netaddr, prefix))) return -1; - if (!VIR_SOCKET_ADDR_IS_FAMILY(netaddr, AF_INET)) { - /* Higher level code *should* guaranteee it's impossible to get here. */ - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Attempted to NAT '%s'. NAT is only supported for IPv4."), - networkstr); - return -1; - } - if (physdev && physdev[0]) - virFirewallAddRule(fw, VIR_FIREWALL_LAYER_IPV4, + virFirewallAddRule(fw, layer, "--table", "nat", action == ADD ? "--insert" : "--delete", pvt ? "LIBVIRT_PRT" : "POSTROUTING", @@ -1027,7 +1016,7 @@ iptablesForwardDontMasquerade(virFirewallPtr fw, "--jump", "RETURN", NULL); else - virFirewallAddRule(fw, VIR_FIREWALL_LAYER_IPV4, + virFirewallAddRule(fw, layer, "--table", "nat", action == ADD ? "--insert" : "--delete", pvt ? "LIBVIRT_PRT" : "POSTROUTING", -- 2.26.2

Historically IPv6 did not support NAT, so when IPv6 was added to libvirt's virtual networks, when requesting <forward mode="nat"/> libvirt will NOT apply NAT to IPv6 traffic, only IPv4 traffic. This is an annoying historical design decision as it means we cannot enable IPv6 automatically. We thus need to introduce a new attribute <forward mode="nat"> <nat ipv6="yes"/> </forward> The new attribute is a tri-state, so it leaves open the possibility of us intentionally changing the default behaviour in future to honour NAT for IPv6. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- docs/formatnetwork.html.in | 14 ++++++++++ docs/schemas/network.rng | 8 ++++++ src/conf/network_conf.c | 26 +++++++++++++++++-- src/conf/network_conf.h | 2 ++ .../nat-network-forward-nat-ipv6.xml | 10 +++++++ .../nat-network-forward-nat-ipv6.xml | 11 ++++++++ tests/networkxml2xmltest.c | 1 + 7 files changed, 70 insertions(+), 2 deletions(-) create mode 100644 tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml create mode 100644 tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index 0383e2d891..42cfb6708a 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -276,6 +276,20 @@ </nat> </forward> ...</pre> + + <p> + <span class="since">Since 6.5.0</span> it is possible to + enable NAT with IPv6 networking. As noted above, IPv6 + has historically done plain forwarding and thus to avoid + breaking historical compatibility, IPv6 NAT must be + explicitly requested + </p> + <pre> +... + <forward mode='nat'> + <nat ipv6='yes'/> + </forward> +...</pre> </dd> <dt><code>route</code></dt> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 88b6f4dfdd..d9448fa3bb 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -181,6 +181,14 @@ </optional> <optional> <element name='nat'> + <optional> + <attribute name="ipv6"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute> + </optional> <interleave> <optional> <element name='address'> diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index f1d22b25b1..cd7683e94b 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1358,6 +1358,7 @@ virNetworkForwardNatDefParseXML(const char *networkName, int nNatAddrs, nNatPorts; char *addrStart = NULL; char *addrEnd = NULL; + char *ipv6 = NULL; VIR_XPATH_NODE_AUTORESTORE(ctxt); ctxt->node = node; @@ -1369,6 +1370,19 @@ virNetworkForwardNatDefParseXML(const char *networkName, goto cleanup; } + ipv6 = virXMLPropString(node, "ipv6"); + if (ipv6) { + if ((def->natIPv6 + = virTristateBoolTypeFromString(ipv6)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid ipv6 setting '%s' " + "in network '%s' NAT"), + ipv6, networkName); + goto cleanup; + } + VIR_FREE(ipv6); + } + /* addresses for SNAT */ nNatAddrs = virXPathNodeSet("./address", ctxt, &natAddrNodes); if (nNatAddrs < 0) { @@ -2516,10 +2530,18 @@ virNetworkForwardNatDefFormat(virBufferPtr buf, goto cleanup; } - if (!addrEnd && !addrStart && !fwd->port.start && !fwd->port.end) + if (!addrEnd && !addrStart && !fwd->port.start && !fwd->port.end && !fwd->natIPv6) return 0; - virBufferAddLit(buf, "<nat>\n"); + virBufferAddLit(buf, "<nat"); + if (fwd->natIPv6) + virBufferAsprintf(buf, " ipv6='%s'", virTristateBoolTypeToString(fwd->natIPv6)); + + if (!addrEnd && !addrStart && !fwd->port.start && !fwd->port.end) { + virBufferAddLit(buf, "/>\n"); + return 0; + } + virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2); if (addrStart) { diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index f2dc388ef0..e3a61c62ea 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -244,6 +244,8 @@ struct _virNetworkForwardDef { /* ranges for NAT */ virSocketAddrRange addr; virPortRange port; + + virTristateBool natIPv6; }; typedef struct _virPortGroupDef virPortGroupDef; diff --git a/tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml b/tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml new file mode 100644 index 0000000000..c3b641224c --- /dev/null +++ b/tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml @@ -0,0 +1,10 @@ +<network> + <name>default</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <bridge name="virbr0"/> + <forward mode="nat" dev="eth1"> + <nat ipv6="yes"/> + </forward> + <ip family="ipv6" address="2001:db8:ac10:fe01::1" prefix="64"> + </ip> +</network> diff --git a/tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml b/tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml new file mode 100644 index 0000000000..74e1c36c69 --- /dev/null +++ b/tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml @@ -0,0 +1,11 @@ +<network> + <name>default</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <forward dev='eth1' mode='nat'> + <nat ipv6='yes'/> + <interface dev='eth1'/> + </forward> + <bridge name='virbr0' stp='on' delay='0'/> + <ip family='ipv6' address='2001:db8:ac10:fe01::1' prefix='64'> + </ip> +</network> diff --git a/tests/networkxml2xmltest.c b/tests/networkxml2xmltest.c index 700744785a..17817418b7 100644 --- a/tests/networkxml2xmltest.c +++ b/tests/networkxml2xmltest.c @@ -140,6 +140,7 @@ mymain(void) DO_TEST("nat-network-dns-forward-plain"); DO_TEST("nat-network-dns-forwarders"); DO_TEST("nat-network-dns-forwarder-no-resolv"); + DO_TEST("nat-network-forward-nat-ipv6"); DO_TEST("nat-network-forward-nat-address"); DO_TEST("nat-network-forward-nat-no-address"); DO_TEST("nat-network-mtu"); -- 2.26.2

On 6/8/20 10:51 AM, Daniel P. Berrangé wrote:
Historically IPv6 did not support NAT, so when IPv6 was added to libvirt's virtual networks, when requesting <forward mode="nat"/> libvirt will NOT apply NAT to IPv6 traffic, only IPv4 traffic.
This is an annoying historical design decision as it means we cannot enable IPv6 automatically. We thus need to introduce a new attribute
<forward mode="nat"> <nat ipv6="yes"/> </forward>
The new attribute is a tri-state, so it leaves open the possibility of us intentionally changing the default behaviour in future to honour NAT for IPv6.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- docs/formatnetwork.html.in | 14 ++++++++++ docs/schemas/network.rng | 8 ++++++ src/conf/network_conf.c | 26 +++++++++++++++++-- src/conf/network_conf.h | 2 ++ .../nat-network-forward-nat-ipv6.xml | 10 +++++++ .../nat-network-forward-nat-ipv6.xml | 11 ++++++++ tests/networkxml2xmltest.c | 1 + 7 files changed, 70 insertions(+), 2 deletions(-) create mode 100644 tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml create mode 100644 tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml
diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index 0383e2d891..42cfb6708a 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -276,6 +276,20 @@ </nat> </forward> ...</pre> + + <p> + <span class="since">Since 6.5.0</span> it is possible to + enable NAT with IPv6 networking. As noted above, IPv6 + has historically done plain forwarding and thus to avoid + breaking historical compatibility, IPv6 NAT must be + explicitly requested
Missing . at the end.
+ </p> + <pre> +... + <forward mode='nat'> + <nat ipv6='yes'/> + </forward> +...</pre> </dd>
<dt><code>route</code></dt> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 88b6f4dfdd..d9448fa3bb 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -181,6 +181,14 @@ </optional> <optional> <element name='nat'> + <optional> + <attribute name="ipv6"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute>
<ref name="virYesNo"/>
+ </optional> <interleave> <optional> <element name='address'> diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index f1d22b25b1..cd7683e94b 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1358,6 +1358,7 @@ virNetworkForwardNatDefParseXML(const char *networkName, int nNatAddrs, nNatPorts; char *addrStart = NULL; char *addrEnd = NULL; + char *ipv6 = NULL; VIR_XPATH_NODE_AUTORESTORE(ctxt);
ctxt->node = node; @@ -1369,6 +1370,19 @@ virNetworkForwardNatDefParseXML(const char *networkName, goto cleanup; }
+ ipv6 = virXMLPropString(node, "ipv6"); + if (ipv6) { + if ((def->natIPv6 + = virTristateBoolTypeFromString(ipv6)) <= 0) {
You need to parse this into a temporary int, otherwise you'll never see < 0, and so won't be able to detect errors.
+ virReportError(VIR_ERR_XML_ERROR, + _("Invalid ipv6 setting '%s' " + "in network '%s' NAT"), + ipv6, networkName); + goto cleanup; + } + VIR_FREE(ipv6); + } + /* addresses for SNAT */ nNatAddrs = virXPathNodeSet("./address", ctxt, &natAddrNodes); if (nNatAddrs < 0) { @@ -2516,10 +2530,18 @@ virNetworkForwardNatDefFormat(virBufferPtr buf, goto cleanup; }
- if (!addrEnd && !addrStart && !fwd->port.start && !fwd->port.end) + if (!addrEnd && !addrStart && !fwd->port.start && !fwd->port.end && !fwd->natIPv6)
You'd *think* it would be enough to just add !fwd->natIPv6 here. But you'd unfortunately be wrong. :-( The problem is that in virNetworkDefFormatBuf() there is a local called "shortforward" - if shortforward is true, then the <forward> element is completed in a single line. And sadly, rather than being set based on whether or not virNetworkForwaddNatDefFormat() produces any output, shortforward is set by directly checking a bunch of attributes (includeing def->forward.port.(start|end).) The result is that with the current code you have, when you parse and format the example from the commit log message, you end up with just: <forward mode='nat'/> <nat ipv6='yes'/> (note the <forward> element ends early). You could just add another check for natIPv6 to virNetworkDefFormatBuf(), or you could go the whole 9 yards, do the honorable thing and make it depend on whether or not the subordinate function produces output.
return 0;
- virBufferAddLit(buf, "<nat>\n"); + virBufferAddLit(buf, "<nat"); + if (fwd->natIPv6) + virBufferAsprintf(buf, " ipv6='%s'", virTristateBoolTypeToString(fwd->natIPv6)); + + if (!addrEnd && !addrStart && !fwd->port.start && !fwd->port.end) { + virBufferAddLit(buf, "/>\n"); + return 0; + }
I think somebody is going to insist you put a blank line after the } (I personally don't care, but pedants lurk in every shadow! :-)
+ virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2);
if (addrStart) { diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index f2dc388ef0..e3a61c62ea 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -244,6 +244,8 @@ struct _virNetworkForwardDef { /* ranges for NAT */ virSocketAddrRange addr; virPortRange port; + + virTristateBool natIPv6; };
typedef struct _virPortGroupDef virPortGroupDef; diff --git a/tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml b/tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml new file mode 100644 index 0000000000..c3b641224c --- /dev/null +++ b/tests/networkxml2xmlin/nat-network-forward-nat-ipv6.xml @@ -0,0 +1,10 @@ +<network> + <name>default</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <bridge name="virbr0"/> + <forward mode="nat" dev="eth1"> + <nat ipv6="yes"/> + </forward> + <ip family="ipv6" address="2001:db8:ac10:fe01::1" prefix="64"> + </ip> +</network> diff --git a/tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml b/tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml new file mode 100644 index 0000000000..74e1c36c69 --- /dev/null +++ b/tests/networkxml2xmlout/nat-network-forward-nat-ipv6.xml @@ -0,0 +1,11 @@ +<network> + <name>default</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <forward dev='eth1' mode='nat'> + <nat ipv6='yes'/> + <interface dev='eth1'/>
Aha! This is the reason you didn't notice it in your unit test - the unit test also specifies an interface dev, which *is* checked for "shortforward".
+ </forward> + <bridge name='virbr0' stp='on' delay='0'/> + <ip family='ipv6' address='2001:db8:ac10:fe01::1' prefix='64'> + </ip> +</network> diff --git a/tests/networkxml2xmltest.c b/tests/networkxml2xmltest.c index 700744785a..17817418b7 100644 --- a/tests/networkxml2xmltest.c +++ b/tests/networkxml2xmltest.c @@ -140,6 +140,7 @@ mymain(void) DO_TEST("nat-network-dns-forward-plain"); DO_TEST("nat-network-dns-forwarders"); DO_TEST("nat-network-dns-forwarder-no-resolv"); + DO_TEST("nat-network-forward-nat-ipv6"); DO_TEST("nat-network-forward-nat-address"); DO_TEST("nat-network-forward-nat-no-address"); DO_TEST("nat-network-mtu");

Now that we have support for IPv6 in the iptables helpers, and a new option in the XML schema, we can wire up support for it in the network driver. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- src/network/bridge_driver_linux.c | 23 +- .../nat-ipv6-masquerade-linux.args | 228 ++++++++++++++++++ .../nat-ipv6-masquerade.xml | 17 ++ tests/networkxml2firewalltest.c | 1 + 4 files changed, 262 insertions(+), 7 deletions(-) create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.args create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade.xml diff --git a/src/network/bridge_driver_linux.c b/src/network/bridge_driver_linux.c index b0bd207250..fcb3803965 100644 --- a/src/network/bridge_driver_linux.c +++ b/src/network/bridge_driver_linux.c @@ -307,7 +307,8 @@ int networkCheckRouteCollision(virNetworkDefPtr def) return ret; } -static const char networkLocalMulticast[] = "224.0.0.0/24"; +static const char networkLocalMulticastIPv4[] = "224.0.0.0/24"; +static const char networkLocalMulticastIPv6[] = "ffx2::/16"; static const char networkLocalBroadcast[] = "255.255.255.255/32"; static int @@ -317,6 +318,7 @@ networkAddMasqueradingFirewallRules(virFirewallPtr fw, { int prefix = virNetworkIPDefPrefix(ipdef); const char *forwardIf = virNetworkDefForwardIf(def, 0); + bool isIPv4 = VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET); if (prefix < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -406,7 +408,8 @@ networkAddMasqueradingFirewallRules(virFirewallPtr fw, return -1; /* exempt local network broadcast address as destination */ - if (iptablesAddDontMasquerade(fw, + if (isIPv4 && + iptablesAddDontMasquerade(fw, &ipdef->address, prefix, forwardIf, @@ -418,7 +421,8 @@ networkAddMasqueradingFirewallRules(virFirewallPtr fw, &ipdef->address, prefix, forwardIf, - networkLocalMulticast) < 0) + isIPv4 ? networkLocalMulticastIPv4 : + networkLocalMulticastIPv6) < 0) return -1; return 0; @@ -431,6 +435,7 @@ networkRemoveMasqueradingFirewallRules(virFirewallPtr fw, { int prefix = virNetworkIPDefPrefix(ipdef); const char *forwardIf = virNetworkDefForwardIf(def, 0); + bool isIPv4 = VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET); if (prefix < 0) return 0; @@ -439,10 +444,12 @@ networkRemoveMasqueradingFirewallRules(virFirewallPtr fw, &ipdef->address, prefix, forwardIf, - networkLocalMulticast) < 0) + isIPv4 ? networkLocalMulticastIPv4 : + networkLocalMulticastIPv6) < 0) return -1; - if (iptablesRemoveDontMasquerade(fw, + if (isIPv4 && + iptablesRemoveDontMasquerade(fw, &ipdef->address, prefix, forwardIf, @@ -769,7 +776,8 @@ networkAddIPSpecificFirewallRules(virFirewallPtr fw, */ if (def->forward.type == VIR_NETWORK_FORWARD_NAT) { - if (VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET)) + if (VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET) || + def->forward.natIPv6 == VIR_TRISTATE_BOOL_YES) return networkAddMasqueradingFirewallRules(fw, def, ipdef); else if (VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET6)) return networkAddRoutingFirewallRules(fw, def, ipdef); @@ -786,7 +794,8 @@ networkRemoveIPSpecificFirewallRules(virFirewallPtr fw, virNetworkIPDefPtr ipdef) { if (def->forward.type == VIR_NETWORK_FORWARD_NAT) { - if (VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET)) + if (VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET) || + def->forward.natIPv6 == VIR_TRISTATE_BOOL_YES) return networkRemoveMasqueradingFirewallRules(fw, def, ipdef); else if (VIR_SOCKET_ADDR_IS_FAMILY(&ipdef->address, AF_INET6)) return networkRemoveRoutingFirewallRules(fw, def, ipdef); diff --git a/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.args b/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.args new file mode 100644 index 0000000000..4ba4c3da30 --- /dev/null +++ b/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.args @@ -0,0 +1,228 @@ +iptables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol tcp \ +--destination-port 67 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol udp \ +--destination-port 67 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol tcp \ +--destination-port 68 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol udp \ +--destination-port 68 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol tcp \ +--destination-port 53 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol udp \ +--destination-port 53 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol tcp \ +--destination-port 53 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol udp \ +--destination-port 53 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_FWO \ +--in-interface virbr0 \ +--jump REJECT +iptables \ +--table filter \ +--insert LIBVIRT_FWI \ +--out-interface virbr0 \ +--jump REJECT +iptables \ +--table filter \ +--insert LIBVIRT_FWX \ +--in-interface virbr0 \ +--out-interface virbr0 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_FWO \ +--in-interface virbr0 \ +--jump REJECT +ip6tables \ +--table filter \ +--insert LIBVIRT_FWI \ +--out-interface virbr0 \ +--jump REJECT +ip6tables \ +--table filter \ +--insert LIBVIRT_FWX \ +--in-interface virbr0 \ +--out-interface virbr0 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol tcp \ +--destination-port 53 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol udp \ +--destination-port 53 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol tcp \ +--destination-port 53 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol udp \ +--destination-port 53 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_INP \ +--in-interface virbr0 \ +--protocol udp \ +--destination-port 547 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_OUT \ +--out-interface virbr0 \ +--protocol udp \ +--destination-port 546 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_FWO \ +--source 192.168.122.0/24 \ +--in-interface virbr0 \ +--jump ACCEPT +iptables \ +--table filter \ +--insert LIBVIRT_FWI \ +--destination 192.168.122.0/24 \ +--out-interface virbr0 \ +--match conntrack \ +--ctstate ESTABLISHED,RELATED \ +--jump ACCEPT +iptables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 192.168.122.0/24 '!' \ +--destination 192.168.122.0/24 \ +--jump MASQUERADE +iptables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 192.168.122.0/24 \ +-p udp '!' \ +--destination 192.168.122.0/24 \ +--jump MASQUERADE \ +--to-ports 1024-65535 +iptables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 192.168.122.0/24 \ +-p tcp '!' \ +--destination 192.168.122.0/24 \ +--jump MASQUERADE \ +--to-ports 1024-65535 +iptables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 192.168.122.0/24 \ +--destination 255.255.255.255/32 \ +--jump RETURN +iptables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 192.168.122.0/24 \ +--destination 224.0.0.0/24 \ +--jump RETURN +ip6tables \ +--table filter \ +--insert LIBVIRT_FWO \ +--source 2001:db8:ca2:2::/64 \ +--in-interface virbr0 \ +--jump ACCEPT +ip6tables \ +--table filter \ +--insert LIBVIRT_FWI \ +--destination 2001:db8:ca2:2::/64 \ +--out-interface virbr0 \ +--match conntrack \ +--ctstate ESTABLISHED,RELATED \ +--jump ACCEPT +ip6tables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 2001:db8:ca2:2::/64 '!' \ +--destination 2001:db8:ca2:2::/64 \ +--jump MASQUERADE +ip6tables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 2001:db8:ca2:2::/64 \ +-p udp '!' \ +--destination 2001:db8:ca2:2::/64 \ +--jump MASQUERADE \ +--to-ports 1024-65535 +ip6tables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 2001:db8:ca2:2::/64 \ +-p tcp '!' \ +--destination 2001:db8:ca2:2::/64 \ +--jump MASQUERADE \ +--to-ports 1024-65535 +ip6tables \ +--table nat \ +--insert LIBVIRT_PRT \ +--source 2001:db8:ca2:2::/64 \ +--destination ffx2::/16 \ +--jump RETURN +iptables \ +--table mangle \ +--insert LIBVIRT_PRT \ +--out-interface virbr0 \ +--protocol udp \ +--destination-port 68 \ +--jump CHECKSUM \ +--checksum-fill diff --git a/tests/networkxml2firewalldata/nat-ipv6-masquerade.xml b/tests/networkxml2firewalldata/nat-ipv6-masquerade.xml new file mode 100644 index 0000000000..03bcc8c67d --- /dev/null +++ b/tests/networkxml2firewalldata/nat-ipv6-masquerade.xml @@ -0,0 +1,17 @@ +<network> + <name>default</name> + <bridge name="virbr0"/> + <forward> + <nat ipv6="yes"/> + </forward> + <ip address="192.168.122.1" netmask="255.255.255.0"> + <dhcp> + <range start="192.168.122.2" end="192.168.122.254"/> + </dhcp> + </ip> + <ip family="ipv6" address="2001:db8:ca2:2::1" prefix="64" > + <dhcp> + <range start="2001:db8:ca2:2:1::10" end="2001:db8:ca2:2:1::ff" /> + </dhcp> + </ip> +</network> diff --git a/tests/networkxml2firewalltest.c b/tests/networkxml2firewalltest.c index 0ad5e2303b..697bfd7e03 100644 --- a/tests/networkxml2firewalltest.c +++ b/tests/networkxml2firewalltest.c @@ -173,6 +173,7 @@ mymain(void) DO_TEST("nat-many-ips"); DO_TEST("nat-no-dhcp"); DO_TEST("nat-ipv6"); + DO_TEST("nat-ipv6-masquerade"); DO_TEST("route-default"); return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; -- 2.26.2

(After typing a lot here and spewing out some ideas, I've gotten my mind back on track and realized that I've mostly been talking about the design of *other* network-related stuff rather than reviewing IPv6-NAT enablement! I just saw your comments as a good starting point for getting out some thoughts I've had on the same subject. I'll try to not digress so much when reviewing the patches themselves :-) On 6/8/20 10:51 AM, Daniel P. Berrangé wrote:
The virtual network has never supported NAT with IPv6 since this feature didn't exist at the time. NAT has been available since RHEL-7 vintage though, and it is desirable to be able to use it.
This series enables it with
<forward mode=3D"nat"> <nat ipv6=3D"yes"/> </forward>
I've had this lurking on my "this is something I should do" list for a long time, but couldn't decide on the best name in XML (and also figured that the problem with accept_ra needed to be fixed first), so it never got to the top. So I'm glad to see you've done it, disappointed in myself that I never did it :-/ I like your XML knob naming better than what I'd considered. I had thought of having <forward mode='supernat'> (or some other more reasonable extra mode), but your proposal is more orthogonal and matches with the existing ipv6='yes' at the toplevel of <network> (which is used to enable ipv6 traffic between guests on the bridge even when there are no IPv6 addresses configured for the network.)
Note that I do NOT actually change the default.xml to enable use of IPv6 because that will cause failure if the user has force disabled IPv6 on their host kernel.
Of course our current default.xml is already broken if someone has done the reverse and force disabled IPv4.
We've also long had a problem with guests bringing up the default network with the same subnet as the host. We'll have this same issue with IPv6 too.
On my prompting Laine proposed a way to deal with the clash by tearing down a network, if we see a real host NIC get assigned the same subnet.
I need to re-send that as a normal patch rather than an RFC. My recollection is that there weren't objections to the way I'd done it, but I'll go back and look at the review comments again.
Meanwhile we also have complaints about the fact that libvirt does anything todo with networking in the %post of the RPM.
I'm thinking that we can do something entirely different by introducing a concept of "automatic subnet selection" into the virtual network.
Yes! I had the same idea, but thought of calling it "super-default". (I really seem to be liking "super-anything" names lately...)
Consider if we made default.xml be able to contain only:
<network> <name>default</name> <forward/> <ip family=3D"ipv4" autoaddr=3D"yes"> <dhcp/> </ip> <ip family=3D"ipv6" autoaddr=3D"yes"/>
(as discussed in IRC today - git-publish bug there, escaping the = sign) I like this name too. How do I always come up with such horrible names and you come up with good names?
</network>
Conceptually this means
- Try to gimme a subnet with IPv4 and DHCP - Try to gimme a subnet with IPv6 and RAs
Now when we start the virtual network
- If IPv4 is not enabled on host, don't assign addr
What will we use to check for this? Not just "no IP addresses configured", I guess, since it may be the case that libvirt has just happened to come up before NM or whoever has started any networks. (or maybe someone wants to use IPv6 on a libvirt virtual network, but have no IPv6 connectivity beyond the host).
- Else - Iterate N=3D1..254 to find a free range for IPv4 - Use 192.168.N.0/24 for subnet - Use 192.168.N.1 for host IP - Use 192.168.N.2 -> 192.168.N.254 for guest DHCP
- If IPv6 is not enabled on host, don't assign addr - Else - Generate NNNN:NNNN as 4 random bytes - Use fd00:add:f00d:NNNN:NNNN::0/64 for IPv6 subnet - Use fd00:add:f00d:NNNN:NNNN::1 for host IP - Use route advertizement for IPv6 zero-conf
With NNNN:NNNN, even with 1000 guests running, we have just a 0.02% chance of clashing with a guest for IPv6.
The "live" XML would always reflect the currently assigned addresses
Proactively monitor the address allocations of the host. If we see a conflicting address appear, take down the dnsmasq intance, generate a new subnet, bring dnsmasq back online.
Hmm. How would you see this monitoring happening? We couldn't do it with an external script like I had done for simple "shut down on conflict" without adding extra functionality to libvirt's network driver. We *could* go back to the idea of monitoring netlink change messages ourselves within libvirtd and doing it all internally ourselves. Or maybe the NM script I proposed could go beyond simply destroying conflicting networks, and also restart any network that had autoaddr='yes'; to make this fully functional we would need to finally put in the proper stuff so that tap devices (and the underlying emulated NICs) would be set offline when their connected network was destroyed, and then reconnected/set online when the network was re-started. Getting the networks to behave this way would be useful in general anyway, even without thinking about the conflicting-networks problem. The one downside of externally controlling renumbering-on-conflict using an external script is that it would only work with NetworkManager...
Ideally we would have to bring the guest network links offline and then online again to force DHCP re-assignment immediately.
Yeah, I think it really makes sense that when a libvirt network is destroyed, all the tap devices are set offline, and the emulated NICs are set offline as well; then when a libvirt network is started, we would go through all devices that are supposed to be connected to that network, reconnect the taps, set them online, and set the emulated NIC online. We currently do the reconnection part when libvirtd is restarted but can't do it immediately when a *network* is restarted because the network driver has no access to the list of active guests and their interfaces.... Hmm, we do now maintain the list of ports for each network though, and it would be possible to expand that to keep the name of the tap device associated with the port in addition to the other info (e.g. whether or not the NIC has been set offline via an API call), *but* when a network is destroyed, all ports registered with that network are also destroyed, so just expanding the attributes for the ports isn't going to get us where we need. So, do we want to 1) change it to maintain active ports for a network when it is destroyed so that they can be easily reactivated when the network is restarted? Or do we want to 2) change the network driver to make calls to all registered hypervisor drivers during a net-start to look for all guest interfaces that think they are connected to the network? The former sounds much more efficient, but I don't know how "dirty" it seems to maintain state for something that has been "destroyed"... Or maybe we instead need to also add a new API for networks virNetworkReconnect(), which will use newly expanded info in the network ports list to reconnect all guest interfaces. ** On a different sub-topic - it would be nice to provide some stability to the subnet used for an autoaddr='yes' network (think of the case where every time a host is booted, libvirt starts its default network when 192.168.122.0/24 is available, but then a short time later a host interface is always started on the same subnet - that would mean every time the host booted the exact same destabilizing dance would take place even though it would be pretty easy to predict the eventually-used subnet based on past experience). Although we historically have avoided automatic changes to libvirt config files by libvirtd itself as much as possible (the only cases I can think of are when we're modifying the config to take care of some compatibility problem after an upgrade), what do you think about having the autoaddr='yes' networks automatically update the config with the current subnet info? (maybe this would need to only be done if not starting from a live image or something, or maybe it should just always be done). This would then be used as the first guess the next time the network was started. That way we would avoid the need to delay starting libvirt networks until after host networking was fully up; the subnet might bounce around a bit that first time, but once a stable address was found during that first run, it would then be used from the get-go during all subsequent boots (until/unless something changed and it had to be changed yet again).

On Mon, Jun 08, 2020 at 11:05:00PM -0400, Laine Stump wrote:
On 6/8/20 10:51 AM, Daniel P. Berrangé wrote:
The virtual network has never supported NAT with IPv6 since this feature didn't exist at the time. NAT has been available since RHEL-7 vintage though, and it is desirable to be able to use it.
This series enables it with
<forward mode=3D"nat"> <nat ipv6=3D"yes"/> </forward>
I've had this lurking on my "this is something I should do" list for a long time, but couldn't decide on the best name in XML (and also figured that the problem with accept_ra needed to be fixed first), so it never got to the top. So I'm glad to see you've done it, disappointed in myself that I never did it :-/
I like your XML knob naming better than what I'd considered. I had thought of having <forward mode='supernat'> (or some other more reasonable extra mode), but your proposal is more orthogonal and matches with the existing ipv6='yes' at the toplevel of <network> (which is used to enable ipv6 traffic between guests on the bridge even when there are no IPv6 addresses configured for the network.)
I considered mode="nat6" as an alternative, but it would have meant updating many switch() statements, and is a somewhat misleading as a name.
</network>
Conceptually this means
- Try to gimme a subnet with IPv4 and DHCP - Try to gimme a subnet with IPv6 and RAs
Now when we start the virtual network
- If IPv4 is not enabled on host, don't assign addr
What will we use to check for this? Not just "no IP addresses configured", I guess, since it may be the case that libvirt has just happened to come up before NM or whoever has started any networks. (or maybe someone wants to use IPv6 on a libvirt virtual network, but have no IPv6 connectivity beyond the host).
IIUC, we can simply check whether it is possible to create a socket with AF_INET or AF_INET6. If the kernel supports it, then this should suceed, even if network manager isn't running yet.
- Else - Iterate N=3D1..254 to find a free range for IPv4 - Use 192.168.N.0/24 for subnet - Use 192.168.N.1 for host IP - Use 192.168.N.2 -> 192.168.N.254 for guest DHCP
- If IPv6 is not enabled on host, don't assign addr - Else - Generate NNNN:NNNN as 4 random bytes - Use fd00:add:f00d:NNNN:NNNN::0/64 for IPv6 subnet - Use fd00:add:f00d:NNNN:NNNN::1 for host IP - Use route advertizement for IPv6 zero-conf
With NNNN:NNNN, even with 1000 guests running, we have just a 0.02% chance of clashing with a guest for IPv6.
The "live" XML would always reflect the currently assigned addresses
Proactively monitor the address allocations of the host. If we see a conflicting address appear, take down the dnsmasq intance, generate a new subnet, bring dnsmasq back online.
Hmm. How would you see this monitoring happening? We couldn't do it with an external script like I had done for simple "shut down on conflict" without adding extra functionality to libvirt's network driver. We *could* go back to the idea of monitoring netlink change messages ourselves within libvirtd and doing it all internally ourselves. Or maybe the NM script I proposed could go beyond simply destroying conflicting networks, and also restart any network that had autoaddr='yes'; to make this fully functional we would need to finally put in the proper stuff so that tap devices (and the underlying emulated NICs) would be set offline when their connected network was destroyed, and then reconnected/set online when the network was re-started. Getting the networks to behave this way would be useful in general anyway, even without thinking about the conflicting-networks problem. The one downside of externally controlling renumbering-on-conflict using an external script is that it would only work with NetworkManager...
Yeah, I'm trying to remember now why we went the NM hook route, rather than listening for netlink events. I guess NM is much simpler to hook into. I'd honestly not thought about this too much though - just having an automatically numbered network will already be a huge step forward compared to current day. In particular if we insituted a rule that if we are NOT on a hypervisor, we count from N=254 -> 0, when picking 192.168.N.0, and count from N=0 -> 254 when we are on a hypervisor, then we'll trivially avoid the host/guest clash in simple case, even if network is not yet online. Don't anyone dare mention nested virt with 3 levels of libvirt... Seriously though, even without automatic teardown & restart, we'd be way better off by simply not hardcoding 192.168.N.0 at RPM install time when the network env is not the same as the run time network env. eg cloud images
Ideally we would have to bring the guest network links offline and then online again to force DHCP re-assignment immediately.
Yeah, I think it really makes sense that when a libvirt network is destroyed, all the tap devices are set offline, and the emulated NICs are set offline as well; then when a libvirt network is started, we would go through all devices that are supposed to be connected to that network, reconnect the taps, set them online, and set the emulated NIC online. We currently do the reconnection part when libvirtd is restarted but can't do it immediately when a *network* is restarted because the network driver has no access to the list of active guests and their interfaces....
Hmm, we do now maintain the list of ports for each network though, and it would be possible to expand that to keep the name of the tap device associated with the port in addition to the other info (e.g. whether or not the NIC has been set offline via an API call), *but* when a network is destroyed, all ports registered with that network are also destroyed, so just expanding the attributes for the ports isn't going to get us where we need. So, do we want to 1) change it to maintain active ports for a network when it is destroyed so that they can be easily reactivated when the network is restarted? Or do we want to 2) change the network driver to make calls to all registered hypervisor drivers during a net-start to look for all guest interfaces that think they are connected to the network? The former sounds much more efficient, but I don't know how "dirty" it seems to maintain state for something that has been "destroyed"...
Or maybe we instead need to also add a new API for networks virNetworkReconnect(), which will use newly expanded info in the network ports list to reconnect all guest interfaces.
Responsibility for enslaving a TAP device into a bridge still lives with the virt drivers, not the network driver. The virt drivers could listen for lifecycle events from the network driver and auto-reconnect. Alternatively the virt driver could listen for netlink events and see the virbr0 being deleted, and created by the kernel.
On a different sub-topic - it would be nice to provide some stability to the subnet used for an autoaddr='yes' network (think of the case where every time a host is booted, libvirt starts its default network when 192.168.122.0/24 is available, but then a short time later a host interface is always started on the same subnet - that would mean every time the host booted the exact same destabilizing dance would take place even though it would be pretty easy to predict the eventually-used subnet based on past experience).
Although we historically have avoided automatic changes to libvirt config files by libvirtd itself as much as possible (the only cases I can think of are when we're modifying the config to take care of some compatibility problem after an upgrade), what do you think about having the autoaddr='yes' networks automatically update the config with the current subnet info? (maybe this would need to only be done if not starting from a live image or something, or maybe it should just always be done). This would then be used as the first guess the next time the network was started. That way we would avoid the need to delay starting libvirt networks until after host networking was fully up; the subnet might bounce around a bit that first time, but once a stable address was found during that first run, it would then be used from the get-go during all subsequent boots (until/unless something changed and it had to be changed yet again).
We could stash the previously chosen subnet in /var/cache/libvirt/network or /var/lib/libvirt/network, no need to modify the inactive XML config. This is like how dnsmasq "remembers" DHCP leases previously given for guests. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 6/9/20 12:27 PM, Daniel P. Berrangé wrote:
On Mon, Jun 08, 2020 at 11:05:00PM -0400, Laine Stump wrote:
The virtual network has never supported NAT with IPv6 since this feature didn't exist at the time. NAT has been available since RHEL-7 vintage though, and it is desirable to be able to use it.
This series enables it with
<forward mode=3D"nat"> <nat ipv6=3D"yes"/> </forward> I've had this lurking on my "this is something I should do" list for a long time, but couldn't decide on the best name in XML (and also figured that the
On 6/8/20 10:51 AM, Daniel P. Berrangé wrote: problem with accept_ra needed to be fixed first), so it never got to the top. So I'm glad to see you've done it, disappointed in myself that I never did it :-/
I like your XML knob naming better than what I'd considered. I had thought of having <forward mode='supernat'> (or some other more reasonable extra mode), but your proposal is more orthogonal and matches with the existing ipv6='yes' at the toplevel of <network> (which is used to enable ipv6 traffic between guests on the bridge even when there are no IPv6 addresses configured for the network.) I considered mode="nat6" as an alternative, but it would have meant updating many switch() statements, and is a somewhat misleading as a name.
Yeah, the way you have it is much better.
</network>
Conceptually this means
- Try to gimme a subnet with IPv4 and DHCP - Try to gimme a subnet with IPv6 and RAs
Now when we start the virtual network
- If IPv4 is not enabled on host, don't assign addr
What will we use to check for this? Not just "no IP addresses configured", I guess, since it may be the case that libvirt has just happened to come up before NM or whoever has started any networks. (or maybe someone wants to use IPv6 on a libvirt virtual network, but have no IPv6 connectivity beyond the host). IIUC, we can simply check whether it is possible to create a socket with AF_INET or AF_INET6. If the kernel supports it, then this should suceed, even if network manager isn't running yet.
- Else - Iterate N=3D1..254 to find a free range for IPv4 - Use 192.168.N.0/24 for subnet - Use 192.168.N.1 for host IP - Use 192.168.N.2 -> 192.168.N.254 for guest DHCP
- If IPv6 is not enabled on host, don't assign addr - Else - Generate NNNN:NNNN as 4 random bytes - Use fd00:add:f00d:NNNN:NNNN::0/64 for IPv6 subnet - Use fd00:add:f00d:NNNN:NNNN::1 for host IP - Use route advertizement for IPv6 zero-conf
With NNNN:NNNN, even with 1000 guests running, we have just a 0.02% chance of clashing with a guest for IPv6.
The "live" XML would always reflect the currently assigned addresses
Proactively monitor the address allocations of the host. If we see a conflicting address appear, take down the dnsmasq intance, generate a new subnet, bring dnsmasq back online. Hmm. How would you see this monitoring happening? We couldn't do it with an external script like I had done for simple "shut down on conflict" without adding extra functionality to libvirt's network driver. We *could* go back to the idea of monitoring netlink change messages ourselves within libvirtd and doing it all internally ourselves. Or maybe the NM script I proposed could go beyond simply destroying conflicting networks, and also restart any network that had autoaddr='yes'; to make this fully functional we would need to finally put in the proper stuff so that tap devices (and the underlying emulated NICs) would be set offline when their connected network was destroyed, and then reconnected/set online when the network was re-started. Getting the networks to behave this way would be useful in general anyway, even without thinking about the conflicting-networks problem. The one downside of externally controlling renumbering-on-conflict using an external script is that it would only work with NetworkManager... Yeah, I'm trying to remember now why we went the NM hook route, rather than listening for netlink events. I guess NM is much simpler to hook into.
You mentioned the NM hook idea (I hadn't even known they had such hooks) when we were talking about libvirtd using activation sockets (unrelated to network conflict stuff), and I said that needing to have a thread always listening for netlink change messages was in conflict with the idea of having libvirtd use activation sockets and exit after a timeout. Once we decide that we're responsible for watching all netlink change messages, we have to have a process that's always running (also we would have to read the entire state of all interfaces each time that process was started, and respond to any changes that had taken place since the last time the process exited). (Probably a NM hook script is also much easier than monitoring netlink message, since there's no need to figure out which field in which type of messages to look at, and you don't have to setup the thread to listen for it, etc. Just a simple python program that gets called with all relevant info in text format either in argv, or in the environment.)
I'd honestly not thought about this too much though - just having an automatically numbered network will already be a huge step forward compared to current day.
True. First step is having it auto-numbered at startup. 2nd step is making it auto-renumber when required while running. Either step 1.5 or step 3 would be saving the auto-number decision to use as first try the next time it's started.
In particular if we insituted a rule that if we are NOT on a hypervisor, we count from N=254 -> 0, when picking 192.168.N.0, and count from N=0 -> 254 when we are on a hypervisor, then we'll trivially avoid the host/guest clash in simple case, even if network is not yet online.
Don't anyone dare mention nested virt with 3 levels of libvirt...
Seriously though, even without automatic teardown & restart, we'd be way better off by simply not hardcoding 192.168.N.0 at RPM install time when the network env is not the same as the run time network env. eg cloud images
Yep. Especially when you consider that (e.g.) even installing Fedora to disk from a live image doesn't involve running any of the rpm postinstall scripts - it just copies the complete image over to the destination from the live image; so the environment encountered when the final OS boots is two "environment steps" removed from the environment that was in effect at the time the rpm postinstall ran.
Ideally we would have to bring the guest network links offline and then online again to force DHCP re-assignment immediately. Yeah, I think it really makes sense that when a libvirt network is destroyed, all the tap devices are set offline, and the emulated NICs are set offline as well; then when a libvirt network is started, we would go through all devices that are supposed to be connected to that network, reconnect the taps, set them online, and set the emulated NIC online. We currently do the reconnection part when libvirtd is restarted but can't do it immediately when a *network* is restarted because the network driver has no access to the list of active guests and their interfaces....
Hmm, we do now maintain the list of ports for each network though, and it would be possible to expand that to keep the name of the tap device associated with the port in addition to the other info (e.g. whether or not the NIC has been set offline via an API call), *but* when a network is destroyed, all ports registered with that network are also destroyed, so just expanding the attributes for the ports isn't going to get us where we need. So, do we want to 1) change it to maintain active ports for a network when it is destroyed so that they can be easily reactivated when the network is restarted? Or do we want to 2) change the network driver to make calls to all registered hypervisor drivers during a net-start to look for all guest interfaces that think they are connected to the network? The former sounds much more efficient, but I don't know how "dirty" it seems to maintain state for something that has been "destroyed"...
Or maybe we instead need to also add a new API for networks virNetworkReconnect(), which will use newly expanded info in the network ports list to reconnect all guest interfaces. Responsibility for enslaving a TAP device into a bridge still lives with the virt drivers, not the network driver.
The virt drivers could listen for lifecycle events from the network driver and auto-reconnect.
Yeah, that could work too. *something* so that an entity that has the proper info about tap devices finds out that the tap device needs to be re-attached. And since currently all the code from creating/destroying/attaching tap devices lives in the various hypervisors, I guess this makes the most sense.
Alternatively the virt driver could listen for netlink events and see the virbr0 being deleted, and created by the kernel.
On one hand, it seems cleaner to do it by watching for lifecycle events from the network driver. On the other hand, that wouldn't properly handle cases where someone was "secretly" attaching to a libvirt-created bridge with <interface type='bridge'> (e.g. session-mode guests that are taking advantage of qemu-bridge-helper) or guests that attach to a non-libvirt-managed bridge that happens to disappear and reappear.
On a different sub-topic - it would be nice to provide some stability to the subnet used for an autoaddr='yes' network (think of the case where every time a host is booted, libvirt starts its default network when 192.168.122.0/24 is available, but then a short time later a host interface is always started on the same subnet - that would mean every time the host booted the exact same destabilizing dance would take place even though it would be pretty easy to predict the eventually-used subnet based on past experience).
Although we historically have avoided automatic changes to libvirt config files by libvirtd itself as much as possible (the only cases I can think of are when we're modifying the config to take care of some compatibility problem after an upgrade), what do you think about having the autoaddr='yes' networks automatically update the config with the current subnet info? (maybe this would need to only be done if not starting from a live image or something, or maybe it should just always be done). This would then be used as the first guess the next time the network was started. That way we would avoid the need to delay starting libvirt networks until after host networking was fully up; the subnet might bounce around a bit that first time, but once a stable address was found during that first run, it would then be used from the get-go during all subsequent boots (until/unless something changed and it had to be changed yet again). We could stash the previously chosen subnet in /var/cache/libvirt/network or /var/lib/libvirt/network, no need to modify the inactive XML config. This is like how dnsmasq "remembers" DHCP leases previously given for guests.
Yeah, even better. I think /var/cache/libvirt/network sounds more appropriate. What format do you think it should be stored in? We have have some stuff like that stored as XML (in particular, qemu capabilities cache is in /var/cache and stored as XML), and at least one thing stored in /var/run/libvirt as JSON (saved MAC address/vlan tag of VFs saved when a VF is assigned to a guest, so that it can be restored later). It was actually *me* who added that latter - the file originally just held the plain raw MAC address (in ASCII), but we needed to expand it; I was originally going to put it there in XML, but there was *something* else going on at the time that convinced me to put it in JSON instead (some other new file was being stored in JSON, or ??? I seriously don't remember the details, just that there was something that influenced that decision). (BTW, I notice that the specfile creates /var/lib/libvirt/network during install/upgrade, but it's empty, and I don't see any place in the code that references it (we do put stuff in /var/run/libvirt/network, but that's a different beast). Is that just an artifact of when we were incorrectly putting network status in /var/lib/libvirt instead of /var/run/libvirt, and we forgot to remove the line in the specfile when that was fixed? Or did my cursory audit of the code miss something?)
participants (2)
-
Daniel P. Berrangé
-
Laine Stump