Hi,
I hit this problem recently when trying to create a bridge with an IPv6
address on a 3.2 kernel: dnsmasq (and, further, radvd) would not bind to
the given address (resp. interface), waiting 20s and then giving up with
-EADDRNOTAVAIL (resp. exiting immediately with "error parsing or
activating the config file", without libvirt noticing it, BTW). This can
be reproduced with (I think) any kernel >= 2.6.39 and the following XML
(to be used with "virsh net-create"):
<network>
<name>test-bridge</name>
<bridge name='testbr0' />
<ip family='ipv6' address='fd00::1' prefix='64'>
</ip>
</network>
(it happens even when you have an IPv4, too)
The problem is that since commit [1] (which, ironically, was made to
“help IPv6 autoconfiguration”) the linux bridge code makes bridges
behave like “real” devices regarding carrier detection. This makes the
bridges created by libvirt, which are started without any up devices,
stay with the NO-CARRIER flag set, and thus prevents DAD (Duplicate
address detection) from happening, thus letting the IPv6 address flagged
as “tentative”. Such addresses cannot be bound to (see RFC 2462), so
dnsmasq fails binding to it (for radvd, it detects that "interface XXX
is not RUNNING", thus that "interface XXX does not exist, ignoring the
interface" (sic)). It seems that this behavior was enhanced somehow with
commit [2] by avoiding setting NO-CARRIER on empty bridges, but I
couldn't reproduce this behavior on my kernel. Anyway, with the “dummy
tap to set MAC address” trick, this wouldn't work.
To fix this, the idea is to get the bridge's attached device to be up so
that DAD can happen (deactivating DAD altogether is not a good idea, I
think). Currently, libvirt creates a dummy TAP device to set the MAC
address of the bridge, keeping it down. But even if we set this device
up, it is not RUNNING as soon as the tap file descriptor attached to it
is closed, thus still preventing DAD. So, we must modify the API a bit,
so that we can get the fd, keep the tap device persistent, run the
daemons, and close it after DAD has taken place. After that, the bridge
will be flagged NO-CARRIER again, but the daemons will be running, even
if not happy about the device's state (but we don't really care about
the bridge's daemons doing anything when no up interface is connected to
it).
Other solutions that I envisioned were:
* Keeping the *-nic interface up: this would waste an fd for each
bridge during all its life. May be acceptable, I don't really
know.
* Stop using the dummy tap trick, and set the MAC address directly
on the bridge: it is possible since quite some time it seems,
even if then there is the problem of the bridge not being
RUNNING when empty, contrary to what [2] says, so this will need
fixing (and this fix only happened in 3.1, so it wouldn't work
for 2.6.39)
* Using the --interface option of dnsmasq, but I saw somewhere
that it's not used by libvirt for backward compatibility. I am
not sure this would solve this problem, though, as I don't know
how dnsmasq binds itself to it with this option.
This is why this patch does what's described earlier. I see radvd
yelling quite often in the logs when the interface is NO-CARRIER, but
it's ok, it keeps running.
Still, this patch is not exactly correct, as radvd get daemonized “too
soon” most of the time (i.e. when not debugging libvirtd…) and probes
the bridge once it has been set down (even if started before setting it
down), thus failing as before (and libvirt lets it be like that: this
would need some more checking, maybe). One /may/ introduce some delay
between networkStartRadvd() and setting the dummy tap down to solve it,
but it seemed too hackish to me. But I couldn't come with a better
solution. I would welcome suggestions here.
BTW, I removed the “up” argument from virNetDevTapCreateInBridgePort()
and virNetDevTapCreate() as all TAP devices are created up now, and I
fixed a function name in the docstring.
[1]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=1...
[2]
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=b...
---
src/network/bridge_driver.c | 15 ++++++++++++++-
src/qemu/qemu_command.c | 2 +-
src/uml/uml_conf.c | 2 +-
src/util/virnetdevtap.c | 29 +++++++++++++++++------------
src/util/virnetdevtap.h | 7 ++++---
5 files changed, 37 insertions(+), 18 deletions(-)
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index 63338a2..a13efe6 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -62,6 +62,7 @@
#include "virnetdev.h"
#include "virnetdevbridge.h"
#include "virnetdevtap.h"
+#include "virfile.h"
#define NETWORK_PID_DIR LOCALSTATEDIR "/run/libvirt/network"
#define NETWORK_STATE_DIR LOCALSTATEDIR "/lib/libvirt/network"
@@ -1693,6 +1694,7 @@ networkStartNetworkVirtual(struct network_driver *driver,
virErrorPtr save_err = NULL;
virNetworkIpDefPtr ipdef;
char *macTapIfName = NULL;
+ int tapfd = -1;
/* Check to see if any network IP collides with an existing route */
if (networkCheckRouteCollision(network) < 0)
@@ -1714,8 +1716,9 @@ networkStartNetworkVirtual(struct network_driver *driver,
virReportOOMError();
goto err0;
}
+ /* Keep tun fd open and interface up to allow for IPv6 DAD to happen */
if (virNetDevTapCreateInBridgePort(network->def->bridge,
- &macTapIfName, network->def->mac, 0,
false, NULL) < 0) {
+ &macTapIfName, network->def->mac, 0,
&tapfd, true) < 0) {
VIR_FREE(macTapIfName);
goto err0;
}
@@ -1775,6 +1778,16 @@ networkStartNetworkVirtual(struct network_driver *driver,
if (v6present && networkStartRadvd(network) < 0)
goto err4;
+ /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the
+ * bridge's IPv6 address, and radvd to the interface, so we can now set the
+ * dummy tun down.
+ */
+ if (tapfd >= 0) {
+ if (virNetDevSetOnline(macTapIfName, false) < 0)
+ goto err4;
+ VIR_FORCE_CLOSE(tapfd);
+ }
+
if (virNetDevBandwidthSet(network->def->bridge, network->def->bandwidth)
< 0) {
networkReportError(VIR_ERR_INTERNAL_ERROR,
_("cannot set bandwidth limits on %s"),
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 22dc871..08457e7 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -247,7 +247,7 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
memcpy(tapmac, net->mac, VIR_MAC_BUFLEN);
tapmac[0] = 0xFE; /* Discourage bridge from using TAP dev MAC */
err = virNetDevTapCreateInBridgePort(brname, &net->ifname, tapmac,
- vnet_hdr, true, &tapfd);
+ vnet_hdr, &tapfd, false);
virDomainAuditNetDevice(def, net, "/dev/net/tun", tapfd >= 0);
if (err < 0) {
if (template_ifname)
diff --git a/src/uml/uml_conf.c b/src/uml/uml_conf.c
index b878622..bd838a6 100644
--- a/src/uml/uml_conf.c
+++ b/src/uml/uml_conf.c
@@ -141,7 +141,7 @@ umlConnectTapDevice(virConnectPtr conn,
memcpy(tapmac, net->mac, VIR_MAC_BUFLEN);
tapmac[0] = 0xFE; /* Discourage bridge from using TAP dev MAC */
if (virNetDevTapCreateInBridgePort(bridge, &net->ifname, tapmac,
- 0, true, NULL) < 0) {
+ 0, NULL, true) < 0) {
if (template_ifname)
VIR_FREE(net->ifname);
goto error;
diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c
index 2ed53c6..3e2c2cc 100644
--- a/src/util/virnetdevtap.c
+++ b/src/util/virnetdevtap.c
@@ -109,18 +109,21 @@ virNetDevProbeVnetHdr(int tapfd)
* @ifname: the interface name
* @vnet_hr: whether to try enabling IFF_VNET_HDR
* @tapfd: file descriptor return value for the new tap device
+ * @persist: if the device persists after the file descriptor is closed
*
* Creates a tap interface.
* If the @tapfd parameter is supplied, the open tap device file
- * descriptor will be returned, otherwise the TAP device will be made
- * persistent and closed. The caller must use brDeleteTap to remove
- * a persistent TAP devices when it is no longer needed.
+ * descriptor will be returned, otherwise the TAP device will be closed. The
+ * TAP device will persist after closing the file descriptor if @persist is
+ * true. The caller must use virNetDevTapDelete to remove a persistent TAP
+ * devices when it is no longer needed.
*
* Returns 0 in case of success or an errno code in case of failure.
*/
int virNetDevTapCreate(char **ifname,
int vnet_hdr ATTRIBUTE_UNUSED,
- int *tapfd)
+ int *tapfd,
+ bool persist)
{
int fd;
struct ifreq ifr;
@@ -156,7 +159,7 @@ int virNetDevTapCreate(char **ifname,
goto cleanup;
}
- if (!tapfd &&
+ if (persist &&
(errno = ioctl(fd, TUNSETPERSIST, 1))) {
virReportSystemError(errno,
_("Unable to set tap device %s to persistent"),
@@ -249,14 +252,16 @@ int virNetDevTapDelete(const char *ifname ATTRIBUTE_UNUSED)
* @macaddr: desired MAC address (VIR_MAC_BUFLEN long)
* @vnet_hdr: whether to try enabling IFF_VNET_HDR
* @tapfd: file descriptor return value for the new tap device
+ * @persist: if the device persists after the file descriptor is closed
*
* This function creates a new tap device on a bridge. @ifname can be either
* a fixed name or a name template with '%d' for dynamic name allocation.
* in either case the final name for the bridge will be stored in @ifname.
* If the @tapfd parameter is supplied, the open tap device file
- * descriptor will be returned, otherwise the TAP device will be made
- * persistent and closed. The caller must use brDeleteTap to remove
- * a persistent TAP devices when it is no longer needed.
+ * descriptor will be returned, otherwise the TAP device will be closed. The
+ * TAP device will persist after closing the file descriptor if @persist is
+ * true. The caller must use virNetDevTapDelete to remove a persistent TAP
+ * devices when it is no longer needed.
*
* Returns 0 in case of success or -1 on failure
*/
@@ -264,10 +269,10 @@ int virNetDevTapCreateInBridgePort(const char *brname,
char **ifname,
const unsigned char *macaddr,
int vnet_hdr,
- bool up,
- int *tapfd)
+ int *tapfd,
+ bool persist)
{
- if (virNetDevTapCreate(ifname, vnet_hdr, tapfd) < 0)
+ if (virNetDevTapCreate(ifname, vnet_hdr, tapfd, persist) < 0)
return -1;
/* We need to set the interface MAC before adding it
@@ -289,7 +294,7 @@ int virNetDevTapCreateInBridgePort(const char *brname,
if (virNetDevBridgeAddPort(brname, *ifname) < 0)
goto error;
- if (virNetDevSetOnline(*ifname, up) < 0)
+ if (virNetDevSetOnline(*ifname, true) < 0)
goto error;
return 0;
diff --git a/src/util/virnetdevtap.h b/src/util/virnetdevtap.h
index fb35ac5..6aff641 100644
--- a/src/util/virnetdevtap.h
+++ b/src/util/virnetdevtap.h
@@ -27,7 +27,8 @@
int virNetDevTapCreate(char **ifname,
int vnet_hdr,
- int *tapfd)
+ int *tapfd,
+ bool persist)
ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
int virNetDevTapDelete(const char *ifname)
@@ -37,8 +38,8 @@ int virNetDevTapCreateInBridgePort(const char *brname,
char **ifname,
const unsigned char *macaddr,
int vnet_hdr,
- bool up,
- int *tapfd)
+ int *tapfd,
+ bool persist)
ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3)
ATTRIBUTE_RETURN_CHECK;
--
Benjamin Cama <benjamin.cama(a)telecom-bretagne.eu>