[libvirt PATCH v2 0/2] Eliminate old tap/macvtap teardown stomping on new tap setup

V1 is here:

  https://www.redhat.com/archives/libvir-list/2020-August/msg00756.html

The problem and this solution are described in detail in patches 1 and 2, but in short - because we (libvirt for macvtap, the kernel for tap) always try to assign the lowest numbered names possible to macvtap and tap devices, we sometimes create a new tap for a new guest using the same name as an old tap for an old guest that is shutting down at the same time the new guest/tap is being set up. This can lead to the old guest teardown stomping on the new guest setup.

This is the problem that the authors were attempting to solve in these two patches sent earlier in the summer:

  https://www.redhat.com/archives/libvir-list/2020-June/msg00481.html
  https://www.redhat.com/archives/libvir-list/2020-June/msg00525.html

and also in this V2 patch, which Bingsong Si sent in response to my poorly-thought-out advice in my response to his original patch:

  https://www.redhat.com/archives/libvir-list/2020-June/msg00755.html

Somewhere during that discussion, danpb suggested that in order to *really* solve the problem, we should use our own counter for auto-generated tap device names (instead of relying on the kernel) and just never re-use a name until the counter rolls over. That's essentially what these two patches do.

One possibly undesirable side effect of this (and the other) patch is that the longer a host is running without reboot, the higher the numbers in tap device names will get. While users are accustomed to always seeing vnet0 and vnet1, they may be a bit surprised to now see vnet39283 or macvtap735. It has been pointed out to me (again by danpb) that the same thing happened with PIDs a few years ago, and while it looked strange at first, everyone is now accustomed to it.

Changes from V1:

Patch 1 from V1 was removed - everything it changed is now removed/replaced in the new Patch 1 (which was Patch 2 in V1). And so of course, what was Patch 3 in V1 is now Patch 2 in V2.

I eliminated the old bitmap reservation system in the macvtap patch (Patch 1) rather than adding on top of it as I had in V1 - it really was beyond redundant and unnecessary, and just clouded up the whole situation. This also allowed me to get rid of the 8192 limit (which was there only to limit the size of the virBitmap, which we no longer need), and allow the device names to count up until they either overflow the ifname[IFNAMSIZ] buffer or reach INT_MAX.

Likewise, I modified the standard tap patch to remove the artificial maximum of 99999, and just let it count up until it overflows.

(Jano suggested that I should have a test case to test the entire range, but I don't think anyone would be happy with that. If I was masochistic and wanted to mock a bunch of virNetDev functions I could artificially test it by bumping up the counters with calls to the virNetDevTapReserveName() function, but it's 1AM. I did test the rollover of both cases (macvtap, where it overflows the buffer size first, and standard tap, where it overflows the 32 bit int first) with a one-off build that started the counter just a few below the overflow point, and it does work correctly.)

Laine Stump (2):
  util: replace macvtap name reservation bitmap with a simple counter
  util: assign tap device names using a monotonically increasing integer

 src/libvirt_private.syms    |   2 +-
 src/libxl/libxl_driver.c    |   2 +-
 src/lxc/lxc_process.c       |   2 +-
 src/qemu/qemu_process.c     |  22 +-
 src/util/virnetdevmacvlan.c | 402 +++++++++++++-----------------------
 src/util/virnetdevmacvlan.h |   6 +-
 src/util/virnetdevtap.c     | 108 +++++++++-
 src/util/virnetdevtap.h     |   4 +
 8 files changed, 275 insertions(+), 273 deletions(-)

--
2.26.2

There have been some reports that, due to libvirt always trying to assign the lowest numbered macvtap / tap device name possible, a new guest would sometimes be started using the same tap device name as previously used by another guest that is in the process of being destroyed *as* the new guest is starting.

In some cases this has led to, for example, the old guest's qemuProcessStop() code deleting a port from an OVS switch that had just been re-added by the new guest (because the port name is based on only the device name using the port). Similar problems can happen (and I believe have) with nwfilter rules and bandwidth rules (which are both instantiated based on the name of the tap device).

A couple patches have been previously proposed to change the ordering of startup and shutdown processing, or to put a mutex around everything related to the tap/macvtap device name usage, but in the end no matter what you do there will still be possible holes, because the device could be deleted outside libvirt's control (for example, regular tap devices are automatically deleted when the qemu process terminates, and that isn't always initiated by libvirt but could instead happen completely asynchronously - libvirt then has no control over the ordering of shutdown operations, and no opportunity to protect it with a mutex.)

But this only happens if a new device is created at the same time as one is being deleted. We can effectively eliminate the chance of this happening if we end the practice of always looking for the lowest numbered available device name, and instead just keep an integer that is incremented each time we need a new device name. At some point it will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15 character limit if nothing else), and we can't guarantee that the new name really will be the *least* recently used name, but "math" suggests that it will be *much* less common that we'll try to re-use the *most* recently used name.
This patch implements such a counter for macvtap/macvlan, replacing the existing, and much more complicated, "ID reservation" system. The counter is set according to whatever macvtap/macvlan devices are already in use by guests when libvirtd is started, incremented each time a new device name is needed, and wraps back to 0 when either INT_MAX is reached, or when the resulting device name would be longer than IFNAMSIZ-1 characters (which actually is what happens when the template for the device name is "macvtap%d"). The result is that no macvtap name will be re-used until the host has created (and possibly destroyed) 99,999,999 devices.

Signed-off-by: Laine Stump <laine@redhat.com>
---
 src/libvirt_private.syms    |   1 -
 src/libxl/libxl_driver.c    |   2 +-
 src/lxc/lxc_process.c       |   2 +-
 src/qemu/qemu_process.c     |   2 +-
 src/util/virnetdevmacvlan.c | 402 +++++++++++++-----------------------
 src/util/virnetdevmacvlan.h |   6 +-
 6 files changed, 145 insertions(+), 270 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index f950a68179..4b155691a8 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2642,7 +2642,6 @@ virNetDevMacVLanDelete;
 virNetDevMacVLanDeleteWithVPortProfile;
 virNetDevMacVLanIsMacvtap;
 virNetDevMacVLanModeTypeFromString;
-virNetDevMacVLanReleaseName;
 virNetDevMacVLanReserveName;
 virNetDevMacVLanRestartWithVPortProfile;
 virNetDevMacVLanTapOpen;
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index dc602ea162..ccda4e0031 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -367,7 +367,7 @@ libxlReconnectNotifyNets(virDomainDefPtr def)
          * impolite.
          */
         if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
-            ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
+            virNetDevMacVLanReserveName(net->ifname);

         if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) {
             if (!conn && !(conn = virGetConnectNetwork()))
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c
index fc59c2e5af..16969dbf33 100644
--- a/src/lxc/lxc_process.c
+++ b/src/lxc/lxc_process.c
@@ -1613,7 +1613,7 @@ virLXCProcessReconnectNotifyNets(virDomainDefPtr def)
          * impolite.
          */
         if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
-            ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
+            virNetDevMacVLanReserveName(net->ifname);

         if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) {
             if (!conn && !(conn = virGetConnectNetwork()))
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index ad461d8f34..2a862e6d9e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3321,7 +3321,7 @@ qemuProcessNotifyNets(virDomainDefPtr def)
          * impolite.
          */
         if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
-            ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
+            virNetDevMacVLanReserveName(net->ifname);

         if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) {
             if (!conn && !(conn = virGetConnectNetwork()))
diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c
index dcea93a5fe..dc4db2c844 100644
--- a/src/util/virnetdevmacvlan.c
+++ b/src/util/virnetdevmacvlan.c
@@ -45,6 +45,7 @@ VIR_ENUM_IMPL(virNetDevMacVLanMode,

 # include <net/if.h>
 # include <linux/if_tun.h>
+# include <math.h>

 /* Older kernels lacked this enum value. */
 # if !HAVE_DECL_MACVLAN_MODE_PASSTHRU
@@ -69,211 +70,121 @@ VIR_LOG_INIT("util.netdevmacvlan");
     ((flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) ? \
      VIR_NET_GENERATED_MACVTAP_PREFIX : VIR_NET_GENERATED_MACVLAN_PREFIX)

-# define MACVLAN_MAX_ID 8191

 virMutex virNetDevMacVLanCreateMutex = VIR_MUTEX_INITIALIZER;
-virBitmapPtr macvtapIDs = NULL;
-virBitmapPtr macvlanIDs = NULL;
-
-static int
-virNetDevMacVLanOnceInit(void)
-{
-    if (!macvtapIDs &&
-        !(macvtapIDs = virBitmapNew(MACVLAN_MAX_ID + 1)))
-        return -1;
-    if (!macvlanIDs &&
-        !(macvlanIDs = virBitmapNew(MACVLAN_MAX_ID + 1)))
-        return -1;
-    return 0;
-}
-
-VIR_ONCE_GLOBAL_INIT(virNetDevMacVLan);
+static int virNetDevMacVTapLastID = -1;
+static int virNetDevMacVLanLastID = -1;


-/**
- * virNetDevMacVLanReserveID:
- *
- * @id: id 0 - MACVLAN_MAX_ID+1 to reserve (or -1 for "first free")
- * @flags: set VIR_NETDEV_MACVLAN_CREATE_WITH_TAP for macvtapN else macvlanN
- * @quietFail: don't log an error if this name is already in-use
- * @nextFree: reserve the next free ID *after* @id rather than @id itself
- *
- * Reserve the indicated ID in the appropriate bitmap, or find the
- * first free ID if @id is -1.
- *
- * Returns newly reserved ID# on success, or -1 to indicate failure.
- */
-static int
-virNetDevMacVLanReserveID(int id, unsigned int flags,
-                          bool quietFail, bool nextFree)
+static void
+virNetDevMacVLanReserveNameInternal(const char *name)
 {
-    virBitmapPtr bitmap;
-
-    if (virNetDevMacVLanInitialize() < 0)
-        return -1;
-
-    bitmap = (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) ?
-        macvtapIDs : macvlanIDs;
+    unsigned int id;
+    const char *idstr = NULL;
+    int *lastID = NULL;
+    int len;

-    if (id > MACVLAN_MAX_ID) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("can't use name %s%d - out of range 0-%d"),
-                       VIR_NET_GENERATED_PREFIX, id, MACVLAN_MAX_ID);
-        return -1;
+    if (STRPREFIX(name, VIR_NET_GENERATED_MACVTAP_PREFIX)) {
+        lastID = &virNetDevMacVTapLastID;
+        len = strlen(VIR_NET_GENERATED_MACVTAP_PREFIX);
+    } else if (STRPREFIX(name, VIR_NET_GENERATED_MACVLAN_PREFIX)) {
+        lastID = &virNetDevMacVLanLastID;
+        len = strlen(VIR_NET_GENERATED_MACVLAN_PREFIX);
+    } else {
+        return;
     }

-    if ((id < 0 || nextFree) &&
-        (id = virBitmapNextClearBit(bitmap, id)) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("no unused %s names available"),
-                       VIR_NET_GENERATED_PREFIX);
-        return -1;
-    }
+    VIR_INFO("marking device in use: '%s'", name);

-    if (virBitmapIsBitSet(bitmap, id)) {
-        if (quietFail) {
-            VIR_INFO("couldn't reserve name %s%d - already in use",
-                     VIR_NET_GENERATED_PREFIX, id);
-        } else {
-            virReportError(VIR_ERR_INTERNAL_ERROR,
-                           _("couldn't reserve name %s%d - already in use"),
-                           VIR_NET_GENERATED_PREFIX, id);
-        }
-        return -1;
-    }
+    idstr = name + len;

-    if (virBitmapSetBit(bitmap, id) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("couldn't mark %s%d as used"),
-                       VIR_NET_GENERATED_PREFIX, id);
-        return -1;
+    if (virStrToLong_ui(idstr, NULL, 10, &id) >= 0) {
+        if (*lastID < (int)id)
+            *lastID = id;
     }
-
-    VIR_INFO("reserving device %s%d", VIR_NET_GENERATED_PREFIX, id);
-    return id;
 }


 /**
- * virNetDevMacVLanReleaseID:
- * @id: id 0 - MACVLAN_MAX_ID+1 to release
+ * virNetDevMacVLanReserveName:
+ * @name: name of an existing macvtap/macvlan device
  *
- * Returns 0 for success or -1 for failure.
+ * Set the value of virNetDevMacV(Lan|Tap)LastID to assure that any
+ * new device created with an autogenerated name will use a number
+ * higher than the number in the given device name.
+ *
+ * Returns nothing.
  */
-static int
-virNetDevMacVLanReleaseID(int id, unsigned int flags)
+void
+virNetDevMacVLanReserveName(const char *name)
 {
-    virBitmapPtr bitmap;
-
-    if (virNetDevMacVLanInitialize() < 0)
-        return 0;
-
-    bitmap = (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) ?
-        macvtapIDs : macvlanIDs;
-
-    if (id > MACVLAN_MAX_ID) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("can't free name %s%d - out of range 0-%d"),
-                       VIR_NET_GENERATED_PREFIX, id, MACVLAN_MAX_ID);
-        return -1;
-    }
-
-    if (id < 0)
-        return 0;
-
-    VIR_INFO("releasing %sdevice %s%d",
-             virBitmapIsBitSet(bitmap, id) ? "" : "unreserved",
-             VIR_NET_GENERATED_PREFIX, id);
-
-    if (virBitmapClearBit(bitmap, id) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("couldn't mark %s%d as unused"),
-                       VIR_NET_GENERATED_PREFIX, id);
-        return -1;
-    }
-    return 0;
+    virMutexLock(&virNetDevMacVLanCreateMutex);
+    virNetDevMacVLanReserveNameInternal(name);
+    virMutexUnlock(&virNetDevMacVLanCreateMutex);
 }


 /**
- * virNetDevMacVLanReserveName:
+ * virNetDevMacVLanGenerateName:
+ * @ifname: pointer to pointer to string containing template
+ * @lastID: counter to add to the template to form the name
  *
- * @name: already-known name of device
- * @quietFail: don't log an error if this name is already in-use
+ * generate a new (currently unused) name for a new macvtap/macvlan
+ * device based on the template string in @ifname - replace %d with
+ * ++(*counter), and keep trying new values until one is found
+ * that doesn't already exist, or we've tried 10000 different
+ * names. Once a usable name is found, replace the template with the
+ * actual name.
  *
- * Extract the device type and id from a macvtap/macvlan device name
- * and mark the appropriate position as in-use in the appropriate
- * bitmap.
- *
- * Returns reserved ID# on success, -1 on failure, -2 if the name
- * doesn't fit the auto-pattern (so not reserveable).
+ * Returns 0 on success, -1 on failure.
  */
-int
-virNetDevMacVLanReserveName(const char *name, bool quietFail)
+static int
+virNetDevMacVLanGenerateName(char **ifname, unsigned int flags)
 {
-    unsigned int id;
-    unsigned int flags = 0;
-    const char *idstr = NULL;
-
-    if (virNetDevMacVLanInitialize() < 0)
-        return -1;
+    const char *prefix;
+    const char *iftemplate;
+    int *lastID;
+    int id;
+    double maxIDd;
+    int maxID = INT_MAX;
+    int attempts = 0;

-    if (STRPREFIX(name, VIR_NET_GENERATED_MACVTAP_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVTAP_PREFIX);
-        flags |= VIR_NETDEV_MACVLAN_CREATE_WITH_TAP;
-    } else if (STRPREFIX(name, VIR_NET_GENERATED_MACVLAN_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVLAN_PREFIX);
+    if (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) {
+        prefix = VIR_NET_GENERATED_MACVTAP_PREFIX;
+        iftemplate = VIR_NET_GENERATED_MACVTAP_PREFIX "%d";
+        lastID = &virNetDevMacVTapLastID;
     } else {
-        return -2;
+        prefix = VIR_NET_GENERATED_MACVLAN_PREFIX;
+        iftemplate = VIR_NET_GENERATED_MACVLAN_PREFIX "%d";
+        lastID = &virNetDevMacVLanLastID;
     }

-    if (virStrToLong_ui(idstr, NULL, 10, &id) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("couldn't get id value from macvtap device name %s"),
-                       name);
-        return -1;
-    }
-    return virNetDevMacVLanReserveID(id, flags, quietFail, false);
-}
+    maxIDd = pow(10, IFNAMSIZ - 1 - strlen(prefix));
+    if (maxIDd <= (double)INT_MAX)
+        maxID = (int)maxIDd;

+    do {
+        g_autofree char *try = NULL;

-/**
- * virNetDevMacVLanReleaseName:
- *
- * @name: already-known name of device
- *
- * Extract the device type and id from a macvtap/macvlan device name
- * and mark the appropriate position as in-use in the appropriate
- * bitmap.
- *
- * returns 0 on success, -1 on failure
- */
-int
-virNetDevMacVLanReleaseName(const char *name)
-{
-    unsigned int id;
-    unsigned int flags = 0;
-    const char *idstr = NULL;
+        id = ++(*lastID);

-    if (virNetDevMacVLanInitialize() < 0)
-        return -1;
+        /* reset before overflow */
+        if (*lastID == maxID)
+            *lastID = -1;

-    if (STRPREFIX(name, VIR_NET_GENERATED_MACVTAP_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVTAP_PREFIX);
-        flags |= VIR_NETDEV_MACVLAN_CREATE_WITH_TAP;
-    } else if (STRPREFIX(name, VIR_NET_GENERATED_MACVLAN_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVLAN_PREFIX);
-    } else {
-        return 0;
-    }
+        try = g_strdup_printf(iftemplate, id);

-    if (virStrToLong_ui(idstr, NULL, 10, &id) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("couldn't get id value from macvtap device name %s"),
-                       name);
-        return -1;
-    }
-    return virNetDevMacVLanReleaseID(id, flags);
+        if (!virNetDevExists(try)) {
+            g_free(*ifname);
+            *ifname = g_steal_pointer(&try);
+            return 0;
+        }
+    } while (++attempts < 10000);
+
+    virReportError(VIR_ERR_INTERNAL_ERROR,
+                   _("no unused %s names available"),
+                   *ifname);
+    return -1;
 }


@@ -320,8 +231,7 @@ virNetDevMacVLanCreate(const char *ifname,
                        const char *type,
                        const virMacAddr *macaddress,
                        const char *srcdev,
-                       uint32_t macvlan_mode,
-                       int *retry)
+                       uint32_t macvlan_mode)
 {
     int error = 0;
     int ifindex = 0;
@@ -330,7 +240,6 @@ virNetDevMacVLanCreate(const char *ifname,
         .mac = macaddress,
     };

-    *retry = 0;

     if (virNetDevGetIndex(srcdev, &ifindex) < 0)
         return -1;
@@ -338,17 +247,15 @@ virNetDevMacVLanCreate(const char *ifname,
     data.ifindex = &ifindex;
     if (virNetlinkNewLink(ifname, type, &data, &error) < 0) {
         char macstr[VIR_MAC_STRING_BUFLEN];
-        if (error == -EEXIST)
-            *retry = 1;
-        else if (error < 0)
-            virReportSystemError(-error,
-                                 _("error creating %s interface %s@%s (%s)"),
-                                 type, ifname, srcdev,
-                                 virMacAddrFormat(macaddress, macstr));

+        virReportSystemError(-error,
+                             _("error creating %s interface %s@%s (%s)"),
+                             type, ifname, srcdev,
+                             virMacAddrFormat(macaddress, macstr));
         return -1;
     }

+    VIR_INFO("created device: '%s'", ifname);
     return 0;
 }

@@ -363,6 +270,7 @@ virNetDevMacVLanCreate(const char *ifname,
  */
 int virNetDevMacVLanDelete(const char *ifname)
 {
+    VIR_INFO("delete device: '%s'", ifname);
     return virNetlinkDelLink(ifname, NULL);
 }

@@ -903,13 +811,8 @@ virNetDevMacVLanCreateWithVPortProfile(const char *ifnameRequested,
                                        unsigned int flags)
 {
     const char *type = VIR_NET_GENERATED_PREFIX;
-    const char *pattern = (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) ?
-        VIR_NET_GENERATED_MACVTAP_PATTERN : VIR_NET_GENERATED_MACVLAN_PATTERN;
-    int reservedID = -1;
-    char ifname[IFNAMSIZ];
-    int retries, do_retry = 0;
+    g_autofree char *ifname = NULL;
     uint32_t macvtapMode;
-    const char *ifnameCreated = NULL;
     int vf = -1;
     bool vnet_hdr = flags & VIR_NETDEV_MACVLAN_VNET_HDR;

@@ -944,6 +847,8 @@ virNetDevMacVLanCreateWithVPortProfile(const char *ifnameRequested,
         return -1;
     }

+    virMutexLock(&virNetDevMacVLanCreateMutex);
+
     if (ifnameRequested) {
         int rc;
         bool isAutoName
@@ -951,97 +856,81 @@ virNetDevMacVLanCreateWithVPortProfile(const char *ifnameRequested,
              STRPREFIX(ifnameRequested, VIR_NET_GENERATED_MACVLAN_PREFIX));

         VIR_INFO("Requested macvtap device name: %s", ifnameRequested);
-        virMutexLock(&virNetDevMacVLanCreateMutex);

         if ((rc = virNetDevExists(ifnameRequested)) < 0) {
             virMutexUnlock(&virNetDevMacVLanCreateMutex);
             return -1;
         }
+
         if (rc) {
-            if (isAutoName)
-                goto create_name;
-            virReportSystemError(EEXIST,
-                                 _("Unable to create %s device %s"),
-                                 type, ifnameRequested);
-            virMutexUnlock(&virNetDevMacVLanCreateMutex);
-            return -1;
-        }
-        if (isAutoName &&
-            (reservedID = virNetDevMacVLanReserveName(ifnameRequested, true)) < 0) {
-            reservedID = -1;
-            goto create_name;
-        }
+            /* ifnameRequested is already being used */

-        if (virNetDevMacVLanCreate(ifnameRequested, type, macaddress,
-                                   linkdev, macvtapMode, &do_retry) < 0) {
-            if (isAutoName) {
-                virNetDevMacVLanReleaseName(ifnameRequested);
-                reservedID = -1;
-                goto create_name;
+            if (!isAutoName) {
+                virReportSystemError(EEXIST,
+                                     _("Unable to create device '%s'"),
+                                     ifnameRequested);
+                virMutexUnlock(&virNetDevMacVLanCreateMutex);
+                return -1;
+            }
+        } else {
+
+            /* ifnameRequested is available. try to open it */
+
+            virNetDevMacVLanReserveNameInternal(ifnameRequested);
+
+            if (virNetDevMacVLanCreate(ifnameRequested, type, macaddress,
+                                       linkdev, macvtapMode) == 0) {
+
+                /* virNetDevMacVLanCreate() was successful - use this name */
+                ifname = g_strdup(ifnameRequested);
+
+            } else if (!isAutoName) {
+                /* couldn't open ifnameRequested, but it wasn't an
+                 * autogenerated name, so there is nothing else to
+                 * try - fail and return.
+                 */
+                virMutexUnlock(&virNetDevMacVLanCreateMutex);
+                return -1;
             }
-            virMutexUnlock(&virNetDevMacVLanCreateMutex);
-            return -1;
         }
-        /* virNetDevMacVLanCreate() was successful - use this name */
-        ifnameCreated = ifnameRequested;
- create_name:
-        virMutexUnlock(&virNetDevMacVLanCreateMutex);
     }

-    retries = MACVLAN_MAX_ID;
-    while (!ifnameCreated && retries) {
-        virMutexLock(&virNetDevMacVLanCreateMutex);
-        reservedID = virNetDevMacVLanReserveID(reservedID, flags, false, true);
-        if (reservedID < 0) {
+    if (!ifname) {
+        /* ifnameRequested was NULL, or it was an already in use
+         * autogenerated name, so now we look for an unused
+         * autogenerated name.
+         */
+        if (virNetDevMacVLanGenerateName(&ifname, flags) < 0 ||
+            virNetDevMacVLanCreate(ifname, type, macaddress,
+                                   linkdev, macvtapMode) < 0) {
             virMutexUnlock(&virNetDevMacVLanCreateMutex);
             return -1;
         }
-        g_snprintf(ifname, sizeof(ifname), pattern, reservedID);
-        if (virNetDevMacVLanCreate(ifname, type, macaddress, linkdev,
-                                   macvtapMode, &do_retry) < 0) {
-            virNetDevMacVLanReleaseID(reservedID, flags);
-            virMutexUnlock(&virNetDevMacVLanCreateMutex);
-            if (!do_retry)
-                return -1;
-            VIR_INFO("Device %s wasn't reserved but already existed, skipping",
-                     ifname);
-            retries--;
-            continue;
-        }
-        ifnameCreated = ifname;
-        virMutexUnlock(&virNetDevMacVLanCreateMutex);
     }

-    if (!ifnameCreated) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("Too many unreserved %s devices in use"),
-                       type);
-        return -1;
-    }
+    /* all done creating the device */
+    virMutexUnlock(&virNetDevMacVLanCreateMutex);

-    if (virNetDevVPortProfileAssociate(ifnameCreated,
+    if (virNetDevVPortProfileAssociate(ifname,
                                        virtPortProfile,
                                        macaddress,
                                        linkdev,
                                        vf,
-                                       vmuuid, vmOp, false) < 0)
+                                       vmuuid, vmOp, false) < 0) {
         goto link_del_exit;
+    }

     if (flags & VIR_NETDEV_MACVLAN_CREATE_IFUP) {
-        if (virNetDevSetOnline(ifnameCreated, true) < 0)
+        if (virNetDevSetOnline(ifname, true) < 0)
             goto disassociate_exit;
     }

     if (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) {
-        if (virNetDevMacVLanTapOpen(ifnameCreated, tapfd, tapfdSize) < 0)
+        if (virNetDevMacVLanTapOpen(ifname, tapfd, tapfdSize) < 0)
             goto disassociate_exit;

         if (virNetDevMacVLanTapSetup(tapfd, tapfdSize, vnet_hdr) < 0)
             goto disassociate_exit;
-
-        *ifnameResult = g_strdup(ifnameCreated);
-    } else {
-        *ifnameResult = g_strdup(ifnameCreated);
     }

     if (vmOp == VIR_NETDEV_VPORT_PROFILE_OP_CREATE ||
@@ -1050,17 +939,18 @@ virNetDevMacVLanCreateWithVPortProfile(const char *ifnameRequested,
          * a saved image) - migration and libvirtd restart are handled
          * elsewhere.
          */
-        if (virNetDevMacVLanVPortProfileRegisterCallback(ifnameCreated, macaddress,
+        if (virNetDevMacVLanVPortProfileRegisterCallback(ifname, macaddress,
                                                          linkdev, vmuuid,
                                                          virtPortProfile,
                                                          vmOp) < 0)
             goto disassociate_exit;
     }

+    *ifnameResult = g_steal_pointer(&ifname);
     return 0;

  disassociate_exit:
-    ignore_value(virNetDevVPortProfileDisassociate(ifnameCreated,
+    ignore_value(virNetDevVPortProfileDisassociate(ifname,
                                                    virtPortProfile,
                                                    macaddress,
                                                    linkdev,
@@ -1070,9 +960,7 @@ virNetDevMacVLanCreateWithVPortProfile(const char *ifnameRequested,
         VIR_FORCE_CLOSE(tapfd[tapfdSize]);

  link_del_exit:
-    ignore_value(virNetDevMacVLanDelete(ifnameCreated));
-    virNetDevMacVLanReleaseName(ifnameCreated);
-
+    ignore_value(virNetDevMacVLanDelete(ifname));
     return -1;
 }

@@ -1106,7 +994,6 @@ int virNetDevMacVLanDeleteWithVPortProfile(const char *ifname,
             ret = -1;
         if (virNetDevMacVLanDelete(ifname) < 0)
             ret = -1;
-        virNetDevMacVLanReleaseName(ifname);
     }

     if (mode == VIR_NETDEV_MACVLAN_MODE_PASSTHRU) {
@@ -1181,8 +1068,7 @@ int virNetDevMacVLanCreate(const char *ifname G_GNUC_UNUSED,
                            const char *type G_GNUC_UNUSED,
                            const virMacAddr *macaddress G_GNUC_UNUSED,
                            const char *srcdev G_GNUC_UNUSED,
-                           uint32_t macvlan_mode G_GNUC_UNUSED,
-                           int *retry G_GNUC_UNUSED)
+                           uint32_t macvlan_mode G_GNUC_UNUSED)
 {
     virReportSystemError(ENOSYS, "%s",
                          _("Cannot create macvlan devices on this platform"));
@@ -1271,15 +1157,7 @@ int virNetDevMacVLanVPortProfileRegisterCallback(const char *ifname G_GNUC_UNUSE
     return -1;
 }

-int virNetDevMacVLanReleaseName(const char *name G_GNUC_UNUSED)
-{
-    virReportSystemError(ENOSYS, "%s",
-                         _("Cannot create macvlan devices on this platform"));
-    return -1;
-}
-
-int virNetDevMacVLanReserveName(const char *name G_GNUC_UNUSED,
-                                bool quietFail G_GNUC_UNUSED)
+void virNetDevMacVLanReserveName(const char *name G_GNUC_UNUSED)
 {
     virReportSystemError(ENOSYS, "%s",
                          _("Cannot create macvlan devices on this platform"));
diff --git a/src/util/virnetdevmacvlan.h b/src/util/virnetdevmacvlan.h
index fc1bb018a2..48800a8fcf 100644
--- a/src/util/virnetdevmacvlan.h
+++ b/src/util/virnetdevmacvlan.h
@@ -54,8 +54,7 @@ typedef enum {
 #define VIR_NET_GENERATED_MACVTAP_PREFIX "macvtap"
 #define VIR_NET_GENERATED_MACVLAN_PREFIX "macvlan"

-int virNetDevMacVLanReserveName(const char *name, bool quietfail);
-int virNetDevMacVLanReleaseName(const char *name);
+void virNetDevMacVLanReserveName(const char *name);

 bool virNetDevMacVLanIsMacvtap(const char *ifname)
     ATTRIBUTE_NONNULL(1) G_GNUC_WARN_UNUSED_RESULT G_GNUC_NO_INLINE;
@@ -64,8 +63,7 @@ int virNetDevMacVLanCreate(const char *ifname,
                            const char *type,
                            const virMacAddr *macaddress,
                            const char *srcdev,
-                           uint32_t macvlan_mode,
-                           int *retry)
+                           uint32_t macvlan_mode)
     ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4)
     G_GNUC_WARN_UNUSED_RESULT;

--
2.26.2
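
For anyone who wants to experiment with the naming scheme outside libvirt, here is a minimal standalone sketch of the counter logic the commit message describes: seed the counter from names already in use, then hand out increasing IDs and wrap to 0 at a limit. It is not the patch's code. sketch_device_exists() is a hypothetical stand-in for virNetDevExists(), every sketch_* name is invented for illustration, the ID limit is passed in by the caller, and the wrap check is done before the increment instead of after it.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ATTEMPTS 10000

/* Hypothetical stand-in for virNetDevExists(): pretend these devices exist. */
static const char *sketch_existing[] = { "macvtap3", "macvtap4" };

static bool
sketch_device_exists(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_existing) / sizeof(sketch_existing[0]); i++) {
        if (strcmp(name, sketch_existing[i]) == 0)
            return true;
    }
    return false;
}

/* Raise *last_id so that later generated IDs are above the ID in @name.
 * Names that do not look like "<prefix><number>" are silently ignored. */
static void
sketch_reserve_name(const char *name, const char *prefix, int *last_id)
{
    size_t len = strlen(prefix);
    char *end = NULL;
    long id;

    if (strncmp(name, prefix, len) != 0)
        return;

    errno = 0;
    id = strtol(name + len, &end, 10);
    if (errno != 0 || end == name + len || *end != '\0' ||
        id < 0 || id > INT_MAX)
        return;

    if (*last_id < (int)id)
        *last_id = (int)id;
}

/* Write the next unused "<prefix><id>" name into @buf.
 * Returns 0 on success, -1 if no free name was found in time. */
static int
sketch_generate_name(const char *prefix, int max_id, int *last_id,
                     char *buf, size_t buflen)
{
    int attempts;

    for (attempts = 0; attempts < SKETCH_MAX_ATTEMPTS; attempts++) {
        int id;

        /* wrap to 0 before the increment could pass the limit */
        if (*last_id >= max_id)
            *last_id = -1;
        id = ++(*last_id);

        snprintf(buf, buflen, "%s%d", prefix, id);
        if (!sketch_device_exists(buf))
            return 0;
    }
    return -1;
}
```

With macvtap3 and macvtap4 pretending to exist and the counter seeded from macvtap2, the next generated name is macvtap5; a counter sitting at the limit wraps and yields macvtap0.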

On 8/26/20 7:22 AM, Laine Stump wrote:
There have been some reports that, due to libvirt always trying to assign the lowest numbered macvtap / tap device name possible, a new guest would sometimes be started using the same tap device name as previously used by another guest that is in the process of being destroyed *as* the new guest is starting.
In some cases this has led to, for example, the old guest's qemuProcessStop() code deleting a port from an OVS switch that had just been re-added by the new guest (because the port name is based on only the device name using the port). Similar problems can happen (and I believe have) with nwfilter rules and bandwidth rules (which are both instantiated based on the name of the tap device).
A couple patches have been previously proposed to change the ordering of startup and shutdown processing, or to put a mutex around everything related to the tap/macvtap device name usage, but in the end no matter what you do there will still be possible holes, because the device could be deleted outside libvirt's control (for example, regular tap devices are automatically deleted when the qemu process terminates, and that isn't always initiated by libvirt but could instead happen completely asynchronously - libvirt then has no control over the ordering of shutdown operations, and no opportunity to protect it with a mutex.)
But this only happens if a new device is created at the same time as one is being deleted. We can effectively eliminate the chance of this happening if we end the practice of always looking for the lowest numbered available device name, and instead just keep an integer that is incremented each time we need a new device name. At some point it will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15 character limit if nothing else), and we can't guarantee that the new name really will be the *least* recently used name, but "math" suggests that it will be *much* less common that we'll try to re-use the *most* recently used name.
This patch implements such a counter for macvtap/macvlan, replacing the existing, and much more complicated, "ID reservation" system. The counter is set according to whatever macvtap/macvlan devices are already in use by guests when libvirtd is started, incremented each time a new device name is needed, and wraps back to 0 when either INT_MAX is reached, or when the resulting device name would be longer than IFNAMSIZ-1 characters (which actually is what happens when the template for the device name is "macvtap%d"). The result is that no macvtap name will be re-used until the host has created (and possibly destroyed) 99,999,999 devices.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/libvirt_private.syms | 1 - src/libxl/libxl_driver.c | 2 +- src/lxc/lxc_process.c | 2 +- src/qemu/qemu_process.c | 2 +- src/util/virnetdevmacvlan.c | 402 +++++++++++++----------------------- src/util/virnetdevmacvlan.h | 6 +- 6 files changed, 145 insertions(+), 270 deletions(-)
diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c
index dcea93a5fe..dc4db2c844 100644
--- a/src/util/virnetdevmacvlan.c
+++ b/src/util/virnetdevmacvlan.c

+static int
+virNetDevMacVLanGenerateName(char **ifname, unsigned int flags)
 {
-    unsigned int id;
-    unsigned int flags = 0;
-    const char *idstr = NULL;
-
-    if (virNetDevMacVLanInitialize() < 0)
-        return -1;
+    const char *prefix;
+    const char *iftemplate;
+    int *lastID;
+    int id;
+    double maxIDd;
+    int maxID = INT_MAX;
+    int attempts = 0;

-    if (STRPREFIX(name, VIR_NET_GENERATED_MACVTAP_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVTAP_PREFIX);
-        flags |= VIR_NETDEV_MACVLAN_CREATE_WITH_TAP;
-    } else if (STRPREFIX(name, VIR_NET_GENERATED_MACVLAN_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVLAN_PREFIX);
+    if (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) {
+        prefix = VIR_NET_GENERATED_MACVTAP_PREFIX;
+        iftemplate = VIR_NET_GENERATED_MACVTAP_PREFIX "%d";
+        lastID = &virNetDevMacVTapLastID;
     } else {
-        return -2;
+        prefix = VIR_NET_GENERATED_MACVLAN_PREFIX;
+        iftemplate = VIR_NET_GENERATED_MACVLAN_PREFIX "%d";
+        lastID = &virNetDevMacVLanLastID;
     }

-    if (virStrToLong_ui(idstr, NULL, 10, &id) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("couldn't get id value from macvtap device name %s"),
-                       name);
-        return -1;
-    }
-    return virNetDevMacVLanReserveID(id, flags, quietFail, false);
-}
+    maxIDd = pow(10, IFNAMSIZ - 1 - strlen(prefix));
+    if (maxIDd <= (double)INT_MAX)
+        maxID = (int)maxIDd;
pow() requires -lm. We need this to be squashed in:

diff --git i/meson.build w/meson.build
index dabd4196e6..81668a6681 100644
--- i/meson.build
+++ w/meson.build
@@ -1176,6 +1176,9 @@ endif
 libxml_version = '2.9.1'
 libxml_dep = dependency('libxml-2.0', version: '>=' + libxml_version)

+cc = meson.get_compiler('c')
+m_dep = cc.find_library('m', required : false)
+
 use_macvtap = false
 if not get_option('macvtap').disabled()
   if (cc.has_header_symbol('linux/if_link.h', 'MACVLAN_MODE_BRIDGE') and
diff --git i/src/util/meson.build w/src/util/meson.build
index a7017f459f..f7092cc3f1 100644
--- i/src/util/meson.build
+++ w/src/util/meson.build
@@ -188,6 +188,7 @@ virt_util_lib = static_library(
     devmapper_dep,
     gnutls_dep,
     libnl_dep,
+    m_dep,
     numactl_dep,
     secdriver_dep,
     src_dep,

NOTE: Doesn't come from my head.

https://mesonbuild.com/howtox.html#add-math-library-lm-portably

Michal
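
One way to avoid the libm dependency altogether would be to compute the limit with integer arithmetic. The following is only a sketch under stated assumptions: it aims for the same bound as the pow() call in the patch (10 raised to the number of digits that fit after the prefix, capped at INT_MAX), and sketch_max_id() and SKETCH_IFNAMSIZ are hypothetical names, not libvirt API.

```c
#include <limits.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16  /* assumed to match the Linux IFNAMSIZ value */

/* Integer-only equivalent of
 *   pow(10, IFNAMSIZ - 1 - strlen(prefix))
 * capped at INT_MAX, so no floating point or -lm is needed.
 * Returns 0 if the prefix alone already exceeds the name buffer. */
static int
sketch_max_id(const char *prefix)
{
    size_t prefix_len = strlen(prefix);
    size_t digits;
    long long bound = 1;

    if (prefix_len > (size_t)(SKETCH_IFNAMSIZ - 1))
        return 0;

    digits = (size_t)(SKETCH_IFNAMSIZ - 1) - prefix_len;
    while (digits-- > 0) {
        bound *= 10;
        if (bound > INT_MAX)
            return INT_MAX;   /* cap, as the patch does with its double */
    }
    return (int)bound;
}
```

For the "macvtap" and "macvlan" prefixes this gives 100000000 (eight digits fit), while a shorter prefix such as "vnet" hits the INT_MAX cap first.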

On 8/26/20 9:00 AM, Michal Privoznik wrote:
On 8/26/20 7:22 AM, Laine Stump wrote:
There have been some reports that, due to libvirt always trying to assign the lowest numbered macvtap / tap device name possible, a new guest would sometimes be started using the same tap device name as previously used by another guest that is in the process of being destroyed *as* the new guest is starting.
In some cases this has led to, for example, the old guest's qemuProcessStop() code deleting a port from an OVS switch that had just been re-added by the new guest (because the port name is based on only the device name using the port). Similar problems can happen (and I believe have) with nwfilter rules and bandwidth rules (which are both instantiated based on the name of the tap device).
A couple patches have been previously proposed to change the ordering of startup and shutdown processing, or to put a mutex around everything related to the tap/macvtap device name usage, but in the end no matter what you do there will still be possible holes, because the device could be deleted outside libvirt's control (for example, regular tap devices are automatically deleted when the qemu process terminates, and that isn't always initiated by libvirt but could instead happen completely asynchronously - libvirt then has no control over the ordering of shutdown operations, and no opportunity to protect it with a mutex.)
But this only happens if a new device is created at the same time as one is being deleted. We can effectively eliminate the chance of this happening if we end the practice of always looking for the lowest numbered available device name, and instead just keep an integer that is incremented each time we need a new device name. At some point it will need to wrap back around to 0 (in order to avoid the IFNAMSIZ 15 character limit if nothing else), and we can't guarantee that the new name really will be the *least* recently used name, but "math" suggests that it will be *much* less common that we'll try to re-use the *most* recently used name.
This patch implements such a counter for macvtap/macvlan, replacing the existing, and much more complicated, "ID reservation" system. The counter is set according to whatever macvtap/macvlan devices are already in use by guests when libvirtd is started, incremented each time a new device name is needed, and wraps back to 0 when either INT_MAX is reached, or when the resulting device name would be longer than IFNAMSIZ-1 characters (which actually is what happens when the template for the device name is "macvtap%d"). The result is that no macvtap name will be re-used until the host has created (and possibly destroyed) 99,999,999 devices.
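To illustrate the difference between the two allocation policies described above, here is a minimal standalone sketch in C. It is not libvirt code: struct pool, POOL_SIZE and all of the function names are made up for this illustration, and the ID space is kept tiny so the wrap-around is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 8   /* tiny ID space, for illustration only */

/* Hypothetical stand-in for "which device names exist on the host". */
struct pool {
    bool in_use[POOL_SIZE];
    int last_id;             /* used only by the counter policy */
};

static void pool_init(struct pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        p->in_use[i] = false;
    p->last_id = -1;
}

/* Old policy: always hand out the lowest free ID. */
static int alloc_lowest(struct pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return -1; /* pool exhausted */
}

/* New policy: keep counting up, wrap at the end, skip IDs still in use. */
static int alloc_counter(struct pool *p)
{
    for (int attempts = 0; attempts < POOL_SIZE; attempts++) {
        int id = p->last_id + 1;

        /* wrap so that id always stays within 0 .. POOL_SIZE - 1 */
        p->last_id = (id == POOL_SIZE - 1) ? -1 : id;

        if (!p->in_use[id]) {
            p->in_use[id] = true;
            return id;
        }
    }
    return -1; /* every ID is busy */
}

static void pool_release(struct pool *p, int id)
{
    assert(id >= 0 && id < POOL_SIZE && p->in_use[id]);
    p->in_use[id] = false;
}
```

With two guests holding IDs 0 and 1, releasing ID 0 and allocating again returns 0 under alloc_lowest() (immediate reuse of the name that is still being torn down) but 2 under alloc_counter().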
Signed-off-by: Laine Stump <laine@redhat.com>
---
 src/libvirt_private.syms    |   1 -
 src/libxl/libxl_driver.c    |   2 +-
 src/lxc/lxc_process.c       |   2 +-
 src/qemu/qemu_process.c     |   2 +-
 src/util/virnetdevmacvlan.c | 402 +++++++++++++-----------------------
 src/util/virnetdevmacvlan.h |   6 +-
 6 files changed, 145 insertions(+), 270 deletions(-)

diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c
index dcea93a5fe..dc4db2c844 100644
--- a/src/util/virnetdevmacvlan.c
+++ b/src/util/virnetdevmacvlan.c

+static int
+virNetDevMacVLanGenerateName(char **ifname, unsigned int flags)
 {
-    unsigned int id;
-    unsigned int flags = 0;
-    const char *idstr = NULL;
-
-    if (virNetDevMacVLanInitialize() < 0)
-       return -1;
+    const char *prefix;
+    const char *iftemplate;
+    int *lastID;
+    int id;
+    double maxIDd;
+    int maxID = INT_MAX;
+    int attempts = 0;
-    if (STRPREFIX(name, VIR_NET_GENERATED_MACVTAP_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVTAP_PREFIX);
-        flags |= VIR_NETDEV_MACVLAN_CREATE_WITH_TAP;
-    } else if (STRPREFIX(name, VIR_NET_GENERATED_MACVLAN_PREFIX)) {
-        idstr = name + strlen(VIR_NET_GENERATED_MACVLAN_PREFIX);
+    if (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) {
+        prefix = VIR_NET_GENERATED_MACVTAP_PREFIX;
+        iftemplate = VIR_NET_GENERATED_MACVTAP_PREFIX "%d";
+        lastID = &virNetDevMacVTapLastID;
     } else {
-        return -2;
+        prefix = VIR_NET_GENERATED_MACVLAN_PREFIX;
+        iftemplate = VIR_NET_GENERATED_MACVLAN_PREFIX "%d";
+        lastID = &virNetDevMacVLanLastID;
     }
-    if (virStrToLong_ui(idstr, NULL, 10, &id) < 0) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("couldn't get id value from macvtap device name %s"),
-                       name);
-        return -1;
-    }
-    return virNetDevMacVLanReserveID(id, flags, quietFail, false);
-}
+    maxIDd = pow(10, IFNAMSIZ - 1 - strlen(prefix));
+    if (maxIDd <= (double)INT_MAX)
+        maxID = (int)maxIDd;
pow() requires -lm. We need this to be squashed in:
Dan had said yesterday in IRC that we already link in libm, and it's been building correctly. Are there other targets where that isn't the case, and I'm just lucky?
diff --git i/meson.build w/meson.build
index dabd4196e6..81668a6681 100644
--- i/meson.build
+++ w/meson.build
@@ -1176,6 +1176,9 @@ endif
 libxml_version = '2.9.1'
 libxml_dep = dependency('libxml-2.0', version: '>=' + libxml_version)

+cc = meson.get_compiler('c')
+m_dep = cc.find_library('m', required : false)
+
 use_macvtap = false
 if not get_option('macvtap').disabled()
   if (cc.has_header_symbol('linux/if_link.h', 'MACVLAN_MODE_BRIDGE') and
diff --git i/src/util/meson.build w/src/util/meson.build
index a7017f459f..f7092cc3f1 100644
--- i/src/util/meson.build
+++ w/src/util/meson.build
@@ -188,6 +188,7 @@ virt_util_lib = static_library(
     devmapper_dep,
     gnutls_dep,
     libnl_dep,
+    m_dep,
     numactl_dep,
     secdriver_dep,
     src_dep,
NOTE: Doesn't come from my head. https://mesonbuild.com/howtox.html#add-math-library-lm-portably
Michal

On a Wednesday in 2020, Laine Stump wrote:
On 8/26/20 9:00 AM, Michal Privoznik wrote:
On 8/26/20 7:22 AM, Laine Stump wrote:
pow() requires -lm. We need this to be squashed in:
Dan had said yesterday in IRC that we already link in libm, and it's been building correctly. Are there other targets where that isn't the case, and I'm just lucky?
libxml2 is linking to it, at least.

Anyway, we already use ldexp in virrandom and isnan in virxml.c so I'd consider the -lm change a separate issue from this commit.

Jano

On 8/26/20 4:21 PM, Ján Tomko wrote:
On a Wednesday in 2020, Laine Stump wrote:
On 8/26/20 9:00 AM, Michal Privoznik wrote:
On 8/26/20 7:22 AM, Laine Stump wrote:
pow() requires -lm. We need this to be squashed in:
Dan had said yesterday in IRC that we already link in libm, and it's been building correctly. Are there other targets where that isn't the case, and I'm just lucky?
libxml2 is linking to it, at least.
Anyway, we already use ldexp in virrandom and isnan in virxml.c so I'd consider the -lm change a separate issue from this commit.
It is, but we are linking libxml2 to libvirt.so rather than to virt_util_lib. Hence, when linking virt_util_lib we get this linking error. I'm okay with making the change in a separate commit.

Michal

****************
(So here is a separate patch to add linking of libm. Care to ACK it?

Also, what are people's opinions of pushing these patches now, so that they'll be in the upcoming release? I've put them on a private gitlab branch so that the CI is run (and found two mingw build problems :-)), except I haven't been able to get the cirrus-ci thing that builds freebsd and macos to work.)
****************

On some platforms libm (needed for the pow() function) isn't being linked in somehow. This patch adds the necessary bits to ensure that it's linked in when necessary.

Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
---
 meson.build          | 3 +++
 src/util/meson.build | 1 +
 2 files changed, 4 insertions(+)

diff --git a/meson.build b/meson.build
index dabd4196e6..81668a6681 100644
--- a/meson.build
+++ b/meson.build
@@ -1176,6 +1176,9 @@ endif
 libxml_version = '2.9.1'
 libxml_dep = dependency('libxml-2.0', version: '>=' + libxml_version)

+cc = meson.get_compiler('c')
+m_dep = cc.find_library('m', required : false)
+
 use_macvtap = false
 if not get_option('macvtap').disabled()
   if (cc.has_header_symbol('linux/if_link.h', 'MACVLAN_MODE_BRIDGE') and
diff --git a/src/util/meson.build b/src/util/meson.build
index a7017f459f..f7092cc3f1 100644
--- a/src/util/meson.build
+++ b/src/util/meson.build
@@ -188,6 +188,7 @@ virt_util_lib = static_library(
     devmapper_dep,
     gnutls_dep,
     libnl_dep,
+    m_dep,
     numactl_dep,
     secdriver_dep,
     src_dep,
--
2.26.2

On Wed, Aug 26, 2020 at 04:35:10PM -0400, Laine Stump wrote:
****************
(So here is a separate patch to add linking of libm. Care to ACK it? Also, what are people's opinions of pushing these patches now, so that they'll be in the upcoming release? I've put them on a private gitlab branch so that the CI is run (and found two mingw build problems :-)), except I haven't been able to get the cirrus-ci thing that builds freebsd and macos to work.)
****************

Given that our CI currently succeeds, we clearly don't have any bug which needs fixing. Either the C library contains the functions, or we're getting linkage to libm indirectly. With glibc it appears to be the former. Fedora / RHEL linker probably gets libm indirectly, but the Ubuntu/Debian linker won't. Either way, it isn't important for the release since we're not showing any broken builds currently.

On some platforms libm (needed for the pow() function) isn't being linked in somehow. This patch adds the necessary bits to ensure that it's linked in when necessary.
Suggested-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Laine Stump <laine@redhat.com>
---
 meson.build          | 3 +++
 src/util/meson.build | 1 +
 2 files changed, 4 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

When creating a standard tap device, if provided with an ifname that contains "%d", rather than taking that literally as the name to use for the new device, the kernel will instead use that string as a template, and search for the lowest number that could be put in place of %d and produce an otherwise unused and unique name for the new device. For example, if there is no tap device name given in the XML, libvirt will always send "vnet%d" as the device name, and the kernel will create new devices named "vnet0", "vnet1", etc. If one of those devices is deleted, creating a "hole" in the name list, the kernel will always attempt to reuse the name in the hole first before using a name with a higher number (i.e. it finds the lowest possible unused number).

The problem with this, as described in the previous patch dealing with macvtap device naming, is that it makes "immediate reuse" of a newly freed tap device name *much* more common, and in the aftermath of deleting a tap device, there is some other necessary cleanup of things which are named based on the device name (nwfilter rules, bandwidth rules, OVS switch ports, to name a few) that could end up stomping over the top of the setup of a new device of the same name for a different guest.

Since the kernel "create a name based on a template" functionality for tap devices doesn't exist for macvtap, this patch for standard tap devices is a bit different from the previous patch for macvtap - in particular there was no previous "bitmap ID reservation system" or overly-complex retry loop that needed to be removed. We simply find an unused name, and pass that name on to the kernel instead of "vnet%d".

This counter is also wrapped when either it gets to INT_MAX or if the full name would overflow IFNAMSIZ-1 characters. In the case of "vnet%d" and a 32 bit int, we would reach INT_MAX first, but possibly someday someone will change the name from vnet to something else.
(NB: It is still possible for a user to provide their own parameterized template name (e.g. "mytap%d") in the XML, and libvirt will just pass that through to the kernel as it always has.)

Signed-off-by: Laine Stump <laine@redhat.com>
---
 src/libvirt_private.syms |   1 +
 src/qemu/qemu_process.c  |  20 +++++++-
 src/util/virnetdevtap.c  | 108 ++++++++++++++++++++++++++++++++++++++-
 src/util/virnetdevtap.h  |   4 ++
 4 files changed, 130 insertions(+), 3 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 4b155691a8..5736a2dbd3 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2676,6 +2676,7 @@ virNetDevTapGetName;
 virNetDevTapGetRealDeviceName;
 virNetDevTapInterfaceStats;
 virNetDevTapReattachBridge;
+virNetDevTapReserveName;


 # util/virnetdevveth.h
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2a862e6d9e..222a1376c4 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3320,8 +3320,26 @@ qemuProcessNotifyNets(virDomainDefPtr def)
          * domain to be unceremoniously killed, which would be *very*
          * impolite.
          */
-        if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
+        switch (virDomainNetGetActualType(net)) {
+        case VIR_DOMAIN_NET_TYPE_DIRECT:
             virNetDevMacVLanReserveName(net->ifname);
+            break;
+        case VIR_DOMAIN_NET_TYPE_BRIDGE:
+        case VIR_DOMAIN_NET_TYPE_NETWORK:
+        case VIR_DOMAIN_NET_TYPE_ETHERNET:
+            virNetDevTapReserveName(net->ifname);
+            break;
+        case VIR_DOMAIN_NET_TYPE_USER:
+        case VIR_DOMAIN_NET_TYPE_VHOSTUSER:
+        case VIR_DOMAIN_NET_TYPE_SERVER:
+        case VIR_DOMAIN_NET_TYPE_CLIENT:
+        case VIR_DOMAIN_NET_TYPE_MCAST:
+        case VIR_DOMAIN_NET_TYPE_INTERNAL:
+        case VIR_DOMAIN_NET_TYPE_HOSTDEV:
+        case VIR_DOMAIN_NET_TYPE_UDP:
+        case VIR_DOMAIN_NET_TYPE_LAST:
+            break;
+        }

         if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) {
             if (!conn && !(conn = virGetConnectNetwork()))
diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c
index c0a7c3019e..a46f836da2 100644
--- a/src/util/virnetdevtap.c
+++ b/src/util/virnetdevtap.c
@@ -49,11 +49,100 @@
 #if defined(HAVE_GETIFADDRS) && defined(AF_LINK)
 # include <ifaddrs.h>
 #endif
+#include <math.h>

 #define VIR_FROM_THIS VIR_FROM_NONE

 VIR_LOG_INIT("util.netdevtap");

+virMutex virNetDevTapCreateMutex = VIR_MUTEX_INITIALIZER;
+static int virNetDevTapLastID = -1; /* not "unsigned" because callers use %d */
+
+
+/**
+ * virNetDevTapReserveName:
+ * @name: name of an existing tap device
+ *
+ * Set the value of virNetDevTapLastID to assure that any new tap
+ * device created with an autogenerated name will use a number higher
+ * than the number in the given tap device name.
+ *
+ * Returns nothing.
+ */
+void
+virNetDevTapReserveName(const char *name)
+{
+    unsigned int id;
+    const char *idstr = NULL;
+
+
+    if (STRPREFIX(name, VIR_NET_GENERATED_TAP_PREFIX)) {
+
+        VIR_INFO("marking device in use: '%s'", name);
+
+        idstr = name + strlen(VIR_NET_GENERATED_TAP_PREFIX);
+
+        if (virStrToLong_ui(idstr, NULL, 10, &id) >= 0) {
+            virMutexLock(&virNetDevTapCreateMutex);
+
+            if (virNetDevTapLastID < (int)id)
+                virNetDevTapLastID = id;
+
+            virMutexUnlock(&virNetDevTapCreateMutex);
+        }
+    }
+}
+
+
+/**
+ * virNetDevTapGenerateName:
+ * @ifname: pointer to pointer to string containing template
+ *
+ * generate a new (currently unused) name for a new tap device based
+ * on the templace string in @ifname - replace %d with
+ * ++virNetDevTapLastID, and keep trying new values until one is found
+ * that doesn't already exist, or we've tried 10000 different
+ * names. Once a usable name is found, replace the template with the
+ * actual name.
+ *
+ * Returns 0 on success, -1 on failure.
+ */
+static int
+virNetDevTapGenerateName(char **ifname)
+{
+    int id;
+    double maxIDd = pow(10, IFNAMSIZ - 1 - strlen(VIR_NET_GENERATED_TAP_PREFIX));
+    int maxID = INT_MAX;
+    int attempts = 0;
+
+    if (maxIDd <= (double)INT_MAX)
+        maxID = (int)maxIDd;
+
+    do {
+        g_autofree char *try = NULL;
+
+        id = ++virNetDevTapLastID;
+
+        /* reset before overflow */
+        if (virNetDevTapLastID >= maxID)
+            virNetDevTapLastID = -1;
+
+        try = g_strdup_printf(*ifname, id);
+
+        if (!virNetDevExists(try)) {
+            g_free(*ifname);
+            *ifname = g_steal_pointer(&try);
+            return 0;
+        }
+    } while (++attempts < 10000);
+
+    virReportError(VIR_ERR_INTERNAL_ERROR,
+                   _("no unused %s names available"),
+                   VIR_NET_GENERATED_TAP_PREFIX);
+    return -1;
+}
+
+
 /**
  * virNetDevTapGetName:
  * @tapfd: a tun/tap file descriptor
@@ -230,10 +319,22 @@ int virNetDevTapCreate(char **ifname,
                        size_t tapfdSize,
                        unsigned int flags)
 {
-    size_t i;
+    size_t i = 0;
     struct ifreq ifr;
     int ret = -1;
-    int fd;
+    int fd = 0;
+
+    virMutexLock(&virNetDevTapCreateMutex);
+
+    /* if ifname is "vnet%d", then auto-generate a name for the new
+     * device (the kernel could do this for us, but has a bad habit of
+     * immediately re-using names that have just been released, which
+     * can lead to race conditions).
+     */
+    if (STREQ(*ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") &&
+        virNetDevTapGenerateName(ifname) < 0) {
+        goto cleanup;
+    }

     if (!tunpath)
         tunpath = "/dev/net/tun";
@@ -299,9 +400,11 @@ int virNetDevTapCreate(char **ifname,
         tapfd[i] = fd;
     }

+    VIR_INFO("created device: '%s'", *ifname);
     ret = 0;

  cleanup:
+    virMutexUnlock(&virNetDevTapCreateMutex);
     if (ret < 0) {
         VIR_FORCE_CLOSE(fd);
         while (i--)
@@ -351,6 +454,7 @@ int virNetDevTapDelete(const char *ifname,
         goto cleanup;
     }

+    VIR_INFO("delete device: '%s'", ifname);
     ret = 0;

  cleanup:
diff --git a/src/util/virnetdevtap.h b/src/util/virnetdevtap.h
index c6bd9285ba..dea8aec3af 100644
--- a/src/util/virnetdevtap.h
+++ b/src/util/virnetdevtap.h
@@ -29,6 +29,10 @@
 # define VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP 1
 #endif

+void
+virNetDevTapReserveName(const char *name)
+    ATTRIBUTE_NONNULL(1);
+
 int virNetDevTapCreate(char **ifname,
                        const char *tunpath,
                        int *tapfd,
--
2.26.2
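As a rough standalone illustration of the reserve/generate logic in the patch above, here is a minimal sketch in plain C with no libvirt or GLib dependencies. Everything prefixed sketch_ or SKETCH_ is hypothetical: sketch_exists() is a stand-in for virNetDevExists() that simply pretends vnet3 is already taken, SKETCH_MAX_ID is an arbitrary small limit, and the mutex used by the real patch is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PREFIX "vnet"
#define SKETCH_MAX_ID 100000        /* hypothetical small limit */
#define SKETCH_MAX_ATTEMPTS 10000

static int sketch_last_id = -1;

/* Hypothetical stand-in for virNetDevExists(): pretend vnet3 exists. */
static bool sketch_exists(const char *name)
{
    return strcmp(name, SKETCH_PREFIX "3") == 0;
}

/* Bump the counter so that later names use a number above the one in name. */
static void sketch_reserve(const char *name)
{
    size_t plen = strlen(SKETCH_PREFIX);
    char *end = NULL;
    long id;

    if (strncmp(name, SKETCH_PREFIX, plen) != 0)
        return;                      /* not an auto-generated name */
    if (name[plen] < '0' || name[plen] > '9')
        return;                      /* suffix does not start with a digit */

    id = strtol(name + plen, &end, 10);
    if (*end != '\0' || id >= SKETCH_MAX_ID)
        return;                      /* trailing junk or out of range */

    if (sketch_last_id < (int)id)
        sketch_last_id = (int)id;
}

/* Write the next unused name into buf; returns 0 on success, -1 on failure. */
static int sketch_generate(char *buf, size_t buflen)
{
    for (int attempts = 0; attempts < SKETCH_MAX_ATTEMPTS; attempts++) {
        int id = sketch_last_id + 1;

        /* wrap so that id always stays within 0 .. SKETCH_MAX_ID - 1 */
        sketch_last_id = (id >= SKETCH_MAX_ID - 1) ? -1 : id;

        snprintf(buf, buflen, SKETCH_PREFIX "%d", id);
        if (!sketch_exists(buf))
            return 0;
    }
    return -1;
}
```

After sketch_reserve("vnet1"), two calls to sketch_generate() produce vnet2 and then vnet4 (vnet3 is skipped because the stub reports it as existing); a name such as "mytap7" is ignored by sketch_reserve() because it does not carry the generated prefix.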

On 8/26/20 7:22 AM, Laine Stump wrote:
V1 is here: https://www.redhat.com/archives/libvir-list/2020-August/msg00756.html
The problem and this solution are very well described in patches 2 and 3, but in short - because we (libvirt for macvtap, the kernel for tap) always try to assign the lowest numbered names possible to macvtap and tap devices, we sometimes create a new tap for a new guest using the same name as an old tap for an old guest that is shutting down at the same time as the new guest/tap is being set up. This can lead to the old guest teardown stomping on the new guest setup.
This is the problem that the authors were attempting to solve in these two patches sent earlier in the summer:
https://www.redhat.com/archives/libvir-list/2020-June/msg00481.html https://www.redhat.com/archives/libvir-list/2020-June/msg00525.html
and also in this V2 patch, which Bingsong Si sent in response to my poorly-thought-out advice in my response to his original patch:
https://www.redhat.com/archives/libvir-list/2020-June/msg00755.html
Somewhere during that discussion, danpb suggested that in order to *really* solve the problem, we should use our own counter for auto-generated tap device names (instead of relying on the kernel) and just never re-use a name until the counter rolls over. That's essentially what these two patches do.
One possibly undesirable side effect of this (and the other) patch is that the longer a host is running without reboot, the higher the numbers tap device names will get. While users are accustomed to always seeing vnet0 and vnet1, they may be a bit surprised to now see vnet39283 or macvtap735. It has been pointed out to me (again by danpb) that the same thing happened with PIDs a few years ago, and while it looked strange at first, everyone is now accustomed to it.
Changes from V1:
Patch 1 from V1 was removed - everything it changed is now removed/replaced in the new Patch 1 (which was Patch 2 in V1). And so of course, what was Patch 3 in V1 is now Patch 2 in V2.
I eliminated the old bitmap reservation system in the macvtap patch (Patch 1) rather than adding on top of it as I had in V1 - it really was beyond redundant and unnecessary, and just clouded up the whole situation. This also allowed me to get rid of the 8192 limit (which was there only to limit the size of the virBitmap, which we no longer need), and allow the device names to count up until they overflow either the ifname[IFNAMSIZ] buffer, or reach INT_MAX.
Likewise, I modified the standard tap patch to remove the artificial maximum of 99999, and just let it count up until it overflows.
(Jano suggested that I should have a test case to test the entire range, but I don't think anyone would be happy with that. If I was masochistic and wanted to mock a bunch of virNetDev functions I could artificially test it by bumping up the counters with calls to the virNetDevTapReserveName() function, but it's 1AM. I did test the rollover of both cases (macvtap, where it overflows the buffer size first, and standard tap where it overflows the 32 bit int first) with a one-off build that started the counter just a few below the overflow point, and it does work correctly.)
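The two rollover cases mentioned above can be worked out with integer arithmetic alone. The following is a minimal sketch, assuming IFNAMSIZ is 16 (its value on Linux), a 32 bit int, and the prefixes "macvtap" and "vnet"; max_id_for_prefix() is a hypothetical helper written for this illustration, not a libvirt function, and it avoids pow() so it has no libm dependency.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16  /* value of IFNAMSIZ on Linux; assumed here */

/*
 * Number of distinct IDs that fit after prefix in an interface name of
 * at most SKETCH_IFNAMSIZ - 1 characters, capped at INT_MAX.
 * Integer-only equivalent of pow(10, digits) for this purpose.
 */
static int max_id_for_prefix(const char *prefix)
{
    size_t len = strlen(prefix);
    size_t digits;
    int limit = 1;

    assert(len < (size_t)(SKETCH_IFNAMSIZ - 1));
    digits = SKETCH_IFNAMSIZ - 1 - len;

    while (digits-- > 0) {
        if (limit > INT_MAX / 10)
            return INT_MAX;     /* 10^digits would not fit in an int */
        limit *= 10;
    }
    return limit;
}
```

For "macvtap" (7 characters) 8 digits fit, so the result is 100000000 and the name length is the binding limit. For "vnet" (4 characters) 11 digits would fit, which exceeds a 32 bit int, so the INT_MAX cap applies instead.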
Laine Stump (2):
  util: replace macvtap name reservation bitmap with a simple counter
  util: assign tap device names using a monotonically increasing integer

 src/libvirt_private.syms    |   2 +-
 src/libxl/libxl_driver.c    |   2 +-
 src/lxc/lxc_process.c       |   2 +-
 src/qemu/qemu_process.c     |  22 +-
 src/util/virnetdevmacvlan.c | 402 +++++++++++++-----------------------
 src/util/virnetdevmacvlan.h |   6 +-
 src/util/virnetdevtap.c     | 108 +++++++++-
 src/util/virnetdevtap.h     |   4 +
 8 files changed, 275 insertions(+), 273 deletions(-)

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal
participants (4): Daniel P. Berrangé, Ján Tomko, Laine Stump, Michal Privoznik