[PATCH 0/2] util: Move udev settle wait to vNIC create functions
See 1/2 for in depth explanation. Michal Prívozník (2): util: Wait for udev to settle after creating vNIC virnetdevmacvlan: Drop udev busy loop from virNetDevMacVLanTapOpen() src/util/virnetdevmacvlan.c | 24 ++++++++++-------------- src/util/virnetdevtap.c | 6 ++++++ src/util/virnetdevveth.c | 4 ++++ 3 files changed, 20 insertions(+), 14 deletions(-) -- 2.52.0
From: Michal Privoznik <mprivozn@redhat.com> There are several types of virtual network interfaces that libvirt creates (TUN, TAP, MACVLAN, MACVTAP, VETH). After these are created (e.g. on domain startup or device hotplug), libvirt often opens their /dev/XXX representation (e.g. /dev/tapNN) in order to pass FDs to the hypervisor. Well, if creation an open() happen in very quick succession, then host's udev might not have had enough time and depending on system's SELinux even we might see open() fail, with AVC message logged. Signs of us trying to mitigate this problem are still to be found in virNetDevMacVLanTapOpen() where upon failed open() a very short g_usleep() is called. Alternatively, in linked gitlab issue, the user reports seeing the following message: type=AVC msg=audit(1774535384.365:1238): avc: denied { open } for pid=6765 comm="rpc-virtqemud" path="/dev/tap33" dev="devtmpfs" ino=805 scontext=system_u:system_r:virtqemud_t:s0 tcontext=system_u:object_r:device_t:s0 tclass=chr_file permissive=1 (For full reasoning why /dev/tap33 is of device_t type see linked issue). Long story short, /dev/tapNN devices are created initially with device_t SELinux type and udev later changes that to tun_tap_device_t. This device_t type is viewed as generic type that only an yet unlabelled device has. Hence missing rule in SELinux policy for virtqemud to open it. Therefore, to avoid this problem, wait for udev to settle by calling virWaitForDevices() (which under the hood spawns "udevadm settle". This may be a bit too heavy hammer though because the function is called basically once per (almost) each <interface/>. If we find that to be a performance drawback then we need to redesign how tun/tap/... devices are created (well, opened). Resolves: https://gitlab.com/libvirt/libvirt/-/work_items/866 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevmacvlan.c | 4 ++++ src/util/virnetdevtap.c | 6 ++++++ src/util/virnetdevveth.c | 4 ++++ 3 files changed, 14 insertions(+) diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index cde9d70eef..347148542d 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -24,6 +24,7 @@ #include "virnetdevmacvlan.h" #include "virmacaddr.h" #include "virerror.h" +#include "virutil.h" #define VIR_FROM_THIS VIR_FROM_NET @@ -119,6 +120,9 @@ virNetDevMacVLanCreate(const char *ifname, return -1; } + /* Allow udev to process newly created mactap/macvlan. */ + virWaitForDevices(); + VIR_INFO("created device: '%s'", ifname); return 0; } diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c index e3a6209642..38f50e959e 100644 --- a/src/util/virnetdevtap.c +++ b/src/util/virnetdevtap.c @@ -29,6 +29,7 @@ #include "viralloc.h" #include "virlog.h" #include "virstring.h" +#include "virutil.h" #include <unistd.h> #include <sys/types.h> @@ -265,6 +266,9 @@ int virNetDevTapCreate(char **ifname, tapfd[i] = fd; } + /* Allow udev to process newly created TUN/TAP. */ + virWaitForDevices(); + VIR_INFO("created device: '%s'", *ifname); ret = 0; @@ -375,6 +379,8 @@ int virNetDevTapCreate(char **ifname, if (virNetDevSetName(ifr.ifr_name, *ifname) == -1) goto cleanup; + /* Allow udev to process newly created TUN/TAP. */ + virWaitForDevices(); ret = 0; cleanup: diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c index 4365345664..77b017427a 100644 --- a/src/util/virnetdevveth.c +++ b/src/util/virnetdevveth.c @@ -26,6 +26,7 @@ #include "virerror.h" #include "virnetdev.h" #include "virnetlink.h" +#include "virutil.h" #define VIR_FROM_THIS VIR_FROM_NONE @@ -127,6 +128,9 @@ int virNetDevVethCreate(char **veth1, char **veth2) if (virNetDevVethCreateInternal(*veth1, *veth2) < 0) goto cleanup; + /* Allow udev to process newly created veth. */ + virWaitForDevices(); + VIR_DEBUG("Create Host: %s guest: %s", *veth1, *veth2); return 0; -- 2.52.0
I just happened upon this while reviewing another patch ... On Wed, Apr 08, 2026 at 02:16:09PM +0200, Michal Privoznik via Devel wrote:
Therefore, to avoid this problem, wait for udev to settle by calling virWaitForDevices() (which under the hood spawns "udevadm settle". This may be a bit too heavy hammer though because the function is called basically once per (almost) each <interface/>. If we find that to be a performance drawback then we need to redesign how tun/tap/... devices are created (well, opened).
I should note that 'udevadm settle' is broken and upstream refuses to fix it, because of course they claim to know better than everyone else: https://github.com/systemd/systemd/issues/40499 I don't know if this also affects network devices. I guess it depends if the kernel can emit events in the same way as for block devices. For block devices our approach has been to continue to use 'udevadm settle' since there is no sane alternative. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
On 4/8/26 15:02, Richard W.M. Jones wrote:
I just happened upon this while reviewing another patch ...
On Wed, Apr 08, 2026 at 02:16:09PM +0200, Michal Privoznik via Devel wrote:
Therefore, to avoid this problem, wait for udev to settle by calling virWaitForDevices() (which under the hood spawns "udevadm settle". This may be a bit too heavy hammer though because the function is called basically once per (almost) each <interface/>. If we find that to be a performance drawback then we need to redesign how tun/tap/... devices are created (well, opened).
I should note that 'udevadm settle' is broken and upstream refuses to fix it, because of course they claim to know better than everyone else:
https://github.com/systemd/systemd/issues/40499
I don't know if this also affects network devices. I guess it depends if the kernel can emit events in the same way as for block devices.
For block devices our approach has been to continue to use 'udevadm settle' since there is no sane alternative.
That's very unfortunate, but I don't think it affects network devices, given that you bisected udev into a commit that changes how partition table is read. I mean, there can be similar bug that affects network devices, but I haven't encountered it in my testing (on rawhide). Thanks for keeping an eye on this! Michal
On Wed, Apr 08, 2026 at 04:41:13PM +0200, Michal Prívozník wrote:
On 4/8/26 15:02, Richard W.M. Jones wrote:
I just happened upon this while reviewing another patch ...
On Wed, Apr 08, 2026 at 02:16:09PM +0200, Michal Privoznik via Devel wrote:
Therefore, to avoid this problem, wait for udev to settle by calling virWaitForDevices() (which under the hood spawns "udevadm settle". This may be a bit too heavy hammer though because the function is called basically once per (almost) each <interface/>. If we find that to be a performance drawback then we need to redesign how tun/tap/... devices are created (well, opened).
I should note that 'udevadm settle' is broken and upstream refuses to fix it, because of course they claim to know better than everyone else:
https://github.com/systemd/systemd/issues/40499
I don't know if this also affects network devices. I guess it depends if the kernel can emit events in the same way as for block devices.
For block devices our approach has been to continue to use 'udevadm settle' since there is no sane alternative.
That's very unfortunate, but I don't think it affects network devices, given that you bisected udev into a commit that changes how partition table is read. I mean, there can be similar bug that affects network devices, but I haven't encountered it in my testing (on rawhide).
For sure the change I bisected to was just in the code that rereads the partition table. But that commit widened a pre-existing race condition (and the "fix" narrows it again, but does not close it). The actual problem is that the kernel can emit udev events _before_ 'udevadm settle' runs, and udev doesn't take those into account, so udevadm settle isn't a proper barrier operation as it claims to be. Whether this affects network devices or not is going to depend on some deep kernel details that I'm not familiar with. Rich.
Thanks for keeping an eye on this!
Michal
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
On 4/8/26 8:16 AM, Michal Privoznik via Devel wrote:
From: Michal Privoznik <mprivozn@redhat.com>
There are several types of virtual network interfaces that libvirt creates (TUN, TAP, MACVLAN, MACVTAP, VETH). After these are created (e.g. on domain startup or device hotplug), libvirt often opens their /dev/XXX representation (e.g. /dev/tapNN) in order to pass FDs to the hypervisor. Well, if creation an open() happen in very quick succession, then host's udev might not have had enough time and depending on system's SELinux even we might see open() fail, with AVC message logged.
Signs of us trying to mitigate this problem are still to be found in virNetDevMacVLanTapOpen() where upon failed open() a very short g_usleep() is called.
Alternatively, in linked gitlab issue, the user reports seeing the following message:
type=AVC msg=audit(1774535384.365:1238): avc: denied { open } for pid=6765 comm="rpc-virtqemud" path="/dev/tap33" dev="devtmpfs" ino=805 scontext=system_u:system_r:virtqemud_t:s0 tcontext=system_u:object_r:device_t:s0 tclass=chr_file permissive=1
(For full reasoning why /dev/tap33 is of device_t type see linked issue).
Long story short, /dev/tapNN devices are created initially with device_t SELinux type and udev later changes that to tun_tap_device_t. This device_t type is viewed as generic type that only an yet unlabelled device has. Hence missing rule in SELinux policy for virtqemud to open it.
The original SUSE bug was very long, so I skipped a lot of it when reading, but it seems like this all hinges on the SELinux labeling of new /dev/tapNN devices being temporarily incorrect (until udev has a chance to "fix" them, but only in the case where "NN" is > 29[1]). There's a lot of other stuff going on in that entire exchange, but it seems like (as would be expected) there's many red herrings, deadends, and maybe some other AVCs that aren't related to the network devices (I didn't spend enough time to figure that out). If that's the case (if the wait is *only* so that /dev/tapNN will get the correct SELinux label and nothing else not related to /dev/tapNN), then it's overkill (and maybe even an effective NOP) to force this wait for regular tap and veth devices - regular tap devices all multiplex on a single character device (/dev/net/tun) that is already properly labeled for application access, and veth devices don't have *any* character device, i.e. neither of them have a /dev/tapNN that needs to be relabeled, and if that's the only thing that we're waiting for, then we're waiting for nothing. (I did notice that the reporter of the bug pointed out several times that they are heavily using macvtap interfaces rather than standard tap devices.) Unless there is some other known reason to wait, I think this should only be done for macvtap devices, not for tap or veth (not even for macvlan devices, since they also don't have a character device that needs relabeling - this is determined in virNetDevMacVLanCreate() by looking for the flag VIR_NETDEV_MACVLAN_CREATE_WITH_TAP). (*[1] A side note to dispel a misconception I saw stated several times in the SUSE bug: It is correct that the reason that /dev/tapNN devices < 29 don't have the SELinux issue is that there are static rules in the SELinux policy that cause tap1-tap29 to be created with the correct label immediately, and that > 29 are create with a generic device_t label that is soon (but not instantaneously!) corrected by udev. But those numbers monotonically increase, and don't get re-used, so within a while after rebooting the host all new macvtap devices will have an NN > 29 in their /dev/tapNN name. There were several comments in the SUSE bug that this problem could be fixed, at least for smaller hosts, by just reusing names as guests are shut down and re-started, but that isn't true - the NN in the device name isn't based on the name libvirt gives to the device, it's based on the ifindex of the network device, which is assigned by the kernel, and the kernel very wisely doesn't recycle ifindex numbers, because that could lead to a situation where some piece of software thinks that it's operating on interface XYZ, but that interface was deleted and a new interface ABC was created with the same ifindex. (this is also why libvirt doesn't recycle the vnetNN, macvtapNN and vethNN numbers in the ifnames it gives to tap, macvtap, and veth devices.) So in order to delay the problem by recycling old tapNN numbers you would need to get a patch in the kernel, and I can 100% guarantee you that isn't going to happen :-))
Therefore, to avoid this problem, wait for udev to settle by calling virWaitForDevices() (which under the hood spawns "udevadm settle". This may be a bit too heavy hammer though because the function is called basically once per (almost) each <interface/>. If we find that to be a performance drawback then we need to redesign how tun/tap/... devices are created (well, opened).
Resolves: https://gitlab.com/libvirt/libvirt/-/work_items/866 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevmacvlan.c | 4 ++++ src/util/virnetdevtap.c | 6 ++++++ src/util/virnetdevveth.c | 4 ++++ 3 files changed, 14 insertions(+)
diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index cde9d70eef..347148542d 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -24,6 +24,7 @@ #include "virnetdevmacvlan.h" #include "virmacaddr.h" #include "virerror.h" +#include "virutil.h"
#define VIR_FROM_THIS VIR_FROM_NET
@@ -119,6 +120,9 @@ virNetDevMacVLanCreate(const char *ifname, return -1; }
+ /* Allow udev to process newly created mactap/macvlan. */ + virWaitForDevices(); + VIR_INFO("created device: '%s'", ifname); return 0; } diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c index e3a6209642..38f50e959e 100644 --- a/src/util/virnetdevtap.c +++ b/src/util/virnetdevtap.c @@ -29,6 +29,7 @@ #include "viralloc.h" #include "virlog.h" #include "virstring.h" +#include "virutil.h"
#include <unistd.h> #include <sys/types.h> @@ -265,6 +266,9 @@ int virNetDevTapCreate(char **ifname, tapfd[i] = fd; }
+ /* Allow udev to process newly created TUN/TAP. */ + virWaitForDevices(); + VIR_INFO("created device: '%s'", *ifname); ret = 0;
@@ -375,6 +379,8 @@ int virNetDevTapCreate(char **ifname, if (virNetDevSetName(ifr.ifr_name, *ifname) == -1) goto cleanup;
+ /* Allow udev to process newly created TUN/TAP. */ + virWaitForDevices();
ret = 0; cleanup: diff --git a/src/util/virnetdevveth.c b/src/util/virnetdevveth.c index 4365345664..77b017427a 100644 --- a/src/util/virnetdevveth.c +++ b/src/util/virnetdevveth.c @@ -26,6 +26,7 @@ #include "virerror.h" #include "virnetdev.h" #include "virnetlink.h" +#include "virutil.h"
#define VIR_FROM_THIS VIR_FROM_NONE
@@ -127,6 +128,9 @@ int virNetDevVethCreate(char **veth1, char **veth2) if (virNetDevVethCreateInternal(*veth1, *veth2) < 0) goto cleanup;
+ /* Allow udev to process newly created veth. */ + virWaitForDevices(); + VIR_DEBUG("Create Host: %s guest: %s", *veth1, *veth2); return 0;
From: Michal Privoznik <mprivozn@redhat.com> Now that after previous commit the wait for udev to settle down is done right after device creation, there's no need to have additional wait in virNetDevMacVLanTapOpen(). It's effectively a dead code. Remove it. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevmacvlan.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index 347148542d..bbc943cc7d 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -158,7 +158,6 @@ virNetDevMacVLanTapOpen(const char *ifname, int *tapfd, size_t tapfdSize) { - int retries = 10; int ret = -1; int ifindex; size_t i = 0; @@ -172,20 +171,13 @@ virNetDevMacVLanTapOpen(const char *ifname, for (i = 0; i < tapfdSize; i++) { int fd = -1; - while (fd < 0) { - if ((fd = open(tapname, O_RDWR)) >= 0) { - tapfd[i] = fd; - } else if (retries-- > 0) { - /* may need to wait for udev to be done */ - g_usleep(20000); - } else { - /* However, if haven't succeeded, quit. */ - virReportSystemError(errno, - _("cannot open macvtap tap device %1$s"), - tapname); - goto cleanup; - } + if ((fd = open(tapname, O_RDWR)) < 0) { + virReportSystemError(errno, + _("cannot open macvtap tap device %1$s"), + tapname); + goto cleanup; } + tapfd[i] = fd; } ret = 0; -- 2.52.0
On 4/8/26 8:16 AM, Michal Privoznik via Devel wrote:
From: Michal Privoznik <mprivozn@redhat.com>
Now that after previous commit the wait for udev to settle down is done right after device creation, there's no need to have additional wait in virNetDevMacVLanTapOpen(). It's effectively a dead code. Remove it.
I looked all the way back to when this retry loop was put in - it went in with the original macvtap support in 2010 (commit 315baab94432f52a039b1364588948e2f1365567) and had a comment that said "number of retires in case udev for example may need to be waited for to create the tap chardev". So anyway we can say that the reason for the retry loop is "waiting for udev to settle", so definitely once we add the virWaitForDevices() after the create (only in the case that has the CREATE_WITH_TAP flag!), this code can safely be removed (assuming there's not some other *unknown* reason that retrying was useful, of course :-)) Reviewed-by: Laine Stump <laine@redhat.com> (contingent on modified Patch 1 being pushed first)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevmacvlan.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-)
diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index 347148542d..bbc943cc7d 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -158,7 +158,6 @@ virNetDevMacVLanTapOpen(const char *ifname, int *tapfd, size_t tapfdSize) { - int retries = 10; int ret = -1; int ifindex; size_t i = 0; @@ -172,20 +171,13 @@ virNetDevMacVLanTapOpen(const char *ifname, for (i = 0; i < tapfdSize; i++) { int fd = -1;
- while (fd < 0) { - if ((fd = open(tapname, O_RDWR)) >= 0) { - tapfd[i] = fd; - } else if (retries-- > 0) { - /* may need to wait for udev to be done */ - g_usleep(20000); - } else { - /* However, if haven't succeeded, quit. */ - virReportSystemError(errno, - _("cannot open macvtap tap device %1$s"), - tapname); - goto cleanup; - } + if ((fd = open(tapname, O_RDWR)) < 0) { + virReportSystemError(errno, + _("cannot open macvtap tap device %1$s"), + tapname); + goto cleanup; } + tapfd[i] = fd; }
ret = 0;
participants (4)
-
Laine Stump -
Michal Privoznik -
Michal Prívozník -
Richard W.M. Jones