On Monday, April 20, 2020 6:11:25 A.M. EDT Michal Privoznik wrote:
On 4/16/20 5:57 PM, Mark Asselstine wrote:
> The udev monitor thread "udevEventHandleThread()" will lag the
> actual/real view of devices in sysfs as it serially processes udev
> monitor events. So for instance if you were to run the following cmd
> to create a new veth pair and rename one of the veth endpoints
>
> you might see the following monitor events and real world that looks like
>
> time
> |
> | create v0 sysfs entry
>
> wake udevEventHandleThread | create v1 sysfs entry
> udev_monitor_receive_device(v1-add) | move v0 sysfs to v2
> udevHandleOneDevice(v1) |
> udev_monitor_receive_device(v0-add) |
> udevHandleOneDevice(v0) | <--- error msgs in
> virNetDevGetLinkInfo() udev_monitor_receive_device(v2-move) | as v0
> no longer exists udevHandleOneDevice(v2) |
>
> \/
>
> As you can see the changes in sysfs can take place well before we get
> to act on the events in the udevEventHandleThread(), so by the time we
> get around to processing the v0 add event, the sysfs entry has been
> moved to v2.
>
> To work around this we check if the sysfs entry is valid before
> attempting to read it and don't bother trying to read link info if
> not. This is safe since we will never read sysfs entries earlier than
> it existing, ie. if the entry is not there it has either been removed
> in the time since we enumerated the device or something bigger is
> busted, in either case, no sysfs entry, no link info. In the case
> described above we will eventually get the link info as we work
> through the queue of monitor events and get to the 'move' event.
>
>
https://bugzilla.redhat.com/show_bug.cgi?id=1557902
>
> Signed-off-by: Mark Asselstine <mark.asselstine(a)windriver.com>
> ---
>
> src/util/virnetdev.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c
> index b465bdac2e..bf256cbe35 100644
> --- a/src/util/virnetdev.c
> +++ b/src/util/virnetdev.c
> @@ -2438,6 +2438,17 @@ virNetDevGetLinkInfo(const char *ifname,
>
> if (virNetDevSysfsFile(&path, ifname, "operstate") < 0)
>
> goto cleanup;
>
> + /* The device may have been removed or moved by the time we got here.
> + * Obviously attempting to get LinkInfo on a no longer existing
> device
> + * is useless, so stop processing. If we got here via the udev
> monitor
> + * a remove or move event will follow and we will be able to get
> valid
> + * LinkInfo at that time */
> + if (!virFileExists(path)) {
> + VIR_WARN("The interface '%s' was removed before we could query
> it.",
> + ifname);
This is not aligned.
Should I send a V2 or can this be correct when merged?
> + goto cleanup;
> + }
But I have another idea to consider. We could return a different error
value here (say -2) and then have the caller decide what to do with it
(e.g. print the warning).
I had thought about propagating the error so the caller could possibly wrap
the warning message in more context but in the end I decided to call it out at
the issue site avoiding having each caller essentially reproducing essentially
the same code.
But this is still inherently racy. The "move" can happen just after
we've checked for the @path existence and before this virFileReadAll()
below.
But I guess this is still better than nothing.
Agreed, it isn't 100% fault proof. I didn't have the time to refactor the
function to make is safer, but I will definitely think about coming back and
doing so in the future.
Thanks for your review.
Mark
Reviewed-by: Michal Privoznik <mprivozn(a)redhat.com>
Michal