On 4/20/20 3:09 PM, Mark Asselstine wrote:
On Monday, April 20, 2020 6:11:25 A.M. EDT Michal Privoznik wrote:
> On 4/16/20 5:57 PM, Mark Asselstine wrote:
>> The udev monitor thread "udevEventHandleThread()" will lag the
>> actual/real view of devices in sysfs as it serially processes udev
>> monitor events. So for instance if you were to run the following cmd
>> to create a new veth pair and rename one of the veth endpoints
>>
>> you might see the following monitor events and real world that looks like
>>
>> time
>> |
>> | create v0 sysfs entry
>>
>> wake udevEventHandleThread | create v1 sysfs entry
>> udev_monitor_receive_device(v1-add) | move v0 sysfs to v2
>> udevHandleOneDevice(v1) |
>> udev_monitor_receive_device(v0-add) |
>> udevHandleOneDevice(v0) | <--- error msgs in
>> virNetDevGetLinkInfo() udev_monitor_receive_device(v2-move) | as v0
>> no longer exists udevHandleOneDevice(v2) |
>>
>> \/
>>
>> As you can see the changes in sysfs can take place well before we get
>> to act on the events in the udevEventHandleThread(), so by the time we
>> get around to processing the v0 add event, the sysfs entry has been
>> moved to v2.
>>
>> To work around this we check if the sysfs entry is valid before
>> attempting to read it and don't bother trying to read link info if
>> not. This is safe since we will never read sysfs entries earlier than
>> it existing, ie. if the entry is not there it has either been removed
>> in the time since we enumerated the device or something bigger is
>> busted, in either case, no sysfs entry, no link info. In the case
>> described above we will eventually get the link info as we work
>> through the queue of monitor events and get to the 'move' event.
>>
>>
https://bugzilla.redhat.com/show_bug.cgi?id=1557902
>>
>> Signed-off-by: Mark Asselstine <mark.asselstine(a)windriver.com>
>> ---
>>
>> src/util/virnetdev.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c
>> index b465bdac2e..bf256cbe35 100644
>> --- a/src/util/virnetdev.c
>> +++ b/src/util/virnetdev.c
>> @@ -2438,6 +2438,17 @@ virNetDevGetLinkInfo(const char *ifname,
>>
>> if (virNetDevSysfsFile(&path, ifname, "operstate") < 0)
>>
>> goto cleanup;
>>
>> + /* The device may have been removed or moved by the time we got here.
>> + * Obviously attempting to get LinkInfo on a no longer existing
>> device
>> + * is useless, so stop processing. If we got here via the udev
>> monitor
>> + * a remove or move event will follow and we will be able to get
>> valid
>> + * LinkInfo at that time */
>> + if (!virFileExists(path)) {
>> + VIR_WARN("The interface '%s' was removed before we could
query
>> it.",
>> + ifname);
>
> This is not aligned.
Should I send a V2 or can this be correct when merged?
No need, I've fixed it (also changed goto cleanup to return, since I've
merged a patch that made 'cleanup' label go away this morning) and pushed.
>
>> + goto cleanup;
>> + }
>
> But I have another idea to consider. We could return a different error
> value here (say -2) and then have the caller decide what to do with it
> (e.g. print the warning).
I had thought about propagating the error so the caller could possibly wrap
the warning message in more context but in the end I decided to call it out at
the issue site avoiding having each caller essentially reproducing essentially
the same code.
Yeah, what convinced me was that non-Linux version of this function
simply prints a debug message and returns 0. I haven't realized that
earlier, sorry.
Michal