On Monday, April 20, 2020 9:30:56 A.M. EDT Michal Privoznik wrote:
On 4/20/20 3:09 PM, Mark Asselstine wrote:
> On Monday, April 20, 2020 6:11:25 A.M. EDT Michal Privoznik wrote:
>> On 4/16/20 5:57 PM, Mark Asselstine wrote:
>>> The udev monitor thread "udevEventHandleThread()" will lag the
>>> actual/real view of devices in sysfs as it serially processes udev
>>> monitor events. So for instance if you were to run the following cmd
>>> to create a new veth pair and rename one of the veth endpoints
>>>
>>> you might see the following monitor events and real world that looks
>>> like
>>>
>>> time
>>> |
>>> | create v0 sysfs entry
>>>
>>> wake udevEventHandleThread | create v1 sysfs entry
>>> udev_monitor_receive_device(v1-add) | move v0 sysfs to v2
>>> udevHandleOneDevice(v1) |
>>> udev_monitor_receive_device(v0-add) |
>>> udevHandleOneDevice(v0) | <--- error msgs in
>>> virNetDevGetLinkInfo() udev_monitor_receive_device(v2-move) | as
>>> v0
>>> no longer exists udevHandleOneDevice(v2) |
>>>
>>> \/
>>>
>>> As you can see the changes in sysfs can take place well before we get
>>> to act on the events in the udevEventHandleThread(), so by the time we
>>> get around to processing the v0 add event, the sysfs entry has been
>>> moved to v2.
>>>
>>> To work around this we check if the sysfs entry is valid before
>>> attempting to read it and don't bother trying to read link info if
>>> not. This is safe since we will never read sysfs entries earlier than
>>> it existing, ie. if the entry is not there it has either been removed
>>> in the time since we enumerated the device or something bigger is
>>> busted, in either case, no sysfs entry, no link info. In the case
>>> described above we will eventually get the link info as we work
>>> through the queue of monitor events and get to the 'move' event.
>>>
>>>
https://bugzilla.redhat.com/show_bug.cgi?id=1557902
>>>
>>> Signed-off-by: Mark Asselstine <mark.asselstine(a)windriver.com>
>>> ---
>>>
>>> src/util/virnetdev.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c
>>> index b465bdac2e..bf256cbe35 100644
>>> --- a/src/util/virnetdev.c
>>> +++ b/src/util/virnetdev.c
>>> @@ -2438,6 +2438,17 @@ virNetDevGetLinkInfo(const char *ifname,
>>>
>>> if (virNetDevSysfsFile(&path, ifname, "operstate") <
0)
>>>
>>> goto cleanup;
>>>
>>> + /* The device may have been removed or moved by the time we got
>>> here.
>>> + * Obviously attempting to get LinkInfo on a no longer existing
>>> device
>>> + * is useless, so stop processing. If we got here via the udev
>>> monitor
>>> + * a remove or move event will follow and we will be able to get
>>> valid
>>> + * LinkInfo at that time */
>>> + if (!virFileExists(path)) {
>>> + VIR_WARN("The interface '%s' was removed before we could
query
>>> it.",
>>> + ifname);
>>
>> This is not aligned.
>
> Should I send a V2 or can this be correct when merged?
No need, I've fixed it (also changed goto cleanup to return, since I've
merged a patch that made 'cleanup' label go away this morning) and pushed.
>>> + goto cleanup;
>>> + }
>>
>> But I have another idea to consider. We could return a different error
>> value here (say -2) and then have the caller decide what to do with it
>> (e.g. print the warning).
>
> I had thought about propagating the error so the caller could possibly
> wrap
> the warning message in more context but in the end I decided to call it
> out at the issue site avoiding having each caller essentially reproducing
> essentially the same code.
Yeah, what convinced me was that non-Linux version of this function
simply prints a debug message and returns 0. I haven't realized that
earlier, sorry.
Michal
No problem. Thanks for taking the patches and making the necessary tweaks.
Mark