[PATCH v2 0/3] nodedev: Follow up fixes

This is a v2 of: https://listman.redhat.com/archives/libvir-list/2021-April/msg00478.html Patches that were reviewed in v1 are merged. The rest is reworked per Erik's suggestions. Michal Prívozník (3): nodedev: Wait for device initialization in all public API callbacks nodedev: Mark device initialization complete even in case of an error nodedev: Don't fail device enumeration if MDEVCTL is missing src/node_device/node_device_driver.c | 24 ++++++++++++++++++++++++ src/node_device/node_device_udev.c | 3 +++ 2 files changed, 27 insertions(+) -- 2.26.3

Although I have not experienced this in real life, there is a possible race condition when creating new device, getting its XML or parent or listing its capabilities. If the nodedev driver is still enumerating devices (in a separate thread) and one of virNodeDeviceGetXMLDesc(), virNodeDeviceGetParent(), virNodeDeviceNumOfCaps(), virNodeDeviceListCaps() or virNodeDeviceCreate() is called then it can lead to spurious results because the device enumeration thread is removing devices from or adding them to the internal list of devices (among with their states). Therefore, wait for things to settle down before proceeding with nodeDeviceCreate(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/node_device/node_device_driver.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 5f8995172d..7aee8201e8 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -340,6 +340,9 @@ nodeDeviceGetXMLDesc(virNodeDevicePtr device, virCheckFlags(0, NULL); + if (nodeDeviceInitWait() < 0) + return NULL; + if (!(obj = nodeDeviceObjFindByName(device->name))) return NULL; def = virNodeDeviceObjGetDef(obj); @@ -368,6 +371,9 @@ nodeDeviceGetParent(virNodeDevicePtr device) virNodeDeviceDef *def; char *ret = NULL; + if (nodeDeviceInitWait() < 0) + return NULL; + if (!(obj = nodeDeviceObjFindByName(device->name))) return NULL; def = virNodeDeviceObjGetDef(obj); @@ -395,6 +401,9 @@ nodeDeviceNumOfCaps(virNodeDevicePtr device) virNodeDeviceDef *def; int ret = -1; + if (nodeDeviceInitWait() < 0) + return -1; + if (!(obj = nodeDeviceObjFindByName(device->name))) return -1; def = virNodeDeviceObjGetDef(obj); @@ -423,6 +432,9 @@ nodeDeviceListCaps(virNodeDevicePtr device, int ret = -1; size_t i = 0; + if (nodeDeviceInitWait() < 0) + return -1; + if (!(obj = nodeDeviceObjFindByName(device->name))) return -1; def = virNodeDeviceObjGetDef(obj); @@ -1399,6 +1411,9 @@ nodeDeviceCreate(virNodeDevice *device, virCheckFlags(0, -1); + if (nodeDeviceInitWait() < 0) + return -1; + if (!(obj = nodeDeviceObjFindByName(device->name))) return -1; -- 2.26.3

On Tue, Apr 13, 2021 at 05:57:13PM +0200, Michal Privoznik wrote:
Although I have not experienced this in real life, there is a possible race condition when creating new device, getting its XML or parent or listing its capabilities. If the nodedev driver is still enumerating devices (in a separate thread) and one of virNodeDeviceGetXMLDesc(), virNodeDeviceGetParent(), virNodeDeviceNumOfCaps(), virNodeDeviceListCaps() or virNodeDeviceCreate() is called then it can lead to spurious results because the device enumeration thread is removing devices from or adding them to the internal list of devices (among with their states).
Therefore, wait for things to settle down before proceeding with nodeDeviceCreate().
s/nodeDeviceCreate/any of the APIs. Reviewed-by: Erik Skultety <eskultet@redhat.com>

To speed up nodedev driver initialization, the device enumeration is done in a separate thread. Once finished, the thread sets a boolean variable that allows public APIs to be called (instead of waiting for the thread to finish). However, if there's an error in the device enumeration thread then the control jumps over at the 'error' label and the boolean is never set. This means, that any virNodeDev*() API is stuck forever. Mark the initialization as complete (the thread is quitting anyway) and let the APIs proceed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/node_device/node_device_udev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 84785d9e1e..658170a767 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1982,6 +1982,7 @@ nodeStateInitializeEnumerate(void *opaque) if (nodeDeviceUpdateMediatedDevices() != 0) goto error; + cleanup: nodeDeviceLock(); driver->initialized = true; virCondBroadcast(&driver->initCond); @@ -1996,6 +1997,8 @@ nodeStateInitializeEnumerate(void *opaque) priv->threadQuit = true; virCondSignal(&priv->threadCond); virObjectUnlock(priv); + + goto cleanup; } -- 2.26.3

On Tue, Apr 13, 2021 at 05:57:14PM +0200, Michal Privoznik wrote:
To speed up nodedev driver initialization, the device enumeration is done in a separate thread. Once finished, the thread sets a boolean variable that allows public APIs to be called (instead of waiting for the thread to finish).
However, if there's an error in the device enumeration thread then the control jumps over at the 'error' label and the boolean is never set. This means, that any virNodeDev*() API is stuck forever. Mark the initialization as complete (the thread is quitting anyway) and let the APIs proceed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- Reviewed-by: Erik Skultety <eskultet@redhat.com>

After all devices were enumerated, the enumeration thread call nodeDeviceUpdateMediatedDevices() to refresh the state of mediated devices. This means that 'mdevctl' will be executed. But it may be missing on some systems (e.g. mine) in which case we should just skip the update of mdevs instead of failing whole device enumeration. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/node_device/node_device_driver.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 7aee8201e8..180d9da529 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -42,9 +42,12 @@ #include "virnetdev.h" #include "virutil.h" #include "vircommand.h" +#include "virlog.h" #define VIR_FROM_THIS VIR_FROM_NODEDEV +VIR_LOG_INIT("node_device.node_device_driver"); + virNodeDeviceDriverState *driver; virDrvOpenStatus @@ -1604,9 +1607,15 @@ nodeDeviceUpdateMediatedDevices(void) { g_autofree virNodeDeviceDef **defs = NULL; g_autofree char *errmsg = NULL; + g_autofree char *mdevctl = NULL; virMdevctlForEachData data = { 0, }; size_t i; + if (!(mdevctl = virFindFileInPath(MDEVCTL))) { + VIR_DEBUG(MDEVCTL " not found. Skipping update of mediated devices."); + return 0; + } + if ((data.ndefs = virMdevctlListDefined(&defs, &errmsg)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("failed to query mdevs from mdevctl: %s"), errmsg); -- 2.26.3

On Tue, Apr 13, 2021 at 05:57:15PM +0200, Michal Privoznik wrote:
After all devices were enumerated, the enumeration thread call nodeDeviceUpdateMediatedDevices() to refresh the state of mediated devices. This means that 'mdevctl' will be executed. But it may be missing on some systems (e.g. mine) in which case we should just skip the update of mdevs instead of failing whole device enumeration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- Reviewed-by: Erik Skultety <eskultet@redhat.com>
participants (2)
-
Erik Skultety
-
Michal Privoznik