[libvirt] [PATCH v5 0/6] Work around the kernel mdev uevent race in nodedev

v4: https://www.redhat.com/archives/libvir-list/2017-September/msg00532.html Since v4: - fixed the reviewer's notes in 1/7 - dropped the original 2/7 because of converting the udev private data into a lockable object - current 2/6 is a new patch reworking our current locking approach of the udev private data where we only relied on the driver global lock, after this patch, private data are going to be a self-lockable object - dropped the event counting from 4/7, the thread now reads everything it can until it gets EAGAIN/EWOULDBLOCK from libudev, which renders enablement/disablement of event loop fd polling unnecessary -> other minor adjustments raised during review -> dropped our udev fd sanity check from the worker thread, since the thread will encounter a libudev error sooner or later, so the sanity check inside the worker thread wouldn't really make a difference Erik Skultety (6): nodedev: Introduce udevEventMonitorSanityCheck helper function (ACKed) nodedev: udev: Convert udev private data to a lockable object (NEW) udev: Split udevEventHandleCallback in two functions (2/6 necessary changes) udev: Convert udevEventHandleThread to an actual thread routine (reworked) util: Introduce virFileWaitForExists (unchanged && ACKed) nodedev: Work around the uevent race by hooking up virFileWaitForAccess (unchanged && ACKed) src/libvirt_private.syms | 1 + src/node_device/node_device_udev.c | 281 ++++++++++++++++++++++++++++--------- src/node_device/node_device_udev.h | 3 - src/util/virfile.c | 31 ++++ src/util/virfile.h | 2 + 5 files changed, 249 insertions(+), 69 deletions(-) -- 2.13.6

We need to perform a sanity check on the udev monitor before every use so that we know nothing has changed in the meantime. The reason for moving the code to a separate helper is to enhance readability and shift the focus on the important stuff within the udevEventHandleCallback handler. Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 43 ++++++++++++++++++++++++-------------- 1 file changed, 27 insertions(+), 16 deletions(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index f4177455c..a9a4c9b6b 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1615,24 +1615,20 @@ udevHandleOneDevice(struct udev_device *device) } -static void -udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, - int fd, - int events ATTRIBUTE_UNUSED, - void *data ATTRIBUTE_UNUSED) +static bool +udevEventMonitorSanityCheck(struct udev_monitor *udev_monitor, + int fd) { - struct udev_device *device = NULL; - struct udev_monitor *udev_monitor = DRV_STATE_UDEV_MONITOR(driver); - int udev_fd = -1; + int rc = -1; - udev_fd = udev_monitor_get_fd(udev_monitor); - if (fd != udev_fd) { + rc = udev_monitor_get_fd(udev_monitor); + if (fd != rc) { udevPrivate *priv = driver->privateData; virReportError(VIR_ERR_INTERNAL_ERROR, _("File descriptor returned by udev %d does not " "match node device file descriptor %d"), - fd, udev_fd); + fd, rc); /* this is a non-recoverable error, let's remove the handle, so that we * don't get in here again because of some spurious behaviour and report @@ -1641,21 +1637,36 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, virEventRemoveHandle(priv->watch); priv->watch = -1; - goto cleanup; + return false; } + return true; +} + + +static void +udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, + int fd, + int events ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) +{ + struct udev_device *device = NULL; + struct udev_monitor *udev_monitor = NULL; + + udev_monitor = DRV_STATE_UDEV_MONITOR(driver); + + if (!udevEventMonitorSanityCheck(udev_monitor, fd)) + return; + device = udev_monitor_receive_device(udev_monitor); if (device == NULL) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("udev_monitor_receive_device returned NULL")); - goto cleanup; + return; } udevHandleOneDevice(device); - - cleanup: udev_device_unref(device); - return; } -- 2.13.6

On 10/11/2017 10:52 AM, Erik Skultety wrote:
We need to perform a sanity check on the udev monitor before every use so that we know nothing has changed in the meantime. The reason for moving the code to a separate helper is to enhance readability and shift the focus on the important stuff within the udevEventHandleCallback handler.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 43 ++++++++++++++++++++++++-------------- 1 file changed, 27 insertions(+), 16 deletions(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John

Since there's going to be a worker thread which needs to have some data protected by a lock, the whole code would just simply get unnecessary complex, since two sets of locks would be necessary, driver lock (for udev monitor and event handle) and a mutex protecting thread-local data. Given the future thread will need to access the udev monitor socket as well, why not protect everything with a single lock, even better, by converting the driver's private data to a lockable object, we get the automatic object disposal feature for free. Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 140 ++++++++++++++++++++++++------------- src/node_device/node_device_udev.h | 3 - 2 files changed, 93 insertions(+), 50 deletions(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index a9a4c9b6b..bb9787fdb 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -53,12 +53,78 @@ VIR_LOG_INIT("node_device.node_device_udev"); # define TYPE_RAID 12 #endif -struct _udevPrivate { +typedef struct _udevEventData udevEventData; +typedef udevEventData *udevEventDataPtr; + +struct _udevEventData { + virObjectLockable parent; + struct udev_monitor *udev_monitor; int watch; bool privileged; }; +static virClassPtr udevEventDataClass; + +static void +udevEventDataDispose(void *obj) +{ + struct udev *udev = NULL; + udevEventDataPtr data = obj; + + if (data->watch != -1) + virEventRemoveHandle(data->watch); + + if (data->udev_monitor) + udev = udev_monitor_get_udev(data->udev_monitor); + + udev_unref(udev); + udev_monitor_unref(data->udev_monitor); +} + + +static int +udevEventDataOnceInit(void) +{ + if (!(udevEventDataClass = virClassNew(virClassForObjectLockable(), + "udevEventData", + sizeof(udevEventData), + udevEventDataDispose))) + return -1; + + return 0; +} + +VIR_ONCE_GLOBAL_INIT(udevEventData) + +static udevEventDataPtr +udevEventDataNew(void) +{ + udevEventDataPtr ret = NULL; + + if (udevEventDataInitialize() < 0) + return NULL; + + if (!(ret = virObjectLockableNew(udevEventDataClass))) + return NULL; + + ret->watch = -1; + return ret; +} + + +static bool +udevEventDataIsPrivileged(udevEventDataPtr data) +{ + bool privileged; + + virObjectLock(data); + privileged = data->privileged; + virObjectUnlock(data); + + return privileged; +} + static bool udevHasDeviceProperty(struct udev_device *dev, @@ -447,7 +513,7 @@ udevProcessPCI(struct udev_device *device, virNodeDevCapPCIDevPtr pci_dev = &def->caps->data.pci_dev; virPCIEDeviceInfoPtr pci_express = NULL; virPCIDevicePtr pciDev = NULL; - udevPrivate *priv = driver->privateData; + udevEventDataPtr priv = driver->privateData; int ret = -1; char *p; @@ -498,7 +564,7 @@ udevProcessPCI(struct udev_device *device, goto cleanup; /* We need to be root to read PCI device configs */ - if (priv->privileged) { + if (udevEventDataIsPrivileged(priv)) { if (virPCIGetHeaderType(pciDev, &pci_dev->hdrType) < 0) goto cleanup; @@ -1559,39 +1625,18 @@ udevPCITranslateDeinit(void) static int nodeStateCleanup(void) { - udevPrivate *priv = NULL; - struct udev_monitor *udev_monitor = NULL; - struct udev *udev = NULL; - if (!driver) return -1; nodeDeviceLock(); + virObjectUnref(driver->privateData); virObjectUnref(driver->nodeDeviceEventState); - priv = driver->privateData; - - if (priv) { - if (priv->watch != -1) - virEventRemoveHandle(priv->watch); - - udev_monitor = DRV_STATE_UDEV_MONITOR(driver); - - if (udev_monitor != NULL) { - udev = udev_monitor_get_udev(udev_monitor); - udev_monitor_unref(udev_monitor); - } - } - - if (udev != NULL) - udev_unref(udev); - virNodeDeviceObjListFree(driver->devs); nodeDeviceUnlock(); virMutexDestroy(&driver->lock); VIR_FREE(driver); - VIR_FREE(priv); udevPCITranslateDeinit(); return 0; @@ -1615,16 +1660,17 @@ udevHandleOneDevice(struct udev_device *device) } +/* the caller must be holding the udevEventData object lock prior to calling + * this function + */ static bool -udevEventMonitorSanityCheck(struct udev_monitor *udev_monitor, +udevEventMonitorSanityCheck(udevEventDataPtr priv, int fd) { int rc = -1; - rc = udev_monitor_get_fd(udev_monitor); + rc = udev_monitor_get_fd(priv->udev_monitor); if (fd != rc) { - udevPrivate *priv = driver->privateData; - virReportError(VIR_ERR_INTERNAL_ERROR, _("File descriptor returned by udev %d does not " "match node device file descriptor %d"), @@ -1650,15 +1696,19 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, int events ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { + udevEventDataPtr priv = driver->privateData; struct udev_device *device = NULL; - struct udev_monitor *udev_monitor = NULL; - udev_monitor = DRV_STATE_UDEV_MONITOR(driver); + virObjectLock(priv); - if (!udevEventMonitorSanityCheck(udev_monitor, fd)) + if (!udevEventMonitorSanityCheck(priv, fd)) { + virObjectUnlock(priv); return; + } + + device = udev_monitor_receive_device(priv->udev_monitor); + virObjectUnlock(priv); - device = udev_monitor_receive_device(udev_monitor); if (device == NULL) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("udev_monitor_receive_device returned NULL")); @@ -1675,12 +1725,13 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, static void udevGetDMIData(virNodeDevCapSystemPtr syscap) { + udevEventDataPtr priv = driver->privateData; struct udev *udev = NULL; struct udev_device *device = NULL; virNodeDevCapSystemHardwarePtr hardware = &syscap->hardware; virNodeDevCapSystemFirmwarePtr firmware = &syscap->firmware; - udev = udev_monitor_get_udev(DRV_STATE_UDEV_MONITOR(driver)); + udev = udev_monitor_get_udev(priv->udev_monitor); device = udev_device_new_from_syspath(udev, DMI_DEVPATH); if (device == NULL) { @@ -1791,15 +1842,9 @@ nodeStateInitialize(bool privileged, virStateInhibitCallback callback ATTRIBUTE_UNUSED, void *opaque ATTRIBUTE_UNUSED) { - udevPrivate *priv = NULL; + udevEventDataPtr priv = NULL; struct udev *udev = NULL; - if (VIR_ALLOC(priv) < 0) - return -1; - - priv->watch = -1; - priv->privileged = privileged; - if (VIR_ALLOC(driver) < 0) { VIR_FREE(priv); return -1; @@ -1813,12 +1858,13 @@ nodeStateInitialize(bool privileged, return -1; } + nodeDeviceLock(); + + if (!(driver->devs = virNodeDeviceObjListNew()) || + !(priv = udevEventDataNew())) + goto unlock; + driver->privateData = priv; - nodeDeviceLock(); - - if (!(driver->devs = virNodeDeviceObjListNew())) - goto unlock; - driver->nodeDeviceEventState = virObjectEventStateNew(); if (udevPCITranslateInit(privileged) < 0) @@ -1836,7 +1882,7 @@ nodeStateInitialize(bool privileged, #endif priv->udev_monitor = udev_monitor_new_from_netlink(udev, "udev"); - if (priv->udev_monitor == NULL) { + if (!priv->udev_monitor) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("udev_monitor_new_from_netlink returned NULL")); goto unlock; diff --git a/src/node_device/node_device_udev.h b/src/node_device/node_device_udev.h index 9a07ab77e..f15e5204c 100644 --- a/src/node_device/node_device_udev.h +++ b/src/node_device/node_device_udev.h @@ -23,9 +23,6 @@ #include <libudev.h> #include <stdint.h> -typedef struct _udevPrivate udevPrivate; - #define SYSFS_DATA_SIZE 4096 -#define DRV_STATE_UDEV_MONITOR(ds) (((udevPrivate *)((ds)->privateData))->udev_monitor) #define DMI_DEVPATH "/sys/devices/virtual/dmi/id" #define DMI_DEVPATH_FALLBACK "/sys/class/dmi/id" -- 2.13.6

On 10/11/2017 10:52 AM, Erik Skultety wrote:
Since there's going to be a worker thread which needs to have some data protected by a lock, the whole code would just simply get unnecessary complex, since two sets of locks would be necessary, driver lock (for udev monitor and event handle) and a mutex protecting thread-local data. Given the future thread will need to access the udev monitor socket as well, why not protect everything with a single lock, even better, by converting the driver's private data to a lockable object, we get the automatic object disposal feature for free.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 140 ++++++++++++++++++++++++------------- src/node_device/node_device_udev.h | 3 - 2 files changed, 93 insertions(+), 50 deletions(-)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index a9a4c9b6b..bb9787fdb 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -53,12 +53,78 @@ VIR_LOG_INIT("node_device.node_device_udev"); # define TYPE_RAID 12 #endif
-struct _udevPrivate { +typedef struct _udevEventData udevEventData; +typedef udevEventData *udevEventDataPtr; + +struct _udevEventData { + virObjectLockable parent; + struct udev_monitor *udev_monitor; int watch; bool privileged; };
Mental note - maybe the driver->privateData should change to driver->udevEventData in _virNodeDeviceDriverState You interchange using @priv and @data throughout which right now could make sense, but in the future may make less sense. I have a preference for consistency, so calling @data in some places and @priv in others leads to inconsistency. I believe pick one and be consistent.
+static virClassPtr udevEventDataClass; + +static void +udevEventDataDispose(void *obj) +{ + struct udev *udev = NULL; + udevEventDataPtr data = obj; + + if (data->watch != -1) + virEventRemoveHandle(data->watch); + + if (data->udev_monitor) + udev = udev_monitor_get_udev(data->udev_monitor); + + udev_unref(udev); + udev_monitor_unref(data->udev_monitor);
This is a different order than previous cleanup - perhaps should be: if (data->udev_monitor) { udev = udev_monitor_get_udev(data->udev_monitor); udev_monitor_unref(data->udev_monitor); udev_unref(udev); } or if (!data->udev_monitor) return; udev = udev_monitor_get_udev(data->udev_monitor); udev_monitor_unref(data->udev_monitor); udev_unref(udev); Just not clear what happens to udev if you unref udev before unref udev_monitor. Better safe than sorry and go in the opposite order of Initialization.
+} + + +static int +udevEventDataOnceInit(void) +{ + if (!(udevEventDataClass = virClassNew(virClassForObjectLockable(), + "udevEventData", + sizeof(udevEventData), + udevEventDataDispose))) + return -1; + + return 0; +} + +VIR_ONCE_GLOBAL_INIT(udevEventData) + +static udevEventDataPtr +udevEventDataNew(void) +{ + udevEventDataPtr ret = NULL; + + if (udevEventDataInitialize() < 0) + return NULL; + + if (!(ret = virObjectLockableNew(udevEventDataClass))) + return NULL; + + ret->watch = -1; + return ret; +} + + +static bool +udevEventDataIsPrivileged(udevEventDataPtr data) +{ + bool privileged;
Hmmm... is @data privileged or is the driver that needs the privilege check... IOW: should privileged by a DriverState value (even though it'd be unused in _hal). It comes in as a virDrvStateInitialize value. That way it's just a driver->privileged check regardless of whether there is an event thread running. I know - should have thought of this sooner, but sometimes things are more obvious at odd points in review time. Moving privileged would be a patch 1.5 and thus make this helper unnecessary. It'd be a simple move from private to NodeDeviceDriverState and then store/read from there instead of @priv. Then udevGetDMIData wouldn't need to get @priv.
+ + virObjectLock(data); + privileged = data->privileged; + virObjectUnlock(data); + + return privileged; +} +
static bool udevHasDeviceProperty(struct udev_device *dev, @@ -447,7 +513,7 @@ udevProcessPCI(struct udev_device *device, virNodeDevCapPCIDevPtr pci_dev = &def->caps->data.pci_dev; virPCIEDeviceInfoPtr pci_express = NULL; virPCIDevicePtr pciDev = NULL; - udevPrivate *priv = driver->privateData; + udevEventDataPtr priv = driver->privateData; int ret = -1; char *p;
@@ -498,7 +564,7 @@ udevProcessPCI(struct udev_device *device, goto cleanup;
/* We need to be root to read PCI device configs */ - if (priv->privileged) { + if (udevEventDataIsPrivileged(priv)) { if (virPCIGetHeaderType(pciDev, &pci_dev->hdrType) < 0) goto cleanup;
@@ -1559,39 +1625,18 @@ udevPCITranslateDeinit(void) static int nodeStateCleanup(void) { - udevPrivate *priv = NULL; - struct udev_monitor *udev_monitor = NULL; - struct udev *udev = NULL; - if (!driver) return -1;
nodeDeviceLock();
+ virObjectUnref(driver->privateData); virObjectUnref(driver->nodeDeviceEventState);
- priv = driver->privateData; - - if (priv) {
FWIW: It's this that got me to thinking privileged should never have been part of @priv...
- if (priv->watch != -1) - virEventRemoveHandle(priv->watch); - - udev_monitor = DRV_STATE_UDEV_MONITOR(driver); - - if (udev_monitor != NULL) { - udev = udev_monitor_get_udev(udev_monitor); - udev_monitor_unref(udev_monitor); - } - } - - if (udev != NULL) - udev_unref(udev); - virNodeDeviceObjListFree(driver->devs); nodeDeviceUnlock(); virMutexDestroy(&driver->lock); VIR_FREE(driver); - VIR_FREE(priv);
udevPCITranslateDeinit(); return 0; @@ -1615,16 +1660,17 @@ udevHandleOneDevice(struct udev_device *device) }
+/* the caller must be holding the udevEventData object lock prior to calling + * this function + */ static bool -udevEventMonitorSanityCheck(struct udev_monitor *udev_monitor, +udevEventMonitorSanityCheck(udevEventDataPtr priv, int fd) { int rc = -1;
- rc = udev_monitor_get_fd(udev_monitor); + rc = udev_monitor_get_fd(priv->udev_monitor); if (fd != rc) { - udevPrivate *priv = driver->privateData; - virReportError(VIR_ERR_INTERNAL_ERROR, _("File descriptor returned by udev %d does not " "match node device file descriptor %d"), @@ -1650,15 +1696,19 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, int events ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { + udevEventDataPtr priv = driver->privateData; struct udev_device *device = NULL; - struct udev_monitor *udev_monitor = NULL;
- udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
Should we add a "if (!priv) return" before this?
+ virObjectLock(priv);
- if (!udevEventMonitorSanityCheck(udev_monitor, fd)) + if (!udevEventMonitorSanityCheck(priv, fd)) { + virObjectUnlock(priv); return; + } + + device = udev_monitor_receive_device(priv->udev_monitor); + virObjectUnlock(priv);
- device = udev_monitor_receive_device(udev_monitor); if (device == NULL) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("udev_monitor_receive_device returned NULL")); @@ -1675,12 +1725,13 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, static void udevGetDMIData(virNodeDevCapSystemPtr syscap) { + udevEventDataPtr priv = driver->privateData; struct udev *udev = NULL; struct udev_device *device = NULL; virNodeDevCapSystemHardwarePtr hardware = &syscap->hardware; virNodeDevCapSystemFirmwarePtr firmware = &syscap->firmware;
- udev = udev_monitor_get_udev(DRV_STATE_UDEV_MONITOR(driver)); + udev = udev_monitor_get_udev(priv->udev_monitor);
There's no lock/unlock around priv-> access here (although there is earlier).
device = udev_device_new_from_syspath(udev, DMI_DEVPATH); if (device == NULL) { @@ -1791,15 +1842,9 @@ nodeStateInitialize(bool privileged, virStateInhibitCallback callback ATTRIBUTE_UNUSED, void *opaque ATTRIBUTE_UNUSED) { - udevPrivate *priv = NULL; + udevEventDataPtr priv = NULL; struct udev *udev = NULL;
- if (VIR_ALLOC(priv) < 0) - return -1; - - priv->watch = -1; - priv->privileged = privileged;
FWIW: You lost priv->privileged setting in this patch... So it'd always be false
- if (VIR_ALLOC(driver) < 0) { VIR_FREE(priv);
^^ won't be necessary and then of course the { } won't be either... Same a few lines lower where there's VIR_FREE(priv), but the { } would stay after a virMutexInit failure With a few tweaks and edits... Reviewed-by: John Ferlan <jferlan@redhat.com> John
return -1; @@ -1813,12 +1858,13 @@ nodeStateInitialize(bool privileged, return -1; }
+ nodeDeviceLock(); + + if (!(driver->devs = virNodeDeviceObjListNew()) || + !(priv = udevEventDataNew())) + goto unlock; + driver->privateData = priv; - nodeDeviceLock(); - - if (!(driver->devs = virNodeDeviceObjListNew())) - goto unlock; - driver->nodeDeviceEventState = virObjectEventStateNew();
if (udevPCITranslateInit(privileged) < 0) @@ -1836,7 +1882,7 @@ nodeStateInitialize(bool privileged, #endif
priv->udev_monitor = udev_monitor_new_from_netlink(udev, "udev"); - if (priv->udev_monitor == NULL) { + if (!priv->udev_monitor) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("udev_monitor_new_from_netlink returned NULL")); goto unlock; diff --git a/src/node_device/node_device_udev.h b/src/node_device/node_device_udev.h index 9a07ab77e..f15e5204c 100644 --- a/src/node_device/node_device_udev.h +++ b/src/node_device/node_device_udev.h @@ -23,9 +23,6 @@ #include <libudev.h> #include <stdint.h>
-typedef struct _udevPrivate udevPrivate; - #define SYSFS_DATA_SIZE 4096 -#define DRV_STATE_UDEV_MONITOR(ds) (((udevPrivate *)((ds)->privateData))->udev_monitor) #define DMI_DEVPATH "/sys/devices/virtual/dmi/id" #define DMI_DEVPATH_FALLBACK "/sys/class/dmi/id"

[...]
+struct _udevEventData { + virObjectLockable parent; + struct udev_monitor *udev_monitor; int watch; bool privileged; };
Mental note - maybe the driver->privateData should change to driver->udevEventData in _virNodeDeviceDriverState
I like the notion of generic private data per backend, even though one of them doesn't use it all. But in principle, I agree with you, I'd just wait 'till we decide that we can safely deprecate hal backend in upstream completely.
You interchange using @priv and @data throughout which right now could make sense, but in the future may make less sense. I have a preference for consistency, so calling @data in some places and @priv in others leads to inconsistency. I believe pick one and be consistent.
Good point, noted.
+static virClassPtr udevEventDataClass; + +static void +udevEventDataDispose(void *obj) +{ + struct udev *udev = NULL; + udevEventDataPtr data = obj; + + if (data->watch != -1) + virEventRemoveHandle(data->watch); + + if (data->udev_monitor) + udev = udev_monitor_get_udev(data->udev_monitor); + + udev_unref(udev); + udev_monitor_unref(data->udev_monitor);
This is a different order than previous cleanup - perhaps should be:
if (data->udev_monitor) { udev = udev_monitor_get_udev(data->udev_monitor); udev_monitor_unref(data->udev_monitor); udev_unref(udev); }
libudev handles NULLs gracefully...
or
if (!data->udev_monitor) return;
udev = udev_monitor_get_udev(data->udev_monitor); udev_monitor_unref(data->udev_monitor); udev_unref(udev);
I'll go with this version then, I did it this way, because udev keeps a pointer to the context withing the monitor, but doesn't touch it when freeing, leaving the responsibility to the caller. But I agree that releasing the monitor before releasing the context makes more sense.
Just not clear what happens to udev if you unref udev before unref udev_monitor. Better safe than sorry and go in the opposite order of Initialization.
+} + + +static int +udevEventDataOnceInit(void) +{ + if (!(udevEventDataClass = virClassNew(virClassForObjectLockable(), + "udevEventData", + sizeof(udevEventData), + udevEventDataDispose))) + return -1; + + return 0; +} + +VIR_ONCE_GLOBAL_INIT(udevEventData) + +static udevEventDataPtr +udevEventDataNew(void) +{ + udevEventDataPtr ret = NULL; + + if (udevEventDataInitialize() < 0) + return NULL; + + if (!(ret = virObjectLockableNew(udevEventDataClass))) + return NULL; + + ret->watch = -1; + return ret; +} + + +static bool +udevEventDataIsPrivileged(udevEventDataPtr data) +{ + bool privileged;
Hmmm... is @data privileged or is the driver that needs the privilege check... IOW: should privileged by a DriverState value (even though it'd be unused in _hal). It comes in as a virDrvStateInitialize value. That way it's just a driver->privileged check regardless of whether there is an event thread running.
Good observation, a patch will be added.
I know - should have thought of this sooner, but sometimes things are more obvious at odd points in review time. Moving privileged would be a patch 1.5 and thus make this helper unnecessary. It'd be a simple move from private to NodeDeviceDriverState and then store/read from there instead of @priv. Then udevGetDMIData wouldn't need to get @priv.
No problem. [..]
{ + udevEventDataPtr priv = driver->privateData; struct udev_device *device = NULL; - struct udev_monitor *udev_monitor = NULL;
- udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
Should we add a "if (!priv) return" before this?
This would only be necessary if that was a user's input, but it's us who's completely responsible for that and @data are only touched by stateCleanup, which runs after the eventloop ends, so this check would be essentially useless.
+ virObjectLock(priv);
- if (!udevEventMonitorSanityCheck(udev_monitor, fd)) + if (!udevEventMonitorSanityCheck(priv, fd)) { + virObjectUnlock(priv); return; + } + + device = udev_monitor_receive_device(priv->udev_monitor); + virObjectUnlock(priv);
- device = udev_monitor_receive_device(udev_monitor); if (device == NULL) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("udev_monitor_receive_device returned NULL")); @@ -1675,12 +1725,13 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, static void udevGetDMIData(virNodeDevCapSystemPtr syscap) { + udevEventDataPtr priv = driver->privateData; struct udev *udev = NULL; struct udev_device *device = NULL; virNodeDevCapSystemHardwarePtr hardware = &syscap->hardware; virNodeDevCapSystemFirmwarePtr firmware = &syscap->firmware;
- udev = udev_monitor_get_udev(DRV_STATE_UDEV_MONITOR(driver)); + udev = udev_monitor_get_udev(priv->udev_monitor);
There's no lock/unlock around priv-> access here (although there is earlier).
this is called from within Initialize which holds the lock almost the whole time, so this would cause a deadlock (I only realized this when I tried it). So because what you say makes more sense, I'll add a separate patch that will move the object lock in Initialize before trying to set up the system node (there's no need to hold it the whole time) and then we can go with this suggestion.
device = udev_device_new_from_syspath(udev, DMI_DEVPATH); if (device == NULL) { @@ -1791,15 +1842,9 @@ nodeStateInitialize(bool privileged, virStateInhibitCallback callback ATTRIBUTE_UNUSED, void *opaque ATTRIBUTE_UNUSED) { - udevPrivate *priv = NULL; + udevEventDataPtr priv = NULL; struct udev *udev = NULL;
- if (VIR_ALLOC(priv) < 0) - return -1; - - priv->watch = -1; - priv->privileged = privileged;
FWIW: You lost priv->privileged setting in this patch... So it'd always be false
Great eyes ;).
- if (VIR_ALLOC(driver) < 0) { VIR_FREE(priv);
^^ won't be necessary and then of course the { } won't be either...
Same a few lines lower where there's VIR_FREE(priv), but the { } would stay after a virMutexInit failure
It's always mistakes like these that are so obvious yet they can be only spotted during a review :), thanks. Also, I appreciate the RBs but since I'll be adding 3 more patches (even though trivial ones) I'd still appreciate one more (final) review. Erik

This patch splits udevEventHandleCallback in two (introduces udevEventHandleThread) in order to be later able to refactor the latter to actually become a normal thread which will wait some time for the kernel to create the whole sysfs tree for a device as we cannot do that in the event loop directly. Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 41 +++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index bb9787fdb..f7646cd8a 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1691,32 +1691,49 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv, static void +udevEventHandleThread(void *opaque) +{ + udevEventDataPtr priv = driver->privateData; + int fd = (intptr_t) opaque; + struct udev_device *device = NULL; + + virObjectLock(priv); + + if (!udevEventMonitorSanityCheck(priv, fd)) { + virObjectUnlock(priv); + return; + } + + device = udev_monitor_receive_device(priv->udev_monitor); + virObjectUnlock(priv); + + if (device == NULL) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("udev_monitor_receive_device returned NULL")); + return; + } + + udevHandleOneDevice(device); + udev_device_unref(device); +} + + +static void udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, int fd, int events ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { udevEventDataPtr priv = driver->privateData; - struct udev_device *device = NULL; virObjectLock(priv); - if (!udevEventMonitorSanityCheck(priv, fd)) { virObjectUnlock(priv); return; } - - device = udev_monitor_receive_device(priv->udev_monitor); virObjectUnlock(priv); - if (device == NULL) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("udev_monitor_receive_device returned NULL")); - return; - } - - udevHandleOneDevice(device); - udev_device_unref(device); + udevEventHandleThread((void *)(intptr_t) fd); } -- 2.13.6

On 10/11/2017 10:52 AM, Erik Skultety wrote:
This patch splits udevEventHandleCallback in two (introduces udevEventHandleThread) in order to be later able to refactor the latter to actually become a normal thread which will wait some time for the kernel to create the whole sysfs tree for a device as we cannot do that in the event loop directly.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 41 +++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 12 deletions(-)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index bb9787fdb..f7646cd8a 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1691,32 +1691,49 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
static void +udevEventHandleThread(void *opaque) +{ + udevEventDataPtr priv = driver->privateData; + int fd = (intptr_t) opaque; + struct udev_device *device = NULL; +
To go along with my thought from patch 2, should there be a "if (!priv)" check here too? I believe it doesn't matter yet because there are driver level locks that would seemingly prevent the thread code from running if Cleanup occurs, but soon enough that won't be the case. Reviewed-by: John Ferlan <jferlan@redhat.com> John
+ virObjectLock(priv); + + if (!udevEventMonitorSanityCheck(priv, fd)) { + virObjectUnlock(priv); + return; + } + + device = udev_monitor_receive_device(priv->udev_monitor); + virObjectUnlock(priv); + + if (device == NULL) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("udev_monitor_receive_device returned NULL")); + return; + } + + udevHandleOneDevice(device); + udev_device_unref(device); +} + + +static void udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, int fd, int events ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { udevEventDataPtr priv = driver->privateData; - struct udev_device *device = NULL;
virObjectLock(priv); - if (!udevEventMonitorSanityCheck(priv, fd)) { virObjectUnlock(priv); return; } - - device = udev_monitor_receive_device(priv->udev_monitor); virObjectUnlock(priv);
- if (device == NULL) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("udev_monitor_receive_device returned NULL")); - return; - } - - udevHandleOneDevice(device); - udev_device_unref(device); + udevEventHandleThread((void *)(intptr_t) fd); }

On Sun, Oct 15, 2017 at 10:23:52AM -0400, John Ferlan wrote:
On 10/11/2017 10:52 AM, Erik Skultety wrote:
This patch splits udevEventHandleCallback in two (introduces udevEventHandleThread) in order to be later able to refactor the latter to actually become a normal thread which will wait some time for the kernel to create the whole sysfs tree for a device as we cannot do that in the event loop directly.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 41 +++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 12 deletions(-)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index bb9787fdb..f7646cd8a 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1691,32 +1691,49 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
static void +udevEventHandleThread(void *opaque) +{ + udevEventDataPtr priv = driver->privateData; + int fd = (intptr_t) opaque; + struct udev_device *device = NULL; +
To go along with my thought from patch 2, should there be a "if (!priv)" check here too? I believe it doesn't matter yet because there are driver level locks that would seemingly prevent the thread code from running if Cleanup occurs, but soon enough that won't be the case.
It's called from within Initialize which is the main thread, if !priv, Initialize would have failed by this point Erik

Adjust udevEventHandleThread to be a proper thread routine running in an infinite loop handling devices. The handler thread pulls all available data from the udev monitor and only then waits until a wakeup signal for new incoming data has been emitted by udevEventHandleCallback. Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 125 +++++++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 32 deletions(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index f7646cd8a..a6d7e6d70 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -62,6 +62,12 @@ struct _udevEventData { struct udev_monitor *udev_monitor; int watch; bool privileged; + + /* Thread data */ + virThread th; + virCond threadCond; + bool threadQuit; + bool dataReady; }; static virClassPtr udevEventDataClass; @@ -80,6 +86,8 @@ udevEventDataDispose(void *obj) udev_unref(udev); udev_monitor_unref(data->udev_monitor); + + virCondDestroy(&data->threadCond); } @@ -108,9 +116,15 @@ udevEventDataNew(void) if (!(ret = virObjectLockableNew(udevEventDataClass))) return NULL; + if (virCondInit(&ret->threadCond) < 0) { + virObjectUnref(ret); + return NULL; + } + ret->watch = -1; return ret; -} +}; + static bool @@ -1625,15 +1639,25 @@ udevPCITranslateDeinit(void) static int nodeStateCleanup(void) { + udevEventDataPtr priv = NULL; + if (!driver) return -1; + priv = driver->privateData; + nodeDeviceLock(); - virObjectUnref(driver->privateData); + virObjectLock(priv); + priv->threadQuit = true; + virCondSignal(&priv->threadCond); + virObjectUnlock(priv); + virThreadJoin(&priv->th); + + virObjectUnref(priv); virObjectUnref(driver->nodeDeviceEventState); - virNodeDeviceObjListFree(driver->devs); + nodeDeviceUnlock(); virMutexDestroy(&driver->lock); VIR_FREE(driver); @@ -1691,30 +1715,60 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv, static void -udevEventHandleThread(void *opaque) +udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED) { udevEventDataPtr priv = driver->privateData; - int fd = (intptr_t) opaque; struct udev_device *device = NULL; - virObjectLock(priv); + /* continue rather than break from the loop on non-fatal errors */ + while (1) { + virObjectLock(priv); + while (!priv->dataReady && !priv->threadQuit) { + if (virCondWait(&priv->threadCond, &priv->parent.lock)) { + virReportSystemError(errno, "%s", + _("handler failed to wait on condition")); + virObjectUnlock(priv); + return; + } + } - if (!udevEventMonitorSanityCheck(priv, fd)) { + if (priv->threadQuit) { + virObjectUnlock(priv); + return; + } + + errno = 0; + device = udev_monitor_receive_device(priv->udev_monitor); virObjectUnlock(priv); - return; - } - device = udev_monitor_receive_device(priv->udev_monitor); - virObjectUnlock(priv); + if (!device) { + if (errno == 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("udev_monitor_receive_device failed")); + return; + } - if (device == NULL) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("udev_monitor_receive_device returned NULL")); - return; - } + /* POSIX allows both EAGAIN and EWOULDBLOCK to be used + * interchangeably when the read would block or timeout was fired + */ + VIR_WARNINGS_NO_WLOGICALOP_EQUAL_EXPR + if (errno != EAGAIN && errno != EWOULDBLOCK) { + VIR_WARNINGS_RESET + virReportSystemError(errno, "%s", + _("udev_monitor_receive_device failed")); + return; + } + + virObjectLock(priv); + priv->dataReady = false; + virObjectUnlock(priv); - udevHandleOneDevice(device); - udev_device_unref(device); + continue; + } + + udevHandleOneDevice(device); + udev_device_unref(device); + } } @@ -1722,18 +1776,19 @@ static void udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, int fd, int events ATTRIBUTE_UNUSED, - void *data ATTRIBUTE_UNUSED) + void *opaque ATTRIBUTE_UNUSED) { udevEventDataPtr priv = driver->privateData; virObjectLock(priv); - if (!udevEventMonitorSanityCheck(priv, fd)) { - virObjectUnlock(priv); - return; - } + + if (!udevEventMonitorSanityCheck(priv, fd)) + priv->threadQuit = true; + else + priv->dataReady = true; + + virCondSignal(&priv->threadCond); virObjectUnlock(priv); - - udevEventHandleThread((void *)(intptr_t) fd); } @@ -1875,29 +1930,29 @@ nodeStateInitialize(bool privileged, return -1; } - nodeDeviceLock(); - if (!(driver->devs = virNodeDeviceObjListNew()) || !(priv = udevEventDataNew())) - goto unlock; + goto cleanup; driver->privateData = priv; driver->nodeDeviceEventState = virObjectEventStateNew(); if (udevPCITranslateInit(privileged) < 0) - goto unlock; + goto cleanup; udev = udev_new(); if (!udev) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("failed to create udev context")); - goto unlock; + goto cleanup; } #if HAVE_UDEV_LOGGING /* cast to get rid of missing-format-attribute warning */ udev_set_log_fn(udev, (udevLogFunctionPtr) udevLogFunction); #endif + virObjectLock(priv); + priv->udev_monitor = udev_monitor_new_from_netlink(udev, "udev"); if (!priv->udev_monitor) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -1916,6 +1971,12 @@ nodeStateInitialize(bool privileged, 128 * 1024 * 1024); #endif + if (virThreadCreate(&priv->th, true, udevEventHandleThread, NULL) < 0) { + virReportSystemError(errno, "%s", + _("failed to create udev handler thread")); + goto unlock; + } + /* We register the monitor with the event callback so we are * notified by udev of device changes before we enumerate existing * devices because libvirt will simply recreate the device if we @@ -1934,7 +1995,7 @@ nodeStateInitialize(bool privileged, if (udevSetupSystemDev() != 0) goto unlock; - nodeDeviceUnlock(); + virObjectUnlock(priv); /* Populate with known devices */ if (udevEnumerateDevices(udev) != 0) @@ -1947,7 +2008,7 @@ nodeStateInitialize(bool privileged, return -1; unlock: - nodeDeviceUnlock(); + virObjectUnlock(priv); goto cleanup; } -- 2.13.6

On 10/11/2017 10:52 AM, Erik Skultety wrote:
Adjust udevEventHandleThread to be a proper thread routine running in an infinite loop handling devices. The handler thread pulls all available data from the udev monitor and only then waits until a wakeup signal for new incoming data has been emitted by udevEventHandleCallback.
This doing a bit more by removing the driver locks from initialization too. Perhaps those locks should be kept on Initialization for now and then in a followup patch remove them since @privateData (or whatever name it becomes) is then totally self locking. I don't think we'd run into any deadlocks since Initialization and Cleanup would still be the only consumers.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 125 +++++++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 32 deletions(-)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index f7646cd8a..a6d7e6d70 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -62,6 +62,12 @@ struct _udevEventData { struct udev_monitor *udev_monitor; int watch; bool privileged; + + /* Thread data */ + virThread th; + virCond threadCond; + bool threadQuit; + bool dataReady; };
static virClassPtr udevEventDataClass; @@ -80,6 +86,8 @@ udevEventDataDispose(void *obj)
udev_unref(udev); udev_monitor_unref(data->udev_monitor); + + virCondDestroy(&data->threadCond); }
@@ -108,9 +116,15 @@ udevEventDataNew(void) if (!(ret = virObjectLockableNew(udevEventDataClass))) return NULL;
+ if (virCondInit(&ret->threadCond) < 0) { + virObjectUnref(ret); + return NULL; + } + ret->watch = -1; return ret; -} +}; +
Stray keystrokes that should be removed - blame the gremlins.
static bool @@ -1625,15 +1639,25 @@ udevPCITranslateDeinit(void) static int nodeStateCleanup(void) { + udevEventDataPtr priv = NULL; + if (!driver) return -1;
+ priv = driver->privateData; +
Although perhaps impossible, better safe than sorry if (!priv) return; Do we even need nodeDeviceLock now? It was removed from Initialize shouldn't this be after nodeDeviceLock...
nodeDeviceLock();
- virObjectUnref(driver->privateData); + virObjectLock(priv); + priv->threadQuit = true; + virCondSignal(&priv->threadCond); + virObjectUnlock(priv); + virThreadJoin(&priv->th); + + virObjectUnref(priv); virObjectUnref(driver->nodeDeviceEventState); - virNodeDeviceObjListFree(driver->devs); + nodeDeviceUnlock(); virMutexDestroy(&driver->lock); VIR_FREE(driver); @@ -1691,30 +1715,60 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
static void -udevEventHandleThread(void *opaque) +udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED) { udevEventDataPtr priv = driver->privateData; - int fd = (intptr_t) opaque; struct udev_device *device = NULL;
- virObjectLock(priv); + /* continue rather than break from the loop on non-fatal errors */ + while (1) { + virObjectLock(priv); + while (!priv->dataReady && !priv->threadQuit) { + if (virCondWait(&priv->threadCond, &priv->parent.lock)) { + virReportSystemError(errno, "%s", + _("handler failed to wait on condition")); + virObjectUnlock(priv); + return; + } + }
- if (!udevEventMonitorSanityCheck(priv, fd)) { + if (priv->threadQuit) { + virObjectUnlock(priv); + return; + } + + errno = 0; + device = udev_monitor_receive_device(priv->udev_monitor); virObjectUnlock(priv); - return; - }
- device = udev_monitor_receive_device(priv->udev_monitor); - virObjectUnlock(priv); + if (!device) { + if (errno == 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("udev_monitor_receive_device failed")); + return; + }
- if (device == NULL) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("udev_monitor_receive_device returned NULL")) - return - } + /* POSIX allows both EAGAIN and EWOULDBLOCK to be used + * interchangeably when the read would block or timeout was fired + */ + VIR_WARNINGS_NO_WLOGICALOP_EQUAL_EXPR + if (errno != EAGAIN && errno != EWOULDBLOCK) { + VIR_WARNINGS_RESET + virReportSystemError(errno, "%s", + _("udev_monitor_receive_device failed"));
how about "unexpected error receiving udev device"
+ return; + } + + virObjectLock(priv); + priv->dataReady = false; + virObjectUnlock(priv);
- udevHandleOneDevice(device); - udev_device_unref(device); + continue; + } + + udevHandleOneDevice(device); + udev_device_unref(device); + } }
@@ -1722,18 +1776,19 @@ static void udevEventHandleCallback(int watch ATTRIBUTE_UNUSED, int fd, int events ATTRIBUTE_UNUSED, - void *data ATTRIBUTE_UNUSED) + void *opaque ATTRIBUTE_UNUSED) { udevEventDataPtr priv = driver->privateData;
virObjectLock(priv); - if (!udevEventMonitorSanityCheck(priv, fd)) { - virObjectUnlock(priv); - return; - } + + if (!udevEventMonitorSanityCheck(priv, fd)) + priv->threadQuit = true; + else + priv->dataReady = true; + + virCondSignal(&priv->threadCond); virObjectUnlock(priv); - - udevEventHandleThread((void *)(intptr_t) fd); }
@@ -1875,29 +1930,29 @@ nodeStateInitialize(bool privileged, return -1; }
- nodeDeviceLock(); -
And now the only time we ever use nodeDeviceLock/Unlock in node_device_udev is in Cleanup... Does something else prevent Cleanup from being called while perhaps Initialize is still running? Looking at libvirtd.c, virStateCleanup can only be called once driversInitialized is set. That's only set once virStateInitialize has been run successfully. Perhaps we should keep it just for this patch and remove in the "next" patch leaving the goto unlock intact, but modifying it to be : unlock: if (priv) virObjectUnlock(priv); nodeDeviceUnlock(); Then the next patch alters the gots's and unlock: label to what this patch has. It also removes nodeDeviceLock from Cleanup With a few tweaks: Reviewed-by: John Ferlan <jferlan@redhat.com> John
if (!(driver->devs = virNodeDeviceObjListNew()) || !(priv = udevEventDataNew())) - goto unlock; + goto cleanup;
driver->privateData = priv; driver->nodeDeviceEventState = virObjectEventStateNew();
if (udevPCITranslateInit(privileged) < 0) - goto unlock; + goto cleanup;
udev = udev_new(); if (!udev) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("failed to create udev context")); - goto unlock; + goto cleanup; } #if HAVE_UDEV_LOGGING /* cast to get rid of missing-format-attribute warning */ udev_set_log_fn(udev, (udevLogFunctionPtr) udevLogFunction); #endif
+ virObjectLock(priv); + priv->udev_monitor = udev_monitor_new_from_netlink(udev, "udev"); if (!priv->udev_monitor) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -1916,6 +1971,12 @@ nodeStateInitialize(bool privileged, 128 * 1024 * 1024); #endif
+ if (virThreadCreate(&priv->th, true, udevEventHandleThread, NULL) < 0) { + virReportSystemError(errno, "%s", + _("failed to create udev handler thread")); + goto unlock; + } + /* We register the monitor with the event callback so we are * notified by udev of device changes before we enumerate existing * devices because libvirt will simply recreate the device if we @@ -1934,7 +1995,7 @@ nodeStateInitialize(bool privileged, if (udevSetupSystemDev() != 0) goto unlock;
- nodeDeviceUnlock(); + virObjectUnlock(priv);
/* Populate with known devices */ if (udevEnumerateDevices(udev) != 0) @@ -1947,7 +2008,7 @@ nodeStateInitialize(bool privileged, return -1;
unlock: - nodeDeviceUnlock(); + virObjectUnlock(priv); goto cleanup; }

On Sun, Oct 15, 2017 at 10:23:56AM -0400, John Ferlan wrote:
On 10/11/2017 10:52 AM, Erik Skultety wrote:
Adjust udevEventHandleThread to be a proper thread routine running in an infinite loop handling devices. The handler thread pulls all available data from the udev monitor and only then waits until a wakeup signal for new incoming data has been emitted by udevEventHandleCallback.
This doing a bit more by removing the driver locks from initialization too. Perhaps those locks should be kept on Initialization for now and then in a followup patch remove them since @privateData (or whatever name it becomes) is then totally self locking.
I don't think we'd run into any deadlocks since Initialization and Cleanup would still be the only consumers.
Like I said, patch will be added. [...]
static bool @@ -1625,15 +1639,25 @@ udevPCITranslateDeinit(void) static int nodeStateCleanup(void) { + udevEventDataPtr priv = NULL; + if (!driver) return -1;
+ priv = driver->privateData; +
Although perhaps impossible, better safe than sorry
Here it's actually quite possible, when allocation of the private data in Initialize fails, cleanup is performed, so this would get a SIGSEGV, thanks. Erik
Perhaps we should keep it just for this patch and remove in the "next" patch leaving the goto unlock intact, but modifying it to be :
unlock: if (priv) virObjectUnlock(priv); nodeDeviceUnlock();
Then the next patch alters the gots's and unlock: label to what this patch has. It also removes nodeDeviceLock from Cleanup
Yeah, I'll handle the locking prior to adding this thread-related stuff.

Since we have a number of places where we workaround timing issues with devices, attributes (files in general) not being available at the time of processing them by calling usleep in a loop for a fixed number of tries, we could as well have a utility function that would do that. Therefore we won't have to duplicate this ugly workaround even more. Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 31 +++++++++++++++++++++++++++++++ src/util/virfile.h | 2 ++ 3 files changed, 34 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 26c5ddb40..d871c813a 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1753,6 +1753,7 @@ virFileStripSuffix; virFileTouch; virFileUnlock; virFileUpdatePerm; +virFileWaitForExists; virFileWrapperFdClose; virFileWrapperFdFree; virFileWrapperFdNew; diff --git a/src/util/virfile.c b/src/util/virfile.c index 7ca60052d..82cb36dbc 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -4177,3 +4177,34 @@ virFileReadValueString(char **value, const char *format, ...) VIR_FREE(str); return ret; } + + +/** + * virFileWaitForExists: + * @path: absolute path to a sysfs attribute (can be a symlink) + * @ms: how long to wait (in milliseconds) + * @tries: how many times should we try to wait for @path to become accessible + * + * Checks the existence of @path. In case the file defined by @path + * doesn't exist, we wait for it to appear in @ms milliseconds (for up to + * @tries attempts). + * + * Returns 0 on success, -1 on error, setting errno appropriately. + */ +int +virFileWaitForExists(const char *path, + size_t ms, + size_t tries) +{ + errno = 0; + + /* wait for @path to be accessible in @ms milliseconds, up to @tries */ + while (tries-- > 0 && !virFileExists(path)) { + if (tries == 0 || errno != ENOENT) + return -1; + + usleep(ms * 1000); + } + + return 0; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 21fb41b70..91d318622 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -349,6 +349,8 @@ int virFileReadValueScaledInt(unsigned long long *value, const char *format, ... int virFileReadValueString(char **value, const char *format, ...) ATTRIBUTE_FMT_PRINTF(2, 3); +int virFileWaitForExists(const char *path, size_t ms, size_t tries); + int virFileInData(int fd, int *inData, -- 2.13.6

On 10/11/2017 10:52 AM, Erik Skultety wrote:
Since we have a number of places where we workaround timing issues with devices, attributes (files in general) not being available at the time of processing them by calling usleep in a loop for a fixed number of tries, we could as well have a utility function that would do that. Therefore we won't have to duplicate this ugly workaround even more.
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 31 +++++++++++++++++++++++++++++++ src/util/virfile.h | 2 ++ 3 files changed, 34 insertions(+)
Reviewed-by: John Ferlan <jferlan@redhat.com> John

If we find ourselves in the situation that the 'add' uevent has been fired earlier than the sysfs tree for a device was created, we should use the best-effort approach and give kernel some predetermined amount of time, thus waiting for the attributes to be ready rather than discarding the device from our device list forever. If those don't appear in the given time frame, we need to move on, since libvirt can't wait indefinitely. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1463285 Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index a6d7e6d70..0329cac90 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1198,9 +1198,23 @@ udevProcessMediatedDevice(struct udev_device *dev, char *canonicalpath = NULL; virNodeDevCapMdevPtr data = &def->caps->data.mdev; - if (virAsprintf(&linkpath, "%s/mdev_type", udev_device_get_syspath(dev)) < 0) + /* Because of a kernel uevent race, we might get the 'add' event prior to + * the sysfs tree being ready, so any attempt to access any sysfs attribute + * would result in ENOENT and us dropping the device, so let's work around + * it by waiting for the attributes to become available. + */ + + if (virAsprintf(&linkpath, "%s/mdev_type", + udev_device_get_syspath(dev)) < 0) goto cleanup; + if (virFileWaitForExists(linkpath, 1, 100) < 0) { + virReportSystemError(errno, + _("failed to wait for file '%s' to appear"), + linkpath); + goto cleanup; + } + if (virFileResolveLink(linkpath, &canonicalpath) < 0) { virReportSystemError(errno, _("failed to resolve '%s'"), linkpath); goto cleanup; -- 2.13.6

On 10/11/2017 10:52 AM, Erik Skultety wrote:
If we find ourselves in the situation that the 'add' uevent has been fired earlier than the sysfs tree for a device was created, we should use the best-effort approach and give kernel some predetermined amount of time, thus waiting for the attributes to be ready rather than discarding the device from our device list forever. If those don't appear in the given time frame, we need to move on, since libvirt can't wait indefinitely.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1463285
Signed-off-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_udev.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John
participants (2)
-
Erik Skultety
-
John Ferlan