[PATCH 0/3] nodedev: fix and improve mdev nodedev API usage

These patches fix copying the configuration data when starting a persistent mdev and improve the reliability of configuration data being up to date when mdev configuration data is changed. Boris Fiuczynski (3): nodedev: fix mdev add udev event data handling nodedev: immediate update of active config on udev add nodedev: reset active config data on udev remove event src/node_device/node_device_driver.c | 22 ++++++++++++++++++++++ src/node_device/node_device_driver.h | 3 +++ src/node_device/node_device_udev.c | 9 ++++++++- 3 files changed, 33 insertions(+), 1 deletion(-) -- 2.42.0

Two situations will trigger an udev add event: 1) the mdev is created when started (transient) or 2) the mdev was defined and is started In case 1 there is no node object existing and no config data is copied. In case 2 copying the active config data of an existing node object will only copy invalid data. Instead copying the defined config data will store valid data into the newly added node object. Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> --- src/node_device/node_device_udev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index f1e402f8f7..4730a5b986 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1572,7 +1572,7 @@ udevAddOneDevice(struct udev_device *device) objdef = virNodeDeviceObjGetDef(obj); if (is_mdev) - nodeDeviceDefCopyFromMdevctl(def, objdef, false); + nodeDeviceDefCopyFromMdevctl(def, objdef, true); persistent = virNodeDeviceObjIsPersistent(obj); autostart = virNodeDeviceObjIsAutostart(obj); -- 2.42.0

On 3/20/24 10:46 AM, Boris Fiuczynski wrote:
Two situations will trigger an udev add event: 1) the mdev is created when started (transient) or 2) the mdev was defined and is started In case 1 there is no node object existing and no config data is copied. In case 2 copying the active config data of an existing node object will only copy invalid data. Instead copying the defined config data will store valid data into the newly added node object.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> --- src/node_device/node_device_udev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index f1e402f8f7..4730a5b986 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1572,7 +1572,7 @@ udevAddOneDevice(struct udev_device *device) objdef = virNodeDeviceObjGetDef(obj);
if (is_mdev) - nodeDeviceDefCopyFromMdevctl(def, objdef, false); + nodeDeviceDefCopyFromMdevctl(def, objdef, true);
persistent = virNodeDeviceObjIsPersistent(obj); autostart = virNodeDeviceObjIsAutostart(obj);
Reviewed-by: Jonathon Jongsma <jjongsma@redhat.com>

When an udev add event occurs the mdev active config data requires an update via mdevctl as the udev does not contain all config data. This update needs to occur immediate and to be finished before the libvirt CREATE event is issued to keep the API usage reliable. Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> --- src/node_device/node_device_udev.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 4730a5b986..0f335df950 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1606,6 +1606,12 @@ udevAddOneDevice(struct udev_device *device) if (has_mdev_types) scheduleMdevctlUpdate(driver->privateData, false); + /* The added mdev needs an immediate active config update before + * the event is issues to allow sane API usage. */ + if (is_mdev && (nodeDeviceUpdateMediatedDevices() < 0)) + VIR_WARN("Update of mediated device %s failed", + def ? NULLSTR(def->sysfs_path) : ""); + ret = 0; cleanup: -- 2.42.0

On 3/20/24 10:46 AM, Boris Fiuczynski wrote:
When an udev add event occurs the mdev active config data requires an update via mdevctl as the udev does not contain all config data. This update needs to occur immediate and to be finished before the libvirt CREATE event is issued to keep the API usage reliable.
The problem that you're trying to solve isn't spelled out very explicitly here. My understanding is that the issue occurs when you start a mediated device that has attributes (and the device is supported by callout scripts in mdevctl). With the current code, my assumption is that we observe the following behavior: - udev event is emitted - libvirt handles the udev event and emits a VIR_NODE_DEVICE_EVENT_CREATED event for the new mdev. At this point the mdev only information that is provided by udev (such as uuid, mdev type) and doesn't include any attributes, for example. - we schedule a thread to query mdevctl for the full information for the device. The mdev device gets update to include the full mdevctl data, including attributes - libvirt emits a VIR_NODE_DEVICE_EVENT_ID_UPDATE event to indicate that the device information was updated And you're trying to skip the last step and make sure that the attributes are set at the time that the first CREATED event is emitted. Correct? Or is the behavior you're seeing different than what I've described?
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> --- src/node_device/node_device_udev.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 4730a5b986..0f335df950 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1606,6 +1606,12 @@ udevAddOneDevice(struct udev_device *device) if (has_mdev_types) scheduleMdevctlUpdate(driver->privateData, false);
+ /* The added mdev needs an immediate active config update before + * the event is issues to allow sane API usage. */ + if (is_mdev && (nodeDeviceUpdateMediatedDevices() < 0)) + VIR_WARN("Update of mediated device %s failed", + def ? NULLSTR(def->sysfs_path) : ""); + ret = 0;
cleanup:
`` Right now libvirt has a dedicated thread for handling udev updates, and we also spawn a separate thread to query mdevctl whenever we detect that mdev state has changed. This patch makes it so that mdevctl is now executed directly from the udev thread, which changes that separation of responsibilities. It shouldn't cause any memory corruption since nodeDeviceUpdateMediatedDevices() is already threadsafe, but it makes the code harder to reason about. When we introduced mdevctl support to libvirt, the first versions of my patch series did in fact query mdevctl from the udev thread. But based on concerns during code review, it was moved out to a separate thread. So I don't think that we should necessarily reintroduce that here. In addition, I notice that udevHandleOneDevice() already schedules an mdevctl update thread to be run after this function (udevAddOneDevice()) is called for a mediated device. So after this patch, a new udev event for a mediated device would result in the udev thread querying mdevctl immediately and then also spawning a new thread to query mdevctl. Jonathon

On Wed, Mar 27, 2024 at 10:53 AM -0500, Jonathon Jongsma <jjongsma@redhat.com> wrote:
On 3/20/24 10:46 AM, Boris Fiuczynski wrote:
When an udev add event occurs the mdev active config data requires an update via mdevctl as the udev does not contain all config data. This update needs to occur immediate and to be finished before the libvirt CREATE event is issued to keep the API usage reliable.
The problem that you're trying to solve isn't spelled out very explicitly here. My understanding is that the issue occurs when you start a mediated device that has attributes (and the device is supported by callout scripts in mdevctl). With the current code, my assumption is that we observe the following behavior: - udev event is emitted - libvirt handles the udev event and emits a VIR_NODE_DEVICE_EVENT_CREATED event for the new mdev. At this point the mdev only information that is provided by udev (such as uuid, mdev type) and doesn't include any attributes, for example. - we schedule a thread to query mdevctl for the full information for the device. The mdev device gets update to include the full mdevctl data, including attributes - libvirt emits a VIR_NODE_DEVICE_EVENT_ID_UPDATE event to indicate that the device information was updated
And you're trying to skip the last step and make sure that the attributes are set at the time that the first CREATED event is emitted.
Boris is currently on vacation so I cannot discuss this with him, but that’s my understanding as well. For me, this assumption also makes sense… because otherwise it’s awkward to use the libvirt events API for mdev devices (wait for a CREATED event and then for an UPDATE event for the same device - not a really good UX pattern IMO). […snip…]
+ /* The added mdev needs an immediate active config update before + * the event is issues to allow sane API usage. */ + if (is_mdev && (nodeDeviceUpdateMediatedDevices() < 0)) + VIR_WARN("Update of mediated device %s failed", + def ? NULLSTR(def->sysfs_path) : ""); + ret = 0;
cleanup:
``
Right now libvirt has a dedicated thread for handling udev updates, and we also spawn a separate thread to query mdevctl whenever we detect that mdev state has changed. This patch makes it so that mdevctl is now executed directly from the udev thread, which changes that separation of responsibilities. It shouldn't cause any memory corruption since nodeDeviceUpdateMediatedDevices() is already threadsafe, but it makes the code harder to reason about. When we introduced mdevctl support to libvirt, the first versions of my patch series did in fact query mdevctl from the udev thread. But based on concerns during code review, it was moved out to a separate thread. So I don't think that we should necessarily reintroduce that here.
Yep. Right now we already have in node_device_udev: + init thread for initialization + udev thread, which creates and then enqueues the libvirt events + mdevctl update thread, which calls `nodeDeviceUpdateMediateDevices()‘ (executing 'mdevctl' and updating the internal data structures with the result) + main thread for the timeout mechanism to launch the mdevCtlUpdateThread + another thread (not sure if it’s the main thread): virNodeDeviceUpdate -> nodeDeviceUpdateMediateDevices() How about: + instead of having a mdevctl update thread and an init thread we could have only one worker pool (similar to qemu_process.c) + one udev thread that dispatches the events and then submits jobs to the worker pool + main thread for the timeout mechanism to schedule the “mdev update worker pool job” + another thread (not sure if it’s the main thread): virNodeDeviceUpdate: -> nodeDeviceUpdateMediateDevices() <- maybe we can leave it as is.
In addition, I notice that udevHandleOneDevice() already schedules an mdevctl update thread to be run after this function (udevAddOneDevice()) is called for a mediated device. So after this patch, a new udev event for a mediated device would result in the udev thread querying mdevctl immediately and then also spawning a new thread to query mdevctl.
This can probably be removed after the patch from Boris.
Jonathon
-- Kind regards / Beste Grüße Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On 3/27/24 1:14 PM, Marc Hartmayer wrote:
On Wed, Mar 27, 2024 at 10:53 AM -0500, Jonathon Jongsma <jjongsma@redhat.com> wrote:
On 3/20/24 10:46 AM, Boris Fiuczynski wrote:
When an udev add event occurs the mdev active config data requires an update via mdevctl as the udev does not contain all config data. This update needs to occur immediate and to be finished before the libvirt CREATE event is issued to keep the API usage reliable.
The problem that you're trying to solve isn't spelled out very explicitly here. My understanding is that the issue occurs when you start a mediated device that has attributes (and the device is supported by callout scripts in mdevctl). With the current code, my assumption is that we observe the following behavior: - udev event is emitted - libvirt handles the udev event and emits a VIR_NODE_DEVICE_EVENT_CREATED event for the new mdev. At this point the mdev only information that is provided by udev (such as uuid, mdev type) and doesn't include any attributes, for example. - we schedule a thread to query mdevctl for the full information for the device. The mdev device gets update to include the full mdevctl data, including attributes - libvirt emits a VIR_NODE_DEVICE_EVENT_ID_UPDATE event to indicate that the device information was updated
And you're trying to skip the last step and make sure that the attributes are set at the time that the first CREATED event is emitted.
Boris is currently on vacation so I cannot discuss this with him, but that’s my understanding as well. For me, this assumption also makes sense… because otherwise it’s awkward to use the libvirt events API for mdev devices (wait for a CREATED event and then for an UPDATE event for the same device - not a really good UX pattern IMO).
[…snip…]
+ /* The added mdev needs an immediate active config update before + * the event is issues to allow sane API usage. */ + if (is_mdev && (nodeDeviceUpdateMediatedDevices() < 0)) + VIR_WARN("Update of mediated device %s failed", + def ? NULLSTR(def->sysfs_path) : ""); + ret = 0;
cleanup:
``
Right now libvirt has a dedicated thread for handling udev updates, and we also spawn a separate thread to query mdevctl whenever we detect that mdev state has changed. This patch makes it so that mdevctl is now executed directly from the udev thread, which changes that separation of responsibilities. It shouldn't cause any memory corruption since nodeDeviceUpdateMediatedDevices() is already threadsafe, but it makes the code harder to reason about. When we introduced mdevctl support to libvirt, the first versions of my patch series did in fact query mdevctl from the udev thread. But based on concerns during code review, it was moved out to a separate thread. So I don't think that we should necessarily reintroduce that here.
Yep.
Right now we already have in node_device_udev: + init thread for initialization + udev thread, which creates and then enqueues the libvirt events + mdevctl update thread, which calls `nodeDeviceUpdateMediateDevices()‘ (executing 'mdevctl' and updating the internal data structures with the result) + main thread for the timeout mechanism to launch the mdevCtlUpdateThread + another thread (not sure if it’s the main thread): virNodeDeviceUpdate -> nodeDeviceUpdateMediateDevices()
How about: + instead of having a mdevctl update thread and an init thread we could have only one worker pool (similar to qemu_process.c) + one udev thread that dispatches the events and then submits jobs to the worker pool + main thread for the timeout mechanism to schedule the “mdev update worker pool job” + another thread (not sure if it’s the main thread): virNodeDeviceUpdate: -> nodeDeviceUpdateMediateDevices() <- maybe we can leave it as is.
This seems OK to me. Jonathon

When a mdev device is destroyed or stopped the udev remove event handling needs to reset the active config data of the node object representing a persisted mdev. Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> --- src/node_device/node_device_driver.c | 22 ++++++++++++++++++++++ src/node_device/node_device_driver.h | 3 +++ src/node_device/node_device_udev.c | 1 + 3 files changed, 26 insertions(+) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index d99b48138e..1d93106e29 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -2018,6 +2018,28 @@ nodeDeviceDefCopyFromMdevctl(virNodeDeviceDef *dst, } +/* A mediated device definition contains data from mdevctl about the active + * device. When the device is deactivated the active configuration data needs + * to be removed. */ +void +nodeDeviceDefResetMdevActiveConfig(virNodeDeviceDef *def) +{ + size_t i = 0; + virMediatedDeviceConfig *active_config; + + if (def->caps->data.type != VIR_NODE_DEV_CAP_MDEV) + return; + + active_config = &def->caps->data.mdev.active_config; + + g_clear_pointer(&active_config->type, g_free); + for (i = 0; i < active_config->nattributes; i++) + virMediatedDeviceAttrFree(active_config->attributes[i]); + g_clear_pointer(&active_config->attributes, g_free); + active_config->nattributes = 0; +} + + int nodeDeviceSetAutostart(virNodeDevice *device, int autostart) diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index b3bc4b2e96..f195cfef9d 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -197,3 +197,6 @@ int nodeDeviceUpdate(virNodeDevice *dev, const char *xmlDesc, unsigned int flags); + +void +nodeDeviceDefResetMdevActiveConfig(virNodeDeviceDef *def); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 0f335df950..93d0dcedbc 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1467,6 +1467,7 @@ udevRemoveOneDeviceSysPath(const char *path) if (virNodeDeviceObjIsPersistent(obj)) { VIR_FREE(def->sysfs_path); virNodeDeviceObjSetActive(obj, false); + nodeDeviceDefResetMdevActiveConfig(def); } else { VIR_DEBUG("Removing device '%s' with sysfs path '%s'", def->name, path); -- 2.42.0

On 3/20/24 10:46 AM, Boris Fiuczynski wrote:
When a mdev device is destroyed or stopped the udev remove event handling needs to reset the active config data of the node object representing a persisted mdev.
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com> --- src/node_device/node_device_driver.c | 22 ++++++++++++++++++++++ src/node_device/node_device_driver.h | 3 +++ src/node_device/node_device_udev.c | 1 + 3 files changed, 26 insertions(+)
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index d99b48138e..1d93106e29 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -2018,6 +2018,28 @@ nodeDeviceDefCopyFromMdevctl(virNodeDeviceDef *dst, }
+/* A mediated device definition contains data from mdevctl about the active + * device. When the device is deactivated the active configuration data needs + * to be removed. */ +void +nodeDeviceDefResetMdevActiveConfig(virNodeDeviceDef *def) +{ + size_t i = 0; + virMediatedDeviceConfig *active_config; + + if (def->caps->data.type != VIR_NODE_DEV_CAP_MDEV) + return; + + active_config = &def->caps->data.mdev.active_config; + + g_clear_pointer(&active_config->type, g_free); + for (i = 0; i < active_config->nattributes; i++) + virMediatedDeviceAttrFree(active_config->attributes[i]); + g_clear_pointer(&active_config->attributes, g_free); + active_config->nattributes = 0; +} + +
A good portion of this function is duplicating code that exists in virNodeDevCapsDefFree(). Let's just factor that code out into a virMediatedDeviceConfigClear() function and then use it from both locations.
int nodeDeviceSetAutostart(virNodeDevice *device, int autostart) diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index b3bc4b2e96..f195cfef9d 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -197,3 +197,6 @@ int nodeDeviceUpdate(virNodeDevice *dev, const char *xmlDesc, unsigned int flags); + +void +nodeDeviceDefResetMdevActiveConfig(virNodeDeviceDef *def); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 0f335df950..93d0dcedbc 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1467,6 +1467,7 @@ udevRemoveOneDeviceSysPath(const char *path) if (virNodeDeviceObjIsPersistent(obj)) { VIR_FREE(def->sysfs_path); virNodeDeviceObjSetActive(obj, false); + nodeDeviceDefResetMdevActiveConfig(def); } else { VIR_DEBUG("Removing device '%s' with sysfs path '%s'", def->name, path);

On 3/21/24 16:28, Jonathon Jongsma wrote:
+void +nodeDeviceDefResetMdevActiveConfig(virNodeDeviceDef *def) +{ + size_t i = 0; + virMediatedDeviceConfig *active_config; + + if (def->caps->data.type != VIR_NODE_DEV_CAP_MDEV) + return; + + active_config = &def->caps->data.mdev.active_config; + + g_clear_pointer(&active_config->type, g_free); + for (i = 0; i < active_config->nattributes; i++) + virMediatedDeviceAttrFree(active_config->attributes[i]); + g_clear_pointer(&active_config->attributes, g_free); + active_config->nattributes = 0; +} + +
A good portion of this function is duplicating code that exists in virNodeDevCapsDefFree(). Let's just factor that code out into a virMediatedDeviceConfigClear() function and then use it from both locations.
OK, will do. -- Mit freundlichen Grüßen/Kind regards Boris Fiuczynski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
participants (3)
-
Boris Fiuczynski
-
Jonathon Jongsma
-
Marc Hartmayer