[libvirt PATCH v4 00/25] Add support for persistent mediated devices

This patch series follows the previously-merged series which added support for transient mediated devices. This series expands mdev support to include persistent device definitions. Again, it relies on mdevctl as the backend. It follows the common libvirt pattern of APIs by adding the following new APIs for node devices: - virNodeDeviceDefineXML() - defines a persistent device - virNodeDeviceUndefine() - undefines a persistent device - virNodeDeviceCreate() - starts a previously-defined device It also adds virsh commands mapping to these new APIs: nodedev-define, nodedev-undefine, and nodedev-start. Since we rely on mdevctl for the definition of mediated devices, we need a way to stay up-to-date with devices that are defined by mdevctl (outside of libvirt). The method for staying up-to-date is currently a little bit crude due to the fact that mdevctl does not emit any events when new devices are added or removed. As a workaround, we create a file monitor for the mdevctl config directory and re-query mdevctl when we detect changes within that directory. In the future, mdevctl may introduce a more elegant solution. Changes in v4: - rebase to git master - switch to throwaway thread for querying mdevctl - fixed a bug when removing devices because I was accidentally using virHashForEach() instead of virHashForeachSafe() - use DEFINED/UNDEFINED events instead of STARTED/STOPPED events - changes related to merging information about mdev devices from both udev a= nd mdevctl: - Re-used the same function to copy extra data from mdevctl regardless of whether we're processing a udev event or a mdevctl event (recommended by Erik). This results in slightly more complex handling of the object lifetimes (see patch 9), but it consolidates some code. - nodeDeviceDefCopyFromMdevctl() previously only copied the data that was unique to mdevctl and didn't exist in udev. It now copies additional data (possibly overwriting some udev). This solves a problem where a device = is defined but not active (i.e. we have not gotten any data from udev), and then changed (e.g. somebody calls 'mdevctl modify' to change the mdev type), but libvirt was not updating to the new definition. - fix a bug where we were mistakenly emitting 'update' events for devices th= at had not changed - Added the ability to specify a uuid for an mdev via device XML. - split some commits into multiple patches - updated new API version info to 7.1.0 - Fixed a bug reported by Yan Fu which hangs the client when attempting to destroy a nodedev that is in use by an active vm. See https://www.redhat.com/archives/libvir-list/2021-February/msg00116.html for solution suggested by Alex. - numerous smaller fixes from review findings changes in v3: - streamlined tests -- removed some unnecessary duplication - split out patch to factor out node device name generation function - split nodeDeviceParseMdevctlChildDevice() into a separate function - added follow-up patch to remove space-padded alignment in header - refactored the mdevctl update handling significantly: - no longer a separate persistent thread that gets signaled by a timer - now piggybacks onto the existing udev thread and signals the thread in t= he same way that the udev event does. - Daniel suggested spawning a throw-away thread to handle mdevctl updates, but that introduces the complexity of possibly serializing multiple throw-away threads (e.g. if we get an 'created' event followed immediate= ly by a 'deleted' event, two threads may be spawned and we'd need to ensure they are properly ordered) - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked() to simplify removing devices that are removed from mdevctl. - coding style fixes - NOTE: per Erik's request, I experimented with changing the way that mdevctl commands were generated and tested (e.g. introducing something like virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was too invasive and awkward and didn't seem worthwhile Changes in v2: - rebase to latest git master Jonathon Jongsma (25): tests: remove extra trailing semicolon nodedev: introduce concept of 'active' node devices nodedev: Add ability to filter by active state nodedev: expose internal helper for naming devices nodedev: add ability to parse mdevs from mdevctl nodedev: add ability to list defined mdevs nodedev: add persistence to virNodeDeviceObj nodedev: add DEFINED/UNDEFINED lifecycle events nodedev: add mdevctl devices to node device list nodedev: add helper functions to remove node devices nodedev: handle mdevs that disappear from mdevctl nodedev: Refresh mdev devices when changes are detected nodedev: add function to generate mdevctl define command api: add virNodeDeviceDefineXML() virsh: Add --inactive, --all to nodedev-list virsh: add nodedev-define command nodedev: refactor tests to support mdev undefine api: add virNodeDeviceUndefine() virsh: Factor out function to find node device virsh: add nodedev-undefine command api: add virNodeDeviceCreate() virsh: add "nodedev-start" command nodedev: add <uuid> element to mdev caps nodedev: add ability to specify UUID for new mdevs nodedev: fix hang when destroying an mdev in use docs/schemas/nodedev.rng | 43 +- examples/c/misc/event-test.c | 4 + include/libvirt/libvirt-nodedev.h | 19 +- src/access/viraccessperm.c | 2 +- src/access/viraccessperm.h | 6 + src/conf/node_device_conf.c | 15 + src/conf/node_device_conf.h | 8 + src/conf/virnodedeviceobj.c | 149 +++- src/conf/virnodedeviceobj.h | 25 + src/driver-nodedev.h | 14 + src/libvirt-nodedev.c | 115 +++ src/libvirt_private.syms | 6 + src/libvirt_public.syms | 7 + src/node_device/node_device_driver.c | 657 +++++++++++++++++- src/node_device/node_device_driver.h | 41 ++ src/node_device/node_device_udev.c | 214 +++++- src/remote/remote_driver.c | 3 + src/remote/remote_protocol.x | 40 +- src/remote_protocol-structs | 16 + src/rpc/gendispatch.pl | 1 + ...19_36ea_4111_8f0a_8c9a70e21366-define.argv | 2 + ...19_36ea_4111_8f0a_8c9a70e21366-define.json | 1 + ...019_36ea_4111_8f0a_8c9a70e21366-start.argv | 3 +- ...39_495e_4243_ad9f_beb3f14c23d9-define.argv | 1 + ...39_495e_4243_ad9f_beb3f14c23d9-define.json | 1 + ...16_1ca8_49ac_b176_871d16c13076-define.argv | 1 + ...16_1ca8_49ac_b176_871d16c13076-define.json | 1 + tests/nodedevmdevctldata/mdevctl-create.argv | 1 + .../mdevctl-list-defined.argv | 1 + .../mdevctl-list-multiple.json | 59 ++ .../mdevctl-list-multiple.out.xml | 43 ++ .../nodedevmdevctldata/mdevctl-undefine.argv | 1 + tests/nodedevmdevctltest.c | 222 +++++- ...v_d069d019_36ea_4111_8f0a_8c9a70e21366.xml | 1 + tools/virsh-nodedev.c | 268 ++++++- 35 files changed, 1853 insertions(+), 138 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9= a70e21366-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9= a70e21366-define.json create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb= 3f14c23d9-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb= 3f14c23d9-define.json create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871= d16c13076-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871= d16c13076-define.json create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml create mode 100644 tests/nodedevmdevctldata/mdevctl-undefine.argv --=20 2.26.2

The macro should not have a trailing semicolon so that when the macro is used, the user can add a semicolon themselves. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- tests/nodedevmdevctltest.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index dab4b487b9..1d3cb00400 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -271,7 +271,7 @@ mymain(void) struct startTestInfo info = { virt_type, create, filename }; \ DO_TEST_FULL("mdevctl start " filename, testMdevctlStartHelper, info); \ } \ - while (0); + while (0) #define DO_TEST_START(filename) \ DO_TEST_START_FULL("QEMU", CREATE_DEVICE, filename) -- 2.26.2

we will be able to define mediated devices that can be started or stopped, so we need to be able to indicate whether the device is active or not, similar to other resources (storage pools, domains, etc.) Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- src/conf/virnodedeviceobj.c | 14 ++++++++++++++ src/conf/virnodedeviceobj.h | 7 +++++++ src/libvirt_private.syms | 2 ++ src/node_device/node_device_udev.c | 3 +++ 4 files changed, 26 insertions(+) diff --git a/src/conf/virnodedeviceobj.c b/src/conf/virnodedeviceobj.c index c9bda77b2e..92f58dbf7d 100644 --- a/src/conf/virnodedeviceobj.c +++ b/src/conf/virnodedeviceobj.c @@ -39,6 +39,7 @@ struct _virNodeDeviceObj { virNodeDeviceDefPtr def; /* device definition */ bool skipUpdateCaps; /* whether to skip checking host caps, used by testdriver */ + bool active; }; struct _virNodeDeviceObjList { @@ -976,3 +977,16 @@ virNodeDeviceObjSetSkipUpdateCaps(virNodeDeviceObjPtr obj, { obj->skipUpdateCaps = skipUpdateCaps; } + +bool +virNodeDeviceObjIsActive(virNodeDeviceObjPtr obj) +{ + return obj->active; +} + +void +virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, + bool active) +{ + obj->active = active; +} diff --git a/src/conf/virnodedeviceobj.h b/src/conf/virnodedeviceobj.h index 6efdb23d36..c119f4c51f 100644 --- a/src/conf/virnodedeviceobj.h +++ b/src/conf/virnodedeviceobj.h @@ -121,3 +121,10 @@ virNodeDeviceObjSetSkipUpdateCaps(virNodeDeviceObjPtr obj, virNodeDeviceObjPtr virNodeDeviceObjListFindMediatedDeviceByUUID(virNodeDeviceObjListPtr devs, const char *uuid); + +bool +virNodeDeviceObjIsActive(virNodeDeviceObjPtr obj); + +void +virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, + bool active); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0636b0d8c9..c8ef6fe983 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1261,6 +1261,7 @@ virNetworkPortDefSaveStatus; # conf/virnodedeviceobj.h virNodeDeviceObjEndAPI; virNodeDeviceObjGetDef; +virNodeDeviceObjIsActive; virNodeDeviceObjListAssignDef; virNodeDeviceObjListExport; virNodeDeviceObjListFindByName; @@ -1273,6 +1274,7 @@ virNodeDeviceObjListGetParentHost; virNodeDeviceObjListNew; virNodeDeviceObjListNumOfDevices; virNodeDeviceObjListRemove; +virNodeDeviceObjSetActive; # conf/virnwfilterbindingdef.h diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 09048fb23f..3bae579a6c 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1539,6 +1539,7 @@ udevAddOneDevice(struct udev_device *device) else event = virNodeDeviceEventUpdateNew(objdef->name); + virNodeDeviceObjSetActive(obj, true); virNodeDeviceObjEndAPI(&obj); ret = 0; @@ -1924,6 +1925,8 @@ udevSetupSystemDev(void) if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) goto cleanup; + virNodeDeviceObjSetActive(obj, true); + virNodeDeviceObjEndAPI(&obj); ret = 0; -- 2.26.2

...
struct _virNodeDeviceObjList { @@ -976,3 +977,16 @@ virNodeDeviceObjSetSkipUpdateCaps(virNodeDeviceObjPtr obj, { obj->skipUpdateCaps = skipUpdateCaps; }
2 blank lines in between the function definitions please.
+ +bool +virNodeDeviceObjIsActive(virNodeDeviceObjPtr obj) +{ + return obj->active; +} + +void +virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, + bool active) +{ + obj->active = active; +}
Erik

Add two flag values for virConnectListAllNodeDevices() so that we can list only node devices that are active or inactive. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- include/libvirt/libvirt-nodedev.h | 9 +++-- src/conf/node_device_conf.h | 8 ++++ src/conf/virnodedeviceobj.c | 56 ++++++++++++++++------------ src/libvirt-nodedev.c | 2 + src/node_device/node_device_driver.c | 2 +- 5 files changed, 50 insertions(+), 27 deletions(-) diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index eab8abf6ab..d304283871 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -61,10 +61,9 @@ int virNodeListDevices (virConnectPtr conn, * virConnectListAllNodeDevices: * * Flags used to filter the returned node devices. Flags in each group - * are exclusive. Currently only one group to filter the devices by cap - * type. - */ + * are exclusive. */ typedef enum { + /* filter the devices by cap type */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_SYSTEM = 1 << 0, /* System capability */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_PCI_DEV = 1 << 1, /* PCI device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_USB_DEV = 1 << 2, /* USB device */ @@ -86,6 +85,10 @@ typedef enum { VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_CARD = 1 << 18, /* s390 AP Card device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE = 1 << 19, /* s390 AP Queue */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX = 1 << 20, /* s390 AP Matrix */ + + /* filter the devices by active state */ + VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE = 1 << 29, /* Inactive devices */ + VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE = 1 << 30, /* Active devices */ } virConnectListAllNodeDeviceFlags; int virConnectListAllNodeDevices (virConnectPtr conn, diff --git a/src/conf/node_device_conf.h b/src/conf/node_device_conf.h index c67b8e2aeb..3d7872fd6e 100644 --- a/src/conf/node_device_conf.h +++ b/src/conf/node_device_conf.h @@ -422,6 +422,14 @@ virNodeDevCapsDefFree(virNodeDevCapsDefPtr caps); VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE | \ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX) +#define VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ACTIVE \ + VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE |\ + VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE + +#define VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ALL \ + VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_CAP | \ + VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ACTIVE + int virNodeDeviceGetSCSIHostCaps(virNodeDevCapSCSIHostPtr scsi_host); diff --git a/src/conf/virnodedeviceobj.c b/src/conf/virnodedeviceobj.c index 92f58dbf7d..6e9291264a 100644 --- a/src/conf/virnodedeviceobj.c +++ b/src/conf/virnodedeviceobj.c @@ -848,8 +848,10 @@ virNodeDeviceObjListGetNames(virNodeDeviceObjListPtr devs, } -#define MATCH(FLAG) ((flags & (VIR_CONNECT_LIST_NODE_DEVICES_CAP_ ## FLAG)) && \ - virNodeDeviceObjHasCap(obj, VIR_NODE_DEV_CAP_ ## FLAG)) +#define MATCH_CAP(FLAG) ((flags & (VIR_CONNECT_LIST_NODE_DEVICES_CAP_ ## FLAG)) && \ + virNodeDeviceObjHasCap(obj, VIR_NODE_DEV_CAP_ ## FLAG)) +#define MATCH(FLAG) (flags & (FLAG)) + static bool virNodeDeviceObjMatch(virNodeDeviceObjPtr obj, unsigned int flags) @@ -861,33 +863,41 @@ virNodeDeviceObjMatch(virNodeDeviceObjPtr obj, /* filter by cap type */ if (flags & VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_CAP) { - if (!(MATCH(SYSTEM) || - MATCH(PCI_DEV) || - MATCH(USB_DEV) || - MATCH(USB_INTERFACE) || - MATCH(NET) || - MATCH(SCSI_HOST) || - MATCH(SCSI_TARGET) || - MATCH(SCSI) || - MATCH(STORAGE) || - MATCH(FC_HOST) || - MATCH(VPORTS) || - MATCH(SCSI_GENERIC) || - MATCH(DRM) || - MATCH(MDEV_TYPES) || - MATCH(MDEV) || - MATCH(CCW_DEV) || - MATCH(CSS_DEV) || - MATCH(VDPA) || - MATCH(AP_CARD) || - MATCH(AP_QUEUE) || - MATCH(AP_MATRIX))) + if (!(MATCH_CAP(SYSTEM) || + MATCH_CAP(PCI_DEV) || + MATCH_CAP(USB_DEV) || + MATCH_CAP(USB_INTERFACE) || + MATCH_CAP(NET) || + MATCH_CAP(SCSI_HOST) || + MATCH_CAP(SCSI_TARGET) || + MATCH_CAP(SCSI) || + MATCH_CAP(STORAGE) || + MATCH_CAP(FC_HOST) || + MATCH_CAP(VPORTS) || + MATCH_CAP(SCSI_GENERIC) || + MATCH_CAP(DRM) || + MATCH_CAP(MDEV_TYPES) || + MATCH_CAP(MDEV) || + MATCH_CAP(CCW_DEV) || + MATCH_CAP(CSS_DEV) || + MATCH_CAP(VDPA) || + MATCH_CAP(AP_CARD) || + MATCH_CAP(AP_QUEUE) || + MATCH_CAP(AP_MATRIX))) return false; } + if (MATCH(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ACTIVE) && + !((MATCH(VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE) && + virNodeDeviceObjIsActive(obj)) || + (MATCH(VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE) && + !virNodeDeviceObjIsActive(obj)))) + return false; + return true; } #undef MATCH +#undef MATCH_CAP typedef struct _virNodeDeviceObjListExportData virNodeDeviceObjListExportData; diff --git a/src/libvirt-nodedev.c b/src/libvirt-nodedev.c index eb8c735a8c..375b907852 100644 --- a/src/libvirt-nodedev.c +++ b/src/libvirt-nodedev.c @@ -105,6 +105,8 @@ virNodeNumOfDevices(virConnectPtr conn, const char *cap, unsigned int flags) * VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_CARD * VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE * VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX + * VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE + * VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE * * Returns the number of node devices found or -1 and sets @devices to NULL in * case of error. On success, the array stored into @devices is guaranteed to diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 51b20848ce..c992251121 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -217,7 +217,7 @@ nodeConnectListAllNodeDevices(virConnectPtr conn, virNodeDevicePtr **devices, unsigned int flags) { - virCheckFlags(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_CAP, -1); + virCheckFlags(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ALL, -1); if (virConnectListAllNodeDevicesEnsureACL(conn) < 0) return -1; -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:47AM -0600, Jonathon Jongsma wrote:
Add two flag values for virConnectListAllNodeDevices() so that we can list only node devices that are active or inactive.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- include/libvirt/libvirt-nodedev.h | 9 +++-- src/conf/node_device_conf.h | 8 ++++ src/conf/virnodedeviceobj.c | 56 ++++++++++++++++------------ src/libvirt-nodedev.c | 2 + src/node_device/node_device_driver.c | 2 +- 5 files changed, 50 insertions(+), 27 deletions(-)
diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index eab8abf6ab..d304283871 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -61,10 +61,9 @@ int virNodeListDevices (virConnectPtr conn, * virConnectListAllNodeDevices: * * Flags used to filter the returned node devices. Flags in each group - * are exclusive. Currently only one group to filter the devices by cap - * type. - */ + * are exclusive. */ typedef enum { + /* filter the devices by cap type */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_SYSTEM = 1 << 0, /* System capability */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_PCI_DEV = 1 << 1, /* PCI device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_USB_DEV = 1 << 2, /* USB device */ @@ -86,6 +85,10 @@ typedef enum { VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_CARD = 1 << 18, /* s390 AP Card device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE = 1 << 19, /* s390 AP Queue */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX = 1 << 20, /* s390 AP Matrix */ + + /* filter the devices by active state */ + VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE = 1 << 29, /* Inactive devices */ + VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE = 1 << 30, /* Active devices */
We don't define sentinels on public flags, so what are we saving the last value for? Why couldn't ^this become 1U << 31? ...
+ if (MATCH(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ACTIVE) && + !((MATCH(VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE) && + virNodeDeviceObjIsActive(obj)) || + (MATCH(VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE) && + !virNodeDeviceObjIsActive(obj)))) + return false;
I didn't notice this in previous versions, but I think this block would read better if written similarly as the one above it: if (flags & (VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ACTIVE)) { if (!(MATCH(VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE) && virNodeDeviceObjIsActive(obj)) || MATCH(VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE) && !virNodeDeviceObjIsActive(obj)) return false; } I'd also argue the new MATCH macro doesn't bring much value, but in case I was the one who suggested it in the past I'd like to avoid contradicting myself and thus we should keep it as is.
+ return true; } #undef MATCH +#undef MATCH_CAP
typedef struct _virNodeDeviceObjListExportData virNodeDeviceObjListExportData; diff --git a/src/libvirt-nodedev.c b/src/libvirt-nodedev.c index eb8c735a8c..375b907852 100644 --- a/src/libvirt-nodedev.c +++ b/src/libvirt-nodedev.c @@ -105,6 +105,8 @@ virNodeNumOfDevices(virConnectPtr conn, const char *cap, unsigned int flags) * VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_CARD * VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE * VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX + * VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE + * VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE
An idea for a trivial follow up patch: Listing the caps in the function commentary is counter productive, there's too many of them, it's easier to just look up the enum definition. Also, there's no such thing as exclusive grouped flags as the both the commentary and the header mention. Erik
* * Returns the number of node devices found or -1 and sets @devices to NULL in * case of error. On success, the array stored into @devices is guaranteed to diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 51b20848ce..c992251121 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -217,7 +217,7 @@ nodeConnectListAllNodeDevices(virConnectPtr conn, virNodeDevicePtr **devices, unsigned int flags) { - virCheckFlags(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_CAP, -1); + virCheckFlags(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ALL, -1);
if (virConnectListAllNodeDevicesEnsureACL(conn) < 0) return -1; -- 2.26.2

On Mon, 15 Feb 2021 18:22:41 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:47AM -0600, Jonathon Jongsma wrote:
Add two flag values for virConnectListAllNodeDevices() so that we can list only node devices that are active or inactive.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- include/libvirt/libvirt-nodedev.h | 9 +++-- src/conf/node_device_conf.h | 8 ++++ src/conf/virnodedeviceobj.c | 56 ++++++++++++++++------------ src/libvirt-nodedev.c | 2 + src/node_device/node_device_driver.c | 2 +- 5 files changed, 50 insertions(+), 27 deletions(-)
diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index eab8abf6ab..d304283871 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -61,10 +61,9 @@ int virNodeListDevices (virConnectPtr conn, * virConnectListAllNodeDevices: * * Flags used to filter the returned node devices. Flags in each group - * are exclusive. Currently only one group to filter the devices by cap - * type. - */ + * are exclusive. */ typedef enum { + /* filter the devices by cap type */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_SYSTEM = 1 << 0, /* System capability */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_PCI_DEV = 1 << 1, /* PCI device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_USB_DEV = 1 << 2, /* USB device */ @@ -86,6 +85,10 @@ typedef enum { VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_CARD = 1 << 18, /* s390 AP Card device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE = 1 << 19, /* s390 AP Queue */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX = 1 << 20, /* s390 AP Matrix */ + + /* filter the devices by active state */ + VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE = 1 << 29, /* Inactive devices */ + VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE = 1 << 30, /* Active devices */
We don't define sentinels on public flags, so what are we saving the last value for? Why couldn't ^this become 1U << 31?
...
Because gcc implements the enum as a signed int and 1<<31 overflows the maximum positive integer value: In file included from ../include/libvirt/libvirt.h:42, from ../src/internal.h:65, from ../src/util/vircgroupv1.c:30: ../include/libvirt/libvirt-nodedev.h:91:57: error: result of β1 << 31β requires 33 bits to represent, but βintβ only has 32 bits [-Werror=shift-overflow=] 91 | VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE = 1 << 31, /* Active devices */ | ^~ cc1: all warnings being treated as errors Jonathon

On Thu, Feb 18, 2021 at 04:05:19PM -0600, Jonathon Jongsma wrote:
On Mon, 15 Feb 2021 18:22:41 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:47AM -0600, Jonathon Jongsma wrote:
Add two flag values for virConnectListAllNodeDevices() so that we can list only node devices that are active or inactive.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- include/libvirt/libvirt-nodedev.h | 9 +++-- src/conf/node_device_conf.h | 8 ++++ src/conf/virnodedeviceobj.c | 56 ++++++++++++++++------------ src/libvirt-nodedev.c | 2 + src/node_device/node_device_driver.c | 2 +- 5 files changed, 50 insertions(+), 27 deletions(-)
diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index eab8abf6ab..d304283871 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -61,10 +61,9 @@ int virNodeListDevices (virConnectPtr conn, * virConnectListAllNodeDevices: * * Flags used to filter the returned node devices. Flags in each group - * are exclusive. Currently only one group to filter the devices by cap - * type. - */ + * are exclusive. */ typedef enum { + /* filter the devices by cap type */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_SYSTEM = 1 << 0, /* System capability */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_PCI_DEV = 1 << 1, /* PCI device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_USB_DEV = 1 << 2, /* USB device */ @@ -86,6 +85,10 @@ typedef enum { VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_CARD = 1 << 18, /* s390 AP Card device */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_QUEUE = 1 << 19, /* s390 AP Queue */ VIR_CONNECT_LIST_NODE_DEVICES_CAP_AP_MATRIX = 1 << 20, /* s390 AP Matrix */ + + /* filter the devices by active state */ + VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE = 1 << 29, /* Inactive devices */ + VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE = 1 << 30, /* Active devices */
We don't define sentinels on public flags, so what are we saving the last value for? Why couldn't ^this become 1U << 31?
...
Because gcc implements the enum as a signed int and 1<<31 overflows the maximum positive integer value:
In file included from ../include/libvirt/libvirt.h:42, from ../src/internal.h:65, from ../src/util/vircgroupv1.c:30: ../include/libvirt/libvirt-nodedev.h:91:57: error: result of β1 << 31β requires 33 bits to represent, but βintβ only has 32 bits [-Werror=shift-overflow=] 91 | VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE = 1 << 31, /* Active devices */ | ^~ cc1: all warnings being treated as errors
Doesn't sound correct. GCC will accommodate the underlying enum type according to the needs, IOW it will select the type so that it can hold all the values inside. Note that I forced "1U" in there which explicitly asks GCC to select unsigned int for the enum, I could have gone with 1ULL << 61 (which would break our docs generator :D), but our API only supports unsigned int flags anyway. Erik

Expose a helper function that can be used by udev and mdevctl to generate device names for node devices. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- src/node_device/node_device_driver.c | 25 +++++++++++++++++++++++++ src/node_device/node_device_driver.h | 6 ++++++ src/node_device/node_device_udev.c | 19 +++---------------- 3 files changed, 34 insertions(+), 16 deletions(-) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index c992251121..6143459618 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -968,3 +968,28 @@ nodedevRegister(void) return udevNodeRegister(); #endif } + + +void +nodeDeviceGenerateName(virNodeDeviceDefPtr def, + const char *subsystem, + const char *sysname, + const char *s) +{ + size_t i; + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + + virBufferAsprintf(&buf, "%s_%s", + subsystem, + sysname); + + if (s != NULL) + virBufferAsprintf(&buf, "_%s", s); + + def->name = virBufferContentAndReset(&buf); + + for (i = 0; i < strlen(def->name); i++) { + if (!(g_ascii_isalnum(*(def->name + i)))) + *(def->name + i) = '_'; + } +} diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 2113d2b0a5..02baf56dab 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -118,3 +118,9 @@ nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, char **uuid_out); virCommandPtr nodeDeviceGetMdevctlStopCommand(const char *uuid); + +void +nodeDeviceGenerateName(virNodeDeviceDefPtr def, + const char *subsystem, + const char *sysname, + const char *s); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 3bae579a6c..5d1a192411 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -308,22 +308,9 @@ udevGenerateDeviceName(struct udev_device *device, virNodeDeviceDefPtr def, const char *s) { - size_t i; - g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; - - virBufferAsprintf(&buf, "%s_%s", - udev_device_get_subsystem(device), - udev_device_get_sysname(device)); - - if (s != NULL) - virBufferAsprintf(&buf, "_%s", s); - - def->name = virBufferContentAndReset(&buf); - - for (i = 0; i < strlen(def->name); i++) { - if (!(g_ascii_isalnum(*(def->name + i)))) - *(def->name + i) = '_'; - } + nodeDeviceGenerateName(def, + udev_device_get_subsystem(device), + udev_device_get_sysname(device), s); return 0; } -- 2.26.2

This function will parse the list of mediated devices that are returned by mdevctl and convert it into our internal node device representation. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 145 ++++++++++++++++++ src/node_device/node_device_driver.h | 4 + .../mdevctl-list-multiple.json | 59 +++++++ .../mdevctl-list-multiple.out.xml | 39 +++++ tests/nodedevmdevctltest.c | 56 ++++++- 5 files changed, 301 insertions(+), 2 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 6143459618..fc5ac1d27e 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -853,6 +853,151 @@ virMdevctlStop(virNodeDeviceDefPtr def) } +static void mdevGenerateDeviceName(virNodeDeviceDefPtr dev) +{ + nodeDeviceGenerateName(dev, "mdev", dev->caps->data.mdev.uuid, NULL); +} + + +static virNodeDeviceDefPtr +nodeDeviceParseMdevctlChildDevice(const char *parent, + virJSONValuePtr json) +{ + virNodeDevCapMdevPtr mdev; + const char *uuid; + virJSONValuePtr props, attrs; + g_autoptr(virNodeDeviceDef) child = g_new0(virNodeDeviceDef, 1); + + /* the child object should have a single key equal to its uuid. + * The value is an object describing the properties of the mdev */ + if (virJSONValueObjectKeysNumber(json) != 1) + return NULL; + + uuid = virJSONValueObjectGetKey(json, 0); + props = virJSONValueObjectGetValue(json, 0); + + child->parent = g_strdup(parent); + child->caps = g_new0(virNodeDevCapsDef, 1); + child->caps->data.type = VIR_NODE_DEV_CAP_MDEV; + + mdev = &child->caps->data.mdev; + mdev->uuid = g_strdup(uuid); + mdev->type = + g_strdup(virJSONValueObjectGetString(props, "mdev_type")); + + attrs = virJSONValueObjectGet(props, "attrs"); + + if (attrs && virJSONValueIsArray(attrs)) { + size_t i; + int nattrs = virJSONValueArraySize(attrs); + + mdev->attributes = g_new0(virMediatedDeviceAttrPtr, nattrs); + mdev->nattributes = nattrs; + + for (i = 0; i < nattrs; i++) { + virJSONValuePtr attr = virJSONValueArrayGet(attrs, i); + virMediatedDeviceAttrPtr attribute; + virJSONValuePtr value; + + if (!virJSONValueIsObject(attr) || + virJSONValueObjectKeysNumber(attr) != 1) + return NULL; + + attribute = g_new0(virMediatedDeviceAttr, 1); + attribute->name = g_strdup(virJSONValueObjectGetKey(attr, 0)); + value = virJSONValueObjectGetValue(attr, 0); + attribute->value = g_strdup(virJSONValueGetString(value)); + mdev->attributes[i] = attribute; + } + } + mdevGenerateDeviceName(child); + + return g_steal_pointer(&child); +} + + +int +nodeDeviceParseMdevctlJSON(const char *jsonstring, + virNodeDeviceDefPtr **devs) +{ + int n; + g_autoptr(virJSONValue) json_devicelist = NULL; + virNodeDeviceDefPtr *outdevs = NULL; + size_t noutdevs = 0; + size_t i; + size_t j; + + json_devicelist = virJSONValueFromString(jsonstring); + + if (!json_devicelist || !virJSONValueIsArray(json_devicelist)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("JSON response contains no devices")); + goto error; + } + + n = virJSONValueArraySize(json_devicelist); + + for (i = 0; i < n; i++) { + virJSONValuePtr obj = virJSONValueArrayGet(json_devicelist, i); + const char *parent; + virJSONValuePtr child_array; + int nchildren; + + if (!virJSONValueIsObject(obj)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Parent device is not an object")); + goto error; + } + + /* mdevctl returns an array of objects. Each object is a parent device + * object containing a single key-value pair which maps from the name + * of the parent device to an array of child devices */ + if (virJSONValueObjectKeysNumber(obj) != 1) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unexpected format for parent device object")); + goto error; + } + + parent = virJSONValueObjectGetKey(obj, 0); + child_array = virJSONValueObjectGetValue(obj, 0); + + if (!virJSONValueIsArray(child_array)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Child list is not an array")); + goto error; + } + + nchildren = virJSONValueArraySize(child_array); + + for (j = 0; j < nchildren; j++) { + g_autoptr(virNodeDeviceDef) child = NULL; + virJSONValuePtr child_obj = virJSONValueArrayGet(child_array, j); + + if (!(child = nodeDeviceParseMdevctlChildDevice(parent, child_obj))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to parse child device")); + goto error; + } + + if (VIR_APPEND_ELEMENT(outdevs, noutdevs, child) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to append child device to list")); + goto error; + } + } + } + + *devs = outdevs; + return noutdevs; + + error: + for (i = 0; i < noutdevs; i++) + virNodeDeviceDefFree(outdevs[i]); + VIR_FREE(outdevs); + return -1; +} + + int nodeDeviceDestroy(virNodeDevicePtr device) { diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 02baf56dab..2a162513c0 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -119,6 +119,10 @@ nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, virCommandPtr nodeDeviceGetMdevctlStopCommand(const char *uuid); +int +nodeDeviceParseMdevctlJSON(const char *jsonstring, + virNodeDeviceDefPtr **devs); + void nodeDeviceGenerateName(virNodeDeviceDefPtr def, const char *subsystem, diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.json b/tests/nodedevmdevctldata/mdevctl-list-multiple.json new file mode 100644 index 0000000000..eefcd90c62 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.json @@ -0,0 +1,59 @@ +[ + { + "0000:00:02.0": [ + { + "200f228a-c80a-4d50-bfb7-f5a0e4e34045": { + "mdev_type": "i915-GVTg_V5_4", + "start": "manual" + } + }, + { + "de807ffc-1923-4d5f-b6c9-b20ecebc6d4b": { + "mdev_type": "i915-GVTg_V5_4", + "start": "auto" + } + }, + { + "435722ea-5f43-468a-874f-da34f1217f13": { + "mdev_type": "i915-GVTg_V5_8", + "start": "manual", + "attrs": [ + { + "testattr": "42" + } + ] + } + } + ] + }, + { + "matrix": [ + { "783e6dbb-ea0e-411f-94e2-717eaad438bf": { + "mdev_type": "vfio_ap-passthrough", + "start": "manual", + "attrs": [ + { + "assign_adapter": "5" + }, + { + "assign_adapter": "6" + }, + { + "assign_domain": "0xab" + }, + { + "assign_control_domain": "0xab" + }, + { + "assign_domain": "4" + }, + { + "assign_control_domain": "4" + } + ] + } + } + ] + } +] + diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml b/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml new file mode 100644 index 0000000000..543ad916b7 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml @@ -0,0 +1,39 @@ +<device> + <name>mdev_200f228a_c80a_4d50_bfb7_f5a0e4e34045</name> + <parent>0000:00:02.0</parent> + <capability type='mdev'> + <type id='i915-GVTg_V5_4'/> + <iommuGroup number='0'/> + </capability> +</device> +<device> + <name>mdev_de807ffc_1923_4d5f_b6c9_b20ecebc6d4b</name> + <parent>0000:00:02.0</parent> + <capability type='mdev'> + <type id='i915-GVTg_V5_4'/> + <iommuGroup number='0'/> + </capability> +</device> +<device> + <name>mdev_435722ea_5f43_468a_874f_da34f1217f13</name> + <parent>0000:00:02.0</parent> + <capability type='mdev'> + <type id='i915-GVTg_V5_8'/> + <iommuGroup number='0'/> + <attr name='testattr' value='42'/> + </capability> +</device> +<device> + <name>mdev_783e6dbb_ea0e_411f_94e2_717eaad438bf</name> + <parent>matrix</parent> + <capability type='mdev'> + <type id='vfio_ap-passthrough'/> + <iommuGroup number='0'/> + <attr name='assign_adapter' value='5'/> + <attr name='assign_adapter' value='6'/> + <attr name='assign_domain' value='0xab'/> + <attr name='assign_control_domain' value='0xab'/> + <attr name='assign_domain' value='4'/> + <attr name='assign_control_domain' value='4'/> + </capability> +</device> diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index 1d3cb00400..93f4e3252c 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -143,6 +143,53 @@ testMdevctlStop(const void *data) return ret; } + +static int +testMdevctlParse(const void *data) +{ + g_autofree char *buf = NULL; + const char *filename = data; + g_autofree char *jsonfile = g_strdup_printf("%s/nodedevmdevctldata/%s.json", + abs_srcdir, filename); + g_autofree char *xmloutfile = g_strdup_printf("%s/nodedevmdevctldata/%s.out.xml", + abs_srcdir, filename); + int ret = -1; + int nmdevs = 0; + virNodeDeviceDefPtr *mdevs = NULL; + virBuffer xmloutbuf = VIR_BUFFER_INITIALIZER; + size_t i; + + if (virFileReadAll(jsonfile, 1024*1024, &buf) < 0) { + VIR_TEST_DEBUG("Unable to read file %s", jsonfile); + return -1; + } + + if ((nmdevs = nodeDeviceParseMdevctlJSON(buf, &mdevs)) < 0) { + VIR_TEST_DEBUG("Unable to parse json for %s", filename); + return -1; + } + + for (i = 0; i < nmdevs; i++) { + g_autofree char *devxml = virNodeDeviceDefFormat(mdevs[i]); + if (!devxml) + goto cleanup; + virBufferAddStr(&xmloutbuf, devxml); + } + + if (nodedevCompareToFile(virBufferCurrentContent(&xmloutbuf), xmloutfile) < 0) + goto cleanup; + + ret = 0; + + cleanup: + virBufferFreeAndReset(&xmloutbuf); + for (i = 0; i < nmdevs; i++) + virNodeDeviceDefFree(mdevs[i]); + g_free(mdevs); + + return ret; +} + static void nodedevTestDriverFree(virNodeDeviceDriverStatePtr drv) { @@ -263,13 +310,13 @@ mymain(void) } #define DO_TEST_FULL(desc, func, info) \ - if (virTestRun(desc, func, &info) < 0) \ + if (virTestRun(desc, func, info) < 0) \ ret = -1; #define DO_TEST_START_FULL(virt_type, create, filename) \ do { \ struct startTestInfo info = { virt_type, create, filename }; \ - DO_TEST_FULL("mdevctl start " filename, testMdevctlStartHelper, info); \ + DO_TEST_FULL("mdevctl start " filename, testMdevctlStartHelper, &info); \ } \ while (0) @@ -279,6 +326,9 @@ mymain(void) #define DO_TEST_STOP(uuid) \ DO_TEST_FULL("mdevctl stop " uuid, testMdevctlStop, uuid) +#define DO_TEST_PARSE_JSON(filename) \ + DO_TEST_FULL("parse mdevctl json " filename, testMdevctlParse, filename) + /* Test mdevctl start commands */ DO_TEST_START("mdev_d069d019_36ea_4111_8f0a_8c9a70e21366"); DO_TEST_START("mdev_fedc4916_1ca8_49ac_b176_871d16c13076"); @@ -287,6 +337,8 @@ mymain(void) /* Test mdevctl stop command, pass an arbitrary uuid */ DO_TEST_STOP("e2451f73-c95b-4124-b900-e008af37c576"); + DO_TEST_PARSE_JSON("mdevctl-list-multiple"); + done: nodedevTestDriverFree(driver); -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:49AM -0600, Jonathon Jongsma wrote:
This function will parse the list of mediated devices that are returned by mdevctl and convert it into our internal node device representation.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 145 ++++++++++++++++++ src/node_device/node_device_driver.h | 4 + .../mdevctl-list-multiple.json | 59 +++++++ .../mdevctl-list-multiple.out.xml | 39 +++++ tests/nodedevmdevctltest.c | 56 ++++++- 5 files changed, 301 insertions(+), 2 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.json create mode 100644 tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 6143459618..fc5ac1d27e 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -853,6 +853,151 @@ virMdevctlStop(virNodeDeviceDefPtr def) }
+static void mdevGenerateDeviceName(virNodeDeviceDefPtr dev) +{ + nodeDeviceGenerateName(dev, "mdev", dev->caps->data.mdev.uuid, NULL); +} + + +static virNodeDeviceDefPtr +nodeDeviceParseMdevctlChildDevice(const char *parent, + virJSONValuePtr json) +{ + virNodeDevCapMdevPtr mdev; + const char *uuid; + virJSONValuePtr props, attrs;
^This was the other "One declaration per line" place I could not remember during v3 review. ...
+int +nodeDeviceParseMdevctlJSON(const char *jsonstring, + virNodeDeviceDefPtr **devs) +{ + int n; + g_autoptr(virJSONValue) json_devicelist = NULL; + virNodeDeviceDefPtr *outdevs = NULL; + size_t noutdevs = 0; + size_t i; + size_t j; + + json_devicelist = virJSONValueFromString(jsonstring); + + if (!json_devicelist || !virJSONValueIsArray(json_devicelist)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("JSON response contains no devices"));
I'd make it even more specific - "mdevctl JSON response..."
+ goto error; + } + + n = virJSONValueArraySize(json_devicelist); + + for (i = 0; i < n; i++) { + virJSONValuePtr obj = virJSONValueArrayGet(json_devicelist, i); + const char *parent; + virJSONValuePtr child_array; + int nchildren; + + if (!virJSONValueIsObject(obj)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Parent device is not an object")); + goto error; + } + + /* mdevctl returns an array of objects. Each object is a parent device + * object containing a single key-value pair which maps from the name + * of the parent device to an array of child devices */ + if (virJSONValueObjectKeysNumber(obj) != 1) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unexpected format for parent device object")); + goto error; + } + + parent = virJSONValueObjectGetKey(obj, 0); + child_array = virJSONValueObjectGetValue(obj, 0); + + if (!virJSONValueIsArray(child_array)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Child list is not an array"));
"Parent device's JSON object data is not an array" - or something along those lines
+ goto error; + } + + nchildren = virJSONValueArraySize(child_array); + + for (j = 0; j < nchildren; j++) { + g_autoptr(virNodeDeviceDef) child = NULL; + virJSONValuePtr child_obj = virJSONValueArrayGet(child_array, j); + + if (!(child = nodeDeviceParseMdevctlChildDevice(parent, child_obj))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to parse child device"));
"Unable to parse mdev child device"
+ goto error; + } + + if (VIR_APPEND_ELEMENT(outdevs, noutdevs, child) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to append child device to list"));
APPEND_ELEMENT will either report an "Out of bounds" error or abort on allocation failure, so please drop ^this virReportError.
+ goto error; + } + } + } + + *devs = outdevs; + return noutdevs; + + error: + for (i = 0; i < noutdevs; i++) + virNodeDeviceDefFree(outdevs[i]); + VIR_FREE(outdevs); + return -1; +} + + int nodeDeviceDestroy(virNodeDevicePtr device) { diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 02baf56dab..2a162513c0 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -119,6 +119,10 @@ nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, virCommandPtr nodeDeviceGetMdevctlStopCommand(const char *uuid);
+int +nodeDeviceParseMdevctlJSON(const char *jsonstring, + virNodeDeviceDefPtr **devs); + void nodeDeviceGenerateName(virNodeDeviceDefPtr def, const char *subsystem, diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.json b/tests/nodedevmdevctldata/mdevctl-list-multiple.json new file mode 100644 index 0000000000..eefcd90c62 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.json @@ -0,0 +1,59 @@ +[ + { + "0000:00:02.0": [ + { + "200f228a-c80a-4d50-bfb7-f5a0e4e34045": { + "mdev_type": "i915-GVTg_V5_4", + "start": "manual" + } + }, + { + "de807ffc-1923-4d5f-b6c9-b20ecebc6d4b": { + "mdev_type": "i915-GVTg_V5_4", + "start": "auto" + } + }, + { + "435722ea-5f43-468a-874f-da34f1217f13": { + "mdev_type": "i915-GVTg_V5_8", + "start": "manual", + "attrs": [ + { + "testattr": "42" + } + ] + } + } + ] + }, + { + "matrix": [ + { "783e6dbb-ea0e-411f-94e2-717eaad438bf": { + "mdev_type": "vfio_ap-passthrough", + "start": "manual", + "attrs": [ + { + "assign_adapter": "5" + }, + { + "assign_adapter": "6" + },
I'd still like to know why there are 2 assign_adapter and 2 assign_domain attributes here, I'm simply confused what the outcome should be.
+ { + "assign_domain": "0xab" + }, + { + "assign_control_domain": "0xab" + }, + { + "assign_domain": "4" + }, + { + "assign_control_domain": "4" + } + ] + } + } + ] + } +] +
...
+ <name>mdev_783e6dbb_ea0e_411f_94e2_717eaad438bf</name> + <parent>matrix</parent> + <capability type='mdev'> + <type id='vfio_ap-passthrough'/> + <iommuGroup number='0'/> + <attr name='assign_adapter' value='5'/> + <attr name='assign_adapter' value='6'/> + <attr name='assign_domain' value='0xab'/> + <attr name='assign_control_domain' value='0xab'/> + <attr name='assign_domain' value='4'/> + <attr name='assign_control_domain' value='4'/>
Here too I'd like to hear an opinion (since v3) on naming the attributes in such way that they correspond to the respective elements: ap-adapter, ap-domain, etc. This naming is very unintuitive if not documented properly; unless there's an internal reason why they have to be named "assign_control", etc. Erik

On Mon, 15 Feb 2021 18:22:08 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:49AM -0600, Jonathon Jongsma wrote:
diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.json b/tests/nodedevmdevctldata/mdevctl-list-multiple.json new file mode 100644 index 0000000000..eefcd90c62 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.json @@ -0,0 +1,59 @@ +[ + { + "0000:00:02.0": [ + { + "200f228a-c80a-4d50-bfb7-f5a0e4e34045": { + "mdev_type": "i915-GVTg_V5_4", + "start": "manual" + } + }, + { + "de807ffc-1923-4d5f-b6c9-b20ecebc6d4b": { + "mdev_type": "i915-GVTg_V5_4", + "start": "auto" + } + }, + { + "435722ea-5f43-468a-874f-da34f1217f13": { + "mdev_type": "i915-GVTg_V5_8", + "start": "manual", + "attrs": [ + { + "testattr": "42" + } + ] + } + } + ] + }, + { + "matrix": [ + { "783e6dbb-ea0e-411f-94e2-717eaad438bf": { + "mdev_type": "vfio_ap-passthrough", + "start": "manual", + "attrs": [ + { + "assign_adapter": "5" + }, + { + "assign_adapter": "6" + },
I'd still like to know why there are 2 assign_adapter and 2 assign_domain attributes here, I'm simply confused what the outcome should be.
As far as I can recall, i was just trying to use some real-world-ish mdevctl output to test the parsing and handling of mdev attributes. In this case, I believe that I simply copied the example output from the mdevctl documentation. See: https://github.com/mdevctl/mdevctl#advanced-usage-attributes-and-json
+ { + "assign_domain": "0xab" + }, + { + "assign_control_domain": "0xab" + }, + { + "assign_domain": "4" + }, + { + "assign_control_domain": "4" + } + ] + } + } + ] + } +] +
...
+ <name>mdev_783e6dbb_ea0e_411f_94e2_717eaad438bf</name> + <parent>matrix</parent> + <capability type='mdev'> + <type id='vfio_ap-passthrough'/> + <iommuGroup number='0'/> + <attr name='assign_adapter' value='5'/> + <attr name='assign_adapter' value='6'/> + <attr name='assign_domain' value='0xab'/> + <attr name='assign_control_domain' value='0xab'/> + <attr name='assign_domain' value='4'/> + <attr name='assign_control_domain' value='4'/>
Here too I'd like to hear an opinion (since v3) on naming the attributes in such way that they correspond to the respective elements: ap-adapter, ap-domain, etc. This naming is very unintuitive if not documented properly; unless there's an internal reason why they have to be named "assign_control", etc.
These are the names of the attributes that are used to configure these devices in sysfs: https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ljdd/... The idea here is that <attr> is a direct mapping to and from the mdev sysfs attributes, since that is how these devices are configured. An attribute name means nothing to libvirt, it is just an opaque name that we pass to mdevctl. If we were to deviate from this strict mapping and add "friendlier" names in libvirt, we would need to maintain a custom per-device mapping from mdev sysfs attribute name to libvirt friendly-name. That seems unmaintainable. Thanks, Jonathon

On Tue, 16 Feb 2021 16:00:40 -0600 Jonathon Jongsma <jjongsma@redhat.com> wrote:
On Mon, 15 Feb 2021 18:22:08 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:49AM -0600, Jonathon Jongsma wrote:
diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.json b/tests/nodedevmdevctldata/mdevctl-list-multiple.json new file mode 100644 index 0000000000..eefcd90c62 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.json @@ -0,0 +1,59 @@ +[ + { + "0000:00:02.0": [ + { + "200f228a-c80a-4d50-bfb7-f5a0e4e34045": { + "mdev_type": "i915-GVTg_V5_4", + "start": "manual" + } + }, + { + "de807ffc-1923-4d5f-b6c9-b20ecebc6d4b": { + "mdev_type": "i915-GVTg_V5_4", + "start": "auto" + } + }, + { + "435722ea-5f43-468a-874f-da34f1217f13": { + "mdev_type": "i915-GVTg_V5_8", + "start": "manual", + "attrs": [ + { + "testattr": "42" + } + ] + } + } + ] + }, + { + "matrix": [ + { "783e6dbb-ea0e-411f-94e2-717eaad438bf": { + "mdev_type": "vfio_ap-passthrough", + "start": "manual", + "attrs": [ + { + "assign_adapter": "5" + }, + { + "assign_adapter": "6" + },
I'd still like to know why there are 2 assign_adapter and 2 assign_domain attributes here, I'm simply confused what the outcome should be.
As far as I can recall, i was just trying to use some real-world-ish mdevctl output to test the parsing and handling of mdev attributes. In this case, I believe that I simply copied the example output from the mdevctl documentation. See:
https://github.com/mdevctl/mdevctl#advanced-usage-attributes-and-json
+ { + "assign_domain": "0xab" + }, + { + "assign_control_domain": "0xab" + }, + { + "assign_domain": "4" + }, + { + "assign_control_domain": "4" + } + ] + } + } + ] + } +] +
...
+ <name>mdev_783e6dbb_ea0e_411f_94e2_717eaad438bf</name> + <parent>matrix</parent> + <capability type='mdev'> + <type id='vfio_ap-passthrough'/> + <iommuGroup number='0'/> + <attr name='assign_adapter' value='5'/> + <attr name='assign_adapter' value='6'/> + <attr name='assign_domain' value='0xab'/> + <attr name='assign_control_domain' value='0xab'/> + <attr name='assign_domain' value='4'/> + <attr name='assign_control_domain' value='4'/>
Here too I'd like to hear an opinion (since v3) on naming the attributes in such way that they correspond to the respective elements: ap-adapter, ap-domain, etc. This naming is very unintuitive if not documented properly; unless there's an internal reason why they have to be named "assign_control", etc.
These are the names of the attributes that are used to configure these devices in sysfs: https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ljdd/...
The idea here is that <attr> is a direct mapping to and from the mdev sysfs attributes, since that is how these devices are configured. An attribute name means nothing to libvirt, it is just an opaque name that we pass to mdevctl. If we were to deviate from this strict mapping and add "friendlier" names in libvirt, we would need to maintain a custom per-device mapping from mdev sysfs attribute name to libvirt friendly-name. That seems unmaintainable.
Yep, and the names don't mean anything to mdevctl either, as you say they're just sysfs attributes, mdevctl looks for the named attribute and writes the value to it. Note that ordering of the attributes might be important, which is why mdevctl stores them in an array and provides some utility to re-index attributes. I can't tell if the above example necessarily imposes that ordering from the xml. Thanks, Alex

On Tue, 16 Feb 2021 15:20:28 -0700 Alex Williamson <alex.williamson@redhat.com> wrote:
On Tue, 16 Feb 2021 16:00:40 -0600 Jonathon Jongsma <jjongsma@redhat.com> wrote:
On Mon, 15 Feb 2021 18:22:08 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:49AM -0600, Jonathon Jongsma wrote:
diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.json b/tests/nodedevmdevctldata/mdevctl-list-multiple.json new file mode 100644 index 0000000000..eefcd90c62 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.json @@ -0,0 +1,59 @@ +[ + { + "0000:00:02.0": [ + { + "200f228a-c80a-4d50-bfb7-f5a0e4e34045": { + "mdev_type": "i915-GVTg_V5_4", + "start": "manual" + } + }, + { + "de807ffc-1923-4d5f-b6c9-b20ecebc6d4b": { + "mdev_type": "i915-GVTg_V5_4", + "start": "auto" + } + }, + { + "435722ea-5f43-468a-874f-da34f1217f13": { + "mdev_type": "i915-GVTg_V5_8", + "start": "manual", + "attrs": [ + { + "testattr": "42" + } + ] + } + } + ] + }, + { + "matrix": [ + { "783e6dbb-ea0e-411f-94e2-717eaad438bf": { + "mdev_type": "vfio_ap-passthrough", + "start": "manual", + "attrs": [ + { + "assign_adapter": "5" + }, + { + "assign_adapter": "6" + },
I'd still like to know why there are 2 assign_adapter and 2 assign_domain attributes here, I'm simply confused what the outcome should be.
As far as I can recall, i was just trying to use some real-world-ish mdevctl output to test the parsing and handling of mdev attributes. In this case, I believe that I simply copied the example output from the mdevctl documentation. See:
https://github.com/mdevctl/mdevctl#advanced-usage-attributes-and-json
+ { + "assign_domain": "0xab" + }, + { + "assign_control_domain": "0xab" + }, + { + "assign_domain": "4" + }, + { + "assign_control_domain": "4" + } + ] + } + } + ] + } +] +
...
+ <name>mdev_783e6dbb_ea0e_411f_94e2_717eaad438bf</name> + <parent>matrix</parent> + <capability type='mdev'> + <type id='vfio_ap-passthrough'/> + <iommuGroup number='0'/> + <attr name='assign_adapter' value='5'/> + <attr name='assign_adapter' value='6'/> + <attr name='assign_domain' value='0xab'/> + <attr name='assign_control_domain' value='0xab'/> + <attr name='assign_domain' value='4'/> + <attr name='assign_control_domain' value='4'/>
Here too I'd like to hear an opinion (since v3) on naming the attributes in such way that they correspond to the respective elements: ap-adapter, ap-domain, etc. This naming is very unintuitive if not documented properly; unless there's an internal reason why they have to be named "assign_control", etc.
These are the names of the attributes that are used to configure these devices in sysfs: https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ljdd/...
The idea here is that <attr> is a direct mapping to and from the mdev sysfs attributes, since that is how these devices are configured. An attribute name means nothing to libvirt, it is just an opaque name that we pass to mdevctl. If we were to deviate from this strict mapping and add "friendlier" names in libvirt, we would need to maintain a custom per-device mapping from mdev sysfs attribute name to libvirt friendly-name. That seems unmaintainable.
Yep, and the names don't mean anything to mdevctl either, as you say they're just sysfs attributes, mdevctl looks for the named attribute and writes the value to it. Note that ordering of the attributes might be important, which is why mdevctl stores them in an array and provides some utility to re-index attributes. I can't tell if the above example necessarily imposes that ordering from the xml. Thanks,
Yes, the order of xml attributes is significant, they should be passed to mdevctl in the same order as they are defined in the xml. Though I don't think I've documented that anywhere... Jonathon

On Wed, Feb 17, 2021 at 08:55:12AM -0600, Jonathon Jongsma wrote:
On Tue, 16 Feb 2021 15:20:28 -0700 Alex Williamson <alex.williamson@redhat.com> wrote:
On Tue, 16 Feb 2021 16:00:40 -0600 Jonathon Jongsma <jjongsma@redhat.com> wrote:
On Mon, 15 Feb 2021 18:22:08 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:49AM -0600, Jonathon Jongsma wrote:
diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.json b/tests/nodedevmdevctldata/mdevctl-list-multiple.json new file mode 100644 index 0000000000..eefcd90c62 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.json @@ -0,0 +1,59 @@ +[ + { + "0000:00:02.0": [ + { + "200f228a-c80a-4d50-bfb7-f5a0e4e34045": { + "mdev_type": "i915-GVTg_V5_4", + "start": "manual" + } + }, + { + "de807ffc-1923-4d5f-b6c9-b20ecebc6d4b": { + "mdev_type": "i915-GVTg_V5_4", + "start": "auto" + } + }, + { + "435722ea-5f43-468a-874f-da34f1217f13": { + "mdev_type": "i915-GVTg_V5_8", + "start": "manual", + "attrs": [ + { + "testattr": "42" + } + ] + } + } + ] + }, + { + "matrix": [ + { "783e6dbb-ea0e-411f-94e2-717eaad438bf": { + "mdev_type": "vfio_ap-passthrough", + "start": "manual", + "attrs": [ + { + "assign_adapter": "5" + }, + { + "assign_adapter": "6" + },
I'd still like to know why there are 2 assign_adapter and 2 assign_domain attributes here, I'm simply confused what the outcome should be.
As far as I can recall, i was just trying to use some real-world-ish mdevctl output to test the parsing and handling of mdev attributes. In this case, I believe that I simply copied the example output from the mdevctl documentation. See:
https://github.com/mdevctl/mdevctl#advanced-usage-attributes-and-json
+ { + "assign_domain": "0xab" + }, + { + "assign_control_domain": "0xab" + }, + { + "assign_domain": "4" + }, + { + "assign_control_domain": "4" + } + ] + } + } + ] + } +] +
...
+ <name>mdev_783e6dbb_ea0e_411f_94e2_717eaad438bf</name> + <parent>matrix</parent> + <capability type='mdev'> + <type id='vfio_ap-passthrough'/> + <iommuGroup number='0'/> + <attr name='assign_adapter' value='5'/> + <attr name='assign_adapter' value='6'/> + <attr name='assign_domain' value='0xab'/> + <attr name='assign_control_domain' value='0xab'/> + <attr name='assign_domain' value='4'/> + <attr name='assign_control_domain' value='4'/>
Here too I'd like to hear an opinion (since v3) on naming the attributes in such way that they correspond to the respective elements: ap-adapter, ap-domain, etc. This naming is very unintuitive if not documented properly; unless there's an internal reason why they have to be named "assign_control", etc.
These are the names of the attributes that are used to configure these devices in sysfs: https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ljdd/...
The idea here is that <attr> is a direct mapping to and from the mdev sysfs attributes, since that is how these devices are configured. An attribute name means nothing to libvirt, it is just an opaque name that we pass to mdevctl. If we were to deviate from this strict mapping and add "friendlier" names in libvirt, we would need to maintain a custom per-device mapping from mdev sysfs attribute name to libvirt friendly-name. That seems unmaintainable.
Yep, and the names don't mean anything to mdevctl either, as you say they're just sysfs attributes, mdevctl looks for the named attribute and writes the value to it. Note that ordering of the attributes might be important, which is why mdevctl stores them in an array and provides some utility to re-index attributes. I can't tell if the above example necessarily imposes that ordering from the xml. Thanks,
Yes, the order of xml attributes is significant, they should be passed to mdevctl in the same order as they are defined in the xml. Though I don't think I've documented that anywhere...
Okay, those are fair arguments, so we need to document the usage in libvirt properly. Erik

This adds an internal API to query for persistent mediated devices that are defined by mdevctl. Upcoming commits will make use of this information. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 18 +++++++++ src/node_device/node_device_driver.h | 3 ++ .../mdevctl-list-defined.argv | 1 + tests/nodedevmdevctltest.c | 39 +++++++++++++++++++ 4 files changed, 61 insertions(+) create mode 100644 tests/nodedevmdevctldata/mdevctl-list-defined.argv diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index fc5ac1d27e..2778e41f97 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -853,6 +853,24 @@ virMdevctlStop(virNodeDeviceDefPtr def) } +virCommandPtr +nodeDeviceGetMdevctlListCommand(bool defined, + char **output) +{ + virCommandPtr cmd = virCommandNewArgList(MDEVCTL, + "list", + "--dumpjson", + NULL); + + if (defined) + virCommandAddArg(cmd, "--defined"); + + virCommandSetOutputBuffer(cmd, output); + + return cmd; +} + + static void mdevGenerateDeviceName(virNodeDeviceDefPtr dev) { nodeDeviceGenerateName(dev, "mdev", dev->caps->data.mdev.uuid, NULL); diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 2a162513c0..80ac7c5320 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -119,6 +119,9 @@ nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, virCommandPtr nodeDeviceGetMdevctlStopCommand(const char *uuid); +virCommandPtr +nodeDeviceGetMdevctlListCommand(bool defined, char **output); + int nodeDeviceParseMdevctlJSON(const char *jsonstring, virNodeDeviceDefPtr **devs); diff --git a/tests/nodedevmdevctldata/mdevctl-list-defined.argv b/tests/nodedevmdevctldata/mdevctl-list-defined.argv new file mode 100644 index 0000000000..72b5906e9e --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-list-defined.argv @@ -0,0 +1 @@ +$MDEVCTL_BINARY$ list --dumpjson --defined diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index 93f4e3252c..8bc464d59f 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -143,6 +143,40 @@ testMdevctlStop(const void *data) return ret; } +static int +testMdevctlListDefined(const void *data G_GNUC_UNUSED) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + const char *actualCmdline = NULL; + int ret = -1; + g_autoptr(virCommand) cmd = NULL; + g_autofree char *output = NULL; + g_autofree char *cmdlinefile = + g_strdup_printf("%s/nodedevmdevctldata/mdevctl-list-defined.argv", + abs_srcdir); + + cmd = nodeDeviceGetMdevctlListCommand(true, &output); + + if (!cmd) + goto cleanup; + + virCommandSetDryRun(&buf, NULL, NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + if (!(actualCmdline = virBufferCurrentContent(&buf))) + goto cleanup; + + if (nodedevCompareToFile(actualCmdline, cmdlinefile) < 0) + goto cleanup; + + ret = 0; + + cleanup: + virBufferFreeAndReset(&buf); + virCommandSetDryRun(NULL, NULL, NULL); + return ret; +} static int testMdevctlParse(const void *data) @@ -326,6 +360,9 @@ mymain(void) #define DO_TEST_STOP(uuid) \ DO_TEST_FULL("mdevctl stop " uuid, testMdevctlStop, uuid) +#define DO_TEST_LIST_DEFINED() \ + DO_TEST_FULL("mdevctl list --defined", testMdevctlListDefined, NULL) + #define DO_TEST_PARSE_JSON(filename) \ DO_TEST_FULL("parse mdevctl json " filename, testMdevctlParse, filename) @@ -337,6 +374,8 @@ mymain(void) /* Test mdevctl stop command, pass an arbitrary uuid */ DO_TEST_STOP("e2451f73-c95b-4124-b900-e008af37c576"); + DO_TEST_LIST_DEFINED(); + DO_TEST_PARSE_JSON("mdevctl-list-multiple"); done: -- 2.26.2

Consistent with other objects (e.g. virDomainObj), add a field to indicate whether the node device is persistent or transient. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/conf/virnodedeviceobj.c | 14 ++++++++++++++ src/conf/virnodedeviceobj.h | 6 ++++++ src/libvirt_private.syms | 2 ++ 3 files changed, 22 insertions(+) diff --git a/src/conf/virnodedeviceobj.c b/src/conf/virnodedeviceobj.c index 6e9291264a..53cf3d1bcd 100644 --- a/src/conf/virnodedeviceobj.c +++ b/src/conf/virnodedeviceobj.c @@ -40,6 +40,7 @@ struct _virNodeDeviceObj { bool skipUpdateCaps; /* whether to skip checking host caps, used by testdriver */ bool active; + bool persistent; }; struct _virNodeDeviceObjList { @@ -1000,3 +1001,16 @@ virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, { obj->active = active; } + +bool +virNodeDeviceObjIsPersistent(virNodeDeviceObjPtr obj) +{ + return obj->persistent; +} + +void +virNodeDeviceObjSetPersistent(virNodeDeviceObjPtr obj, + bool persistent) +{ + obj->persistent = persistent; +} diff --git a/src/conf/virnodedeviceobj.h b/src/conf/virnodedeviceobj.h index c119f4c51f..43af012103 100644 --- a/src/conf/virnodedeviceobj.h +++ b/src/conf/virnodedeviceobj.h @@ -128,3 +128,9 @@ virNodeDeviceObjIsActive(virNodeDeviceObjPtr obj); void virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, bool active); +bool +virNodeDeviceObjIsPersistent(virNodeDeviceObjPtr obj); + +void +virNodeDeviceObjSetPersistent(virNodeDeviceObjPtr obj, + bool persistent); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c8ef6fe983..d044a1111b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1262,6 +1262,7 @@ virNetworkPortDefSaveStatus; virNodeDeviceObjEndAPI; virNodeDeviceObjGetDef; virNodeDeviceObjIsActive; +virNodeDeviceObjIsPersistent; virNodeDeviceObjListAssignDef; virNodeDeviceObjListExport; virNodeDeviceObjListFindByName; @@ -1275,6 +1276,7 @@ virNodeDeviceObjListNew; virNodeDeviceObjListNumOfDevices; virNodeDeviceObjListRemove; virNodeDeviceObjSetActive; +virNodeDeviceObjSetPersistent; # conf/virnwfilterbindingdef.h -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:51AM -0600, Jonathon Jongsma wrote:
Consistent with other objects (e.g. virDomainObj), add a field to indicate whether the node device is persistent or transient.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/conf/virnodedeviceobj.c | 14 ++++++++++++++ src/conf/virnodedeviceobj.h | 6 ++++++ src/libvirt_private.syms | 2 ++ 3 files changed, 22 insertions(+)
diff --git a/src/conf/virnodedeviceobj.c b/src/conf/virnodedeviceobj.c index 6e9291264a..53cf3d1bcd 100644 --- a/src/conf/virnodedeviceobj.c +++ b/src/conf/virnodedeviceobj.c @@ -40,6 +40,7 @@ struct _virNodeDeviceObj { bool skipUpdateCaps; /* whether to skip checking host caps, used by testdriver */ bool active; + bool persistent; };
struct _virNodeDeviceObjList { @@ -1000,3 +1001,16 @@ virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, { obj->active = active; }
2 blank lines in between function definitions please. Reviewed-by: Erik Skultety <eskultet@redhat.com>
+ +bool +virNodeDeviceObjIsPersistent(virNodeDeviceObjPtr obj) +{ + return obj->persistent; +} + +void +virNodeDeviceObjSetPersistent(virNodeDeviceObjPtr obj, + bool persistent) +{ + obj->persistent = persistent; +} diff --git a/src/conf/virnodedeviceobj.h b/src/conf/virnodedeviceobj.h index c119f4c51f..43af012103 100644 --- a/src/conf/virnodedeviceobj.h +++ b/src/conf/virnodedeviceobj.h @@ -128,3 +128,9 @@ virNodeDeviceObjIsActive(virNodeDeviceObjPtr obj); void virNodeDeviceObjSetActive(virNodeDeviceObjPtr obj, bool active); +bool +virNodeDeviceObjIsPersistent(virNodeDeviceObjPtr obj); + +void +virNodeDeviceObjSetPersistent(virNodeDeviceObjPtr obj, + bool persistent); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c8ef6fe983..d044a1111b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1262,6 +1262,7 @@ virNetworkPortDefSaveStatus; virNodeDeviceObjEndAPI; virNodeDeviceObjGetDef; virNodeDeviceObjIsActive; +virNodeDeviceObjIsPersistent; virNodeDeviceObjListAssignDef; virNodeDeviceObjListExport; virNodeDeviceObjListFindByName; @@ -1275,6 +1276,7 @@ virNodeDeviceObjListNew; virNodeDeviceObjListNumOfDevices; virNodeDeviceObjListRemove; virNodeDeviceObjSetActive; +virNodeDeviceObjSetPersistent;
# conf/virnwfilterbindingdef.h -- 2.26.2

Since a mediated device can be persistently defined by the mdevctl backend, we need additional lifecycle events beyond CREATED/DELETED to indicate that e.g. the device has been stopped but the device definition still exists. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- examples/c/misc/event-test.c | 4 ++++ include/libvirt/libvirt-nodedev.h | 2 ++ tools/virsh-nodedev.c | 4 +++- 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c index f164e825e1..bd6218cce6 100644 --- a/examples/c/misc/event-test.c +++ b/examples/c/misc/event-test.c @@ -381,6 +381,10 @@ nodeDeviceEventToString(int event) return "Created"; case VIR_NODE_DEVICE_EVENT_DELETED: return "Deleted"; + case VIR_NODE_DEVICE_EVENT_DEFINED: + return "Defined"; + case VIR_NODE_DEVICE_EVENT_UNDEFINED: + return "Undefined"; case VIR_NODE_DEVICE_EVENT_LAST: break; } diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index d304283871..7cd679dac4 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -197,6 +197,8 @@ int virConnectNodeDeviceEventDeregisterAny(virConnectPtr conn, typedef enum { VIR_NODE_DEVICE_EVENT_CREATED = 0, VIR_NODE_DEVICE_EVENT_DELETED = 1, + VIR_NODE_DEVICE_EVENT_DEFINED = 2, + VIR_NODE_DEVICE_EVENT_UNDEFINED = 3, # ifdef VIR_ENUM_SENTINELS VIR_NODE_DEVICE_EVENT_LAST diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index 69422e20f5..428ead7384 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -775,7 +775,9 @@ VIR_ENUM_DECL(virshNodeDeviceEvent); VIR_ENUM_IMPL(virshNodeDeviceEvent, VIR_NODE_DEVICE_EVENT_LAST, N_("Created"), - N_("Deleted")); + N_("Deleted"), + N_("Defined"), + N_("Undefined")); static const char * virshNodeDeviceEventToString(int event) -- 2.26.2

At startup, query devices that are defined by 'mdevctl' and add them to the node device list. This adds a complication: we now have two potential sources of information for a node device: - udev for all devices and for activated mediated devices - mdevctl for persistent mediated devices Unfortunately, neither backend returns full information for a mediated device. For example, if a persistent mediated device in the list (with information provided from mdevctl) is 'started', that same device will now be detected by udev. If we simply overwrite the existing device definition with the new one provided by the udev backend, we will lose extra information that was provided by mdevctl (e.g. attributes, etc). To avoid this, make sure to copy the extra information into the new device definition. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 164 +++++++++++++++++++++++++++ src/node_device/node_device_driver.h | 6 + src/node_device/node_device_udev.c | 31 ++++- 3 files changed, 196 insertions(+), 5 deletions(-) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 2778e41f97..fd57dcacc1 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1156,3 +1156,167 @@ nodeDeviceGenerateName(virNodeDeviceDefPtr def, *(def->name + i) = '_'; } } + + +static int +virMdevctlListDefined(virNodeDeviceDefPtr **devs) +{ + int status; + g_autofree char *output = NULL; + g_autoptr(virCommand) cmd = nodeDeviceGetMdevctlListCommand(true, &output); + + if (virCommandRun(cmd, &status) < 0) + return -1; + + if (!output) + return -1; + + return nodeDeviceParseMdevctlJSON(output, devs); +} + + +typedef struct _virMdevctlForEachData virMdevctlForEachData; +struct _virMdevctlForEachData { + int ndefs; + virNodeDeviceDefPtr *defs; +}; + + +int +nodeDeviceUpdateMediatedDevices(void) +{ + g_autofree virNodeDeviceDefPtr *defs = NULL; + int ndefs; + GList * tofree = NULL; + size_t i; + + if ((ndefs = virMdevctlListDefined(&defs)) < 0) { + virReportSystemError(errno, "%s", + _("failed to query mdevs from mdevctl")); + return -1; + } + + for (i = 0; i < ndefs; i++) { + virNodeDeviceObjPtr obj; + virObjectEventPtr event; + virNodeDeviceDefPtr def = defs[i]; + bool was_defined = false; + + def->driver = g_strdup("vfio_mdev"); + + if ((obj = virNodeDeviceObjListFindByName(driver->devs, def->name))) { + bool changed; + virNodeDeviceDefPtr olddef = virNodeDeviceObjGetDef(obj); + + was_defined = virNodeDeviceObjIsPersistent(obj); + /* Active devices contain some additional information (e.g. sysfs + * path) that is not provided by mdevctl, so re-use the existing + * definition and copy over new mdev data */ + changed = nodeDeviceDefCopyFromMdevctl(olddef, def); + + /* Re-use the existing definition (after merging new data from + * mdevctl), so def needs to be freed at the end of the function */ + tofree = g_list_prepend(tofree, def); + + if (was_defined && !changed) { + /* if this device was already defined and the definition + * hasn't changed, there's nothing to do for this device */ + virNodeDeviceObjEndAPI(&obj); + continue; + } + } else { + if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) { + virNodeDeviceDefFree(def); + goto cleanup; + } + } + + /* all devices returned by virMdevctlListDefined() are persistent */ + virNodeDeviceObjSetPersistent(obj, true); + + if (!was_defined) + event = virNodeDeviceEventLifecycleNew(def->name, + VIR_NODE_DEVICE_EVENT_DEFINED, + 0); + else + event = virNodeDeviceEventUpdateNew(def->name); + + virNodeDeviceObjEndAPI(&obj); + virObjectEventStateQueue(driver->nodeDeviceEventState, event); + } + + cleanup: + g_list_free_full(tofree, (GDestroyNotify)virNodeDeviceDefFree); + + return 0; +} + + +/* returns true if any attributes were copied, else returns false */ +static bool +virMediatedDeviceAttrsCopy(virNodeDevCapMdevPtr dst, + virNodeDevCapMdevPtr src) +{ + bool ret = false; + size_t i; + + if (src->nattributes != dst->nattributes) { + ret = true; + for (i = 0; i < dst->nattributes; i++) + virMediatedDeviceAttrFree(dst->attributes[i]); + g_free(dst->attributes); + + dst->nattributes = src->nattributes; + dst->attributes = g_new0(virMediatedDeviceAttrPtr, + src->nattributes); + for (i = 0; i < dst->nattributes; i++) + dst->attributes[i] = virMediatedDeviceAttrNew(); + } + + for (i = 0; i < src->nattributes; i++) { + if (STRNEQ_NULLABLE(src->attributes[i]->name, + dst->attributes[i]->name)) { + ret = true; + dst->attributes[i]->name = + g_strdup(src->attributes[i]->name); + } + if (STRNEQ_NULLABLE(src->attributes[i]->value, + dst->attributes[i]->value)) { + ret = true; + dst->attributes[i]->value = + g_strdup(src->attributes[i]->value); + } + } + + return ret; +} + + +/* A mediated device definitions from mdevctl contains additional info that is + * not available from udev. Transfer this data to the new definition. + * Returns true if anything was copied, else returns false */ +bool +nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, + virNodeDeviceDefPtr src) +{ + bool ret = false; + virNodeDevCapMdevPtr srcmdev = &src->caps->data.mdev; + virNodeDevCapMdevPtr dstmdev = &dst->caps->data.mdev; + + if (STRNEQ_NULLABLE(dstmdev->type, srcmdev->type)) { + ret = true; + g_free(dstmdev->type); + dstmdev->type = g_strdup(srcmdev->type); + } + + if (STRNEQ_NULLABLE(dstmdev->uuid, srcmdev->uuid)) { + ret = true; + g_free(dstmdev->uuid); + dstmdev->uuid = g_strdup(srcmdev->uuid); + } + + if (virMediatedDeviceAttrsCopy(dstmdev, srcmdev)) + ret = true; + + return ret; +} diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 80ac7c5320..d8837e4ef8 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -126,8 +126,14 @@ int nodeDeviceParseMdevctlJSON(const char *jsonstring, virNodeDeviceDefPtr **devs); +int +nodeDeviceUpdateMediatedDevices(void); + void nodeDeviceGenerateName(virNodeDeviceDefPtr def, const char *subsystem, const char *sysname, const char *s); + +bool nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, + virNodeDeviceDefPtr src); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 5d1a192411..038941ec51 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1416,9 +1416,17 @@ udevRemoveOneDeviceSysPath(const char *path) VIR_NODE_DEVICE_EVENT_DELETED, 0); - VIR_DEBUG("Removing device '%s' with sysfs path '%s'", - def->name, path); - virNodeDeviceObjListRemove(driver->devs, obj); + /* If the device is a mediated device that has been 'stopped', it may still + * be defined by mdevctl and can therefore be started again. Don't drop it + * from the list of node devices */ + if (virNodeDeviceObjIsPersistent(obj)) { + VIR_FREE(def->sysfs_path); + virNodeDeviceObjSetActive(obj, false); + } else { + VIR_DEBUG("Removing device '%s' with sysfs path '%s'", + def->name, path); + virNodeDeviceObjListRemove(driver->devs, obj); + } virNodeDeviceObjEndAPI(&obj); virObjectEventStateQueue(driver->nodeDeviceEventState, event); @@ -1476,7 +1484,6 @@ udevSetParent(struct udev_device *device, return 0; } - static int udevAddOneDevice(struct udev_device *device) { @@ -1486,6 +1493,7 @@ udevAddOneDevice(struct udev_device *device) virObjectEventPtr event = NULL; bool new_device = true; int ret = -1; + bool was_persistent = false; def = g_new0(virNodeDeviceDef, 1); @@ -1509,14 +1517,23 @@ udevAddOneDevice(struct udev_device *device) goto cleanup; if ((obj = virNodeDeviceObjListFindByName(driver->devs, def->name))) { + objdef = virNodeDeviceObjGetDef(obj); + + if (def->caps->data.type == VIR_NODE_DEV_CAP_MDEV) + nodeDeviceDefCopyFromMdevctl(def, objdef); + was_persistent = virNodeDeviceObjIsPersistent(obj); + /* If the device was defined by mdevctl and was never instantiated, it + * won't have a sysfs path. We need to emit a CREATED event... */ + new_device = (objdef->sysfs_path == NULL); + virNodeDeviceObjEndAPI(&obj); - new_device = false; } /* If this is a device change, the old definition will be freed * and the current definition will take its place. */ if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) goto cleanup; + virNodeDeviceObjSetPersistent(obj, was_persistent); objdef = virNodeDeviceObjGetDef(obj); if (new_device) @@ -1935,6 +1952,10 @@ nodeStateInitializeEnumerate(void *opaque) /* Populate with known devices */ if (udevEnumerateDevices(udev) != 0) goto error; + /* Load persistent mdevs (which might not be activated yet) and additional + * information about active mediated devices from mdevctl */ + if (nodeDeviceUpdateMediatedDevices() != 0) + goto error; nodeDeviceLock(); driver->initialized = true; -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:53AM -0600, Jonathon Jongsma wrote:
At startup, query devices that are defined by 'mdevctl' and add them to the node device list.
This adds a complication: we now have two potential sources of information for a node device: - udev for all devices and for activated mediated devices - mdevctl for persistent mediated devices
Unfortunately, neither backend returns full information for a mediated device. For example, if a persistent mediated device in the list (with information provided from mdevctl) is 'started', that same device will now be detected by udev. If we simply overwrite the existing device definition with the new one provided by the udev backend, we will lose extra information that was provided by mdevctl (e.g. attributes, etc). To avoid this, make sure to copy the extra information into the new device definition.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 164 +++++++++++++++++++++++++++ src/node_device/node_device_driver.h | 6 + src/node_device/node_device_udev.c | 31 ++++- 3 files changed, 196 insertions(+), 5 deletions(-)
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 2778e41f97..fd57dcacc1 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1156,3 +1156,167 @@ nodeDeviceGenerateName(virNodeDeviceDefPtr def, *(def->name + i) = '_'; } } + + +static int +virMdevctlListDefined(virNodeDeviceDefPtr **devs) +{ + int status; + g_autofree char *output = NULL; + g_autoptr(virCommand) cmd = nodeDeviceGetMdevctlListCommand(true, &output); + + if (virCommandRun(cmd, &status) < 0) + return -1; + + if (!output) + return -1; + + return nodeDeviceParseMdevctlJSON(output, devs); +} + + +typedef struct _virMdevctlForEachData virMdevctlForEachData; +struct _virMdevctlForEachData { + int ndefs; + virNodeDeviceDefPtr *defs; +}; +
^This struct doesn't seem to be used anywhere in the patch, so please define it in the patch that makes use of it.
+ +int +nodeDeviceUpdateMediatedDevices(void) +{ + g_autofree virNodeDeviceDefPtr *defs = NULL; + int ndefs; + GList * tofree = NULL; + size_t i; + + if ((ndefs = virMdevctlListDefined(&defs)) < 0) { + virReportSystemError(errno, "%s", + _("failed to query mdevs from mdevctl")); + return -1; + } + + for (i = 0; i < ndefs; i++) { + virNodeDeviceObjPtr obj; + virObjectEventPtr event; + virNodeDeviceDefPtr def = defs[i]; + bool was_defined = false; + + def->driver = g_strdup("vfio_mdev"); + + if ((obj = virNodeDeviceObjListFindByName(driver->devs, def->name))) {
^This condition should be flipped so that the 'else' block becomes the larger one.
+ bool changed; + virNodeDeviceDefPtr olddef = virNodeDeviceObjGetDef(obj); + + was_defined = virNodeDeviceObjIsPersistent(obj);
"defined" as a name will do just fine
+ /* Active devices contain some additional information (e.g. sysfs + * path) that is not provided by mdevctl, so re-use the existing + * definition and copy over new mdev data */ + changed = nodeDeviceDefCopyFromMdevctl(olddef, def); + + /* Re-use the existing definition (after merging new data from + * mdevctl), so def needs to be freed at the end of the function */ + tofree = g_list_prepend(tofree, def);
I'm not a fan of ^this. Plain g_autoptr on virNodeDeviceDef would do just fine.
+ + if (was_defined && !changed) { + /* if this device was already defined and the definition + * hasn't changed, there's nothing to do for this device */ + virNodeDeviceObjEndAPI(&obj); + continue; + } + } else { + if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) { + virNodeDeviceDefFree(def); + goto cleanup; + } + } + + /* all devices returned by virMdevctlListDefined() are persistent */ + virNodeDeviceObjSetPersistent(obj, true); + + if (!was_defined) + event = virNodeDeviceEventLifecycleNew(def->name, + VIR_NODE_DEVICE_EVENT_DEFINED, + 0); + else + event = virNodeDeviceEventUpdateNew(def->name); + + virNodeDeviceObjEndAPI(&obj); + virObjectEventStateQueue(driver->nodeDeviceEventState, event); + } + + cleanup: + g_list_free_full(tofree, (GDestroyNotify)virNodeDeviceDefFree); + + return 0; +} + + +/* returns true if any attributes were copied, else returns false */ +static bool +virMediatedDeviceAttrsCopy(virNodeDevCapMdevPtr dst, + virNodeDevCapMdevPtr src) +{ + bool ret = false; + size_t i; + + if (src->nattributes != dst->nattributes) { + ret = true; + for (i = 0; i < dst->nattributes; i++) + virMediatedDeviceAttrFree(dst->attributes[i]); + g_free(dst->attributes); + + dst->nattributes = src->nattributes; + dst->attributes = g_new0(virMediatedDeviceAttrPtr, + src->nattributes); + for (i = 0; i < dst->nattributes; i++) + dst->attributes[i] = virMediatedDeviceAttrNew(); + } + + for (i = 0; i < src->nattributes; i++) { + if (STRNEQ_NULLABLE(src->attributes[i]->name, + dst->attributes[i]->name)) { + ret = true; + dst->attributes[i]->name = + g_strdup(src->attributes[i]->name);
^This could leak memory if src->attributes == dst->attributes and the loop above therefore did not execute.
+ } + if (STRNEQ_NULLABLE(src->attributes[i]->value, + dst->attributes[i]->value)) { + ret = true; + dst->attributes[i]->value = + g_strdup(src->attributes[i]->value); + } + }
I think we can just straight overwrite all the data in question, yes, some CPU cycles will be wasted, but time complexity remains at O(n) and readability improves significantly. These are mdev devices, so I doubt this is an RT critical feature. The function can then remain as void.
+ + return ret; +} + + +/* A mediated device definitions from mdevctl contains additional info that is + * not available from udev. Transfer this data to the new definition. + * Returns true if anything was copied, else returns false */ +bool +nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, + virNodeDeviceDefPtr src) +{ + bool ret = false; + virNodeDevCapMdevPtr srcmdev = &src->caps->data.mdev; + virNodeDevCapMdevPtr dstmdev = &dst->caps->data.mdev; + + if (STRNEQ_NULLABLE(dstmdev->type, srcmdev->type)) { + ret = true; + g_free(dstmdev->type); + dstmdev->type = g_strdup(srcmdev->type); + } + + if (STRNEQ_NULLABLE(dstmdev->uuid, srcmdev->uuid)) { + ret = true; + g_free(dstmdev->uuid); + dstmdev->uuid = g_strdup(srcmdev->uuid); + } + + if (virMediatedDeviceAttrsCopy(dstmdev, srcmdev)) + ret = true;
Same as above, just overwrite all the data in question directly with those coming from mdevctl.
+ + return ret; +} diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 80ac7c5320..d8837e4ef8 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -126,8 +126,14 @@ int nodeDeviceParseMdevctlJSON(const char *jsonstring, virNodeDeviceDefPtr **devs);
+int +nodeDeviceUpdateMediatedDevices(void); + void nodeDeviceGenerateName(virNodeDeviceDefPtr def, const char *subsystem, const char *sysname, const char *s); + +bool nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, + virNodeDeviceDefPtr src); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 5d1a192411..038941ec51 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1416,9 +1416,17 @@ udevRemoveOneDeviceSysPath(const char *path) VIR_NODE_DEVICE_EVENT_DELETED, 0);
- VIR_DEBUG("Removing device '%s' with sysfs path '%s'", - def->name, path); - virNodeDeviceObjListRemove(driver->devs, obj); + /* If the device is a mediated device that has been 'stopped', it may still + * be defined by mdevctl and can therefore be started again. Don't drop it + * from the list of node devices */ + if (virNodeDeviceObjIsPersistent(obj)) { + VIR_FREE(def->sysfs_path); + virNodeDeviceObjSetActive(obj, false); + } else { + VIR_DEBUG("Removing device '%s' with sysfs path '%s'", + def->name, path); + virNodeDeviceObjListRemove(driver->devs, obj); + } virNodeDeviceObjEndAPI(&obj);
virObjectEventStateQueue(driver->nodeDeviceEventState, event); @@ -1476,7 +1484,6 @@ udevSetParent(struct udev_device *device, return 0; }
- static int udevAddOneDevice(struct udev_device *device) { @@ -1486,6 +1493,7 @@ udevAddOneDevice(struct udev_device *device) virObjectEventPtr event = NULL; bool new_device = true; int ret = -1; + bool was_persistent = false;
def = g_new0(virNodeDeviceDef, 1);
@@ -1509,14 +1517,23 @@ udevAddOneDevice(struct udev_device *device) goto cleanup;
if ((obj = virNodeDeviceObjListFindByName(driver->devs, def->name))) { + objdef = virNodeDeviceObjGetDef(obj); + + if (def->caps->data.type == VIR_NODE_DEV_CAP_MDEV) + nodeDeviceDefCopyFromMdevctl(def, objdef); + was_persistent = virNodeDeviceObjIsPersistent(obj); + /* If the device was defined by mdevctl and was never instantiated, it + * won't have a sysfs path. We need to emit a CREATED event... */
Why CREATED instead of DEFINED? Erik
+ new_device = (objdef->sysfs_path == NULL); + virNodeDeviceObjEndAPI(&obj); - new_device = false; }
/* If this is a device change, the old definition will be freed * and the current definition will take its place. */ if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) goto cleanup; + virNodeDeviceObjSetPersistent(obj, was_persistent); objdef = virNodeDeviceObjGetDef(obj);
if (new_device) @@ -1935,6 +1952,10 @@ nodeStateInitializeEnumerate(void *opaque) /* Populate with known devices */ if (udevEnumerateDevices(udev) != 0) goto error; + /* Load persistent mdevs (which might not be activated yet) and additional + * information about active mediated devices from mdevctl */ + if (nodeDeviceUpdateMediatedDevices() != 0) + goto error;
nodeDeviceLock(); driver->initialized = true; -- 2.26.2

On Tue, 16 Feb 2021 13:37:41 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:53AM -0600, Jonathon Jongsma wrote:
At startup, query devices that are defined by 'mdevctl' and add them to the node device list.
This adds a complication: we now have two potential sources of information for a node device: - udev for all devices and for activated mediated devices - mdevctl for persistent mediated devices
Unfortunately, neither backend returns full information for a mediated device. For example, if a persistent mediated device in the list (with information provided from mdevctl) is 'started', that same device will now be detected by udev. If we simply overwrite the existing device definition with the new one provided by the udev backend, we will lose extra information that was provided by mdevctl (e.g. attributes, etc). To avoid this, make sure to copy the extra information into the new device definition.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 164 +++++++++++++++++++++++++++ src/node_device/node_device_driver.h | 6 + src/node_device/node_device_udev.c | 31 ++++- 3 files changed, 196 insertions(+), 5 deletions(-)
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 2778e41f97..fd57dcacc1 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1156,3 +1156,167 @@ nodeDeviceGenerateName(virNodeDeviceDefPtr def, *(def->name + i) = '_'; } } + + +static int +virMdevctlListDefined(virNodeDeviceDefPtr **devs) +{ + int status; + g_autofree char *output = NULL; + g_autoptr(virCommand) cmd = nodeDeviceGetMdevctlListCommand(true, &output); + + if (virCommandRun(cmd, &status) < 0) + return -1; + + if (!output) + return -1; + + return nodeDeviceParseMdevctlJSON(output, devs); +} + + +typedef struct _virMdevctlForEachData virMdevctlForEachData; +struct _virMdevctlForEachData { + int ndefs; + virNodeDeviceDefPtr *defs; +}; +
^This struct doesn't seem to be used anywhere in the patch, so please define it in the patch that makes use of it.
+ +int +nodeDeviceUpdateMediatedDevices(void) +{ + g_autofree virNodeDeviceDefPtr *defs = NULL; + int ndefs; + GList * tofree = NULL; + size_t i; + + if ((ndefs = virMdevctlListDefined(&defs)) < 0) { + virReportSystemError(errno, "%s", + _("failed to query mdevs from mdevctl")); + return -1; + } + + for (i = 0; i < ndefs; i++) { + virNodeDeviceObjPtr obj; + virObjectEventPtr event; + virNodeDeviceDefPtr def = defs[i]; + bool was_defined = false; + + def->driver = g_strdup("vfio_mdev"); + + if ((obj = virNodeDeviceObjListFindByName(driver->devs, def->name))) {
^This condition should be flipped so that the 'else' block becomes the larger one.
+ bool changed; + virNodeDeviceDefPtr olddef = virNodeDeviceObjGetDef(obj); + + was_defined = virNodeDeviceObjIsPersistent(obj);
"defined" as a name will do just fine
+ /* Active devices contain some additional information (e.g. sysfs + * path) that is not provided by mdevctl, so re-use the existing + * definition and copy over new mdev data */ + changed = nodeDeviceDefCopyFromMdevctl(olddef, def); + + /* Re-use the existing definition (after merging new data from + * mdevctl), so def needs to be freed at the end of the function */ + tofree = g_list_prepend(tofree, def);
I'm not a fan of ^this. Plain g_autoptr on virNodeDeviceDef would do just fine.
I'm not a fan of it either, but g_autoptr is not ideal here either. The def must not be freed if it is a new device (because virNodeDeviceObjListAssignDef() takes ownership of the def). So we'd have to make sure we steal the autoptr in that case. But we have to use it after we'd need to clear it (in order to emit the lifecycle event, etc). But there's also another reason I took this approach. In patch 11 ("handle mdevs that disappear..."), we need to use the entire list of devices ('defs') to scan for devices that have disappeared. So I had to keep the entire list of defs until the end of the function where we traversed the list looking for removed devices to emit a lifecycle event. I could potentially move this check to the beginning of the function to get around the need to keep the defs around until the end of the function, though... I can try to rework it if you think it necessary.
+ + if (was_defined && !changed) { + /* if this device was already defined and the definition + * hasn't changed, there's nothing to do for this device */ + virNodeDeviceObjEndAPI(&obj); + continue; + } + } else { + if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) { + virNodeDeviceDefFree(def); + goto cleanup; + } + } + + /* all devices returned by virMdevctlListDefined() are persistent */ + virNodeDeviceObjSetPersistent(obj, true); + + if (!was_defined) + event = virNodeDeviceEventLifecycleNew(def->name, + VIR_NODE_DEVICE_EVENT_DEFINED, + 0); + else + event = virNodeDeviceEventUpdateNew(def->name); + + virNodeDeviceObjEndAPI(&obj); + virObjectEventStateQueue(driver->nodeDeviceEventState, event); + } + + cleanup: + g_list_free_full(tofree, (GDestroyNotify)virNodeDeviceDefFree); + + return 0; +} + + +/* returns true if any attributes were copied, else returns false */ +static bool +virMediatedDeviceAttrsCopy(virNodeDevCapMdevPtr dst, + virNodeDevCapMdevPtr src) +{ + bool ret = false; + size_t i; + + if (src->nattributes != dst->nattributes) { + ret = true; + for (i = 0; i < dst->nattributes; i++) + virMediatedDeviceAttrFree(dst->attributes[i]); + g_free(dst->attributes); + + dst->nattributes = src->nattributes; + dst->attributes = g_new0(virMediatedDeviceAttrPtr, + src->nattributes); + for (i = 0; i < dst->nattributes; i++) + dst->attributes[i] = virMediatedDeviceAttrNew(); + } + + for (i = 0; i < src->nattributes; i++) { + if (STRNEQ_NULLABLE(src->attributes[i]->name, + dst->attributes[i]->name)) { + ret = true; + dst->attributes[i]->name = + g_strdup(src->attributes[i]->name);
^This could leak memory if src->attributes == dst->attributes and the loop above therefore did not execute.
+ } + if (STRNEQ_NULLABLE(src->attributes[i]->value, + dst->attributes[i]->value)) { + ret = true; + dst->attributes[i]->value = + g_strdup(src->attributes[i]->value); + } + }
I think we can just straight overwrite all the data in question, yes, some CPU cycles will be wasted, but time complexity remains at O(n) and readability improves significantly. These are mdev devices, so I doubt this is an RT critical feature. The function can then remain as void.
It's perhaps not obvious, but I specifically added these checks and return value in order to avoid an issue that Daniel pointed out where I was emitting update events for every device regardless of whether they had changed or not. So rather than inventing a new function to check for node device equality and only copy over the new data if they weren't equal, I incorporated the equality check into the update function so that I could check it and update it in the same call. See above where I use the 'changed' variable in nodeDeviceUpdateMediatedDevices().
+ + return ret; +} + + +/* A mediated device definitions from mdevctl contains additional info that is + * not available from udev. Transfer this data to the new definition. + * Returns true if anything was copied, else returns false */ +bool +nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, + virNodeDeviceDefPtr src) +{ + bool ret = false; + virNodeDevCapMdevPtr srcmdev = &src->caps->data.mdev; + virNodeDevCapMdevPtr dstmdev = &dst->caps->data.mdev; + + if (STRNEQ_NULLABLE(dstmdev->type, srcmdev->type)) { + ret = true; + g_free(dstmdev->type); + dstmdev->type = g_strdup(srcmdev->type); + } + + if (STRNEQ_NULLABLE(dstmdev->uuid, srcmdev->uuid)) { + ret = true; + g_free(dstmdev->uuid); + dstmdev->uuid = g_strdup(srcmdev->uuid); + } + + if (virMediatedDeviceAttrsCopy(dstmdev, srcmdev)) + ret = true;
Same as above, just overwrite all the data in question directly with those coming from mdevctl.
+ + return ret; +} diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 80ac7c5320..d8837e4ef8 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -126,8 +126,14 @@ int nodeDeviceParseMdevctlJSON(const char *jsonstring, virNodeDeviceDefPtr **devs);
+int +nodeDeviceUpdateMediatedDevices(void); + void nodeDeviceGenerateName(virNodeDeviceDefPtr def, const char *subsystem, const char *sysname, const char *s); + +bool nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, + virNodeDeviceDefPtr src); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 5d1a192411..038941ec51 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -1416,9 +1416,17 @@ udevRemoveOneDeviceSysPath(const char *path) VIR_NODE_DEVICE_EVENT_DELETED, 0);
- VIR_DEBUG("Removing device '%s' with sysfs path '%s'", - def->name, path); - virNodeDeviceObjListRemove(driver->devs, obj); + /* If the device is a mediated device that has been 'stopped', it may still + * be defined by mdevctl and can therefore be started again. Don't drop it + * from the list of node devices */ + if (virNodeDeviceObjIsPersistent(obj)) { + VIR_FREE(def->sysfs_path); + virNodeDeviceObjSetActive(obj, false); + } else { + VIR_DEBUG("Removing device '%s' with sysfs path '%s'", + def->name, path); + virNodeDeviceObjListRemove(driver->devs, obj); + } virNodeDeviceObjEndAPI(&obj);
virObjectEventStateQueue(driver->nodeDeviceEventState, event); @@ -1476,7 +1484,6 @@ udevSetParent(struct udev_device *device, return 0; }
- static int udevAddOneDevice(struct udev_device *device) { @@ -1486,6 +1493,7 @@ udevAddOneDevice(struct udev_device *device) virObjectEventPtr event = NULL; bool new_device = true; int ret = -1; + bool was_persistent = false;
def = g_new0(virNodeDeviceDef, 1);
@@ -1509,14 +1517,23 @@ udevAddOneDevice(struct udev_device *device) goto cleanup;
if ((obj = virNodeDeviceObjListFindByName(driver->devs, def->name))) { + objdef = virNodeDeviceObjGetDef(obj); + + if (def->caps->data.type == VIR_NODE_DEV_CAP_MDEV) + nodeDeviceDefCopyFromMdevctl(def, objdef); + was_persistent = virNodeDeviceObjIsPersistent(obj); + /* If the device was defined by mdevctl and was never instantiated, it + * won't have a sysfs path. We need to emit a CREATED event... */
Why CREATED instead of DEFINED?
Because this is handling a udev event. udev only deals with with instantiated mdevs, not persistent mdev definitions. In other words, mdevctl determines whether a mdev is DEFINED or UNDEFINED, and udev determines whether a devices is CREATED or DELETED.
Erik
+ new_device = (objdef->sysfs_path == NULL); + virNodeDeviceObjEndAPI(&obj); - new_device = false; }
/* If this is a device change, the old definition will be freed * and the current definition will take its place. */ if (!(obj = virNodeDeviceObjListAssignDef(driver->devs, def))) goto cleanup; + virNodeDeviceObjSetPersistent(obj, was_persistent); objdef = virNodeDeviceObjGetDef(obj);
if (new_device) @@ -1935,6 +1952,10 @@ nodeStateInitializeEnumerate(void *opaque) /* Populate with known devices */ if (udevEnumerateDevices(udev) != 0) goto error; + /* Load persistent mdevs (which might not be activated yet) and additional + * information about active mediated devices from mdevctl */ + if (nodeDeviceUpdateMediatedDevices() != 0) + goto error;
nodeDeviceLock(); driver->initialized = true; -- 2.26.2

...
+ bool changed; + virNodeDeviceDefPtr olddef = virNodeDeviceObjGetDef(obj); + + was_defined = virNodeDeviceObjIsPersistent(obj);
"defined" as a name will do just fine
+ /* Active devices contain some additional information (e.g. sysfs + * path) that is not provided by mdevctl, so re-use the existing + * definition and copy over new mdev data */ + changed = nodeDeviceDefCopyFromMdevctl(olddef, def); + + /* Re-use the existing definition (after merging new data from + * mdevctl), so def needs to be freed at the end of the function */ + tofree = g_list_prepend(tofree, def);
I'm not a fan of ^this. Plain g_autoptr on virNodeDeviceDef would do just fine.
I'm not a fan of it either, but g_autoptr is not ideal here either. The def must not be freed if it is a new device (because virNodeDeviceObjListAssignDef() takes ownership of the def). So we'd have to make sure we steal the autoptr in that case. But we have to use it after we'd need to clear it (in order to emit the lifecycle event, etc).
But there's also another reason I took this approach. In patch 11 ("handle mdevs that disappear..."), we need to use the entire list of devices ('defs') to scan for devices that have disappeared. So I had to keep the entire list of defs until the end of the function where we traversed the list looking for removed devices to emit a lifecycle event. I could potentially move this check to the beginning of the function to get around the need to keep the defs around until the end of the function, though...
I can try to rework it if you think it necessary.
I think purging non-existent devices at the beginning of the function is fine, maybe even desirable. As for the ownership, you only need def->name for events so I think it would still be easily doable by dropping the 'tofree' list. ...
+ if (STRNEQ_NULLABLE(src->attributes[i]->value, + dst->attributes[i]->value)) { + ret = true; + dst->attributes[i]->value = + g_strdup(src->attributes[i]->value); + } + }
I think we can just straight overwrite all the data in question, yes, some CPU cycles will be wasted, but time complexity remains at O(n) and readability improves significantly. These are mdev devices, so I doubt this is an RT critical feature. The function can then remain as void.
It's perhaps not obvious, but I specifically added these checks and return value in order to avoid an issue that Daniel pointed out where I was emitting update events for every device regardless of whether they had changed or not. So rather than inventing a new function to check for node device equality and only copy over the new data if they weren't equal, I incorporated the equality check into the update function so that I could check it and update it in the same call. See above where I use the 'changed' variable in nodeDeviceUpdateMediatedDevices().
Darn...yeah, I missed that, it's fine as it is then. ...
+ if (def->caps->data.type == VIR_NODE_DEV_CAP_MDEV) + nodeDeviceDefCopyFromMdevctl(def, objdef); + was_persistent = virNodeDeviceObjIsPersistent(obj); + /* If the device was defined by mdevctl and was never instantiated, it + * won't have a sysfs path. We need to emit a CREATED event... */
Why CREATED instead of DEFINED?
Because this is handling a udev event. udev only deals with with instantiated mdevs, not persistent mdev definitions.
In other words, mdevctl determines whether a mdev is DEFINED or UNDEFINED, and udev determines whether a devices is CREATED or DELETED.
Okay, thanks for refreshing my memory. Erik

When a mediated device is stopped or undefined by an application outside of libvirt, we need to remove it from our list of node devices within libvirt. This patch introduces virNodeDeviceObjListRemoveLocked() and virNodeDeviceObjListForEach() (which are analogous to other types of object lists in libvirt) to facilitate that. They will be used in coming commits. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/conf/virnodedeviceobj.c | 65 ++++++++++++++++++++++++++++++++++--- src/conf/virnodedeviceobj.h | 12 +++++++ src/libvirt_private.syms | 2 ++ 3 files changed, 75 insertions(+), 4 deletions(-) diff --git a/src/conf/virnodedeviceobj.c b/src/conf/virnodedeviceobj.c index 53cf3d1bcd..681ad54314 100644 --- a/src/conf/virnodedeviceobj.c +++ b/src/conf/virnodedeviceobj.c @@ -507,23 +507,29 @@ void virNodeDeviceObjListRemove(virNodeDeviceObjListPtr devs, virNodeDeviceObjPtr obj) { - virNodeDeviceDefPtr def; - if (!obj) return; - def = obj->def; virObjectRef(obj); virObjectUnlock(obj); virObjectRWLockWrite(devs); virObjectLock(obj); - virHashRemoveEntry(devs->objs, def->name); + virNodeDeviceObjListRemoveLocked(devs, obj); virObjectUnlock(obj); virObjectUnref(obj); virObjectRWUnlock(devs); } +/* The caller must hold lock on 'devs' */ +void +virNodeDeviceObjListRemoveLocked(virNodeDeviceObjListPtr devs, + virNodeDeviceObjPtr dev) +{ + virHashRemoveEntry(devs->objs, dev->def->name); +} + + /* * Return the NPIV dev's parent device name */ @@ -1014,3 +1020,54 @@ virNodeDeviceObjSetPersistent(virNodeDeviceObjPtr obj, { obj->persistent = persistent; } + + +struct _virNodeDeviceObjListForEachData { + virNodeDeviceObjListIterator iter; + const void *opaque; +}; + + +static int +virNodeDeviceObjListForEachCb(void *payload, + const char *name G_GNUC_UNUSED, + void *opaque) +{ + virNodeDeviceObjPtr obj = payload; + struct _virNodeDeviceObjListForEachData *data = opaque; + + /* Grab a reference so that we don't rely only on references grabbed by + * hash table earlier. Remember, an iterator can remove object from the + * hash table. */ + virObjectRef(obj); + virObjectLock(obj); + data->iter(obj, data->opaque); + virNodeDeviceObjEndAPI(&obj); + + return 0; +} + + +/** + * virNodeDeviceObjListForEach + * @devs: Pointer to object list + * @iter: Callback iteration helper + * @opaque: Opaque data to use as argument to helper + * + * For each object in @devs, call the @iter helper using @opaque as + * an argument. + */ +void +virNodeDeviceObjListForEachSafe(virNodeDeviceObjListPtr devs, + virNodeDeviceObjListIterator iter, + const void *opaque) +{ + struct _virNodeDeviceObjListForEachData data = { + .iter = iter, + .opaque = opaque + }; + + virObjectRWLockWrite(devs); + virHashForEachSafe(devs->objs, virNodeDeviceObjListForEachCb, &data); + virObjectRWUnlock(devs); +} diff --git a/src/conf/virnodedeviceobj.h b/src/conf/virnodedeviceobj.h index 43af012103..a7d12674b4 100644 --- a/src/conf/virnodedeviceobj.h +++ b/src/conf/virnodedeviceobj.h @@ -80,6 +80,10 @@ void virNodeDeviceObjListRemove(virNodeDeviceObjListPtr devs, virNodeDeviceObjPtr dev); +void +virNodeDeviceObjListRemoveLocked(virNodeDeviceObjListPtr devs, + virNodeDeviceObjPtr dev); + int virNodeDeviceObjListGetParentHost(virNodeDeviceObjListPtr devs, virNodeDeviceDefPtr def); @@ -134,3 +138,11 @@ virNodeDeviceObjIsPersistent(virNodeDeviceObjPtr obj); void virNodeDeviceObjSetPersistent(virNodeDeviceObjPtr obj, bool persistent); + +typedef void +(*virNodeDeviceObjListIterator)(virNodeDeviceObjPtr obj, + const void *opaque); + +void virNodeDeviceObjListForEachSafe(virNodeDeviceObjListPtr devs, + virNodeDeviceObjListIterator iter, + const void *opaque); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d044a1111b..69090a7efe 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1269,12 +1269,14 @@ virNodeDeviceObjListFindByName; virNodeDeviceObjListFindBySysfsPath; virNodeDeviceObjListFindMediatedDeviceByUUID; virNodeDeviceObjListFindSCSIHostByWWNs; +virNodeDeviceObjListForEachSafe; virNodeDeviceObjListFree; virNodeDeviceObjListGetNames; virNodeDeviceObjListGetParentHost; virNodeDeviceObjListNew; virNodeDeviceObjListNumOfDevices; virNodeDeviceObjListRemove; +virNodeDeviceObjListRemoveLocked; virNodeDeviceObjSetActive; virNodeDeviceObjSetPersistent; -- 2.26.2

...
+void +virNodeDeviceObjListForEachSafe(virNodeDeviceObjListPtr devs, + virNodeDeviceObjListIterator iter, + const void *opaque) +{ + struct _virNodeDeviceObjListForEachData data = { + .iter = iter, + .opaque = opaque + }; + + virObjectRWLockWrite(devs); + virHashForEachSafe(devs->objs, virNodeDeviceObjListForEachCb, &data);
I just checked out virHashForEachSafe and virHashForEach says the latter is deprecated while the use of the former in new code should be rewritten with GLib, which translates to the use of g_hash_table_foreach_remove(). Erik

mdevctl does not currently provide any events when the list of defined devices changes, so we will need to poll mdevctl for the list of defined devices periodically. When a mediated device no longer exists from one iteration to the next, we need to treat it as an "undefine" event. When we get such an event, we remove the device from the list if it's not active. Otherwise, we simply mark it as non-persistent. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 55 +++++++++++++++++++++++++--- 1 file changed, 50 insertions(+), 5 deletions(-) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index fd57dcacc1..598cd865c5 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1182,24 +1182,63 @@ struct _virMdevctlForEachData { }; +/* This function keeps the list of persistent mediated devices consistent + * between the nodedev driver and mdevctl. + * @obj is a device that is currently known by the nodedev driver, and @opaque + * contains the most recent list of devices defined by mdevctl. If @obj is no + * longer defined in mdevctl, remove it from the driver as well. */ +static void +removeMissingPersistentMdevs(virNodeDeviceObjPtr obj, + const void *opaque) +{ + const virMdevctlForEachData *data = opaque; + size_t i; + virNodeDeviceDefPtr def = virNodeDeviceObjGetDef(obj); + virObjectEventPtr event; + + /* transient mdevs are populated via udev, so don't remove them from the + * nodedev driver just because they are not reported by by mdevctl */ + if (!virNodeDeviceObjIsPersistent(obj)) + return; + + for (i = 0; i < data->ndefs; i++) { + /* OK, this mdev is still defined by mdevctl */ + if (STREQ(data->defs[i]->name, def->name)) + return; + } + + event = virNodeDeviceEventLifecycleNew(def->name, + VIR_NODE_DEVICE_EVENT_UNDEFINED, + 0); + + /* The device is active, but no longer defined by mdevctl. Keep the device + * in the list, but mark it as non-persistent */ + if (virNodeDeviceObjIsActive(obj)) + virNodeDeviceObjSetPersistent(obj, false); + else + virNodeDeviceObjListRemoveLocked(driver->devs, obj); + + virObjectEventStateQueue(driver->nodeDeviceEventState, event); +} + + int nodeDeviceUpdateMediatedDevices(void) { - g_autofree virNodeDeviceDefPtr *defs = NULL; - int ndefs; + virMdevctlForEachData data = { 0, }; GList * tofree = NULL; size_t i; - if ((ndefs = virMdevctlListDefined(&defs)) < 0) { + if ((data.ndefs = virMdevctlListDefined(&data.defs)) < 0) { virReportSystemError(errno, "%s", _("failed to query mdevs from mdevctl")); return -1; } - for (i = 0; i < ndefs; i++) { + for (i = 0; i < data.ndefs; i++) { virNodeDeviceObjPtr obj; virObjectEventPtr event; - virNodeDeviceDefPtr def = defs[i]; + virNodeDeviceDefPtr def = data.defs[i]; bool was_defined = false; def->driver = g_strdup("vfio_mdev"); @@ -1245,8 +1284,14 @@ nodeDeviceUpdateMediatedDevices(void) virObjectEventStateQueue(driver->nodeDeviceEventState, event); } + /* Any mdevs that were previously defined but were not returned the latest + * mdevctl query should be removed from the device list */ + virNodeDeviceObjListForEachSafe(driver->devs, removeMissingPersistentMdevs, + &data); + cleanup: g_list_free_full(tofree, (GDestroyNotify)virNodeDeviceDefFree); + g_free(data.defs); return 0; } -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:55AM -0600, Jonathon Jongsma wrote:
mdevctl does not currently provide any events when the list of defined devices changes, so we will need to poll mdevctl for the list of defined devices periodically. When a mediated device no longer exists from one iteration to the next, we need to treat it as an "undefine" event.
When we get such an event, we remove the device from the list if it's not active. Otherwise, we simply mark it as non-persistent.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 55 +++++++++++++++++++++++++--- 1 file changed, 50 insertions(+), 5 deletions(-)
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index fd57dcacc1..598cd865c5 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1182,24 +1182,63 @@ struct _virMdevctlForEachData { };
+/* This function keeps the list of persistent mediated devices consistent + * between the nodedev driver and mdevctl. + * @obj is a device that is currently known by the nodedev driver, and @opaque + * contains the most recent list of devices defined by mdevctl. If @obj is no + * longer defined in mdevctl, remove it from the driver as well. */ +static void +removeMissingPersistentMdevs(virNodeDeviceObjPtr obj, + const void *opaque) +{ + const virMdevctlForEachData *data = opaque; + size_t i; + virNodeDeviceDefPtr def = virNodeDeviceObjGetDef(obj); + virObjectEventPtr event; + + /* transient mdevs are populated via udev, so don't remove them from the + * nodedev driver just because they are not reported by by mdevctl */ + if (!virNodeDeviceObjIsPersistent(obj)) + return; + + for (i = 0; i < data->ndefs; i++) { + /* OK, this mdev is still defined by mdevctl */ + if (STREQ(data->defs[i]->name, def->name)) + return; + } + + event = virNodeDeviceEventLifecycleNew(def->name, + VIR_NODE_DEVICE_EVENT_UNDEFINED, + 0); + + /* The device is active, but no longer defined by mdevctl. Keep the device + * in the list, but mark it as non-persistent */ + if (virNodeDeviceObjIsActive(obj)) + virNodeDeviceObjSetPersistent(obj, false); + else + virNodeDeviceObjListRemoveLocked(driver->devs, obj); + + virObjectEventStateQueue(driver->nodeDeviceEventState, event); +} + + int nodeDeviceUpdateMediatedDevices(void) { - g_autofree virNodeDeviceDefPtr *defs = NULL; - int ndefs; + virMdevctlForEachData data = { 0, }; GList * tofree = NULL; size_t i;
- if ((ndefs = virMdevctlListDefined(&defs)) < 0) { + if ((data.ndefs = virMdevctlListDefined(&data.defs)) < 0) { virReportSystemError(errno, "%s", _("failed to query mdevs from mdevctl")); return -1; }
- for (i = 0; i < ndefs; i++) { + for (i = 0; i < data.ndefs; i++) { virNodeDeviceObjPtr obj; virObjectEventPtr event; - virNodeDeviceDefPtr def = defs[i]; + virNodeDeviceDefPtr def = data.defs[i]; bool was_defined = false;
def->driver = g_strdup("vfio_mdev"); @@ -1245,8 +1284,14 @@ nodeDeviceUpdateMediatedDevices(void) virObjectEventStateQueue(driver->nodeDeviceEventState, event); }
+ /* Any mdevs that were previously defined but were not returned the latest + * mdevctl query should be removed from the device list */ + virNodeDeviceObjListForEachSafe(driver->devs, removeMissingPersistentMdevs, + &data);
So, apparently it is now suggested to limit the usage of virHashX to only virHashNew and resort to the GLib primitives otherwise, which means the code will have to be adjusted for that. Apart from that I don't see an issue with this patch. Reviewed-by: Erik Skultety <eskultet@redhat.com> (I will still give it a test for aspects that are not immediately clear from the code)

We need to query mdevctl for changes to device definitions since an administrator can define new devices by executing mdevctl outside of libvirt. In the future, mdevctl may add a way to signal device add/remove via events, but for now we resort to a bit of a workaround: monitoring the mdevctl config directory for changes to files. When a change is detected, we query mdevctl and update our device list. The mdevctl querying is handled in a throwaway thread, and these threads are synchronized with a mutex. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_udev.c | 158 +++++++++++++++++++++++++++++ 1 file changed, 158 insertions(+) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 038941ec51..8a1210cffb 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -19,6 +19,7 @@ */ #include <config.h> +#include <gio/gio.h> #include <libudev.h> #include <pciaccess.h> #include <scsi/scsi.h> @@ -66,6 +67,10 @@ struct _udevEventData { virCond threadCond; bool threadQuit; bool dataReady; + + GList *mdevctlMonitors; + virMutex mdevctlLock; + int mdevctlTimeout; }; static virClassPtr udevEventDataClass; @@ -86,6 +91,9 @@ udevEventDataDispose(void *obj) udev_monitor_unref(priv->udev_monitor); udev_unref(udev); + g_list_free_full(priv->mdevctlMonitors, g_object_unref); + virMutexDestroy(&priv->mdevctlLock); + virCondDestroy(&priv->threadCond); } @@ -117,6 +125,11 @@ udevEventDataNew(void) return NULL; } + if (virMutexInit(&ret->mdevctlLock) < 0) { + virObjectUnref(ret); + return NULL; + } + ret->watch = -1; return ret; } @@ -1998,6 +2011,133 @@ udevPCITranslateInit(bool privileged G_GNUC_UNUSED) } +static void +mdevctlHandlerThread(void *opaque G_GNUC_UNUSED) +{ + udevEventDataPtr priv = driver->privateData; + + /* ensure only a single thread can query mdevctl at a time */ + virMutexLock(&priv->mdevctlLock); + if (nodeDeviceUpdateMediatedDevices() < 0) + VIR_WARN("mdevctl failed to updated mediated devices"); + virMutexUnlock(&priv->mdevctlLock); +} + + +static void +scheduleMdevctlHandler(int timer G_GNUC_UNUSED, void *opaque) +{ + udevEventDataPtr priv = opaque; + virThread thread; + + if (priv->mdevctlTimeout > 0) { + virEventRemoveTimeout(priv->mdevctlTimeout); + priv->mdevctlTimeout = -1; + } + + if (virThreadCreateFull(&thread, false, mdevctlHandlerThread, + "mdevctl-thread", false, NULL) < 0) { + virReportSystemError(errno, "%s", + _("failed to create mdevctl thread")); + } +} + + +static void +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED, + GFile *file, + GFile *other_file G_GNUC_UNUSED, + GFileMonitorEvent event_type, + gpointer user_data); + + +/* Recursively monitors a directory and its subdirectories for file changes and + * returns a GList of GFileMonitor objects */ +static GList* +monitorFileRecursively(udevEventDataPtr udev, + GFile *file) +{ + GList *monitors = NULL; + g_autoptr(GError) error = NULL; + g_autoptr(GFileEnumerator) children = NULL; + GFileMonitor *mon; + + if (!(children = g_file_enumerate_children(file, "standard::*", + G_FILE_QUERY_INFO_NONE, NULL, &error))) + goto error; + + if (!(mon = g_file_monitor(file, G_FILE_MONITOR_NONE, NULL, &error))) + goto error; + + g_signal_connect(mon, "changed", + G_CALLBACK(mdevctlEventHandleCallback), udev); + monitors = g_list_append(monitors, mon); + + while (true) { + GFileInfo *info; + GFile *child; + GList *child_monitors = NULL; + + if (!g_file_enumerator_iterate(children, &info, &child, NULL, &error)) + goto error; + if (!info) + break; + + if (g_file_query_file_type(child, G_FILE_QUERY_INFO_NONE, NULL) == + G_FILE_TYPE_DIRECTORY) { + + child_monitors = monitorFileRecursively(udev, child); + if (child_monitors) + monitors = g_list_concat(monitors, child_monitors); + } + } + + return monitors; + + error: + g_list_free_full(monitors, g_object_unref); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to monitor directory: %s"), error->message); + g_clear_error(&error); + return NULL; +} + + +static void +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED, + GFile *file, + GFile *other_file G_GNUC_UNUSED, + GFileMonitorEvent event_type, + gpointer user_data) +{ + udevEventDataPtr priv = user_data; + /* if a new directory appears, monitor that directory for changes */ + if (event_type == G_FILE_MONITOR_EVENT_CREATED && + g_file_query_file_type(file, G_FILE_QUERY_INFO_NONE, NULL) == + G_FILE_TYPE_DIRECTORY) { + GList *newmonitors = monitorFileRecursively(priv, file); + priv->mdevctlMonitors = g_list_concat(priv->mdevctlMonitors, newmonitors); + } + + /* When mdevctl creates a device, it can result in multiple notify events + * emitted for a single logical change (e.g. several CHANGED events, or a + * CREATED and CHANGED event followed by CHANGES_DONE_HINT). To avoid + * spawning a mdevctl thread multiple times for a single logical + * configuration change, try to coalesce these changes by waiting for the + * CHANGES_DONE_HINT event. As a fallback, add a timeout to trigger the + * signal if that event never comes */ + if (event_type != G_FILE_MONITOR_EVENT_CHANGES_DONE_HINT) { + if (priv->mdevctlTimeout > 0) + virEventRemoveTimeout(priv->mdevctlTimeout); + priv->mdevctlTimeout = virEventAddTimeout(100, scheduleMdevctlHandler, + priv, NULL); + return; + } + + scheduleMdevctlHandler(-1, priv); +} + + static int nodeStateInitialize(bool privileged, const char *root, @@ -2007,6 +2147,8 @@ nodeStateInitialize(bool privileged, udevEventDataPtr priv = NULL; struct udev *udev = NULL; virThread enumThread; + g_autoptr(GFile) mdevctlConfigDir = g_file_new_for_path("/etc/mdevctl.d"); + GError *error = NULL; if (root != NULL) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -2108,6 +2250,22 @@ nodeStateInitialize(bool privileged, if (priv->watch == -1) goto unlock; + /* mdevctl may add notification events in the future: + * https://github.com/mdevctl/mdevctl/issues/27. For now, fall back to + * monitoring the mdevctl configuration directory for changes. + * mdevctl configuration is stored in a directory tree within + * /etc/mdevctl.d/. There is a directory for each parent device, which + * contains a file defining each mediated device */ + if (!(priv->mdevctlMonitors = monitorFileRecursively(priv, + mdevctlConfigDir))) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("failed to monitor mdevctl config directory: %s"), + error->message); + g_clear_error(&error); + + goto cleanup; + } + virObjectUnlock(priv); /* Create a fictional 'computer' device to root the device tree. */ -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:56AM -0600, Jonathon Jongsma wrote:
We need to query mdevctl for changes to device definitions since an administrator can define new devices by executing mdevctl outside of libvirt.
In the future, mdevctl may add a way to signal device add/remove via events, but for now we resort to a bit of a workaround: monitoring the mdevctl config directory for changes to files. When a change is detected, we query mdevctl and update our device list. The mdevctl querying is handled in a throwaway thread, and these threads are synchronized with a mutex.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_udev.c | 158 +++++++++++++++++++++++++++++ 1 file changed, 158 insertions(+)
diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 038941ec51..8a1210cffb 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -19,6 +19,7 @@ */
#include <config.h> +#include <gio/gio.h> #include <libudev.h> #include <pciaccess.h> #include <scsi/scsi.h> @@ -66,6 +67,10 @@ struct _udevEventData { virCond threadCond; bool threadQuit; bool dataReady; + + GList *mdevctlMonitors; + virMutex mdevctlLock; + int mdevctlTimeout; };
Sorry for the delay in review, it took me a while to wrap my head around the amount of GLib black magic going on in this patch. ...
+ + +static void +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED, + GFile *file, + GFile *other_file G_GNUC_UNUSED, + GFileMonitorEvent event_type, + gpointer user_data); + + +/* Recursively monitors a directory and its subdirectories for file changes and + * returns a GList of GFileMonitor objects */ +static GList* +monitorFileRecursively(udevEventDataPtr udev, + GFile *file) +{ + GList *monitors = NULL; + g_autoptr(GError) error = NULL; + g_autoptr(GFileEnumerator) children = NULL; + GFileMonitor *mon; + + if (!(children = g_file_enumerate_children(file, "standard::*", + G_FILE_QUERY_INFO_NONE, NULL, &error))) + goto error; + + if (!(mon = g_file_monitor(file, G_FILE_MONITOR_NONE, NULL, &error))) + goto error; + + g_signal_connect(mon, "changed", + G_CALLBACK(mdevctlEventHandleCallback), udev);
newline here^
+ monitors = g_list_append(monitors, mon); + + while (true) { + GFileInfo *info; + GFile *child;
We should initialize all the pointers - coverity might find creative ways of complaining about it.
+ GList *child_monitors = NULL; + + if (!g_file_enumerator_iterate(children, &info, &child, NULL, &error))
I hope that the fact that ^this can block will never pose a problem for us.
+ goto error;
newline here
+ if (!info) + break; + + if (g_file_query_file_type(child, G_FILE_QUERY_INFO_NONE, NULL) == + G_FILE_TYPE_DIRECTORY) { + + child_monitors = monitorFileRecursively(udev, child); + if (child_monitors) + monitors = g_list_concat(monitors, child_monitors); + } + } + + return monitors; + + error: + g_list_free_full(monitors, g_object_unref); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to monitor directory: %s"), error->message); + g_clear_error(&error); + return NULL; +} + + +static void +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED, + GFile *file, + GFile *other_file G_GNUC_UNUSED, + GFileMonitorEvent event_type, + gpointer user_data) +{ + udevEventDataPtr priv = user_data; + /* if a new directory appears, monitor that directory for changes */ + if (event_type == G_FILE_MONITOR_EVENT_CREATED && + g_file_query_file_type(file, G_FILE_QUERY_INFO_NONE, NULL) == + G_FILE_TYPE_DIRECTORY) { + GList *newmonitors = monitorFileRecursively(priv, file); + priv->mdevctlMonitors = g_list_concat(priv->mdevctlMonitors, newmonitors);
priv->mdevctlMonitors is shared among threads, but you only acquire the mutex just before exec'ing mdevctl in the mdevctlHandler thread, so ^this can yield unexpected results in terms of directories monitored.
+ } + + /* When mdevctl creates a device, it can result in multiple notify events + * emitted for a single logical change (e.g. several CHANGED events, or a + * CREATED and CHANGED event followed by CHANGES_DONE_HINT). To avoid + * spawning a mdevctl thread multiple times for a single logical + * configuration change, try to coalesce these changes by waiting for the + * CHANGES_DONE_HINT event. As a fallback, add a timeout to trigger the + * signal if that event never comes */
So there's no guaranteed pattern to monitor, is that so? IOW Even the CHANGES_DONE_HINT may not actually arrive? That's a very poor design. I'd like to ditch the timer as much as possible.
+ if (event_type != G_FILE_MONITOR_EVENT_CHANGES_DONE_HINT) { + if (priv->mdevctlTimeout > 0) + virEventRemoveTimeout(priv->mdevctlTimeout); + priv->mdevctlTimeout = virEventAddTimeout(100, scheduleMdevctlHandler, + priv, NULL); + return; + } + + scheduleMdevctlHandler(-1, priv); +} + + static int nodeStateInitialize(bool privileged, const char *root, @@ -2007,6 +2147,8 @@ nodeStateInitialize(bool privileged, udevEventDataPtr priv = NULL; struct udev *udev = NULL; virThread enumThread; + g_autoptr(GFile) mdevctlConfigDir = g_file_new_for_path("/etc/mdevctl.d"); + GError *error = NULL;
^This error isn't passed anywhere...
if (root != NULL) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -2108,6 +2250,22 @@ nodeStateInitialize(bool privileged, if (priv->watch == -1) goto unlock;
+ /* mdevctl may add notification events in the future: + * https://github.com/mdevctl/mdevctl/issues/27. For now, fall back to + * monitoring the mdevctl configuration directory for changes. + * mdevctl configuration is stored in a directory tree within + * /etc/mdevctl.d/. There is a directory for each parent device, which + * contains a file defining each mediated device */
One security aspect with this proposal that needs to mentioned is that we should not make the directory to be monitored configurable, otherwise it's so easy to deplete resources - simply because this patch doesn't introduce any limit on number of spawned threads (like we have for thread pools).
+ if (!(priv->mdevctlMonitors = monitorFileRecursively(priv, + mdevctlConfigDir))) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("failed to monitor mdevctl config directory: %s"), + error->message);
This error message is redundant, because monitorFileRecursively already reports errors appending the respective GError, not to mentioned you didn't pass error anywhere, so it would always be NULL. Erik

On Wed, 17 Feb 2021 14:35:51 +0100 Erik Skultety <eskultet@redhat.com> wrote:
+static void +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED, + GFile *file, + GFile *other_file G_GNUC_UNUSED, + GFileMonitorEvent event_type, + gpointer user_data) +{ + udevEventDataPtr priv = user_data; + /* if a new directory appears, monitor that directory for changes */ + if (event_type == G_FILE_MONITOR_EVENT_CREATED && + g_file_query_file_type(file, G_FILE_QUERY_INFO_NONE, NULL) == + G_FILE_TYPE_DIRECTORY) { + GList *newmonitors = monitorFileRecursively(priv, file); + priv->mdevctlMonitors = g_list_concat(priv->mdevctlMonitors, newmonitors);
priv->mdevctlMonitors is shared among threads, but you only acquire the mutex just before exec'ing mdevctl in the mdevctlHandler thread, so ^this can yield unexpected results in terms of directories monitored.
A) It doesn't really affect the directories being monitored, it only affects our ability to free these monitors later in the destructor. B) Aside from the very first time this variable is set in nodeStateInitialize(), the only place that accesses the mdevctlMonitors variable is this glib signal callback (to add new monitors to the list) and the destructor (to free the monitors). And I believe those should always run in the main thread. So I don't think there's really much of a thread-safety issue here. Perhaps there's a slight race in nodeStateInitialize() which appears to be called from a non-main thread. If a directory gets created immediately after connecting to the file monitor signal but before the function returns and the GList gets assigned to priv->mdevctlMonitors, I suppose the signal handler could run and it could get corrupted. Would you like me to protect all accesses with a Mutex?
+ } + + /* When mdevctl creates a device, it can result in multiple notify events + * emitted for a single logical change (e.g. several CHANGED events, or a + * CREATED and CHANGED event followed by CHANGES_DONE_HINT). To avoid + * spawning a mdevctl thread multiple times for a single logical + * configuration change, try to coalesce these changes by waiting for the + * CHANGES_DONE_HINT event. As a fallback, add a timeout to trigger the + * signal if that event never comes */
So there's no guaranteed pattern to monitor, is that so? IOW Even the CHANGES_DONE_HINT may not actually arrive? That's a very poor design. I'd like to ditch the timer as much as possible.
Maybe we can simply rely on the CHANGES_DONE_HINT. I tend to be conservative and was trying to handle the case where the CHANGES_DONE_HINT might not be delivered (it is called a "hint" after all). Part of this was caused by my reading this comment in a test case within the glib repository: /* Sometimes the emission of 'CHANGES_DONE_HINT' may be late because * it depends on the ability of file monitor implementation to report * 'CHANGES_DONE_HINT' itself. If the file monitor implementation * doesn't report 'CHANGES_DONE_HINT' itself, it may be emitted by * GLocalFileMonitor after a few seconds, which causes the event to * mix with results from different steps. Since 'CHANGES_DONE_HINT' * is just a hint, we don't require it to be reliable and we simply * ignore unexpected 'CHANGES_DONE_HINT' events here. */ So it is not reliably communicated if your file monitor backend doesn't implement it. But the above comment seems to suggest that if the file monitor backend doesn't support it, the event will be generated by glib itself and will simply be delayed by a couple of seconds rather than missing altogether. I'd have to read the glib code to tell whether it's possible for it to be omitted altogether. I guess I don't see the timer as a horrible workaround, but I can ditch it if you want. Alternatively I suppose I could simply kick off a new thread for each file change event and not bother trying to coalesce them at all. This will result in perhaps a an extra thread or two being launched when mdevctl modifies the mdev registry behind our backs, but that's unlikely to be a common occurrence and so shouldn't really be a significant performance issue.
+ if (event_type != G_FILE_MONITOR_EVENT_CHANGES_DONE_HINT) { + if (priv->mdevctlTimeout > 0) + virEventRemoveTimeout(priv->mdevctlTimeout); + priv->mdevctlTimeout = virEventAddTimeout(100, scheduleMdevctlHandler, + priv, NULL); + return; + } + + scheduleMdevctlHandler(-1, priv); +} + + static int nodeStateInitialize(bool privileged, const char *root,
Jonathon

On Thu, Feb 18, 2021 at 12:04:56PM -0600, Jonathon Jongsma wrote:
On Wed, 17 Feb 2021 14:35:51 +0100 Erik Skultety <eskultet@redhat.com> wrote:
+static void +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED, + GFile *file, + GFile *other_file G_GNUC_UNUSED, + GFileMonitorEvent event_type, + gpointer user_data) +{ + udevEventDataPtr priv = user_data; + /* if a new directory appears, monitor that directory for changes */ + if (event_type == G_FILE_MONITOR_EVENT_CREATED && + g_file_query_file_type(file, G_FILE_QUERY_INFO_NONE, NULL) == + G_FILE_TYPE_DIRECTORY) { + GList *newmonitors = monitorFileRecursively(priv, file); + priv->mdevctlMonitors = g_list_concat(priv->mdevctlMonitors, newmonitors);
priv->mdevctlMonitors is shared among threads, but you only acquire the mutex just before exec'ing mdevctl in the mdevctlHandler thread, so ^this can yield unexpected results in terms of directories monitored.
A) It doesn't really affect the directories being monitored, it only affects our ability to free these monitors later in the destructor.
B) Aside from the very first time this variable is set in nodeStateInitialize(), the only place that accesses the mdevctlMonitors variable is this glib signal callback (to add new monitors to the list) and the destructor (to free the monitors). And I believe those should always run in the main thread. So I don't think there's really much of a thread-safety issue here. Perhaps there's a slight race in nodeStateInitialize() which appears to be called from a non-main thread. If a directory gets created immediately after connecting to the file monitor signal but before the function returns and the GList gets assigned to priv->mdevctlMonitors, I suppose the signal handler could run and it could get corrupted. Would you like me to protect all accesses with a Mutex?
If for some reason we need to extend the code in the future, it's so easy to miss a single spot. So, I'd say - better safe than sorry - let's protect the variable the way it's supposed to be even though the race might not exist right now. Erik
+ } + + /* When mdevctl creates a device, it can result in multiple notify events + * emitted for a single logical change (e.g. several CHANGED events, or a + * CREATED and CHANGED event followed by CHANGES_DONE_HINT). To avoid + * spawning a mdevctl thread multiple times for a single logical + * configuration change, try to coalesce these changes by waiting for the + * CHANGES_DONE_HINT event. As a fallback, add a timeout to trigger the + * signal if that event never comes */
So there's no guaranteed pattern to monitor, is that so? IOW Even the CHANGES_DONE_HINT may not actually arrive? That's a very poor design. I'd like to ditch the timer as much as possible.
Maybe we can simply rely on the CHANGES_DONE_HINT. I tend to be conservative and was trying to handle the case where the CHANGES_DONE_HINT might not be delivered (it is called a "hint" after all). Part of this was caused by my reading this comment in a test case within the glib repository:
/* Sometimes the emission of 'CHANGES_DONE_HINT' may be late because * it depends on the ability of file monitor implementation to report * 'CHANGES_DONE_HINT' itself. If the file monitor implementation * doesn't report 'CHANGES_DONE_HINT' itself, it may be emitted by * GLocalFileMonitor after a few seconds, which causes the event to * mix with results from different steps. Since 'CHANGES_DONE_HINT' * is just a hint, we don't require it to be reliable and we simply * ignore unexpected 'CHANGES_DONE_HINT' events here. */
So it is not reliably communicated if your file monitor backend doesn't implement it. But the above comment seems to suggest that if the file monitor backend doesn't support it, the event will be generated by glib itself and will simply be delayed by a couple of seconds rather than missing altogether. I'd have to read the glib code to tell whether it's possible for it to be omitted altogether. I guess I don't see the timer as a horrible workaround, but I can ditch it if you want.
Alternatively I suppose I could simply kick off a new thread for each file change event and not bother trying to coalesce them at all. This will result in perhaps a an extra thread or two being launched when mdevctl modifies the mdev registry behind our backs, but that's unlikely to be a common occurrence and so shouldn't really be a significant performance issue.
If there may be multiple events connected to a single change, which one would trigger the actual device list update? I guess I'm starting to be okay with the timeout in the end. Erik

Abstract out the function used to generate the commandline for 'mdevctl start' since they take the same arguments. Add tests to ensure that we're generating the command properly. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 28 ++++++-- src/node_device/node_device_driver.h | 5 ++ ...19_36ea_4111_8f0a_8c9a70e21366-define.argv | 1 + ...19_36ea_4111_8f0a_8c9a70e21366-define.json | 1 + ...39_495e_4243_ad9f_beb3f14c23d9-define.argv | 1 + ...39_495e_4243_ad9f_beb3f14c23d9-define.json | 1 + ...16_1ca8_49ac_b176_871d16c13076-define.argv | 1 + ...16_1ca8_49ac_b176_871d16c13076-define.json | 1 + tests/nodedevmdevctltest.c | 68 ++++++++++++++----- 9 files changed, 85 insertions(+), 22 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.json create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.json create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.argv create mode 100644 tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.json diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 598cd865c5..c84f6ef03f 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -698,9 +698,13 @@ nodeDeviceFindAddressByName(const char *name) } -virCommandPtr -nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, - char **uuid_out) +/* the mdevctl 'start' and 'define' commands accept almost the exact same + * arguments, so provide a common implementation that can be wrapped by a more + * specific function */ +static virCommandPtr +nodeDeviceGetMdevctlDefineStartCommand(virNodeDeviceDefPtr def, + const char *subcommand, + char **uuid_out) { virCommandPtr cmd; g_autofree char *json = NULL; @@ -718,7 +722,7 @@ nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, return NULL; } - cmd = virCommandNewArgList(MDEVCTL, "start", + cmd = virCommandNewArgList(MDEVCTL, subcommand, "-p", parent_addr, "--jsonfile", "/dev/stdin", NULL); @@ -729,6 +733,22 @@ nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, return cmd; } +virCommandPtr +nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, + char **uuid_out) +{ + return nodeDeviceGetMdevctlDefineStartCommand(def, "start", uuid_out); +} + +virCommandPtr +nodeDeviceGetMdevctlDefineCommand(virNodeDeviceDefPtr def, + char **uuid_out) +{ + return nodeDeviceGetMdevctlDefineStartCommand(def, "define", uuid_out); +} + + + static int virMdevctlStart(virNodeDeviceDefPtr def, char **uuid) { diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index d8837e4ef8..04081a871d 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -116,6 +116,11 @@ nodeConnectNodeDeviceEventDeregisterAny(virConnectPtr conn, virCommandPtr nodeDeviceGetMdevctlStartCommand(virNodeDeviceDefPtr def, char **uuid_out); + +virCommandPtr +nodeDeviceGetMdevctlDefineCommand(virNodeDeviceDefPtr def, + char **uuid_out); + virCommandPtr nodeDeviceGetMdevctlStopCommand(const char *uuid); diff --git a/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv new file mode 100644 index 0000000000..773e98b963 --- /dev/null +++ b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv @@ -0,0 +1 @@ +$MDEVCTL_BINARY$ define -p 0000:00:02.0 --jsonfile /dev/stdin diff --git a/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.json b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.json new file mode 100644 index 0000000000..bfc6dcace3 --- /dev/null +++ b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.json @@ -0,0 +1 @@ +{"mdev_type":"i915-GVTg_V5_8","start":"manual"} \ No newline at end of file diff --git a/tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.argv b/tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.argv new file mode 100644 index 0000000000..773e98b963 --- /dev/null +++ b/tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.argv @@ -0,0 +1 @@ +$MDEVCTL_BINARY$ define -p 0000:00:02.0 --jsonfile /dev/stdin diff --git a/tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.json b/tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.json new file mode 100644 index 0000000000..e5b22b2c44 --- /dev/null +++ b/tests/nodedevmdevctldata/mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9-define.json @@ -0,0 +1 @@ +{"mdev_type":"i915-GVTg_V5_8","start":"manual","attrs":[{"example-attribute-1":"attribute-value-1"},{"example-attribute-2":"attribute-value-2"}]} \ No newline at end of file diff --git a/tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.argv b/tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.argv new file mode 100644 index 0000000000..773e98b963 --- /dev/null +++ b/tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.argv @@ -0,0 +1 @@ +$MDEVCTL_BINARY$ define -p 0000:00:02.0 --jsonfile /dev/stdin diff --git a/tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.json b/tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.json new file mode 100644 index 0000000000..2e03d0bd8e --- /dev/null +++ b/tests/nodedevmdevctldata/mdev_fedc4916_1ca8_49ac_b176_871d16c13076-define.json @@ -0,0 +1 @@ +{"mdev_type":"i915-GVTg_V5_8","start":"manual","attrs":[{"example-attribute":"attribute-value"}]} \ No newline at end of file diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index 8bc464d59f..1f48a605b3 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -10,10 +10,16 @@ #define VIR_FROM_THIS VIR_FROM_NODEDEV +typedef enum { + MDEVCTL_CMD_START, + MDEVCTL_CMD_DEFINE, +} MdevctlCmd; + struct startTestInfo { const char *virt_type; int create; const char *filename; + MdevctlCmd command; }; /* capture stdin passed to command */ @@ -45,12 +51,17 @@ nodedevCompareToFile(const char *actual, return virTestCompareToFile(replacedCmdline, filename); } + +typedef virCommandPtr (*MdevctlCmdFunc)(virNodeDeviceDefPtr, char **); + + static int testMdevctlStart(const char *virt_type, int create, + MdevctlCmdFunc mdevctl_cmd_func, const char *mdevxml, - const char *startcmdfile, - const char *startjsonfile) + const char *cmdfile, + const char *jsonfile) { g_autoptr(virNodeDeviceDef) def = NULL; virNodeDeviceObjPtr obj = NULL; @@ -66,7 +77,7 @@ testMdevctlStart(const char *virt_type, /* this function will set a stdin buffer containing the json configuration * of the device. The json value is captured in the callback above */ - cmd = nodeDeviceGetMdevctlStartCommand(def, &uuid); + cmd = mdevctl_cmd_func(def, &uuid); if (!cmd) goto cleanup; @@ -78,10 +89,10 @@ testMdevctlStart(const char *virt_type, if (!(actualCmdline = virBufferCurrentContent(&buf))) goto cleanup; - if (nodedevCompareToFile(actualCmdline, startcmdfile) < 0) + if (nodedevCompareToFile(actualCmdline, cmdfile) < 0) goto cleanup; - if (virTestCompareToFile(stdinbuf, startjsonfile) < 0) + if (virTestCompareToFile(stdinbuf, jsonfile) < 0) goto cleanup; ret = 0; @@ -96,17 +107,31 @@ static int testMdevctlStartHelper(const void *data) { const struct startTestInfo *info = data; + const char *cmd; + MdevctlCmdFunc func; + g_autofree char *mdevxml = NULL; + g_autofree char *cmdlinefile = NULL; + g_autofree char *jsonfile = NULL; + + if (info->command == MDEVCTL_CMD_START) { + cmd = "start"; + func = nodeDeviceGetMdevctlStartCommand; + } else if (info->command == MDEVCTL_CMD_DEFINE) { + cmd = "define"; + func = nodeDeviceGetMdevctlDefineCommand; + } else { + return -1; + } - g_autofree char *mdevxml = g_strdup_printf("%s/nodedevschemadata/%s.xml", - abs_srcdir, info->filename); - g_autofree char *cmdlinefile = g_strdup_printf("%s/nodedevmdevctldata/%s-start.argv", - abs_srcdir, info->filename); - g_autofree char *jsonfile = g_strdup_printf("%s/nodedevmdevctldata/%s-start.json", - abs_srcdir, info->filename); + mdevxml = g_strdup_printf("%s/nodedevschemadata/%s.xml", abs_srcdir, + info->filename); + cmdlinefile = g_strdup_printf("%s/nodedevmdevctldata/%s-%s.argv", + abs_srcdir, info->filename, cmd); + jsonfile = g_strdup_printf("%s/nodedevmdevctldata/%s-%s.json", abs_srcdir, + info->filename, cmd); - return testMdevctlStart(info->virt_type, - info->create, mdevxml, cmdlinefile, - jsonfile); + return testMdevctlStart(info->virt_type, info->create, func, + mdevxml, cmdlinefile, jsonfile); } static int @@ -347,15 +372,18 @@ mymain(void) if (virTestRun(desc, func, info) < 0) \ ret = -1; -#define DO_TEST_START_FULL(virt_type, create, filename) \ +#define DO_TEST_CMD(desc, virt_type, create, filename, command) \ do { \ - struct startTestInfo info = { virt_type, create, filename }; \ - DO_TEST_FULL("mdevctl start " filename, testMdevctlStartHelper, &info); \ + struct startTestInfo info = { virt_type, create, filename, command }; \ + DO_TEST_FULL(desc, testMdevctlStartHelper, &info); \ } \ while (0) #define DO_TEST_START(filename) \ - DO_TEST_START_FULL("QEMU", CREATE_DEVICE, filename) + DO_TEST_CMD("mdevctl start " filename, "QEMU", CREATE_DEVICE, filename, MDEVCTL_CMD_START) + +#define DO_TEST_DEFINE(filename) \ + DO_TEST_CMD("mdevctl define " filename, "QEMU", CREATE_DEVICE, filename, MDEVCTL_CMD_DEFINE) #define DO_TEST_STOP(uuid) \ DO_TEST_FULL("mdevctl stop " uuid, testMdevctlStop, uuid) @@ -378,6 +406,10 @@ mymain(void) DO_TEST_PARSE_JSON("mdevctl-list-multiple"); + DO_TEST_DEFINE("mdev_d069d019_36ea_4111_8f0a_8c9a70e21366"); + DO_TEST_DEFINE("mdev_fedc4916_1ca8_49ac_b176_871d16c13076"); + DO_TEST_DEFINE("mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9"); + done: nodedevTestDriverFree(driver); -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:57AM -0600, Jonathon Jongsma wrote:
Abstract out the function used to generate the commandline for 'mdevctl start' since they take the same arguments. Add tests to ensure that we're generating the command properly.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- ...
static int testMdevctlStart(const char *virt_type,
I think ^this now needs to become testMdevctlStartOrDefine or something like that
int create, + MdevctlCmdFunc mdevctl_cmd_func, const char *mdevxml, - const char *startcmdfile, - const char *startjsonfile) + const char *cmdfile, + const char *jsonfile) { g_autoptr(virNodeDeviceDef) def = NULL; virNodeDeviceObjPtr obj = NULL; @@ -66,7 +77,7 @@ testMdevctlStart(const char *virt_type,
/* this function will set a stdin buffer containing the json configuration * of the device. The json value is captured in the callback above */ - cmd = nodeDeviceGetMdevctlStartCommand(def, &uuid); + cmd = mdevctl_cmd_func(def, &uuid);
if (!cmd) goto cleanup; @@ -78,10 +89,10 @@ testMdevctlStart(const char *virt_type, if (!(actualCmdline = virBufferCurrentContent(&buf))) goto cleanup;
- if (nodedevCompareToFile(actualCmdline, startcmdfile) < 0) + if (nodedevCompareToFile(actualCmdline, cmdfile) < 0) goto cleanup;
- if (virTestCompareToFile(stdinbuf, startjsonfile) < 0) + if (virTestCompareToFile(stdinbuf, jsonfile) < 0) goto cleanup;
ret = 0; @@ -96,17 +107,31 @@ static int testMdevctlStartHelper(const void *data)
and ^this one similarly should become StartOrDefine or use a similar name along the lines. Reviewed-by: Erik Skultety <eskultet@redhat.com>

With mediated devices, we can now define persistent node devices that can be started and stopped. In order to take advantage of this, we need an API to define new node devices. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- include/libvirt/libvirt-nodedev.h | 4 ++ src/driver-nodedev.h | 6 +++ src/libvirt-nodedev.c | 42 +++++++++++++++++ src/libvirt_public.syms | 5 ++ src/node_device/node_device_driver.c | 69 ++++++++++++++++++++++++++++ src/node_device/node_device_driver.h | 5 ++ src/node_device/node_device_udev.c | 1 + src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 17 ++++++- src/remote_protocol-structs | 8 ++++ src/rpc/gendispatch.pl | 1 + 11 files changed, 158 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index 7cd679dac4..116684ae92 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -132,6 +132,10 @@ virNodeDevicePtr virNodeDeviceCreateXML (virConnectPtr conn, int virNodeDeviceDestroy (virNodeDevicePtr dev); +virNodeDevicePtr virNodeDeviceDefineXML(virConnectPtr conn, + const char *xmlDesc, + unsigned int flags); + /** * VIR_NODE_DEVICE_EVENT_CALLBACK: * diff --git a/src/driver-nodedev.h b/src/driver-nodedev.h index d0fc7f19cf..16d787f3fc 100644 --- a/src/driver-nodedev.h +++ b/src/driver-nodedev.h @@ -74,6 +74,11 @@ typedef virNodeDevicePtr typedef int (*virDrvNodeDeviceDestroy)(virNodeDevicePtr dev); +typedef virNodeDevicePtr +(*virDrvNodeDeviceDefineXML)(virConnectPtr conn, + const char *xmlDesc, + unsigned int flags); + typedef int (*virDrvConnectNodeDeviceEventRegisterAny)(virConnectPtr conn, virNodeDevicePtr dev, @@ -113,4 +118,5 @@ struct _virNodeDeviceDriver { virDrvNodeDeviceListCaps nodeDeviceListCaps; virDrvNodeDeviceCreateXML nodeDeviceCreateXML; virDrvNodeDeviceDestroy nodeDeviceDestroy; + virDrvNodeDeviceDefineXML nodeDeviceDefineXML; }; diff --git a/src/libvirt-nodedev.c b/src/libvirt-nodedev.c index 375b907852..15eb70218a 100644 --- a/src/libvirt-nodedev.c +++ b/src/libvirt-nodedev.c @@ -765,6 +765,48 @@ virNodeDeviceDestroy(virNodeDevicePtr dev) } +/** + * virNodeDeviceDefineXML: + * @conn: pointer to the hypervisor connection + * @xmlDesc: string containing an XML description of the device to be defined + * @flags: extra flags; not used yet, so callers should always pass 0 + * + * Define a new device on the VM host machine, for example, a mediated device + * + * virNodeDeviceFree should be used to free the resources after the + * node device object is no longer needed. + * + * Returns a node device object if successful, NULL in case of failure + */ +virNodeDevicePtr +virNodeDeviceDefineXML(virConnectPtr conn, + const char *xmlDesc, + unsigned int flags) +{ + VIR_DEBUG("conn=%p, xmlDesc=%s, flags=0x%x", conn, NULLSTR(xmlDesc), flags); + + virResetLastError(); + + virCheckConnectReturn(conn, NULL); + virCheckReadOnlyGoto(conn->flags, error); + virCheckNonNullArgGoto(xmlDesc, error); + + if (conn->nodeDeviceDriver && + conn->nodeDeviceDriver->nodeDeviceDefineXML) { + virNodeDevicePtr dev = conn->nodeDeviceDriver->nodeDeviceDefineXML(conn, xmlDesc, flags); + if (dev == NULL) + goto error; + return dev; + } + + virReportUnsupportedError(); + + error: + virDispatchError(conn); + return NULL; +} + + /** * virConnectNodeDeviceEventRegisterAny: * @conn: pointer to the connection diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index cf31f937d5..7d0f8a1c14 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -879,4 +879,9 @@ LIBVIRT_6.10.0 { virDomainAuthorizedSSHKeysSet; } LIBVIRT_6.0.0; +LIBVIRT_7.1.0 { + global: + virNodeDeviceDefineXML; +} LIBVIRT_6.10.0; + # .... define new API here using predicted next version number .... diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index c84f6ef03f..9f5ee51f08 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -769,6 +769,26 @@ virMdevctlStart(virNodeDeviceDefPtr def, char **uuid) } +static int +virMdevctlDefine(virNodeDeviceDefPtr def, char **uuid) +{ + int status; + g_autoptr(virCommand) cmd = nodeDeviceGetMdevctlDefineCommand(def, uuid); + if (!cmd) + return -1; + + /* an auto-generated uuid is returned via stdout if no uuid is specified in + * the mdevctl args */ + if (virCommandRun(cmd, &status) < 0 || status != 0) + return -1; + + /* remove newline */ + *uuid = g_strstrip(*uuid); + + return 0; +} + + static virNodeDevicePtr nodeDeviceCreateXMLMdev(virConnectPtr conn, virNodeDeviceDefPtr def) @@ -1100,6 +1120,55 @@ nodeDeviceDestroy(virNodeDevicePtr device) return ret; } +virNodeDevicePtr +nodeDeviceDefineXML(virConnectPtr conn, + const char *xmlDesc, + unsigned int flags) +{ + g_autoptr(virNodeDeviceDef) def = NULL; + virNodeDevicePtr device = NULL; + const char *virt_type = NULL; + g_autofree char *uuid = NULL; + + virCheckFlags(0, NULL); + + if (nodeDeviceWaitInit() < 0) + return NULL; + + virt_type = virConnectGetType(conn); + + if (!(def = virNodeDeviceDefParseString(xmlDesc, CREATE_DEVICE, virt_type))) + return NULL; + + if (virNodeDeviceDefineXMLEnsureACL(conn, def) < 0) + return NULL; + + if (!nodeDeviceHasCapability(def, VIR_NODE_DEV_CAP_MDEV)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Unsupported device type")); + return NULL; + } + + if (!def->parent) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("cannot define a mediated device without a parent")); + return NULL; + } + + if (virMdevctlDefine(def, &uuid) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to define mediated device")); + return NULL; + } + + def->caps->data.mdev.uuid = g_strdup(uuid); + mdevGenerateDeviceName(def); + device = nodeDeviceFindNewMediatedDevice(conn, uuid); + + return device; +} + + int nodeConnectNodeDeviceEventRegisterAny(virConnectPtr conn, diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 04081a871d..fe6b8072bc 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -102,6 +102,11 @@ nodeDeviceCreateXML(virConnectPtr conn, int nodeDeviceDestroy(virNodeDevicePtr dev); +virNodeDevicePtr +nodeDeviceDefineXML(virConnectPtr conn, + const char *xmlDesc, + unsigned int flags); + int nodeConnectNodeDeviceEventRegisterAny(virConnectPtr conn, virNodeDevicePtr dev, diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 8a1210cffb..e54dfe1355 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -2313,6 +2313,7 @@ static virNodeDeviceDriver udevNodeDeviceDriver = { .nodeDeviceListCaps = nodeDeviceListCaps, /* 0.7.3 */ .nodeDeviceCreateXML = nodeDeviceCreateXML, /* 0.7.3 */ .nodeDeviceDestroy = nodeDeviceDestroy, /* 0.7.3 */ + .nodeDeviceDefineXML = nodeDeviceDefineXML, /* 7.1.0 */ }; diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 1b784e61c7..0da22705bb 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8661,6 +8661,7 @@ static virNodeDeviceDriver node_device_driver = { .nodeDeviceNumOfCaps = remoteNodeDeviceNumOfCaps, /* 0.5.0 */ .nodeDeviceListCaps = remoteNodeDeviceListCaps, /* 0.5.0 */ .nodeDeviceCreateXML = remoteNodeDeviceCreateXML, /* 0.6.3 */ + .nodeDeviceDefineXML = remoteNodeDeviceDefineXML, /* 7.1.0 */ .nodeDeviceDestroy = remoteNodeDeviceDestroy /* 0.6.3 */ }; diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 2df38cef77..d2250502b4 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -2142,6 +2142,15 @@ struct remote_node_device_destroy_args { remote_nonnull_string name; }; +struct remote_node_device_define_xml_args { + remote_nonnull_string xml_desc; + unsigned int flags; +}; + +struct remote_node_device_define_xml_ret { + remote_nonnull_node_device dev; +}; + /* * Events Register/Deregister: @@ -6714,5 +6723,11 @@ enum remote_procedure { * @generate: none * @acl: domain:write */ - REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425 + REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425, + + /** + * @generate: both + * @acl: node_device:write + */ + REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 426 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 9bcd14603d..565c1673f1 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -1600,6 +1600,13 @@ struct remote_node_device_create_xml_ret { struct remote_node_device_destroy_args { remote_nonnull_string name; }; +struct remote_node_device_define_xml_args { + remote_nonnull_string xml_desc; + u_int flags; +}; +struct remote_node_device_define_xml_ret { + remote_nonnull_node_device dev; +}; struct remote_connect_domain_event_register_ret { int cb_registered; }; @@ -3588,4 +3595,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_EVENT_MEMORY_FAILURE = 423, REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_GET = 424, REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425, + REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 426, }; diff --git a/src/rpc/gendispatch.pl b/src/rpc/gendispatch.pl index 0020273d9e..30108272f1 100755 --- a/src/rpc/gendispatch.pl +++ b/src/rpc/gendispatch.pl @@ -567,6 +567,7 @@ elsif ($mode eq "server") { if ($argtype =~ m/^remote_node_device_/ and !($argtype =~ m/^remote_node_device_lookup_by_name_/) and !($argtype =~ m/^remote_node_device_create_xml_/) and + !($argtype =~ m/^remote_node_device_define_xml_/) and !($argtype =~ m/^remote_node_device_lookup_scsi_host_by_wwn_/)) { $has_node_device = 1; push(@vars_list, "virNodeDevicePtr dev = NULL"); -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:58AM -0600, Jonathon Jongsma wrote:
With mediated devices, we can now define persistent node devices that can be started and stopped. In order to take advantage of this, we need an API to define new node devices.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- Reviewed-by: Erik Skultety <eskultet@redhat.com>

Now that we can filter active and inactive node devices in virConnectListAllNodeDevices(), add these switches to the virsh command. Eventual output (once everything is hooked up): virsh # nodedev-list --cap mdev mdev_bd2ea955_3402_4252_8c17_7468083a0f26 virsh # nodedev-list --inactive --cap mdev mdev_07d8b8b0_7e04_4c0f_97ed_9214ce12723c mdev_927c040f_ae7d_4a35_966e_286ba6ebbe1c virsh # nodedev-list --all --cap mdev mdev_07d8b8b0_7e04_4c0f_97ed_9214ce12723c mdev_927c040f_ae7d_4a35_966e_286ba6ebbe1c mdev_bd2ea955_3402_4252_8c17_7468083a0f26 Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- tools/virsh-nodedev.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index 428ead7384..a2e83fb676 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -378,6 +378,14 @@ static const vshCmdOptDef opts_node_list_devices[] = { .completer = virshNodeDeviceCapabilityNameCompleter, .help = N_("capability names, separated by comma") }, + {.name = "inactive", + .type = VSH_OT_BOOL, + .help = N_("list inactive devices") + }, + {.name = "all", + .type = VSH_OT_BOOL, + .help = N_("list inactive & active devices") + }, {.name = NULL} }; @@ -393,18 +401,27 @@ cmdNodeListDevices(vshControl *ctl, const vshCmd *cmd G_GNUC_UNUSED) int ncaps = 0; virshNodeDeviceListPtr list = NULL; int cap_type = -1; + bool inactive, all; + inactive = vshCommandOptBool(cmd, "inactive"); + all = vshCommandOptBool(cmd, "all"); ignore_value(vshCommandOptStringQuiet(ctl, cmd, "cap", &cap_str)); if (cap_str) { - if (tree) { - vshError(ctl, "%s", _("Options --tree and --cap are incompatible")); - return false; - } if ((ncaps = vshStringToArray(cap_str, &caps)) < 0) return false; } + if (all && inactive) { + vshError(ctl, "%s", _("Option --all is incompatible with --inactive")); + return false; + } + + if (tree && (cap_str || inactive || all)) { + vshError(ctl, "%s", _("Option --tree is incompatible with other options")); + return false; + } + for (i = 0; i < ncaps; i++) { if ((cap_type = virNodeDevCapTypeFromString(caps[i])) < 0) { vshError(ctl, "%s", _("Invalid capability type")); @@ -481,6 +498,11 @@ cmdNodeListDevices(vshControl *ctl, const vshCmd *cmd G_GNUC_UNUSED) } } + if (inactive || all) + flags |= VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE; + if (!inactive) + flags |= VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE; + if (!(list = virshNodeDeviceListCollect(ctl, caps, ncaps, flags))) { ret = false; goto cleanup; -- 2.26.2

On Wed, Feb 03, 2021 at 11:38:59AM -0600, Jonathon Jongsma wrote:
Now that we can filter active and inactive node devices in virConnectListAllNodeDevices(), add these switches to the virsh command.
Eventual output (once everything is hooked up):
virsh # nodedev-list --cap mdev mdev_bd2ea955_3402_4252_8c17_7468083a0f26
virsh # nodedev-list --inactive --cap mdev mdev_07d8b8b0_7e04_4c0f_97ed_9214ce12723c mdev_927c040f_ae7d_4a35_966e_286ba6ebbe1c
virsh # nodedev-list --all --cap mdev mdev_07d8b8b0_7e04_4c0f_97ed_9214ce12723c mdev_927c040f_ae7d_4a35_966e_286ba6ebbe1c mdev_bd2ea955_3402_4252_8c17_7468083a0f26
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- tools/virsh-nodedev.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index 428ead7384..a2e83fb676 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -378,6 +378,14 @@ static const vshCmdOptDef opts_node_list_devices[] = { .completer = virshNodeDeviceCapabilityNameCompleter, .help = N_("capability names, separated by comma") }, + {.name = "inactive", + .type = VSH_OT_BOOL, + .help = N_("list inactive devices") + }, + {.name = "all", + .type = VSH_OT_BOOL, + .help = N_("list inactive & active devices") + }, {.name = NULL} };
@@ -393,18 +401,27 @@ cmdNodeListDevices(vshControl *ctl, const vshCmd *cmd G_GNUC_UNUSED) int ncaps = 0; virshNodeDeviceListPtr list = NULL; int cap_type = -1; + bool inactive, all;
1 declaration per line...
+ inactive = vshCommandOptBool(cmd, "inactive"); + all = vshCommandOptBool(cmd, "all");
...also ^these 2 can be used to initialize the variables at their definition. Reviewed-by: Erik Skultety <eskultet@redhat.com>

Add a virsh command that maps to virNodeDeviceDefineXML(). Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- tools/virsh-nodedev.c | 58 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index a2e83fb676..0f2f22d917 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -1014,6 +1014,58 @@ cmdNodeDeviceEvent(vshControl *ctl, const vshCmd *cmd) } +/* + * "nodedev-define" command + */ +static const vshCmdInfo info_node_device_define[] = { + {.name = "help", + .data = N_("Define a device by an xml file on a node") + }, + {.name = "desc", + .data = N_("Defines a persistent device on the node that can be " + "assigned to a domain. The device must be started before " + "it can be assigned to a domain.") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_node_device_define[] = { + VIRSH_COMMON_OPT_FILE(N_("file containing an XML description " + "of the device")), + {.name = NULL} +}; + +static bool +cmdNodeDeviceDefine(vshControl *ctl, const vshCmd *cmd G_GNUC_UNUSED) +{ + virNodeDevicePtr dev = NULL; + const char *from = NULL; + bool ret = true; + char *buffer; + virshControlPtr priv = ctl->privData; + + if (vshCommandOptStringReq(ctl, cmd, "file", &from) < 0) + return false; + + if (virFileReadAll(from, VSH_MAX_XML_FILE, &buffer) < 0) + return false; + + dev = virNodeDeviceDefineXML(priv->conn, buffer, 0); + VIR_FREE(buffer); + + if (dev != NULL) { + vshPrintExtra(ctl, _("Node device '%s' defined from '%s'\n"), + virNodeDeviceGetName(dev), from); + virNodeDeviceFree(dev); + } else { + vshError(ctl, _("Failed to define node device from '%s'"), from); + ret = false; + } + + return ret; +} + + const vshCmdDef nodedevCmds[] = { {.name = "nodedev-create", .handler = cmdNodeDeviceCreate, @@ -1067,5 +1119,11 @@ const vshCmdDef nodedevCmds[] = { .info = info_node_device_event, .flags = 0 }, + {.name = "nodedev-define", + .handler = cmdNodeDeviceDefine, + .opts = opts_node_device_define, + .info = info_node_device_define, + .flags = 0 + }, {.name = NULL} }; -- 2.26.2

mdevctl 'stop' and 'undefine' commands take the same uuid parameter, so refactor the test infrastructure to share common implementation for both of these commands. The 'undefine' command will be introduced in a following patch. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- tests/nodedevmdevctltest.c | 48 +++++++++++++++++++++++++++++++------- 1 file changed, 40 insertions(+), 8 deletions(-) diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index 1f48a605b3..84ae1b67be 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -12,7 +12,9 @@ typedef enum { MDEVCTL_CMD_START, + MDEVCTL_CMD_STOP, MDEVCTL_CMD_DEFINE, + MDEVCTL_CMD_UNDEFINE } MdevctlCmd; struct startTestInfo { @@ -134,19 +136,21 @@ testMdevctlStartHelper(const void *data) mdevxml, cmdlinefile, jsonfile); } +typedef virCommandPtr (*GetStopUndefineCmdFunc)(const char *uuid); +struct UuidCommandTestInfo { + const char *uuid; + MdevctlCmd command; +}; + static int -testMdevctlStop(const void *data) +testMdevctlUuidCommand(const char *uuid, GetStopUndefineCmdFunc func, const char *outfile) { - const char *uuid = data; g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; const char *actualCmdline = NULL; int ret = -1; g_autoptr(virCommand) cmd = NULL; - g_autofree char *cmdlinefile = - g_strdup_printf("%s/nodedevmdevctldata/mdevctl-stop.argv", - abs_srcdir); - cmd = nodeDeviceGetMdevctlStopCommand(uuid); + cmd = func(uuid); if (!cmd) goto cleanup; @@ -158,7 +162,7 @@ testMdevctlStop(const void *data) if (!(actualCmdline = virBufferCurrentContent(&buf))) goto cleanup; - if (nodedevCompareToFile(actualCmdline, cmdlinefile) < 0) + if (nodedevCompareToFile(actualCmdline, outfile) < 0) goto cleanup; ret = 0; @@ -168,6 +172,27 @@ testMdevctlStop(const void *data) return ret; } +static int +testMdevctlUuidCommandHelper(const void *data) +{ + const struct UuidCommandTestInfo *info = data; + GetStopUndefineCmdFunc func; + const char *cmd; + g_autofree char *cmdlinefile = NULL; + + if (info->command == MDEVCTL_CMD_STOP) { + cmd = "stop"; + func = nodeDeviceGetMdevctlStopCommand; + } else { + return -1; + } + + cmdlinefile = g_strdup_printf("%s/nodedevmdevctldata/mdevctl-%s.argv", + abs_srcdir, cmd); + + return testMdevctlUuidCommand(info->uuid, func, cmdlinefile); +} + static int testMdevctlListDefined(const void *data G_GNUC_UNUSED) { @@ -385,8 +410,15 @@ mymain(void) #define DO_TEST_DEFINE(filename) \ DO_TEST_CMD("mdevctl define " filename, "QEMU", CREATE_DEVICE, filename, MDEVCTL_CMD_DEFINE) +#define DO_TEST_UUID_COMMAND_FULL(desc, uuid, command) \ + do { \ + struct UuidCommandTestInfo info = { uuid, command }; \ + DO_TEST_FULL(desc, testMdevctlUuidCommandHelper, &info); \ + } \ + while (0) + #define DO_TEST_STOP(uuid) \ - DO_TEST_FULL("mdevctl stop " uuid, testMdevctlStop, uuid) + DO_TEST_UUID_COMMAND_FULL("mdevctl stop " uuid, uuid, MDEVCTL_CMD_STOP) #define DO_TEST_LIST_DEFINED() \ DO_TEST_FULL("mdevctl list --defined", testMdevctlListDefined, NULL) -- 2.26.2

This interface allows you to undefine a persistently defined (but inactive) mediated devices. It is implemented via 'mdevctl' Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- include/libvirt/libvirt-nodedev.h | 2 + src/access/viraccessperm.c | 2 +- src/access/viraccessperm.h | 6 ++ src/driver-nodedev.h | 4 ++ src/libvirt-nodedev.c | 36 ++++++++++ src/libvirt_public.syms | 1 + src/node_device/node_device_driver.c | 67 +++++++++++++++++++ src/node_device/node_device_driver.h | 6 ++ src/node_device/node_device_udev.c | 1 + src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 14 +++- src/remote_protocol-structs | 4 ++ .../nodedevmdevctldata/mdevctl-undefine.argv | 1 + tests/nodedevmdevctltest.c | 8 +++ 14 files changed, 151 insertions(+), 2 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdevctl-undefine.argv diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index 116684ae92..facc02d243 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -136,6 +136,8 @@ virNodeDevicePtr virNodeDeviceDefineXML(virConnectPtr conn, const char *xmlDesc, unsigned int flags); +int virNodeDeviceUndefine(virNodeDevicePtr dev); + /** * VIR_NODE_DEVICE_EVENT_CALLBACK: * diff --git a/src/access/viraccessperm.c b/src/access/viraccessperm.c index 33db7752b6..d4a0c98b9b 100644 --- a/src/access/viraccessperm.c +++ b/src/access/viraccessperm.c @@ -70,7 +70,7 @@ VIR_ENUM_IMPL(virAccessPermNodeDevice, VIR_ACCESS_PERM_NODE_DEVICE_LAST, "getattr", "read", "write", "start", "stop", - "detach", + "detach", "delete", ); VIR_ENUM_IMPL(virAccessPermNWFilter, diff --git a/src/access/viraccessperm.h b/src/access/viraccessperm.h index 42996b9741..051246a7b6 100644 --- a/src/access/viraccessperm.h +++ b/src/access/viraccessperm.h @@ -500,6 +500,12 @@ typedef enum { */ VIR_ACCESS_PERM_NODE_DEVICE_DETACH, + /** + * @desc: Delete node device + * @message: Deleting node device driver requires authorization + */ + VIR_ACCESS_PERM_NODE_DEVICE_DELETE, + VIR_ACCESS_PERM_NODE_DEVICE_LAST } virAccessPermNodeDevice; diff --git a/src/driver-nodedev.h b/src/driver-nodedev.h index 16d787f3fc..e18029ab48 100644 --- a/src/driver-nodedev.h +++ b/src/driver-nodedev.h @@ -79,6 +79,9 @@ typedef virNodeDevicePtr const char *xmlDesc, unsigned int flags); +typedef int +(*virDrvNodeDeviceUndefine)(virNodeDevicePtr dev); + typedef int (*virDrvConnectNodeDeviceEventRegisterAny)(virConnectPtr conn, virNodeDevicePtr dev, @@ -119,4 +122,5 @@ struct _virNodeDeviceDriver { virDrvNodeDeviceCreateXML nodeDeviceCreateXML; virDrvNodeDeviceDestroy nodeDeviceDestroy; virDrvNodeDeviceDefineXML nodeDeviceDefineXML; + virDrvNodeDeviceUndefine nodeDeviceUndefine; }; diff --git a/src/libvirt-nodedev.c b/src/libvirt-nodedev.c index 15eb70218a..618778517b 100644 --- a/src/libvirt-nodedev.c +++ b/src/libvirt-nodedev.c @@ -807,6 +807,42 @@ virNodeDeviceDefineXML(virConnectPtr conn, } +/** + * virNodeDeviceUndefine: + * @dev: a device object + * + * Undefine the device object. The virtual device is removed from the host + * operating system. This function may require privileged access. + * + * Returns 0 in case of success and -1 in case of failure. + */ +int +virNodeDeviceUndefine(virNodeDevicePtr dev) +{ + VIR_DEBUG("dev=%p", dev); + + virResetLastError(); + + virCheckNodeDeviceReturn(dev, -1); + virCheckReadOnlyGoto(dev->conn->flags, error); + + if (dev->conn->nodeDeviceDriver && + dev->conn->nodeDeviceDriver->nodeDeviceUndefine) { + int retval = dev->conn->nodeDeviceDriver->nodeDeviceUndefine(dev); + if (retval < 0) + goto error; + + return 0; + } + + virReportUnsupportedError(); + + error: + virDispatchError(dev->conn); + return -1; +} + + /** * virConnectNodeDeviceEventRegisterAny: * @conn: pointer to the connection diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 7d0f8a1c14..8382d5d313 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -882,6 +882,7 @@ LIBVIRT_6.10.0 { LIBVIRT_7.1.0 { global: virNodeDeviceDefineXML; + virNodeDeviceUndefine; } LIBVIRT_6.10.0; # .... define new API here using predicted next version number .... diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 9f5ee51f08..a2aa4ce8be 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -878,6 +878,17 @@ nodeDeviceGetMdevctlStopCommand(const char *uuid) } +virCommandPtr +nodeDeviceGetMdevctlUndefineCommand(const char *uuid) +{ + return virCommandNewArgList(MDEVCTL, + "undefine", + "-u", + uuid, + NULL); + +} + static int virMdevctlStop(virNodeDeviceDefPtr def) { @@ -893,6 +904,21 @@ virMdevctlStop(virNodeDeviceDefPtr def) } +static int +virMdevctlUndefine(virNodeDeviceDefPtr def) +{ + int status; + g_autoptr(virCommand) cmd = NULL; + + cmd = nodeDeviceGetMdevctlUndefineCommand(def->caps->data.mdev.uuid); + + if (virCommandRun(cmd, &status) < 0 || status != 0) + return -1; + + return 0; +} + + virCommandPtr nodeDeviceGetMdevctlListCommand(bool defined, char **output) @@ -1169,6 +1195,47 @@ nodeDeviceDefineXML(virConnectPtr conn, } +int +nodeDeviceUndefine(virNodeDevicePtr device) +{ + int ret = -1; + virNodeDeviceObjPtr obj = NULL; + virNodeDeviceDefPtr def; + + if (nodeDeviceWaitInit() < 0) + return -1; + + if (!(obj = nodeDeviceObjFindByName(device->name))) + return -1; + def = virNodeDeviceObjGetDef(obj); + + if (virNodeDeviceUndefineEnsureACL(device->conn, def) < 0) + goto cleanup; + + if (virNodeDeviceObjIsActive(obj)) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("node device '%s' is still active"), + def->name); + goto cleanup; + } + + if (nodeDeviceHasCapability(def, VIR_NODE_DEV_CAP_MDEV)) { + if (virMdevctlUndefine(def) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to undefine mediated device")); + goto cleanup; + } + ret = 0; + } else { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Unsupported device type")); + } + + cleanup: + virNodeDeviceObjEndAPI(&obj); + return ret; +} + int nodeConnectNodeDeviceEventRegisterAny(virConnectPtr conn, diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index fe6b8072bc..1a67391f87 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -107,6 +107,9 @@ nodeDeviceDefineXML(virConnectPtr conn, const char *xmlDesc, unsigned int flags); +int +nodeDeviceUndefine(virNodeDevicePtr dev); + int nodeConnectNodeDeviceEventRegisterAny(virConnectPtr conn, virNodeDevicePtr dev, @@ -129,6 +132,9 @@ nodeDeviceGetMdevctlDefineCommand(virNodeDeviceDefPtr def, virCommandPtr nodeDeviceGetMdevctlStopCommand(const char *uuid); +virCommandPtr +nodeDeviceGetMdevctlUndefineCommand(const char *uuid); + virCommandPtr nodeDeviceGetMdevctlListCommand(bool defined, char **output); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index e54dfe1355..b08fabd553 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -2314,6 +2314,7 @@ static virNodeDeviceDriver udevNodeDeviceDriver = { .nodeDeviceCreateXML = nodeDeviceCreateXML, /* 0.7.3 */ .nodeDeviceDestroy = nodeDeviceDestroy, /* 0.7.3 */ .nodeDeviceDefineXML = nodeDeviceDefineXML, /* 7.1.0 */ + .nodeDeviceUndefine = nodeDeviceUndefine, /* 7.1.0 */ }; diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 0da22705bb..dbfde0c6d2 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8662,6 +8662,7 @@ static virNodeDeviceDriver node_device_driver = { .nodeDeviceListCaps = remoteNodeDeviceListCaps, /* 0.5.0 */ .nodeDeviceCreateXML = remoteNodeDeviceCreateXML, /* 0.6.3 */ .nodeDeviceDefineXML = remoteNodeDeviceDefineXML, /* 7.1.0 */ + .nodeDeviceUndefine = remoteNodeDeviceUndefine, /* 7.1.0 */ .nodeDeviceDestroy = remoteNodeDeviceDestroy /* 0.6.3 */ }; diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index d2250502b4..7fee7539f0 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -2151,6 +2151,10 @@ struct remote_node_device_define_xml_ret { remote_nonnull_node_device dev; }; +struct remote_node_device_undefine_args { + remote_nonnull_string name; +}; + /* * Events Register/Deregister: @@ -6729,5 +6733,13 @@ enum remote_procedure { * @generate: both * @acl: node_device:write */ - REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 426 + REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 426, + + /** + * @generate: both + * @priority: high + * @acl: node_device:delete + */ + REMOTE_PROC_NODE_DEVICE_UNDEFINE = 427 + }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 565c1673f1..6d136bdb66 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -1607,6 +1607,9 @@ struct remote_node_device_define_xml_args { struct remote_node_device_define_xml_ret { remote_nonnull_node_device dev; }; +struct remote_node_device_undefine_args { + remote_nonnull_string name; +}; struct remote_connect_domain_event_register_ret { int cb_registered; }; @@ -3596,4 +3599,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_GET = 424, REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425, REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 426, + REMOTE_PROC_NODE_DEVICE_UNDEFINE = 427, }; diff --git a/tests/nodedevmdevctldata/mdevctl-undefine.argv b/tests/nodedevmdevctldata/mdevctl-undefine.argv new file mode 100644 index 0000000000..54717455f7 --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-undefine.argv @@ -0,0 +1 @@ +$MDEVCTL_BINARY$ undefine -u d76a6b78-45ed-4149-a325-005f9abc5281 diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index 84ae1b67be..f01d8d2fbe 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -183,6 +183,9 @@ testMdevctlUuidCommandHelper(const void *data) if (info->command == MDEVCTL_CMD_STOP) { cmd = "stop"; func = nodeDeviceGetMdevctlStopCommand; + } else if (info->command == MDEVCTL_CMD_UNDEFINE) { + cmd = "undefine"; + func = nodeDeviceGetMdevctlUndefineCommand; } else { return -1; } @@ -420,6 +423,9 @@ mymain(void) #define DO_TEST_STOP(uuid) \ DO_TEST_UUID_COMMAND_FULL("mdevctl stop " uuid, uuid, MDEVCTL_CMD_STOP) +#define DO_TEST_UNDEFINE(uuid) \ + DO_TEST_UUID_COMMAND_FULL("mdevctl undefine " uuid, uuid, MDEVCTL_CMD_UNDEFINE) + #define DO_TEST_LIST_DEFINED() \ DO_TEST_FULL("mdevctl list --defined", testMdevctlListDefined, NULL) @@ -442,6 +448,8 @@ mymain(void) DO_TEST_DEFINE("mdev_fedc4916_1ca8_49ac_b176_871d16c13076"); DO_TEST_DEFINE("mdev_d2441d39_495e_4243_ad9f_beb3f14c23d9"); + DO_TEST_UNDEFINE("d76a6b78-45ed-4149-a325-005f9abc5281"); + done: nodedevTestDriverFree(driver); -- 2.26.2

Several functions accept providing a node device by name or by wwnn,wwpn pair. Extract the logic to do this into a function that can be used by both callers. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com> --- tools/virsh-nodedev.c | 58 +++++++++++++++++++++---------------------- 1 file changed, 28 insertions(+), 30 deletions(-) diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index 0f2f22d917..530fe30c0f 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -111,23 +111,18 @@ static const vshCmdOptDef opts_node_device_destroy[] = { {.name = NULL} }; -static bool -cmdNodeDeviceDestroy(vshControl *ctl, const vshCmd *cmd) +static virNodeDevicePtr +vshFindNodeDevice(vshControl *ctl, const char *value) { virNodeDevicePtr dev = NULL; - bool ret = false; - const char *device_value = NULL; char **arr = NULL; int narr; virshControlPtr priv = ctl->privData; - if (vshCommandOptStringReq(ctl, cmd, "device", &device_value) < 0) - return false; - - if (strchr(device_value, ',')) { - narr = vshStringToArray(device_value, &arr); + if (strchr(value, ',')) { + narr = vshStringToArray(value, &arr); if (narr != 2) { - vshError(ctl, _("Malformed device value '%s'"), device_value); + vshError(ctl, _("Malformed device value '%s'"), value); goto cleanup; } @@ -136,9 +131,30 @@ cmdNodeDeviceDestroy(vshControl *ctl, const vshCmd *cmd) dev = virNodeDeviceLookupSCSIHostByWWN(priv->conn, arr[0], arr[1], 0); } else { - dev = virNodeDeviceLookupByName(priv->conn, device_value); + dev = virNodeDeviceLookupByName(priv->conn, value); } + if (!dev) { + vshError(ctl, "%s '%s'", _("Could not find matching device"), value); + goto cleanup; + } + + cleanup: + g_strfreev(arr); + return dev; +} + +static bool +cmdNodeDeviceDestroy(vshControl *ctl, const vshCmd *cmd) +{ + virNodeDevicePtr dev = NULL; + bool ret = false; + const char *device_value = NULL; + + if (vshCommandOptStringReq(ctl, cmd, "device", &device_value) < 0) + return false; + + dev = vshFindNodeDevice(ctl, device_value); if (!dev) { vshError(ctl, "%s '%s'", _("Could not find matching device"), device_value); goto cleanup; @@ -153,7 +169,6 @@ cmdNodeDeviceDestroy(vshControl *ctl, const vshCmd *cmd) ret = true; cleanup: - g_strfreev(arr); if (dev) virNodeDeviceFree(dev); return ret; @@ -579,28 +594,12 @@ cmdNodeDeviceDumpXML(vshControl *ctl, const vshCmd *cmd) virNodeDevicePtr device = NULL; char *xml = NULL; const char *device_value = NULL; - char **arr = NULL; - int narr; bool ret = false; - virshControlPtr priv = ctl->privData; if (vshCommandOptStringReq(ctl, cmd, "device", &device_value) < 0) return false; - if (strchr(device_value, ',')) { - narr = vshStringToArray(device_value, &arr); - if (narr != 2) { - vshError(ctl, _("Malformed device value '%s'"), device_value); - goto cleanup; - } - - if (!virValidateWWN(arr[0]) || !virValidateWWN(arr[1])) - goto cleanup; - - device = virNodeDeviceLookupSCSIHostByWWN(priv->conn, arr[0], arr[1], 0); - } else { - device = virNodeDeviceLookupByName(priv->conn, device_value); - } + device = vshFindNodeDevice(ctl, device_value); if (!device) { vshError(ctl, "%s '%s'", _("Could not find matching device"), device_value); @@ -614,7 +613,6 @@ cmdNodeDeviceDumpXML(vshControl *ctl, const vshCmd *cmd) ret = true; cleanup: - g_strfreev(arr); VIR_FREE(xml); if (device) virNodeDeviceFree(device); -- 2.26.2

Add a virsh command that maps to virNodeDeviceUndefine(). Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- tools/virsh-nodedev.c | 61 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index 530fe30c0f..87e2998982 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -1012,6 +1012,61 @@ cmdNodeDeviceEvent(vshControl *ctl, const vshCmd *cmd) } +/* + * "nodedev-undefine" command + */ +static const vshCmdInfo info_node_device_undefine[] = { + {.name = "help", + .data = N_("Undefine an inactive node device") + }, + {.name = "desc", + .data = N_("Undefines the configuration for an inactive node device") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_node_device_undefine[] = { + {.name = "device", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("device name or wwn pair in 'wwnn,wwpn' format"), + .completer = virshNodeDeviceNameCompleter, + }, + {.name = NULL} +}; + +static bool +cmdNodeDeviceUndefine(vshControl *ctl, const vshCmd *cmd G_GNUC_UNUSED) +{ + virNodeDevicePtr dev = NULL; + bool ret = false; + const char *device_value = NULL; + + if (vshCommandOptStringReq(ctl, cmd, "device", &device_value) < 0) + return false; + + dev = vshFindNodeDevice(ctl, device_value); + + if (!dev) { + vshError(ctl, "%s '%s'", _("Could not find matching device"), device_value); + goto cleanup; + } + + if (virNodeDeviceUndefine(dev) == 0) { + vshPrintExtra(ctl, _("Undefined node device '%s'\n"), device_value); + } else { + vshError(ctl, _("Failed to undefine node device '%s'"), device_value); + goto cleanup; + } + + ret = true; + cleanup: + if (dev) + virNodeDeviceFree(dev); + return ret; +} + + /* * "nodedev-define" command */ @@ -1123,5 +1178,11 @@ const vshCmdDef nodedevCmds[] = { .info = info_node_device_define, .flags = 0 }, + {.name = "nodedev-undefine", + .handler = cmdNodeDeviceUndefine, + .opts = opts_node_device_undefine, + .info = info_node_device_undefine, + .flags = 0 + }, {.name = NULL} }; -- 2.26.2

On Wed, Feb 03, 2021 at 11:39:04AM -0600, Jonathon Jongsma wrote:
Add a virsh command that maps to virNodeDeviceUndefine().
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> Reviewed-by: Erik Skultety <eskultet@redhat.com>

This new API function provides a way to start a persistently-defined mediate device that was defined by virNodeDeviceDefineXML() (or one that was defined externally via mdevctl) Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- include/libvirt/libvirt-nodedev.h | 2 + src/driver-nodedev.h | 4 ++ src/libvirt-nodedev.c | 35 +++++++++++ src/libvirt_public.syms | 1 + src/node_device/node_device_driver.c | 63 ++++++++++++++++++++ src/node_device/node_device_driver.h | 6 ++ src/node_device/node_device_udev.c | 1 + src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 13 +++- src/remote_protocol-structs | 4 ++ tests/nodedevmdevctldata/mdevctl-create.argv | 1 + tests/nodedevmdevctltest.c | 11 +++- 12 files changed, 140 insertions(+), 2 deletions(-) create mode 100644 tests/nodedevmdevctldata/mdevctl-create.argv diff --git a/include/libvirt/libvirt-nodedev.h b/include/libvirt/libvirt-nodedev.h index facc02d243..f826d5e534 100644 --- a/include/libvirt/libvirt-nodedev.h +++ b/include/libvirt/libvirt-nodedev.h @@ -138,6 +138,8 @@ virNodeDevicePtr virNodeDeviceDefineXML(virConnectPtr conn, int virNodeDeviceUndefine(virNodeDevicePtr dev); +int virNodeDeviceCreate(virNodeDevicePtr dev); + /** * VIR_NODE_DEVICE_EVENT_CALLBACK: * diff --git a/src/driver-nodedev.h b/src/driver-nodedev.h index e18029ab48..d00dd5845c 100644 --- a/src/driver-nodedev.h +++ b/src/driver-nodedev.h @@ -82,6 +82,9 @@ typedef virNodeDevicePtr typedef int (*virDrvNodeDeviceUndefine)(virNodeDevicePtr dev); +typedef int +(*virDrvNodeDeviceCreate)(virNodeDevicePtr def); + typedef int (*virDrvConnectNodeDeviceEventRegisterAny)(virConnectPtr conn, virNodeDevicePtr dev, @@ -123,4 +126,5 @@ struct _virNodeDeviceDriver { virDrvNodeDeviceDestroy nodeDeviceDestroy; virDrvNodeDeviceDefineXML nodeDeviceDefineXML; virDrvNodeDeviceUndefine nodeDeviceUndefine; + virDrvNodeDeviceCreate nodeDeviceCreate; }; diff --git a/src/libvirt-nodedev.c b/src/libvirt-nodedev.c index 618778517b..1d13466fc1 100644 --- a/src/libvirt-nodedev.c +++ b/src/libvirt-nodedev.c @@ -843,6 +843,41 @@ virNodeDeviceUndefine(virNodeDevicePtr dev) } +/** + * virNodeDeviceCreate: + * @dev: a device object + * + * Start a defined node device: + * + * Returns 0 in case of success and -1 in case of failure. + */ +int +virNodeDeviceCreate(virNodeDevicePtr dev) +{ + VIR_DEBUG("dev=%p", dev); + + virResetLastError(); + + virCheckNodeDeviceReturn(dev, -1); + virCheckReadOnlyGoto(dev->conn->flags, error); + + if (dev->conn->nodeDeviceDriver && + dev->conn->nodeDeviceDriver->nodeDeviceCreate) { + int retval = dev->conn->nodeDeviceDriver->nodeDeviceCreate(dev); + if (retval < 0) + goto error; + + return 0; + } + + virReportUnsupportedError(); + + error: + virDispatchError(dev->conn); + return -1; +} + + /** * virConnectNodeDeviceEventRegisterAny: * @conn: pointer to the connection diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 8382d5d313..4007fbc78e 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -883,6 +883,7 @@ LIBVIRT_7.1.0 { global: virNodeDeviceDefineXML; virNodeDeviceUndefine; + virNodeDeviceCreate; } LIBVIRT_6.10.0; # .... define new API here using predicted next version number .... diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index a2aa4ce8be..d5cdf2b097 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -889,6 +889,17 @@ nodeDeviceGetMdevctlUndefineCommand(const char *uuid) } +virCommandPtr +nodeDeviceGetMdevctlCreateCommand(const char *uuid) +{ + return virCommandNewArgList(MDEVCTL, + "start", + "-u", + uuid, + NULL); + +} + static int virMdevctlStop(virNodeDeviceDefPtr def) { @@ -919,6 +930,21 @@ virMdevctlUndefine(virNodeDeviceDefPtr def) } +static int +virMdevctlCreate(virNodeDeviceDefPtr def) +{ + int status; + g_autoptr(virCommand) cmd = NULL; + + cmd = nodeDeviceGetMdevctlCreateCommand(def->caps->data.mdev.uuid); + + if (virCommandRun(cmd, &status) < 0 || status != 0) + return -1; + + return 0; +} + + virCommandPtr nodeDeviceGetMdevctlListCommand(bool defined, char **output) @@ -1237,6 +1263,43 @@ nodeDeviceUndefine(virNodeDevicePtr device) } +int nodeDeviceCreate(virNodeDevicePtr device) +{ + int ret = -1; + virNodeDeviceObjPtr obj = NULL; + virNodeDeviceDefPtr def; + + if (!(obj = nodeDeviceObjFindByName(device->name))) + return -1; + + if (virNodeDeviceObjIsActive(obj)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("Device is already active")); + goto cleanup; + } + def = virNodeDeviceObjGetDef(obj); + + if (virNodeDeviceCreateEnsureACL(device->conn, def) < 0) + goto cleanup; + + if (nodeDeviceHasCapability(def, VIR_NODE_DEV_CAP_MDEV)) { + if (virMdevctlCreate(def) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to create mediated device")); + goto cleanup; + } + ret = 0; + } else { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Unsupported device type")); + } + + cleanup: + virNodeDeviceObjEndAPI(&obj); + return ret; +} + + int nodeConnectNodeDeviceEventRegisterAny(virConnectPtr conn, virNodeDevicePtr device, diff --git a/src/node_device/node_device_driver.h b/src/node_device/node_device_driver.h index 1a67391f87..1e23240559 100644 --- a/src/node_device/node_device_driver.h +++ b/src/node_device/node_device_driver.h @@ -153,3 +153,9 @@ nodeDeviceGenerateName(virNodeDeviceDefPtr def, bool nodeDeviceDefCopyFromMdevctl(virNodeDeviceDefPtr dst, virNodeDeviceDefPtr src); + +virCommandPtr +nodeDeviceGetMdevctlCreateCommand(const char *uuid); + +int +nodeDeviceCreate(virNodeDevicePtr dev); diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index b08fabd553..56140a5711 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -2315,6 +2315,7 @@ static virNodeDeviceDriver udevNodeDeviceDriver = { .nodeDeviceDestroy = nodeDeviceDestroy, /* 0.7.3 */ .nodeDeviceDefineXML = nodeDeviceDefineXML, /* 7.1.0 */ .nodeDeviceUndefine = nodeDeviceUndefine, /* 7.1.0 */ + .nodeDeviceCreate = nodeDeviceCreate, /* 7.1.0 */ }; diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index dbfde0c6d2..41c63a7aec 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8661,6 +8661,7 @@ static virNodeDeviceDriver node_device_driver = { .nodeDeviceNumOfCaps = remoteNodeDeviceNumOfCaps, /* 0.5.0 */ .nodeDeviceListCaps = remoteNodeDeviceListCaps, /* 0.5.0 */ .nodeDeviceCreateXML = remoteNodeDeviceCreateXML, /* 0.6.3 */ + .nodeDeviceCreate = remoteNodeDeviceCreate, /* 7.1.0 */ .nodeDeviceDefineXML = remoteNodeDeviceDefineXML, /* 7.1.0 */ .nodeDeviceUndefine = remoteNodeDeviceUndefine, /* 7.1.0 */ .nodeDeviceDestroy = remoteNodeDeviceDestroy /* 0.6.3 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 7fee7539f0..a7cf227e53 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -2155,6 +2155,10 @@ struct remote_node_device_undefine_args { remote_nonnull_string name; }; +struct remote_node_device_create_args { + remote_nonnull_string name; +}; + /* * Events Register/Deregister: @@ -6740,6 +6744,13 @@ enum remote_procedure { * @priority: high * @acl: node_device:delete */ - REMOTE_PROC_NODE_DEVICE_UNDEFINE = 427 + REMOTE_PROC_NODE_DEVICE_UNDEFINE = 427, + + /** + * @generate: both + * @priority: high + * @acl: node_device:start + */ + REMOTE_PROC_NODE_DEVICE_CREATE = 428 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 6d136bdb66..0f35efc5b3 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -1610,6 +1610,9 @@ struct remote_node_device_define_xml_ret { struct remote_node_device_undefine_args { remote_nonnull_string name; }; +struct remote_node_device_create_args { + remote_nonnull_string name; +}; struct remote_connect_domain_event_register_ret { int cb_registered; }; @@ -3600,4 +3603,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425, REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 426, REMOTE_PROC_NODE_DEVICE_UNDEFINE = 427, + REMOTE_PROC_NODE_DEVICE_CREATE = 428, }; diff --git a/tests/nodedevmdevctldata/mdevctl-create.argv b/tests/nodedevmdevctldata/mdevctl-create.argv new file mode 100644 index 0000000000..802109340c --- /dev/null +++ b/tests/nodedevmdevctldata/mdevctl-create.argv @@ -0,0 +1 @@ +$MDEVCTL_BINARY$ start -u 8a05ad83-3472-497d-8631-8142f31460e8 diff --git a/tests/nodedevmdevctltest.c b/tests/nodedevmdevctltest.c index f01d8d2fbe..89731430bb 100644 --- a/tests/nodedevmdevctltest.c +++ b/tests/nodedevmdevctltest.c @@ -14,7 +14,8 @@ typedef enum { MDEVCTL_CMD_START, MDEVCTL_CMD_STOP, MDEVCTL_CMD_DEFINE, - MDEVCTL_CMD_UNDEFINE + MDEVCTL_CMD_UNDEFINE, + MDEVCTL_CMD_CREATE, } MdevctlCmd; struct startTestInfo { @@ -186,6 +187,9 @@ testMdevctlUuidCommandHelper(const void *data) } else if (info->command == MDEVCTL_CMD_UNDEFINE) { cmd = "undefine"; func = nodeDeviceGetMdevctlUndefineCommand; + }else if (info->command == MDEVCTL_CMD_CREATE) { + cmd = "create"; + func = nodeDeviceGetMdevctlCreateCommand; } else { return -1; } @@ -426,6 +430,9 @@ mymain(void) #define DO_TEST_UNDEFINE(uuid) \ DO_TEST_UUID_COMMAND_FULL("mdevctl undefine " uuid, uuid, MDEVCTL_CMD_UNDEFINE) +#define DO_TEST_CREATE(uuid) \ + DO_TEST_UUID_COMMAND_FULL("mdevctl create " uuid, uuid, MDEVCTL_CMD_CREATE) + #define DO_TEST_LIST_DEFINED() \ DO_TEST_FULL("mdevctl list --defined", testMdevctlListDefined, NULL) @@ -450,6 +457,8 @@ mymain(void) DO_TEST_UNDEFINE("d76a6b78-45ed-4149-a325-005f9abc5281"); + DO_TEST_CREATE("8a05ad83-3472-497d-8631-8142f31460e8"); + done: nodedevTestDriverFree(driver); -- 2.26.2

On Wed, Feb 03, 2021 at 11:39:05AM -0600, Jonathon Jongsma wrote:
This new API function provides a way to start a persistently-defined mediate device that was defined by virNodeDeviceDefineXML() (or one that was defined externally via mdevctl)
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- ...
+int nodeDeviceCreate(virNodeDevicePtr device) +{ + int ret = -1; + virNodeDeviceObjPtr obj = NULL; + virNodeDeviceDefPtr def;
...actually, please initialize ^this pointer as well to remain consistent. Erik (The RB still stands)

This virsh command maps to virNodeDeviceCreate(), which starts a node device that has been previously defined by virNodeDeviceDefineXML(). This is only supported for mediated devices. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- tools/virsh-nodedev.c | 57 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index 87e2998982..3630cd7e6d 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -1119,6 +1119,57 @@ cmdNodeDeviceDefine(vshControl *ctl, const vshCmd *cmd G_GNUC_UNUSED) } +/* + * "nodedev-start" command + */ +static const vshCmdInfo info_node_device_start[] = { + {.name = "help", + .data = N_("Start an inactive node device") + }, + {.name = "desc", + .data = N_("Starts an inactive node device that was previously defined") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_node_device_start[] = { + {.name = "device", + .type = VSH_OT_DATA, + .flags = VSH_OFLAG_REQ, + .help = N_("device name"), + .completer = virshNodeDeviceNameCompleter, + }, + {.name = NULL} +}; + +static bool +cmdNodeDeviceStart(vshControl *ctl, const vshCmd *cmd) +{ + const char *name = NULL; + virNodeDevicePtr device; + bool ret = true; + virshControlPtr priv = ctl->privData; + + if (vshCommandOptStringReq(ctl, cmd, "device", &name) < 0) + return false; + + if (!(device = virNodeDeviceLookupByName(priv->conn, name))) { + vshError(ctl, _("Could not find matching device '%s'"), name); + return false; + } + + if (virNodeDeviceCreate(device) == 0) { + vshPrintExtra(ctl, _("Device %s started\n"), name); + } else { + vshError(ctl, _("Failed to start device %s"), name); + ret = false; + } + + virNodeDeviceFree(device); + return ret; +} + + const vshCmdDef nodedevCmds[] = { {.name = "nodedev-create", .handler = cmdNodeDeviceCreate, @@ -1184,5 +1235,11 @@ const vshCmdDef nodedevCmds[] = { .info = info_node_device_undefine, .flags = 0 }, + {.name = "nodedev-start", + .handler = cmdNodeDeviceStart, + .opts = opts_node_device_start, + .info = info_node_device_start, + .flags = 0 + }, {.name = NULL} }; -- 2.26.2

On Wed, Feb 03, 2021 at 11:39:06AM -0600, Jonathon Jongsma wrote:
This virsh command maps to virNodeDeviceCreate(), which starts a node device that has been previously defined by virNodeDeviceDefineXML(). This is only supported for mediated devices.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- Reviewed-by: Erik Skultety <eskultet@redhat.com>

It will be useful to be able to specify a particular UUID for a mediated device when defining the node device. To accomodate that, allow this to be specified in the xml schema. This patch also parses and formats that value to the xml, but does not yet use it. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- docs/schemas/nodedev.rng | 43 +++++++++++-------- src/conf/node_device_conf.c | 15 +++++++ .../mdevctl-list-multiple.out.xml | 4 ++ 3 files changed, 44 insertions(+), 18 deletions(-) diff --git a/docs/schemas/nodedev.rng b/docs/schemas/nodedev.rng index 5840dc9f0d..777227c38a 100644 --- a/docs/schemas/nodedev.rng +++ b/docs/schemas/nodedev.rng @@ -606,27 +606,34 @@ </define> <define name="capmdev"> - <attribute name="type"> - <value>mdev</value> - </attribute> - <element name="type"> - <attribute name="id"> - <data type="string"/> + <interleave> + <attribute name="type"> + <value>mdev</value> </attribute> - </element> - <optional> - <element name="iommuGroup"> - <attribute name="number"> - <ref name="unsignedInt"/> + <element name="type"> + <attribute name="id"> + <data type="string"/> </attribute> </element> - </optional> - <zeroOrMore> - <element name="attr"> - <attribute name="name"/> - <attribute name="value"/> - </element> - </zeroOrMore> + <optional> + <element name="iommuGroup"> + <attribute name="number"> + <ref name="unsignedInt"/> + </attribute> + </element> + </optional> + <optional> + <element name="uuid"> + <ref name="UUID"/> + </element> + </optional> + <zeroOrMore> + <element name="attr"> + <attribute name="name"/> + <attribute name="value"/> + </element> + </zeroOrMore> + </interleave> </define> <define name="capccwdev"> diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 1093a461af..44cf4d152a 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -527,6 +527,7 @@ virNodeDeviceCapMdevDefFormat(virBufferPtr buf, size_t i; virBufferEscapeString(buf, "<type id='%s'/>\n", data->mdev.type); + virBufferEscapeString(buf, "<uuid>%s</uuid>\n", data->mdev.uuid); virBufferAsprintf(buf, "<iommuGroup number='%u'/>\n", data->mdev.iommuGroupNumber); @@ -2024,6 +2025,8 @@ virNodeDevCapMdevParseXML(xmlXPathContextPtr ctxt, int nattrs = 0; g_autofree xmlNodePtr *attrs = NULL; size_t i; + g_autofree char *uuidstr = NULL; + unsigned char uuidbuf[VIR_UUID_BUFLEN]; ctxt->node = node; @@ -2033,6 +2036,18 @@ virNodeDevCapMdevParseXML(xmlXPathContextPtr ctxt, goto out; } + if ((uuidstr = virXPathString("string(./uuid[1])", ctxt))) { + /* make sure that the provided uuid is valid */ + if (virUUIDParse(uuidstr, uuidbuf) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid uuid '%s' for '%s'"), mdev->uuid, + def->name); + return -1; + } + mdev->uuid = g_new0(char, VIR_UUID_STRING_BUFLEN); + virUUIDFormat(uuidbuf, mdev->uuid); + } + /* 'iommuGroup' is optional, only report an error if the supplied value is * invalid (-2), not if it's missing (-1) */ if (virXPathUInt("number(./iommuGroup[1]/@number)", diff --git a/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml b/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml index 543ad916b7..cf7e966256 100644 --- a/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml +++ b/tests/nodedevmdevctldata/mdevctl-list-multiple.out.xml @@ -3,6 +3,7 @@ <parent>0000:00:02.0</parent> <capability type='mdev'> <type id='i915-GVTg_V5_4'/> + <uuid>200f228a-c80a-4d50-bfb7-f5a0e4e34045</uuid> <iommuGroup number='0'/> </capability> </device> @@ -11,6 +12,7 @@ <parent>0000:00:02.0</parent> <capability type='mdev'> <type id='i915-GVTg_V5_4'/> + <uuid>de807ffc-1923-4d5f-b6c9-b20ecebc6d4b</uuid> <iommuGroup number='0'/> </capability> </device> @@ -19,6 +21,7 @@ <parent>0000:00:02.0</parent> <capability type='mdev'> <type id='i915-GVTg_V5_8'/> + <uuid>435722ea-5f43-468a-874f-da34f1217f13</uuid> <iommuGroup number='0'/> <attr name='testattr' value='42'/> </capability> @@ -28,6 +31,7 @@ <parent>matrix</parent> <capability type='mdev'> <type id='vfio_ap-passthrough'/> + <uuid>783e6dbb-ea0e-411f-94e2-717eaad438bf</uuid> <iommuGroup number='0'/> <attr name='assign_adapter' value='5'/> <attr name='assign_adapter' value='6'/> -- 2.26.2

On Wed, Feb 03, 2021 at 11:39:07AM -0600, Jonathon Jongsma wrote:
It will be useful to be able to specify a particular UUID for a mediated device when defining the node device. To accomodate that, allow this to be specified in the xml schema. This patch also parses and formats that value to the xml, but does not yet use it.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- ...
+ if ((uuidstr = virXPathString("string(./uuid[1])", ctxt))) { + /* make sure that the provided uuid is valid */ + if (virUUIDParse(uuidstr, uuidbuf) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Invalid uuid '%s' for '%s'"), mdev->uuid,
we don't have mdev->uuid at this point yet, so: s/mdev->uuid/uuidstr
+ def->name);
we also haven't generated any name yet, so ^this yields "new device". I'd suggest to hardcode the message with "new mdev device" instead of using def->name. Being able to specify UUIDs with mdevs directly is great. Reviewed-by: Erik Skultety <eskultet@redhat.com>

Use the new <uuid> element in the mdev caps to define and start devices with a specific UUID. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 18 +++++++++++++++--- ...019_36ea_4111_8f0a_8c9a70e21366-define.argv | 3 ++- ...d019_36ea_4111_8f0a_8c9a70e21366-start.argv | 3 ++- ...ev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml | 1 + 4 files changed, 20 insertions(+), 5 deletions(-) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index d5cdf2b097..bf97291041 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -728,6 +728,10 @@ nodeDeviceGetMdevctlDefineStartCommand(virNodeDeviceDefPtr def, NULL); virCommandSetInputBuffer(cmd, json); + + if (def->caps->data.mdev.uuid) + virCommandAddArgPair(cmd, "--uuid", def->caps->data.mdev.uuid); + virCommandSetOutputBuffer(cmd, uuid_out); return cmd; @@ -806,8 +810,12 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn, _("Unable to start mediated device")); return NULL; } + if (uuid) { + g_free(def->caps->data.mdev.uuid); + def->caps->data.mdev.uuid = g_steal_pointer(&uuid); + } - return nodeDeviceFindNewMediatedDevice(conn, uuid); + return nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid); } @@ -1213,9 +1221,13 @@ nodeDeviceDefineXML(virConnectPtr conn, return NULL; } - def->caps->data.mdev.uuid = g_strdup(uuid); + if (uuid) { + g_free(def->caps->data.mdev.uuid); + def->caps->data.mdev.uuid = g_steal_pointer(&uuid); + } + mdevGenerateDeviceName(def); - device = nodeDeviceFindNewMediatedDevice(conn, uuid); + device = nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid); return device; } diff --git a/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv index 773e98b963..118ec7a8da 100644 --- a/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv +++ b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-define.argv @@ -1 +1,2 @@ -$MDEVCTL_BINARY$ define -p 0000:00:02.0 --jsonfile /dev/stdin +$MDEVCTL_BINARY$ define -p 0000:00:02.0 --jsonfile /dev/stdin \ +--uuid=d069d019-36ea-4111-8f0a-8c9a70e21366 diff --git a/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-start.argv b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-start.argv index eb7262035e..129f438e4a 100644 --- a/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-start.argv +++ b/tests/nodedevmdevctldata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366-start.argv @@ -1 +1,2 @@ -$MDEVCTL_BINARY$ start -p 0000:00:02.0 --jsonfile /dev/stdin +$MDEVCTL_BINARY$ start -p 0000:00:02.0 --jsonfile /dev/stdin \ +--uuid=d069d019-36ea-4111-8f0a-8c9a70e21366 diff --git a/tests/nodedevschemadata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml b/tests/nodedevschemadata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml index d6a2e99edc..605d8f63a1 100644 --- a/tests/nodedevschemadata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml +++ b/tests/nodedevschemadata/mdev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml @@ -3,5 +3,6 @@ <parent>pci_0000_00_02_0</parent> <capability type='mdev'> <type id='i915-GVTg_V5_8'/> + <uuid>d069d019-36ea-4111-8f0a-8c9a70e21366</uuid> </capability> </device> -- 2.26.2

On Wed, Feb 03, 2021 at 11:39:08AM -0600, Jonathon Jongsma wrote:
Use the new <uuid> element in the mdev caps to define and start devices with a specific UUID.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 18 +++++++++++++++--- ...019_36ea_4111_8f0a_8c9a70e21366-define.argv | 3 ++- ...d019_36ea_4111_8f0a_8c9a70e21366-start.argv | 3 ++- ...ev_d069d019_36ea_4111_8f0a_8c9a70e21366.xml | 1 + 4 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index d5cdf2b097..bf97291041 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -728,6 +728,10 @@ nodeDeviceGetMdevctlDefineStartCommand(virNodeDeviceDefPtr def, NULL);
virCommandSetInputBuffer(cmd, json); + + if (def->caps->data.mdev.uuid) + virCommandAddArgPair(cmd, "--uuid", def->caps->data.mdev.uuid); + virCommandSetOutputBuffer(cmd, uuid_out);
return cmd; @@ -806,8 +810,12 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn, _("Unable to start mediated device")); return NULL; } + if (uuid) {
mdevctl returns an empty string when UUID was provided, so this has to become 'if (uuid && uuid[0])'
+ g_free(def->caps->data.mdev.uuid); + def->caps->data.mdev.uuid = g_steal_pointer(&uuid); + }
- return nodeDeviceFindNewMediatedDevice(conn, uuid); + return nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid); }
@@ -1213,9 +1221,13 @@ nodeDeviceDefineXML(virConnectPtr conn, return NULL; }
- def->caps->data.mdev.uuid = g_strdup(uuid); + if (uuid) {
^Here too... Erik
+ g_free(def->caps->data.mdev.uuid); + def->caps->data.mdev.uuid = g_steal_pointer(&uuid); + } + mdevGenerateDeviceName(def); - device = nodeDeviceFindNewMediatedDevice(conn, uuid); + device = nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid);

Calling `mdevctl stop` for a mediated device that is in use by an active domain will block until that vm exits (or the vm closes the device). Since the nodedev driver cannot query the hypervisor driver to see whether any active domains are using the device, we resort to a workaround that relies on the fact that a vfio group can only be opened by one user at a time. If we get an EBUSY error when attempting to open the group file, we assume the device is in use and refuse to try to destroy that device. Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index bf97291041..e6b0213157 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1164,6 +1164,23 @@ nodeDeviceDestroy(virNodeDevicePtr device) ret = 0; } else if (nodeDeviceHasCapability(def, VIR_NODE_DEV_CAP_MDEV)) { + /* If this mediated device is in use by a vm, attempting to stop it + * will block until the vm closes the device. Since the nodedev driver + * cannot query the hypervisor driver to determine whether the device + * is in use by any active domains, we need to resort to a workaround. + * vfio only allows the group for a device to be opened by one user at + * a time. So if we get EBUSY when opening the group, we infer that the + * device is in use and shouldn't try to remove the device. */ + g_autofree char *vfiogroup = + virMediatedDeviceGetIOMMUGroupDev(def->caps->data.mdev.uuid); + VIR_AUTOCLOSE fd = open(vfiogroup, O_RDONLY); + + if (fd < 0 && errno == EBUSY) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to destroy a mediated device that is in use")); + goto cleanup; + } + if (virMdevctlStop(def) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Unable to stop mediated device")); -- 2.26.2

On Wed, Feb 03, 2021 at 11:39:09AM -0600, Jonathon Jongsma wrote:
Calling `mdevctl stop` for a mediated device that is in use by an active domain will block until that vm exits (or the vm closes the device). Since the nodedev driver cannot query the hypervisor driver to see whether any active domains are using the device, we resort to a workaround that relies on the fact that a vfio group can only be opened by one user at a time. If we get an EBUSY error when attempting to open the group file, we assume the device is in use and refuse to try to destroy that device.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com> --- src/node_device/node_device_driver.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index bf97291041..e6b0213157 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -1164,6 +1164,23 @@ nodeDeviceDestroy(virNodeDevicePtr device)
ret = 0; } else if (nodeDeviceHasCapability(def, VIR_NODE_DEV_CAP_MDEV)) { + /* If this mediated device is in use by a vm, attempting to stop it + * will block until the vm closes the device. Since the nodedev driver + * cannot query the hypervisor driver to determine whether the device
A nice detailed commentary, but for future reference I'd still add the reason *why* it is that nodedev driver cannot poke the hypervisor driver.
+ * is in use by any active domains, we need to resort to a workaround. + * vfio only allows the group for a device to be opened by one user at + * a time. So if we get EBUSY when opening the group, we infer that the + * device is in use and shouldn't try to remove the device. */ + g_autofree char *vfiogroup = + virMediatedDeviceGetIOMMUGroupDev(def->caps->data.mdev.uuid); + VIR_AUTOCLOSE fd = open(vfiogroup, O_RDONLY); + + if (fd < 0 && errno == EBUSY) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to destroy a mediated device that is in use"));
I think a slightly better message would look like: _("Unable to destroy '%s': device in use"), def->name This was a simple workaround indeed :). Reviewed-by: Erik Skultety <eskultet@redhat.com>

On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
This patch series follows the previously-merged series which added support for transient mediated devices. This series expands mdev support to include persistent device definitions. Again, it relies on mdevctl as the backend.
It follows the common libvirt pattern of APIs by adding the following new APIs for node devices: - virNodeDeviceDefineXML() - defines a persistent device - virNodeDeviceUndefine() - undefines a persistent device - virNodeDeviceCreate() - starts a previously-defined device
So, I tested the patches on Friday and have a few notes: - all of the driver implementations mentioned ^above need to pass an error buffer to the respective mdevctl wrapper, because the generic error "Unable to XZY mediated device" doesn't help at all. Kind of like with QEMU errors, we need to extract it from an error buffer (you already have input and output, so just add another one) --> this relates to patches 14,18,21 - trying to undefine an active mdev reports an error in libvirt --> this is both inconsistent with the rest of libvirt's subsystems and there's also no reason for it since mdevctl supports it --> we need to support transient devices as well - trying to define the following XML hangs for 5s and then fails: <device> <parent>pci_0000_06_00_0</parent> <driver> <name>vfio_mdev</name> </driver> <capability type='mdev'> <type id='nvidia-11'/> <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid> </capability> </device> -> I debugged this on Friday evening and for some reason driver->devs hash table of devices was NULL going through the nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure out why it was NULL, listing devices worked just fine - I also prepared several adjustments to how we define the mdevctl wrappers + some test adjustments which I wanted to share with you, but I haven't tested those thoroughly since I was debugging why that XML couldn't be defined, I'll share it when we eliminate the underlying problem -> if you're wondering why I didn't just put it into the review, well, I figured sharing the code was more descriptive than if I used words - one thing that I didn't test with your patches are events
It also adds virsh commands mapping to these new APIs: nodedev-define, nodedev-undefine, and nodedev-start.
Since we rely on mdevctl for the definition of mediated devices, we need a way to stay up-to-date with devices that are defined by mdevctl (outside of libvirt). The method for staying up-to-date is currently a little bit crude due to the fact that mdevctl does not emit any events when new devices are added or removed. As a workaround, we create a file monitor for the mdevctl config directory and re-query mdevctl when we detect changes within that directory. In the future, mdevctl may introduce a more elegant solution.
Changes in v4: - rebase to git master - switch to throwaway thread for querying mdevctl - fixed a bug when removing devices because I was accidentally using virHashForEach() instead of virHashForeachSafe() - use DEFINED/UNDEFINED events instead of STARTED/STOPPED events - changes related to merging information about mdev devices from both udev a= nd mdevctl: - Re-used the same function to copy extra data from mdevctl regardless of whether we're processing a udev event or a mdevctl event (recommended by Erik). This results in slightly more complex handling of the object lifetimes (see patch 9), but it consolidates some code. - nodeDeviceDefCopyFromMdevctl() previously only copied the data that was unique to mdevctl and didn't exist in udev. It now copies additional data (possibly overwriting some udev). This solves a problem where a device = is defined but not active (i.e. we have not gotten any data from udev), and then changed (e.g. somebody calls 'mdevctl modify' to change the mdev type), but libvirt was not updating to the new definition. - fix a bug where we were mistakenly emitting 'update' events for devices th= at had not changed - Added the ability to specify a uuid for an mdev via device XML. - split some commits into multiple patches - updated new API version info to 7.1.0 - Fixed a bug reported by Yan Fu which hangs the client when attempting to destroy a nodedev that is in use by an active vm. See https://www.redhat.com/archives/libvir-list/2021-February/msg00116.html for solution suggested by Alex. - numerous smaller fixes from review findings
changes in v3: - streamlined tests -- removed some unnecessary duplication - split out patch to factor out node device name generation function - split nodeDeviceParseMdevctlChildDevice() into a separate function - added follow-up patch to remove space-padded alignment in header - refactored the mdevctl update handling significantly: - no longer a separate persistent thread that gets signaled by a timer - now piggybacks onto the existing udev thread and signals the thread in t= he same way that the udev event does. - Daniel suggested spawning a throw-away thread to handle mdevctl updates, but that introduces the complexity of possibly serializing multiple throw-away threads (e.g. if we get an 'created' event followed immediate= ly by a 'deleted' event, two threads may be spawned and we'd need to ensure they are properly ordered) - added virNodeDeviceObjListForEach() and virNodeDeviceObjListRemoveLocked() to simplify removing devices that are removed from mdevctl. - coding style fixes - NOTE: per Erik's request, I experimented with changing the way that mdevctl commands were generated and tested (e.g. introducing something like virMdevctlGetCommand(def, MDEVCTL_COMMAND_<SUBCOMMAND>, ...)), but it was too invasive and awkward and didn't seem worthwhile
I wrote the code for ^this and to be honest, it didn't seem that bad after all, like I said, I'll share the follow up patches when we know more about the issue I encountered. Erik

On Mon, 22 Feb 2021 11:08:12 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
This patch series follows the previously-merged series which added support for transient mediated devices. This series expands mdev support to include persistent device definitions. Again, it relies on mdevctl as the backend.
It follows the common libvirt pattern of APIs by adding the following new APIs for node devices: - virNodeDeviceDefineXML() - defines a persistent device - virNodeDeviceUndefine() - undefines a persistent device - virNodeDeviceCreate() - starts a previously-defined device
So, I tested the patches on Friday and have a few notes: - all of the driver implementations mentioned ^above need to pass an error buffer to the respective mdevctl wrapper, because the generic error "Unable to XZY mediated device" doesn't help at all. Kind of like with QEMU errors, we need to extract it from an error buffer (you already have input and output, so just add another one) --> this relates to patches 14,18,21
- trying to undefine an active mdev reports an error in libvirt --> this is both inconsistent with the rest of libvirt's subsystems and there's also no reason for it since mdevctl supports it --> we need to support transient devices as well
- trying to define the following XML hangs for 5s and then fails: <device> <parent>pci_0000_06_00_0</parent> <driver> <name>vfio_mdev</name> </driver> <capability type='mdev'> <type id='nvidia-11'/> <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid> </capability> </device>
I assume that you have a parent device that supports creating an mdev of this type? In other words, this was expected to succeed, right?
-> I debugged this on Friday evening and for some reason driver->devs hash table of devices was NULL going through the nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure out why it was NULL, listing devices worked just fine
So, I often get a 5s delay when trying to define a new mdev. But it always succeeds after the delay. The reason for the delay is that the device is generally not in our device list the first time we call nodeDeviceFindNewMediatedDevice(). It's usually not in our device list because the device is only added to the driver's device list after we get a notification from mdevctl (via monitoring the mdevctl config directory) and then we re-query mdevctl for the list of defined devices. Because mdevctl is not blazingly fast[1], and because we do the querying in a separate thread, the new device has usually not been discovered when we first call nodeDeviceFindNewMediatedDevice(). So we sleep for 5s before trying again. For udev devices, this 5s delay is usually not hit because nodeDeviceFindNewDevice() first calls 'udevadm settle' before checking the device list. This waits just long enough to ensure that pending udev events are handled. Unfortunately we don't have a similar "wait just long enough" command for mdevctl, so we almost always hit the 5s retry timeout. There are a couple of potential ways to avoid hitting this 5s timeout: 1. directly add a placeholder device to the driver's device list immediately after calling 'mdevctl define' so that it is guaranteed to be in the device list when we call nodeDeviceFindNewMediatedDevice(). We can then update that device definition with additional info when mdevctl is eventually queried. 2. don't wait 5s every time a device is not found. Instead, start with a small sleep timeout and increase it gradually until we hit the max timeout. In other words, instead of check; sleep(5); check; sleep(5); check; sleep(5); we could instead do something like: check; sleep(1); check; sleep(2); check; sleep(4); check; sleep(8)... [1]$ time mdevctl list --defined 49bf2346-6747-4ad6-be5a-adc2f0a10b5c 0000:00:02.0 i915-GVTg_V5_8 manual 4fcd3666-e58a-4c63-969c-bd616a158c0d 0000:00:02.0 i915-GVTg_V5_8 manual 5c152919-3a34-4338-b7c9-532f73fa5440 0000:00:02.0 i915-GVTg_V5_4 manual real 0m0.782s user 0m0.735s sys 0m0.056s
- I also prepared several adjustments to how we define the mdevctl wrappers + some test adjustments which I wanted to share with you, but I haven't tested those thoroughly since I was debugging why that XML couldn't be defined, I'll share it when we eliminate the underlying problem -> if you're wondering why I didn't just put it into the review, well, I figured sharing the code was more descriptive than if I used words
Would you like me to wait for this before sending a new series? Thanks, Jonathon

On Wed, Feb 24, 2021 at 04:48:34PM -0600, Jonathon Jongsma wrote:
On Mon, 22 Feb 2021 11:08:12 +0100 Erik Skultety <eskultet@redhat.com> wrote:
On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
This patch series follows the previously-merged series which added support for transient mediated devices. This series expands mdev support to include persistent device definitions. Again, it relies on mdevctl as the backend.
It follows the common libvirt pattern of APIs by adding the following new APIs for node devices: - virNodeDeviceDefineXML() - defines a persistent device - virNodeDeviceUndefine() - undefines a persistent device - virNodeDeviceCreate() - starts a previously-defined device
So, I tested the patches on Friday and have a few notes: - all of the driver implementations mentioned ^above need to pass an error buffer to the respective mdevctl wrapper, because the generic error "Unable to XZY mediated device" doesn't help at all. Kind of like with QEMU errors, we need to extract it from an error buffer (you already have input and output, so just add another one) --> this relates to patches 14,18,21
- trying to undefine an active mdev reports an error in libvirt --> this is both inconsistent with the rest of libvirt's subsystems and there's also no reason for it since mdevctl supports it --> we need to support transient devices as well
- trying to define the following XML hangs for 5s and then fails: <device> <parent>pci_0000_06_00_0</parent> <driver> <name>vfio_mdev</name> </driver> <capability type='mdev'> <type id='nvidia-11'/> <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid> </capability> </device>
I assume that you have a parent device that supports creating an mdev of this type? In other words, this was expected to succeed, right?
Yeah, there's is a real NVIDIA card backing it.
-> I debugged this on Friday evening and for some reason driver->devs hash table of devices was NULL going through the nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure out why it was NULL, listing devices worked just fine
So, I often get a 5s delay when trying to define a new mdev. But it always succeeds after the delay.
The reason for the delay is that the device is generally not in our device list the first time we call nodeDeviceFindNewMediatedDevice(). It's usually not in our device list because the device is only added to the driver's device list after we get a notification from mdevctl (via monitoring the mdevctl config directory) and then we re-query mdevctl for the list of defined devices. Because mdevctl is not blazingly fast[1], and because we do the querying in a separate thread, the new device has usually not been discovered when we first call nodeDeviceFindNewMediatedDevice(). So we sleep for 5s before trying again.
Yeah, I suspected that we didn't get notified quickly enough, but the weird thing was that "driver->devs" was NULL, which cannot be, the list couldn't have disappeared. Actually, what happened in my case was that even after 5s there was nothing so the code assumed the event would never come and failed, no idea why.
For udev devices, this 5s delay is usually not hit because nodeDeviceFindNewDevice() first calls 'udevadm settle' before checking the device list. This waits just long enough to ensure that pending udev events are handled. Unfortunately we don't have a similar "wait just long enough" command for mdevctl, so we almost always hit the 5s retry timeout.
There are a couple of potential ways to avoid hitting this 5s timeout: 1. directly add a placeholder device to the driver's device list immediately after calling 'mdevctl define' so that it is guaranteed to be in the device list when we call nodeDeviceFindNewMediatedDevice(). We can then update that device definition with additional info when mdevctl is eventually queried.
Hmm, ^this would work for define. Mdevctl has returned the UUID, so we have all the information to start the device, so we don't care when the monitoring thread wakes up. But we should hit the same issue with "create" right, because we don't need to wait for anything.
2. don't wait 5s every time a device is not found. Instead, start with a small sleep timeout and increase it gradually until we hit the max timeout. In other words, instead of check; sleep(5); check; sleep(5); check; sleep(5); we could instead do something like: check; sleep(1); check; sleep(2); check; sleep(4); check; sleep(8)...
IIRC we do something similar when we're spawning a machine and waiting for an mdev, because it takes time before the sysfs entry for mdev is created and complete directory hierarchy is created for it, so libvirt has to wait for it. We wait for several rounds and then we fail if a cap was hit. Anyway, 5s is a long time, I can imagine this timing out in the management layer easily.
[1]$ time mdevctl list --defined 49bf2346-6747-4ad6-be5a-adc2f0a10b5c 0000:00:02.0 i915-GVTg_V5_8 manual 4fcd3666-e58a-4c63-969c-bd616a158c0d 0000:00:02.0 i915-GVTg_V5_8 manual 5c152919-3a34-4338-b7c9-532f73fa5440 0000:00:02.0 i915-GVTg_V5_4 manual
real 0m0.782s user 0m0.735s sys 0m0.056s
- I also prepared several adjustments to how we define the mdevctl wrappers + some test adjustments which I wanted to share with you, but I haven't tested those thoroughly since I was debugging why that XML couldn't be defined, I'll share it when we eliminate the underlying problem -> if you're wondering why I didn't just put it into the review, well, I figured sharing the code was more descriptive than if I used words
Would you like me to wait for this before sending a new series?
No need, I won't provide more comments at this point. My changes can be applied as a follow up patch/series anyway, so even if we don't make them part of your series immediately, we can always apply it later. I'll reply with the patch once I figure out why define isn't working for me properly. Regards, Erik
participants (3)
-
Alex Williamson
-
Erik Skultety
-
Jonathon Jongsma