On 6/28/23 7:22 PM, Jonathon Jongsma wrote:
On 6/28/23 3:40 AM, Boris Fiuczynski wrote:
> On 6/28/23 12:03 AM, Jonathon Jongsma wrote:
>> On 6/23/23 5:43 AM, Boris Fiuczynski wrote:
>>> Update the optional mdev attributes on the new created nodedev
>>> object as
>>> they otherwise would not get set until the next mdevctl update cycle.
>>>
>>> Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=2143158
>>> Signed-off-by: Boris Fiuczynski <fiuczy(a)linux.ibm.com>
>>> ---
>>> src/node_device/node_device_driver.c | 18 ++++++++++++++++--
>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/node_device/node_device_driver.c
>>> b/src/node_device/node_device_driver.c
>>> index a2d0600560..5134d246f3 100644
>>> --- a/src/node_device/node_device_driver.c
>>> +++ b/src/node_device/node_device_driver.c
>>> @@ -847,6 +847,9 @@ static virNodeDevicePtr
>>> nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>> virNodeDeviceDef *def)
>>> {
>>> + virNodeDeviceObj *obj;
>>> + virNodeDeviceDef *new_def;
>>> + virNodeDevicePtr device;
>>> g_autofree char *uuid = NULL;
>>> if (!def->parent) {
>>> @@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>> def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
>>> }
>>> - return nodeDeviceFindNewMediatedDevice(conn,
>>> def->caps->data.mdev.uuid,
>>> - def->caps->data.mdev.parent_addr);
>>> + device = nodeDeviceFindNewMediatedDevice(conn,
>>> def->caps->data.mdev.uuid,
>>> + def->caps->data.mdev.parent_addr);
>>
>> So, this function waits up to 60s for the given mediated device to
>> show up in the list of devices known to the driver (which is stored
>> in driver->devs and gets populated by handlers that respond to udev
>> or mdevctl events)...
>>
>>> + /* check on def for attributes and try update */
>>> + if (def->caps->data.mdev.nattributes > 0) {
>>> + /* ignore failures as mdevctl updates will recover later */
>>> + if (!(obj = nodeDeviceObjFindByName(device->name)))
>>> + return device;
>>
>> If the device was found and the device has mdev attributes, you then
>> query the exact same list for a device with the name of the device
>> you just found...
>>
>>> + new_def = virNodeDeviceObjGetDef(obj);
>>> + nodeDeviceDefCopyFromMdevctl(new_def, def);
>>
>> If the device was found in the list (which it should be, because the
>> nodeDeviceFindNewMediatedDevice() function had already found it in
>> the list of devices), then we simply copy the user-supplied device
>> definition into the definition we received from the nodedev driver's
>> device list.
>
> Please let me know which procedure to use if the node device object
> which is missing the attributes is created by the udev ADD event which
> the method nodeDeviceFindNewMediatedDevice is waiting for to happen.
>
>>
>> This is just papering over the issue. By copying the user-supplied
>> definition into the system-queried definition, we're just assuming
>> that the device was created properly with the requested attributes
>> rather than verifying that it was created correctly.
>
> I expect virMdevctlCreate to fail if attributes are not properly set
> on the transient mdev created by mdevctl. If that happens this code is
> never run anyway.
As far as I know, that might be driver-dependent behavior.
But regardless, you could sort of make the same case for all of the
other non-attribute information as well. Why don't we just copy the
entire user-supplied definition into our internal device list instead of
querying udev/mdevctl if the device creation was successful? I would
argue that we want our internal device list to accurately represent the
current state of the system, not what we expect the state of the system
to be.
> And speaking about verification:
> mdevctl only returns attributes if a callout script is present
> supporting the mdev type and has a method for event "get" and action
> "attributes" implemented.
> Therefore no "reliable" verification is possible! The only reliable
> source of information is the outcome of virMdevctlCreate.
That's true, but if you copy the user-supplied attributes directly into
the internal device list for a transient device that doesn't implement
the 'get-attributes' callout, you will lose those attributes when the
libvirt daemon restarts. That's also problematic.
Good that we seem to agree on validation is not possible.
Anyway I do not see that more or less problematic as not coping the
attributes and suddenly they appear when the noddev daemon restarts.
That is how it is today, see
https://bugzilla.redhat.com/show_bug.cgi?id=2143158
>> The root issue seems to be that when a transient mediated device is
>> created, our udev event handler runs and adds the device to the
>> device list, but the mdevctl handler is never run to query additional
>> mdev-related information about the device. mdevctl is only re-queried
>> when the config files for defined devices are changed or when a new
>> physical mdev parent device is added or removed.
>
> Actually that was the first place I looked at.
> There is already an mdevctl update triggered in method udevAddOneDeive
> if the device is an mdev parent device. This happens usually on either
> a vfio* module load or a device being bound to a vfio* device driver.
> Introducing an update on every mdev device added would create an high
> impact especially as the udevAddOneDeive is used in multiple contexts,
> 1) add and change udev event
> 2) move usev event
> 3) processing the each entry in a udev process device list, which is
> usually run when the node device daemon initializes. In this case a
> device unrelated/unconditional mdevctl update is triggered once
> already (nodeStateInitializedEnumerate).
>
> In cases 1) and 2) the context of what triggered the add udev event is
> unknown and we are only missing the update if a transient mdev device
> with attributes is being created. Therefore I choose the "shortcut" of
> added these where the context is known. Please also note that the
> callout script restriction (mentioned above) also applies here.
Can you quantify what you mean by high-impact here? How many mdevs are
you talking about on a single host? hundreds? thousands? How regularly
are they being started and stopped?
Let's assume you have a S390 host with AP setup for passthrough using
vfio_ap. Assuming of the 100+ guests 20 get a crypto device assigned
this results in 20 vfio-ap mdevs present on the host.
We also have tests which define up to 1000 guests and using the same
ration would result in 200 vfio-ap mdevs.
Assuming we would trigger an mdevctl update in udevAddOneDevice when it
is called for an mdev:
If virtnodedevd is already initialized and the add udev event is
received it would result in a single mdevctl update.
The virtnodedevd is stopped after 120 seconds unless used.
Starting virtnodedevd up would cause a mdevctl update for every existing
mdev. That is the impact which I am worried about.
I will try to find a way to just trigger the mdevctl update on the udev
event add pattern which is not reused for the daemon initialization.
Jonathon
--
Mit freundlichen Grüßen/Kind regards
Boris Fiuczynski
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294