On Wed, Feb 24, 2021 at 04:48:34PM -0600, Jonathon Jongsma wrote:
On Mon, 22 Feb 2021 11:08:12 +0100
Erik Skultety <eskultet(a)redhat.com> wrote:
> On Wed, Feb 03, 2021 at 11:38:44AM -0600, Jonathon Jongsma wrote:
> > This patch series follows the previously-merged series which added
> > support for transient mediated devices. This series expands mdev
> > support to include persistent device definitions. Again, it relies
> > on mdevctl as the backend.
> >
> > It follows the common libvirt pattern of APIs by adding the
> > following new APIs for node devices:
> > - virNodeDeviceDefineXML() - defines a persistent device
> > - virNodeDeviceUndefine() - undefines a persistent device
> > - virNodeDeviceCreate() - starts a previously-defined device
>
> So, I tested the patches on Friday and have a few notes:
> - all of the driver implementations mentioned ^above need to pass an
> error buffer to the respective mdevctl wrapper, because the generic
> error "Unable to XZY mediated device" doesn't help at all. Kind of
> like with QEMU errors, we need to extract it from an error buffer
> (you already have input and output, so just add another one) --> this
> relates to patches 14,18,21
>
> - trying to undefine an active mdev reports an error in libvirt
> --> this is both inconsistent with the rest of libvirt's
> subsystems and there's also no reason for it since mdevctl supports it
> --> we need to support transient devices as well
>
> - trying to define the following XML hangs for 5s and then fails:
> <device>
> <parent>pci_0000_06_00_0</parent>
> <driver>
> <name>vfio_mdev</name>
> </driver>
> <capability type='mdev'>
> <type id='nvidia-11'/>
> <uuid>d41787d2-2e0e-4021-8ed2-b6f1ef305a9f</uuid>
> </capability>
> </device>
I assume that you have a parent device that supports creating an mdev
of this type? In other words, this was expected to succeed, right?
Yeah, there's is a real NVIDIA card backing it.
>
> -> I debugged this on Friday evening and for some reason
> driver->devs hash table of devices was NULL going through the
> nodeDeviceFindNewMediatedDevice call, but I haven't managed to figure
> out why it was NULL, listing devices worked just fine
So, I often get a 5s delay when trying to define a new mdev. But it
always succeeds after the delay.
The reason for the delay is that the device is generally not in our
device list the first time we call nodeDeviceFindNewMediatedDevice().
It's usually not in our device list because the device is only added
to the driver's device list after we get a notification from mdevctl
(via monitoring the mdevctl config directory) and then we re-query
mdevctl for the list of defined devices. Because mdevctl is not
blazingly fast[1], and because we do the querying in a separate thread,
the new device has usually not been discovered when we first call
nodeDeviceFindNewMediatedDevice(). So we sleep for 5s before trying
again.
Yeah, I suspected that we didn't get notified quickly enough, but the weird
thing was that "driver->devs" was NULL, which cannot be, the list
couldn't have
disappeared. Actually, what happened in my case was that even after 5s there
was nothing so the code assumed the event would never come and failed, no idea
why.
For udev devices, this 5s delay is usually not hit because
nodeDeviceFindNewDevice() first calls 'udevadm settle' before
checking the device list. This waits just long enough to ensure that
pending udev events are handled. Unfortunately we don't have a similar
"wait just long enough" command for mdevctl, so we almost always hit
the 5s retry timeout.
There are a couple of potential ways to avoid hitting this 5s timeout:
1. directly add a placeholder device to the driver's device list
immediately after calling 'mdevctl define' so that it is guaranteed to
be in the device list when we call nodeDeviceFindNewMediatedDevice().
We can then update that device definition with additional info
when mdevctl is eventually queried.
Hmm, ^this would work for define. Mdevctl has returned the UUID, so we have all
the information to start the device, so we don't care when the monitoring
thread wakes up. But we should hit the same issue with "create" right, because
we don't need to wait for anything.
2. don't wait 5s every time a device is not found. Instead, start
with
a small sleep timeout and increase it gradually until we hit the max
timeout. In other words, instead of check; sleep(5); check; sleep(5);
check; sleep(5); we could instead do something like: check; sleep(1);
check; sleep(2); check; sleep(4); check; sleep(8)...
IIRC we do something similar when we're spawning a machine and waiting for an
mdev, because it takes time before the sysfs entry for mdev is created and
complete directory hierarchy is created for it, so libvirt has to wait for it.
We wait for several rounds and then we fail if a cap was hit.
Anyway, 5s is a long time, I can imagine this timing out in the management
layer easily.
[1]$ time mdevctl list --defined
49bf2346-6747-4ad6-be5a-adc2f0a10b5c 0000:00:02.0 i915-GVTg_V5_8 manual
4fcd3666-e58a-4c63-969c-bd616a158c0d 0000:00:02.0 i915-GVTg_V5_8 manual
5c152919-3a34-4338-b7c9-532f73fa5440 0000:00:02.0 i915-GVTg_V5_4 manual
real 0m0.782s
user 0m0.735s
sys 0m0.056s
>
> - I also prepared several adjustments to how we define the mdevctl
> wrappers + some test adjustments which I wanted to share with you,
> but I haven't tested those thoroughly since I was debugging why that
> XML couldn't be defined, I'll share it when we eliminate the
> underlying problem -> if you're wondering why I didn't just put it
> into the review, well, I figured sharing the code was more
> descriptive than if I used words
Would you like me to wait for this before sending a new series?
No need, I won't provide more comments at this point. My changes can be applied
as a follow up patch/series anyway, so even if we don't make them part of your
series immediately, we can always apply it later.
I'll reply with the patch once I figure out why define isn't working for me
properly.
Regards,
Erik