On Thu, May 03, 2018 at 12:58:00PM -0600, Alex Williamson wrote:
> Hi,
>
> The previous discussion hasn't produced results, so let's start over.
> Here's the situation:
>
>  - We currently have kernel and QEMU support for the QEMU vfio-pci
>    display option.
>  - The default for this option is 'auto', so the device will attempt to
>    generate a display if the underlying device supports it, currently
>    only GVTg and some future release of NVIDIA vGPU (plus Gerd's
>    sample mdpy and mbochs).
>  - The display option is implemented via two different mechanisms, a
>    vfio region (NVIDIA, mdpy) or a dma-buf (GVTg, mbochs).
>  - Displays using dma-buf require OpenGL support, displays making
>    use of region support do not.
>  - Enabling OpenGL support requires specific VM configurations, which
>    libvirt /may/ want to facilitate.
>  - Probing display support for a given device is complicated by the
>    fact that GVTg and NVIDIA both impose requirements on the process
>    opening the device file descriptor through the vfio API:
>    - GVTg requires a KVM association or will fail to allow the device
>      to be opened.
How exactly is this association checked?
>    - NVIDIA requires that their vgpu-manager process can locate a UUID
>      for the VM via the process commandline.
>  - These are both horrible impositions and prevent libvirt from
>    simply probing the device itself.
So I feel like we're trying to solve a problem coming from one layer on a
bunch of different layers, which inherently prevents us from producing a
viable long-term solution without dragging in a significant amount of hacky,
nasty code, and it's not just the missing sysfs attributes I have in mind.
Why does NVIDIA's vgpu-manager need to locate a UUID of a QEMU VM? I assume
that's to prevent multiple VM instances from trying to use the same mdev
device, in which case, can't the vgpu-manager simply track how many "open"
and "close" calls have been made on the same device? This is just a layman's
perspective, but it would allow the following:
- when libvirt starts, it initializes all its drivers (let's focus on
QEMU)
- as part of this initialization, libvirt probes QEMU for capabilities and
caches them in order to use them when spawning VMs
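To illustrate the open/close tracking I have in mind, here's a minimal
sketch; this is purely hypothetical and is in no way the actual
vgpu-manager logic:

```python
# Hypothetical sketch of per-mdev open/close reference tracking,
# illustrating how a vendor daemon could refuse concurrent users of
# the same device without needing a VM UUID on the command line.
class MdevTracker:
    def __init__(self):
        self._refs = {}  # mdev UUID -> number of active opens

    def open(self, uuid, max_opens=1):
        """Refuse opening an mdev device that is already in use."""
        if self._refs.get(uuid, 0) >= max_opens:
            raise RuntimeError(f"mdev {uuid} is already in use")
        self._refs[uuid] = self._refs.get(uuid, 0) + 1

    def close(self, uuid):
        if self._refs.get(uuid, 0) == 0:
            raise RuntimeError(f"mdev {uuid} is not open")
        self._refs[uuid] -= 1


tracker = MdevTracker()
tracker.open("4b20d080-1b54-4048-85b3-a6a62d165c01")
# a second open() of the same UUID would raise RuntimeError here
tracker.close("4b20d080-1b54-4048-85b3-a6a62d165c01")
```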
Now, if we (theoretically) can settle on easing the restrictions Alex has
mentioned, we could in fact introduce a QMP command to probe these devices
and provide libvirt with useful information at that point in time. Of
course, since the 3rd party vendor driver is "de-coupled" from QEMU,
libvirt would have no way to find out that the driver has changed in the
meantime, so it would still be using the old information it gathered,
potentially causing the QEMU process to fail eventually. But then again,
there's very often a strong recommendation to reboot your host after a
driver update, especially in NVIDIA's case, so in practice this might not
matter. However, there's also a significant drawback to my proposal which
probably renders it completely useless (but we can continue from there...):
the devices would either have to be present already (not an option), or
QEMU would need to be enhanced so that during QMP probing it would create
a dummy device, open it, collect the information libvirt needs, close it,
and remove it. If the driver doesn't change in the meantime, this should be
sufficient for a VM to be successfully instantiated with a display, right?
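As a sketch of what such an exchange might look like: note that neither
the command name nor the reply fields below exist in QEMU today, they are
entirely my invention to illustrate the proposal:

```python
import json

# Hypothetical QMP exchange for probing an mdev device's display
# capabilities; "x-query-vfio-display" and the reply fields are
# invented for illustration, they are not part of QEMU's QMP schema.
probe_cmd = {
    "execute": "x-query-vfio-display",  # invented command name
    "arguments": {"sysfsdev": "/sys/bus/mdev/devices/UUID"},
}

# A reply QEMU could plausibly send back after creating, opening and
# tearing down the dummy device:
reply = json.loads("""
{
  "return": {
    "display": true,
    "mechanism": "dmabuf",
    "requires-opengl": true
  }
}
""")

caps = reply["return"]
if caps["display"] and caps["requires-opengl"]:
    print("device needs a GL-enabled display configuration")
```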
> The above has pressed the need for investigating some sort of
> alternative API through which libvirt might introspect a vfio device
> and with vfio device migration on the horizon, it's natural that some
> sort of support for migration state compatibility for the device need be
> considered as a second user of such an API. However, we currently have
> no concept of migration compatibility on a per-device level as there
> are no migratable devices that live outside of the QEMU code base.
> It's therefore assumed that per device migration compatibility is
> encompassed by the versioned machine type for the overall VM. We need
> participation all the way to the top of the VM management stack to
> resolve this issue and it's dragging down the (possibly) more simple
> question of how do we resolve the display situation. Therefore I'm
> looking for alternatives for display that work within what we have
> available to us at the moment.
>
> Erik Skultety, who initially raised the display question, has identified
> one possible solution, which is to simply make the display configuration
> the user's problem (apologies if I've misinterpreted Erik). I believe
> this would work something like:
>
>  - libvirt identifies a version of QEMU that includes 'display' support
>    for vfio-pci devices and defaults to adding display=off for every
>    vfio-pci device [have we chosen the wrong default (auto) in QEMU?].
From libvirt's POV, this would mean adding a new XML attribute 'display'
to the mdev host device type with a default value of 'off', potentially
extending this to 'auto' once we have enough information to base our
decision on. We'll need to combine this with a new attribute value for the
<video> element that would prevent adding an emulated VGA any time
<graphics> (SPICE, VNC) is requested, but that's something we'd need to do
anyway, so I'm just mentioning it.
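Just to sketch what I have in mind (the attribute names below are not
final, this is only an illustration of the proposal):

```xml
<!-- hypothetical: 'display' attribute on an mdev hostdev -->
<hostdev mode='subsystem' type='mdev' model='vfio-pci' display='off'>
  <source>
    <address uuid='4b20d080-1b54-4048-85b3-a6a62d165c01'/>
  </source>
</hostdev>

<!-- hypothetical: a <video> model value suppressing the emulated VGA -->
<video>
  <model type='none'/>
</video>
```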
>  - New XML support would allow a user to enable display support on the
>    vfio device.
>  - Resolving any OpenGL dependencies of that change would be left to
>    the user.
>
> A nice aspect of this is that policy decisions are left to the user and
> clearly no interface changes are necessary, perhaps with the exception
> of deciding whether we've made the wrong default choice for vfio-pci
> devices in QEMU.
It's a common practice that we offload decisions like this to users
(including the management layer, i.e. OpenStack, oVirt).
> On the other hand, if we do want to give libvirt a mechanism to probe
> the display support for a device, we can make a simplified QEMU
> instance be the mechanism through which we do that. For example the
> script[1] can be provided with either a PCI device or sysfs path to an
> mdev device and run a minimal VM instance meeting the requirements of
> both GVTg and NVIDIA to report the display support and GL requirements
> for a device. There are clearly some unrefined and atrocious bits of
> this script, but it's only a proof of concept, the process management
> can be improved and we can decide whether we want to provide qmp
> mechanism to introspect the device rather than grep'ing error
> messages. The goal is simply to show that we could choose to embrace
If for no other reason, error messages change over time, so grep'ing them
is not the way to go; QMP is a much more standardized approach. But then
again, as I mentioned above, at the moment libvirt only probes for
capabilities during its start.
> QEMU and use it not as a VM, but simply a tool for poking at a device
> given the restrictions the mdev vendor drivers have already imposed.
>
> So I think the question bounces back to libvirt, does libvirt want
> enough information about the display requirements for a given device to
> automatically attempt to add GL support for it, effectively a policy of
> 'if it's supported try to enable it', or should we leave well enough
> alone and let the user choose to enable it?
>
> Maybe some guiding questions:
>
>  - Will dma-buf always require GL support?
>  - Does GL support limit our ability to have a display over a remote
>    connection?
>  - Do region-based displays also work with GL support, even if not
>    required?
Yeah, these are IMHO really tough to answer, because we can't really
predict the future, which again favours a new libvirt attribute. Even if
we decided that we truly need a dummy VM as a tool for libvirt to probe
this info, I still feel this should be done higher up in the
virtualization stack, with libvirt again being just a tool doing what it's
told to do. But I'd very much like to hear Dan's opinion, since besides
libvirt he can cover OpenStack too.
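For completeness, here's roughly what the command line of such a minimal
probing instance might look like. The exact flags are assumptions on my
part modeled on the discussion (a paused, headless guest with the vfio
device and QMP on stdio), not taken from Alex's script[1], and the UUID
placeholder in the sysfs path would need to be substituted:

```shell
# Sketch: build (but don't run) the command line for a minimal QEMU
# instance used only to probe an mdev device's display support.
build_probe_cmd() {
    mdev="$1"      # sysfs path of the mdev device
    vm_uuid="$2"   # UUID the NVIDIA vgpu-manager looks for on the cmdline

    echo "qemu-system-x86_64" \
         "-name guest=probe-vm" \
         "-uuid $vm_uuid" \
         "-machine accel=kvm" \
         "-display none -nodefaults -S" \
         "-device vfio-pci,sysfsdev=$mdev,display=auto" \
         "-qmp stdio"
}

# Dry run: print the command instead of executing it
build_probe_cmd /sys/bus/mdev/devices/UUID \
    00000000-0000-0000-0000-000000000000
```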
Regards,
Erik