Re: [libvirt] RFC: Creating mediated devices with libvirt

Saturday, 17 June 2017

On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
...
 On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
 > On Fri, 16 Jun 2017 11:32:04 -0400
 > Laine Stump <laine(a)redhat.com&gt; wrote:
 > 
 > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
 > > > On Thu, 15 Jun 2017 09:33:01 +0100
 > > > "Daniel P. Berrange" <berrange(a)redhat.com&gt; wrote:
 > > >   
 > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:  
 > > >>> Hi all,
 > > >>>
 > > >>> so there's been an off-list discussion about finally
implementing creation of
 > > >>> mediated devices with libvirt and it's more than desired to
get as many opinions
 > > >>> on that as possible, so please do share your ideas. This did come
up already as
 > > >>> part of some older threads ([1] for example), so this will be a
respin of the
 > > >>> discussions. Long story short, we decided to put device creation
off and focus
 > > >>> on the introduction of the framework as such first and build upon
that later,
 > > >>> i.e. now.
 > > >>>
 > > >>> [1]
https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
 > > >>>
 > > >>> ========================================
 > > >>> PART 1: NODEDEV-DRIVER
 > > >>> ========================================
 > > >>>
 > > >>> API-wise, device creation through the nodedev driver should be
pretty
 > > >>> straightforward and without any issues, since virNodeDevCreateXML
takes an XML
 > > >>> and does support flags. Looking at the current device XML:
 > > >>>
 > > >>> <device>
 > > >>>  
<name>mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f</name>
 > > >>>  
<path>/sys/devices/pci0000:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f</path>
 > > >>>   <parent>pci_0000_03_00_0</parent>
 > > >>>   <driver>
 > > >>>     <name>vfio_mdev</name>
 > > >>>   </driver>
 > > >>>   <capability type='mdev'>
 > > >>>     <type id='nvidia-11'/>
 > > >>>     <iommuGroup number='13'/>
 > > >>>     <uuid>UUID<uuid> <!-- optional enhancement, see
below -->
 > > >>>   </capability>
 > > >>> </device>
 > > >>>
 > > >>> We can ignore <path>,<driver>,<iommugroup>
elements, since these are useless
 > > >>> during creation. We also cannot use <name> since we
don't support arbitrary
 > > >>> names and we also can't rely on users providing a name in
correct form which we
 > > >>> would need to further parse in order to get the UUID.
 > > >>> So since the only thing missing to successfully use create an mdev
using XML is
 > > >>> the UUID (if user doesn't want it to be generated
automatically), how about
 > > >>> having a <uuid> subelement under <capability> just
like PCIs have <domain> and
 > > >>> friends, USBs have <bus> & <device>, interfaces
have <address> to uniquely
 > > >>> identify the device even if the name itself is unique.
 > > >>> Removal of a device should work as well, although we might want
to
 > > >>> consider creating a *Flags version of the API.
 > > >>>
 > > >>> =============================================================
 > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY
INVOLVED!
 > > >>> =============================================================
 > > >>>
 > > >>> There were some doubts about auto-creation mentioned in [1],
although they
 > > >>> weren't specified further. So hopefully, we'll get further
in the discussion
 > > >>> this time.
 > > >>>
 > > >>> From my perspective there are two main reasons/benefits to that:
 > > >>>
 > > >>> 1) Convenience
 > > >>> For apps like virt-manager, user will want to add a host device
transparently,
 > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do
that". Even for
 > > >>> higher management apps, like oVirt, even they might not care about
the parent
 > > >>> device at all times and considering that they would need to
enumerate the
 > > >>> parents, pick one, create the device XML and pass it to the
nodedev driver, IMHO
 > > >>> it would actually	 be easier and faster to just do it directly
through sysfs,
 > > >>> bypassing libvirt once again....    
 > > >>
 > > >> The convenience only works if the policy we've provided in libvirt
actually
 > > >> matches the policy the application wants. I think it is quite likely
that with
 > > >> cloud the mdevs will be created out of band from the domain startup
process.
 > > >> It is possible the app will just have a fixed set of mdevs pre-created
when
 > > >> the host starts up. Or that the mgmt app wants the domain startup
process to
 > > >> be a two phase setup, where it first allocates the resources needed,
and later
 > > >> then tries to start the guest. This is why I keep saying that putting
this kind
 > > >> of "convenient" policy in libvirt is a bad idea - it is
essentially just putting
 > > >> a bit of virt-manager code into libvirt - more advanced apps will need
more
 > > >> flexibility in this area.
 > > >>  
 > > >>> 2) Future domain migration
 > > >>> Suppose now that the mdev backing physical devices support state
dump and
 > > >>> reload. Chances are, that the corresponding mdev doesn't even
exist or has a
 > > >>> different UUID on the destination, so libvirt would do its best to
handle this
 > > >>> before the domain could be resumed.    
 > > >>
 > > >> This is not an unusual scenario - there are already many other parts
of the
 > > >> device backend config that need to change prior to migration,
especially for
 > > >> anything related to host devices, so apps already have support for
doing
 > > >> this, which is more flexible & convenient becasue it doesn't
tie creation of
 > > >> the mdevs to running of the migrate command.
 > > >>
 > > >> IOW, I'm still against adding any kind of automatic creation
policy for
 > > >> mdevs in libvirt. Just provide the node device API support.  
 > > > 
 > > > I'm not super clear on the extent of what you're against here, is
it
 > > > all forms of device creation or only a placement policy?  Are you
 > > > against any form of having the XML specify the non-instantiated mdev
 > > > that it wants?  We've clearly made an important step with libvirt
 > > > supporting pre-created mdevs, but as a user of that support I find it
 > > > incredibly tedious.  I typically do a dumpxml, copy out the UUID,
 > > > wonder what type of device it might have been last time, create it,
 > > > start the domain and cross my fingers. Pre-creating mdev devices is not
 > > > really practical, I might have use cases where I want multiple low-end
 > > > mdev devices and another where I have a single high-end device.  Those
 > > > cannot exist at the same time. Requiring extensive higher level
 > > > management tools is not really an option either, I'm not going to
 > > > install oVirt on my desktop/laptop just so I can launch a GVT-g VM once
 > > > in a while (no offense).  So I really hope that libvirt itself can
 > > > provide some degree of mdev creation.  
 > > 
 > > 
 > > Maybe there can be something in between the "all child devices must be
 > > pre-created" and "a child device will be automatically created on an
 > > automatically chosen parent device as needed". In particular, we could
 > > forego the "automatically chosen parent device" part of that. The
guest
 > > configuration could simply contain the PCI address of the parent and the
 > > desired type of the child. If we did this there wouldn't be any policy
 > > decision to make - all the variables are determined - but it would make
 > > life easier for people running small hosts (i.e. no oVirt/Openstack, a
 > > single mdev parent device). Openstack and oVirt (and whoever) would of
 > > course be free to ignore this and pre-create pools of devices themselves
 > > in the name of more precise control and better predictability (just as,
 > > for example, OpenStack ignores libvirt's "pools of hostdev network
 > > devices" and instead manages the pool of devices itself and uses
 > > <interface type='hostdev'> directly).
 > 
 > This seems not that substantially different from managed='yes' on a
 > vfio hostdev to me.  It makes the device available to the VM before it
 > starts and returns it after.  In one case that's switching the binding
 > on an existing device, in another it's creating and removing.  Once
 > again, I can't tell from Dan's response if he's opposed to this entire
 > idea or just the aspects where libvirt needs to impose a policy
 > decision.  For me personally, the functionality difference is quite
 > substantial.

 I'm fine with libvirt having APIs in the node device APIs to enable
 create/delete with libvirt, as well as using managed=yes in the same
 manner that we do for regular PCI devices (the bind/unbind to vfio
 or pci-back) 
Oh, and we really need to fix the big missing feature in the node
device APIs of persistent, inactive configs. eg we should be able
to record XML configs of mdevs (and npiv devices too), in /etc/libvirt
so they persist across reboots, and can be setup for auto-start on
boot too.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] RFC: Creating mediated devices with libvirt