On Fri, 16 Jun 2017 18:11:17 +0100
"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> On Fri, 16 Jun 2017 11:32:04 -0400
> Laine Stump <laine(a)redhat.com> wrote:
>
> > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > "Daniel P. Berrange" <berrange(a)redhat.com> wrote:
> > >
> > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > >>> Hi all,
> > >>>
> > > >>> so there's been an off-list discussion about finally implementing creation of
> > > >>> mediated devices with libvirt and it's more than desired to get as many opinions
> > > >>> on that as possible, so please do share your ideas. This did come up already as
> > > >>> part of some older threads ([1] for example), so this will be a respin of the
> > > >>> discussions. Long story short, we decided to put device creation off and focus
> > > >>> on the introduction of the framework as such first and build upon that later,
> > > >>> i.e. now.
> > > >>>
> > > >>> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > >>>
> > >>> ========================================
> > >>> PART 1: NODEDEV-DRIVER
> > >>> ========================================
> > >>>
> > > >>> API-wise, device creation through the nodedev driver should be pretty
> > > >>> straightforward and without any issues, since virNodeDeviceCreateXML takes an XML
> > > >>> and does support flags. Looking at the current device XML:
> > >>>
> > > >>> <device>
> > > >>>   <name>mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f</name>
> > > >>>   <path>/sys/devices/pci0000:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f</path>
> > > >>>   <parent>pci_0000_03_00_0</parent>
> > > >>>   <driver>
> > > >>>     <name>vfio_mdev</name>
> > > >>>   </driver>
> > > >>>   <capability type='mdev'>
> > > >>>     <type id='nvidia-11'/>
> > > >>>     <iommuGroup number='13'/>
> > > >>>     <uuid>UUID</uuid> <!-- optional enhancement, see below -->
> > > >>>   </capability>
> > > >>> </device>
> > >>>
> > > >>> We can ignore the <path>, <driver> and <iommuGroup> elements, since these are
> > > >>> useless during creation. We also cannot use <name> since we don't support arbitrary
> > > >>> names and we also can't rely on users providing a name in correct form which we
> > > >>> would need to further parse in order to get the UUID.
> > > >>> So since the only thing missing to successfully create an mdev using XML is
> > > >>> the UUID (if the user doesn't want it to be generated automatically), how about
> > > >>> having a <uuid> subelement under <capability>, just like PCIs have <domain> and
> > > >>> friends, USBs have <bus> & <device>, and interfaces have <address>, to uniquely
> > > >>> identify the device even if the name itself is unique?
> > > >>> Removal of a device should work as well, although we might want to
> > > >>> consider creating a *Flags version of the API.
> > >>>
> > >>> =============================================================
> > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > >>> =============================================================
> > >>>
> > > >>> There were some doubts about auto-creation mentioned in [1], although they
> > > >>> weren't specified further. So hopefully, we'll get further in the discussion
> > > >>> this time.
> > >>>
> > >>> From my perspective there are two main reasons/benefits to that:
> > >>>
> > >>> 1) Convenience
> > > >>> For apps like virt-manager, the user will want to add a host device transparently,
> > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even
> > > >>> higher-level management apps, like oVirt, might not care about the parent
> > > >>> device at all times and considering that they would need to enumerate the
> > > >>> parents, pick one, create the device XML and pass it to the nodedev driver, IMHO
> > > >>> it would actually be easier and faster to just do it directly through sysfs,
> > > >>> bypassing libvirt once again....
> > >>
> > > >> The convenience only works if the policy we've provided in libvirt actually
> > > >> matches the policy the application wants. I think it is quite likely that with
> > > >> cloud the mdevs will be created out of band from the domain startup process.
> > > >> It is possible the app will just have a fixed set of mdevs pre-created when
> > > >> the host starts up. Or that the mgmt app wants the domain startup process to
> > > >> be a two phase setup, where it first allocates the resources needed, and
> > > >> later tries to start the guest. This is why I keep saying that putting this kind
> > > >> of "convenient" policy in libvirt is a bad idea - it is essentially just putting
> > > >> a bit of virt-manager code into libvirt - more advanced apps will need more
> > > >> flexibility in this area.
> > >>
> > >>> 2) Future domain migration
> > > >>> Suppose now that the mdev backing physical devices support state dump and
> > > >>> reload. Chances are that the corresponding mdev doesn't even exist or has a
> > > >>> different UUID on the destination, so libvirt would do its best to handle this
> > > >>> before the domain could be resumed.
> > >>
> > > >> This is not an unusual scenario - there are already many other parts of the
> > > >> device backend config that need to change prior to migration, especially for
> > > >> anything related to host devices, so apps already have support for doing
> > > >> this, which is more flexible & convenient because it doesn't tie creation of
> > > >> the mdevs to running of the migrate command.
> > >>
> > > >> IOW, I'm still against adding any kind of automatic creation policy for
> > > >> mdevs in libvirt. Just provide the node device API support.
> > >
> > > I'm not super clear on the extent of what you're against here, is it
> > > all forms of device creation or only a placement policy? Are you
> > > against any form of having the XML specify the non-instantiated mdev
> > > that it wants? We've clearly made an important step with libvirt
> > > supporting pre-created mdevs, but as a user of that support I find it
> > > incredibly tedious. I typically do a dumpxml, copy out the UUID,
> > > wonder what type of device it might have been last time, create it,
> > > start the domain and cross my fingers. Pre-creating mdev devices is not
> > > really practical, I might have use cases where I want multiple low-end
> > > mdev devices and another where I have a single high-end device. Those
> > > cannot exist at the same time. Requiring extensive higher level
> > > management tools is not really an option either, I'm not going to
> > > install oVirt on my desktop/laptop just so I can launch a GVT-g VM once
> > > in a while (no offense). So I really hope that libvirt itself can
> > > provide some degree of mdev creation.
> >
> >
> > Maybe there can be something in between the "all child devices must be
> > pre-created" and "a child device will be automatically created on an
> > automatically chosen parent device as needed". In particular, we could
> > forego the "automatically chosen parent device" part of that. The guest
> > configuration could simply contain the PCI address of the parent and the
> > desired type of the child. If we did this there wouldn't be any policy
> > decision to make - all the variables are determined - but it would make
> > life easier for people running small hosts (i.e. no oVirt/OpenStack, a
> > single mdev parent device). OpenStack and oVirt (and whoever) would of
> > course be free to ignore this and pre-create pools of devices themselves
> > in the name of more precise control and better predictability (just as,
> > for example, OpenStack ignores libvirt's "pools of hostdev network
> > devices" and instead manages the pool of devices itself and uses
> > <interface type='hostdev'> directly).
>
> This seems not that substantially different from managed='yes' on a
> vfio hostdev to me. It makes the device available to the VM before it
> starts and returns it after. In one case that's switching the binding
> on an existing device, in another it's creating and removing. Once
> again, I can't tell from Dan's response if he's opposed to this entire
> idea or just the aspects where libvirt needs to impose a policy
> decision. For me personally, the functionality difference is quite
> substantial.

I'm fine with libvirt having APIs in the node device APIs to enable
create/delete with libvirt, as well as using managed=yes in the same
manner that we do for regular PCI devices (the bind/unbind to vfio
or pci-back).

I'm only against the creation/deletion of mdevs as a side effect of
starting/stopping the guest.

But this is exactly the useful case, and as Laine describes above can
be done without any policy decisions on the part of libvirt. The XML
defines a parent device and mdev type, libvirt tries to create it, just
as it might a tap device into a bridge; either it works and the VM is
started or it doesn't and we get an error. libvirt doesn't require tap
devices to exist prior to the VM starting.

Thanks,
Alex
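
For illustration only, here is a minimal sketch of the no-policy flow
described above: the parent PCI address and the mdev type are both given,
and creation either succeeds or raises an error. It assumes the kernel's
mdev sysfs layout (a `create` node under the parent's
`mdev_supported_types/<type>/` directory, and a `remove` node under
`/sys/bus/mdev/devices/<uuid>/`). The function names are hypothetical and
this is not libvirt code.

```python
import os
import uuid

# Assumption: the mdev parent is a PCI device reachable under this path.
SYSFS_PCI = "/sys/bus/pci/devices"
SYSFS_MDEV = "/sys/bus/mdev/devices"


def mdev_create_path(parent_pci_addr, type_id):
    """Build the sysfs 'create' node path for a parent device and mdev type."""
    return os.path.join(SYSFS_PCI, parent_pci_addr,
                        "mdev_supported_types", type_id, "create")


def mdev_remove_path(mdev_uuid):
    """Build the sysfs 'remove' node path for an existing mdev."""
    return os.path.join(SYSFS_MDEV, mdev_uuid, "remove")


def create_mdev(parent_pci_addr, type_id, mdev_uuid=None):
    """Create an mdev on an explicitly named parent; no placement decision.

    Generates a UUID when none is supplied. Raises OSError if the parent,
    the type, or the available capacity does not allow creation.
    """
    mdev_uuid = mdev_uuid or str(uuid.uuid4())
    with open(mdev_create_path(parent_pci_addr, type_id), "w") as f:
        f.write(mdev_uuid)
    return mdev_uuid


def remove_mdev(mdev_uuid):
    """Remove a previously created mdev, e.g. after the guest has stopped."""
    with open(mdev_remove_path(mdev_uuid), "w") as f:
        f.write("1")
```

With the values from the device XML earlier in the thread, this would be
called as `create_mdev("0000:03:00.0", "nvidia-11")` before starting the
guest and `remove_mdev(...)` with the returned UUID afterwards; both need
root and a host that actually exposes that mdev type.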