On Thu, 17 Mar 2016 17:59:53 +0000
"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
> On Thu, Mar 17, 2016 at 11:52:14AM -0600, Alex Williamson wrote:
> > On Thu, 17 Mar 2016 17:32:08 +0000
> > "Daniel P. Berrange" <berrange(a)redhat.com> wrote:
> >
> > > On Tue, Mar 15, 2016 at 02:21:35PM -0400, Laine Stump wrote:
> > > > On 03/15/2016 01:00 PM, Daniel P. Berrange wrote:
> > > > > On Mon, Mar 14, 2016 at 03:41:48PM -0400, Laine Stump wrote:
> > > > >> Suggested by Alex Williamson.
> > > > >>
> > > > >> If you plan to assign a GPU to a virtual machine, but that GPU
> > > > >> happens to be the host system console, you likely want it to
> > > > >> start out using the host driver (so that boot messages/etc will
> > > > >> be displayed), then later have the host driver replaced with
> > > > >> vfio-pci for assignment to the virtual machine.
> > > > >>
> > > > >> However, in at least some cases (e.g. Intel i915) once the
> > > > >> device has been detached from the host driver and attached to
> > > > >> vfio-pci, attempts to reattach to the host driver only lead to
> > > > >> "grief" (ask Alex for details). This means that simply using
> > > > >> "managed='yes'" in libvirt won't work.
> > > > >>
> > > > >> And if you set "managed='no'" in libvirt then either you have to
> > > > >> manually run virsh nodedev-detach prior to the first start of
> > > > >> the guest, or you have to have a management application
> > > > >> intelligent enough to know that it should detach from the host
> > > > >> driver, but never reattach to it.
> > > > >>
> > > > >> This patch makes it simple/automatic to deal with such a case -
> > > > >> it adds a third "managed" mode for assigned PCI devices, called
> > > > >> "detach". It will detach ("unbind" in driver parlance) the
> > > > >> device from the host driver prior to assigning it to the guest,
> > > > >> but when the guest is finished with the device, will leave it
> > > > >> bound to vfio-pci. This allows re-using the device for another
> > > > >> guest, without requiring initial out-of-band intervention to
> > > > >> unbind the host driver.
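
For context, the proposed attribute would presumably sit on the domain
XML <hostdev> element just like the existing 'yes'/'no' values do -- a
sketch only, with a placeholder PCI address:

    <hostdev mode='subsystem' type='pci' managed='detach'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
    </hostdev>

With managed='yes' libvirt rebinds the host driver when the guest
releases the device; with the proposed 'detach' the device would stay
bound to vfio-pci instead.
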
> > > > > You say that managed=yes causes pain upon re-attachment and that
> > > > > apps should use managed=detach to avoid it, but how do management
> > > > > apps know which devices are going to cause pain? Libvirt isn't
> > > > > providing any info on whether a particular device id needs to
> > > > > use managed=yes vs managed=detach, and we don't want to be asking
> > > > > the user to choose between modes in openstack/ovirt IMHO. I think
> > > > > that's a fundamental problem with inventing a new value for
> > > > > managed here.
> > > >
> > > > My suspicion is that in many/most cases users don't actually need
> > > > for the device to be re-bound to the host driver after the guest is
> > > > finished with it, because they're only going to use the device to
> > > > assign to a different guest anyway. But because managed='yes' is
> > > > what's supplied and is the easiest way to get it setup for
> > > > assignment to a guest, that's what they use.
> > > >
> > > > As a matter of fact, all this extra churn of changing the driver
> > > > back and forth for devices that are only actually used when they're
> > > > bound to vfio-pci just wastes time, and makes it more likely that
> > > > libvirt and its users will reveal and get caught up in the effects
> > > > of some strange kernel driver loading/unloading bug (there was
> > > > recently a bug reported like this; unfortunately the BZ record had
> > > > customer info in it, so it's not publicly accessible :-( )
> > > >
> > > > So beyond making this behavior available only when absolutely
> > > > necessary, I think it is useful in other cases, at the user's
> > > > discretion (and as I implied above, I think that if they understood
> > > > the function and the tradeoffs, most people would choose to use
> > > > managed='detach' rather than managed='yes')
> > >
> > > IIUC, in managed=yes mode we explicitly track whether the device was
> > > originally attached to a host device driver. ie we only re-attach
> > > the device to the host when guest shuts down, if it was attached to
> > > the host at guest startup.
> > >
> > > We already have a virNodeDeviceDetach() API that can be used to
> > > detach a device from the host driver explicitly.
> > >
> > > So applications can in fact already achieve what you describe in
> > > terms of managed=detach, by simply calling virNodeDeviceDetach()
> > > prior to starting the guest with cold plugged PCI devices /
> > > hotplugging the PCI device.
> > >
> > > IOW, even if we think applications should be using managed=detach,
> > > they can already do so via existing libvirt APIs.
> >
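
In other words, roughly this flow is already possible today (the device
and guest names below are placeholders; virsh nodedev-detach is the CLI
wrapper around the virNodeDeviceDetach API):

    # one-time step, before the guest is first started; never re-attach
    virsh nodedev-detach pci_0000_02_00_0
    # the guest XML then uses <hostdev ... managed='no'>
    virsh start demo-guest
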
> > Agreed, that was never in doubt, but now you've required a management
> > step before the VM can be started. We're basically requiring users to
> > write their own scripting or modify kernel commandlines simply to avoid
> > libvirt trying to rebind a device at shutdown. We're leaving it as the
> > users' problem how to handle autostart VMs that prefer this behavior.
> > It can already be done; the question is whether it's worthwhile to
> > provide an easier path to do it. Thanks,
> I don't think it is a significant burden really. Apps which want this
> blacklisted forever likely want to set up the modprobe blacklist anyway
> to stop the initial bind at boot up and instead permanently reserve
> the device. This stops the device being used at startup - e.g. if we
> have a bunch of NICs to be given to guests, you don't want the host
> OS to automatically configure them and give them IP addresses on the
> host before we start guests. So pre-reserving devices at the host OS
> level is really what you want to do with data center / cloud management
> apps like oVirt / OpenStack at least. They could easily use the
> virNodeDeviceDetach API at the time they decide to assign a device
> to a guest though.
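
The pre-reservation described above would typically be a modprobe
blacklist entry along these lines (hypothetical file name, and the
driver name is only an example):

    # /etc/modprobe.d/reserve-for-guests.conf
    blacklist ixgbevf
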
A modprobe blacklist assumes that all devices managed by a given driver
are reserved for VM use. That's very often not the case. Even with
SR-IOV VFs, several vendors use the same driver for PF and VF, so
that's just a poor solution.

For GPU assignment we often recommend using pci-stub.ids on the kernel
commandline to pre-load the pci-stub driver with the PCI vendor and
device IDs it should claim, to prevent host drivers from attaching, but
that also assumes that you want to use everything matching those IDs
for a VM, which users will quickly find fault with.
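
As a concrete illustration of that approach (the IDs below are
placeholders, not a recommendation), the kernel command line would
carry something like:

    pci-stub.ids=8086:10fb,10de:13ba

and every device in the system matching one of those vendor:device
pairs is then claimed by pci-stub at boot, whether or not it was meant
for a guest.
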
Additionally, using either solution assumes that the device will be
left entirely alone otherwise, which is also not true. If I blacklist
i915 or use pci-stub.ids to make pci-stub claim the device, then efifb
or vesafb is more than happy to make use of it, so it's actually
cleaner to let i915 grab the device and unbind it when ready. And of
course there's the issue of assuming that the device can go without a
driver at all, which may leave your user running a headless system.
This is really not as simple an issue as it may seem. Thanks,
Alex