On 18/03/16 10:15 +0000, Daniel P. Berrange wrote:
> On Thu, Mar 17, 2016 at 05:37:28PM -0400, Laine Stump wrote:
> > On 03/17/2016 02:32 PM, Daniel P. Berrange wrote:
> > >On Thu, Mar 17, 2016 at 12:18:49PM -0600, Alex Williamson wrote:
> > >>On Thu, 17 Mar 2016 17:59:53 +0000
> > >>"Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >>
> > >>>I don't think it is a significant burden really. Apps which want this
> > >>>blacklisted forever likely want to set up the modprobe blacklist anyway
> > >>>to stop the initial bind at boot up and instead permanently reserve
> > >>>the device. This stops the device being used at startup - eg if we
> > >>>have a bunch of NICs to be given to guests, you don't want the host
> > >>>OS to automatically configure them and give them IP addresses on the
> > >>>host before we start guests. So pre-reserving devices at the host OS
> > >>>level is really what you want to do with data center / cloud management
> > >>>apps like oVirt / OpenStack at least. They could easily use the
> > >>>virNodeDeviceDetach API at the time they decide to assign a device
> > >>>to a guest though.
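
[To make the detach-at-assignment-time idea concrete: with the libvirt C
API it is roughly the sketch below. The function name and node device
name are placeholders; virNodeDeviceDetachFlags() with a NULL driver
name lets libvirt pick the default stub driver.]

  #include <libvirt/libvirt.h>

  /* Detach one host device from its host driver so it can be assigned
   * to a guest. "devname" is a libvirt node device name such as the
   * placeholder "pci_0000_02_00_0". Returns 0 on success, -1 on error. */
  static int detach_for_assignment(virConnectPtr conn, const char *devname)
  {
      virNodeDevicePtr dev = virNodeDeviceLookupByName(conn, devname);
      int ret;

      if (!dev)
          return -1;
      /* NULL driver name: let libvirt choose the stub (e.g. vfio-pci) */
      ret = virNodeDeviceDetachFlags(dev, NULL, 0);
      virNodeDeviceFree(dev);
      return ret;
  }

[It would be called with a connection from e.g.
virConnectOpen("qemu:///system").]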
> > >>modprobe blacklist assumes that all devices managed by a given driver
> > >>are reserved for VM use. That's very often not the case. Even with
> > >>SR-IOV VFs, several vendors use the same driver for PF and VF, so
> > >>that's just a poor solution. For GPU assignment we often recommend
> > >>using pci-stub.ids on the kernel commandline to pre-load the pci-stub
> > >>driver with PCI vendor and device IDs to claim, to prevent host drivers
> > >>from attaching, but that also assumes that you want to use everything
> > >>matching those IDs for a VM, which users will quickly find fault with.
> > >>Additionally, using either solution assumes that the device will be
> > >>left entirely alone otherwise, which is also not true. If I blacklist
> > >>i915 or use pci-stub.ids to make pci-stub claim it, then efifb or
> > >>vesafb is more than happy to make use of it, so it's actually cleaner
> > >>to let i915 grab the device and unbind it when ready. And of course
> > >>there's the issue of assuming that the device can go without drivers,
> > >>which may leave your user running a headless system. This is really
> > >>not the simplistic issue that it may seem. Thanks,
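
[For reference, the pci-stub.ids approach mentioned above is a kernel
command line parameter taking vendor:device pairs; the ID below, an
Intel IGD handled by i915, is only an example:

  pci-stub.ids=8086:1912

The modprobe alternative is a one-line "blacklist i915" dropped into
/etc/modprobe.d/, which is exactly the per-driver granularity criticised
above.]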
> > >The issues you describe point towards the need for a better blacklisting
> > >facility at the OS level IMHO. eg a way to tell the kernel module to
> > >not automatically bind drivers for devices with a particular device ID.
> >
> > To paraphrase an old saying: this is a nail, and libvirt is my hammer :-) My
> > first response to Alex when he asked about this feature was to ask if he
> > couldn't configure it somehow outside libvirt, and his response (similar to
> > the above) made it fairly plain that you really can't, at least not without
> > adding custom startup scripts. Since the barrier to entry for the kernel is
> > much higher than for libvirt, this seems like a way to actually get
> > something that works.
> Looking at this from the POV of OpenStack, we would want to mark devices as
> detached from the host independently of assigning them to guests, since we
> build up a whitelist of assignable devices at initial startup.
> So I'd expect OpenStack would simply call virNodeDeviceDetach() for each
> device in its whitelist at startup. If we consider how we deal with guest
> domain hotplug, we have the concept of VIR_DOMAIN_MODIFY_LIVE vs CONFIG.
> If there were a way to blacklist devices based on address, then it would
> be natural to extend libvirt so that you could do
>
>   virNodeDeviceDetach(dev, VIR_DOMAIN_MODIFY_CONFIG);
>
> to ensure it was permanently disabled from attachment to host drivers,
> instead of just for the current point in time. This is using libvirt as
> a hammer, but we still need underlying OS support in some manner to
> actually implement it.
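
[As a sketch of that startup flow with today's API (the CONFIG-flag call
above is a proposal, not an existing libvirt entry point), OpenStack
could walk its whitelist like this; the whitelist contents are
hypothetical:]

  #include <stdio.h>
  #include <libvirt/libvirt.h>

  /* Hypothetical whitelist of assignable devices, e.g. loaded from
   * configuration at startup. */
  static const char *whitelist[] = { "pci_0000_03_10_0", "pci_0000_03_10_1" };

  static void detach_whitelist(virConnectPtr conn)
  {
      size_t i;

      for (i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
          virNodeDevicePtr dev = virNodeDeviceLookupByName(conn, whitelist[i]);

          if (!dev)
              continue;
          /* Effective for the current boot only; persistence would need
           * the proposed CONFIG-style flag discussed above. */
          if (virNodeDeviceDetachFlags(dev, NULL, 0) < 0)
              fprintf(stderr, "failed to detach %s\n", whitelist[i]);
          virNodeDeviceFree(dev);
      }
  }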
Joining this discussion from oVirt's perspective. oVirt uses managed="no"
mode. What we do now is call virNodeDeviceDetachFlags() when a VM is
started and virNodeDeviceReAttach() when it's destroyed. This is because
we handle the permissions of /dev/vfio/* ourselves, and it allows for finer
granularity (e.g. skipping PCI_HEADER_TYPE 0 devices or
hot(un)plugging a device).
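
[The reattach half of that lifecycle is the mirror image; again the
function name is a placeholder, and it assumes the same libvirt/libvirt.h
include as the earlier sketch:]

  /* On VM destroy, hand the device back to its host driver. */
  static int reattach_to_host(virConnectPtr conn, const char *devname)
  {
      virNodeDevicePtr dev = virNodeDeviceLookupByName(conn, devname);
      int ret;

      if (!dev)
          return -1;
      ret = virNodeDeviceReAttach(dev);
      virNodeDeviceFree(dev);
      return ret;
  }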
To deal with GPUs or any kind of device with a problematic driver, we
currently advise users to use pci-stub.ids, and we plan to make this
functionality available from our UI. We have to modify the kernel
command line anyway, as users' own modifications of it are not supported
and we need to enable the IOMMU and possibly expose workarounds for
machines with issues (allow_unsafe_interrupts, pci=realloc for
SR-IOV).
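
[Putting those pieces together, a command line of the kind we would
generate could look like the following; the pci-stub IDs (an NVIDIA GPU
and its audio function) are examples only, and allow_unsafe_interrupts
is spelled out as the vfio_iommu_type1 module parameter:

  intel_iommu=on pci-stub.ids=10de:13c2,10de:0fbb vfio_iommu_type1.allow_unsafe_interrupts=1 pci=realloc
]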
After reading this thread, we are considering not explicitly reattaching
the devices. If a user wants to reclaim a device, it will be possible to
reattach it explicitly from the UI.
Given the system configuration mentioned above, I don't believe there is
a need for managed="detach" in management applications. It should be the
management application's developer who decides whether the devices
are detached at boot or at VM start, and the user's decision whether
they need to be reattached later.
That being said, I am not sure if people using libvirt directly expect
to have devices "reserved" for assignment.
mpolednik