On Mon, 6 Feb 2017 16:44:37 +0000
"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
On Mon, Feb 06, 2017 at 01:19:42PM +0100, Erik Skultety wrote:
> Finally. It's here. This is the initial suggestion on how libvirt might
> interact with the mdev framework, currently only focusing on the non-managed
> devices, i.e. those pre-created by the user, since the managed case will be
> revisited once we have all settled on what the XML should look like, given we
> might not want to use the sysfs path directly as an attribute in the domain
> XML. My proposal on the XML is the following:
>
> <hostdev mode='subsystem' type='mdev'>
>   <source>
>     <!-- this is the host's physical device address -->
>     <address domain='0x0000' bus='0x00' slot='0x00' function='0x00'/>
>     <uuid>vGPU_UUID</uuid>
>   </source>
>   <!-- target PCI address can be omitted to assign it automatically -->
> </hostdev>
>
> So the mediated device is identified by the physical parent device visible on
> the host and a UUID, which allows us to construct the sysfs path ourselves,
> which we then put on QEMU's command line.
>
> A few remarks if you actually happen to have a machine to test this on:
> - right now the mediated devices are one-time use only, i.e. they have to be
> recreated before every machine boot
> - I wouldn't recommend assigning multiple vGPUs to a single domain
>
> Once this series is sorted out, we can then continue with 'managed=yes',
> where, as Laine pointed out [1], we need to figure out how exactly the
> management layer should hint to libvirt which vGPU type should be used for
> device instantiation.

You seem to be suggesting that managed=yes with mdev devices would
cause create / delete of an mdev device from a specified parent.
This is rather different semantics from what managed=yes does with
PCI device assignment today. There the managed=yes flag is just
about controlling host device driver attachment, i.e. whether libvirt
will manually bind to vfio.ko, or expect the admin to have bound
it to vfio.ko beforehand. I think it is important to keep that
concept as is for mdev too.

While we're thinking of mdev purely in terms of KVM + vfio usage,
it wouldn't surprise me if there ended up being non-KVM based
use cases for mdev.

It isn't clear to me that auto-creation of mdev devices as a concept
even belongs in the domain XML necessarily.

Looking at two similar areas: for SR-IOV NICs, in the domain XML
you either specify an explicit VF to use, or you reference a
libvirt virtual network. The latter takes care of dynamically
providing VFs to VMs. For NPIV, IIRC, the domain XML works
similarly, either taking an explicit vHBA, or referencing a
storage pool to get one more dynamically.

Nit, there are other constraints of SR-IOV which I think this analogy
oversimplifies. With SR-IOV, we can't dynamically instantiate new VFs
individually. The process there requires that we set the number of VFs
we need and enable them. Changing that number of VFs requires that all
existing VFs on that PF are removed and recreated. So, does libvirt
work the way it does with SR-IOV devices because that's the optimal way
for users to make use of those VFs, or does it behave that way because
it must follow the constraints of the device? I think libvirt handles
VFs much like it does PFs because it has no other choice. Here we do
have a choice. Individual mdev devices can be created and destroyed.
The only dependency between mdev devices is how creating one affects
the availability of mdev types remaining on the parent device. It would
really be a shame to not take advantage of the fact that the underlying
device creation has advanced so far from SR-IOV and lump it into the
same sort of management. My impression is that user management of
creating SR-IOV VFs via module options or self-defined scripts is a
stumbling point that libvirt could help to address here.
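
For illustration, per-device creation and removal through the kernel's
mdev sysfs interface could be sketched roughly as below. This is only a
sketch, not libvirt code: the helper names and the `root` parameter (which
lets the functions be pointed at a scratch directory instead of the real
/sys) are assumptions made for the example, and the parent address and type
name used further down are placeholders.

```python
import uuid
from pathlib import Path

SYSFS_ROOT = Path("/sys")


def type_dir(parent, type_id, root=SYSFS_ROOT):
    # Directory describing one mdev type offered by a parent device.
    # 'parent' and 'type_id' are whatever names the host exposes.
    return root / "class" / "mdev_bus" / parent / "mdev_supported_types" / type_id


def available_instances(parent, type_id, root=SYSFS_ROOT):
    # How many more devices of this type the parent can still provide.
    text = (type_dir(parent, type_id, root) / "available_instances").read_text()
    return int(text.strip())


def create_mdev(parent, type_id, dev_uuid=None, root=SYSFS_ROOT):
    # Create one mdev by writing its UUID to the type's 'create' node.
    # A random UUID is generated when the caller does not supply one.
    dev_uuid = str(uuid.UUID(dev_uuid)) if dev_uuid else str(uuid.uuid4())
    if available_instances(parent, type_id, root) < 1:
        raise RuntimeError("no instances of type %s left on %s" % (type_id, parent))
    (type_dir(parent, type_id, root) / "create").write_text(dev_uuid)
    return dev_uuid


def remove_mdev(dev_uuid, root=SYSFS_ROOT):
    # Destroy a single mdev; other devices on the same parent are untouched.
    (root / "bus" / "mdev" / "devices" / dev_uuid / "remove").write_text("1")
```

A caller would do something like `create_mdev("0000:00:02.0", "example-type")`
and later `remove_mdev(...)` with the returned UUID. Unlike the SR-IOV case,
nothing here requires tearing down the other devices on the parent.
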

Before we even consider auto-creation though, I think we need
to have manual creation designed & integrated in the node device
APIs.

So in terms of the domain XML, I think the only thing we need
to provide is the address of the pre-existing mdev device
to be used. In this case "address" means the UUID. We should
not need anything about the parent device AFAICT.
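
To illustrate why the UUID alone can be enough: mdev devices show up under
/sys/bus/mdev/devices/ keyed by UUID, so the sysfs path handed to QEMU can
be derived without knowing the parent. A minimal sketch follows; the
function names are made up for the example, and it assumes the device is
named by the lower-case, hyphenated form of its UUID.

```python
import uuid
from pathlib import PurePosixPath

MDEV_BUS_DEVICES = PurePosixPath("/sys/bus/mdev/devices")


def mdev_sysfs_path(dev_uuid):
    # uuid.UUID() rejects malformed input with ValueError and lets us
    # normalise to the lower-case hyphenated form (an assumption about
    # how the device is named in sysfs).
    canonical = str(uuid.UUID(dev_uuid))
    return MDEV_BUS_DEVICES / canonical


def qemu_device_arg(dev_uuid):
    # Value for QEMU's -device option, using vfio-pci's sysfsdev property.
    return "vfio-pci,sysfsdev=%s" % mdev_sysfs_path(dev_uuid)
```

No parent PCI address appears anywhere in the derivation, which is the
point being made above.
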

Yep, agree. Thanks,
Alex