On Mon, 6 Feb 2017 13:19:42 +0100
Erik Skultety <eskultet(a)redhat.com> wrote:
Finally. It's here. This is the initial suggestion on how libvirt might
interact with the mdev framework, currently only focusing on the non-managed
devices, i.e. those pre-created by the user, since the managed case will be
revisited once we have all settled on what the XML should look like, given we
might not want to use the sysfs path directly as an attribute in the domain
XML. My proposal on the XML is the following:
<hostdev mode='subsystem' type='mdev'>
  <source>
    <!-- this is the host's physical device address -->
    <address domain='0x0000' bus='0x00' slot='0x00' function='0x00'/>
    <uuid>vGPU_UUID</uuid>
  </source>
  <!-- target PCI address can be omitted to assign it automatically -->
</hostdev>
So the mediated device is identified by the physical parent device visible on
the host and a UUID, which together allow us to construct the sysfs path
ourselves and then put it on QEMU's command line.
Based on your test code, I think you're creating something like this:
-device vfio-pci,sysfsdev=/sys/class/mdev_bus/0000:00:03.0/53764d0e-85a0-42b4-af5c-2046b460b1dc
That would explain the need for the parent device address, but that's
an entirely self-inflicted requirement. For managed="no" scenarios,
we shouldn't need the parent; we can get to the mdev device
via /sys/bus/mdev/devices/53764d0e-85a0-42b4-af5c-2046b460b1dc. So it
seems that the UUID should be the only required source element for
managed="no".
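To illustrate the UUID-only lookup, here is a minimal sketch (not code from
the series) that builds the sysfsdev argument from nothing but the UUID. The
function names and the devices_dir parameter are made up for the example; the
only things taken from the discussion above are the /sys/bus/mdev/devices
directory and the vfio-pci,sysfsdev= option.

```python
import uuid

# Directory where every mdev device is visible regardless of its parent.
MDEV_BUS_DEVICES = "/sys/bus/mdev/devices"


def mdev_sysfs_path(mdev_uuid, devices_dir=MDEV_BUS_DEVICES):
    """Return the sysfs path of an mdev device given only its UUID.

    uuid.UUID() rejects malformed input with ValueError and str() yields
    the canonical lowercase, hyphenated form.
    """
    canonical = str(uuid.UUID(mdev_uuid))
    return f"{devices_dir}/{canonical}"


def vfio_pci_device_args(mdev_uuid):
    """Hypothetical helper: the two argv entries for the QEMU command line."""
    return ["-device", f"vfio-pci,sysfsdev={mdev_sysfs_path(mdev_uuid)}"]


if __name__ == "__main__":
    print(" ".join(vfio_pci_device_args("53764d0e-85a0-42b4-af5c-2046b460b1dc")))
```

No parent address is involved at any point, which is the reason it can be
dropped from the required source elements.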
For managed="yes", it seems like the parent device is still an optional
field. The most important thing that libvirt needs to know when
creating an mdev device for a VM is the mdev type name. The parent
device should be an optional field to help higher-level management
tools deal with placement of the device for locality or load balancing.
Also, we can't assume that the parent device is a PCI device; the
sample mtty driver already breaks this assumption.
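As a rough sketch of why the type name is enough, the following lists every
parent that can still instantiate a given mdev type, so a parent hint only
narrows the choice. It assumes the sysfs layout described in the kernel's
mediated device documentation (mdev_supported_types/<type>/available_instances
under each entry in /sys/class/mdev_bus); the function name and the sysfs_root
parameter are invented for the example.

```python
import glob
import os


def parents_with_type(type_name, sysfs_root="/sys"):
    """Return [(parent_name, free_instances)] for parents offering type_name.

    Parents are reported by their sysfs name only, so nothing here assumes
    the parent is a PCI device.
    """
    pattern = os.path.join(
        glob.escape(sysfs_root), "class", "mdev_bus", "*",
        "mdev_supported_types", glob.escape(type_name), "available_instances",
    )
    candidates = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="ascii") as f:
            free = int(f.read().strip())
        if free > 0:
            # .../mdev_bus/<parent>/mdev_supported_types/<type>/available_instances
            parent = path.split(os.sep)[-4]
            candidates.append((parent, free))
    return candidates


if __name__ == "__main__":
    # "mtty-1" is only a placeholder type name for the example.
    for parent, free in parents_with_type("mtty-1"):
        print(f"{parent}: {free} instance(s) available")
```

A management tool that cares about locality would pick one entry from this
list and pass it down as the optional parent; otherwise libvirt could pick
any of them.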
Also, grep'ing through the patches, I don't see that the "device_api"
file is being used to test that the mdev device actually exports the
vfio-pci API before making use of it with the QEMU vfio-pci driver. We
don't yet have any examples to the contrary, but non-vfio-pci mdev
devices are in development. Just like we can't assume the parent
device type, we can't assume the API of an mdev device to the user.
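The check I have in mind could look roughly like the sketch below. It assumes
the layout from the kernel's mediated device documentation, where each mdev
device directory has an mdev_type link whose target contains the device_api
file; the function names and the sysfs_root parameter are hypothetical and
exist only so the example can be pointed at a fake tree.

```python
import os


def mdev_device_api(mdev_uuid, sysfs_root="/sys"):
    """Read the device_api string advertised for an mdev device."""
    path = os.path.join(
        sysfs_root, "bus", "mdev", "devices", mdev_uuid,
        "mdev_type", "device_api",
    )
    with open(path, encoding="ascii") as f:
        return f.read().strip()


def require_vfio_pci(mdev_uuid, sysfs_root="/sys"):
    """Raise ValueError unless the device exports the vfio-pci API."""
    api = mdev_device_api(mdev_uuid, sysfs_root)
    if api != "vfio-pci":
        raise ValueError(
            f"mdev {mdev_uuid} exports '{api}', not 'vfio-pci'; "
            "refusing to use it with the QEMU vfio-pci driver"
        )
```

Calling something like require_vfio_pci() before formatting the -device
option would turn a confusing QEMU startup failure into an early, explicit
error.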
Thanks,
Alex
A few remarks if you actually happen to have a machine to test this
on:
- right now the mediated devices are one-time use only, i.e. they have to be
recreated before every machine boot
- I wouldn't recommend assigning multiple vGPUs to a single domain
Once this series is sorted out, we can then continue with 'managed=yes', where,
as Laine pointed out [1], we need to figure out how exactly the management
layer should hint to libvirt which vGPU type should be used for device
instantiation.
[1] https://www.redhat.com/archives/libvir-list/2017-January/msg00287.html
#pleaseshareyourfeedback
Thanks,
Erik
Erik Skultety (16):
util: Introduce new module virmdev
conf: Introduce new hostdev device type mdev
docs: Update RNG schema to reflect the new hostdev type mdev
conf: Adjust the domain parser to work with mdevs
Adjust the formatter to reflect the new hostdev type mdev
security: dac: Enable labeling of vfio mediated devices
security: selinux: Enable labeling of vfio mediated devices
conf: Enable cold-plug of a mediated device
qemu: Assign PCI addresses for mediated devices as well
hostdev: Maintain a driver list of active mediated devices
hostdev: Introduce a reattach method for mediated devices
qemu: cgroup: Adjust cgroups' logic to allow mediated devices
qemu: namespace: Hook up the discovery of mdevs into the namespace code
qemu: Format mdevs on the qemu command line
test: Add some test cases for our test suite regarding the mdevs
docs: Document the new hostdev device type 'mdev'
docs/formatdomain.html.in | 40 ++-
docs/schemas/domaincommon.rng | 17 +
po/POTFILES.in | 1 +
src/Makefile.am | 1 +
src/conf/domain_conf.c | 81 ++++-
src/conf/domain_conf.h | 10 +
src/libvirt_private.syms | 19 ++
src/qemu/qemu_cgroup.c | 35 ++
src/qemu/qemu_command.c | 49 +++
src/qemu/qemu_command.h | 5 +
src/qemu/qemu_domain.c | 13 +
src/qemu/qemu_domain_address.c | 12 +-
src/qemu/qemu_hostdev.c | 37 ++
src/qemu/qemu_hostdev.h | 8 +
src/qemu/qemu_hotplug.c | 2 +
src/security/security_apparmor.c | 3 +
src/security/security_dac.c | 56 +++
src/security/security_selinux.c | 55 +++
src/util/virhostdev.c | 179 +++++++++-
src/util/virhostdev.h | 16 +
src/util/virmdev.c | 375 +++++++++++++++++++++
src/util/virmdev.h | 85 +++++
tests/domaincapsschemadata/full.xml | 1 +
...qemuxml2argv-hostdev-mdev-unmanaged-no-uuid.xml | 37 ++
.../qemuxml2argv-hostdev-mdev-unmanaged.args | 25 ++
.../qemuxml2argv-hostdev-mdev-unmanaged.xml | 38 +++
tests/qemuxml2argvtest.c | 6 +
.../qemuxml2xmlout-hostdev-mdev-unmanaged.xml | 41 +++
tests/qemuxml2xmltest.c | 1 +
29 files changed, 1239 insertions(+), 9 deletions(-)
create mode 100644 src/util/virmdev.c
create mode 100644 src/util/virmdev.h
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hostdev-mdev-unmanaged-no-uuid.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hostdev-mdev-unmanaged.args
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-hostdev-mdev-unmanaged.xml
create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-hostdev-mdev-unmanaged.xml