Going back to the beginning, with slightly more detail:
1) "Unmanaged" mediated device assignment - assigning an existing device to
a virtual machine
This will assume that the desired child device has already been created, and
can be found in /sys/bus/mdev/devices/$UUID. Here's a first attempt at what
the XML would look like:
<hostdev mode='subsystem' type='pci' managed='no'>
  <source> <!-- (maybe add "type='mdev'" ???) -->
    <mdev uuid='$uuid'/>
  </source>
  <address type='pci' blah blah blah/> <!-- PCI address in the guest -->
</hostdev>
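For reference, in the unmanaged case the child device is created entirely outside libvirt, through sysfs. A rough Python sketch of that step, with the sysfs root made configurable so it can be exercised against a mock tree (the parent address and type name in the docstring are made up for illustration):

```python
import os
import uuid


def create_mdev(parent, type_id, sysfs_root="/sys"):
    """Create a mediated child device by writing a fresh UUID to the
    parent's per-type 'create' node, e.g. for parent '0000:02:00.0'
    and type 'mtty-1'.  On success the child should then appear as a
    symlink under <sysfs_root>/bus/mdev/devices/<uuid>."""
    child_uuid = str(uuid.uuid4())
    create_node = os.path.join(sysfs_root, "class", "mdev_bus", parent,
                               "mdev_supported_types", type_id, "create")
    with open(create_node, "w") as f:
        f.write(child_uuid)
    return child_uuid
```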
In the past, the "type" attribute of hostdev described the type on both the
host and the guest. With mediated devices, the device isn't visible on the
host as a PCI device, but just as a software device. So the type attribute
in <hostdev> now gives the type of the device on the guest, and the device
type on the host is determined from the contents of <source>.
Erik had a different suggestion for this (which I think he's already working
on patches for) - that the type attribute in <hostdev> should be the type of
the device in the *host*, and the type in the guest would be that given in
the <address>. Something like this I think:
<hostdev mode='subsystem' type='mdev' managed='no'>
  <source>
    <mdev uuid='$uuid'/>
  </source>
  <address type='pci' blah blah blah/>
</hostdev>
(Is this correct, Erik?)
Yes, that's the way I decided to go prior to you sending the mail. My
reasoning when looking at the code was that it could potentially lead to
cleaner code: there's quite complex logic in the PCI-related methods, the
vast majority of which is unrelated to MDEV (basically the most interesting
common parts are checking whether the VFIO driver is available on the host,
and PCI address assignment for the guest), and reusing the existing type
would lead to constant special-casing of MDEV and calling the appropriate
mdev methods. So, to sum up, code cleanliness had the major impact on my
decision to go with a new hostdev type 'mdev' rather than reuse the existing
one. I must admit that I hadn't realized the issue with the 'managed'
attribute until I read your paragraph below.
As for the guest device type being determined by the address field, that was
my initial idea. If the address field was missing (very likely), we would
have to guess the address type from the OS architecture, which I think we
would need to do anyway for managed devices. The other idea I've got is
similar to specifying the <driver> element for the various PCI assignment
backends. We could either reuse that element and add some attributes (I
think this wouldn't be the preferred option), or introduce a new one that
would specify the device API to be used with the assignment (the value of
which would correspond to what you can find in
/sys/class/mdev_bus/<vendor>/mdev_supported_types/<type>/device_api).
Erik
(I arrived at my suggestion by thinking that, in other places where there
are similar attributes for the host and guest side, e.g. the IP addresses
and routes that can be added on both the host and guest side of an
<interface>, everything related to the host side is in the <source>
subelement, while things related to the guest are directly under the
toplevel of the device element. On the other hand, the "managed" attribute
isn't something related to the guest, but to the host, and his idea has less
redundancy, so maybe he's onto something...)
(NB: a mediated device could be exposed to the guest as a PCI device, a CCW
device, or anything else supported by vfio. The type of device that the
guest will see can be determined from the contents of
mdev_supported_types/<type-id>/device_api under the parent device's
directory in sysfs (it will be, e.g., "vfio-pci" or "vfio-ccw"). But libvirt
assigns guest-side addresses at the time a domain is defined, and it's
possible that the mdev child device won't have been created yet at define
time (and therefore we won't know which parent device it's associated with,
and so we won't be able to look at device_api). In such situations, it will
be up to management to know something about the device it will be creating
and assume a type. Fortunately this is a reasonably safe thing to do - on
x86 platforms we can be fairly certain that the device will be a PCI device
(and, because it also makes a difference for some machinetypes, that it will
be a PCI Express device). We will still want to check device_api at runtime,
though, to validate that the guest-side device really is a PCI device.
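That runtime check might look roughly like the sketch below (the helper name and the mapping table are mine, and the sysfs layout is reduced to the device_api file described above; the parent directory is passed in so the sketch can run against a mock tree):

```python
import os

# Assumed mapping from the device_api string in sysfs to the guest-side
# address type; only the two APIs mentioned in the text are covered here.
DEVICE_API_TO_ADDRESS_TYPE = {
    "vfio-pci": "pci",
    "vfio-ccw": "ccw",
}


def guest_address_type(parent_dir, type_id):
    """Read mdev_supported_types/<type-id>/device_api under the parent
    device's sysfs directory and translate it to the address type the
    guest-side device should get."""
    path = os.path.join(parent_dir, "mdev_supported_types", type_id,
                        "device_api")
    with open(path) as f:
        api = f.read().strip()
    try:
        return DEVICE_API_TO_ADDRESS_TYPE[api]
    except KeyError:
        raise ValueError("unknown device_api %r for type %s" % (api, type_id))
```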
==
2) Reporting parent and child mediated devices and their capabilities in the
node device API.
There are 3 stages to this:
a) add mediated child devices to the list of devices provided by "virsh
nodedev-list". These will be called "mdev_$UUID", and will show up as
descendants of their respective parent devices in "virsh nodedev-list
--tree". The list of all these devices can easily be retrieved by
enumerating the links in /sys/bus/mdev/devices/$UUID.
b) report the capabilities of parent devices in their dumpxml output. This
will include supported child device types and a list of current children.
I don't have any experience with nodedev reporting for SCSI devices, but
recently noticed that nodedev-list can report lists of devices with certain
capabilities, e.g. "virsh nodedev-list --cap=scsi_host". Based on this, I
guess it would be useful for the parent devices to show something like this
(using the sample mtty driver as an example):
<device>
  <name>pci_0000_02_00_0</name>
  <parent>pci_0000_00_04_0</parent>
  <driver>
    <name>mtty</name>
  </driver>
  <capability type='mdev_parent'>
    [list of supported types, each with number allowed]
    [list of current child devices (just giving uuid or device name
     ("mdev_$uuid"?))]
    [other info about parent/children?]
  </capability>
  ...
Likewise, a nodedev-dumpxml of a child device should contain a pointer to
the parent device.
c) respond to dumpxml requests for mediated child devices. This should
include at least the uuid/type of the child device, and a link back to the
parent device (and I suppose somehow include <capability type='mdev_child'>
so that it can be filtered with virsh nodedev-list?)
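The enumeration part of stage (a) amounts to walking the symlinks under /sys/bus/mdev/devices; a sketch of that, with the sysfs root parameterized for testing (the "mdev_$UUID" naming and parent derivation follow the description above, and the exact name format is an assumption):

```python
import os


def list_mdev_children(sysfs_root="/sys"):
    """Return (nodedev_name, uuid, parent) tuples for every mediated
    device.  Each entry in <sysfs_root>/bus/mdev/devices is a symlink
    named by UUID that points into the parent device's directory, so
    the parent is the directory containing the link target."""
    devdir = os.path.join(sysfs_root, "bus", "mdev", "devices")
    result = []
    for u in sorted(os.listdir(devdir)):
        target = os.path.realpath(os.path.join(devdir, u))
        parent = os.path.basename(os.path.dirname(target))
        result.append(("mdev_%s" % u, u, parent))
    return result
```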
==
(3), (4), and (5) need more thought that I haven't gotten to yet. TBD (if
anyone else has thoughts on those, please share!)
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list