On 09/01/2016 12:59 PM, Alex Williamson wrote:
On Thu, 1 Sep 2016 18:47:06 +0200
Michal Privoznik <mprivozn(a)redhat.com> wrote:
> On 31.08.2016 08:12, Tian, Kevin wrote:
>>> From: Alex Williamson [mailto:alex.williamson@redhat.com]
>>> Sent: Wednesday, August 31, 2016 12:17 AM
>>>
>>> Hi folks,
>>>
>>> At KVM Forum we had a BoF session primarily around the mediated device
>>> sysfs interface. I'd like to share what I think we agreed on and the
>>> "problem areas" that still need some work so we can get the
thoughts
>>> and ideas from those who weren't able to attend.
>>>
>>> DanPB expressed some concern about the mdev_supported_types sysfs
>>> interface, which exposes a flat csv file with fields like "type",
>>> "number of instance", "vendor string", and then a bunch
of type
>>> specific fields like "framebuffer size", "resolution",
"frame rate
>>> limit", etc. This is not entirely machine parsing friendly and sort of
>>> abuses the sysfs concept of one value per file. Example output taken
>>> from Neo's libvirt RFC:
>>>
>>> cat /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types
>>> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
>>> max_resolution
>>> 11 ,"GRID M60-0B", 16, 2, 45, 512M,
2560x1600
>>> 12 ,"GRID M60-0Q", 16, 2, 60, 512M,
2560x1600
>>> 13 ,"GRID M60-1B", 8, 2, 45, 1024M,
2560x1600
>>> 14 ,"GRID M60-1Q", 8, 2, 60, 1024M,
2560x1600
>>> 15 ,"GRID M60-2B", 4, 2, 45, 2048M,
2560x1600
>>> 16 ,"GRID M60-2Q", 4, 4, 60, 2048M,
2560x1600
>>> 17 ,"GRID M60-4Q", 2, 4, 60, 4096M,
3840x2160
>>> 18 ,"GRID M60-8Q", 1, 4, 60, 8192M,
3840x2160
>>>
>>> The create/destroy then looks like this:
>>>
>>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>> /sys/bus/pci/devices/.../mdev_create
>>>
>>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>> /sys/bus/pci/devices/.../mdev_destroy
>>>
>>> "vendor_specific_argument_list" is nebulous.
>>>
>>> So the idea to fix this is to explode this into a directory structure,
>>> something like:
>>>
>>> ├── mdev_destroy
>>> └── mdev_supported_types
>>> ├── 11
>>> │ ├── create
>>> │ ├── description
>>> │ └── max_instances
>>> ├── 12
>>> │ ├── create
>>> │ ├── description
>>> │ └── max_instances
>>> └── 13
>>> ├── create
>>> ├── description
>>> └── max_instances
>>>
>>> Note that I'm only exposing the minimal attributes here for simplicity,
>>> the other attributes would be included in separate files and we would
>>> require vendors to create standard attributes for common device classes.
>> I like this idea. All standard attributes are reflected into this hierarchy.
>> In the meantime, can we still allow optional vendor string in create
>> interface? libvirt doesn't need to know the meaning, but allows upper
>> layer to do some vendor specific tweak if necessary.
> This is not the best idea IMO. Libvirt is there to shadow differences
> between hypervisors. While doing that, we often hide differences between
> various types of HW too. Therefore in order to provide good abstraction
> we should make vendor specific string as small as possible (ideally an
> empty string). I mean I see it as bad idea to expose "vgpu_type_id" from
> example above in domain XML. What I think the better idea is if we let
> users chose resolution and frame buffer size, e.g.: <video
> resolution="1024x768" framebuffer="16"/> (just the first idea
that came
> to my mind while writing this e-mail). The point is, XML part is
> completely free of any vendor-specific knobs.
That's not really what you want though, a user actually cares whether
they get an Intel of NVIDIA vGPU, we can't specify it as just a
resolution and framebuffer size. The user also doesn't want the model
changing each time the VM is started, so not only do you *need* to know
the vendor, you need to know the vendor model
as well as any other configuration that might change over time. A
similar issue - libvirt really doesn't know or care what a "chassis" is
in an ioh3420 (a PCIe root-port), but it's a guest-visible property of
the device that qemu can set (and could presumably decide to change the
default setting of some time in the future), so libvirt has to set a
value for it in the config, and specify it on the qemu commandline.
What I'm getting at is that if there is anything in the vendor-specific
string that changes guest ABI, and that could change over time, then
libvirt can't just rely on it remaining the same, it needs to have it
saved in the config for later reproduction, even if it doesn't
understand the contents.
(for that matter, you may want to consider some type of "versioned vGPU
type" similar to qemu's versions machinetypes (e.g. "pc-i440fx-2.6",
which has some sort of incompatible ABI differences from
"pc-i440fx-1.4"), where any guest-ABI-changing modifications to the vGPU
would take effect only when the appropriate version of device was
requested. That way a guest originally created to use today's version of
vGPU X in resolution Y would continue to work even if incompatible guest
ABI changes were made in the future.)