On 9/2/2016 3:35 PM, Paolo Bonzini wrote:
> On 02/09/2016 07:21, Kirti Wankhede wrote:
>> On 9/2/2016 10:18 AM, Michal Privoznik wrote:
>>> Okay, maybe I'm misunderstanding something. I just thought that users
>>> will consult libvirt's nodedev driver (e.g. virsh nodedev-list && virsh
>>> nodedev-dumpxml $id) to fetch vGPU capabilities and then use that info
>>> to construct domain XML.
>>
>> I'm not familiar with libvirt code; I'm curious how libvirt's nodedev
>> driver enumerates devices in the system?
> It looks at sysfs and/or the udev database and transforms what it finds
> there to XML.
>
> I think people would consult the nodedev driver to fetch vGPU
> capabilities, use "virsh nodedev-create" to create the vGPU device on
> the host, and then somehow refer to the nodedev in the domain XML.
>
> There isn't very much documentation on nodedev-create, but it's used
> mostly for NPIV (virtual fibre channel adapter) and the XML looks like
> this:
>
>   <device>
>     <name>scsi_host6</name>
>     <parent>scsi_host5</parent>
>     <capability type='scsi_host'>
>       <capability type='fc_host'>
>         <wwnn>2001001b32a9da5e</wwnn>
>         <wwpn>2101001b32a9da5e</wwpn>
>       </capability>
>     </capability>
>   </device>
>
> so I suppose for vGPU it would look like this:
>
>   <device>
>     <name>my-vgpu</name>
>     <parent>pci_0000_86_00_0</parent>
>     <capability type='mdev'>
>       <type id='11'/>
>       <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>     </capability>
>   </device>
>
> while the parent would have:
>
>   <device>
>     <name>pci_0000_86_00_0</name>
>     <capability type='pci'>
>       <domain>0</domain>
>       <bus>134</bus>
>       <slot>0</slot>
>       <function>0</function>
>       <capability type='mdev'>
>         <!-- one type element per sysfs directory -->
>         <type id='11'>
>           <!-- one element per sysfs file roughly -->
>           <name>GRID M60-0B</name>
>           <attribute name='num_heads'>2</attribute>
>           <attribute name='frl_config'>45</attribute>
>           <attribute name='framebuffer'>524288</attribute>
>           <attribute name='hres'>2560</attribute>
>           <attribute name='vres'>1600</attribute>
>         </type>
>       </capability>
>       <product id='...'>GRID M60</product>
>       <vendor id='0x10de'>NVIDIA</vendor>
>     </capability>
>   </device>
>
> After creating the vGPU, if required by the host driver, all the other
> type ids would disappear from "virsh nodedev-dumpxml pci_0000_86_00_0"
> too.
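As a rough sketch of that workflow, the vGPU nodedev XML above could be saved to a file and handed to nodedev-create. The virsh commands are only printed here (a dry run), since actually running them needs a libvirt host with a vGPU-capable parent device; the XML is copied verbatim from the example in this mail.

```shell
# Write the proposed vGPU nodedev XML to a file, then show the virsh
# calls that would consume it. Dry run: the virsh commands are echoed,
# not executed.
cat > my-vgpu.xml <<'EOF'
<device>
  <name>my-vgpu</name>
  <parent>pci_0000_86_00_0</parent>
  <capability type='mdev'>
    <type id='11'/>
    <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
  </capability>
</device>
EOF

echo "virsh nodedev-create my-vgpu.xml"
echo "virsh nodedev-dumpxml my-vgpu"
```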
Thanks Paolo for the details.

'nodedev-create' parses the XML file and accordingly writes to the
'create' file in sysfs to create the mdev device. Right?
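That's my understanding too; roughly along these lines, assuming a sysfs layout where each parent device exposes its supported types under an mdev_supported_types directory with a per-type 'create' file. The exact layout and file names were still under discussion at this point, so every path in this sketch is an assumption, and the write is only printed, not performed.

```shell
# Illustrative only: build the sysfs 'create' path and print the write
# that nodedev-create would perform for a given parent/type/uuid.
# The layout mdev_supported_types/<type>/create is an assumption here.
mdev_create_cmd() {
    parent="$1"   # parent PCI address, e.g. 0000:86:00.0
    type_id="$2"  # vGPU type id from the nodedev XML, e.g. 11
    uuid="$3"     # mdev device UUID from the nodedev XML
    printf 'echo %s > /sys/bus/pci/devices/%s/mdev_supported_types/%s/create\n' \
        "$uuid" "$parent" "$type_id"
}

mdev_create_cmd 0000:86:00.0 11 0695d332-7831-493f-9e71-1c85c8911a08
```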
At this point, does libvirt know which VM this device will be
associated with?
> When dumping the mdev with nodedev-dumpxml, it could show more complete
> info, again taken from sysfs:
>
>   <device>
>     <name>my-vgpu</name>
>     <parent>pci_0000_86_00_0</parent>
>     <capability type='mdev'>
>       <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>       <!-- only the chosen type -->
>       <type id='11'>
>         <name>GRID M60-0B</name>
>         <attribute name='num_heads'>2</attribute>
>         <attribute name='frl_config'>45</attribute>
>         <attribute name='framebuffer'>524288</attribute>
>         <attribute name='hres'>2560</attribute>
>         <attribute name='vres'>1600</attribute>
>       </type>
>       <capability type='pci'>
>         <!-- no domain/bus/slot/function of course -->
>         <!-- could show whatever PCI IDs are seen by the guest: -->
>         <product id='...'>...</product>
>         <vendor id='0x10de'>NVIDIA</vendor>
>       </capability>
>     </capability>
>   </device>
>
> Notice how the parent has mdev inside pci; the vGPU, if it has to have
> pci at all, would have it inside mdev. This represents the difference
> between the mdev provider and the mdev device.
The parent of an mdev device might not always be a PCI device, so I
think we shouldn't model it as a PCI capability.
> Random proposal for the domain XML too:
>
>   <hostdev mode='subsystem' type='pci'>
>     <source type='mdev'>
>       <!-- possible alternative to uuid: <name>my-vgpu</name> ?!? -->
>       <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>     </source>
>     <address type='pci' bus='0' slot='2' function='0'/>
>   </hostdev>
When a user wants to assign two mdev devices to one VM, does the user
have to add two such entries, or can the two devices be grouped into
one entry?
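For what it's worth, repeating the <hostdev> element once per mdev device would presumably look like this under Paolo's proposed syntax (the second UUID and both guest PCI addresses are invented purely for illustration):

```xml
<!-- Sketch: one <hostdev> entry per mdev device; second UUID is made up. -->
<hostdev mode='subsystem' type='pci'>
  <source type='mdev'>
    <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
  </source>
  <address type='pci' bus='0' slot='2' function='0'/>
</hostdev>
<hostdev mode='subsystem' type='pci'>
  <source type='mdev'>
    <uuid>aa618089-8b16-4d01-a136-25a0f3c73123</uuid>
  </source>
  <address type='pci' bus='0' slot='3' function='0'/>
</hostdev>
```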
On the other mail thread with the same subject we are thinking of
creating a group of mdev devices so that multiple mdev devices can be
assigned to one VM. Libvirt doesn't have to know the group number, but
libvirt should add all mdev devices in a group. Is it possible to do
that before starting the QEMU process?
Thanks,
Kirti
> Paolo