On 9/7/2016 2:58 AM, Alex Williamson wrote:
> On Wed, 7 Sep 2016 01:05:11 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>
>> On 9/6/2016 11:10 PM, Alex Williamson wrote:
>>> On Sat, 3 Sep 2016 22:04:56 +0530
>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>
>>>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 02/09/2016 20:33, Kirti Wankhede wrote:
>>>>>> <Alex> We could even do:
>>>>>>>>
>>>>>>>> echo $UUID1:$GROUPA > create
>>>>>>>>
>>>>>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>>>>>> which $UUID1 is to be created and added to the same group.
>>>>>> </Alex>
>>>>>
>>>>> From the point of view of libvirt, I think I prefer Alex's idea.
>>>>> <group> could be an additional element in the nodedev-create XML:
>>>>>
>>>>>     <device>
>>>>>       <name>my-vgpu</name>
>>>>>       <parent>pci_0000_86_00_0</parent>
>>>>>       <capability type='mdev'>
>>>>>         <type id='11'/>
>>>>>         <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>         <group>group1</group>
>>>>>       </capability>
>>>>>     </device>
>>>>>
>>>>> (should group also be a UUID?)
>>>>>
>>>>
>>>> No, this should be a unique number in a system, similar to iommu_group.
>>>
>>> Sorry, just trying to catch up on this thread after a long weekend.
>>>
>>> We're talking about iommu groups here, we're not creating any sort of
>>> parallel grouping specific to mdev devices.
>>
>> I thought we were talking about group of mdev devices and not iommu
>> group. IIRC, there were concerns about it (this would be similar to
>> UUID+instance) and that would (ab)use iommu groups.
>
> What constraints does a group, which is not an iommu group, place on the
> usage of the mdev devices? What happens if we put two mdev devices in
> the same "mdev group" and then assign them to separate VMs/users? I
> believe that the answer is that this theoretical "mdev group" doesn't
> actually impose any constraints on the devices within the group or how
> they're used.

We feel it's not a good idea to try to associate a device's iommu group
with an mdev device group. That adds more complications.

As in the nodedev-create XML above, 'group1' could be a unique number
that can be generated by libvirt. Then, to create the mdev device:

echo $UUID1:group1 > create

If a user wants to add more mdev devices to the same group, he/she should
use the same group number in the subsequent nodedev-create calls. So the
create commands would be:

echo $UUID2:group1 > create
echo $UUID3:group1 > create

Each mdev device would store this group number in its mdev_device
structure.
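
Just to illustrate where the group number could live (this is only a
sketch; apart from the new 'group' field, the other fields are an
abbreviation of the proposed mdev_device structure, not the actual
patch):

struct mdev_device {
        struct device           dev;
        struct parent_device    *parent;        /* physical parent device */
        uuid_le                 uuid;           /* $UUID given at create */
        int                     group;          /* parsed from "$UUID:group" */
        void                    *driver_data;
};
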
With this, we would add open() and close() callbacks from the vfio_mdev
module for the vendor driver to commit resources. Then we don't need a
'start'/'stop' or online/offline interface.

To commit resources for all devices associated with a domain/user-space
application, the vendor driver can commit them on the 'first open()' and
free them on the 'last close()'. Or, if the vendor driver wants to commit
resources for each device separately, it can do so in each device's
open() call. How to implement this is up to the vendor driver.

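For example, a vendor driver could implement the 'first open()'/'last
close()' policy roughly as below. This is only a sketch: the helper
names (my_vgpu_find_group(), my_vgpu_commit_resources(),
my_vgpu_free_resources()) and the exact callback signatures are made up
for illustration and are not part of the mdev interface.

#include <linux/mdev.h>
#include <linux/mutex.h>
#include <linux/pid.h>
#include <linux/sched.h>

struct my_vgpu_group {
        int             group_id;       /* group number given at create time */
        int             open_count;     /* open devices in this group */
        struct pid      *owner;         /* task that did the first open() */
        struct mutex    lock;
};

static int my_vgpu_open(struct mdev_device *mdev)
{
        struct my_vgpu_group *grp = my_vgpu_find_group(mdev);
        int ret = 0;

        mutex_lock(&grp->lock);
        if (grp->open_count == 0) {
                /* first open(): commit resources for the whole group */
                ret = my_vgpu_commit_resources(grp);
                if (!ret)
                        grp->owner = get_task_pid(current, PIDTYPE_PID);
        }
        if (!ret)
                grp->open_count++;
        mutex_unlock(&grp->lock);

        return ret;
}

static void my_vgpu_close(struct mdev_device *mdev)
{
        struct my_vgpu_group *grp = my_vgpu_find_group(mdev);

        mutex_lock(&grp->lock);
        if (--grp->open_count == 0) {
                /* last close(): free resources for the whole group */
                my_vgpu_free_resources(grp);
                put_pid(grp->owner);
                grp->owner = NULL;
        }
        mutex_unlock(&grp->lock);
}
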
Libvirt doesn't have to do anything with the assigned group numbers
while managing mdev devices.

The QEMU command-line parameters would be the same as earlier (no need
to mention the group number here):
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID1 \
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID2

If two mdev devices from the same group are assigned to different
domains, we can fail the open() call of the second device. How would the
driver know that they are being used by different domains? By checking
the <group, pid> recorded when the first device of 'group1' was opened.
The two devices in the same group should have the same pid in their
open() calls.

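Continuing the sketch above, the check could sit in my_vgpu_open()
before the open count is incremented (again, just an illustration of
the idea):

        /* inside my_vgpu_open(), while holding grp->lock */
        if (grp->open_count > 0 && grp->owner != task_pid(current)) {
                /* group already in use by another domain/process */
                mutex_unlock(&grp->lock);
                return -EBUSY;
        }
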
To hot-plug an mdev device into a domain that already has an mdev device
assigned, the new mdev device should be created with the same group
number as the existing devices and then hot-plugged. If there is no mdev
device in that domain yet, the group number should be a new, unique
number.

This simplifies the mdev grouping and also provides flexibility for the
vendor driver implementation.

Thanks,
Kirti