On 5/30/2025 8:05 AM, Daniel P. Berrangé wrote:
> Hi Daniel,
>
> On 5/20/2025 5:51 AM, Daniel P. Berrangé wrote:
>>> Hi,
>>>
>>> This is a follow up to the first RFC patchset [0] for supporting multiple
>>> vSMMU instances in a qemu VM. This patchset also introduces support for
>>> using iommufd to propagate DMA mappings to kernel for assigned devices.
>>>
>>> This patchset implements support for specifying multiple <iommu>
devices
>>> within the VM definition when smmuv3Dev IOMMU model is specified, and is
>>> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices
[1]
>>>
>>> Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
>>> in order to represent the iommufd object in qemu command line. This
>>> patchset also implements new 'iommufdId' and 'iommufdFd'
attributes for
>>> hostdev devices to be associated with the iommufd object.
>>>
>>> For instance, specifying the iommufd object and associated hostdev in a
>>> VM definition with multiple IOMMUs, configured to be routed to
>>> pcie-expander-bus controllers in a way where VFIO device to SMMUv3
>>> associations are matched with the host (pcie-expander-bus and
>>> pcie-root-port controllers are no longer auto-added/auto-routed
>>> like in the first revision of this RFC, as the PCIe topology will be
>>> configured by management apps):
>>>
>>> <devices>
>>> ...
>>> <controller type='pci' index='1'
model='pcie-expander-bus'>
>>> <model name='pxb-pcie'/>
>>> <target busNr='252'/>
>>> <address type='pci' domain='0x0000'
bus='0x00' slot='0x01' function='0x0'/>
>>> </controller>
>>> <controller type='pci' index='2'
model='pcie-expander-bus'>
>>> <model name='pxb-pcie'/>
>>> <target busNr='248'/>
>>> <address type='pci' domain='0x0000'
bus='0x00' slot='0x02' function='0x0'/>
>>> </controller>
>>> ...
>>> <controller type='pci' index='21'
model='pcie-root-port'>
>>> <model name='pcie-root-port'/>
>>> <target chassis='21' port='0x0'/>
>>> <address type='pci' domain='0x0000'
bus='0x01' slot='0x00' function='0x0'/>
>>> </controller>
>>> <controller type='pci' index='22'
model='pcie-root-port'>
>>> <model name='pcie-root-port'/>
>>> <target chassis='22' port='0xa8'/>
>>> <address type='pci' domain='0x0000'
bus='0x02' slot='0x00' function='0x0'/>
>>> </controller>
>>> ...
>>> <hostdev mode='subsystem' type='pci'
managed='no'>
>>> <source>
>>> <address domain='0x0009' bus='0x01'
slot='0x00' function='0x0'/>
>>> </source>
>>> <iommufdId>iommufd0</iommufdId>
>>> <address type='pci' domain='0x0000'
bus='0x15' slot='0x00' function='0x0'/>
>>> </hostdev>
>>> <hostdev mode='subsystem' type='pci'
managed='no'>
>>> <source>
>>> <address domain='0x0019' bus='0x01'
slot='0x00' function='0x0'/>
>>> </source>
>>> <iommufdId>iommufd0</iommufdId>
>>> <address type='pci' domain='0x0000'
bus='0x16' slot='0x00' function='0x0'/>
>>> </hostdev>
>>> <iommu model='smmuv3Dev'>
>>> <iommufd>
>>> <id>iommufd0</id>
>>> </iommufd>
>>> <address type='pci' domain='0x0000'
bus='0x01' slot='0x01' function='0x0'/>
>> IIUC, you're using <address> here to reference the earlier
<controller>
>> pcie-expander-bus. This is a bit wierd as it is making it look like the
>> smmuv3Dev itself has a PCI address, but this is just the PCI address
>> of the controller.
>>
>> The smmuv3dev also doesn't have an address on the pcie-expander-bus,
>> it is just an association IIUC.
>>
>> So from this pov, I think I'd be inclined to say we should just
>> reference the <controller> based on its index, using an attribute
>>
>> <iommu model='smmuv3dev' controller='2'/>
>>
> I see, I will revise this to reference the controller index instead.
>
>>> </iommu>
>>> <iommu model='smmuv3Dev'>
>>> <iommufd>
>>> <id>iommufd0</id>
>>> </iommufd>
>>> <address type='pci' domain='0x0000'
bus='0x02' slot='0x01' function='0x0'/>
>>> </iommu>
>>> </devices>
>>>
>>> This would get translated to a qemu command line with the arguments below:
>>>
>>> -device
'{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}'
\
>>> -device
'{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}'
\
>>> -device
'{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}'
\
>>> -device
'{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}'
\
>>> -object
'{"qom-type":"iommufd","id":"iommufd0"}'
\
>>> -device
'{"driver":"arm-smmuv3-accel","bus":"pci.1"}'
\
>>> -device
'{"driver":"arm-smmuv3-accel","bus":"pci.2"}'
\
>>> -device
'{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}'
\
>>> -device
'{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}'
\
>> The iommufd integration in the XML looks a bit wierd too - we have
>> four different elements all referencing 'iommufd0' but nothing
>> is defining this. The iommu references the iommufd0, but nothing
>> actually uses this on the arm-smuv3-accel command line.
>>
>>
>> I've not been paying much attention to iommufd in QEMU, but IIUC
>> it will apply to x86_64 too. So I'm wondering how iommufd integration
>> sound work in libvirt more broadly.
>>
> It is my understanding that we want to consider device classes for libvirt
> device representation in XML, so I intended to have users declare the
> iommufd definition as an attribute under the <iommu> stanza, which would>
translate to the following qemu argument:
> -object
'{"qom-type":"iommufd","id":"iommufd0"}'
>
> but since this series implements support for multiple <iommu> definitions,
> we specify iommufd0 for multiple <iommu> stanzas. For x86_64, we would just
> specify the iommufd attribute once under a single <iommu> stanza.
>
> Would you suggest we move iommufd out of the <iommu> definition instead,
> like the examples below?
AFAICT iommufd isn't connected to the guest iommu at all in terms
of configuration, it is simply an attribute of the hostdev. eg
we could do
<hostdev mode='subsystem' type='mdev' model='vfio-pci'
iommufd='on'>
that does leave open the possibility that someone configures iommufd on
one hostdev, but not on another, but that's not as bad as when we set it
on the <iommu> too. So something we can validate in post-parse logic if
we need to ensure consistent usage - if qemu allows a mix of iommfd and
non-iommufd for vfio-pci, we can just allow that at libvirt too
Ok sounds good, I will make this only a hostdev attribute and generate
the qemu iommufd object argument when detected instead of having users
specify it under the guest iommu definition. We should be able to allow
a mix of iommufd and non-iommufd for vfio-pci, and we can avoid the case
where someone specifies it for the hostdev but not in the <iommu> stanza
like you mentioned.
Thanks,
Nathan