On Tue, May 27, 2025 at 06:43:46PM -0700, Nathan Chen wrote:
Hi Daniel,
On 5/20/2025 5:51 AM, Daniel P. Berrangé wrote:
> > Hi,
> >
> > This is a follow up to the first RFC patchset [0] for supporting multiple
> > vSMMU instances in a qemu VM. This patchset also introduces support for
> > using iommufd to propagate DMA mappings to kernel for assigned devices.
> >
> > This patchset implements support for specifying multiple <iommu> devices
> > within the VM definition when smmuv3Dev IOMMU model is specified, and is
> > tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [1]
> >
> > Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
> > in order to represent the iommufd object in qemu command line. This
> > patchset also implements new 'iommufdId' and 'iommufdFd' attributes for
> > hostdev devices to be associated with the iommufd object.
> >
> > For instance, specifying the iommufd object and associated hostdev in a
> > VM definition with multiple IOMMUs, configured to be routed to
> > pcie-expander-bus controllers in a way where VFIO device to SMMUv3
> > associations are matched with the host (pcie-expander-bus and
> > pcie-root-port controllers are no longer auto-added/auto-routed
> > like in the first revision of this RFC, as the PCIe topology will be
> > configured by management apps):
> >
> > <devices>
> >   ...
> >   <controller type='pci' index='1' model='pcie-expander-bus'>
> >     <model name='pxb-pcie'/>
> >     <target busNr='252'/>
> >     <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
> >   </controller>
> >   <controller type='pci' index='2' model='pcie-expander-bus'>
> >     <model name='pxb-pcie'/>
> >     <target busNr='248'/>
> >     <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
> >   </controller>
> >   ...
> >   <controller type='pci' index='21' model='pcie-root-port'>
> >     <model name='pcie-root-port'/>
> >     <target chassis='21' port='0x0'/>
> >     <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> >   </controller>
> >   <controller type='pci' index='22' model='pcie-root-port'>
> >     <model name='pcie-root-port'/>
> >     <target chassis='22' port='0xa8'/>
> >     <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
> >   </controller>
> >   ...
> >   <hostdev mode='subsystem' type='pci' managed='no'>
> >     <source>
> >       <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
> >     </source>
> >     <iommufdId>iommufd0</iommufdId>
> >     <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
> >   </hostdev>
> >   <hostdev mode='subsystem' type='pci' managed='no'>
> >     <source>
> >       <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
> >     </source>
> >     <iommufdId>iommufd0</iommufdId>
> >     <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
> >   </hostdev>
> >   <iommu model='smmuv3Dev'>
> >     <iommufd>
> >       <id>iommufd0</id>
> >     </iommufd>
> >     <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
> IIUC, you're using <address> here to reference the earlier <controller>
> pcie-expander-bus. This is a bit weird as it is making it look like the
> smmuv3Dev itself has a PCI address, but this is just the PCI address
> of the controller.
>
> The smmuv3Dev also doesn't have an address on the pcie-expander-bus,
> it is just an association IIUC.
>
> So from this pov, I think I'd be inclined to say we should just
> reference the <controller> based on its index, using an attribute
>
> <iommu model='smmuv3Dev' controller='2'/>
>
I see, I will revise this to reference the controller index instead.
> >   </iommu>
> >   <iommu model='smmuv3Dev'>
> >     <iommufd>
> >       <id>iommufd0</id>
> >     </iommufd>
> >     <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
> >   </iommu>
> > </devices>
> >
> > This would get translated to a qemu command line with the arguments below:
> >
> > -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
> > -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
> > -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
> > -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
> > -object '{"qom-type":"iommufd","id":"iommufd0"}' \
> > -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
> > -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
> > -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
> > -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
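
For readers following the XML-to-command-line mapping, here is a minimal
Python sketch that reproduces the first pxb-pcie argument from the example
values. It is only an illustration of the correspondence seen in this
example, not libvirt's command-line generator. The helper names bus_id()
and pxb_device_arg() are hypothetical, and the rule that bus 0 becomes
"pcie.0" while bus N becomes "pci.N" is an assumption read off the example.

```python
import json


def bus_id(bus):
    """Map an XML bus value to a bus id.

    Assumption for this sketch: bus 0 is the root bus 'pcie.0',
    any other bus N is named 'pci.N', as in the example above.
    """
    n = int(bus, 16)
    return "pcie.0" if n == 0 else f"pci.{n}"


def pxb_device_arg(index, bus_nr, bus, slot):
    """Build the JSON -device value for one pcie-expander-bus controller."""
    props = {
        "driver": "pxb-pcie",
        "bus_nr": bus_nr,                # <target busNr='...'/>
        "id": f"pci.{index}",            # controller index
        "bus": bus_id(bus),              # <address bus='...'/>
        "addr": hex(int(slot, 16)),      # <address slot='...'/>
    }
    # Compact separators match the spacing used in the example arguments.
    return json.dumps(props, separators=(",", ":"))


print(pxb_device_arg(1, 252, "0x00", "0x01"))
```

The same pattern would extend to the pcie-root-port entries, where the
hexadecimal port value 0xa8 in the XML appears as 168 in the JSON.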
> The iommufd integration in the XML looks a bit weird too - we have
> four different elements all referencing 'iommufd0' but nothing
> is defining this. The iommu references the iommufd0, but nothing
> actually uses this on the arm-smmuv3-accel command line.
>
>
> I've not been paying much attention to iommufd in QEMU, but IIUC
> it will apply to x86_64 too. So I'm wondering how iommufd integration
> should work in libvirt more broadly.
>
It is my understanding that we want to consider device classes for libvirt
device representation in XML, so I intended to have users declare the
iommufd definition as an attribute under the <iommu> stanza, which would
translate to the following qemu argument:
-object '{"qom-type":"iommufd","id":"iommufd0"}'
but since this series implements support for multiple <iommu> definitions,
we specify iommufd0 for multiple <iommu> stanzas. For x86_64, we would just
specify the iommufd attribute once under a single <iommu> stanza.
Would you suggest we move iommufd out of the <iommu> definition instead,
like the examples below?
AFAICT iommufd isn't connected to the guest iommu at all in terms
of configuration, it is simply an attribute of the hostdev. eg
we could do
<hostdev mode='subsystem' type='mdev' model='vfio-pci' iommufd='on'>
That does leave open the possibility that someone configures iommufd on
one hostdev, but not on another, but that's not as bad as when we set it
on the <iommu> too. So it is something we can validate in post-parse logic if
we need to ensure consistent usage. If qemu allows a mix of iommufd and
non-iommufd for vfio-pci, we can just allow that at libvirt too.
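
A minimal sketch of such a consistency check, written in Python purely for
illustration (libvirt itself is written in C). The iommufd='on' attribute
on <hostdev> is only the proposal from this thread, not an existing schema.
The function names are hypothetical, and treating a missing attribute as
'off' is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def hostdev_iommufd_usage(devices_xml):
    """Return the set of iommufd settings found across <hostdev> elements."""
    root = ET.fromstring(devices_xml)
    # Assumption: a hostdev without the attribute counts as 'off'.
    return {hd.get("iommufd", "off") for hd in root.iter("hostdev")}


def iommufd_usage_is_consistent(devices_xml):
    """True when all hostdevs agree on iommufd usage (or there are none)."""
    return len(hostdev_iommufd_usage(devices_xml)) <= 1


MIXED = """
<devices>
  <hostdev mode='subsystem' type='pci' managed='no' iommufd='on'/>
  <hostdev mode='subsystem' type='pci' managed='no'/>
</devices>
"""

UNIFORM = """
<devices>
  <hostdev mode='subsystem' type='pci' managed='no' iommufd='on'/>
  <hostdev mode='subsystem' type='pci' managed='no' iommufd='on'/>
</devices>
"""

print(iommufd_usage_is_consistent(MIXED))
print(iommufd_usage_is_consistent(UNIFORM))
```

If qemu does accept a mix, a check like this would simply not be enforced.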
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|