On Wed, Dec 11, 2024 at 04:24:18PM -0800, Nathan Chen via Devel wrote:
Hi,
This is a draft solution for supporting multiple vSMMU instances in a qemu VM.
Based on discussions/suggestions received for a previous RFC by Nicolin here[0],
the association of vSMMUs to VFIO devices in VM PCIe topology should be moved
out of qemu into libvirt. In addition, the nested SMMU nodes should be passed
to qemu as pluggable devices.
To address these changes, this patch series introduces a new "nestedSmmuv3"
IOMMU model and "nestedSmmuv3" device type. Upon specifying the nestedSmmuv3
IOMMU model, nestedSmmuv3 devices will be auto-added to the VM definition based
on the available SMMU nodes in the host's sysfs. The nestedSmmuv3 devices will
each be attached to a separate PXB controller, and VFIO devices will be routed
to PXBs based on their association with host SMMU nodes. This maintains a VM
PCIe topology that allows for multiple nested SMMUs, per Nicolin's original qemu
patch series in [0]. It also follows Shameer's work in [1] to remove VM topology
changes from qemu and allow the nested SMMUs to be specified as pluggable
devices.
For instance, if we specify the nestedSmmuv3 IOMMU model and a hostdev for
passthrough:
<devices>
  <hostdev mode='subsystem' type='pci' managed='no'>
    <source>
      <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
  <iommu model='nestedSmmuv3'/>
</devices>
Libvirt will scan sysfs and populate the VM definition with controllers and
nestedSmmuv3 devices based on host config. So if
/sys/bus/pci/devices/0009:01:00.0/iommu is a symlink to the host SMMU node
represented by
/sys/devices/platform/arm-smmu-v3.8.auto/iommu/smmu3.0x0000000016000000
and there are 3 host SMMU nodes under /sys/class/iommu/, we'll see three
auto-added nestedSmmuv3 devices, each routed to a pcie-expander-bus controller.
Then the hostdev will be routed to a PXB controller that has a matching host
SMMU node associated with it:
<devices>
  ...
  <controller type='pci' index='1' model='pcie-expander-bus'>
    <model name='pxb-pcie'/>
    <target busNr='254'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
  </controller>
  <controller type='pci' index='2' model='pcie-expander-bus'>
    <model name='pxb-pcie'/>
    <target busNr='251'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
  </controller>
  <controller type='pci' index='3' model='pcie-expander-bus'>
    <model name='pxb-pcie'/>
    <target busNr='249'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
  <controller type='pci' index='4' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='7' port='0x8'/>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
  </controller>
  <hostdev mode='subsystem' type='pci' managed='no'>
    <source>
      <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </hostdev>
  <iommu model='nestedSmmuv3'/>
  <nestedSmmuv3>
    <name>smmu3.0x0000000012000000</name>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </nestedSmmuv3>
  <nestedSmmuv3>
    <name>smmu3.0x0000000016000000</name>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
  </nestedSmmuv3>
  <nestedSmmuv3>
    <name>smmu3.0x0000000011000000</name>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </nestedSmmuv3>
  <iommu model='nestedSmmuv3'/>
</devices>
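As a rough illustration of the sysfs lookup described above (this is not code
from the patch series, and libvirt itself is written in C), the mapping from a
passthrough device to its host SMMU node could be sketched in Python as below.
The function names host_smmu_node and host_smmu_nodes and the sysfs_root
parameter are hypothetical, introduced only for this sketch; the only things
taken from the description are the per-device "iommu" symlink and the
/sys/class/iommu/ directory.

```python
import os


def host_smmu_node(pci_addr, sysfs_root="/sys"):
    """Return the name of the host IOMMU node behind a PCI device, or None.

    pci_addr is a full address such as "0009:01:00.0". The lookup follows
    <sysfs_root>/bus/pci/devices/<pci_addr>/iommu, which is expected to be
    a symlink to the IOMMU node directory.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "iommu")
    if not os.path.islink(link):
        # Device missing, or not behind any IOMMU.
        return None
    # The last path component of the resolved target is the node name,
    # e.g. "smmu3.0x0000000016000000".
    return os.path.basename(os.path.realpath(link))


def host_smmu_nodes(sysfs_root="/sys"):
    """List the IOMMU node names found under <sysfs_root>/class/iommu."""
    path = os.path.join(sysfs_root, "class", "iommu")
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []
```

Under these assumptions, host_smmu_nodes() would give the set of nodes that
each get a PXB, and host_smmu_node("0009:01:00.0") would pick the PXB that the
hostdev is routed to.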
Top-level libvirt device representation in XML is based on the device
*class*, not the specific device implementation. Adding a <nestedSmmuv3>
device type XML element in libvirt is totally inappropriate. Any
configuration must be done beneath the <iommu> element.
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|