-----Original Message-----
From: Nathan Chen <nathanc(a)nvidia.com>
Sent: Thursday, May 15, 2025 9:37 PM
To: devel(a)lists.libvirt.org
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi(a)huawei.com>;
nicolinc(a)nvidia.com; Nathan Chen <nathanc(a)nvidia.com>
Subject: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Hi,
This is a follow-up to the first RFC patchset [0] for supporting multiple
vSMMU instances in a qemu VM. This patchset also introduces support for
using iommufd to propagate DMA mappings to the kernel for assigned devices.
This patchset implements support for specifying multiple <iommu> devices
within the VM definition when the smmuv3Dev IOMMU model is specified, and is
tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices
[1].
Based on feedback received on the above RFC and the discussion here [1],
there are certain changes to the name of the vSMMU device and the way
we associate the PCIe bus.
Going forward, it is more likely to look something like this:
-device arm-smmuv3,primary-bus=pcie.0,accel=on
-device vfio-pci,host=xxx,bus=pcie.0
-device pxb-pcie,id=pcie.1,bus_nr=2
-device arm-smmuv3,primary-bus=pcie.1,accel=on
...
Hopefully, this doesn't warrant any major changes to this libvirt
series, but please do make a note of it.
Thanks,
Shameer
Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef, in order to
represent the iommufd object in the qemu command line. This patchset also
implements new 'iommufdId' and 'iommufdFd' attributes for hostdev devices
to be associated with the iommufd object.
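As an illustration of how a hostdev refers to an iommufd object by id, here is a minimal Python sketch that lists hostdev references with no matching declaration. It uses the element names from the XML examples in this message (<iommufdId> under <hostdev>, <iommufd><id> under <iommu>). The function name undefined_iommufd_refs and the sample XML are hypothetical, and whether the series performs this exact check is not stated here.

```python
import xml.etree.ElementTree as ET


def undefined_iommufd_refs(devices_xml: str) -> list[str]:
    """Return iommufdId values used by <hostdev> but declared by no <iommu>."""
    root = ET.fromstring(devices_xml)
    # Ids declared through <iommu><iommufd><id>...</id></iommufd></iommu>.
    declared = {
        e.text.strip() for e in root.findall("./iommu/iommufd/id") if e.text
    }
    # Ids referenced through <hostdev><iommufdId>...</iommufdId></hostdev>.
    used = [e.text.strip() for e in root.findall("./hostdev/iommufdId") if e.text]
    return [ref for ref in used if ref not in declared]


# Hypothetical sample: the second hostdev points at an undeclared object.
SAMPLE = """
<devices>
  <hostdev mode='subsystem' type='pci' managed='no'>
    <iommufdId>iommufd0</iommufdId>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='no'>
    <iommufdId>iommufd1</iommufdId>
  </hostdev>
  <iommu model='smmuv3Dev'>
    <iommufd><id>iommufd0</id></iommufd>
  </iommu>
</devices>
"""

print(undefined_iommufd_refs(SAMPLE))
```

For the sample above, only iommufd1 is reported, since iommufd0 is declared by the <iommu> element.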
For instance, the VM definition below specifies an iommufd object and its
associated hostdevs alongside multiple IOMMUs. The IOMMUs are routed to
pcie-expander-bus controllers so that the VFIO device to SMMUv3
associations match the host. The pcie-expander-bus and pcie-root-port
controllers are no longer auto-added/auto-routed as in the first revision
of this RFC, since the PCIe topology will be configured by management apps:
  <devices>
    ...
    <controller type='pci' index='1' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='252'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='248'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    ...
    <controller type='pci' index='21' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='21' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='22' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='22' port='0xa8'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    ...
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <iommufdId>iommufd0</iommufdId>
      <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <iommufdId>iommufd0</iommufdId>
      <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
    </hostdev>
    <iommu model='smmuv3Dev'>
      <iommufd>
        <id>iommufd0</id>
      </iommufd>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </iommu>
    <iommu model='smmuv3Dev'>
      <iommufd>
        <id>iommufd0</id>
      </iommufd>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </iommu>
  </devices>
This would get translated to a qemu command line with the arguments below:
  -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
  -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
  -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
  -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
  -object '{"qom-type":"iommufd","id":"iommufd0"}' \
  -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
  -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
  -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
  -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
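The hostdev part of this translation can be sketched in Python. This is only an illustration of the mapping visible in the example, not libvirt's code. The function name hostdev_device_arg is hypothetical, and two rules are assumptions read off the example: that the guest address bus number equals the controller index used in the "pci.N" alias, and that only the slot is emitted as "addr" (non-zero functions are not handled).

```python
import json
import xml.etree.ElementTree as ET

# First <hostdev> from the VM definition in this message.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <iommufdId>iommufd0</iommufdId>
  <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
</hostdev>
"""


def hostdev_device_arg(hostdev, alias):
    """Build the JSON value of a -device argument for one PCI hostdev."""
    src = hostdev.find("./source/address")   # host PCI address
    guest = hostdev.find("./address")        # guest PCI address (direct child)
    host = "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(src.get("domain"), 16),
        int(src.get("bus"), 16),
        int(src.get("slot"), 16),
        int(src.get("function"), 16),
    )
    props = {
        "driver": "vfio-pci",
        "host": host,
        "id": alias,
        "iommufd": hostdev.findtext("iommufdId"),
        # Assumption: bus 0x15 (21) refers to the controller with index 21.
        "bus": "pci.{}".format(int(guest.get("bus"), 16)),
        # Assumption: function 0 only, so the slot alone forms the address.
        "addr": hex(int(guest.get("slot"), 16)),
    }
    return json.dumps(props, separators=(",", ":"))


print(hostdev_device_arg(ET.fromstring(HOSTDEV_XML), "hostdev0"))
```

For the first hostdev this reproduces the "hostdev0" argument shown above, with bus 0x15 mapped to "pci.21".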
If users would like to leverage qemu's iommufd feature to open the VFIO
cdev and /dev/iommu via an external management layer, the fd can be
specified like so in the VM definition:
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
      </source>
      <iommufdId>iommufd0</iommufdId>
      <iommufdFd>23</iommufdFd>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </hostdev>
    <iommu model='intel'>
      <iommufd>
        <id>iommufd0</id>
        <fd>22</fd>
      </iommufd>
    </iommu>
  </devices>
This would get translated to a qemu command line with the arguments below:
  -object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
  -device '{"driver":"vfio-pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23","bus":"pci.0","addr":"0x3"}' \
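Passing descriptors by number only works if the launched process inherits them at those numbers. The sketch below shows that general mechanism in Python, with a pipe standing in for /dev/iommu or a VFIO cdev (which need real hardware). The helper names spawn_with_fds and demo are hypothetical, and nothing here describes how libvirt or a particular management layer hands the descriptors to qemu.

```python
import os
import subprocess
import sys


def spawn_with_fds(argv, fds):
    """Run argv with the given descriptors kept open at the same numbers."""
    return subprocess.run(argv, pass_fds=tuple(fds), check=True)


def demo():
    # The pipe is a stand-in for a descriptor opened by a management layer.
    read_end, write_end = os.pipe()
    try:
        # The child learns the descriptor number from its command line,
        # much like "fd":"22" in the -object argument.
        child = [
            sys.executable,
            "-c",
            "import os, sys; os.write(int(sys.argv[1]), b'ok')",
            str(write_end),
        ]
        spawn_with_fds(child, [write_end])
    finally:
        os.close(write_end)
    try:
        # The child wrote through the inherited descriptor before exiting.
        return os.read(read_end, 2)
    finally:
        os.close(read_end)


print(demo())
```

Without pass_fds, Python closes descriptors other than 0, 1 and 2 in the child on POSIX systems, so the number given on the command line would refer to nothing.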
Summary of changes:
- Introduced support for specifying multiple <iommu> stanzas in the VM
  XML definition when using the smmuv3Dev model.
- Dropped automatic PCIe topology generation (populating the VM definition
  with multiple vSMMUs routed to pcie-expander-bus controllers). Creating
  PXBs and routing VFIO devices is deferred to management apps.
- Introduced iommufd support.
TODO:
- I updated the namespace and cgroup configuration to allow access to iommufd
  paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu needs to be
  launched with user and group set to 'root' in order for these paths to be
  accessible. A passthrough device represented by /dev/vfio/18 normally has
  'root' user and group permissions, but in the mount namespace it's changed to
  'libvirt-qemu' and 'kvm'. I wasn't able to discern where this is happening by
  looking at src/qemu/qemu_namespace.c and src/qemu/qemu_cgroup.c. Would you
  have any pointers on how to change the iommufd paths' user and group
  permissions in the libvirt mount namespace?
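To compare ownership inside and outside the mount namespace, a small diagnostic along these lines may help. It is only a sketch: the function names describe and report are hypothetical, the path patterns are the ones mentioned in the TODO item, and it merely prints what os.stat reports without explaining where the ownership is changed.

```python
import glob
import grp
import os
import pwd
import stat


def describe(path):
    """Return 'path user:group mode' for one filesystem entry."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)   # uid with no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid with no group entry
    return "{} {}:{} {}".format(path, user, group, stat.filemode(st.st_mode))


def report(patterns=("/dev/iommu", "/dev/vfio/devices/vfio*", "/dev/vfio/*")):
    """Describe every existing path matching the given glob patterns."""
    lines = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            lines.append(describe(path))
    return lines


for line in report():
    print(line)
```

Running it once on the host and once inside the qemu process's mount namespace (for example via nsenter with the -t and -m options) would show which paths differ.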
This series is on Github:
https://github.com/NathanChenNVIDIA/libvirt/tree/smmuv3Dev-iommufd-04-15-25
Thanks,
Nathan
[0]
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7G...
PAJMPP4ZSC4ACME6GVMG236/
[1]
https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.thodi@huawei.com/
Signed-off-by: Nathan Chen <nathanc(a)nvidia.com>
Nathan Chen (5):
conf: Support multiple smmuv3Dev IOMMU devices
conf: Add an iommufd member struct to virDomainIOMMUDef
qemu: Implement support for associating iommufd to hostdev
qemu: Update Cgroup and namespace for qemu to access iommufd paths
qemu: Add test case for specifying iommufd
docs/formatdomain.rst | 5 +-
src/conf/domain_addr.c | 12 +-
src/conf/domain_addr.h | 4 +-
src/conf/domain_conf.c | 292 ++++++++++++++++--
src/conf/domain_conf.h | 21 +-
src/conf/domain_validate.c | 94 +++++-
src/conf/schemas/domaincommon.rng | 37 ++-
src/conf/virconftypes.h | 2 +
src/libvirt_private.syms | 2 +
src/qemu/qemu_alias.c | 15 +-
src/qemu/qemu_cgroup.c | 47 +++
src/qemu/qemu_cgroup.h | 1 +
src/qemu/qemu_command.c | 146 ++++++---
src/qemu/qemu_domain_address.c | 33 +-
src/qemu/qemu_driver.c | 8 +-
src/qemu/qemu_namespace.c | 36 +++
src/qemu/qemu_postparse.c | 11 +-
src/qemu/qemu_validate.c | 22 +-
...fio-iommufd-intel-iommu.x86_64-latest.args | 43 +++
...vfio-iommufd-intel-iommu.x86_64-latest.xml | 80 +++++
.../hostdev-vfio-iommufd-intel-iommu.xml | 80 +++++
tests/qemuxmlconftest.c | 1 +
22 files changed, 878 insertions(+), 114 deletions(-)
create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.args
create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.xml
--
2.43.0