On 06/23/2017 11:03 AM, Andrea Bolognani wrote:
All the pieces are now in place, so we can finally start
using isolation groups to achieve our initial goal, which is
separating hostdevs from emulated PCI devices while keeping
hostdevs that belong to the same host IOMMU group together.
Set isolation groups
for devices when allocating PCI addresses, so that
hostdevs in different IOMMU groups will be placed on different PCI buses
to maintain proper isolation.
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1280542
Signed-off-by: Andrea Bolognani <abologna(a)redhat.com>
---
src/qemu/qemu_domain_address.c | 118 +++++++++++++++++++++
src/qemu/qemu_hotplug.c | 10 ++
.../qemuxml2argv-pseries-hostdevs-1.args | 8 +-
.../qemuxml2argv-pseries-hostdevs-2.args | 3 +-
.../qemuxml2argv-pseries-hostdevs-3.args | 2 +-
.../qemuxml2xmlout-pseries-hostdevs-1.xml | 14 ++-
.../qemuxml2xmlout-pseries-hostdevs-2.xml | 6 +-
.../qemuxml2xmlout-pseries-hostdevs-3.xml | 2 +-
8 files changed, 153 insertions(+), 10 deletions(-)
diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
index 3d56900..b884d06 100644
--- a/src/qemu/qemu_domain_address.c
+++ b/src/qemu/qemu_domain_address.c
@@ -906,6 +906,121 @@ qemuDomainFillAllPCIConnectFlags(virDomainDefPtr def,
/**
+ * qemuDomainFillDeviceIsolationGroupIter:
+ * @def: domain definition
+ * @dev: device definition
+ * @info: device information
+ * @opaque: user data
+ *
+ * Fill isolation group information for a single device.
+ *
+ * Used by qemuDomainFillAllIsolationGroups(), don't call directly.
+ *
+ * Return: 0 on success, <0 on failure
+ * */
+static int
+qemuDomainFillDeviceIsolationGroupIter(virDomainDefPtr def ATTRIBUTE_UNUSED,
+ virDomainDeviceDefPtr dev,
+ virDomainDeviceInfoPtr info,
+ void *opaque ATTRIBUTE_UNUSED)
+{
+ virDomainHostdevDefPtr hostdev;
+ virPCIDeviceAddressPtr hostAddr;
+
+ /* Only hostdevs... */
+ if (dev->type != VIR_DOMAIN_DEVICE_HOSTDEV)
+ return 0;
I can see one problem here - a device that is a hostdev network
interface won't be properly recognized.
We can solve the problem for plain <interface type='hostdev'> by just
checking for that in addition to <hostdev> here. The one that we can't
deal with is <interface type='network'> where the network is a pool of VFs.
We had a similar problem when trying to decide whether a device needed
to be placed on a PCI or PCIe bus, but were able to punt on that one
because all SRIOV devices are PCIe. In this case it's a bit more
problematic though.
Maybe we can do this: if a device is <interface type='network'> then we
look to the network to see if it's a "hostdev network". If so, we assign
it "some isolation group that won't be otherwise used" (basically
*anything* that assures the device is placed on a bus by itself). It's
necessary to do this because we can't know until the domain is started
exactly which SRIOV VF from the pool will be used, so we can't know the
actual IOMMU group in advance. And it's reasonable because each SRIOV VF
is by itself in its own IOMMU group, so we won't have the exact same
number as the actual iommu group, but we'll be guaranteeing that it is
different from any other device (which is really all that's important in
this context).
+
+ hostdev = dev->data.hostdev;
+
+ /* ... of the PCI kind need this extra information */
+ if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+ hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+ return 0;
+ }
+
+ hostAddr = &hostdev->source.subsys.u.pci.addr;
+
+ /* The isolation group is simply the IOMMU group assigned by the host */
+ info->isolationGroup = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
+
+ if (info->isolationGroup < 0) {
+ VIR_WARN("Can't look up isolation group for device
%04x:%02x:%02x.%x",
+ hostAddr->domain, hostAddr->bus,
+ hostAddr->slot, hostAddr->function);
+ } else {
+ VIR_DEBUG("Isolation group for device %04x:%02x:%02x.%x is %d",
+ hostAddr->domain, hostAddr->bus,
+ hostAddr->slot, hostAddr->function,
+ info->isolationGroup);
+ }
+
+ return info->isolationGroup;
+}
+
+
+/**
+ * qemuDomainFillAllIsolationGroups:
+ * @def: domain definition
+ *
+ * Fill isolation group information for all devices in @def.
+ *
+ * Used by qemuDomainSetupIsolationGroups(), don't call directly.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+static int
+qemuDomainFillAllIsolationGroups(virDomainDefPtr def)
+{
+ return virDomainDeviceInfoIterate(def,
+ qemuDomainFillDeviceIsolationGroupIter,
+ NULL);
+}
+
+
+/**
+ * qemuDomainSetupIsolationGroups:
+ * @def: domain definition
+ *
+ * High-level function to set up isolation groups for all devices
+ * and controllers in @def. Isolation groups will only be set up if
+ * the guest architecture and machine type require it, so this
+ * function can and should be called unconditionally before attempting
+ * to assign any PCI address.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+static int
+qemuDomainSetupIsolationGroups(virDomainDefPtr def)
+{
+ virDomainControllerDefPtr defaultPHB;
+ int idx;
+ int ret = -1;
+
+ /* Only pSeries guests care about isolation groups at the moment */
+ if (!qemuDomainIsPSeries(def))
+ return 0;
I suppose this would be pointless on, e.g. Q35, since each device is on
its own (isolated?) root-port anyway, right?
+
+ if (qemuDomainFillAllIsolationGroups(def) < 0)
+ goto out;
+
+ idx = virDomainControllerFind(def, VIR_DOMAIN_CONTROLLER_TYPE_PCI, 0);
+ if (idx < 0)
+ goto out;
+
+ defaultPHB = def->controllers[idx];
+
+ /* We want to prevent hostdevs from being plugged into the default
+ * PHB, so we lock its isolation group */
+ defaultPHB->info.isolationGroupLocked = true;
+
+ ret = 0;
+
+ out:
+ return ret;
+}
+
+/**
* qemuDomainFillDevicePCIConnectFlags:
*
* @def: the entire DomainDef
@@ -2076,6 +2191,9 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def,
if (qemuDomainFillAllPCIConnectFlags(def, qemuCaps, driver) < 0)
goto cleanup;
+ if (qemuDomainSetupIsolationGroups(def) < 0)
+ goto cleanup;
+
if (nbuses > 0) {
/* 1st pass to figure out how many PCI bridges we need */
if (!(addrs = qemuDomainPCIAddressSetCreate(def, nbuses, true)))
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 8ca5054..85db7af 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1468,6 +1468,16 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver,
if (qemuAssignDeviceHostdevAlias(vm->def, &info->alias, -1) < 0)
goto error;
+
+ /* When attaching a hostdev to a pSeries guest, we need to set up
+ * its isolation group if we want the assigned address to respect
+ * isolation constraints */
+ if (qemuDomainIsPSeries(vm->def)) {
+ virPCIDeviceAddressPtr hostAddr = &hostdev->source.subsys.u.pci.addr;
+
+ info->isolationGroup = virPCIDeviceAddressGetIOMMUGroupNum(hostAddr);
+ }
+
if (qemuDomainEnsurePCIAddress(vm, &dev, driver) < 0)
goto error;
releaseaddr = true;
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-1.args
b/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-1.args
index 88eb081..36f2def 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-1.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-1.args
@@ -18,6 +18,8 @@ QEMU_AUDIO_DRV=none \
server,nowait \
-mon chardev=charmonitor,id=monitor,mode=readline \
-boot c \
--device vfio-pci,host=0001:01:00.0,id=hostdev0,bus=pci.0,addr=0x1 \
--device vfio-pci,host=0005:90:01.0,id=hostdev1,bus=pci.0,addr=0x2 \
--device vfio-pci,host=0001:01:00.1,id=hostdev2,bus=pci.0,addr=0x3
+-device spapr-pci-host-bridge,index=1,id=pci.1 \
+-device spapr-pci-host-bridge,index=2,id=pci.2 \
+-device vfio-pci,host=0001:01:00.0,id=hostdev0,bus=pci.1.0,addr=0x1 \
+-device vfio-pci,host=0005:90:01.0,id=hostdev1,bus=pci.2.0,addr=0x1 \
+-device vfio-pci,host=0001:01:00.1,id=hostdev2,bus=pci.1.0,addr=0x2
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-2.args
b/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-2.args
index 83d4306..cd5b664 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-2.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-2.args
@@ -19,6 +19,7 @@ server,nowait \
-mon chardev=charmonitor,id=monitor,mode=readline \
-boot c \
-device spapr-pci-host-bridge,index=1,id=pci.1 \
+-device spapr-pci-host-bridge,index=2,id=pci.2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.1.0,addr=0x1 \
-device vfio-pci,host=0001:01:00.0,id=hostdev0,bus=pci.1.0,addr=0x2 \
--device vfio-pci,host=0005:90:01.0,id=hostdev1,bus=pci.0,addr=0x1
+-device vfio-pci,host=0005:90:01.0,id=hostdev1,bus=pci.2.0,addr=0x1
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-3.args
b/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-3.args
index eda6cc7..66a31ba 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-3.args
+++ b/tests/qemuxml2argvdata/qemuxml2argv-pseries-hostdevs-3.args
@@ -21,4 +21,4 @@ server,nowait \
-device spapr-pci-host-bridge,index=1,id=pci.1 \
-device spapr-pci-host-bridge,index=2,id=pci.2 \
-device vfio-pci,host=0001:01:00.0,id=hostdev0,bus=pci.2.0,addr=0x1 \
--device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.0,addr=0x1
+-device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.2.0,addr=0x2
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-1.xml
b/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-1.xml
index 34defea..494c414 100644
--- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-1.xml
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-1.xml
@@ -19,26 +19,34 @@
<model name='spapr-pci-host-bridge'/>
<target index='0'/>
</controller>
+ <controller type='pci' index='1' model='pci-root'>
+ <model name='spapr-pci-host-bridge'/>
+ <target index='1'/>
+ </controller>
+ <controller type='pci' index='2' model='pci-root'>
+ <model name='spapr-pci-host-bridge'/>
+ <target index='2'/>
+ </controller>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x01' slot='0x00'
function='0x0'/>
</source>
- <address type='pci' domain='0x0000' bus='0x00'
slot='0x01' function='0x0'/>
+ <address type='pci' domain='0x0000' bus='0x01'
slot='0x01' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0005' bus='0x90' slot='0x01'
function='0x0'/>
</source>
- <address type='pci' domain='0x0000' bus='0x00'
slot='0x02' function='0x0'/>
+ <address type='pci' domain='0x0000' bus='0x02'
slot='0x01' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x01' slot='0x00'
function='0x1'/>
</source>
- <address type='pci' domain='0x0000' bus='0x00'
slot='0x03' function='0x0'/>
+ <address type='pci' domain='0x0000' bus='0x01'
slot='0x02' function='0x0'/>
</hostdev>
<memballoon model='none'/>
<panic model='pseries'/>
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-2.xml
b/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-2.xml
index 17ff4c8..cfa395b 100644
--- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-2.xml
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-2.xml
@@ -26,6 +26,10 @@
<model name='spapr-pci-host-bridge'/>
<target index='1'/>
</controller>
+ <controller type='pci' index='2' model='pci-root'>
+ <model name='spapr-pci-host-bridge'/>
+ <target index='2'/>
+ </controller>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
@@ -38,7 +42,7 @@
<source>
<address domain='0x0005' bus='0x90' slot='0x01'
function='0x0'/>
</source>
- <address type='pci' domain='0x0000' bus='0x00'
slot='0x01' function='0x0'/>
+ <address type='pci' domain='0x0000' bus='0x02'
slot='0x01' function='0x0'/>
</hostdev>
<memballoon model='none'/>
<panic model='pseries'/>
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-3.xml
b/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-3.xml
index 58023ec..f91959b 100644
--- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-3.xml
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-pseries-hostdevs-3.xml
@@ -39,7 +39,7 @@
<source>
<address domain='0x0001' bus='0x01' slot='0x00'
function='0x1'/>
</source>
- <address type='pci' domain='0x0000' bus='0x00'
slot='0x01' function='0x0'/>
+ <address type='pci' domain='0x0000' bus='0x02'
slot='0x02' function='0x0'/>
</hostdev>
<memballoon model='none'/>
<panic model='pseries'/>