The "group" attribute of the <driver> element provides a method to
easily detach all devices in an IOMMU group from the host and bind
them to the vfio-pci driver so that one or more of those devices can
be assigned to a guest. Because this operation makes all the devices
in the group unusable by the host, it is not the default mode of
operation, but must be consciously selected in the config by the user.
This patch contains only the parser/formatter, RNG schema, docs, and XML
tests. It does not yet wire the new option into the driver; that will
come in the next patch.
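For reference, a minimal sketch of the new attribute on a domain hostdev
device, mirroring the updated qemuxml2argv-hostdev-vfio.xml test data
(the PCI address is simply the one used in that test file, not a
recommendation):

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <!-- group='auto' asks libvirt to detach every device in the
       IOMMU group from the host, not just this one -->
  <driver name='vfio' group='auto'/>
  <source>
    <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
  </source>
</hostdev>
```

Omitting "group", or setting group='manual', keeps the existing
behavior of binding only the assigned device to vfio-pci.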
---
docs/formatdomain.html.in | 94 +++++++++++++++++++---
docs/formatnetwork.html.in | 11 +++
docs/schemas/domaincommon.rng | 16 ++++
docs/schemas/network.rng | 8 ++
src/conf/domain_conf.c | 36 ++++++++-
src/conf/domain_conf.h | 13 +++
src/conf/network_conf.c | 39 ++++++++-
src/conf/network_conf.h | 14 ++++
tests/networkxml2xmlin/hostdev-pf.xml | 2 +-
tests/networkxml2xmlout/hostdev-pf.xml | 2 +-
.../qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml | 2 +-
.../qemuxml2argv-net-hostdev-vfio.xml | 2 +-
12 files changed, 220 insertions(+), 19 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 77126a5..f8d662a 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -2515,18 +2515,60 @@
more details on the address element.</dd>
<dt><code>driver</code></dt>
<dd>
- PCI devices can have an optional <code>driver</code>
- subelement that specifies which backend driver to use for PCI
- device assignment. Use the <code>name</code> attribute to
- select either "vfio" (for the new VFIO device assignment
- backend, which is compatible with UEFI SecureBoot) or "kvm"
- (for the legacy device assignment handled directly by the KVM
- kernel module)<span class="since">Since 1.0.5 (QEMU and KVM
- only, requires kernel 3.6 or newer)</span>. Currently, "kvm"
- is the default used by libvirt when not explicitly provided,
- but since the two are functionally equivalent, this default
- could be changed in the future with no impact to domains that
- don't specify anything.
+ <p>
+ PCI devices can have an optional <code>driver</code>
+ subelement that specifies which backend driver to use for PCI
+ device assignment. Use the <code>name</code> attribute to
+ select either "vfio" (for the new VFIO device assignment
+ backend, which is compatible with UEFI SecureBoot) or "kvm"
+ (for the legacy device assignment handled directly by the KVM
+ kernel module)<span class="since">Since 1.0.5 (QEMU and KVM
+ only, requires kernel 3.6 or newer)</span>. Currently, "kvm"
+ is the default used by libvirt when not explicitly provided,
+ but since the two are functionally equivalent, this default
+ could be changed in the future with no impact to domains that
+ don't specify anything.
+ </p>
+ <p>
+ When <code>name='vfio'</code> is specified, the
+ attribute <code>group</code> can also be optionally given in
+ the <code>&lt;driver&gt;</code>
+ element<span class="since">Since 1.0.7 (QEMU and KVM only,
+ requires kernel 3.6 or newer)</span>. <code>group</code>
+ instructs libvirt how to deal with multiple devices in a
+ single "IOMMU group" when it is doing managed device
+ assignment. Normally libvirt will only bind the single
+ device being assigned to the vfio-pci driver. VFIO uses the
+ concept of device "groups" to describe sets of devices that
+ cannot be safely used by multiple guests and/or the host
+ without compromising the security of one or the other (due
+ to shared DMA resources, etc); VFIO will not allow such
+ mixing to happen, and one way that it enforces this is to
+ require that, before any device in a group can be assigned
+ to a guest, <b>all</b> devices in the group must be detached
+ from their respective host drivers and be attached instead
+ to the vfio-pci driver (or some other driver that VFIO deems
+ to adequately sequester the device - currently the pci-stub
+ and pcieport drivers are acceptable alternatives). To make
+ this happen automatically (keeping in mind that the host
+ will lose all use of the devices in the group), simply
+ add <code>group='auto'</code> to
+ the <code>&lt;driver&gt;</code> element; libvirt will then
+ automatically detach from the host all devices in the same
+ group as the device being assigned, then bind them all to
+ the vfio-pci driver; any or all of those devices can then be
+ assigned to the same guest, but they cannot be used by the
+ host nor can they be assigned to any other guest.<br/><br/>
+
+ <b>Extreme care</b> should be taken when
+ using <code>group='auto'</code>, to ensure you are not
+ detaching the driver for a device that is currently in use
+ by the host (for example, the driver to the system disk or
+ something similar). The output of "virsh nodedev-dumpxml"
+ for the device you will be assigning will provide a list of
+ all devices in the same IOMMU group.
+ </p>
+
</dd>
<dt><code>readonly</code></dt>
<dd>Indicates that the device is readonly, only supported by SCSI host
@@ -3156,6 +3198,15 @@
</p>
<p>
+ When <code>name='vfio'</code> is specified, the
+ attribute <code>group</code> can also be optionally given in
+ the <code>&lt;driver&gt;</code> element. See the description of
+ this attribute in the <a href="#elementsHostDev">Host Device
+ Assignment</a> section above. <span class="since">Since 1.0.7
+ (QEMU and KVM only, requires kernel 3.6 or newer)</span>.
+ </p>
+
+ <p>
Note that this "intelligent passthrough" of network devices is
very similar to the functionality of a standard &lt;hostdev&gt;
device, the difference being that this method allows specifying
@@ -3173,7 +3224,7 @@
...
<devices>
<interface type='hostdev'>
- <driver name='vfio'/>
+ <driver name='vfio' group='auto'/>
<source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</source>
@@ -3318,6 +3369,23 @@ qemu-kvm -net nic,model=? /dev/null
kernel 3.6 or newer)</span>
</dd>
+ <dt><code>group</code></dt>
+ <dd>
+ <b>Only for interfaces of type='hostdev'</b>, and only when
+ the interface element has <code>managed='yes'</code> set, if
+ the driver name has been set to "vfio", the <code>group</code>
+ attribute can be set to "auto" in order to automatically
+ detach from the host all devices in the same IOMMU group as
+ this device, and bind them to the vfio-pci driver. See the
+ description of this attribute in
+ the <a href="#elementsHostDev">Host Device Assignment</a>
+ section above. <span class="since">Since 1.0.7 (QEMU and KVM
+ only, requires kernel 3.6 or newer)</span>.<br/><br/>
+
+ <b>In general you should leave this option alone, unless you
+ are very certain you know what you are doing.</b>
+ </dd>
+
<dt><code>txmode</code></dt>
<dd>
The <code>txmode</code> attribute specifies how to handle
diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in
index a1198ce..a7eff7e 100644
--- a/docs/formatnetwork.html.in
+++ b/docs/formatnetwork.html.in
@@ -295,6 +295,17 @@
<span class="since">Since 1.0.5 (QEMU and KVM only,
requires kernel 3.6 or newer)</span>
</p>
+ <p>
+ When <code>name='vfio'</code> is specified, the
+ attribute <code>group</code> can also be optionally
+ given in the <code>&lt;driver&gt;</code> element. See
+ the description of this attribute in the domain
+ <a href="formatdomain.html#elementsHostDev">Host
+ Device Assignment</a> documentation. <span class="since">Since 1.0.7
+ (QEMU and KVM only, requires kernel 3.6 or
+ newer)</span>.
+ </p>
+
<p>Note that this "intelligent passthrough" of network
devices is very similar to the functionality of a
standard <code>&lt;hostdev&gt;</code> device, the
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index cf82878..0d0e37e 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2000,6 +2000,14 @@
<value>vfio</value>
</choice>
</attribute>
+ <optional>
+ <attribute name="group">
+ <choice>
+ <value>manual</value>
+ <value>auto</value>
+ </choice>
+ </attribute>
+ </optional>
</group>
<group>
<optional>
@@ -3213,6 +3221,14 @@
<value>vfio</value>
</choice>
</attribute>
+ <optional>
+ <attribute name="group">
+ <choice>
+ <value>manual</value>
+ <value>auto</value>
+ </choice>
+ </attribute>
+ </optional>
<empty/>
</element>
</optional>
diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng
index ded8580..9b786ae 100644
--- a/docs/schemas/network.rng
+++ b/docs/schemas/network.rng
@@ -157,6 +157,14 @@
<value>vfio</value>
</choice>
</attribute>
+ <optional>
+ <attribute name="group">
+ <choice>
+ <value>manual</value>
+ <value>auto</value>
+ </choice>
+ </attribute>
+ </optional>
<empty/>
</element>
</optional>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index e41dfa2..997d68d 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -606,6 +606,12 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysPciBackend,
"kvm",
"vfio")
+VIR_ENUM_IMPL(virDomainHostdevSubsysPciGroupMgmt,
+ VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_LAST,
+ "default",
+ "manual",
+ "auto")
+
VIR_ENUM_IMPL(virDomainHostdevCaps, VIR_DOMAIN_HOSTDEV_CAPS_TYPE_LAST,
"storage",
"misc",
@@ -3932,6 +3938,8 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
char *sgio = NULL;
char *backendStr = NULL;
int backend;
+ char *groupMgmtStr = NULL;
+ int groupMgmt;
int ret = -1;
/* @managed can be read from the xml document - it is always an
@@ -4014,6 +4022,18 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
}
def->source.subsys.u.pci.backend = backend;
+ groupMgmt = VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_DEFAULT;
+ if ((groupMgmtStr = virXPathString("string(./driver/@group)", ctxt)) &&
+ (((groupMgmt = virDomainHostdevSubsysPciGroupMgmtTypeFromString(groupMgmtStr)) < 0) ||
+ groupMgmt == VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_DEFAULT)) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Unknown PCI device group management method "
+ "<driver group='%s'/> has been specified"),
+ groupMgmtStr);
+ goto error;
+ }
+ def->source.subsys.u.pci.groupMgmt = groupMgmt;
+
break;
case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB:
@@ -4038,6 +4058,7 @@ error:
VIR_FREE(managed);
VIR_FREE(sgio);
VIR_FREE(backendStr);
+ VIR_FREE(groupMgmtStr);
return ret;
}
@@ -14361,6 +14382,8 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
def->source.subsys.u.pci.backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT) {
const char *backend =
virDomainHostdevSubsysPciBackendTypeToString(def->source.subsys.u.pci.backend);
+ const char *groupMgmt
+ = virDomainHostdevSubsysPciGroupMgmtTypeToString(def->source.subsys.u.pci.groupMgmt);
if (!backend) {
virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -14368,7 +14391,18 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
def->source.subsys.u.pci.backend);
return -1;
}
- virBufferAsprintf(buf, "<driver name='%s'/>\n", backend);
+ virBufferAsprintf(buf, "<driver name='%s'", backend);
+ if (def->source.subsys.u.pci.groupMgmt != VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_DEFAULT) {
+ if (!groupMgmt) {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("unexpected pci hostdev driver group "
+ "management method %d"),
+ def->source.subsys.u.pci.groupMgmt);
+ return -1;
+ }
+ virBufferAsprintf(buf, " group='%s'", groupMgmt);
+ }
+ virBufferAddLit(buf, "/>\n");
}
virBufferAddLit(buf, "<source");
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 3817e37..04bddcc 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -400,6 +400,18 @@ typedef enum {
VIR_ENUM_DECL(virDomainHostdevSubsysPciBackend)
+/* how to manage multiple devices in the same iommu group */
+typedef enum {
+ VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_DEFAULT, /* default is "manual" */
+ /* don't automatically detach all devices in group */
+ VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_MANUAL,
+ /* *do* automatically detach/attach all devices in group */
+ VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_AUTO,
+ VIR_DOMAIN_HOSTDEV_PCI_GROUP_MGMT_LAST
+} virDomainHostdevSubsysPciGroupMgmtType;
+
+VIR_ENUM_DECL(virDomainHostdevSubsysPciGroupMgmt)
+
typedef struct _virDomainHostdevSubsys virDomainHostdevSubsys;
typedef virDomainHostdevSubsys *virDomainHostdevSubsysPtr;
struct _virDomainHostdevSubsys {
@@ -417,6 +429,7 @@ struct _virDomainHostdevSubsys {
struct {
virDevicePCIAddress addr; /* host address */
int backend; /* enum virDomainHostdevSubsysPciBackendType */
+ int groupMgmt; /* enum virDomainHostdevSubsysPciGroupMgmtType */
} pci;
struct {
char *adapter;
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c
index 2b4845c..16d2386 100644
--- a/src/conf/network_conf.c
+++ b/src/conf/network_conf.c
@@ -66,6 +66,12 @@ VIR_ENUM_IMPL(virNetworkForwardDriverName,
"kvm",
"vfio")
+VIR_ENUM_IMPL(virNetworkForwardDriverGroupMgmt,
+ VIR_NETWORK_FORWARD_DRIVER_GROUP_MGMT_LAST,
+ "default",
+ "manual",
+ "auto")
+
virNetworkObjPtr virNetworkFindByUUID(const virNetworkObjListPtr nets,
const unsigned char *uuid)
{
@@ -1688,6 +1694,7 @@ virNetworkForwardDefParseXML(const char *networkName,
char *forwardDev = NULL;
char *forwardManaged = NULL;
char *forwardDriverName = NULL;
+ char *forwardDriverGroupMgmt = NULL;
char *type = NULL;
xmlNodePtr save = ctxt->node;
@@ -1725,6 +1732,21 @@ virNetworkForwardDefParseXML(const char *networkName,
def->driverName = driverName;
}
+ forwardDriverGroupMgmt = virXPathString("string(./driver/@group)", ctxt);
+ if (forwardDriverGroupMgmt) {
+ int driverGroupMgmt
+ = virNetworkForwardDriverGroupMgmtTypeFromString(forwardDriverGroupMgmt);
+
+ if (driverGroupMgmt <= 0) {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("Unknown forward <driver group='%s'/> "
+ "in network %s"),
+ forwardDriverGroupMgmt, networkName);
+ goto cleanup;
+ }
+ def->driverGroupMgmt = driverGroupMgmt;
+ }
+
/* bridge and hostdev modes can use a pool of physical interfaces */
nForwardIfs = virXPathNodeSet("./interface", ctxt, &forwardIfNodes);
if (nForwardIfs < 0) {
@@ -1907,6 +1929,7 @@ cleanup:
VIR_FREE(forwardDev);
VIR_FREE(forwardManaged);
VIR_FREE(forwardDriverName);
+ VIR_FREE(forwardDriverGroupMgmt);
VIR_FREE(forwardPfNodes);
VIR_FREE(forwardIfNodes);
VIR_FREE(forwardAddrNodes);
@@ -2619,7 +2642,21 @@ virNetworkDefFormatInternal(virBufferPtr buf,
def->forward.driverName);
goto error;
}
- virBufferAsprintf(buf, "<driver name='%s'/>\n", driverName);
+ virBufferAsprintf(buf, "<driver name='%s'", driverName);
+ if (def->forward.driverGroupMgmt
+ != VIR_NETWORK_FORWARD_DRIVER_GROUP_MGMT_DEFAULT) {
+ const char *driverGroupMgmt
+ = virNetworkForwardDriverGroupMgmtTypeToString(def->forward.driverGroupMgmt);
+
+ if (!driverGroupMgmt) {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("unexpected hostdev driver group management type %d"),
+ def->forward.driverGroupMgmt);
+ goto error;
+ }
+ virBufferAsprintf(buf, " group='%s'", driverGroupMgmt);
+ }
+ virBufferAddLit(buf, "/>\n");
}
if (def->forward.type == VIR_NETWORK_FORWARD_NAT) {
if (virNetworkForwardNatDefFormat(buf, &def->forward) < 0)
diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h
index 43f80d4..3a879f1 100644
--- a/src/conf/network_conf.h
+++ b/src/conf/network_conf.h
@@ -76,6 +76,19 @@ typedef enum {
VIR_ENUM_DECL(virNetworkForwardDriverName)
+/* how to manage multiple devices in the same iommu group */
+typedef enum {
+ VIR_NETWORK_FORWARD_DRIVER_GROUP_MGMT_DEFAULT, /* default is "manual" */
+ /* don't automatically detach all devices in group */
+ VIR_NETWORK_FORWARD_DRIVER_GROUP_MGMT_MANUAL,
+ /* *do* automatically detach/attach all devices in group */
+ VIR_NETWORK_FORWARD_DRIVER_GROUP_MGMT_AUTO,
+
+ VIR_NETWORK_FORWARD_DRIVER_GROUP_MGMT_LAST
+} virNetworkForwardDriverGroupMgmtType;
+
+VIR_ENUM_DECL(virNetworkForwardDriverGroupMgmt)
+
typedef struct _virNetworkDHCPHostDef virNetworkDHCPHostDef;
typedef virNetworkDHCPHostDef *virNetworkDHCPHostDefPtr;
struct _virNetworkDHCPHostDef {
@@ -193,6 +206,7 @@ struct _virNetworkForwardDef {
int type; /* One of virNetworkForwardType constants */
bool managed; /* managed attribute for hostdev mode */
int driverName; /* enum virNetworkForwardDriverNameType */
+ int driverGroupMgmt; /* enum virNetworkForwardDriverGroupMgmtType */
/* If there are multiple forward devices (i.e. a pool of
* interfaces), they will be listed here.
diff --git a/tests/networkxml2xmlin/hostdev-pf.xml b/tests/networkxml2xmlin/hostdev-pf.xml
index 5b8f598..fd38556 100644
--- a/tests/networkxml2xmlin/hostdev-pf.xml
+++ b/tests/networkxml2xmlin/hostdev-pf.xml
@@ -2,7 +2,7 @@
<name>hostdev</name>
<uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
<forward mode='hostdev' managed='yes'>
- <driver name='vfio'/>
+ <driver name='vfio' group='auto'/>
<pf dev='eth2'/>
</forward>
</network>
diff --git a/tests/networkxml2xmlout/hostdev-pf.xml b/tests/networkxml2xmlout/hostdev-pf.xml
index 5b8f598..fd38556 100644
--- a/tests/networkxml2xmlout/hostdev-pf.xml
+++ b/tests/networkxml2xmlout/hostdev-pf.xml
@@ -2,7 +2,7 @@
<name>hostdev</name>
<uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
<forward mode='hostdev' managed='yes'>
- <driver name='vfio'/>
+ <driver name='vfio' group='auto'/>
<pf dev='eth2'/>
</forward>
</network>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml
index 8daa53a..453ab2c 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml
@@ -23,7 +23,7 @@
<controller type='ide' index='0'/>
<controller type='pci' index='0' model='pci-root'/>
<hostdev mode='subsystem' type='pci' managed='yes'>
- <driver name='vfio'/>
+ <driver name='vfio' group='auto'/>
<source>
<address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
</source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml
index 90419a4..3b8ff14 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml
@@ -24,7 +24,7 @@
<controller type='pci' index='0' model='pci-root'/>
<interface type='hostdev' managed='yes'>
<mac address='00:11:22:33:44:55'/>
- <driver name='vfio'/>
+ <driver name='vfio' group='auto'/>
<source>
<address type='pci' domain='0x0002' bus='0x03' slot='0x07' function='0x1'/>
</source>
--
1.7.11.7