On 09/06/2016 08:58 AM, Eric Farman wrote:
We already have a "scsi" hostdev type, which refers to a
single LUN
that is passed through to a guest. But what of things where multiple
LUNs are passed through via a single SCSI HBA, such as with the
vhost-scsi target? Create a new hostdev type that will carry
this, and its associated documentation and XML schema information.
I assume from the review of v1 this is both HBA and vHBA? That is a
vHBA would be useful for the NPIV support to pass through the entire
vHBA rather than individual LUN's (something recently asked for at KVM
Forum). The vHBA is created when one has a vport capable scsi_host (see
http://wiki.libvirt.org/page/NPIV_in_libvirt). The "result" is a
'scsi_hostM' using a vport capable parent scsi_hostN. Now if a vHBA
cannot (or should not) be passed through, then we'll need to be sure to
test and avoid it. Unfortunately we had a lab issue and the system I
normally use for my vHBA/NPIV work was affected - trying to get it
resurected remotely...
Signed-off-by: Eric Farman <farman(a)linux.vnet.ibm.com>
---
docs/formatdomain.html.in | 24 ++++++++++++++++
docs/schemas/domaincommon.rng | 23 +++++++++++++++
src/conf/domain_audit.c | 2 ++
src/conf/domain_conf.c | 56 +++++++++++++++++++++++++++++++++++--
src/conf/domain_conf.h | 17 +++++++++++
src/qemu/qemu_domain_address.c | 10 +++++++
src/qemu/qemu_hotplug.c | 1 +
src/security/security_dac.c | 2 ++
tests/domaincapsschemadata/full.xml | 1 +
9 files changed, 134 insertions(+), 2 deletions(-)
I took a peek at the v1, but suffice to say didn't follow it that
closely since there's a number of changes in v2 related to patch
ordering. You could get 3 different opinions on that matter too!
Typically I'll try to add the qemu code first (capabilities first, then
command in one patch and hotplug in the next one). Once that's in place,
then I add the domain bits, docs, etc.
Since you're extending the domain XML, you will need to add an xml2xml
test in this patch. Use the XML from your patch 5 as a basis/guide.
You'll need a "tests/qemuxml2argvdata/" file and a
"tests/qemuxml2xmloutdata/" file as well as a changed to
qemuxml2xmltest.c. If the data file is the same as the outdata file,
then you can create a soft link to the data file in the outdata file
directory.
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index feaeaa2..b79da95 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -3655,6 +3655,17 @@
</devices>
...</pre>
+ <p>or:</p>
+
+<pre>
+ ...
+ <devices>
+ <hostdev mode='subsystem' type='scsi_host'>
+ <source protocol='vhost'
wwpn='naa.50014057667280d8'/>
+ </hostdev>
+ </devices>
+ ...</pre>
+
A wwpn doesn't generally have the "naa." prefix on it. So, why not just
add that to the command line instead? That is a wwpn is a 16 hex digit
value. I'm assuming that in order to support this feature a "naa." is
prefixed and sent to qemu. Does that necessarily mean some other
hypervisor would choose to place the "naa." prefix or would then code
need to be added to strip it off for those other hypervisors? Looking
at the qemu code - it seems fairly specific.
<dl>
<dt><code>hostdev</code></dt>
<dd>The <code>hostdev</code> element is the main container for
describing
@@ -3693,6 +3704,12 @@
If a disk lun in the domain already has the rawio capability,
then this setting not required.
</dd>
+ <dt><code>scsi_host</code></dt>
+ <dd><span class="since">since 2.2.0</span>For SCSI
devices, user
+ is responsible to make sure the device is not used by host. This
+ <code>type</code> passes all LUNs presented by a single HBA to
+ the guest.
+ </dd>
It'll be at least 2.3.0
I see from the qemu code there's support for attributes 'num_queues',
'max_sectors', and 'cmd_per_lun' - since this is passing the scsi_host
through and not a 'scsi' LUN with a 'scsi' controller, should those
attributes be added as well? Not "required" for this version, but I
would hope we could add caps and attributes for that as well because I
can almost guarantee someone will ask.
</dl>
<p>
Note: The <code>managed</code> attribute is only used with PCI
devices
@@ -3756,6 +3773,13 @@
credentials to the iSCSI server.
</p>
</dd>
+ <dt><code>scsi_host</code></dt>
+ <dd><span class="since">Since 2.2.0</span>,
multiple LUNs behind a
2.3
+ single SCSI HBI are described by a
<code>protocol</code>
s/HBI/HBA
+ attribute set to "vhost" and a
<code>wwpn</code> attribute that
+ is the vhost_scsi wwpn (16 hexadecimal digits with a prefix of
+ "naa.") established in the host configfs.
+ </dd>
I don't think we mention the "naa." prefix...
It would be nice to tell a user how to find that wwpn value via some
command (virsh or otherwise) in order to fill in the wwpn attribute.
Although there's always varying opinions on this since many times it's
distro dependent. That's why I suggest virsh - at least if it's
supportable we can display the wwpn there using nodedev-dumpxml.
What/where is the 'vhost_scsi_wwpn'? IOW: It's not something defined
yet - I'm assuming there's some output somewhere where that's pulled from.
One final question - what is this exposed as on the guest? "Some"
scsi_hostM? For a SCSI LUN it's some 'sgN' value. I see the qemu side
seems to infer some amount of scsi_generic code (read_naa_id), so it's
mostly curiosity/learning for me.
</dl>
</dd>
<dt><code>vendor</code>,
<code>product</code></dt>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index af32df8..7fd6bd2 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -3938,6 +3938,7 @@
<ref name="hostdevsubsyspci"/>
<ref name="hostdevsubsysusb"/>
<ref name="hostdevsubsysscsi"/>
+ <ref name="hostdevsubsyshost"/>
</choice>
</define>
@@ -4066,6 +4067,28 @@
</element>
</define>
+ <define name="hostdevsubsyshost">
+ <attribute name="type">
+ <value>scsi_host</value>
+ </attribute>
+ <element name="source">
+ <choice>
+ <group>
+ <attribute name="protocol">
+ <choice>
+ <value>vhost</value> <!-- vhost, required -->
+ </choice>
+ </attribute>
+ <attribute name="wwpn">
+ <data type="string">
+ <param
name="pattern">(naa\.)[0-9a-fA-F]{16}</param>
+ </data>
If you go with the we'll add "naa." later, then you could just:
<ref name='wwn'/>
instead of <data ...> ... </data>
+ </attribute>
+ </group>
+ </choice>
+ </element>
+ </define>
+
<define name="hostdevcapsstorage">
<attribute name="type">
<value>storage</value>
diff --git a/src/conf/domain_audit.c b/src/conf/domain_audit.c
index 53a58ac..406dd8f 100644
--- a/src/conf/domain_audit.c
+++ b/src/conf/domain_audit.c
@@ -443,6 +443,8 @@ virDomainAuditHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr
hostdev,
}
break;
}
+ case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST:
Can't we produce something here? Not sure seeing "?" in the output of an
audit message would be useful. Perhaps the path to the device similar to
how we end up in virDomainAuditGenericDev...
+ break;
default:
VIR_WARN("Unexpected hostdev type while encoding audit message:
%d",
hostdev->source.subsys.type);
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index c8c4f61..45b643b 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -646,7 +646,8 @@ VIR_ENUM_IMPL(virDomainHostdevMode, VIR_DOMAIN_HOSTDEV_MODE_LAST,
VIR_ENUM_IMPL(virDomainHostdevSubsys, VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST,
"usb",
"pci",
- "scsi")
+ "scsi",
+ "scsi_host")
VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend,
VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST,
@@ -660,6 +661,10 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol,
"adapter",
"iscsi")
+VIR_ENUM_IMPL(virDomainHostdevSubsysHostProtocol,
+ VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST,
+ "vhost")
+
Notice how the virDomainHostdevSubsysSCSIProtocol adds 'adapter' first -
this should do something similar (although the preference is to use
"none" - I cannot recall why that was different).
Remember that all structs are "calloc"'d and thus making sure this
setting to something other than 0 will ensure we haven't missed something.
VIR_ENUM_IMPL(virDomainHostdevCaps,
VIR_DOMAIN_HOSTDEV_CAPS_TYPE_LAST,
"storage",
"misc",
@@ -2270,6 +2275,9 @@ void virDomainHostdevDefClear(virDomainHostdevDefPtr def)
} else {
VIR_FREE(scsisrc->u.host.adapter);
}
+ } else if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST) {
+ virDomainHostdevSubsysHostPtr hostsrc = &def->source.subsys.u.host;
+ VIR_FREE(hostsrc->wwpn);
}
break;
}
@@ -5977,6 +5985,31 @@ virDomainHostdevSubsysSCSIDefParseXML(xmlNodePtr sourcenode,
return ret;
}
+static int
+virDomainHostdevSubsysHostDefParseXML(xmlNodePtr sourcenode,
+ virDomainHostdevDefPtr def)
+{
+ virDomainHostdevSubsysHostPtr hostsrc = &def->source.subsys.u.host;
+
This code would do the "protocol" parsing. I believe the way it's
written now protocol goes unchecked. You need to add code to use the
virDomainHostdevSubsysHostProtocolTypeFromString in order to validate
what was supplied in the XML is correct.
See virDomainHostdevSubsysSCSIDefParseXML - although make your
->protocol comparison <= 0 since we don't want a "0" value.
(Yes, there's a bug in the SCSIDefParse I think - although it's caught
later on).
+ if (!(hostsrc->wwpn = virXMLPropString(sourcenode,
"wwpn"))) {
+ virReportError(VIR_ERR_XML_ERROR, "%s",
+ _("missing vhost-scsi hostdev source path name"));
+ goto cleanup;
+ }
If we take the don't require a "naa." prefix, then virValidateWWN should
be used to validate things. Even if naa. was kept, the same validation
should be done for consistency.
+
+ if (!STRPREFIX(hostsrc->wwpn, "naa.") ||
+ strlen(hostsrc->wwpn) != strlen("naa.") + 16) {
+ virReportError(VIR_ERR_XML_ERROR, "%s", _("malformed
'wwpn' value"));
+ goto cleanup;
+ }
+
+ return 0;
+
+ cleanup:
+ VIR_FREE(hostsrc->wwpn);
+ return -1;
+}
+
static int
virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
@@ -6101,6 +6134,11 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
goto error;
break;
+ case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST:
+ if (virDomainHostdevSubsysHostDefParseXML(sourcenode, def) < 0)
+ goto error;
+ break;
+
default:
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
_("address type='%s' not supported in hostdev
interfaces"),
@@ -20521,9 +20559,11 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
unsigned int flags,
bool includeTypeInAddr)
{
+ bool closedSource = false;
virDomainHostdevSubsysUSBPtr usbsrc = &def->source.subsys.u.usb;
virDomainHostdevSubsysPCIPtr pcisrc = &def->source.subsys.u.pci;
virDomainHostdevSubsysSCSIPtr scsisrc = &def->source.subsys.u.scsi;
+ virDomainHostdevSubsysHostPtr hostsrc = &def->source.subsys.u.host;
virDomainHostdevSubsysSCSIHostPtr scsihostsrc = &scsisrc->u.host;
virDomainHostdevSubsysSCSIiSCSIPtr iscsisrc = &scsisrc->u.iscsi;
@@ -20564,6 +20604,15 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
protocol, iscsisrc->path);
}
+ if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST) {
FWIW: If some day in the future type='scsi_host' added some other
"protocol" (something other than vhost), then at that time this code
would be extended to handle that.
+ const char *protocol =
+ virDomainHostdevSubsysHostProtocolTypeToString(hostsrc->protocol);
+ closedSource = true;
+
+ virBufferAsprintf(buf, " protocol='%s' wwpn='%s'/",
+ protocol, hostsrc->wwpn);
+ }
+
virBufferAddLit(buf, ">\n");
virBufferAdjustIndent(buf, 2);
@@ -20617,6 +20666,8 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
scsihostsrc->unit);
}
break;
+ case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST:
+ break;
default:
virReportError(VIR_ERR_INTERNAL_ERROR,
_("unexpected hostdev type %d"),
@@ -20632,7 +20683,8 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf,
}
virBufferAdjustIndent(buf, -2);
- virBufferAddLit(buf, "</source>\n");
+ if (!closedSource)
+ virBufferAddLit(buf, "</source>\n");
return 0;
}
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 0fe4154..781ea7b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -294,6 +294,7 @@ typedef enum {
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB,
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI,
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI,
+ VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST,
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST
} virDomainHostdevSubsysType;
@@ -368,6 +369,21 @@ struct _virDomainHostdevSubsysSCSI {
} u;
};
+typedef enum {
VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_NONE,
Add the NONE here.
+ VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_VHOST,
+
+ VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST,
+} virDomainHostdevHostProtocolType;
+
+VIR_ENUM_DECL(virDomainHostdevSubsysHostProtocol)
+
+typedef struct _virDomainHostdevSubsysHost virDomainHostdevSubsysHost;
+typedef virDomainHostdevSubsysHost *virDomainHostdevSubsysHostPtr;
+struct _virDomainHostdevSubsysHost {
+ int protocol;
+ char *wwpn;
+};
+
typedef struct _virDomainHostdevSubsys virDomainHostdevSubsys;
typedef virDomainHostdevSubsys *virDomainHostdevSubsysPtr;
struct _virDomainHostdevSubsys {
@@ -376,6 +392,7 @@ struct _virDomainHostdevSubsys {
virDomainHostdevSubsysUSB usb;
virDomainHostdevSubsysPCI pci;
virDomainHostdevSubsysSCSI scsi;
+ virDomainHostdevSubsysHost host;
} u;
};
diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
index bb16738..8419e6a 100644
--- a/src/qemu/qemu_domain_address.c
+++ b/src/qemu/qemu_domain_address.c
@@ -301,6 +301,16 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDefPtr def,
def->controllers[i]->info.type = type;
}
+ for (i = 0; i < def->nhostdevs; i++) {
+ if (def->hostdevs[i]->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
+ def->hostdevs[i]->source.subsys.type ==
+ VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST &&
+ def->hostdevs[i]->source.subsys.u.host.protocol ==
+ VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_VHOST &&
+ def->hostdevs[i]->info->type ==
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE)
+ def->hostdevs[i]->info->type = type;
+ }
+
if (def->memballoon &&
def->memballoon->model == VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO &&
def->memballoon->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index d13474a..e9140a9 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -3238,6 +3238,7 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver,
qemuDomainRemoveUSBHostDevice(driver, vm, hostdev);
break;
case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI:
+ case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST:
hmmm... If you're going to change this here, then
qemuDomainAttachHostDevice would be changed at the same time. Hence why
this ordering stuff to patches is important.
I cannot recall if removing would cause a build error in this case.
qemuDomainRemoveSCSIHostDevice(driver, vm, hostdev);
break;
case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST:
diff --git a/src/security/security_dac.c b/src/security/security_dac.c
index 442ce70..de2b921 100644
--- a/src/security/security_dac.c
+++ b/src/security/security_dac.c
@@ -676,6 +676,7 @@ virSecurityDACSetHostdevLabel(virSecurityManagerPtr mgr,
break;
}
+ case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST:
case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST:
ret = 0;
break;
@@ -805,6 +806,7 @@ virSecurityDACRestoreHostdevLabel(virSecurityManagerPtr mgr,
break;
}
+ case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_HOST:
case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST:
ret = 0;
break;
Why is only security_dac.c changed? Usually _selinux and _apparmor.c
are also changed, which I do see happening in your v1. Although I
wouldn't classify a "vhost" and an "iSCSI" host as the same thing.
The
latter is a networked resource; whereas, IIUC this new device is
essentially a file descriptor that's being passed along to qemu. So
there must be other examples of that in the security_*.c modules.
This particular area I usually don't get right either, but what I'll do
is follow the history of the changes and see if something is "similar"
to what I'm creating and then just copy that model.
Makes me wonder how this plays in a non-privileged environment (e.g.
session mode). Is it even allowed to open/read vhost-scsi.
diff --git a/tests/domaincapsschemadata/full.xml
b/tests/domaincapsschemadata/full.xml
index 2f529ff..41fb1fa 100644
--- a/tests/domaincapsschemadata/full.xml
+++ b/tests/domaincapsschemadata/full.xml
@@ -75,6 +75,7 @@
<value>usb</value>
<value>pci</value>
<value>scsi</value>
+ <value>scsi_host</value>
</enum>
<enum name='capsType'>
<value>storage</value>
FWIW: I usually use cscope to find all VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_
occurrances and be sure the _HOST option is covered. I see that nothing
in cgroup is modified. I also wonder if we need to track these in
virhostdev.c.
John