On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
Adding this feature will allow users to easily attach a hostdev
network
interface using PCI passthrough.
The interface can be attached using --type=hostdev and PCI address or
network device name as --source. This command also allows you to tell,
whether the interface should be managed and to choose a assignment
driver.
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=997561
I think the PCI address version of this command is fairly
straightforward, so it can and should go in. But I've been thinking more
about the "netdev name" version since our exchange in the BZ, and even
had a very long treatise prepared defending my position from there, but
then I decided to look at it all again from the beginning and came to
the conclusion that we are *both* wrong :-)
The main aim here is convenience, and while I still have the position
that if we're going to make it more convenient, we should make it more
convenient at the XML level so that all consumers of libvirt can take
advantage without needing a ton of extra code, I also have realized that
specifying the netdev name is not very convenient anyway.
Why?
* Keep in mind that <interface type='hostdev'> will only work with SRIOV
VFs (because you can't set the MAC address of a normal netdev from the
host and have that MAC address persist through the guest driver's device
init).
* So any device that you assign in this way is a VF.
* A user will know which PF ("Physical Function", but from their point
of view it's "the physical port used for the connection") they want a VF
from; they don't care which VF (they're all functionally equivalent),
and anyway the standard utilities don't even tell you the netdev names
of the VFs associated with a particular PF. So it's not that simple to
learn the netdev name of the VF you want to use:
* virsh nodedev-list --tree doesn't group the VFs under their PF
(because its hierarchy is according to the PCI bus layout, which has
all the PFs and VFs at the same level)
* virsh nodedev-dumpxml for the PF device shows the VFs'
PCI addresses, NOT their netdev names.
<capability type='virt_functions'>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x0'/>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x2'/>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x4'/>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x6'/>
<address domain='0x0000' bus='0x02' slot='0x11'
function='0x0'/>
<address domain='0x0000' bus='0x02' slot='0x11'
function='0x2'/>
<address domain='0x0000' bus='0x02' slot='0x11'
function='0x4'/>
(Note the leading "0x" on these values :-P)
* "ip link show" lists the VFs directly under their PF,
but shows the VF#, not the netdev name
11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq
state UP mode DEFAULT group default qlen 1000
link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
So if there's not a simple way to get a list of the netdev names of the
VFs for a PF, then what convenience is gained by allowing use of netdev
name? What exactly was I thinking when I wrote comment 1 in that BZ???
What *could* be useful would be the ability to specify the PF name and
VF# of the device you want to assign, e.g.: "<source pf='p4p2'
vf='3'/>", since that's info you can easily get from "ip link
show".
Anymore, though, I don't even know how useful *that* is, since you can
already get the PCI addresses of the VFs from the output of virsh
nodedev-dumpxml (in almost exactly the format you need - just add
"type='pci'".
(I'm now even wondering if I misunderstood what the original reporter
was asking for, but unfortunately it's not possible to ask him, because
his account has been closed :-( Thinking back to what he said - there is
a *different* place where it's common to know the netdev name and want
to translate it into a PCI address - when you are assigning a
*non-SRIOV* ethernet device using plain <hostdev> (not <interface
type='hostdev'>, which only works for SRIOV VFs). In this case you
probably know the netdev name but not the PCI address. So for this case
a translation from netdev name to PCI address would make sense, and here
I would agree that the translation should happen in virsh rather than
the generic <hostdev> XML understanding what a netdev name is (contrary
to <interface type='hostdev'>, where it is well established that the
device you're assigning is a network device, and there are already
"netdev-y" config attributes, up until now <hostdev> has not contained
any config items specific to a particular type of PCI device, and I
don't think it should have any added)).
TL;DR of my opinion:
1) this patch minus the netdev-->PCI address translation is good
2) we don't need the netdev-->PCI translation for <interface
type='hostdev'>;
it's just extra code for (in my newly revised opinion) no gain.
3) a netdev-->PCI translation in virsh that creates a plain
<hostdev> might be useful (hmm, maybe keep (2) as you have it,
with an additional check for
4) adding <source pf='ethX' vf='n'/> support to <interface
type='hostdev'>
might be useful.
How much sense does any of that make?
Signed-off-by: Pavel Hrdina <phrdina(a)redhat.com>
---
tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 86 insertions(+), 4 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index e8503ec..b124441 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -56,6 +56,7 @@
#include "virtime.h"
#include "virtypedparam.h"
#include "virxml.h"
+#include "virsh-nodedev.h"
/* Gnulib doesn't guarantee SA_SIGINFO support. */
#ifndef SA_SIGINFO
@@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = {
.type = VSH_OT_BOOL,
.help = N_("print XML document rather than attach the interface")
},
+ {.name = "managed",
+ .type = VSH_OT_BOOL,
+ .help = N_("set the interface to be managed by libvirt")
Maybe a more descriptive help - "have libvirt automatically manage
detach/attach of device from host driver" (or something like that; I
know that's too long).
+ },
+ {.name = "driver",
+ .type = VSH_OT_STRING,
+ .help = N_("set driver for hostdev interface, default is 'kvm'")
Actually the default behavior depends on what is available on the host -
if only one of legacy-kvm and vfio is available, then that is the one
used, but if both are available, then vfio is used (so vfio is closer to
being "default" than kvm). The kvm method of assigning devices is
deprecated, and it has been completely disabled in some distros
(definitely RHEL7 and CentOS7, not sure about Fedora). These days
manually specifying which driver to use is mostly pointless; I vaguely
recall that it was initially added because when VFIO was first added
some were nervous about not having a simple way to fallback to old
behavior if something went wrong while using VFIO; those days are long
gone though, and I can't think of a situation where anyone would
want/need to specify which driver to use (i.e. I'm not sure we even need
to expose this option in virsh; it may confuse more than help).
+ },
{.name = NULL}
};
@@ -919,7 +928,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
virDomainPtr dom = NULL;
const char *mac = NULL, *target = NULL, *script = NULL,
*type = NULL, *source = NULL, *model = NULL,
- *inboundStr = NULL, *outboundStr = NULL;
+ *inboundStr = NULL, *outboundStr = NULL, *driver = NULL;
virNetDevBandwidthRate inbound, outbound;
virDomainNetType typ;
int ret;
@@ -931,6 +940,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
bool config = vshCommandOptBool(cmd, "config");
bool live = vshCommandOptBool(cmd, "live");
bool persistent = vshCommandOptBool(cmd, "persistent");
+ bool managed = vshCommandOptBool(cmd, "managed");
VSH_EXCLUSIVE_OPTIONS_VAR(persistent, current);
@@ -949,7 +959,8 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
vshCommandOptStringReq(ctl, cmd, "script", &script) < 0 ||
vshCommandOptStringReq(ctl, cmd, "model", &model) < 0 ||
vshCommandOptStringReq(ctl, cmd, "inbound", &inboundStr) < 0
||
- vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) <
0)
+ vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0
||
+ vshCommandOptStringReq(ctl, cmd, "driver", &driver) < 0)
goto cleanup;
/* check interface type */
@@ -982,8 +993,23 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
}
}
+ if (typ != VIR_DOMAIN_NET_TYPE_HOSTDEV) {
+ if (managed) {
+ vshError(ctl, _("--managed is usable only with --type=hostdev"));
+ goto cleanup;
+ }
+ if (driver) {
+ vshError(ctl, _("--driver is usable only with --type=hostdev"));
+ goto cleanup;
+ }
+ }
+
/* Make XML of interface */
- virBufferAsprintf(&buf, "<interface type='%s'>\n",
type);
+ virBufferAsprintf(&buf, "<interface type='%s'", type);
+ if (managed)
+ virBufferAddLit(&buf, " managed='yes'>\n");
+ else
+ virBufferAddLit(&buf, ">\n");
virBufferAdjustIndent(&buf, 2);
switch (typ) {
@@ -995,6 +1021,63 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
case VIR_DOMAIN_NET_TYPE_DIRECT:
virBufferAsprintf(&buf, "<source dev='%s'/>\n",
source);
break;
+ case VIR_DOMAIN_NET_TYPE_HOSTDEV:
+ {
+ struct PCIAddress pciAddr = {0, 0, 0, 0};
+
+ if (str2PCIAddress(source, &pciAddr) < 0) {
+ const char *caps[] = {"net"};
+ char *tmpName = NULL;
+ virshNodeDeviceListPtr list = NULL;
+ virNodeDevicePtr netDev = NULL;
+ size_t i;
+
+ list = virshNodeDeviceListCollect(ctl, (char **)caps, 1,
+ VIR_CONNECT_LIST_NODE_DEVICES_CAP_NET);
+
+ if (!list) {
+ vshError(ctl, _("cannot list network devices"));
+ goto cleanup;
+ }
+
+ if (virAsprintf(&tmpName, "net_%s", source) < 0)
+ goto cleanup;
+
+ for (i = 0; i < list->ndevices; i++) {
+ if (STREQLEN(tmpName, virNodeDeviceGetName(list->devices[i]),
+ strlen(tmpName)))
+ netDev = list->devices[i];
+ }
+ VIR_FREE(tmpName);
+
+ if (!netDev) {
+ vshError(ctl, _("network interface '%s' doesn't
exist"),
+ source);
+ goto cleanup;
+ }
+
+ if (str2PCIAddress(virNodeDeviceGetParent(netDev)+4, &pciAddr) < 0)
{
+ virshNodeDeviceListFree(list);
+ vshError(ctl, _("cannot parse pci address for network "
+ "interface '%s'"), source);
+ goto cleanup;
+ }
+
+ virshNodeDeviceListFree(list);
+ }
+ if (driver)
+ virBufferAsprintf(&buf, "<driver name='%s'/>\n",
driver);
+
+ virBufferAddLit(&buf, "<source>\n");
+ virBufferAdjustIndent(&buf, 2);
+ virBufferAsprintf(&buf, "<address type='pci'
domain='0x%.4x'"
+ " bus='0x%.2x' slot='0x%.2x'
function='0x%.1x'/>\n",
+ pciAddr.domain, pciAddr.bus,
+ pciAddr.slot, pciAddr.function);
+ virBufferAdjustIndent(&buf, -2);
+ virBufferAddLit(&buf, "</source>\n");
+ break;
+ }
case VIR_DOMAIN_NET_TYPE_USER:
case VIR_DOMAIN_NET_TYPE_ETHERNET:
@@ -1004,7 +1087,6 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
case VIR_DOMAIN_NET_TYPE_MCAST:
case VIR_DOMAIN_NET_TYPE_UDP:
case VIR_DOMAIN_NET_TYPE_INTERNAL:
- case VIR_DOMAIN_NET_TYPE_HOSTDEV:
case VIR_DOMAIN_NET_TYPE_LAST:
vshError(ctl, _("No support for %s in command
'attach-interface'"),
type);