On Mon, Oct 26, 2015 at 04:05:28AM -0400, Laine Stump wrote:
On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
> Adding this feature will allow users to easily attach a hostdev network
> interface using PCI passthrough.
>
> The interface can be attached using --type=hostdev and PCI address or
> network device name as --source. This command also allows you to tell,
> whether the interface should be managed and to choose a assignment
> driver.
>
> Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=997561
I think the PCI address version of this command is fairly
straightforward, so it can and should go in. But I've been thinking more
about the "netdev name" version since our exchange in the BZ, and even
had a very long treatise prepared defending my position from there, but
then I decided to look at it all again from the beginning and came to
the conclusion that we are *both* wrong :-)
The main aim here is convenience, and while I still have the position
that if we're going to make it more convenient, we should make it more
convenient at the XML level so that all consumers of libvirt can take
advantage without needing a ton of extra code, I also have realized that
specifying the netdev name is not very convenient anyway.
Why?
* Keep in mind that <interface type='hostdev'> will only work with SRIOV
VFs (because you can't set the MAC address of a normal netdev from the
host and have that MAC address persist through the guest driver's device
init).
* So any device that you assign in this way is a VF.
* A user will know which PF ("Physical Function", but from their point
of view it's "the physical port used for the connection") they want a VF
from; they don't care which VF (they're all functionally equivalent),
and anyway the standard utilities don't even tell you the netdev names
of the VFs associated with a particular PF. So it's not that simple to
learn the netdev name of the VF you want to use:
* virsh nodedev-list --tree doesn't group the VFs under their PF
(because its hierarchy is according to the PCI bus layout, which has
all the PFs and VFs at the same level)
* virsh nodedev-dumpxml for the PF device shows the VFs'
PCI addresses, NOT their netdev names.
<capability type='virt_functions'>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x0'/>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x2'/>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x4'/>
<address domain='0x0000' bus='0x02' slot='0x10'
function='0x6'/>
<address domain='0x0000' bus='0x02' slot='0x11'
function='0x0'/>
<address domain='0x0000' bus='0x02' slot='0x11'
function='0x2'/>
<address domain='0x0000' bus='0x02' slot='0x11'
function='0x4'/>
(Note the leading "0x" on these values :-P)
* "ip link show" lists the VFs directly under their PF,
but shows the VF#, not the netdev name
11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq
state UP mode DEFAULT group default qlen 1000
link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
So if there's not a simple way to get a list of the netdev names of the
VFs for a PF, then what convenience is gained by allowing use of netdev
name? What exactly was I thinking when I wrote comment 1 in that BZ???
What *could* be useful would be the ability to specify the PF name and
VF# of the device you want to assign, e.g.: "<source pf='p4p2'
vf='3'/>", since that's info you can easily get from "ip link
show".
Anymore, though, I don't even know how useful *that* is, since you can
already get the PCI addresses of the VFs from the output of virsh
nodedev-dumpxml (in almost exactly the format you need - just add
"type='pci'".
(I'm now even wondering if I misunderstood what the original reporter
was asking for, but unfortunately it's not possible to ask him, because
his account has been closed :-( Thinking back to what he said - there is
a *different* place where it's common to know the netdev name and want
to translate it into a PCI address - when you are assigning a
*non-SRIOV* ethernet device using plain <hostdev> (not <interface
type='hostdev'>, which only works for SRIOV VFs). In this case you
probably know the netdev name but not the PCI address. So for this case
a translation from netdev name to PCI address would make sense, and here
I would agree that the translation should happen in virsh rather than
the generic <hostdev> XML understanding what a netdev name is (contrary
to <interface type='hostdev'>, where it is well established that the
device you're assigning is a network device, and there are already
"netdev-y" config attributes, up until now <hostdev> has not contained
any config items specific to a particular type of PCI device, and I
don't think it should have any added)).
TL;DR of my opinion:
1) this patch minus the netdev-->PCI address translation is good
I don't mind if the patch goes in with or without the translation. The main
purpose of this extension is to make it easier for users so they don't have to
use the generic 'attach-device' with XML definition but only specify some
arguments. But see the comment below for the 2).
2) we don't need the netdev-->PCI translation for <interface
type='hostdev'>;
it's just extra code for (in my newly revised opinion) no gain.
Actually it's not entirely true, the netdev names of VFs are listed by
'ip link show' command, but it's not clear to guess the relation to PFs:
3: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT qlen 1000
link/ether b4:b5:2f:66:3c:4f brd ff:ff:ff:ff:ff:ff
vf 0 MAC 22:67:e2:b0:28:b0, spoof checking on, link-state auto
vf 1 MAC 0a:d3:63:e6:06:45, spoof checking on, link-state auto
vf 2 MAC 52:54:00:32:93:5c, spoof checking on, link-state auto
vf 3 MAC 42:9a:79:f6:9d:cb, spoof checking on, link-state auto
14: enp5s16f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT qlen 1000
link/ether 22:67:e2:b0:28:b0 brd ff:ff:ff:ff:ff:ff
One could benefit using the netdev name also for VF.
3) a netdev-->PCI translation in virsh that creates a plain
<hostdev> might be useful (hmm, maybe keep (2) as you have it,
with an additional check for
Yes, this might be a useful extension of 'attach-interface --type=hostdev', that
we would ether detect, whether the interface is a VF or we would use <hostdev>
as default, unless a mac address is specified or we can combine both approaches.
In this case, the command will remain the same, but it will be able to attach
VF and non-VF interfaces to the guest. This would probably require to extend
a detach-interface to be able to detach also a network interface attached as
<hostdev>.
4) adding <source pf='ethX' vf='n'/> support to
<interface type='hostdev'>
might be useful.
I don't like to extend our XMLs to introduce another way how to represent some
source device if we already have one. And I don't see the value of the PF and
VF in XML.
How much sense does any of that make?
>
> Signed-off-by: Pavel Hrdina <phrdina(a)redhat.com>
> ---
> tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 86 insertions(+), 4 deletions(-)
>
> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index e8503ec..b124441 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -56,6 +56,7 @@
> #include "virtime.h"
> #include "virtypedparam.h"
> #include "virxml.h"
> +#include "virsh-nodedev.h"
>
> /* Gnulib doesn't guarantee SA_SIGINFO support. */
> #ifndef SA_SIGINFO
> @@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = {
> .type = VSH_OT_BOOL,
> .help = N_("print XML document rather than attach the interface")
> },
> + {.name = "managed",
> + .type = VSH_OT_BOOL,
> + .help = N_("set the interface to be managed by libvirt")
Maybe a more descriptive help - "have libvirt automatically manage
detach/attach of device from host driver" (or something like that; I
know that's too long).
That makes sense, I'll try to come up with something better.
> + },
> + {.name = "driver",
> + .type = VSH_OT_STRING,
> + .help = N_("set driver for hostdev interface, default is
'kvm'")
Actually the default behavior depends on what is available on the host -
if only one of legacy-kvm and vfio is available, then that is the one
used, but if both are available, then vfio is used (so vfio is closer to
being "default" than kvm). The kvm method of assigning devices is
deprecated, and it has been completely disabled in some distros
(definitely RHEL7 and CentOS7, not sure about Fedora). These days
manually specifying which driver to use is mostly pointless; I vaguely
recall that it was initially added because when VFIO was first added
some were nervous about not having a simple way to fallback to old
behavior if something went wrong while using VFIO; those days are long
gone though, and I can't think of a situation where anyone would
want/need to specify which driver to use (i.e. I'm not sure we even need
to expose this option in virsh; it may confuse more than help).
In that case, the documentation for <interface type='hostdev'> is not
correct,
because there is a statement '(or simply omit the <driver> element, since
"kvm"
is currently the default)'. Looking at the <hostdev> section in our
documentation, this is already updated and there is a correct description, when
is which driver used as default. In that case, I'll remove it from virsh.
Pavel