[libvirt] [PATCH 0/4] improve virsh attache-interface

Pavel Hrdina (4): virsh-domain: use correct base for virStrToLong_ui virsh-nodedev: makes struct and functions for NodeDevice list available virsh-domain: update attach-interface to support type=hostdev virsh.pod: update and improve a attach-interface section tools/virsh-domain.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------- tools/virsh-nodedev.c | 16 +++---- tools/virsh-nodedev.h | 11 +++++ tools/virsh.pod | 85 +++++++++++++++++++++-------------- 4 files changed, 170 insertions(+), 62 deletions(-) -- 2.6.2

While parsing device addresses we should use correct base and don't count on auto-detect. For example, PCI address uses hex numbers, but each number starting with 0 will be auto-detected as octal number and that's wrong. Another wrong use-case is for PCI address if for example bus is 10, than it's incorrectly parsed as decimal number. PCI and CCW addresses have all values as hex numbers, IDE and SCSI addresses are in decimal numbers. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 4191548..e8503ec 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -455,19 +455,19 @@ static int str2PCIAddress(const char *str, struct PCIAddress *pciAddr) domain = (char *)str; - if (virStrToLong_ui(domain, &bus, 0, &pciAddr->domain) != 0) + if (virStrToLong_ui(domain, &bus, 16, &pciAddr->domain) != 0) return -1; bus++; - if (virStrToLong_ui(bus, &slot, 0, &pciAddr->bus) != 0) + if (virStrToLong_ui(bus, &slot, 16, &pciAddr->bus) != 0) return -1; slot++; - if (virStrToLong_ui(slot, &function, 0, &pciAddr->slot) != 0) + if (virStrToLong_ui(slot, &function, 16, &pciAddr->slot) != 0) return -1; function++; - if (virStrToLong_ui(function, NULL, 0, &pciAddr->function) != 0) + if (virStrToLong_ui(function, NULL, 16, &pciAddr->function) != 0) return -1; return 0; @@ -484,15 +484,15 @@ static int str2SCSIAddress(const char *str, struct SCSIAddress *scsiAddr) controller = (char *)str; - if (virStrToLong_uip(controller, &bus, 0, &scsiAddr->controller) != 0) + if (virStrToLong_uip(controller, &bus, 10, &scsiAddr->controller) != 0) return -1; bus++; - if (virStrToLong_uip(bus, &unit, 0, &scsiAddr->bus) != 0) + if (virStrToLong_uip(bus, &unit, 10, &scsiAddr->bus) != 0) return -1; unit++; - if (virStrToLong_ullp(unit, NULL, 0, &scsiAddr->unit) != 0) + if (virStrToLong_ullp(unit, NULL, 10, &scsiAddr->unit) != 0) return -1; return 0; @@ -509,15 +509,15 @@ static int str2IDEAddress(const char *str, struct IDEAddress *ideAddr) controller = (char *)str; - if (virStrToLong_ui(controller, &bus, 0, &ideAddr->controller) != 0) + if (virStrToLong_ui(controller, &bus, 10, &ideAddr->controller) != 0) return -1; bus++; - if (virStrToLong_ui(bus, &unit, 0, &ideAddr->bus) != 0) + if (virStrToLong_ui(bus, &unit, 10, &ideAddr->bus) != 0) return -1; unit++; - if (virStrToLong_ui(unit, NULL, 0, &ideAddr->unit) != 0) + if (virStrToLong_ui(unit, NULL, 10, &ideAddr->unit) != 0) return -1; return 0; @@ -534,15 +534,15 @@ static int str2CCWAddress(const char *str, struct CCWAddress *ccwAddr) cssid = (char *)str; - if (virStrToLong_ui(cssid, &ssid, 0, &ccwAddr->cssid) != 0) + if (virStrToLong_ui(cssid, &ssid, 16, &ccwAddr->cssid) != 0) return -1; ssid++; - if (virStrToLong_ui(ssid, &devno, 0, &ccwAddr->ssid) != 0) + if (virStrToLong_ui(ssid, &devno, 16, &ccwAddr->ssid) != 0) return -1; devno++; - if (virStrToLong_ui(devno, NULL, 0, &ccwAddr->devno) != 0) + if (virStrToLong_ui(devno, NULL, 16, &ccwAddr->devno) != 0) return -1; return 0; @@ -739,8 +739,8 @@ cmdAttachDisk(vshControl *ctl, const vshCmd *cmd) } else if (STRPREFIX((const char *)target, "hd")) { if (diskAddr.type == DISK_ADDR_TYPE_IDE) { virBufferAsprintf(&buf, - "<address type='drive' controller='%d'" - " bus='%d' unit='%d' />\n", + "<address type='drive' controller='%u'" + " bus='%u' unit='%u' />\n", diskAddr.addr.ide.controller, diskAddr.addr.ide.bus, diskAddr.addr.ide.unit); } else { -- 2.6.2

On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
While parsing device addresses we should use correct base and don't count on auto-detect. For example, PCI address uses hex numbers, but each number starting with 0 will be auto-detected as octal number and that's wrong. Another wrong use-case is for PCI address if for example bus is 10, than it's incorrectly parsed as decimal number.
PCI and CCW addresses have all values as hex numbers, IDE and SCSI addresses are in decimal numbers.
I've seen examples for PCI that use decimal a number for the slot (which is the one item that's likely to have a value > 7 unless you have a system with a whole bunch of PCI controllers)[*], and those that use hex numbers always prefix the number with "0x". libvirt itself always prefixes the domain, bus, and slot with 0x, so an auto-detected base will always get those right. So I I think the existing code is correct, and don't see an upside to making this restriction/invisible change in semantics, and it could potentially lead to incorrect results if someone is thinking that they're entering decimal numbers. [*] There was one particular document that even went to the trouble of explaining how to convert the hex value of slot into a decimal number to use in the libvirt config!
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 4191548..e8503ec 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -455,19 +455,19 @@ static int str2PCIAddress(const char *str, struct PCIAddress *pciAddr)
domain = (char *)str;
- if (virStrToLong_ui(domain, &bus, 0, &pciAddr->domain) != 0) + if (virStrToLong_ui(domain, &bus, 16, &pciAddr->domain) != 0) return -1;
bus++; - if (virStrToLong_ui(bus, &slot, 0, &pciAddr->bus) != 0) + if (virStrToLong_ui(bus, &slot, 16, &pciAddr->bus) != 0) return -1;
slot++; - if (virStrToLong_ui(slot, &function, 0, &pciAddr->slot) != 0) + if (virStrToLong_ui(slot, &function, 16, &pciAddr->slot) != 0) return -1;
function++; - if (virStrToLong_ui(function, NULL, 0, &pciAddr->function) != 0) + if (virStrToLong_ui(function, NULL, 16, &pciAddr->function) != 0) return -1;
return 0; @@ -484,15 +484,15 @@ static int str2SCSIAddress(const char *str, struct SCSIAddress *scsiAddr)
controller = (char *)str;
- if (virStrToLong_uip(controller, &bus, 0, &scsiAddr->controller) != 0) + if (virStrToLong_uip(controller, &bus, 10, &scsiAddr->controller) != 0) return -1;
bus++; - if (virStrToLong_uip(bus, &unit, 0, &scsiAddr->bus) != 0) + if (virStrToLong_uip(bus, &unit, 10, &scsiAddr->bus) != 0) return -1;
unit++; - if (virStrToLong_ullp(unit, NULL, 0, &scsiAddr->unit) != 0) + if (virStrToLong_ullp(unit, NULL, 10, &scsiAddr->unit) != 0) return -1;
return 0; @@ -509,15 +509,15 @@ static int str2IDEAddress(const char *str, struct IDEAddress *ideAddr)
controller = (char *)str;
- if (virStrToLong_ui(controller, &bus, 0, &ideAddr->controller) != 0) + if (virStrToLong_ui(controller, &bus, 10, &ideAddr->controller) != 0) return -1;
bus++; - if (virStrToLong_ui(bus, &unit, 0, &ideAddr->bus) != 0) + if (virStrToLong_ui(bus, &unit, 10, &ideAddr->bus) != 0) return -1;
unit++; - if (virStrToLong_ui(unit, NULL, 0, &ideAddr->unit) != 0) + if (virStrToLong_ui(unit, NULL, 10, &ideAddr->unit) != 0) return -1;
return 0; @@ -534,15 +534,15 @@ static int str2CCWAddress(const char *str, struct CCWAddress *ccwAddr)
cssid = (char *)str;
- if (virStrToLong_ui(cssid, &ssid, 0, &ccwAddr->cssid) != 0) + if (virStrToLong_ui(cssid, &ssid, 16, &ccwAddr->cssid) != 0) return -1;
ssid++; - if (virStrToLong_ui(ssid, &devno, 0, &ccwAddr->ssid) != 0) + if (virStrToLong_ui(ssid, &devno, 16, &ccwAddr->ssid) != 0) return -1;
devno++; - if (virStrToLong_ui(devno, NULL, 0, &ccwAddr->devno) != 0) + if (virStrToLong_ui(devno, NULL, 16, &ccwAddr->devno) != 0) return -1;
return 0; @@ -739,8 +739,8 @@ cmdAttachDisk(vshControl *ctl, const vshCmd *cmd) } else if (STRPREFIX((const char *)target, "hd")) { if (diskAddr.type == DISK_ADDR_TYPE_IDE) { virBufferAsprintf(&buf, - "<address type='drive' controller='%d'" - " bus='%d' unit='%d' />\n", + "<address type='drive' controller='%u'" + " bus='%u' unit='%u' />\n", diskAddr.addr.ide.controller, diskAddr.addr.ide.bus, diskAddr.addr.ide.unit); } else {

On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
While parsing device addresses we should use correct base and don't count on auto-detect. For example, PCI address uses hex numbers, but each number starting with 0 will be auto-detected as octal number and that's wrong. Another wrong use-case is for PCI address if for example bus is 10, than it's incorrectly parsed as decimal number.
PCI and CCW addresses have all values as hex numbers, IDE and SCSI addresses are in decimal numbers.
I've seen examples for PCI that use decimal a number for the slot (which is the one item that's likely to have a value > 7 unless you have a system with a whole bunch of PCI controllers)[*], and those that use hex numbers always prefix the number with "0x". libvirt itself always a
lspci doesn't really use 0x prefix: 00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03) 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03) 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3) Btw that's a laptop, not a super special system.
prefixes the domain, bus, and slot with 0x, so an auto-detected base will always get those right.
Well, not entirely: nodedev identificators use hex, possibly padded with zeroes: pci_0000_00_03_0 pci_0000_00_14_0 pci_0000_00_16_0 pci_0000_00_19_0 pci_0000_00_1b_0 Which may result also in some octal 'fun' on boxes with 16 buses ;). nodedev XML uses decimal in the parsed address: $ virsh nodedev-dumpxml pci_0000_00_19_0 <device> <name>pci_0000_00_19_0</name> <path>/sys/devices/pci0000:00/0000:00:19.0</path> <parent>computer</parent> <driver> <name>e1000e</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>25</slot> <function>0</function> <product id='0x15a2'>Ethernet Connection (3) I218-LM</product> <vendor id='0x8086'>Intel Corporation</vendor> </capability> </device> (Uhhh, so 19 ... or 25? ... or perhaps 31?) And for hostdevs we are promoting the use of 0x prefixed hex: <address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/> As well as with target addresses. Fortunately. <sarcasm>Well that's very consistent.</sarcasm>
So I I think the existing code is correct, and don't see an upside to making this restriction/invisible change in semantics, and it could potentially lead to incorrect results if someone is thinking that they're entering decimal numbers.
The existing code is correct only perhaps from a historical point of view. From a functional it's more than flawed. ...
[*] There was one particular document that even went to the trouble of explaining how to convert the hex value of slot into a decimal number to use in the libvirt config!
Well, this hints that we do actually something wrong here. The ambiguity in parsing of numbers with virStrToLong_ui has already bitten us (I won't bother looking up the commit though). The problem is that if you use 0 as a base argument it's not really clear what you've parsed and that will result in situations like this. The developer may be lucky in trying numbers that were parsed correctly misleading him that he used the correct base. I think that we shouldn't be using 0 as base in ANY place in our code. As of this particular case: If we format it in base 16, we should parse it in base 16. Having ambiguity in the parser code will only end up in problems.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
Peter

On Fri, Oct 23, 2015 at 09:33:49AM +0200, Peter Krempa wrote:
On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
While parsing device addresses we should use correct base and don't count on auto-detect. For example, PCI address uses hex numbers, but each number starting with 0 will be auto-detected as octal number and that's wrong. Another wrong use-case is for PCI address if for example bus is 10, than it's incorrectly parsed as decimal number.
PCI and CCW addresses have all values as hex numbers, IDE and SCSI addresses are in decimal numbers.
I've seen examples for PCI that use decimal a number for the slot (which is the one item that's likely to have a value > 7 unless you have a system with a whole bunch of PCI controllers)[*], and those that use hex numbers always prefix the number with "0x". libvirt itself always a
lspci doesn't really use 0x prefix:
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03) 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03) 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
Btw that's a laptop, not a super special system.
prefixes the domain, bus, and slot with 0x, so an auto-detected base will always get those right.
Well, not entirely:
nodedev identificators use hex, possibly padded with zeroes:
pci_0000_00_03_0 pci_0000_00_14_0 pci_0000_00_16_0 pci_0000_00_19_0 pci_0000_00_1b_0
Which may result also in some octal 'fun' on boxes with 16 buses ;).
nodedev XML uses decimal in the parsed address:
$ virsh nodedev-dumpxml pci_0000_00_19_0 <device> <name>pci_0000_00_19_0</name> <path>/sys/devices/pci0000:00/0000:00:19.0</path> <parent>computer</parent> <driver> <name>e1000e</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>25</slot> <function>0</function> <product id='0x15a2'>Ethernet Connection (3) I218-LM</product> <vendor id='0x8086'>Intel Corporation</vendor> </capability> </device>
(Uhhh, so 19 ... or 25? ... or perhaps 31?)
And for hostdevs we are promoting the use of 0x prefixed hex:
<address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
As well as with target addresses. Fortunately.
<sarcasm>Well that's very consistent.</sarcasm>
So I I think the existing code is correct, and don't see an upside to making this restriction/invisible change in semantics, and it could potentially lead to incorrect results if someone is thinking that they're entering decimal numbers.
The existing code is correct only perhaps from a historical point of view. From a functional it's more than flawed. ...
[*] There was one particular document that even went to the trouble of explaining how to convert the hex value of slot into a decimal number to use in the libvirt config!
Well, this hints that we do actually something wrong here. The ambiguity in parsing of numbers with virStrToLong_ui has already bitten us (I won't bother looking up the commit though).
The problem is that if you use 0 as a base argument it's not really clear what you've parsed and that will result in situations like this. The developer may be lucky in trying numbers that were parsed correctly misleading him that he used the correct base.
I think that we shouldn't be using 0 as base in ANY place in our code.
As of this particular case: If we format it in base 16, we should parse it in base 16. Having ambiguity in the parser code will only end up in problems.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
Peter
A agree with Peter, as there is no reasonable point to represent PCI address in decimal numbers, only libvirt does that and as said, it's wrong. I don't know any other place, where the PCI address is printed as decimal number. I've been motivated to update this behavior after I've used '0000:05:10.1' as an argument for the '--srouce' and it parsed the slot as decimal number and printed in to the XML as '0x0a'. Pavel
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 10/23/2015 03:59 AM, Pavel Hrdina wrote:
On Fri, Oct 23, 2015 at 09:33:49AM +0200, Peter Krempa wrote:
On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
While parsing device addresses we should use correct base and don't count on auto-detect. For example, PCI address uses hex numbers, but each number starting with 0 will be auto-detected as octal number and that's wrong. Another wrong use-case is for PCI address if for example bus is 10, than it's incorrectly parsed as decimal number.
PCI and CCW addresses have all values as hex numbers, IDE and SCSI addresses are in decimal numbers. I've seen examples for PCI that use decimal a number for the slot (which is the one item that's likely to have a value > 7 unless you have a system with a whole bunch of PCI controllers)[*], and those that use hex numbers always prefix the number with "0x". libvirt itself always a lspci doesn't really use 0x prefix:
My statement above was for any libvirt documentation, not "any program anywhere displaying a PCI address". The standard notation for a "unified PCI address" (coining my own term :-) is as they are displayed in lspci (and in all the entries in sysfs).
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03) 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03) 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
Btw that's a laptop, not a super special system.
prefixes the domain, bus, and slot with 0x, so an auto-detected base will always get those right. Well, not entirely:
nodedev identificators use hex, possibly padded with zeroes:
pci_0000_00_03_0 pci_0000_00_14_0 pci_0000_00_16_0 pci_0000_00_19_0 pci_0000_00_1b_0
Which may result also in some octal 'fun' on boxes with 16 buses ;).
Sigh. Yes, I was only considering the <address> element, not the stuff in node device XML and virsh nodedev-* output :-(. What you have pointed out makes me even more wary of changing libvirt's parsing behavior.
nodedev XML uses decimal in the parsed address:
$ virsh nodedev-dumpxml pci_0000_00_19_0 <device> <name>pci_0000_00_19_0</name> <path>/sys/devices/pci0000:00/0000:00:19.0</path> <parent>computer</parent> <driver> <name>e1000e</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>25</slot> <function>0</function> <product id='0x15a2'>Ethernet Connection (3) I218-LM</product> <vendor id='0x8086'>Intel Corporation</vendor> </capability> </device>
(Uhhh, so 19 ... or 25? ... or perhaps 31?)
And for hostdevs we are promoting the use of 0x prefixed hex:
<address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
As well as with target addresses. Fortunately.
<sarcasm>Well that's very consistent.</sarcasm>
Yes, the inconsistency is actually part of my point. If we silently change the parsing to now recognize "25" (no "0x" prefix) as a hex number, it's going to give the wrong results for any XML written using a decimal slot# that worked properly on "old" libvirt, and it's going to make everything confusing in a different way since everyone will need to keep straight which version of libvirt interprets it which way - you'll *never* be able to write examples for documentation that omit the "0x" anyway, since you can't be sure the reader of the documentation has a new enough version of libvirt to force parsing as hex. For example, if someone has a script that uses XML they created "a long time ago" by copy/pasting the slot# from the nodedev-dumpxml output (or maybe just hand-writing it) into XML that they use for "virsh create" or "virsh attach-device" (i.e. the XML is stored/generated external to libvirt in its original form, *not* normalized by the libvirt parse/format cycle), and upgraded libvirt will magically/silently begin interpreting any slot# > 9 differently, leading to anything from a different layout of devices on the guest PCI bus, to failure to attach a device because it wasn't found (or even worse, silently/erroneously attaching the *wrong* host device to the guest). I don't think that is acceptable. If we want to get rid of the inconsistency, without creating the possibility of *silently* performing incorrect operations, we should: * require the input in the <address> element to have a "0x" prefix to be sure that there is no assumption on the part of the user that they are entering a decimal number. * change the node device XML (nodedev-dumpxml) to output the domain, bus, and slot elements as hex, including the 0x (function is irrelevant, since by definition it can never be > 7) Even doing that would carry the danger of causing existing functioning systems to fail though (although they would at least fail in a vocal manner). (BTW, I looked for any places in libvirt that currently parse a number *in config* as hex without the 0x prefix - the only one I can find is this: * <address type='spapr-vio' reg='0x3000'/> <- reg is forced base16 (and a deprecated "devaddr" in the status (so probably unused in many years) that parses a "unified" address). Anything else that is read from config is either forced to base 10, or is base 0. So there isn't much precedence for interpreting numbers in libvirt config with no "0x" prefix as hex.)
So I I think the existing code is correct, and don't see an upside to making this restriction/invisible change in semantics, and it could potentially lead to incorrect results if someone is thinking that they're entering decimal numbers. The existing code is correct only perhaps from a historical point of view. From a functional it's more than flawed. ...
[*] There was one particular document that even went to the trouble of explaining how to convert the hex value of slot into a decimal number to use in the libvirt config! Well, this hints that we do actually something wrong here. The ambiguity in parsing of numbers with virStrToLong_ui has already bitten us (I won't bother looking up the commit though).
It's not ambiguous to the parser. Any given string is parsed to exactly one result. The problem is that the users expect the parser to behave differently than it does. Well, *some* of the users do. Maybe even *most* of them. But not *all* of them :-). As for previous problems with differing bases, this reminded me of a very troublesome problem that I spent the time to look up just to fill the blank in my memory - commit 8efebd176. This is a case where libvirt was formatting the bus/device numbers of a USB device on the qemu commandline with %.3d (forcing leading 0's) and qemu was then interpreting the numbers as octal, so silently/magically attaching the wrong USB device. This problem wasn't solved by making qemu always parse USB bus/device as decimal though; it was instead solved by making libvirt aware of / follow the rules of qemu's parser. (I realize it's a bit more difficult to "patch the user", but that is also an option :-)
The problem is that if you use 0 as a base argument it's not really clear what you've parsed and that will result in situations like this. The developer may be lucky in trying numbers that were parsed correctly misleading him that he used the correct base.
I think that we shouldn't be using 0 as base in ANY place in our code.
I think that it is dangerous to have a mix of base 16 and base 10 numbers in a file while not requiring a clear indicator on every number of what base was used to encode them. Sure there are some numbers that are obviously hex (an address in memory) and there are some that are obviously decimal (timeout in seconds), but there are some where it's just not clear to the user. The case of PCI addresses is one of those (mostly due to the node device XML. But simply changing that won't eliminate all the existing examples out on the internet (nor will it immediately upgrade all the existing installations of libvirt). (BTW, I also think there is absolutely *no* place for octal representations of numbers anywhere in anything libvirt does. This is not 1975, and we're not all (waiting in line for our turn at) sitting in front of a TTY connected to a PDP-11 :-P)
As of this particular case: If we format it in base 16, we should parse it in base 16. Having ambiguity in the parser code will only end up in problems.
Okay, then to take that to its logical conclusion - if we format it with a leading 0x, then we should require the leading 0x when parsing.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
Peter
A agree with Peter, as there is no reasonable point to represent PCI address in decimal numbers, only libvirt does that and as said, it's wrong. I don't know any other place, where the PCI address is printed as decimal number.
I've been motivated to update this behavior after I've used '0000:05:10.1' as an argument for the '--srouce' and it parsed the slot as decimal number and printed in to the XML as '0x0a'.
Yeah I can understand that happening (and your annoyance when it did), and if all pci address info everywhere in libvirt and its documentation was (and always had been) output in hex (*without* the leading "0x") I would probably agree with this patch. But due to the history (and current state) I think it's just going to create as many problems as it solves. I *might* agree with it if it was done by requiring a leading '0x' (having the effect of causing failure for any existing XML that had been using decimal - I'm not 100% sure that would be acceptable either, but at least it is a *visible* failure, rather than a silent one), and I would be more prone to agree if the node device XML was modified to output the bus/slot/port in hex (with leading 0x - again, I can't say for sure that even that wouldn't screw up some existing management application though, so I'm *still* hesitant).

On Fri, Oct 23, 2015 at 12:48:07PM -0400, Laine Stump wrote:
On 10/23/2015 03:59 AM, Pavel Hrdina wrote:
On Fri, Oct 23, 2015 at 09:33:49AM +0200, Peter Krempa wrote:
On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
While parsing device addresses we should use correct base and don't count on auto-detect. For example, PCI address uses hex numbers, but each number starting with 0 will be auto-detected as octal number and that's wrong. Another wrong use-case is for PCI address if for example bus is 10, than it's incorrectly parsed as decimal number.
PCI and CCW addresses have all values as hex numbers, IDE and SCSI addresses are in decimal numbers. I've seen examples for PCI that use decimal a number for the slot (which is the one item that's likely to have a value > 7 unless you have a system with a whole bunch of PCI controllers)[*], and those that use hex numbers always prefix the number with "0x". libvirt itself always a lspci doesn't really use 0x prefix:
My statement above was for any libvirt documentation, not "any program anywhere displaying a PCI address". The standard notation for a "unified PCI address" (coining my own term :-) is as they are displayed in lspci (and in all the entries in sysfs).
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03) 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03) 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
Btw that's a laptop, not a super special system.
prefixes the domain, bus, and slot with 0x, so an auto-detected base will always get those right. Well, not entirely:
nodedev identificators use hex, possibly padded with zeroes:
pci_0000_00_03_0 pci_0000_00_14_0 pci_0000_00_16_0 pci_0000_00_19_0 pci_0000_00_1b_0
Which may result also in some octal 'fun' on boxes with 16 buses ;).
Sigh. Yes, I was only considering the <address> element, not the stuff in node device XML and virsh nodedev-* output :-(. What you have pointed out makes me even more wary of changing libvirt's parsing behavior.
nodedev XML uses decimal in the parsed address:
$ virsh nodedev-dumpxml pci_0000_00_19_0 <device> <name>pci_0000_00_19_0</name> <path>/sys/devices/pci0000:00/0000:00:19.0</path> <parent>computer</parent> <driver> <name>e1000e</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>25</slot> <function>0</function> <product id='0x15a2'>Ethernet Connection (3) I218-LM</product> <vendor id='0x8086'>Intel Corporation</vendor> </capability> </device>
(Uhhh, so 19 ... or 25? ... or perhaps 31?)
And for hostdevs we are promoting the use of 0x prefixed hex:
<address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
As well as with target addresses. Fortunately.
<sarcasm>Well that's very consistent.</sarcasm>
Yes, the inconsistency is actually part of my point. If we silently change the parsing to now recognize "25" (no "0x" prefix) as a hex number, it's going to give the wrong results for any XML written using a decimal slot# that worked properly on "old" libvirt, and it's going to make everything confusing in a different way since everyone will need to keep straight which version of libvirt interprets it which way - you'll *never* be able to write examples for documentation that omit the "0x" anyway, since you can't be sure the reader of the documentation has a new enough version of libvirt to force parsing as hex.
For example, if someone has a script that uses XML they created "a long time ago" by copy/pasting the slot# from the nodedev-dumpxml output (or maybe just hand-writing it) into XML that they use for "virsh create" or "virsh attach-device" (i.e. the XML is stored/generated external to libvirt in its original form, *not* normalized by the libvirt parse/format cycle), and upgraded libvirt will magically/silently begin interpreting any slot# > 9 differently, leading to anything from a different layout of devices on the guest PCI bus, to failure to attach a device because it wasn't found (or even worse, silently/erroneously attaching the *wrong* host device to the guest). I don't think that is acceptable.
If we want to get rid of the inconsistency, without creating the possibility of *silently* performing incorrect operations, we should:
* require the input in the <address> element to have a "0x" prefix to be sure that there is no assumption on the part of the user that they are entering a decimal number.
* change the node device XML (nodedev-dumpxml) to output the domain, bus, and slot elements as hex, including the 0x (function is irrelevant, since by definition it can never be > 7)
Even doing that would carry the danger of causing existing functioning systems to fail though (although they would at least fail in a vocal manner).
(BTW, I looked for any places in libvirt that currently parse a number *in config* as hex without the 0x prefix - the only one I can find is this:
* <address type='spapr-vio' reg='0x3000'/> <- reg is forced base16
(and a deprecated "devaddr" in the status (so probably unused in many years) that parses a "unified" address). Anything else that is read from config is either forced to base 10, or is base 0. So there isn't much precedence for interpreting numbers in libvirt config with no "0x" prefix as hex.)
So I I think the existing code is correct, and don't see an upside to making this restriction/invisible change in semantics, and it could potentially lead to incorrect results if someone is thinking that they're entering decimal numbers. The existing code is correct only perhaps from a historical point of view. From a functional it's more than flawed. ...
[*] There was one particular document that even went to the trouble of explaining how to convert the hex value of slot into a decimal number to use in the libvirt config! Well, this hints that we do actually something wrong here. The ambiguity in parsing of numbers with virStrToLong_ui has already bitten us (I won't bother looking up the commit though).
It's not ambiguous to the parser. Any given string is parsed to exactly one result. The problem is that the users expect the parser to behave differently than it does. Well, *some* of the users do. Maybe even *most* of them. But not *all* of them :-).
As for previous problems with differing bases, this reminded me of a very troublesome problem that I spent the time to look up just to fill the blank in my memory - commit 8efebd176. This is a case where libvirt was formatting the bus/device numbers of a USB device on the qemu commandline with %.3d (forcing leading 0's) and qemu was then interpreting the numbers as octal, so silently/magically attaching the wrong USB device. This problem wasn't solved by making qemu always parse USB bus/device as decimal though; it was instead solved by making libvirt aware of / follow the rules of qemu's parser. (I realize it's a bit more difficult to "patch the user", but that is also an option :-)
The problem is that if you use 0 as a base argument it's not really clear what you've parsed and that will result in situations like this. The developer may be lucky in trying numbers that were parsed correctly misleading him that he used the correct base.
I think that we shouldn't be using 0 as base in ANY place in our code.
I think that it is dangerous to have a mix of base 16 and base 10 numbers in a file while not requiring a clear indicator on every number of what base was used to encode them. Sure there are some numbers that are obviously hex (an address in memory) and there are some that are obviously decimal (timeout in seconds), but there are some where it's just not clear to the user. The case of PCI addresses is one of those (mostly due to the node device XML. But simply changing that won't eliminate all the existing examples out on the internet (nor will it immediately upgrade all the existing installations of libvirt).
(BTW, I also think there is absolutely *no* place for octal representations of numbers anywhere in anything libvirt does. This is not 1975, and we're not all (waiting in line for our turn at) sitting in front of a TTY connected to a PDP-11 :-P)
As of this particular case: If we format it in base 16, we should parse it in base 16. Having ambiguity in the parser code will only end up in problems.
Okay, then to take that to its logical conclusion - if we format it with a leading 0x, then we should require the leading 0x when parsing.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
Peter
A agree with Peter, as there is no reasonable point to represent PCI address in decimal numbers, only libvirt does that and as said, it's wrong. I don't know any other place, where the PCI address is printed as decimal number.
I've been motivated to update this behavior after I've used '0000:05:10.1' as an argument for the '--srouce' and it parsed the slot as decimal number and printed in to the XML as '0x0a'.
Yeah I can understand that happening (and your annoyance when it did), and if all pci address info everywhere in libvirt and its documentation was (and always had been) output in hex (*without* the leading "0x") I would probably agree with this patch. But due to the history (and current state) I think it's just going to create as many problems as it solves. I *might* agree with it if it was done by requiring a leading '0x' (having the effect of causing failure for any existing XML that had been using decimal - I'm not 100% sure that would be acceptable either, but at least it is a *visible* failure, rather than a silent one), and I would be more prone to agree if the node device XML was modified to output the bus/slot/port in hex (with leading 0x - again, I can't say for sure that even that wouldn't screw up some existing management application though, so I'm *still* hesitant).
The current situation is not ideal, it's not documented anywhere and for users you can only try the command and see what happens. Yes, it can and probably will break some scripts for some users, but I think we should make it somehow consistent and document it properly how to format it. The output of nodedev-dumpxml should be definitely fixed to print the PCI address using only hex numbers. For the parsing part, this code is currently used only for 'attach-disk' and the 'attach-interface' will be a new functionality and we can easily restrict the format of PCI address to be provided only in hex numbers with '0x' prefix. The only issue here is the 'attach-disk' where we could break some user's use-cases. Pavel
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 10/26/2015 05:53 AM, Pavel Hrdina wrote:
The current situation is not ideal, it's not documented anywhere and for users you can only try the command and see what happens. Yes, it can and probably will break some scripts for some users, but I think we should make it somehow consistent and document it properly how to format it.
The output of nodedev-dumpxml should be definitely fixed to print the PCI address using only hex numbers.
For the parsing part, this code is currently used only for 'attach-disk' and the 'attach-interface' will be a new functionality and we can easily restrict the format of PCI address to be provided only in hex numbers with '0x' prefix.
/me wakes up Monday morning, reads this paragraph, and realizes that he should have looked at the entire patch before commenting :-/ Yeah, forget everything I said. I hadn't scrolled all the way to the bottom of the patch, but just assumed from the commit message that it was going to fix the base at 16 for the XML parser too. Pretty stupid of me - sorry for all the noise over nothing. Definitely any time someone is writing a unified PCI address (DDDD:BB:SS.F) I think we can safely assume/require they are using hex. Really, I'm surprised this didn't trip anyone up before now. ACK.

On Mon, Oct 26, 2015 at 09:48:20AM -0400, Laine Stump wrote:
On 10/26/2015 05:53 AM, Pavel Hrdina wrote:
The current situation is not ideal, it's not documented anywhere and for users you can only try the command and see what happens. Yes, it can and probably will break some scripts for some users, but I think we should make it somehow consistent and document it properly how to format it.
The output of nodedev-dumpxml should be definitely fixed to print the PCI address using only hex numbers.
For the parsing part, this code is currently used only for 'attach-disk' and the 'attach-interface' will be a new functionality and we can easily restrict the format of PCI address to be provided only in hex numbers with '0x' prefix.
/me wakes up Monday morning, reads this paragraph, and realizes that he should have looked at the entire patch before commenting :-/
Yeah, forget everything I said. I hadn't scrolled all the way to the bottom of the patch, but just assumed from the commit message that it was going to fix the base at 16 for the XML parser too. Pretty stupid of me - sorry for all the noise over nothing.
Definitely any time someone is writing a unified PCI address (DDDD:BB:SS.F) I think we can safely assume/require they are using hex. Really, I'm surprised this didn't trip anyone up before now.
ACK.
Thanks :) pushed now Pavel
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Next patch will use those function to collect NodeDevice list and find a specific device. Make functions virshNodeDeviceListCollect() and virshNodeDeviceListFree() together with struct virshNodeDeviceList available to reuse existing code. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-nodedev.c | 16 +++++----------- tools/virsh-nodedev.h | 11 +++++++++++ 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/tools/virsh-nodedev.c b/tools/virsh-nodedev.c index cc359e2..26f2c7b 100644 --- a/tools/virsh-nodedev.c +++ b/tools/virsh-nodedev.c @@ -194,13 +194,7 @@ virshNodeDeviceSorter(const void *a, const void *b) virNodeDeviceGetName(*nb)); } -struct virshNodeDeviceList { - virNodeDevicePtr *devices; - size_t ndevices; -}; -typedef struct virshNodeDeviceList *virshNodeDeviceListPtr; - -static void +void virshNodeDeviceListFree(virshNodeDeviceListPtr list) { size_t i; @@ -215,11 +209,11 @@ virshNodeDeviceListFree(virshNodeDeviceListPtr list) VIR_FREE(list); } -static virshNodeDeviceListPtr +virshNodeDeviceListPtr virshNodeDeviceListCollect(vshControl *ctl, - char **capnames, - int ncapnames, - unsigned int flags) + char **capnames, + int ncapnames, + unsigned int flags) { virshNodeDeviceListPtr list = vshMalloc(ctl, sizeof(*list)); size_t i; diff --git a/tools/virsh-nodedev.h b/tools/virsh-nodedev.h index c64f7df..1d2337b 100644 --- a/tools/virsh-nodedev.h +++ b/tools/virsh-nodedev.h @@ -30,4 +30,15 @@ extern const vshCmdDef nodedevCmds[]; +struct virshNodeDeviceList { + virNodeDevicePtr *devices; + size_t ndevices; +}; +typedef struct virshNodeDeviceList *virshNodeDeviceListPtr; + +virshNodeDeviceListPtr virshNodeDeviceListCollect(vshControl *ctl, + char **capnames, + int ncapnames, + unsigned int flags); +void virshNodeDeviceListFree(virshNodeDeviceListPtr list); #endif /* VIRSH_NODEDEV_H */ -- 2.6.2

Adding this feature will allow users to easily attach a hostdev network interface using PCI passthrough. The interface can be attached using --type=hostdev and PCI address or network device name as --source. This command also allows you to tell, whether the interface should be managed and to choose a assignment driver. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997561 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 86 insertions(+), 4 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index e8503ec..b124441 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -56,6 +56,7 @@ #include "virtime.h" #include "virtypedparam.h" #include "virxml.h" +#include "virsh-nodedev.h" /* Gnulib doesn't guarantee SA_SIGINFO support. */ #ifndef SA_SIGINFO @@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = { .type = VSH_OT_BOOL, .help = N_("print XML document rather than attach the interface") }, + {.name = "managed", + .type = VSH_OT_BOOL, + .help = N_("set the interface to be managed by libvirt") + }, + {.name = "driver", + .type = VSH_OT_STRING, + .help = N_("set driver for hostdev interface, default is 'kvm'") + }, {.name = NULL} }; @@ -919,7 +928,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) virDomainPtr dom = NULL; const char *mac = NULL, *target = NULL, *script = NULL, *type = NULL, *source = NULL, *model = NULL, - *inboundStr = NULL, *outboundStr = NULL; + *inboundStr = NULL, *outboundStr = NULL, *driver = NULL; virNetDevBandwidthRate inbound, outbound; virDomainNetType typ; int ret; @@ -931,6 +940,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) bool config = vshCommandOptBool(cmd, "config"); bool live = vshCommandOptBool(cmd, "live"); bool persistent = vshCommandOptBool(cmd, "persistent"); + bool managed = vshCommandOptBool(cmd, "managed"); VSH_EXCLUSIVE_OPTIONS_VAR(persistent, current); @@ -949,7 +959,8 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) vshCommandOptStringReq(ctl, cmd, "script", &script) < 0 || vshCommandOptStringReq(ctl, cmd, "model", &model) < 0 || vshCommandOptStringReq(ctl, cmd, "inbound", &inboundStr) < 0 || - vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0) + vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0 || + vshCommandOptStringReq(ctl, cmd, "driver", &driver) < 0) goto cleanup; /* check interface type */ @@ -982,8 +993,23 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) } } + if (typ != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if (managed) { + vshError(ctl, _("--managed is usable only with --type=hostdev")); + goto cleanup; + } + if (driver) { + vshError(ctl, _("--driver is usable only with --type=hostdev")); + goto cleanup; + } + } + /* Make XML of interface */ - virBufferAsprintf(&buf, "<interface type='%s'>\n", type); + virBufferAsprintf(&buf, "<interface type='%s'", type); + if (managed) + virBufferAddLit(&buf, " managed='yes'>\n"); + else + virBufferAddLit(&buf, ">\n"); virBufferAdjustIndent(&buf, 2); switch (typ) { @@ -995,6 +1021,63 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAsprintf(&buf, "<source dev='%s'/>\n", source); break; + case VIR_DOMAIN_NET_TYPE_HOSTDEV: + { + struct PCIAddress pciAddr = {0, 0, 0, 0}; + + if (str2PCIAddress(source, &pciAddr) < 0) { + const char *caps[] = {"net"}; + char *tmpName = NULL; + virshNodeDeviceListPtr list = NULL; + virNodeDevicePtr netDev = NULL; + size_t i; + + list = virshNodeDeviceListCollect(ctl, (char **)caps, 1, + VIR_CONNECT_LIST_NODE_DEVICES_CAP_NET); + + if (!list) { + vshError(ctl, _("cannot list network devices")); + goto cleanup; + } + + if (virAsprintf(&tmpName, "net_%s", source) < 0) + goto cleanup; + + for (i = 0; i < list->ndevices; i++) { + if (STREQLEN(tmpName, virNodeDeviceGetName(list->devices[i]), + strlen(tmpName))) + netDev = list->devices[i]; + } + VIR_FREE(tmpName); + + if (!netDev) { + vshError(ctl, _("network interface '%s' doesn't exist"), + source); + goto cleanup; + } + + if (str2PCIAddress(virNodeDeviceGetParent(netDev)+4, &pciAddr) < 0) { + virshNodeDeviceListFree(list); + vshError(ctl, _("cannot parse pci address for network " + "interface '%s'"), source); + goto cleanup; + } + + virshNodeDeviceListFree(list); + } + if (driver) + virBufferAsprintf(&buf, "<driver name='%s'/>\n", driver); + + virBufferAddLit(&buf, "<source>\n"); + virBufferAdjustIndent(&buf, 2); + virBufferAsprintf(&buf, "<address type='pci' domain='0x%.4x'" + " bus='0x%.2x' slot='0x%.2x' function='0x%.1x'/>\n", + pciAddr.domain, pciAddr.bus, + pciAddr.slot, pciAddr.function); + virBufferAdjustIndent(&buf, -2); + virBufferAddLit(&buf, "</source>\n"); + break; + } case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_ETHERNET: @@ -1004,7 +1087,6 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) case VIR_DOMAIN_NET_TYPE_MCAST: case VIR_DOMAIN_NET_TYPE_UDP: case VIR_DOMAIN_NET_TYPE_INTERNAL: - case VIR_DOMAIN_NET_TYPE_HOSTDEV: case VIR_DOMAIN_NET_TYPE_LAST: vshError(ctl, _("No support for %s in command 'attach-interface'"), type); -- 2.6.2

On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
Adding this feature will allow users to easily attach a hostdev network interface using PCI passthrough.
The interface can be attached using --type=hostdev and PCI address or network device name as --source. This command also allows you to tell, whether the interface should be managed and to choose a assignment driver.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997561
I think the PCI address version of this command is fairly straightforward, so it can and should go in. But I've been thinking more about the "netdev name" version since our exchange in the BZ, and even had a very long treatise prepared defending my position from there, but then I decided to look at it all again from the beginning and came to the conclusion that we are *both* wrong :-) The main aim here is convenience, and while I still have the position that if we're going to make it more convenient, we should make it more convenient at the XML level so that all consumers of libvirt can take advantage without needing a ton of extra code, I also have realized that specifying the netdev name is not very convenient anyway. Why? * Keep in mind that <interface type='hostdev'> will only work with SRIOV VFs (because you can't set the MAC address of a normal netdev from the host and have that MAC address persist through the guest driver's device init). * So any device that you assign in this way is a VF. * A user will know which PF ("Physical Function", but from their point of view it's "the physical port used for the connection") they want a VF from; they don't care which VF (they're all functionally equivalent), and anyway the standard utilities don't even tell you the netdev names of the VFs associated with a particular PF. So it's not that simple to learn the netdev name of the VF you want to use: * virsh nodedev-list --tree doesn't group the VFs under their PF (because its hierarchy is according to the PCI bus layout, which has all the PFs and VFs at the same level) * virsh nodedev-dumpxml for the PF device shows the VFs' PCI addresses, NOT their netdev names. <capability type='virt_functions'> <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/> <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/> <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/> <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/> <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/> <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/> <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/> (Note the leading "0x" on these values :-P) * "ip link show" lists the VFs directly under their PF, but shows the VF#, not the netdev name 11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto So if there's not a simple way to get a list of the netdev names of the VFs for a PF, then what convenience is gained by allowing use of netdev name? What exactly was I thinking when I wrote comment 1 in that BZ??? What *could* be useful would be the ability to specify the PF name and VF# of the device you want to assign, e.g.: "<source pf='p4p2' vf='3'/>", since that's info you can easily get from "ip link show". Anymore, though, I don't even know how useful *that* is, since you can already get the PCI addresses of the VFs from the output of virsh nodedev-dumpxml (in almost exactly the format you need - just add "type='pci'". (I'm now even wondering if I misunderstood what the original reporter was asking for, but unfortunately it's not possible to ask him, because his account has been closed :-( Thinking back to what he said - there is a *different* place where it's common to know the netdev name and want to translate it into a PCI address - when you are assigning a *non-SRIOV* ethernet device using plain <hostdev> (not <interface type='hostdev'>, which only works for SRIOV VFs). In this case you probably know the netdev name but not the PCI address. So for this case a translation from netdev name to PCI address would make sense, and here I would agree that the translation should happen in virsh rather than the generic <hostdev> XML understanding what a netdev name is (contrary to <interface type='hostdev'>, where it is well established that the device you're assigning is a network device, and there are already "netdev-y" config attributes, up until now <hostdev> has not contained any config items specific to a particular type of PCI device, and I don't think it should have any added)). TL;DR of my opinion: 1) this patch minus the netdev-->PCI address translation is good 2) we don't need the netdev-->PCI translation for <interface type='hostdev'>; it's just extra code for (in my newly revised opinion) no gain. 3) a netdev-->PCI translation in virsh that creates a plain <hostdev> might be useful (hmm, maybe keep (2) as you have it, with an additional check for 4) adding <source pf='ethX' vf='n'/> support to <interface type='hostdev'> might be useful. How much sense does any of that make?
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 86 insertions(+), 4 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index e8503ec..b124441 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -56,6 +56,7 @@ #include "virtime.h" #include "virtypedparam.h" #include "virxml.h" +#include "virsh-nodedev.h"
/* Gnulib doesn't guarantee SA_SIGINFO support. */ #ifndef SA_SIGINFO @@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = { .type = VSH_OT_BOOL, .help = N_("print XML document rather than attach the interface") }, + {.name = "managed", + .type = VSH_OT_BOOL, + .help = N_("set the interface to be managed by libvirt")
Maybe a more descriptive help - "have libvirt automatically manage detach/attach of device from host driver" (or something like that; I know that's too long).
+ }, + {.name = "driver", + .type = VSH_OT_STRING, + .help = N_("set driver for hostdev interface, default is 'kvm'")
Actually the default behavior depends on what is available on the host - if only one of legacy-kvm and vfio is available, then that is the one used, but if both are available, then vfio is used (so vfio is closer to being "default" than kvm). The kvm method of assigning devices is deprecated, and it has been completely disabled in some distros (definitely RHEL7 and CentOS7, not sure about Fedora). These days manually specifying which driver to use is mostly pointless; I vaguely recall that it was initially added because when VFIO was first added some were nervous about not having a simple way to fallback to old behavior if something went wrong while using VFIO; those days are long gone though, and I can't think of a situation where anyone would want/need to specify which driver to use (i.e. I'm not sure we even need to expose this option in virsh; it may confuse more than help).
+ }, {.name = NULL} };
@@ -919,7 +928,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) virDomainPtr dom = NULL; const char *mac = NULL, *target = NULL, *script = NULL, *type = NULL, *source = NULL, *model = NULL, - *inboundStr = NULL, *outboundStr = NULL; + *inboundStr = NULL, *outboundStr = NULL, *driver = NULL; virNetDevBandwidthRate inbound, outbound; virDomainNetType typ; int ret; @@ -931,6 +940,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) bool config = vshCommandOptBool(cmd, "config"); bool live = vshCommandOptBool(cmd, "live"); bool persistent = vshCommandOptBool(cmd, "persistent"); + bool managed = vshCommandOptBool(cmd, "managed");
VSH_EXCLUSIVE_OPTIONS_VAR(persistent, current);
@@ -949,7 +959,8 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) vshCommandOptStringReq(ctl, cmd, "script", &script) < 0 || vshCommandOptStringReq(ctl, cmd, "model", &model) < 0 || vshCommandOptStringReq(ctl, cmd, "inbound", &inboundStr) < 0 || - vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0) + vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0 || + vshCommandOptStringReq(ctl, cmd, "driver", &driver) < 0) goto cleanup;
/* check interface type */ @@ -982,8 +993,23 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) } }
+ if (typ != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if (managed) { + vshError(ctl, _("--managed is usable only with --type=hostdev")); + goto cleanup; + } + if (driver) { + vshError(ctl, _("--driver is usable only with --type=hostdev")); + goto cleanup; + } + } + /* Make XML of interface */ - virBufferAsprintf(&buf, "<interface type='%s'>\n", type); + virBufferAsprintf(&buf, "<interface type='%s'", type); + if (managed) + virBufferAddLit(&buf, " managed='yes'>\n"); + else + virBufferAddLit(&buf, ">\n"); virBufferAdjustIndent(&buf, 2);
switch (typ) { @@ -995,6 +1021,63 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAsprintf(&buf, "<source dev='%s'/>\n", source); break; + case VIR_DOMAIN_NET_TYPE_HOSTDEV: + { + struct PCIAddress pciAddr = {0, 0, 0, 0}; + + if (str2PCIAddress(source, &pciAddr) < 0) { + const char *caps[] = {"net"}; + char *tmpName = NULL; + virshNodeDeviceListPtr list = NULL; + virNodeDevicePtr netDev = NULL; + size_t i; + + list = virshNodeDeviceListCollect(ctl, (char **)caps, 1, + VIR_CONNECT_LIST_NODE_DEVICES_CAP_NET); + + if (!list) { + vshError(ctl, _("cannot list network devices")); + goto cleanup; + } + + if (virAsprintf(&tmpName, "net_%s", source) < 0) + goto cleanup; + + for (i = 0; i < list->ndevices; i++) { + if (STREQLEN(tmpName, virNodeDeviceGetName(list->devices[i]), + strlen(tmpName))) + netDev = list->devices[i]; + } + VIR_FREE(tmpName); + + if (!netDev) { + vshError(ctl, _("network interface '%s' doesn't exist"), + source); + goto cleanup; + } + + if (str2PCIAddress(virNodeDeviceGetParent(netDev)+4, &pciAddr) < 0) { + virshNodeDeviceListFree(list); + vshError(ctl, _("cannot parse pci address for network " + "interface '%s'"), source); + goto cleanup; + } + + virshNodeDeviceListFree(list); + } + if (driver) + virBufferAsprintf(&buf, "<driver name='%s'/>\n", driver); + + virBufferAddLit(&buf, "<source>\n"); + virBufferAdjustIndent(&buf, 2); + virBufferAsprintf(&buf, "<address type='pci' domain='0x%.4x'" + " bus='0x%.2x' slot='0x%.2x' function='0x%.1x'/>\n", + pciAddr.domain, pciAddr.bus, + pciAddr.slot, pciAddr.function); + virBufferAdjustIndent(&buf, -2); + virBufferAddLit(&buf, "</source>\n"); + break; + }
case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_ETHERNET: @@ -1004,7 +1087,6 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd) case VIR_DOMAIN_NET_TYPE_MCAST: case VIR_DOMAIN_NET_TYPE_UDP: case VIR_DOMAIN_NET_TYPE_INTERNAL: - case VIR_DOMAIN_NET_TYPE_HOSTDEV: case VIR_DOMAIN_NET_TYPE_LAST: vshError(ctl, _("No support for %s in command 'attach-interface'"), type);

On Mon, Oct 26, 2015 at 04:05:28AM -0400, Laine Stump wrote:
On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
Adding this feature will allow users to easily attach a hostdev network interface using PCI passthrough.
The interface can be attached using --type=hostdev and PCI address or network device name as --source. This command also allows you to tell, whether the interface should be managed and to choose a assignment driver.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997561
I think the PCI address version of this command is fairly straightforward, so it can and should go in. But I've been thinking more about the "netdev name" version since our exchange in the BZ, and even had a very long treatise prepared defending my position from there, but then I decided to look at it all again from the beginning and came to the conclusion that we are *both* wrong :-)
The main aim here is convenience, and while I still have the position that if we're going to make it more convenient, we should make it more convenient at the XML level so that all consumers of libvirt can take advantage without needing a ton of extra code, I also have realized that specifying the netdev name is not very convenient anyway.
Why?
* Keep in mind that <interface type='hostdev'> will only work with SRIOV VFs (because you can't set the MAC address of a normal netdev from the host and have that MAC address persist through the guest driver's device init).
* So any device that you assign in this way is a VF.
* A user will know which PF ("Physical Function", but from their point of view it's "the physical port used for the connection") they want a VF from; they don't care which VF (they're all functionally equivalent), and anyway the standard utilities don't even tell you the netdev names of the VFs associated with a particular PF. So it's not that simple to learn the netdev name of the VF you want to use:
* virsh nodedev-list --tree doesn't group the VFs under their PF (because its hierarchy is according to the PCI bus layout, which has all the PFs and VFs at the same level)
* virsh nodedev-dumpxml for the PF device shows the VFs' PCI addresses, NOT their netdev names.
<capability type='virt_functions'> <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/> <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/> <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/> <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/> <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/> <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/> <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/>
(Note the leading "0x" on these values :-P)
* "ip link show" lists the VFs directly under their PF, but shows the VF#, not the netdev name
11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
So if there's not a simple way to get a list of the netdev names of the VFs for a PF, then what convenience is gained by allowing use of netdev name? What exactly was I thinking when I wrote comment 1 in that BZ???
What *could* be useful would be the ability to specify the PF name and VF# of the device you want to assign, e.g.: "<source pf='p4p2' vf='3'/>", since that's info you can easily get from "ip link show". Anymore, though, I don't even know how useful *that* is, since you can already get the PCI addresses of the VFs from the output of virsh nodedev-dumpxml (in almost exactly the format you need - just add "type='pci'".
(I'm now even wondering if I misunderstood what the original reporter was asking for, but unfortunately it's not possible to ask him, because his account has been closed :-( Thinking back to what he said - there is a *different* place where it's common to know the netdev name and want to translate it into a PCI address - when you are assigning a *non-SRIOV* ethernet device using plain <hostdev> (not <interface type='hostdev'>, which only works for SRIOV VFs). In this case you probably know the netdev name but not the PCI address. So for this case a translation from netdev name to PCI address would make sense, and here I would agree that the translation should happen in virsh rather than the generic <hostdev> XML understanding what a netdev name is (contrary to <interface type='hostdev'>, where it is well established that the device you're assigning is a network device, and there are already "netdev-y" config attributes, up until now <hostdev> has not contained any config items specific to a particular type of PCI device, and I don't think it should have any added)).
TL;DR of my opinion:
1) this patch minus the netdev-->PCI address translation is good
I don't mind if the patch goes in with or without the translation. The main purpose of this extension is to make it easier for users so they don't have to use the generic 'attach-device' with XML definition but only specify some arguments. But see the comment below for the 2).
2) we don't need the netdev-->PCI translation for <interface type='hostdev'>; it's just extra code for (in my newly revised opinion) no gain.
Actually it's not entirely true, the netdev names of VFs are listed by 'ip link show' command, but it's not clear to guess the relation to PFs: 3: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000 link/ether b4:b5:2f:66:3c:4f brd ff:ff:ff:ff:ff:ff vf 0 MAC 22:67:e2:b0:28:b0, spoof checking on, link-state auto vf 1 MAC 0a:d3:63:e6:06:45, spoof checking on, link-state auto vf 2 MAC 52:54:00:32:93:5c, spoof checking on, link-state auto vf 3 MAC 42:9a:79:f6:9d:cb, spoof checking on, link-state auto 14: enp5s16f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000 link/ether 22:67:e2:b0:28:b0 brd ff:ff:ff:ff:ff:ff One could benefit using the netdev name also for VF.
3) a netdev-->PCI translation in virsh that creates a plain <hostdev> might be useful (hmm, maybe keep (2) as you have it, with an additional check for
Yes, this might be a useful extension of 'attach-interface --type=hostdev', that we would ether detect, whether the interface is a VF or we would use <hostdev> as default, unless a mac address is specified or we can combine both approaches. In this case, the command will remain the same, but it will be able to attach VF and non-VF interfaces to the guest. This would probably require to extend a detach-interface to be able to detach also a network interface attached as <hostdev>.
4) adding <source pf='ethX' vf='n'/> support to <interface type='hostdev'> might be useful.
I don't like to extend our XMLs to introduce another way how to represent some source device if we already have one. And I don't see the value of the PF and VF in XML.
How much sense does any of that make?
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 86 insertions(+), 4 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index e8503ec..b124441 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -56,6 +56,7 @@ #include "virtime.h" #include "virtypedparam.h" #include "virxml.h" +#include "virsh-nodedev.h"
/* Gnulib doesn't guarantee SA_SIGINFO support. */ #ifndef SA_SIGINFO @@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = { .type = VSH_OT_BOOL, .help = N_("print XML document rather than attach the interface") }, + {.name = "managed", + .type = VSH_OT_BOOL, + .help = N_("set the interface to be managed by libvirt")
Maybe a more descriptive help - "have libvirt automatically manage detach/attach of device from host driver" (or something like that; I know that's too long).
That makes sense, I'll try to come up with something better.
+ }, + {.name = "driver", + .type = VSH_OT_STRING, + .help = N_("set driver for hostdev interface, default is 'kvm'")
Actually the default behavior depends on what is available on the host - if only one of legacy-kvm and vfio is available, then that is the one used, but if both are available, then vfio is used (so vfio is closer to being "default" than kvm). The kvm method of assigning devices is deprecated, and it has been completely disabled in some distros (definitely RHEL7 and CentOS7, not sure about Fedora). These days manually specifying which driver to use is mostly pointless; I vaguely recall that it was initially added because when VFIO was first added some were nervous about not having a simple way to fallback to old behavior if something went wrong while using VFIO; those days are long gone though, and I can't think of a situation where anyone would want/need to specify which driver to use (i.e. I'm not sure we even need to expose this option in virsh; it may confuse more than help).
In that case, the documentation for <interface type='hostdev'> is not correct, because there is a statement '(or simply omit the <driver> element, since "kvm" is currently the default)'. Looking at the <hostdev> section in our documentation, this is already updated and there is a correct description, when is which driver used as default. In that case, I'll remove it from virsh. Pavel

Rewrite the attach-interface section in man page to be more readable and add the new hostdev functionality. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tools/virsh.pod | 85 +++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 53 insertions(+), 32 deletions(-) diff --git a/tools/virsh.pod b/tools/virsh.pod index 0212e7a..063b0d2 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -2507,51 +2507,72 @@ Likewise, I<--shareable> is an alias for I<--mode shareable>. [[[I<--live>] [I<--config>] | [I<--current>]] | [I<--persistent>]] [I<--target target>] [I<--mac mac>] [I<--script script>] [I<--model model>] [I<--inbound average,peak,burst,floor>] [I<--outbound average,peak,burst>] -[I<--print-xml>] - -Attach a new network interface to the domain. I<type> can be -I<network> to indicate connection via a libvirt virtual network, or -I<bridge> to indicate connection via a bridge device on the host, or -I<direct> to indicate connection directly to one of the host's network -interfaces or bridges. I<source> indicates the source of the -connection (the name of a network, or of a bridge device, or the -host's network interfaces or bridges). I<target> is used to specify -the tap/macvtap device to be used to connect the domain to the -source. Names starting with 'vnet' are considered as auto-generated -and are blanked out/regenerated each time the interface is attached. -I<mac> specifies the MAC address of the network interface; if a MAC +[I<--managed>] [I<--driver driver>] [I<--print-xml>] + +Attach a new network interface to the domain. + +B<--type> can be one of the: I<network> to indicate connection via a libvirt +virtual network, I<bridge> to indicate connection via a bridge device +on the host, I<direct> to indicate connection directly to one of the host's +network interfaces or bridges, I<hostdev> to indicate connection using a +passthrough of PCI device on the host. + +B<--source> indicates the source of the connection. The source depends +on the type of the interface: I<network> name of the virtual network, +I<bridge> the name of the bridge device, I<direct> the name of the host's +interface or bridge, I<hostdev> the name of the host's interface or +it's PCI address. + +B<--target> is used to specify the tap/macvtap device to be used to +connect the domain to the source. Names starting with 'vnet' are +considered as auto-generated and are blanked out/regenerated each +time the interface is attached. + +B<--mac> specifies the MAC address of the network interface; if a MAC address is not given, a new address will be automatically generated (and stored in the persistent configuration if "--config" is given on -the commandline). I<script> is used to specify a path to a custom -script to be called while attaching to a bridge - this will be called -instead of the default script not in addition to it; --script is valid -only for interfaces of type I<bridge> and only for Xen domains. -I<model> specifies the network device model to be presented to the -domain. I<inbound> and I<outbound> control the bandwidth of the -interface. At least one from the I<average>, I<floor> pair must be -specified. The other two I<peak> and I<burst> are optional, so +the command line). + +B<--script> is used to specify a path to a custom script to be called +while attaching to a bridge - this will be called instead of the default +script not in addition to it; --script is valid only for interfaces of +type I<bridge> and only for Xen domains. + +B<--model> specifies the network device model to be presented to the +domain. + +B<--inbound> and B<--outbound> control the bandwidth of the +interface. At least one from the I<average>, I<floor> pair must be +specified. The other two I<peak> and I<burst> are optional, so "average,peak", "average,,burst", "average,,,floor", "average" and -",,,floor" are also legal. Values for I<average>, I<floor> and I<peak> +",,,floor" are also legal. Values for I<average>, I<floor> and I<peak> are expressed in kilobytes per second, while I<burst> is expressed in kilobytes in a single burst at I<peak> speed as described in the Network XML documentation at L<http://libvirt.org/formatnetwork.html#elementQoS>. -If I<--print-xml> is specified, then the XML of the interface that would be +B<--managed> is usable only for I<hostdev> type and tells libvirt +that the interface should be managed, which means detached and reattached +from/to the guest by libvirt. + +B<--driver> is usable only for I<hostdev> type. Specifies which driver should +be used, it could be one of I<kvm> (default) and I<vfio>. + +If B<--print-xml> is specified, then the XML of the interface that would be attached is printed instead. -If I<--live> is specified, affect a running domain. -If I<--config> is specified, affect the next startup of a persistent domain. -If I<--current> is specified, affect the current domain state. -Both I<--live> and I<--config> flags may be given, but I<--current> is -exclusive. When no flag is specified legacy API is used whose behavior depends -on the hypervisor driver. +If B<--live> is specified, affect a running domain. +If B<--config> is specified, affect the next startup of a persistent domain. +If B<--current> is specified, affect the current domain state. +Both B<--live> and B<--config> flags may be given, but B<--current> is +exclusive. When no flag is specified legacy API is used whose behavior +depends on the hypervisor driver. -For compatibility purposes, I<--persistent> behaves like I<--config> for -an offline domain, and like I<--live> I<--config> for a running domain. +For compatibility purposes, B<--persistent> behaves like B<--config> for +an offline domain, and like B<--live> B<--config> for a running domain. B<Note>: the optional target value is the name of a device to be created -as the back-end on the node. If not provided a device named "vnetN" or "vifN" +as the back-end on the node. If not provided a device named "vnetN" or "vifN" will be created automatically. =item B<detach-device> I<domain> I<FILE> -- 2.6.2
participants (3)
-
Laine Stump
-
Pavel Hrdina
-
Peter Krempa