[libvirt] [PATCH 0/2] Add tx_alg attribute to interface XML for virtio backend

These two patches provide the ability to configure which of two algorithms is used on the TX side of the virtio-net-pci device. Details are in the comments for PATCH 2/2. There are also at least a couple of open questions about the patch, which are posed in the comments of 2/2. It's highly unlikely this patch will be pushed as-is - I'm sending it mostly to solicit comments on those questions.

When the <driver> element (and its "name" attribute) was added to the domain XML's interface element, a "backend" enum was simply added to the toplevel of the virDomainNetDef struct. Ignoring the naming inconsistency ("name" vs. "backend"), this is fine when there's only a single item contained in the driver element of the XML, but doesn't scale well as we add more attributes that apply to the backend of the virtio-net driver, or add attributes applicable to other drivers. This patch changes virDomainNetDef in two ways: 1) Rename the item in the struct from "backend" to "name", so that it's the same in the XML and in the struct, hopefully avoiding confusion for someone unfamiliar with the function of the attribute. 2) Create a "driver" union within virDomainNetDef, and a "virtio" struct in that struct, which contains the "name" enum value. 3) Move around the virDomainNetParse and virDomainNetFormat functions to allow for simple plugin of new attributes without disturbing existing code. (you'll note that this results in a seemingly redundant if() in the format function, but that will no longer be the case as soon as a 2nd attribute is added). In the future, new attributes for the virtio driver backend can be added to the "virtio" struct, and any other network device backend that needs an attribute will have its own struct added to the "driver" union. --- src/conf/domain_conf.c | 35 +++++++++++++++++++++-------------- src/conf/domain_conf.h | 6 +++++- src/qemu/qemu_command.c | 12 ++++++------ 3 files changed, 32 insertions(+), 21 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9369ed4..84b866b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2756,19 +2756,21 @@ virDomainNetDefParseXML(virCapsPtr caps, model = NULL; } - if ((backend != NULL) && - (def->model && STREQ(def->model, "virtio"))) { - int b; - if (((b = virDomainNetBackendTypeFromString(backend)) < 0) || - (b == VIR_DOMAIN_NET_BACKEND_TYPE_DEFAULT)) { - virDomainReportError(VIR_ERR_INTERNAL_ERROR, - _("Unknown interface <driver name='%s'> " - "has been specified"), - backend); - goto error; + if (def->model && STREQ(def->model, "virtio")) { + if (backend != NULL) { + int name; + if (((name = virDomainNetBackendTypeFromString(backend)) < 0) || + (name == VIR_DOMAIN_NET_BACKEND_TYPE_DEFAULT)) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown interface <driver name='%s'> " + "has been specified"), + backend); + goto error; + } + def->driver.virtio.name = name; } - def->backend = b; } + if (filter != NULL) { switch (def->type) { case VIR_DOMAIN_NET_TYPE_ETHERNET: @@ -6805,9 +6807,14 @@ virDomainNetDefFormat(virBufferPtr buf, if (def->model) { virBufferEscapeString(buf, " <model type='%s'/>\n", def->model); - if (STREQ(def->model, "virtio") && def->backend) { - virBufferVSprintf(buf, " <driver name='%s'/>\n", - virDomainNetBackendTypeToString(def->backend)); + if (STREQ(def->model, "virtio") && + def->driver.virtio.name) { + virBufferAddLit(buf, " <driver"); + if (def->driver.virtio.name) { + virBufferVSprintf(buf, " name='%s'", + virDomainNetBackendTypeToString(def->driver.virtio.name)); + } + virBufferAddLit(buf, "/>\n"); } } if (def->filter) { diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 5d35e43..44d0a4b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -338,7 +338,11 @@ struct _virDomainNetDef { enum virDomainNetType type; unsigned char mac[VIR_MAC_BUFLEN]; char *model; - enum virDomainNetBackendType backend; + union { + struct { + enum virDomainNetBackendType name; /* which driver backend to use */ + } virtio; + } driver; union { struct { char *dev; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index f78ce71..1b213da 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -311,7 +311,7 @@ qemuOpenVhostNet(virDomainNetDefPtr net, *vhostfd = -1; /* assume we won't use vhost */ /* If the config says explicitly to not use vhost, return now */ - if (net->backend == VIR_DOMAIN_NET_BACKEND_TYPE_QEMU) { + if (net->driver.virtio.name == VIR_DOMAIN_NET_BACKEND_TYPE_QEMU) { return 0; } @@ -321,7 +321,7 @@ qemuOpenVhostNet(virDomainNetDefPtr net, if (!(qemuCmdFlags & QEMUD_CMD_FLAG_VNET_HOST && qemuCmdFlags & QEMUD_CMD_FLAG_NETDEV && qemuCmdFlags & QEMUD_CMD_FLAG_DEVICE)) { - if (net->backend == VIR_DOMAIN_NET_BACKEND_TYPE_VHOST) { + if (net->driver.virtio.name == VIR_DOMAIN_NET_BACKEND_TYPE_VHOST) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("vhost-net is not supported with " "this QEMU binary")); @@ -332,7 +332,7 @@ qemuOpenVhostNet(virDomainNetDefPtr net, /* If the nic model isn't virtio, don't try to open. */ if (!(net->model && STREQ(net->model, "virtio"))) { - if (net->backend == VIR_DOMAIN_NET_BACKEND_TYPE_VHOST) { + if (net->driver.virtio.name == VIR_DOMAIN_NET_BACKEND_TYPE_VHOST) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("vhost-net is only supported for " "virtio network interfaces")); @@ -347,7 +347,7 @@ qemuOpenVhostNet(virDomainNetDefPtr net, * report an error. */ if ((*vhostfd < 0) && - (net->backend == VIR_DOMAIN_NET_BACKEND_TYPE_VHOST)) { + (net->driver.virtio.name == VIR_DOMAIN_NET_BACKEND_TYPE_VHOST)) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("vhost-net was requested for an interface, " "but is unavailable")); @@ -5012,9 +5012,9 @@ qemuParseCommandLineNet(virCapsPtr caps, values[i] = NULL; } else if (STREQ(keywords[i], "vhost")) { if ((values[i] == NULL) || STREQ(values[i], "on")) { - def->backend = VIR_DOMAIN_NET_BACKEND_TYPE_VHOST; + def->driver.virtio.name = VIR_DOMAIN_NET_BACKEND_TYPE_VHOST; } else if (STREQ(keywords[i], "off")) { - def->backend = VIR_DOMAIN_NET_BACKEND_TYPE_QEMU; + def->driver.virtio.name = VIR_DOMAIN_NET_BACKEND_TYPE_QEMU; } } else if (STREQ(keywords[i], "sndbuf") && values[i]) { if (virStrToLong_ul(values[i], NULL, 10, &def->tune.sndbuf) < 0) { -- 1.7.3.4

On 02/04/2011 02:00 PM, Laine Stump wrote:
When the <driver> element (and its "name" attribute) was added to the domain XML's interface element, a "backend" enum was simply added to the toplevel of the virDomainNetDef struct.
This patch changes virDomainNetDef in two ways:
s/two/three/, given:
1) Rename the item in the struct from "backend" to "name", so that it's the same in the XML and in the struct, hopefully avoiding confusion for someone unfamiliar with the function of the attribute.
2) Create a "driver" union within virDomainNetDef, and a "virtio" struct in that struct, which contains the "name" enum value.
3) Move around the virDomainNetParse and virDomainNetFormat functions to allow for simple plugin of new attributes without disturbing existing code. (you'll note that this results in a seemingly redundant if() in the format function, but that will no longer be the case as soon as a 2nd attribute is added).
...all nice changes. ACK to this half of the series. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 02/07/2011 11:55 AM, Eric Blake wrote:
On 02/04/2011 02:00 PM, Laine Stump wrote:
When the<driver> element (and its "name" attribute) was added to the domain XML's interface element, a "backend" enum was simply added to the toplevel of the virDomainNetDef struct.
This patch changes virDomainNetDef in two ways: s/two/three/, given:
Sigh. I just noticed this after I pushed. So the comment is out of whack :-/
1) Rename the item in the struct from "backend" to "name", so that it's the same in the XML and in the struct, hopefully avoiding confusion for someone unfamiliar with the function of the attribute.
2) Create a "driver" union within virDomainNetDef, and a "virtio" struct in that struct, which contains the "name" enum value.
3) Move around the virDomainNetParse and virDomainNetFormat functions to allow for simple plugin of new attributes without disturbing existing code. (you'll note that this results in a seemingly redundant if() in the format function, but that will no longer be the case as soon as a 2nd attribute is added). ...all nice changes.
ACK to this half of the series.
Pushed, now that 0.8.8 is tagged and out.

qemu's virtio-net-pci driver allows setting the algorithm used for tx packets to either "bh" or "timer". I don't know exactly how these algorithms differ, but someone has requested the ability to choose between them in libvirt's domain XML. See: https://bugzilla.redhat.com/show_bug.cgi?id=629662 This patch provides that knob, by adding a new attribute "tx_alg" to the <driver> element that can be placed inside any <interface> element in a domain definition. It's use would be something like this: <interface ...> ... <model type='virtio'/> <driver tx_alg='bh'/> ... </interface> I chose to put this setting as an attribute to <driver> rather than as a sub-element to <tune> because it is specific to the virtio-net driver, not something that is generally usable by all network drivers. (note that this is the same placement as the "driver name=..." attribute used to choose kernel vs. userland backend for the virtio-net driver.) This is a bit troublesome to me, because I can see lots of new virtio options that could potentially be requested (just run "qemu-kvm -device virtio-net-pci,?" on a qemu that's version 0.13.0 or newer, and compare that output to potential tunable items in "-device e1000,?" or "-net tap,..."), so the attribute list could potentially get quite long (which is something I was concerned about when I first added the option to select kernel vs. userland backend, but didn't realize just how many virtio-specific options there were). Actually adding the tx=xxx option to the qemu commandline is only done if the version of qemu being used advertises it in the output of qemu -device virtio-net-pci,? If the option isn't listed there, the config item is ignored (should it instead generate a warning log? error out and prevent the domain from starting?) --- src/conf/domain_conf.c | 26 +++++++++++++++++++++++++- src/conf/domain_conf.h | 11 +++++++++++ src/libvirt_private.syms | 1 + src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 13 +++++++++++++ 6 files changed, 54 insertions(+), 1 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 84b866b..8f0f057 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -198,6 +198,11 @@ VIR_ENUM_IMPL(virDomainNetBackend, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, "qemu", "vhost") +VIR_ENUM_IMPL(virDomainNetVirtioTxAlg, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, + "default", + "bh", + "timer") + VIR_ENUM_IMPL(virDomainChrChannelTarget, VIR_DOMAIN_CHR_CHANNEL_TARGET_TYPE_LAST, "guestfwd", @@ -2477,6 +2482,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *port = NULL; char *model = NULL; char *backend = NULL; + char *tx_alg = NULL; char *filter = NULL; char *internal = NULL; char *devaddr = NULL; @@ -2565,6 +2571,7 @@ virDomainNetDefParseXML(virCapsPtr caps, model = virXMLPropString(cur, "type"); } else if (xmlStrEqual (cur->name, BAD_CAST "driver")) { backend = virXMLPropString(cur, "name"); + tx_alg = virXMLPropString(cur, "tx_alg"); } else if (xmlStrEqual (cur->name, BAD_CAST "filterref")) { filter = virXMLPropString(cur, "filter"); VIR_FREE(filterparams); @@ -2769,6 +2776,18 @@ virDomainNetDefParseXML(virCapsPtr caps, } def->driver.virtio.name = name; } + if (tx_alg != NULL) { + int alg; + if (((alg = virDomainNetVirtioTxAlgTypeFromString(tx_alg)) < 0) || + (alg == VIR_DOMAIN_NET_VIRTIO_TX_ALG_DEFAULT)) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown interface <driver tx_alg='%s'> " + "has been specified"), + tx_alg); + goto error; + } + def->driver.virtio.tx_alg = alg; + } } if (filter != NULL) { @@ -2808,6 +2827,7 @@ cleanup: VIR_FREE(bridge); VIR_FREE(model); VIR_FREE(backend); + VIR_FREE(tx_alg); VIR_FREE(filter); VIR_FREE(type); VIR_FREE(internal); @@ -6808,12 +6828,16 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferEscapeString(buf, " <model type='%s'/>\n", def->model); if (STREQ(def->model, "virtio") && - def->driver.virtio.name) { + (def->driver.virtio.name || def->driver.virtio.tx_alg)) { virBufferAddLit(buf, " <driver"); if (def->driver.virtio.name) { virBufferVSprintf(buf, " name='%s'", virDomainNetBackendTypeToString(def->driver.virtio.name)); } + if (def->driver.virtio.tx_alg) { + virBufferVSprintf(buf, " tx_alg='%s'", + virDomainNetVirtioTxAlgTypeToString(def->driver.virtio.tx_alg)); + } virBufferAddLit(buf, "/>\n"); } } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 44d0a4b..b3a02f4 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -321,6 +321,15 @@ enum virDomainNetBackendType { VIR_DOMAIN_NET_BACKEND_TYPE_LAST, }; +/* the TX algorithm used for virtio interfaces */ +enum virDomainNetVirtioTxAlgType { + VIR_DOMAIN_NET_VIRTIO_TX_ALG_DEFAULT, /* default for this version of qemu */ + VIR_DOMAIN_NET_VIRTIO_TX_ALG_BH, + VIR_DOMAIN_NET_VIRTIO_TX_ALG_TIMER, + + VIR_DOMAIN_NET_VIRTIO_TX_ALG_LAST, +}; + /* the mode type for macvtap devices */ enum virDomainNetdevMacvtapType { VIR_DOMAIN_NETDEV_MACVTAP_MODE_VEPA, @@ -341,6 +350,7 @@ struct _virDomainNetDef { union { struct { enum virDomainNetBackendType name; /* which driver backend to use */ + enum virDomainNetVirtioTxAlgType tx_alg; } virtio; } driver; union { @@ -1367,6 +1377,7 @@ VIR_ENUM_DECL(virDomainFS) VIR_ENUM_DECL(virDomainFSAccessMode) VIR_ENUM_DECL(virDomainNet) VIR_ENUM_DECL(virDomainNetBackend) +VIR_ENUM_DECL(virDomainNetVirtioTxAlg) VIR_ENUM_DECL(virDomainChrDevice) VIR_ENUM_DECL(virDomainChrChannelTarget) VIR_ENUM_DECL(virDomainChrConsoleTarget) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 88e270c..dd44749 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -268,6 +268,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainNetDefFree; virDomainNetTypeToString; +virDomainNetVirtioTxAlgTypeToString; virDomainObjAssignDef; virDomainObjSetDefTransient; virDomainObjGetPersistentDef; diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 0e1f79c..935c669 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1063,6 +1063,7 @@ qemuCapsExtractDeviceStr(const char *qemu, "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?", + "-device", "virtio-net-pci,?", NULL); virCommandAddEnvPassCommon(cmd); /* qemu -help goes to stdout, but qemu -device ? goes to stderr. */ @@ -1104,6 +1105,8 @@ qemuCapsParseDeviceStr(const char *str, unsigned long long *flags) if (strstr(str, "pci-assign.bootindex")) *flags |= QEMUD_CMD_FLAG_PCI_BOOTINDEX; } + if (strstr(str, "virtio-net-pci.tx=")) + *flags |= QEMUD_CMD_FLAG_VIRTIO_TX_ALG; return 0; } diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index dd39b3b..c29d914 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -92,6 +92,7 @@ enum qemuCapsFlags { QEMUD_CMD_FLAG_CCID_PASSTHRU = (1LL << 55), /* -device ccid-card-passthru */ QEMUD_CMD_FLAG_CHARDEV_SPICEVMC = (1LL << 56), /* newer -chardev spicevmc */ QEMUD_CMD_FLAG_DEVICE_SPICEVMC = (1LL << 57), /* older -device spicevmc*/ + QEMUD_CMD_FLAG_VIRTIO_TX_ALG = (1LL << 58), /* -device virtio-net-pci,tx=string */ }; virCapsPtr qemuCapsInit(virCapsPtr old_caps); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1b213da..1d6f20c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1577,16 +1577,29 @@ qemuBuildNicDevStr(virDomainNetDefPtr net, { virBuffer buf = VIR_BUFFER_INITIALIZER; const char *nic; + bool usingVirtio = false; if (!net->model) { nic = "rtl8139"; } else if (STREQ(net->model, "virtio")) { nic = "virtio-net-pci"; + usingVirtio = true; } else { nic = net->model; } virBufferAdd(&buf, nic, strlen(nic)); + if (usingVirtio && net->driver.virtio.tx_alg) { + if (qemuCmdFlags & QEMUD_CMD_FLAG_VIRTIO_TX_ALG) { + virBufferVSprintf(&buf, ",tx=%s", + virDomainNetVirtioTxAlgTypeToString(net->driver.virtio.tx_alg)); + } else { + /* What should we do if the option isn't available? log a + * warning? prevent the domain from starting? Ignore it? + * Right now we're ignoring it. + */ + } + } if (vlan == -1) virBufferVSprintf(&buf, ",netdev=host%s", net->info.alias); else -- 1.7.3.4

On 02/04/2011 02:00 PM, Laine Stump wrote:
qemu's virtio-net-pci driver allows setting the algorithm used for tx packets to either "bh" or "timer". I don't know exactly how these algorithms differ, but someone has requested the ability to choose between them in libvirt's domain XML. See:
Same as qemu aio - we don't have to know exactly what the difference is, to know that someone else who does know the difference wants to be able to choose :)
<interface ...> ... <model type='virtio'/> <driver tx_alg='bh'/> ... </interface>
I chose to put this setting as an attribute to <driver> rather than as a sub-element to <tune> because it is specific to the virtio-net driver, not something that is generally usable by all network drivers. (note that this is the same placement as the "driver name=..." attribute used to choose kernel vs. userland backend for the virtio-net driver.)
I take it that tx_alg applies to both options of driver name=... when using a virtio interface, right?
This is a bit troublesome to me, because I can see lots of new virtio options that could potentially be requested (just run "qemu-kvm -device virtio-net-pci,?" on a qemu that's version 0.13.0 or newer, and compare that output to potential tunable items in "-device e1000,?" or "-net tap,..."), so the attribute list could potentially get quite long (which is something I was concerned about when I first added the option to select kernel vs. userland backend, but didn't realize just how many virtio-specific options there were).
I'd like feedback from danpb or DV, since this might be a long-term XML commitment. Maybe it makes sense to introduce sub-elements of <driver>, according to the driver chosen? <interface ...> ... <driver> <tunable name='tx_alg'>bh</tunable> </driver> </interface> But I'm not sure if that is any better.
If the option isn't listed there, the config item is ignored (should it instead generate a warning log? error out and prevent the domain from starting?)
I would lean towards erroring out, the same way that we error out if: <driver name='vhost'/> is explicitly requested when the kernel module is not present.
@@ -6808,12 +6828,16 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferEscapeString(buf, " <model type='%s'/>\n", def->model); if (STREQ(def->model, "virtio") && - def->driver.virtio.name) { + (def->driver.virtio.name || def->driver.virtio.tx_alg)) { virBufferAddLit(buf, " <driver"); if (def->driver.virtio.name) { virBufferVSprintf(buf, " name='%s'", virDomainNetBackendTypeToString(def->driver.virtio.name)); } + if (def->driver.virtio.tx_alg) { + virBufferVSprintf(buf, " tx_alg='%s'", + virDomainNetVirtioTxAlgTypeToString(def->driver.virtio.tx_alg)); + } virBufferAddLit(buf, "/>\n");
Obviously, this would change if we settle on a different XML representation.
+++ b/src/qemu/qemu_capabilities.c @@ -1063,6 +1063,7 @@ qemuCapsExtractDeviceStr(const char *qemu, "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?", + "-device", "virtio-net-pci,?", NULL); virCommandAddEnvPassCommon(cmd); /* qemu -help goes to stdout, but qemu -device ? goes to stderr. */ @@ -1104,6 +1105,8 @@ qemuCapsParseDeviceStr(const char *str, unsigned long long *flags) if (strstr(str, "pci-assign.bootindex")) *flags |= QEMUD_CMD_FLAG_PCI_BOOTINDEX; } + if (strstr(str, "virtio-net-pci.tx=")) + *flags |= QEMUD_CMD_FLAG_VIRTIO_TX_ALG;
That is just so slick for detecting the new bit! I'm glad I added that improvement in -device string parsing.
+++ b/src/qemu/qemu_capabilities.h @@ -92,6 +92,7 @@ enum qemuCapsFlags { QEMUD_CMD_FLAG_CCID_PASSTHRU = (1LL << 55), /* -device ccid-card-passthru */ QEMUD_CMD_FLAG_CHARDEV_SPICEVMC = (1LL << 56), /* newer -chardev spicevmc */ QEMUD_CMD_FLAG_DEVICE_SPICEVMC = (1LL << 57), /* older -device spicevmc*/ + QEMUD_CMD_FLAG_VIRTIO_TX_ALG = (1LL << 58), /* -device virtio-net-pci,tx=string */
5 more bits left. Get your patches in now, before we run out of space. Last one in has to rewrite this to be a bitset :)
virBufferAdd(&buf, nic, strlen(nic)); + if (usingVirtio && net->driver.virtio.tx_alg) { + if (qemuCmdFlags & QEMUD_CMD_FLAG_VIRTIO_TX_ALG) { + virBufferVSprintf(&buf, ",tx=%s", + virDomainNetVirtioTxAlgTypeToString(net->driver.virtio.tx_alg)); + } else { + /* What should we do if the option isn't available? log a + * warning? prevent the domain from starting? Ignore it? + * Right now we're ignoring it. + */
This would be the perfect place to error out with VIR_ERR_CONFIG_UNSUPPORTED. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 02/07/2011 12:06 PM, Eric Blake wrote:
On 02/04/2011 02:00 PM, Laine Stump wrote:
qemu's virtio-net-pci driver allows setting the algorithm used for tx packets to either "bh" or "timer". I don't know exactly how these algorithms differ, but someone has requested the ability to choose between them in libvirt's domain XML. See:
https://bugzilla.redhat.com/show_bug.cgi?id=629662 Same as qemu aio - we don't have to know exactly what the difference is, to know that someone else who does know the difference wants to be able to choose :)
<interface ...> ... <model type='virtio'/> <driver tx_alg='bh'/> ... </interface>
I chose to put this setting as an attribute to<driver> rather than as a sub-element to<tune> because it is specific to the virtio-net driver, not something that is generally usable by all network drivers. (note that this is the same placement as the "driver name=..." attribute used to choose kernel vs. userland backend for the virtio-net driver.)
I take it that tx_alg applies to both options of driver name=... when using a virtio interface, right?
I assume so, but know little beyond the name of the option :-)
This is a bit troublesome to me, because I can see lots of new virtio options that could potentially be requested (just run "qemu-kvm -device virtio-net-pci,?" on a qemu that's version 0.13.0 or newer, and compare that output to potential tunable items in "-device e1000,?" or "-net tap,..."), so the attribute list could potentially get quite long (which is something I was concerned about when I first added the option to select kernel vs. userland backend, but didn't realize just how many virtio-specific options there were). I'd like feedback from danpb or DV, since this might be a long-term XML commitment.
Me too! :-)
Maybe it makes sense to introduce sub-elements of<driver>, according to the driver chosen?
<interface ...> ... <driver> <tunable name='tx_alg'>bh</tunable> </driver> </interface>
But I'm not sure if that is any better.
Well, there is already a <tune> element inside <interface> that works similar to <memtune> and <cputune>: <interface ...> <tune> <sndbuf>0</sndbuf> <tunable2>100</tunable2> </tune> ... </interface> One alternative would be to place tx_alg there, and ignore it when model type != 'virtio'. That seems a bit cheesy, though. Another alternative would be to put a "<tune> element inside <driver> that looks just like memtune, cputune, and interface/tune. Again, the issue here is concern over the attribute list of <driver> getting excessively long. If that's not an issue, then leaving tx_alg as a simple attribute will probably be fine.
If the option isn't listed there, the config item is ignored (should it instead generate a warning log? error out and prevent the domain from starting?) I would lean towards erroring out, the same way that we error out if:
<driver name='vhost'/>
is explicitly requested when the kernel module is not present.
Okay. I'll make it that way in the next revision. [...]
+ virBufferVSprintf(&buf, ",tx=%s", + virDomainNetVirtioTxAlgTypeToString(net->driver.virtio.tx_alg)); + } else { + /* What should we do if the option isn't available? log a + * warning? prevent the domain from starting? Ignore it? + * Right now we're ignoring it. + */ This would be the perfect place to error out with VIR_ERR_CONFIG_UNSUPPORTED.

============================== V3 changes: 0) The actual code remains untouched from V2. 1) Add txmode attribute to the domain schema, and to an xml2xml and xml2argv test (note that the particular file used for the xml2xml test was previously only being used for xml2argv, and it needed some cleaning up before it would work properly. 2) Document the attribute on the website, along with a bold warning to leave it alone! ============================== This is in response to: https://bugzilla.redhat.com/show_bug.cgi?id=629662 Explanation qemu's virtio-net-pci driver allows setting the algorithm used for tx packets to either "bh" or "timer". This is done by adding ",tx=bh" or ",tx=timer" to the "-device virtio-net-pci" commandline option. 'bh' stands for 'bottom half'; when this is set, packet tx is all done in an iothread in the bottom half of the driver. (In libvirt, this option is called the more descriptive "iothread".) 'timer' means that tx work is done in qemu, and if there is more tx data than can be sent at the present time, a timer is set before qemu moves on to do other things; when the timer fires, another attempt is made to send more data. (libvirt retains the name "timer" for this option.) The resulting difference, according to the qemu developer who added the option is: bh makes tx more asynchronous and reduces latency, but potentially causes more processor bandwidth contention since the cpu doing the tx isn't necessarily the cpu where the guest generated the packets. Solution This patch provides a libvirt domain xml knob to change the option on the qemu commandline, by adding a new attribute "txmode" to the <driver> element that can be placed inside any <interface> element in a domain definition. It's use would be something like this: <interface ...> ... <model type='virtio'/> <driver txmode='iothread'/> ... </interface> I chose to put this setting as an attribute to <driver> rather than as a sub-element to <tune> because it is specific to the virtio-net driver, not something that is generally usable by all network drivers. (note that this is the same placement as the "driver name=..." attribute used to choose kernel vs. userland backend for the virtio-net driver.) Actually adding the tx=xxx option to the qemu commandline is only done if the version of qemu being used advertises it in the output of qemu -device virtio-net-pci,? If a particular txmode is requested in the XML, and the option isn't listed in that help output, an UNSUPPORTED_CONFIG error is logged, and the domain fails to start. --- docs/formatdomain.html.in | 50 ++++++++++++++++++++ docs/schemas/domain.rng | 8 +++ src/conf/domain_conf.c | 26 ++++++++++- src/conf/domain_conf.h | 11 ++++ src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 27 +++++++++++ tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device | 25 ++++++++++ tests/qemuhelptest.c | 3 +- .../qemuxml2argv-net-virtio-device.args | 2 +- .../qemuxml2argv-net-virtio-device.xml | 3 + tests/qemuxml2argvtest.c | 2 +- tests/qemuxml2xmltest.c | 1 + 13 files changed, 158 insertions(+), 4 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 9f7a113..84b1cab 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1378,6 +1378,56 @@ qemu-kvm -net nic,model=? /dev/null ne2k_isa i82551 i82557b i82559er ne2k_pci pcnet rtl8139 e1000 virtio </p> + <h5><a name="elementsDriverBackendOptions">Setting NIC driver-specific options</a></h5> + +<pre> + ... + <devices> + <interface type='network'> + <source network='default'/> + <target dev='vnet1'/> + <model type='virtio'/> + <b><driver txmode='iothread'/></b> + </interface> + </devices> + ...</pre> + + <p> + Some NICs may have tunable driver-specific options. These are + set as attributes of the <code>driver</code> sub-element of the + interface definition. Currently the following attributes are + available for the <code>"virtio"</code> NIC driver: + </p> + + <dl> + <dt><code>txmode</code></dt> + <dd> + The <code>txmode</code> attribute specifies how to handle + transmission of packets when the transmit buffer is full. The + value can be either 'iothread' or 'timer'. + <span class="since">Since 0.8.8 (QEMU and KVM only)</span><br><br> + + If set to 'iothread', packet tx is all done in an iothread in + the bottom half of the driver (this option translates into + adding "tx=bh" to the qemu commandline -device virtio-net-pci + option).<br><br> + + If set to 'timer', tx work is done in qemu, and if there is + more tx data than can be sent at the present time, a timer is + set before qemu moves on to do other things; when the timer + fires, another attempt is made to send more data.<br><br> + + The resulting difference, according to the qemu developer who + added the option is: "bh makes tx more asynchronous and reduces + latency, but potentially causes more processor bandwidth + contention since the cpu doing the tx isn't necessarily the + cpu where the guest generated the packets."<br><br> + + <b>In general you should leave this option alone, unless you + are very certain you know what you are doing.</b> + </dd> + </dl> + <h5><a name="elementsNICSTargetOverride">Overriding the target element</a></h5> <pre> diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng index 7fe66e3..8b215f3 100644 --- a/docs/schemas/domain.rng +++ b/docs/schemas/domain.rng @@ -1042,6 +1042,14 @@ </choice> </attribute> </optional> + <optional> + <attribute name="txmode"> + <choice> + <value>iothread</value> + <value>timer</value> + </choice> + </attribute> + </optional> <empty/> </element> </optional> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a069508..2aeaf1c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -198,6 +198,11 @@ VIR_ENUM_IMPL(virDomainNetBackend, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, "qemu", "vhost") +VIR_ENUM_IMPL(virDomainNetVirtioTxMode, VIR_DOMAIN_NET_VIRTIO_TX_MODE_LAST, + "default", + "iothread", + "timer") + VIR_ENUM_IMPL(virDomainChrChannelTarget, VIR_DOMAIN_CHR_CHANNEL_TARGET_TYPE_LAST, "guestfwd", @@ -2477,6 +2482,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *port = NULL; char *model = NULL; char *backend = NULL; + char *txmode = NULL; char *filter = NULL; char *internal = NULL; char *devaddr = NULL; @@ -2565,6 +2571,7 @@ virDomainNetDefParseXML(virCapsPtr caps, model = virXMLPropString(cur, "type"); } else if (xmlStrEqual (cur->name, BAD_CAST "driver")) { backend = virXMLPropString(cur, "name"); + txmode = virXMLPropString(cur, "txmode"); } else if (xmlStrEqual (cur->name, BAD_CAST "filterref")) { filter = virXMLPropString(cur, "filter"); VIR_FREE(filterparams); @@ -2769,6 +2776,18 @@ virDomainNetDefParseXML(virCapsPtr caps, } def->driver.virtio.name = name; } + if (txmode != NULL) { + int m; + if (((m = virDomainNetVirtioTxModeTypeFromString(txmode)) < 0) || + (m == VIR_DOMAIN_NET_VIRTIO_TX_MODE_DEFAULT)) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown interface <driver txmode='%s'> " + "has been specified"), + txmode); + goto error; + } + def->driver.virtio.txmode = m; + } } if (filter != NULL) { @@ -2808,6 +2827,7 @@ cleanup: VIR_FREE(bridge); VIR_FREE(model); VIR_FREE(backend); + VIR_FREE(txmode); VIR_FREE(filter); VIR_FREE(type); VIR_FREE(internal); @@ -6813,12 +6833,16 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferEscapeString(buf, " <model type='%s'/>\n", def->model); if (STREQ(def->model, "virtio") && - def->driver.virtio.name) { + (def->driver.virtio.name || def->driver.virtio.txmode)) { virBufferAddLit(buf, " <driver"); if (def->driver.virtio.name) { virBufferVSprintf(buf, " name='%s'", virDomainNetBackendTypeToString(def->driver.virtio.name)); } + if (def->driver.virtio.txmode) { + virBufferVSprintf(buf, " txmode='%s'", + virDomainNetVirtioTxModeTypeToString(def->driver.virtio.txmode)); + } virBufferAddLit(buf, "/>\n"); } } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e05064f..30aeccc 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -321,6 +321,15 @@ enum virDomainNetBackendType { VIR_DOMAIN_NET_BACKEND_TYPE_LAST, }; +/* the TX algorithm used for virtio interfaces */ +enum virDomainNetVirtioTxModeType { + VIR_DOMAIN_NET_VIRTIO_TX_MODE_DEFAULT, /* default for this version of qemu */ + VIR_DOMAIN_NET_VIRTIO_TX_MODE_IOTHREAD, + VIR_DOMAIN_NET_VIRTIO_TX_MODE_TIMER, + + VIR_DOMAIN_NET_VIRTIO_TX_MODE_LAST, +}; + /* the mode type for macvtap devices */ enum virDomainNetdevMacvtapType { VIR_DOMAIN_NETDEV_MACVTAP_MODE_VEPA, @@ -341,6 +350,7 @@ struct _virDomainNetDef { union { struct { enum virDomainNetBackendType name; /* which driver backend to use */ + enum virDomainNetVirtioTxModeType txmode; } virtio; } driver; union { @@ -1371,6 +1381,7 @@ VIR_ENUM_DECL(virDomainFS) VIR_ENUM_DECL(virDomainFSAccessMode) VIR_ENUM_DECL(virDomainNet) VIR_ENUM_DECL(virDomainNetBackend) +VIR_ENUM_DECL(virDomainNetVirtioTxMode) VIR_ENUM_DECL(virDomainChrDevice) VIR_ENUM_DECL(virDomainChrChannelTarget) VIR_ENUM_DECL(virDomainChrConsoleTarget) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 4504b7e..bd43087 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1072,6 +1072,7 @@ qemuCapsExtractDeviceStr(const char *qemu, "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?", + "-device", "virtio-net-pci,?", NULL); virCommandAddEnvPassCommon(cmd); /* qemu -help goes to stdout, but qemu -device ? goes to stderr. */ @@ -1113,6 +1114,8 @@ qemuCapsParseDeviceStr(const char *str, unsigned long long *flags) if (strstr(str, "pci-assign.bootindex")) *flags |= QEMUD_CMD_FLAG_PCI_BOOTINDEX; } + if (strstr(str, "virtio-net-pci.tx=")) + *flags |= QEMUD_CMD_FLAG_VIRTIO_TX_ALG; return 0; } diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index dd39b3b..c29d914 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -92,6 +92,7 @@ enum qemuCapsFlags { QEMUD_CMD_FLAG_CCID_PASSTHRU = (1LL << 55), /* -device ccid-card-passthru */ QEMUD_CMD_FLAG_CHARDEV_SPICEVMC = (1LL << 56), /* newer -chardev spicevmc */ QEMUD_CMD_FLAG_DEVICE_SPICEVMC = (1LL << 57), /* older -device spicevmc*/ + QEMUD_CMD_FLAG_VIRTIO_TX_ALG = (1LL << 58), /* -device virtio-net-pci,tx=string */ }; virCapsPtr qemuCapsInit(virCapsPtr old_caps); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1b213da..7b49b3c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1577,16 +1577,43 @@ qemuBuildNicDevStr(virDomainNetDefPtr net, { virBuffer buf = VIR_BUFFER_INITIALIZER; const char *nic; + bool usingVirtio = false; if (!net->model) { nic = "rtl8139"; } else if (STREQ(net->model, "virtio")) { nic = "virtio-net-pci"; + usingVirtio = true; } else { nic = net->model; } virBufferAdd(&buf, nic, strlen(nic)); + if (usingVirtio && net->driver.virtio.txmode) { + if (qemuCmdFlags & QEMUD_CMD_FLAG_VIRTIO_TX_ALG) { + virBufferAddLit(&buf, ",tx="); + switch (net->driver.virtio.txmode) { + case VIR_DOMAIN_NET_VIRTIO_TX_MODE_IOTHREAD: + virBufferAddLit(&buf, "bh"); + break; + + case VIR_DOMAIN_NET_VIRTIO_TX_MODE_TIMER: + virBufferAddLit(&buf, "timer"); + break; + default: + /* this should never happen, if it does, we need + * to add another case to this switch. + */ + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unrecognized virtio-net-pci 'tx' option")); + goto error; + } + } else { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-net-pci 'tx' option not supported in this QEMU binary")); + goto error; + } + } if (vlan == -1) virBufferVSprintf(&buf, ",netdev=host%s", net->info.alias); else diff --git a/tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device b/tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device index 8a0e528..8ac9630 100644 --- a/tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device +++ b/tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device @@ -72,3 +72,28 @@ virtio-blk-pci.ioeventfd=on/off virtio-blk-pci.vectors=uint32 virtio-blk-pci.indirect_desc=on/off virtio-blk-pci.scsi=on/off +virtio-net-pci.vectors=uint32 +virtio-net-pci.indirect_desc=on/off +virtio-net-pci.csum=on/off +virtio-net-pci.guest_csum=on/off +virtio-net-pci.gso=on/off +virtio-net-pci.guest_tso4=on/off +virtio-net-pci.guest_tso6=on/off +virtio-net-pci.guest_ecn=on/off +virtio-net-pci.guest_ufo=on/off +virtio-net-pci.host_tso4=on/off +virtio-net-pci.host_tso6=on/off +virtio-net-pci.host_ecn=on/off +virtio-net-pci.host_ufo=on/off +virtio-net-pci.mrg_rxbuf=on/off +virtio-net-pci.status=on/off +virtio-net-pci.ctrl_vq=on/off +virtio-net-pci.ctrl_rx=on/off +virtio-net-pci.ctrl_vlan=on/off +virtio-net-pci.ctrl_rx_extra=on/off +virtio-net-pci.mac=macaddr +virtio-net-pci.vlan=vlan +virtio-net-pci.netdev=netdev +virtio-net-pci.x-txtimer=uint32 +virtio-net-pci.x-txburst=int32 +virtio-net-pci.tx=string diff --git a/tests/qemuhelptest.c b/tests/qemuhelptest.c index 3a04b61..11890ab 100644 --- a/tests/qemuhelptest.c +++ b/tests/qemuhelptest.c @@ -480,7 +480,8 @@ mymain(int argc, char **argv) QEMUD_CMD_FLAG_HDA_DUPLEX | QEMUD_CMD_FLAG_DRIVE_AIO | QEMUD_CMD_FLAG_CCID_PASSTHRU | - QEMUD_CMD_FLAG_CHARDEV_SPICEVMC, + QEMUD_CMD_FLAG_CHARDEV_SPICEVMC | + QEMUD_CMD_FLAG_VIRTIO_TX_ALG, 12001, 1, 0); return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args b/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args index 92bd889..843c3e9 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.args @@ -1,6 +1,6 @@ LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu -S -M \ pc -m 214 -smp 1 -nographic -nodefconfig -nodefaults -monitor \ unix:/tmp/test-monitor,server,nowait -no-acpi -boot c -hda \ -/dev/HostVG/QEMUGuest1 -device virtio-net-pci,vlan=0,id=net0,\ +/dev/HostVG/QEMUGuest1 -device virtio-net-pci,tx=bh,vlan=0,id=net0,\ mac=00:11:22:33:44:55,bus=pci.0,addr=0x2 -net user,vlan=0,name=hostnet0 -usb \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.xml index 572f3cc..4e7abcd 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-virtio-device.xml @@ -17,10 +17,13 @@ <disk type='block' device='disk'> <source dev='/dev/HostVG/QEMUGuest1'/> <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' unit='0'/> </disk> + <controller type='ide' index='0'/> <interface type='user'> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> + <driver txmode='iothread'/> </interface> <memballoon model='virtio'/> </devices> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 52808b5..b42495f 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -356,7 +356,7 @@ mymain(int argc, char **argv) DO_TEST("net-user", 0, false); DO_TEST("net-virtio", 0, false); DO_TEST("net-virtio-device", QEMUD_CMD_FLAG_DEVICE | - QEMUD_CMD_FLAG_NODEFCONFIG, false); + QEMUD_CMD_FLAG_NODEFCONFIG | QEMUD_CMD_FLAG_VIRTIO_TX_ALG, false); DO_TEST("net-virtio-netdev", QEMUD_CMD_FLAG_DEVICE | QEMUD_CMD_FLAG_NETDEV | QEMUD_CMD_FLAG_NODEFCONFIG, false); DO_TEST("net-eth", 0, false); diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 15d94b7..67e721b 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -159,6 +159,7 @@ mymain(int argc, char **argv) DO_TEST("misc-no-reboot"); DO_TEST("net-user"); DO_TEST("net-virtio"); + DO_TEST("net-virtio-device"); DO_TEST("net-eth"); DO_TEST("net-eth-ifname"); DO_TEST("sound"); -- 1.7.3.4

On 02/10/2011 02:53 AM, Laine Stump wrote:
============================== V3 changes:
0) The actual code remains untouched from V2.
1) Add txmode attribute to the domain schema, and to an xml2xml and xml2argv test (note that the particular file used for the xml2xml test was previously only being used for xml2argv, and it needed some cleaning up before it would work properly.
2) Document the attribute on the website, along with a bold warning to leave it alone!
Looks good.
+++ b/tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device @@ -72,3 +72,28 @@ virtio-blk-pci.ioeventfd=on/off virtio-blk-pci.vectors=uint32 virtio-blk-pci.indirect_desc=on/off virtio-blk-pci.scsi=on/off +virtio-net-pci.vectors=uint32 +virtio-net-pci.indirect_desc=on/off +virtio-net-pci.csum=on/off
Nice. Should we also be updating qemu-kvm-0.13.0-device with similar content (using the Fedora 14 qemu-kvm, for example), which has virtio-net-pci but lacks the tx parameter, to prove that the flag does not get set in that test case? ACK with that nit addressed (I trust you can amend that file without having to see a v4). However, I'm not sure if this is okay to push now (since you proposed the series before feature freeze) or not (since it missed the freeze announcement and therefore is missing out on release candidate testing); DV, can you give an answer? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 02/10/2011 03:35 PM, Eric Blake wrote:
On 02/10/2011 02:53 AM, Laine Stump wrote:
============================== V3 changes:
0) The actual code remains untouched from V2.
1) Add txmode attribute to the domain schema, and to an xml2xml and xml2argv test (note that the particular file used for the xml2xml test was previously only being used for xml2argv, and it needed some cleaning up before it would work properly.
2) Document the attribute on the website, along with a bold warning to leave it alone! Looks good.
+++ b/tests/qemuhelpdata/qemu-kvm-0.12.1.2-rhel61-device @@ -72,3 +72,28 @@ virtio-blk-pci.ioeventfd=on/off virtio-blk-pci.vectors=uint32 virtio-blk-pci.indirect_desc=on/off virtio-blk-pci.scsi=on/off +virtio-net-pci.vectors=uint32 +virtio-net-pci.indirect_desc=on/off +virtio-net-pci.csum=on/off Nice. Should we also be updating qemu-kvm-0.13.0-device with similar content (using the Fedora 14 qemu-kvm, for example), which has virtio-net-pci but lacks the tx parameter, to prove that the flag does not get set in that test case?
ACK with that nit addressed (I trust you can amend that file without having to see a v4). However, I'm not sure if this is okay to push now (since you proposed the series before feature freeze) or not (since it missed the freeze announcement and therefore is missing out on release candidate testing); DV, can you give an answer?
Now pushed, with amendment as noted above. Thanks!

This is in response to: https://bugzilla.redhat.com/show_bug.cgi?id=629662 Explanation qemu's virtio-net-pci driver allows setting the algorithm used for tx packets to either "bh" or "timer". This is done by adding ",tx=bh" or ",tx=timer" to the "-device virtio-net-pci" commandline option. 'bh' stands for 'bottom half'; when this is set, packet tx is all done in an iothread in the bottom half of the driver. (In libvirt, this option is called the more descriptive "iothread".) 'timer' means that tx work is done in qemu, and if there is more tx data than can be sent at the present time, a timer is set before qemu moves on to do other things; when the timer fires, another attempt is made to send more data. (libvirt retains the name "timer" for this option.) The resulting difference, according to the qemu developer who added the option is: bh makes tx more asynchronous and reduces latency, but potentially causes more processor bandwidth contention since the cpu doing the tx isn't necessarily the cpu where the guest generated the packets. Solution This patch provides a libvirt domain xml knob to change the option on the qemu commandline, by adding a new attribute "txmode" to the <driver> element that can be placed inside any <interface> element in a domain definition. It's use would be something like this: <interface ...> ... <model type='virtio'/> <driver txmode='iothread'/> ... </interface> I chose to put this setting as an attribute to <driver> rather than as a sub-element to <tune> because it is specific to the virtio-net driver, not something that is generally usable by all network drivers. (note that this is the same placement as the "driver name=..." attribute used to choose kernel vs. userland backend for the virtio-net driver.) Actually adding the tx=xxx option to the qemu commandline is only done if the version of qemu being used advertises it in the output of qemu -device virtio-net-pci,? If a particular txmode is requested in the XML, and the option isn't listed in that help output, an UNSUPPORTED_CONFIG error is logged, and the domain fails to start. --- Changes from v1: 1) add error log / abort domain startup if option isn't supported by qemu 2) change attribute name from tx_alg to txmode 3) change attribute values from bh|timer to iothread|timer (The difference in length between full patch and delta diff was small enough that I decided to just resend the full patch.) src/conf/domain_conf.c | 26 +++++++++++++++++++++++++- src/conf/domain_conf.h | 11 +++++++++++ src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 27 +++++++++++++++++++++++++++ 5 files changed, 67 insertions(+), 1 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 84b866b..dee05c4 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -198,6 +198,11 @@ VIR_ENUM_IMPL(virDomainNetBackend, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, "qemu", "vhost") +VIR_ENUM_IMPL(virDomainNetVirtioTxMode, VIR_DOMAIN_NET_VIRTIO_TX_MODE_LAST, + "default", + "iothread", + "timer") + VIR_ENUM_IMPL(virDomainChrChannelTarget, VIR_DOMAIN_CHR_CHANNEL_TARGET_TYPE_LAST, "guestfwd", @@ -2477,6 +2482,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *port = NULL; char *model = NULL; char *backend = NULL; + char *txmode = NULL; char *filter = NULL; char *internal = NULL; char *devaddr = NULL; @@ -2565,6 +2571,7 @@ virDomainNetDefParseXML(virCapsPtr caps, model = virXMLPropString(cur, "type"); } else if (xmlStrEqual (cur->name, BAD_CAST "driver")) { backend = virXMLPropString(cur, "name"); + txmode = virXMLPropString(cur, "txmode"); } else if (xmlStrEqual (cur->name, BAD_CAST "filterref")) { filter = virXMLPropString(cur, "filter"); VIR_FREE(filterparams); @@ -2769,6 +2776,18 @@ virDomainNetDefParseXML(virCapsPtr caps, } def->driver.virtio.name = name; } + if (txmode != NULL) { + int m; + if (((m = virDomainNetVirtioTxModeTypeFromString(txmode)) < 0) || + (m == VIR_DOMAIN_NET_VIRTIO_TX_MODE_DEFAULT)) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown interface <driver txmode='%s'> " + "has been specified"), + txmode); + goto error; + } + def->driver.virtio.txmode = m; + } } if (filter != NULL) { @@ -2808,6 +2827,7 @@ cleanup: VIR_FREE(bridge); VIR_FREE(model); VIR_FREE(backend); + VIR_FREE(txmode); VIR_FREE(filter); VIR_FREE(type); VIR_FREE(internal); @@ -6808,12 +6828,16 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferEscapeString(buf, " <model type='%s'/>\n", def->model); if (STREQ(def->model, "virtio") && - def->driver.virtio.name) { + (def->driver.virtio.name || def->driver.virtio.txmode)) { virBufferAddLit(buf, " <driver"); if (def->driver.virtio.name) { virBufferVSprintf(buf, " name='%s'", virDomainNetBackendTypeToString(def->driver.virtio.name)); } + if (def->driver.virtio.txmode) { + virBufferVSprintf(buf, " txmode='%s'", + virDomainNetVirtioTxModeTypeToString(def->driver.virtio.txmode)); + } virBufferAddLit(buf, "/>\n"); } } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 44d0a4b..0fd23e7 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -321,6 +321,15 @@ enum virDomainNetBackendType { VIR_DOMAIN_NET_BACKEND_TYPE_LAST, }; +/* the TX algorithm used for virtio interfaces */ +enum virDomainNetVirtioTxModeType { + VIR_DOMAIN_NET_VIRTIO_TX_MODE_DEFAULT, /* default for this version of qemu */ + VIR_DOMAIN_NET_VIRTIO_TX_MODE_IOTHREAD, + VIR_DOMAIN_NET_VIRTIO_TX_MODE_TIMER, + + VIR_DOMAIN_NET_VIRTIO_TX_MODE_LAST, +}; + /* the mode type for macvtap devices */ enum virDomainNetdevMacvtapType { VIR_DOMAIN_NETDEV_MACVTAP_MODE_VEPA, @@ -341,6 +350,7 @@ struct _virDomainNetDef { union { struct { enum virDomainNetBackendType name; /* which driver backend to use */ + enum virDomainNetVirtioTxModeType txmode; } virtio; } driver; union { @@ -1367,6 +1377,7 @@ VIR_ENUM_DECL(virDomainFS) VIR_ENUM_DECL(virDomainFSAccessMode) VIR_ENUM_DECL(virDomainNet) VIR_ENUM_DECL(virDomainNetBackend) +VIR_ENUM_DECL(virDomainNetVirtioTxMode) VIR_ENUM_DECL(virDomainChrDevice) VIR_ENUM_DECL(virDomainChrChannelTarget) VIR_ENUM_DECL(virDomainChrConsoleTarget) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 0e1f79c..935c669 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -1063,6 +1063,7 @@ qemuCapsExtractDeviceStr(const char *qemu, "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?", + "-device", "virtio-net-pci,?", NULL); virCommandAddEnvPassCommon(cmd); /* qemu -help goes to stdout, but qemu -device ? goes to stderr. */ @@ -1104,6 +1105,8 @@ qemuCapsParseDeviceStr(const char *str, unsigned long long *flags) if (strstr(str, "pci-assign.bootindex")) *flags |= QEMUD_CMD_FLAG_PCI_BOOTINDEX; } + if (strstr(str, "virtio-net-pci.tx=")) + *flags |= QEMUD_CMD_FLAG_VIRTIO_TX_ALG; return 0; } diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index dd39b3b..c29d914 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -92,6 +92,7 @@ enum qemuCapsFlags { QEMUD_CMD_FLAG_CCID_PASSTHRU = (1LL << 55), /* -device ccid-card-passthru */ QEMUD_CMD_FLAG_CHARDEV_SPICEVMC = (1LL << 56), /* newer -chardev spicevmc */ QEMUD_CMD_FLAG_DEVICE_SPICEVMC = (1LL << 57), /* older -device spicevmc*/ + QEMUD_CMD_FLAG_VIRTIO_TX_ALG = (1LL << 58), /* -device virtio-net-pci,tx=string */ }; virCapsPtr qemuCapsInit(virCapsPtr old_caps); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1b213da..7b49b3c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1577,16 +1577,43 @@ qemuBuildNicDevStr(virDomainNetDefPtr net, { virBuffer buf = VIR_BUFFER_INITIALIZER; const char *nic; + bool usingVirtio = false; if (!net->model) { nic = "rtl8139"; } else if (STREQ(net->model, "virtio")) { nic = "virtio-net-pci"; + usingVirtio = true; } else { nic = net->model; } virBufferAdd(&buf, nic, strlen(nic)); + if (usingVirtio && net->driver.virtio.txmode) { + if (qemuCmdFlags & QEMUD_CMD_FLAG_VIRTIO_TX_ALG) { + virBufferAddLit(&buf, ",tx="); + switch (net->driver.virtio.txmode) { + case VIR_DOMAIN_NET_VIRTIO_TX_MODE_IOTHREAD: + virBufferAddLit(&buf, "bh"); + break; + + case VIR_DOMAIN_NET_VIRTIO_TX_MODE_TIMER: + virBufferAddLit(&buf, "timer"); + break; + default: + /* this should never happen, if it does, we need + * to add another case to this switch. + */ + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unrecognized virtio-net-pci 'tx' option")); + goto error; + } + } else { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-net-pci 'tx' option not supported in this QEMU binary")); + goto error; + } + } if (vlan == -1) virBufferVSprintf(&buf, ",netdev=host%s", net->info.alias); else -- 1.7.3.4

On 02/09/2011 10:14 AM, Laine Stump wrote:
The resulting difference, according to the qemu developer who added the option is:
bh makes tx more asynchronous and reduces latency, but potentially causes more processor bandwidth contention since the cpu doing the tx isn't necessarily the cpu where the guest generated the packets.
This sentence (or something derived from it, using our names of iothread|timer) should be in docs/formatdomain.html.in.
Changes from v1:
1) add error log / abort domain startup if option isn't supported by qemu
2) change attribute name from tx_alg to txmode
3) change attribute values from bh|timer to iothread|timer
All seem reasonable.
(The difference in length between full patch and delta diff was small enough that I decided to just resend the full patch.)
src/conf/domain_conf.c | 26 +++++++++++++++++++++++++- src/conf/domain_conf.h | 11 +++++++++++ src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 27 +++++++++++++++++++++++++++ 5 files changed, 67 insertions(+), 1 deletions(-)
Incomplete - need a v3 (or at least a follow-up patch) that touches docs/schemas/domain.rng, adds at least one test of the new attribute in tests/qemuxml2argvdata/, and documents the issue.
@@ -1367,6 +1377,7 @@ VIR_ENUM_DECL(virDomainFS) VIR_ENUM_DECL(virDomainFSAccessMode) VIR_ENUM_DECL(virDomainNet) VIR_ENUM_DECL(virDomainNetBackend) +VIR_ENUM_DECL(virDomainNetVirtioTxMode)
Also missing two new entries in src/libvirt_private.syms.
@@ -1104,6 +1105,8 @@ qemuCapsParseDeviceStr(const char *str, unsigned long long *flags) if (strstr(str, "pci-assign.bootindex")) *flags |= QEMUD_CMD_FLAG_PCI_BOOTINDEX; } + if (strstr(str, "virtio-net-pci.tx=")) + *flags |= QEMUD_CMD_FLAG_VIRTIO_TX_ALG;
It's a race between your patch and Jirka's bitmap cleanup :) Is this something that needs to go in 0.8.8, or should we wait until post-release? Looks better, but I don't feel comfortable giving this ack without a completion of the patch series. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org
participants (2)
-
Eric Blake
-
Laine Stump