On Tue, Jul 13, 2010 at 02:17:17AM -0400, Laine Stump wrote:
From: Laine Stump <laine(a)redhat.com>
(I don't know whether or not we want to commit this upstream yet - the
proposed iptables and kernel module backend for the changes have been
posted but not yet committed upstream. On the other hand, the new
libvirt code ends up simply printing a warning message if the
necessary iptables support isn't yet in place.)
Well in general we try to avoid this kind of preventive code change
as we're not 100% sure the support will ever make it unchanged in
upstream for dependancies. But that should not stop us from reviewing
the patches !
This patch attempts to take advantage of a newly added netfilter
module to correct for a problem with some guest DHCP client
implementations when used in conjunction with a DHCP server run on the
host systems with packet checksum offloading enabled.
The problem is that, when the guest uses a RAW socket to read the DHCP
response packets, the checksum hasn't yet been fixed by the IP stack,
so it is incorrect.
The fix implemented here is to add a rule to the POSTROUTING chain of
the mangle table in iptables that fixes up the checksum for packets on
the virtual network's bridge that are destined for the bootpc port (ie
"dhcpc", ie port 68) port on the guest.
Only very new versions of iptables will have this support (it has been
submitted upstream, but not yet committed), so a failure to add this
rule only results in a warning message. The iptables patch is here:
http://patchwork.ozlabs.org/patch/58525/
A corresponding kernel module patch is also required (the backend of
the iptables patch) and has been submitted, but I don't have the
details for that (I tested using a pre-built image I received from the
developer, Michael Tsirkin).
---
src/libvirt_private.syms | 2 +
src/network/bridge_driver.c | 17 ++++++++++
src/util/iptables.c | 71 +++++++++++++++++++++++++++++++++++++++++++
src/util/iptables.h | 6 ++++
4 files changed, 96 insertions(+), 0 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 778ceb1..d81d4cf 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -328,6 +328,7 @@ iptablesAddForwardAllowRelatedIn;
iptablesAddForwardMasquerade;
iptablesAddForwardRejectIn;
iptablesAddForwardRejectOut;
+iptablesAddOutputFixUdpChecksum;
iptablesAddTcpInput;
iptablesAddUdpInput;
iptablesContextFree;
@@ -339,6 +340,7 @@ iptablesRemoveForwardAllowRelatedIn;
iptablesRemoveForwardMasquerade;
iptablesRemoveForwardRejectIn;
iptablesRemoveForwardRejectOut;
+iptablesRemoveOutputFixUdpChecksum;
iptablesRemoveTcpInput;
iptablesRemoveUdpInput;
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index 72255c1..c4480ff 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -781,6 +781,19 @@ networkAddIptablesRules(struct network_driver *driver,
!networkAddRoutingIptablesRules(driver, network))
goto err8;
+ /* If we are doing local DHCP service on this network, attempt to
+ * add a rule that will fixup the checksum of DHCP response
+ * packets back to the guests (but report failure without
+ * aborting, since not all iptables implementations support it).
+ */
+
+ if ((network->def->ipAddress || network->def->nranges) &&
+ (iptablesAddOutputFixUdpChecksum(driver->iptables,
+ network->def->bridge, 68) != 0)) {
+ VIR_WARN("Could not add rule to fixup DHCP response checksums "
+ "on network '%s'", network->def->name);
+ }
+
return 1;
err8:
@@ -811,6 +824,10 @@ networkAddIptablesRules(struct network_driver *driver,
static void
networkRemoveIptablesRules(struct network_driver *driver,
virNetworkObjPtr network) {
+ if (network->def->ipAddress || network->def->nranges) {
+ iptablesRemoveOutputFixUdpChecksum(driver->iptables,
+ network->def->bridge, 68);
+ }
hum, there is no return code to check, if yes the block isn't
necessary. Also a empty line should be added to separate from next if
block, but it's purely cosmetic :-)
if (network->def->forwardType != VIR_NETWORK_FORWARD_NONE)
{
if (network->def->forwardType == VIR_NETWORK_FORWARD_NAT) {
iptablesRemoveForwardMasquerade(driver->iptables,
diff --git a/src/util/iptables.c b/src/util/iptables.c
index d06b857..9b888e5 100644
--- a/src/util/iptables.c
+++ b/src/util/iptables.c
@@ -60,6 +60,7 @@ struct _iptablesContext
iptRules *input_filter;
iptRules *forward_filter;
iptRules *nat_postrouting;
+ iptRules *mangle_postrouting;
};
static void
@@ -188,6 +189,9 @@ iptablesContextNew(void)
if (!(ctx->nat_postrouting = iptRulesNew("nat",
"POSTROUTING")))
goto error;
+ if (!(ctx->mangle_postrouting = iptRulesNew("mangle",
"POSTROUTING")))
+ goto error;
+
return ctx;
error:
@@ -210,6 +214,8 @@ iptablesContextFree(iptablesContext *ctx)
iptRulesFree(ctx->forward_filter);
if (ctx->nat_postrouting)
iptRulesFree(ctx->nat_postrouting);
+ if (ctx->mangle_postrouting)
+ iptRulesFree(ctx->mangle_postrouting);
VIR_FREE(ctx);
}
@@ -753,3 +759,68 @@ iptablesRemoveForwardMasquerade(iptablesContext *ctx,
{
return iptablesForwardMasquerade(ctx, network, physdev, REMOVE);
}
+
+
+static int
+iptablesOutputFixUdpChecksum(iptablesContext *ctx,
+ const char *iface,
+ int port,
+ int action)
+{
+ char portstr[32];
+
+ snprintf(portstr, sizeof(portstr), "%d", port);
+ portstr[sizeof(portstr) - 1] = '\0';
+
+ return iptablesAddRemoveRule(ctx->mangle_postrouting,
+ action,
+ "--out-interface", iface,
+ "--protocol", "udp",
+ "--destination-port", portstr,
+ "--jump", "CHECKSUM",
"--checksum-fill",
+ NULL);
+}
+
+/**
+ * iptablesAddOutputFixUdpChecksum:
+ * @ctx: pointer to the IP table context
+ * @iface: the interface name
+ * @port: the UDP port to match
+ *
+ * Add an rule to the mangle table's POSTROUTING chain that fixes up the
+ * checksum of packets with the given destination @port.
+ * the given @iface interface for TCP packets.
+ *
+ * Returns 0 in case of success or an error code in case of error.
+ * (NB: if the system's iptables does not support checksum mangling,
+ * this will return an error, which should be ignored.)
+ */
+
+int
+iptablesAddOutputFixUdpChecksum(iptablesContext *ctx,
+ const char *iface,
+ int port)
+{
+ return iptablesOutputFixUdpChecksum(ctx, iface, port, ADD);
+}
+
+/**
+ * iptablesRemoveOutputFixUdpChecksum:
+ * @ctx: pointer to the IP table context
+ * @iface: the interface name
+ * @port: the UDP port of the rule to remove
+ *
+ * Removes the checksum fixup rule that was previous added with
+ * iptablesAddOutputFixUdpChecksum.
+ *
+ * Returns 0 in case of success or an error code in case of error
+ * (again, if iptables doesn't support checksum fixup, this will
+ * return an error, which should be ignored)
+ */
+int
+iptablesRemoveOutputFixUdpChecksum(iptablesContext *ctx,
+ const char *iface,
+ int port)
+{
+ return iptablesOutputFixUdpChecksum(ctx, iface, port, REMOVE);
+}
diff --git a/src/util/iptables.h b/src/util/iptables.h
index 7d55a6d..5c2e553 100644
--- a/src/util/iptables.h
+++ b/src/util/iptables.h
@@ -89,5 +89,11 @@ int iptablesAddForwardMasquerade (iptablesContext
*ctx,
int iptablesRemoveForwardMasquerade (iptablesContext *ctx,
const char *network,
const char *physdev);
+int iptablesAddOutputFixUdpChecksum (iptablesContext *ctx,
+ const char *iface,
+ int port);
+int iptablesRemoveOutputFixUdpChecksum (iptablesContext *ctx,
+ const char *iface,
+ int port);
#endif /* __QEMUD_IPTABLES_H__ */
I appreciate the level of comments :-)
Patch looks fine but I would wait a bit until support is upstream, once
it is, ACK !
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit
http://xmlsoft.org/
daniel(a)veillard.com | Rpmfind RPM search engine
http://rpmfind.net/
http://veillard.com/ | virtualization library
http://libvirt.org/