On 09/16/2014 04:50 PM, Matthew Rosato wrote:
Currently, MAC registration occurs during device creation, which is
early enough that, during live migration, you end up with duplicate
MAC addresses on still-running source and target devices, even though
the target device isn't actually being used yet.
This patch proposes to defer MAC registration until right before
the guest can actually use the device -- In other words, right
before starting guest CPUs.
I've probably led you on at least three wild goose chases related to
this patch - the first was that I had thought that it would be better to
unconditionally wait to ifup *all* network devices. It turns out,
though, that regular tap devices which will be connected to a bridge
should be ifup'ed and attached to the bridge as soon as possible, so
that the forwarding delay timer of the bridge can start to count down.
(However, at the same point that macvtap interfaces need to be ifup'ed
(i.e. your new qemuInterfaceStartDevice()), tap devices connected to
bridges need to have their FDB entries updated (if libvirt is managing
the bridge fdb), so it is useful to have a function to call here, it
just needs to do something different for different types of connections.
Also, your addition of the call to qemuInterfaceStartDevice() during
qemuDomainChangeNet() almost certainly came from me telling you that
something was needed there. I had forgotten, however, that it isn't
possible to change the tap or macvtap device of a network connection
without completely removing it anyway, because qemu requires completely
detaching both ends of the device in order to change the tap, and that
means the device disappears from the guest, in other words it's
impossible to *change* it in that way, you have to actually remove and
re-add it (at one time I had code to do this, but when it failed to work
I inquired and learned of this limitation).
Finally, I think it was me that suggested it would be good to stow away
the vmOp somewhere so it didn't have to be sent up and down the call
stack. When I looked at what you're using it for though (calling
virNetDevVPortProfileDisassociate() in case of a failed
virNetDevSetOnline() in qemuInterfaceStartDevice()), I could see that if
you modify qemuInterfaceStartDevice(s) to return an error on failure,
the caller's cleanup will always automatically take care of doing the
disassociate for you anyway (this is done in qemuProcessStop() as well
as the failure cleanup of qemuDomainAttachNetDevice() - if there is a
cleanup path where that isn't called, you've got bigger problems than a
dangling 802.11qbh association!). So, we *really* don't need the vmOp :-))
(as aside - it turns out that even if we *did* need to do the
disassociate ourselves, analysis of the code shows that there is only a
single value of vmOp that would cause any change in behavior, and we're
guaranteed that vmOp will *never* be set to that value in the cases
where we would be called. I spent *a lot* of time figuring that out
before I realized we didn't need to call disassociate *at all*)
Signed-off-by: Matthew Rosato <mjrosato(a)linux.vnet.ibm.com>
---
Associated BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
Changes for v3:
* Some minor formatting fixes.
* in qemuNetworkIfaceConnect, set VIR_NETDEV_TAP_CREATE_IFUP
unconditionally.
* in qemuDomainAttachNetDevice, call qemuInterfaceStartDevice only for
VIR_DOMAIN_NET_TYPE_DIRECT, _BRIDGE and _NETWORK.
* in qemuProcessStartCPUs, use 'reason' to determine whether or not
qemuInterfaceStartDevices needs to be called. Basically, it needs
to be called for any reason that the system would be initializing
(or re-initializing).
src/Makefile.am | 3 +-
src/conf/domain_conf.h | 2 ++
src/lxc/lxc_process.c | 4 ++-
src/qemu/qemu_command.c | 3 ++
src/qemu/qemu_hotplug.c | 8 +++++
src/qemu/qemu_interface.c | 78 +++++++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_interface.h | 32 ++++++++++++++++++
src/qemu/qemu_process.c | 7 ++++
src/util/virnetdevmacvlan.c | 8 +++--
src/util/virnetdevmacvlan.h | 2 ++
10 files changed, 142 insertions(+), 5 deletions(-)
create mode 100644 src/qemu/qemu_interface.c
create mode 100644 src/qemu/qemu_interface.h
diff --git a/src/Makefile.am b/src/Makefile.am
index fa741a8..035120e 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -703,7 +703,8 @@ QEMU_DRIVER_SOURCES = \
qemu/qemu_monitor_text.h \
qemu/qemu_monitor_json.c \
qemu/qemu_monitor_json.h \
- qemu/qemu_driver.c qemu/qemu_driver.h
+ qemu/qemu_driver.c qemu/qemu_driver.h \
+ qemu/qemu_interface.c qemu/qemu_interface.h
XENAPI_DRIVER_SOURCES = \
xenapi/xenapi_driver.c xenapi/xenapi_driver.h \
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 0862bd7..5f328cf 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -951,6 +951,8 @@ struct _virDomainNetDef {
virNetDevBandwidthPtr bandwidth;
virNetDevVlan vlan;
int linkstate;
+ /* vmOp value saved if deferring interface start */
+ virNetDevVPortProfileOp vmOp;
};
I've removed this as unnecessary, per paragraph 3 above.
/* Used for prefix of ifname of any network name generated dynamically
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c
index ed30c37..b2256c0 100644
--- a/src/lxc/lxc_process.c
+++ b/src/lxc/lxc_process.c
@@ -300,6 +300,7 @@ char *virLXCProcessSetupInterfaceDirect(virConnectPtr conn,
virNetDevBandwidthPtr bw;
virNetDevVPortProfilePtr prof;
virLXCDriverConfigPtr cfg = virLXCDriverGetConfig(driver);
+ unsigned int macvlan_create_flags = VIR_NETDEV_MACVLAN_CREATE_IFUP;
/* XXX how todo bandwidth controls ?
* Since the 'net-ifname' is about to be moved to a different
@@ -336,7 +337,8 @@ char *virLXCProcessSetupInterfaceDirect(virConnectPtr conn,
&res_ifname,
VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
cfg->stateDir,
- virDomainNetGetActualBandwidth(net), 0) < 0)
+ virDomainNetGetActualBandwidth(net),
+ macvlan_create_flags) < 0)
This *could* be inferred from the other flags, but I like making it
explicit as you have done (in the future the CREATE_WITH_TAP might *not*
always be the opposite of CREATE_IFUP).
goto cleanup;
ret = res_ifname;
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index e5270bd..229dff4 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -199,6 +199,9 @@ qemuPhysIfaceConnect(virDomainDefPtr def,
net->ifname = res_ifname;
}
+ /* Save vport profile op for later */
+ net->vmOp = vmop;
+
removed, as noted in paragraph 3 at the top.
virObjectUnref(cfg);
return rc;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 7bc19cd..530e6da 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -30,6 +30,7 @@
#include "qemu_domain.h"
#include "qemu_command.h"
#include "qemu_hostdev.h"
+#include "qemu_interface.h"
#include "domain_audit.h"
#include "domain_nwfilter.h"
#include "virlog.h"
@@ -922,6 +923,8 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
priv->qemuCaps, tapfd, &tapfdSize) < 0)
goto cleanup;
iface_connected = true;
+ /* Set device online immediately */
+ qemuInterfaceStartDevice(net);
For now calling this for a standard tap device (connected to a bridge)
has no effect, but I'm leaving it in as it is going to be useful very
soon (to update the fdb at the proper time).
if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps,
vhostfd, &vhostfdSize) < 0)
goto cleanup;
} else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) {
@@ -937,6 +940,8 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
VIR_NETDEV_VPORT_PROFILE_OP_CREATE)) <
0)
goto cleanup;
iface_connected = true;
+ /* Set device online immediately */
+ qemuInterfaceStartDevice(net);
Well, actually what makes more sense is to put a single call to
qemuInterfaceStartDevice() after all the if-elses. For all cases except
DIRECT, BRIDGE, and NETWORK it is a NOP, and for those 3 the only
difference will be that it gets called after qemuOpenVhostNet() rather
than before, and that makes no practical difference (also sets us up
better for having it take effect in the future if/when other interface
types need special treatment during "start" phase).
if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps,
vhostfd, &vhostfdSize) < 0)
goto cleanup;
} else if (actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
@@ -2070,6 +2075,9 @@ qemuDomainChangeNet(virQEMUDriverPtr driver,
goto cleanup;
}
+ /* Set device online immediately */
+ qemuInterfaceStartDevice(newdev);
+
Removed, as noted in paragraph 2 at the top.
newType = virDomainNetGetActualType(newdev);
if (newType == VIR_DOMAIN_NET_TYPE_HOSTDEV) {
diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c
new file mode 100644
index 0000000..dccfcc4
--- /dev/null
+++ b/src/qemu/qemu_interface.c
@@ -0,0 +1,78 @@
+/*
+ * qemu_interface.c: QEMU interface management
+ *
+ * Copyright IBM Corp. 2014
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see
+ * <
http://www.gnu.org/licenses/>.
+ *
+ * Authors:
+ * Matthew J. Rosato <mjrosato(a)linux.vnet.ibm.com>
+ */
+
+#include <config.h>
+
+#include "qemu_interface.h"
+#include "virnetdev.h"
+#include "virnetdevtap.h"
+#include "virnetdevmacvlan.h"
+#include "virnetdevvportprofile.h"
+
+/**
+ * qemuInterfaceStartDevice:
+ * @net: net device to start
+ *
+ * Based upon the type of device provided, perform the appropriate
+ * work to set the device online.
+ */
+void
+qemuInterfaceStartDevice(virDomainNetDefPtr net)
Both of these functions should return errors, since failure to ifup an
interface was previously grounds for aborting whatever operation was
happening.
+{
+ switch (virDomainNetGetActualType(net)) {
I like the idea of typecasting the result here and putting in in all
possible cases (even if empty) so that anyone adding a new type will be
notified that they potentially need to add something.
+ case VIR_DOMAIN_NET_TYPE_BRIDGE:
The indentation is off here - the "case" should be at the same level as
"switch", with contents of the case indented by 4.
+ case VIR_DOMAIN_NET_TYPE_NETWORK:
+ if (virNetDevSetOnline(net->ifname, true) < 0) {
+ ignore_value(virNetDevTapDelete(net->ifname));
+ }
Actually, in the case of tap devices connected to a bridge, they should
be (and are) created IFF_UP (required to start the forwarding delay
timers counting down as soon as possible), so doing this here is
redundant. What *is* needed here is to update the bridge's fdb if
macTableManager='libvirt'. I'm removing the existing call to
virNetDevSetOnline() here, and will add in a call to update the fdb in
an upcoming patch.
+ break;
+ case VIR_DOMAIN_NET_TYPE_DIRECT:
+ if (virNetDevSetOnline(net->ifname, true) < 0) {
+ ignore_value(virNetDevVPortProfileDisassociate(net->ifname,
+ virDomainNetGetActualVirtPortProfile(net),
+ &net->mac,
+ virDomainNetGetActualDirectDev(net),
+ -1,
+ net->vmOp));
As noted in paragraph 4 at the top - if we properly return an error from
this function (and qemuInterfaceStartDevices) then we don't need to call
virNetDevVPortProfileDisassociate here - it will be handled in the
cleanup of our caller. So I'm removing that bit.
+ }
+ break;
+ }
+}
+
+/**
+ * qemuInterfaceStartDevices:
+ * @def: domain definition
+ *
+ * Set all ifaces associated with this domain to the online state.
+ */
+void
+qemuInterfaceStartDevices(virDomainDefPtr def)
again - this should return errors rather than ignoring them.
+{
+ size_t i;
+
+ for (i = 0; i < def->nnets; i++) {
+ qemuInterfaceStartDevice(def->nets[i]);
+ }
Because I'm adding a "goto cleanup" on failure here, we *will* need the
braces, but just for future reference if this code had stayed as is, you
would need to remove the braces because the body is a single line.
+
+ return;
+}
diff --git a/src/qemu/qemu_interface.h b/src/qemu/qemu_interface.h
new file mode 100644
index 0000000..5810cda
--- /dev/null
+++ b/src/qemu/qemu_interface.h
@@ -0,0 +1,32 @@
+/*
+ * qemu_interface.h: QEMU interface management
+ *
+ * Copyright IBM Corp. 2014
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see
+ * <
http://www.gnu.org/licenses/>.
+ *
+ * Authors:
+ * Matthew J. Rosato <mjrosato(a)linux.vnet.ibm.com>
+ */
+
+#ifndef __QEMU_INTERFACE_H__
+# define __QEMU_INTERFACE_H__
+
+# include "domain_conf.h"
+
+void qemuInterfaceStartDevice(virDomainNetDefPtr net);
+void qemuInterfaceStartDevices(virDomainDefPtr def);
changed both to return int.
+
+#endif /* __QEMU_INTERFACE_H__ */
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f391743..3fc50a0 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -42,6 +42,7 @@
#include "qemu_hostdev.h"
#include "qemu_hotplug.h"
#include "qemu_migration.h"
+#include "qemu_interface.h"
#include "cpu/cpu.h"
#include "datatypes.h"
@@ -2947,6 +2948,12 @@ qemuProcessStartCPUs(virQEMUDriverPtr driver, virDomainObjPtr vm,
qemuDomainObjPrivatePtr priv = vm->privateData;
virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
+ /* Bring up netdevs before starting CPUs */
+ if (reason != VIR_DOMAIN_RUNNING_UNPAUSED &&
+ reason != VIR_DOMAIN_RUNNING_SAVE_CANCELED) {
+ qemuInterfaceStartDevices(vm->def);
+ }
+
changed to goto cleanup if qemuInterfaceStartDevices() fails, which will
guarantee that disassociate is called in the cleanup)
VIR_DEBUG("Using lock state '%s'",
NULLSTR(priv->lockState));
if (virDomainLockProcessResume(driver->lockManager, cfg->uri,
vm, priv->lockState) < 0) {
diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c
index c83341c..1edf3ae 100644
--- a/src/util/virnetdevmacvlan.c
+++ b/src/util/virnetdevmacvlan.c
@@ -902,9 +902,11 @@ int virNetDevMacVLanCreateWithVPortProfile(const char *tgifname,
goto link_del_exit;
}
- if (virNetDevSetOnline(cr_ifname, true) < 0) {
- rc = -1;
- goto disassociate_exit;
+ if (flags & VIR_NETDEV_MACVLAN_CREATE_IFUP) {
+ if (virNetDevSetOnline(cr_ifname, true) < 0) {
+ rc = -1;
+ goto disassociate_exit;
+ }
}
if (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) {
diff --git a/src/util/virnetdevmacvlan.h b/src/util/virnetdevmacvlan.h
index 41aa4e2..41b4014 100644
--- a/src/util/virnetdevmacvlan.h
+++ b/src/util/virnetdevmacvlan.h
@@ -44,6 +44,8 @@ typedef enum {
VIR_NETDEV_MACVLAN_CREATE_NONE = 0,
/* Create with a tap device */
VIR_NETDEV_MACVLAN_CREATE_WITH_TAP = 1 << 0,
+ /* Bring the interface up */
+ VIR_NETDEV_MACVLAN_CREATE_IFUP = 1 << 1,
} virNetDevMacVLanCreateFlags;
int virNetDevMacVLanCreate(const char *ifname,
ACK with the above modifications. I've made those (see attached diff)
and am pushing this as a bugfix.