On 03/21/2017 04:03 PM, Laine Stump wrote:
It's been a long-standing problem that when you stop and restart a
libvirt network, the guest tap devices are no longer connected to the
network. Until now the only way to recover from this was to either
shut down and restart all the affected guests, or to detach and then
re-attach all affected guest network devices.
These patches permit the situation to be remedied by just restarting
libvirtd - when the hypervisor drivers are notifying the network
drivers of each connection, the network can retrieve the current
"master" for the tap device and compare it to the network's bridge
device. If they don't match, then it disconnects the tap from any
incorrect bridge and connects to the correct bridge.
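The compare-and-reconnect decision described above can be sketched roughly
as follows. This is only an illustration, not the code from the patches:
tapAction and tapReconnectAction() are hypothetical names, and the sketch
assumes the caller has already looked up the tap's current master (NULL
meaning "attached to no bridge") with something like virNetDevGetMaster(),
whose actual signature is not shown in this mail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: what should be done with a guest tap device. */
typedef enum {
    TAP_ACTION_NONE,    /* already attached to the network's bridge */
    TAP_ACTION_ATTACH,  /* no master at all: just attach to the bridge */
    TAP_ACTION_MOVE     /* wrong master: detach from it, then attach */
} tapAction;

/*
 * Decide what to do given the tap's current master and the bridge the
 * network expects. currentMaster may be NULL (tap has no master);
 * bridge must not be NULL.
 */
static tapAction
tapReconnectAction(const char *currentMaster, const char *bridge)
{
    if (!currentMaster)
        return TAP_ACTION_ATTACH;
    if (strcmp(currentMaster, bridge) == 0)
        return TAP_ACTION_NONE;
    return TAP_ACTION_MOVE;
}
```

After a network restart the tap typically ends up with no master, so the
TAP_ACTION_ATTACH case is presumably the common one; TAP_ACTION_MOVE covers
a tap that is attached to some other bridge.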
***
For now, that's as far as we go - *semi*-automated (since you need to
restart libvirtd for it to happen). My intent is for this to be
completely automated, but the logic to do that is a bit "convoluted"
so I've deferred trying to implement it until I've given it more
thought - a few different trains of thought have led to dead-ends, and
so far only one seems reasonably doable:
* add a networkStartCallback list to the network driver state object.
* as each hypervisor driver that uses the network driver is
initialized, it will call an internal-only function in the network
driver to register a callback.
* whenever a network is started, it will call all the registered
callback functions (sending the name of the network being started,
and some HV-specific context pointer).
* Each hypervisor's callback function will look through all of its
active domains for interfaces that are using the given network, and
for each such interface, will call networkNotifyActualDevice() (the
function that has been updated in Patch 3/3 to reconnect taps to
bridges).
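A minimal sketch of the callback list outlined in the steps above, using a
small fixed-size array rather than a hash table. Every name here
(networkStartCallbackList, networkStartCallbackRegister(),
networkStartCallbackRun(), fakeHypervisor) is hypothetical and not part of
libvirt, and all locking/refcounting is deliberately left out, since that
is exactly the unresolved part:

```c
#include <stddef.h>
#include <string.h>

#define NETWORK_START_CB_MAX 8  /* assumed upper bound on HV drivers */

/* Callback signature: network name plus HV-specific context pointer. */
typedef void (*networkStartCallback)(const char *netName, void *opaque);

typedef struct {
    networkStartCallback cb;
    void *opaque;
} networkStartCallbackEntry;

typedef struct {
    networkStartCallbackEntry entries[NETWORK_START_CB_MAX];
    size_t nentries;
} networkStartCallbackList;

/* Called once per hypervisor driver during (single threaded) init. */
static int
networkStartCallbackRegister(networkStartCallbackList *list,
                             networkStartCallback cb, void *opaque)
{
    if (!cb || list->nentries >= NETWORK_START_CB_MAX)
        return -1;
    list->entries[list->nentries].cb = cb;
    list->entries[list->nentries].opaque = opaque;
    list->nentries++;
    return 0;
}

/* Called whenever a network is started. */
static void
networkStartCallbackRun(const networkStartCallbackList *list,
                        const char *netName)
{
    size_t i;

    for (i = 0; i < list->nentries; i++)
        list->entries[i].cb(netName, list->entries[i].opaque);
}

/* Stand-in for a hypervisor driver: one network name per interface. */
typedef struct {
    const char *ifaceNetworks[4];
    size_t nifaces;
    size_t notified;  /* how many interfaces were "reconnected" */
} fakeHypervisor;

static void
fakeHypervisorNetworkStarted(const char *netName, void *opaque)
{
    fakeHypervisor *hv = opaque;
    size_t i;

    for (i = 0; i < hv->nifaces; i++) {
        /* A real driver would call networkNotifyActualDevice() here. */
        if (strcmp(hv->ifaceNetworks[i], netName) == 0)
            hv->notified++;
    }
}
```

Because the list is only appended to during driver initialization and
never changes afterwards, iterating it from the network start path needs
no lock on the list itself in this sketch; the locking on each domain and
on the network object is not modelled.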
The "convoluted" part here is that we need to make sure there is
enough (but not too much!) locking/refcounting both when adding items
to the callback list (which should only be done single threaded, since
it happens during the driver initializations) and when the
networkStart() function (in some state of lockedness/refcountedness)
calls over to some e.g. qemu function that will then need to do some
amount of locking (at least on each domain as it is processed) and
then calls back into the network driver (which will need "some amount"
of locking/refcounting). Since we're calling from network driver into
qemu into network driver, while the normal nesting is just calling
from qemu into network, I want to be sure there is no possibility for
a deadlock. (Also, I want to avoid making the list of callbacks any
more complicated than absolutely necessary - the "in" thing to do
these days seems to be to allocate a hash table, but there's a lot of
extra code required for that (see util/virclosecallbacks.[hc]) which
seems like overkill for a list of maximum 4-5 items that will *never*
change after the driver initialization - if anyone has ideas/opinions
about that, please speak up.)
Laine Stump (2):
util: new function virNetDevGetMaster()
util: new function virNetDevTapAttachBridge()
root (1):
network: reconnect tap devices during networkNotifyActualDevice

You might want to reset the owner here ;-)

src/libvirt_private.syms | 2 +
src/network/bridge_driver.c | 30 +++++++++++-
src/util/virnetdev.c | 49 +++++++++++++++++++
src/util/virnetdev.h | 3 ++
src/util/virnetdevtap.c | 111 +++++++++++++++++++++++++++++---------------
src/util/virnetdevtap.h | 12 +++++
6 files changed, 169 insertions(+), 38 deletions(-)
ACK series.
Michal