On Mon, Mar 08, 2010 at 04:29:42PM -0800, Ed Swierk wrote:
I posted this RFC/patch set a few weeks ago but didn't receive any
response. My implementation works, but I'd like to hear from anyone more
familiar with libvirt concurrency than I (i.e. nearly everyone) about
how it might be improved.
In v2 I rebased against libvirt-0.7.7 and made a bunch of minor changes.
Thanks for taking the time to update this & remind us about it!
Using a bridge to connect a qemu NIC to a host interface offers a fair
amount of flexibility to reconfigure the host without restarting the VM.
For example, if the bridge connects host interface eth0 to the qemu tap0
interface, eth0 can be hot-removed and hot-plugged without affecting the
VM. Similarly, if the bridge connects host VLAN interface vlan0 to the
qemu tap0 interface, the admin can easily replace vlan0 with vlan1
without the VM noticing.
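For reference, this bridged setup corresponds to the usual domain XML
along these lines (the bridge and model names here are illustrative):

    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>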
Using the macvtap driver instead of a kernel bridge, the host interface
is much more tightly tied to the VM. Qemu communicates with the macvtap
interface through a file descriptor, and the macvtap interface is bound
permanently to a specific host interface when it is first created.
What's more, if the underlying host interface disappears, the macvtap
interface vanishes along with it, leaving the VM holding a file
descriptor for a deleted file.
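The macvtap case uses the type='direct' syntax instead, binding the NIC
to a named host interface (a minimal sketch; the device name and mode
are illustrative):

    <interface type='direct'>
      <source dev='eth0' mode='vepa'/>
      <model type='virtio'/>
    </interface>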
There is a related issue which IIRC Stefan raised before for migration,
which is that assuming the host interface has the same name on src &
dst is somewhat inflexible, i.e. eth0 on the src may be plugged into the
same LAN as eth3 on the destination.
Even in the context of a single host, in your scenario of hotpluggable
NICs, it could end up with the VM tap device having to switch from
eth0 to eth1 after hotplug.
To avoid race conditions during system startup, I would like libvirt to
allow starting up the VM with a NIC even if the underlying host
interface doesn't yet exist, deferring creation of the macvtap interface
(analogous to starting up the VM with a tap interface bound to an orphan
bridge). To support adding and removing a host interface without
restarting the VM, I would like libvirt to react to the (re)appearance
of the underlying host interface, creating a new macvtap interface and
passing the new fd to qemu to reconnect to the NIC.
(It would also be nice if libvirt allowed the user to change which
underlying host interface the qemu NIC is connected to. I'm ignoring
this issue for now, except to note that implementing the above features
should make this easier.)
The problem I have with the idea of libvirt automatically re-connecting the
interfaces is that it embeds a policy in libvirt. We aim to avoid
hardcoding policies in libvirt itself because it limits the ways in which
apps can make use of libvirt. Having libvirt auto-reconnect NICs would
make it difficult for an app which didn't want that, or which wanted to
reconfigure the NIC to point at a different named interface.
So while I agree that the scenario you raise is one we need a solution to,
I don't think libvirt should be auto-reconnecting NICs itself. Instead we
should provide a mechanism to allow applications to reconfigure the NIC
backends via the libvirt API. A separate management service could then use
this to implement automatic re-connection, if desired.
The libvirt API already supports domainAttachDevice and
domainDetachDevice to add or remove an interface while the VM is
running. In the qemu implementation, these commands add or remove the
VM NIC device as well as reconfiguring the host side. This works only
if the OS and application running in the VM can handle PCI hotplug and
dynamically reconfigure their network. I would like to isolate the VM
from changes to the host network setup, whether you use macvtap or a
bridge.
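For context, here is roughly what that existing hotplug path looks like
from an application's point of view (a minimal sketch; the domain name
and XML are illustrative, and error handling is trimmed):

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        virDomainPtr dom = virDomainLookupByName(conn, "myguest");

        /* Hotplug a bridged NIC; the guest sees a PCI hotplug event
         * and must reconfigure its network itself */
        const char *xml =
            "<interface type='bridge'>"
            "  <source bridge='br0'/>"
            "  <model type='virtio'/>"
            "</interface>";

        if (virDomainAttachDevice(dom, xml) < 0)
            fprintf(stderr, "attach failed\n");

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }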
The changes I think are needed to implement this include:
1. Refactor qemudDomainAttachNetDevice/qemudDomainDetachNetDevice, which
currently handle both backend (host) setup and adding/removing the VM
NIC device; move the backend setup code into separate functions that can
be called separately without affecting VM devices.
Agreed, this would be a useful refactoring.
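As a sketch of what that split might look like inside the qemu driver
(these exact function names are hypothetical; only the internal types
are existing ones):

    /* Host-side backend setup only: create the tap/macvtap fd
     * without touching guest-visible devices (hypothetical) */
    static int qemudSetupNetBackend(struct qemud_driver *driver,
                                    virDomainObjPtr vm,
                                    virDomainNetDefPtr net,
                                    int *tapfd);

    /* Guest-side hotplug only: host_net_add / pci_add using an
     * already-prepared backend fd (hypothetical) */
    static int qemudAttachNetDevice(struct qemud_driver *driver,
                                    virDomainObjPtr vm,
                                    virDomainNetDefPtr net,
                                    int tapfd);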
2. Implement a thread or task that watches for changes to the underlying
host interface for each configured macvtap interface, and reacts by
invoking the appropriate backend setup code.
I don't think we should be doing this piece.
3. Change qemudBuildCommandLine to defer backend setup if qemu supports
the necessary features for doing it later (e.g. the host_net_add monitor
command).
To cope with this, I think we should introduce a new type of network
configuration in the XML format. A type='none' which would indicate
that a NIC should be created, but not connected to any host device.
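Under that proposal the deferred case might read something like this in
the domain XML (not existing syntax, purely illustrative):

    <interface type='none'>
      <mac address='52:54:00:12:34:56'/>
      <model type='virtio'/>
    </interface>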
On the topic of hotplug, for disks, we currently allow change of media
for CDROMs/floppies by calling virDomainAttachDevice & detecting that
it is an existing device. We could do something similar for NICs, though it is
a little gross so I'm thinking we might want an explicit API called
virDomainModifyDevice() we could use for all types of device reconfiguration.
In the context of NICs, we would use virDomainModifyDevice to enable
changing of the host backend type.
Thus, a guest could be booted with a NIC configured with type=none, and
later virDomainModifyDevice could change it to type=direct for macvtap,
type=bridge, and so on.
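As a hedged sketch, assuming the proposed virDomainModifyDevice mirrored
virDomainAttachDevice's signature (nothing below exists yet; the MAC and
device names are illustrative):

    #include <libvirt/libvirt.h>

    /* Hypothetical API, by analogy with virDomainAttachDevice */
    int virDomainModifyDevice(virDomainPtr domain, const char *xml);

    static int switch_nic_to_macvtap(virDomainPtr dom)
    {
        /* Retarget the NIC identified by its MAC address from
         * type='none' to a macvtap backend on eth0 (hypothetical) */
        const char *xml =
            "<interface type='direct'>"
            "  <mac address='52:54:00:12:34:56'/>"
            "  <source dev='eth0' mode='vepa'/>"
            "</interface>";
        return virDomainModifyDevice(dom, xml);
    }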
This allows an external app to make arbitrary changes to a NIC on the fly.
We probably also want to introduce the idea of link state to the NIC
XML format. The virDomainModifyDevice API could use the QEMU monitor
"set_link" command for this. Thus the guest OS will see the NIC link
state go up & down, allowing it to redo DHCP, etc.
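For illustration only, such a link state element might look like this in
the interface XML (a proposal, not current schema), mapping onto the
existing "set_link" monitor command:

    <interface type='bridge'>
      <source bridge='br0'/>
      <link state='down'/>    <!-- proposed element -->
    </interface>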
I ran into two major issues while implementing (2) and (3):
- Can we use the existing virEvent functions to invoke the reconnection
process, triggered either by a timer or by an event from the host? It
seems like this ought to work, but it appears that communication between
libvirt and the qemu monitor itself relies on an event callback, and since
all events run in the same thread, there's no way for one event handler to
call into the monitor.
- Should the reconnection process use udev or hal to get notifications,
or leverage the node device code which itself uses udev or hal?
Currently there doesn't appear to be a way to get notifications of
changes to node devices; if there were, we'd still need to address the
threading issue. If we use node devices, what changes to the
configuration schema would be needed to associate a macvtap interface
with the underlying node device?
I think we need to make it possible for applications to see host devices
come & go. We expose the current set of devices via our virNodeDev APIs,
and internally track the add/remove events from hal/udev, but we don't
expose these events to apps. We really need to add an event API for
virNodeDev to allow apps to be notified when a device comes / goes. They
would need this to be able to then decide to change the guest NIC host
backend config.
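A sketch of what such a notification API might look like, modeled on the
existing domain event API (every name below is hypothetical):

    /* Hypothetical callback invoked when a node device appears
     * or disappears */
    typedef void (*virConnectNodeDeviceEventCallback)(virConnectPtr conn,
                                                      virNodeDevicePtr dev,
                                                      int event,  /* added / removed */
                                                      void *opaque);

    /* Hypothetical registration, by analogy with
     * virConnectDomainEventRegister */
    int virConnectNodeDeviceEventRegister(virConnectPtr conn,
                                          virConnectNodeDeviceEventCallback cb,
                                          void *opaque);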
The combo of allowing virDomainModifyDevice to change NIC host backends,
and event notifications from the virNodeDev APIs, allows an external management
app to implement the kind of policy you are suggesting, and all sorts of
other policies too.
Longer term though, I think we need to add a managed object called a
'switch' / virSwitchPtr. This would encapsulate a connection to a LAN.
A guest XML config would just refer to a named switch. The switch itself
could be configured with any of the macvtap modes, or traditional bridging.
We could allow the switch to be re-configured on the fly to point to a
different host device, or change between macvtap / bridging, all without
ever needing to alter the guest config. This is especially useful for the
migration scenario.
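To make that concrete, a guest config might then just say something like
this (purely illustrative syntax; no such object exists yet):

    <interface type='switch'>
      <source switch='lan0'/>
      <model type='virtio'/>
    </interface>

with the 'lan0' switch object separately configured for bridging or any
of the macvtap modes, and reconfigurable without touching the guest.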
Regards,
Daniel
--
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|