On 04/17/2015 04:53 AM, Chen Fan wrote:
Background:
Live migration is one of the most important features of virtualization technology.
With regard to recent virtualization techniques, performance of network I/O is critical.
Current network I/O virtualization (e.g. Para-virtualized I/O, VMDq) has a significant
performance gap with native network I/O. Pass-through network devices have near
native performance, however, they have thus far prevented live migration. No existing
methods solve the problem of live migration with pass-through devices perfectly.
An approach to this problem was described in the following paper:
https://www.kernel.org/doc/ols/2008/ols2008v2-pages-261-267.pdf
Please refer to the above document for detailed information.
This functionality has been on my mind/bug list for a long time, but I
haven't been able to pursue it much. See this BZ, along with the
original patches submitted by Shradha Shah from SolarFlare:
https://bugzilla.redhat.com/show_bug.cgi?id=896716
(I was a bit optimistic in my initial review of the patches - there are
actually a lot of issues that weren't handled by those patches.)
So I think this problem could perhaps be solved by combining existing
technologies, and the following are the steps we are considering for the
implementation:
- Before booting the VM, we anticipate specifying in the XML two NICs that
will form a bonding device (one passthrough and one emulated virtual NIC).
The NICs' MAC addresses can be specified in the XML as well, which helps
qemu-guest-agent find the network interfaces in the guest.
An interesting idea, but I think that is a 2nd level enhancement, not
necessary initially (and maybe not ever, due to the high possibility of
it being extremely difficult to get right in 100% of the cases).
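For concreteness, though, the pairing itself can already be expressed in
today's domain XML, something like this (just a sketch - the MAC addresses,
the PCI address, and the network name here are invented):

  <!-- passthrough VF (the fast path); PCI address is hypothetical -->
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:11:22:33'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x1'/>
    </source>
  </interface>
  <!-- emulated virtio NIC (the fallback path during migration) -->
  <interface type='network'>
    <mac address='52:54:00:44:55:66'/>
    <source network='default'/>
    <model type='virtio'/>
  </interface>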
- When qemu-guest-agent starts up in the guest, it sends a notification to
libvirt, and libvirt then calls the previously registered initialization
callbacks. Through those callback functions we can create the bonding
device according to the XML configuration. Here we use the netcf tool,
which makes it easy to create the bonding device.
This isn't quite making sense - the bond will be on the guest, which may
not have netcf installed. Anyway, I think it should be up to the guest's
own system network config to have the bond already setup. If you try to
impose it from outside that infrastructure, you run too much risk of
running afoul of something on the guest (e.g. NetworkManager).
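(For reference, since netcf was mentioned: the bond definition it consumes
is XML roughly like the following - an illustrative sketch from memory,
with invented interface names:)

  <interface type="bond" name="bond0">
    <start mode="onboot"/>
    <protocol family="ipv4">
      <dhcp/>
    </protocol>
    <bond mode="active-backup">
      <miimon freq="100" updelay="10" carrier="ioctl"/>
      <interface type="ethernet" name="eth0"/>
      <interface type="ethernet" name="eth1"/>
    </bond>
  </interface>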
- During migration, unplug the passthrough NIC, then perform a normal live
migration.
Correct. This is the most important part. But not just unplugging it,
you also need to wait until the unplug operation completes (it is
asynchronous). (After this point, the emulated NIC that is part of the
bond would get all of the traffic).
- On the destination side, check whether a new NIC needs to be hotplugged
according to the specified XML. Usually we use the migrate "--xml" command
option to specify the MAC address of the destination host NIC to be
hotplugged, because the source side passthrough NIC's MAC address is
different; the device is then hotplugged according to the destination XML
configuration.
Why does the MAC address need to be different? Are you suggesting doing
this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
its MAC address from the libvirt config, so it's very simple to use the
same MAC address across the migration. Any network card that would be
able to do this on any sort of useful scale will be SRIOV-capable (or
should be replaced with one that is - some of them are not that expensive).
TODO:
1. When a new NIC is hot-added on the destination side after the migration
has finished, the NIC device needs to be re-enslaved to the bonding device
in the guest; otherwise it is offline. Maybe we should consider having the
bonding driver support adding interfaces dynamically.
I never looked at the details of how SolarFlare's code handled the guest
side (they have/had their own patchset they maintained for some older
version of libvirt which integrated with some sort of enhanced bonding
driver on the guests). I assumed the bond driver could handle this
already, but have to say I never investigated.
This is an example of how this might work, so I want to hear some opinions
about this scenario.
Thanks,
Chen
Chen Fan (7):
qemu-agent: add agent init callback when detecting guest setup
qemu: add guest init event callback to do the initialize work for guest
hostdev: add a 'bond' type element in <hostdev> element
Putting this into <hostdev> is the wrong approach, for two reasons: 1)
it doesn't account for the device to be used being at a different
address on the source and destination hosts, and 2) the <interface>
element already has much of the config you need, and there is already
an interface type that supports hostdev passthrough.
It has been possible to do passthrough of an SRIOV VF via <interface
type='hostdev'> for a long time now and, even better, via an <interface
type='network'> where the network pointed to contains a pool of VFs. As
long as the source and destination hosts both have networks with the
same name, libvirt will be able to find a currently available device on
the destination as it migrates from one host to another instead of
relying on both hosts having the exact same device at the exact same
address on the source and destination (and also magically unused by any
other guest). This page explains the use of a "hostdev network" which
has a pool of devices:
http://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_V...
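For example, such a network definition looks roughly like this (a sketch -
the network name and the PF device name are invented):

  <network>
    <name>sriov-pool</name>
    <forward mode='hostdev' managed='yes'>
      <pf dev='eth2'/>
    </forward>
  </network>

The guest's interface then just names the network, and the fixed <mac>
from the libvirt config follows the guest across the migration:

  <interface type='network'>
    <mac address='52:54:00:11:22:33'/>
    <source network='sriov-pool'/>
  </interface>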
This was designed specifically with the idea in mind that one day it
would be possible to migrate a domain with a hostdev device (as long as
the guest could handle the hostdev device being temporarily unplugged
during the migration).
qemu-agent: add qemuAgentCreateBond interface
hostdev: add parse ip and route for bond configure
Again, I think that this level of detail about the guest network config
belongs on the guest, not in libvirt.
migrate: hot remove hostdev at perform phase for bond device
^^ this is the useful part but I don't think the right method is to make
this action dependent on the device being a "bond".
I think that in this respect Shradha's patches had a better idea - any
hostdev (or, by implication, <interface type='hostdev'> or, much more
usefully, <interface type='network'> pointing to a pool of VFs) could
have an attribute "ephemeral". If ephemeral was "yes", then the device
would always be unplugged prior to migration and re-plugged when
migration was completed (the same thing should be done when
saving/restoring a domain which also can't currently be done with a
domain that has a passthrough device).
For that matter, this could be a general-purpose thing (although
probably most useful for hostdevs) - just make it possible for *any*
hotpluggable device to be "ephemeral"; the meaning of this would be that
every device marked as ephemeral should be unplugged prior to migration
or save (and libvirt should wait for qemu to notify that the unplug is
completed), and re-plugged right after the guest is restarted.
(possibly it should be implemented as an <ephemeral> *element* rather
than attribute, so that options could be specified).
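As a concrete sketch of the element form (purely hypothetical - nothing
in libvirt implements this today):

  <interface type='network'>
    <mac address='52:54:00:11:22:33'/>
    <source network='sriov-pool'/>
    <!-- hypothetical: detach before migration/save, re-attach after -->
    <ephemeral/>
  </interface>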
After that is implemented and works properly, it might be time
to think about auto-creating the bond (although again, my opinion is
that this is getting a bit too intrusive into the guest (and making it
more likely to fail - I know from long experience with netcf that it is
all too easy for some other service on the system (ahem) to mess up all
your hard work); I think it would be better to just let the guest deal
with setting up a bond in its system network config, and if the bond
driver can't handle having a device in the bond unplugging and plugging,
then the bond driver should be enhanced).
migrate: add hostdev migrate status to support hostdev migration
docs/schemas/basictypes.rng | 6 ++
docs/schemas/domaincommon.rng | 37 ++++++++
src/conf/domain_conf.c | 195 ++++++++++++++++++++++++++++++++++++++---
src/conf/domain_conf.h | 40 +++++++--
src/conf/networkcommon_conf.c | 17 ----
src/conf/networkcommon_conf.h | 17 ++++
src/libvirt_private.syms | 1 +
src/qemu/qemu_agent.c | 196 +++++++++++++++++++++++++++++++++++++++++-
src/qemu/qemu_agent.h | 12 +++
src/qemu/qemu_command.c | 3 +
src/qemu/qemu_domain.c | 70 +++++++++++++++
src/qemu/qemu_domain.h | 14 +++
src/qemu/qemu_driver.c | 38 ++++++++
src/qemu/qemu_hotplug.c | 8 +-
src/qemu/qemu_migration.c | 91 ++++++++++++++++++++
src/qemu/qemu_migration.h | 4 +
src/qemu/qemu_process.c | 32 +++++++
src/util/virhostdev.c | 3 +
18 files changed, 745 insertions(+), 39 deletions(-)