On Thu, Apr 23, 2015 at 11:01:44AM -0400, Laine Stump wrote:
On 04/23/2015 04:34 AM, Chen Fan wrote:
>
> On 04/20/2015 06:29 AM, Laine Stump wrote:
>> On 04/17/2015 04:53 AM, Chen Fan wrote:
>>> - on the destination side, check whether we need to hotplug a new NIC
>>> according to the specified XML.
>>> Usually, we use the migrate "--xml" command option to specify the
>>> destination host NIC's MAC address (because it differs from the MAC
>>> address of the source-side passthrough NIC), then hotplug the device
>>> according to the destination XML configuration.
>> Why does the MAC address need to be different? Are you suggesting doing
>> this with passed-through non-SRIOV NICs? An SRIOV virtual function gets
>> its MAC address from the libvirt config, so it's very simple to use the
>> same MAC address across the migration. Any network card that would be
>> able to do this on any sort of useful scale will be SRIOV-capable (or
>> should be replaced with one that is - some of them are not that
>> expensive).
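For reference, here is a minimal sketch (Python standard library only) of the kind of libvirt config meant here: an <interface type='hostdev'> whose <mac> is fixed in the XML, so the same value can be written into the destination host's config and programmed into whichever VF is used there. The helper name is hypothetical, and the MAC and PCI address values are made-up placeholders.

```python
import xml.etree.ElementTree as ET


def sriov_interface_xml(mac: str, domain: int, bus: int, slot: int, function: int) -> str:
    """Build a libvirt <interface type='hostdev'> element for an SRIOV VF.

    Hypothetical helper. The <mac> element is part of the config, so it can be
    kept identical on the migration destination even though the VF's PCI
    address in <source> may differ from host to host.
    """
    iface = ET.Element("interface", type="hostdev", managed="yes")
    ET.SubElement(iface, "mac", address=mac)
    source = ET.SubElement(iface, "source")
    ET.SubElement(
        source,
        "address",
        type="pci",
        domain=f"0x{domain:04x}",
        bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}",
        function=f"0x{function:x}",
    )
    return ET.tostring(iface, encoding="unicode")


# Placeholder values: same MAC on both hosts, different VF on each.
print(sriov_interface_xml("52:54:00:6d:90:02", 0, 0x03, 0x10, 1))
print(sriov_interface_xml("52:54:00:6d:90:02", 0, 0x81, 0x10, 3))
```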
> Hi Laine,
>
> I think supporting migration with SRIOV virtual NICs is a good idea,
> but there are also passthrough NICs that are not SRIOV-capable. For
> these NIC devices we can only use <hostdev> to specify the passthrough
> function, so I think we should support them too.
As I think you've already discovered, passing through non-SRIOV NICs is
problematic. It is completely impossible for the host to change their
MAC address before assigning them to the guest - the guest's driver sees
standard netdev hardware and resets it, which resets the MAC address to
the original value burned into the firmware. This makes management more
complicated, especially when you get into scenarios such as what we're
discussing (i.e. migration) where the actual hardware (and thus MAC
address) may be different from one run to the next.
Right, passing through PFs is also insecure. Let's get
everything working fine with VFs first, worry about PFs later.
Since libvirt's <interface> element requires a fixed MAC address in the
XML, it's not possible to have an <interface> that gets the actual
device from a network pool (without some serious hacking to that code),
and there is no support for plain (non-network) <hostdev> device pools;
there would need to be a separate (nonexistent) driver for that. Since
the <hostdev> element relies on the PCI address of the device (in the
<source> subelement, which also must be fixed) to determine which device
to passthrough, a domain config with a <hostdev> that could be run on
two different machines would require the device to reside at exactly the
same PCI address on both machines, which is a very serious limitation to
have in an environment large enough that migrating domains is a requirement.
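To make that limitation concrete, here is a hedged Python sketch. A plain <hostdev> names its device only by the fixed PCI address in <source>, so the hosts a domain can be migrated to are exactly those that happen to have a free NIC at that same address. The per-host inventory dict and both function names are hypothetical; only the <hostdev>/<source>/<address> layout follows libvirt's domain XML.

```python
import xml.etree.ElementTree as ET

# Example <hostdev> config; the PCI address is a placeholder.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""


def source_pci_address(hostdev_xml: str) -> str:
    """Return the <source> PCI address of a <hostdev> as 'dddd:bb:ss.f'."""
    addr = ET.fromstring(hostdev_xml.strip()).find("source/address")
    return "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(addr.get("domain"), 16),
        int(addr.get("bus"), 16),
        int(addr.get("slot"), 16),
        int(addr.get("function"), 16),
    )


def usable_hosts(hostdev_xml: str, free_nics: dict[str, set[str]]) -> list[str]:
    """Hosts where a free NIC sits at exactly the configured PCI address.

    free_nics is a hypothetical inventory: host name -> set of free NIC PCI
    addresses. A host with free NICs at *other* addresses is still unusable.
    """
    wanted = source_pci_address(hostdev_xml)
    return sorted(host for host, addrs in free_nics.items() if wanted in addrs)
```

For example, with free NICs at 0000:03:00.0 on hostA and hostC but only at 0000:05:00.0 on hostB, usable_hosts() returns only hostA and hostC, even though hostB has an idle NIC.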
Also, non-SRIOV NICs are limited to a single device per physical port,
meaning probably at most 4 devices per physical host PCIe slot, and this
results in a greatly reduced density on the host (and even more so on
the switch that connects to the host!) compared to even the old Intel
82576 cards, which have 14 VFs (7 VFs x 2 Ethernet ports). Think about it:
with an 82576, you can get 14 guests into 1 PCIe slot and 2 switch
ports, while the same number of guests with non-SRIOV would take 4 PCIe
slots and 14(!) switch ports. The difference is even more striking when
comparing to chips like the 82599 (64 VFs per port x 2), or a Mellanox
(also 64?) or SolarFlare (128?) card. And don't forget that, because you
don't have pools of devices to be automatically chosen from, each
guest domain that will be migrated requires a reserved NIC on *every*
machine it will be migrated to (no other domain can be configured to use
that NIC, in order to avoid conflicts).
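The density arithmetic above can be checked with a few lines of Python. This is a sketch under two stated assumptions: the non-SRIOV case uses a quad-port card (matching "at most 4 devices per physical host PCIe slot"), and every NIC port in use is cabled to its own switch port. The function name is illustrative only.

```python
import math


def nic_resources(guests: int, guests_per_port: int, ports_per_card: int) -> tuple[int, int]:
    """Return (PCIe slots, switch ports) needed to give every guest its own NIC.

    guests_per_port is the number of VFs per port on an SRIOV card, or 1 for
    a non-SRIOV NIC, where each guest consumes a whole physical port.
    Assumes each port in use needs one switch port.
    """
    ports = math.ceil(guests / guests_per_port)
    slots = math.ceil(ports / ports_per_card)
    return slots, ports


# 82576: 7 VFs per port, 2 ports per card.
print("82576:", nic_resources(14, guests_per_port=7, ports_per_card=2))
# Non-SRIOV, assuming a quad-port card with 1 guest per port.
print("non-SRIOV:", nic_resources(14, guests_per_port=1, ports_per_card=4))
```

This prints (1, 2) for the 82576 and (4, 14) for the non-SRIOV case, matching the slot and switch-port counts quoted above.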
Of course you could complicate the software by adding a driver that
manages pools of generic hostdevs, and coordinates MAC address changes
with the guest (part of what you're suggesting), but all that extra
complexity not only takes a lot of time and effort to develop, it also
creates more code that needs to be maintained and tested for regressions
at each release.
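A rough Python sketch of the bookkeeping such a pool driver would need, under loudly stated assumptions: GenericNic, HostdevPool and migrate() are hypothetical names, not libvirt APIs, and the guest-side MAC coordination is left out entirely. It shows where the extra complexity comes from: the pool can hand out any free NIC, but the MAC the guest sees comes with the hardware, so it normally changes across a migration.

```python
from dataclasses import dataclass, field


@dataclass
class GenericNic:
    pci_address: str
    # A non-SRIOV NIC reverts to this MAC whenever the guest driver resets it.
    burned_in_mac: str


@dataclass
class HostdevPool:
    """Hypothetical per-host pool of whole (non-SRIOV) NICs for passthrough."""

    free: list[GenericNic] = field(default_factory=list)
    in_use: dict[str, GenericNic] = field(default_factory=dict)

    def allocate(self, domain: str) -> GenericNic:
        if not self.free:
            raise RuntimeError(f"no free NIC for {domain}")
        nic = self.free.pop(0)
        self.in_use[domain] = nic
        return nic

    def release(self, domain: str) -> None:
        self.free.append(self.in_use.pop(domain))


def migrate(domain: str, src: HostdevPool, dst: HostdevPool) -> tuple[str, str]:
    """Move domain's NIC from src to dst and return (old_mac, new_mac).

    The destination NIC is allocated before the source NIC is released, so a
    full destination pool leaves the source untouched. The two MACs usually
    differ, and telling the guest about that is the part that needs new code.
    """
    old_nic = src.in_use[domain]
    new_nic = dst.allocate(domain)
    src.release(domain)
    return old_nic.burned_in_mac, new_nic.burned_in_mac
```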
The alternative is to just spend $130 per host for an 82576 or Intel
I350 card (these are the cheapest SRIOV options I'm aware of). When
compared to the total cost of any hardware installation large enough to
support migration and have performance requirements high enough that NIC
passthrough is needed, this is a trivial amount.
I guess the bottom line of all this is that (in my opinion, of course
:-) supporting useful migration of domains that use passed-through
non-SRIOV NICs would be an interesting experiment, but I don't see much
utility to it, other than "scratching an intellectual itch", and I'm
concerned that it would create more long term maintenance cost than it
was worth.
I'm not sure it has no utility but it's easy to agree that
VFs are more important, and focusing on this first is a good
idea.