* Anthony Liguori (aliguori(a)linux.vnet.ibm.com) wrote:
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This
behaves in very much the same way that -net tap behaves in qemu
today. Basically, the difference is that the virtio backend is in
the kernel instead of in qemu so there should be some performance
improvement.
Currently, libvirt invokes qemu with -net tap,fd=X, where X is an
already open fd to a tun/tap device. I suspect that after we merge
vhost-net, libvirt could support vhost-net in this mode by just
doing -net vhost,fd=X. I think the only real question for libvirt
is whether to provide a user visible switch to use vhost or to just
always use vhost when it's available and it makes sense.
Personally, I think the latter makes sense.
Doesn't sound useful. At a low level it's certainly worth being able to
turn things on and off for testing/debugging, but it's probably not
something a user should be burdened with in libvirt.
But I don't understand your -net vhost,fd=X; that would still be -net
tap,fd=X, no? IOW, vhost is an internal qemu impl. detail of the virtio
backend (or, if you get your wish, $nic_backend).
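The vhost policy from this exchange (on by default when available, with an override kept for testing and debugging rather than exposed to users) might be sketched like this. Two assumptions: the option string uses the `-netdev tap,...,vhost=on` form that qemu later adopted, which keeps vhost an attribute of the tap backend as Chris suggests; and "available" is taken to mean that `/dev/vhost-net` exists and is readable and writable. `vhost_usable` and `tap_netdev_arg` are hypothetical helpers.

```python
import os
from typing import Optional


def vhost_usable(dev: str = "/dev/vhost-net") -> bool:
    """Assumed availability test: the vhost-net char device is read/writable."""
    # os.access returns False for a missing path, so no separate exists() check.
    return os.access(dev, os.R_OK | os.W_OK)


def tap_netdev_arg(tap_fd: int, force_vhost: Optional[bool] = None,
                   dev: str = "/dev/vhost-net") -> str:
    """Build a -netdev value for an already open tap fd (hypothetical helper).

    force_vhost=None       -> automatic: use vhost whenever it is usable
    force_vhost=True/False -> testing/debugging override, not a user knob
    The ",vhost=on" suffix assumes qemu's later -netdev tap syntax.
    """
    use_vhost = vhost_usable(dev) if force_vhost is None else force_vhost
    arg = f"tap,id=hostnet0,fd={tap_fd}"
    if use_vhost:
        arg += ",vhost=on"
    return arg
```

A caller would pass something like `["-netdev", tap_netdev_arg(23), "-device", "virtio-net-pci,netdev=hostnet0"]`, where 23 is an arbitrary example fd that must also be inherited by the qemu process (for instance via `pass_fds` in Python's `subprocess`).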
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the
current implementation. I don't know the syntax yet, but I think
it's reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the
guest.
tap? We'd want either macvtap or a raw socket here.
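If the direct mode is backed by macvtap rather than tap, the management layer has to find the character device to open and hand to qemu. A macvtap interface shows up as `/dev/tap<ifindex>`, using the interface's own ifindex as reported in `/sys/class/net/<name>/ifindex`. A minimal sketch of that lookup; `macvtap_chardev` is a hypothetical helper, and `sysfs_root` is a parameter only so it can be tried against a fake tree.

```python
import os


def macvtap_chardev(ifname: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the char device path for a macvtap interface (hypothetical helper).

    macvtap interfaces are exposed as /dev/tap<ifindex>, where ifindex is
    read from sysfs.
    """
    with open(os.path.join(sysfs_root, ifname, "ifindex")) as f:
        ifindex = int(f.read().strip())
    return f"/dev/tap{ifindex}"
```

The macvtap link itself would be created on top of the physical NIC beforehand, e.g. `ip link add link eth0 name macvtap0 type macvtap mode vepa`.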
On most modern systems, there is a small number of network devices,
so this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here, though.
The biggest is that currently you can only change the number of VFs
by reloading a kernel module, so it's really a parameter that must be
set at startup time.
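Since the VF count is fixed once the driver is loaded, libvirt would mostly be discovering VFs rather than creating them. A sketch of that discovery, assuming the PCI sysfs layout in which a PF's device directory (e.g. `/sys/class/net/eth0/device`) contains `virtfn0`, `virtfn1`, ... symlinks pointing at its VFs; `list_vfs` is a hypothetical helper.

```python
import os
import re
from typing import List, Tuple


def list_vfs(pf_dev_dir: str) -> List[str]:
    """Return a PF's VF PCI addresses ordered by VF index (hypothetical helper).

    Each virtfnN entry in the PF's sysfs device directory is a symlink whose
    target basename is the VF's PCI address.
    """
    vfs: List[Tuple[int, str]] = []
    for name in os.listdir(pf_dev_dir):
        m = re.fullmatch(r"virtfn(\d+)", name)
        if m:
            target = os.readlink(os.path.join(pf_dev_dir, name))
            vfs.append((int(m.group(1)), os.path.basename(target)))
    # Sort numerically so that virtfn10 comes after virtfn9, not after virtfn1.
    return [addr for _, addr in sorted(vfs)]
```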
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar
to <source network='br0'>. In fact, if you probed the device type
for the network parameter, you could probably do something like
<source network='eth0'> and have it Just Work.
We'll need to keep track of more than just the other en
We need to 0
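The "probe the device type" idea for `<source network='...'>` could look roughly like this, assuming the usual sysfs conventions: a Linux software bridge has a `bridge/` subdirectory under `/sys/class/net/<name>`, a hardware-backed device (a NIC or a VF) has a `device` link, and purely virtual interfaces have neither. `classify_netdev` and its return labels are hypothetical.

```python
import os


def classify_netdev(name: str, sysfs_root: str = "/sys/class/net") -> str:
    """Guess what kind of host network device `name` is (hypothetical helper).

    Returns "bridge" for a Linux software bridge, "physical" for a
    hardware-backed device, "virtual" otherwise.
    """
    base = os.path.join(sysfs_root, name)
    if not os.path.isdir(base):
        raise FileNotFoundError(f"no such network device: {name}")
    if os.path.isdir(os.path.join(base, "bridge")):
        return "bridge"
    if os.path.exists(os.path.join(base, "device")):
        return "physical"
    return "virtual"
```

With a check like this, `<source network='br0'>` would keep today's bridged tap setup, while `<source network='eth0'>` would select the direct, vhost-backed mode.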
Another model would be to have libvirt see an SR-IOV adapter as a
network pool, whereby it handles all of the VF management.
Considering how inflexible SR-IOV is today, I'm not sure whether
this is the best model.
We already need to know the VF<->PF relationship. For example, we don't
want to assign a VF to one guest and then its PF to another guest, for
basic sanity reasons. As we gain the ability to manage the embedded
switch in an SR-IOV NIC, we will need to manage it as well. So we do
need to have some concept of managing an SR-IOV adapter.
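Treating the adapter as a pool and enforcing the VF/PF sanity rule both come down to the same bookkeeping. A minimal in-memory sketch; `SriovAdapter` and its methods are hypothetical, the VF list is fixed at construction to mirror the reload-to-resize limitation, and the only rules enforced are "one guest per VF" and "the PF and its VFs never go to different guests".

```python
from typing import Dict, List, Optional


class SriovAdapter:
    """Hypothetical bookkeeping for one SR-IOV adapter (a PF plus fixed VFs)."""

    def __init__(self, pf: str, vfs: List[str]) -> None:
        self.pf = pf
        # Fixed VF set: changing it would mean reloading the driver.
        self._vf_owner: Dict[str, Optional[str]] = {vf: None for vf in vfs}
        self._pf_owner: Optional[str] = None

    def allocate_vf(self, guest: str) -> str:
        """Pool-style allocation: give `guest` the first free VF."""
        if self._pf_owner is not None and self._pf_owner != guest:
            raise ValueError(f"{self.pf} is assigned to {self._pf_owner}")
        for vf, owner in self._vf_owner.items():
            if owner is None:
                self._vf_owner[vf] = guest
                return vf
        raise RuntimeError(f"no free VFs left on {self.pf}")

    def assign_pf(self, guest: str) -> None:
        """Assign the whole PF unless another guest holds it or one of its VFs."""
        if self._pf_owner is not None:
            raise ValueError(f"{self.pf} is already assigned to {self._pf_owner}")
        others = {o for o in self._vf_owner.values() if o not in (None, guest)}
        if others:
            raise ValueError(f"{self.pf} has VFs assigned to {sorted(others)}")
        self._pf_owner = guest

    def release_vf(self, vf: str) -> None:
        if vf not in self._vf_owner:
            raise KeyError(vf)
        self._vf_owner[vf] = None
```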
So I think we want to maintain a concept of the qemu backend (virtio,
e1000, etc), the fd that connects the qemu backend to the host (tap,
socket, macvtap, etc), and the bridge. The bridge bit gets a little
complicated. We have the following bridge cases:
- sw bridge
  - normal existing setup, w/ Linux bridging code
  - macvlan
- hw bridge
  - on SR-IOV card
    - configured to simply fwd to external hw bridge (like VEPA mode)
    - configured as a bridge w/ policies (QoS, ACL, port mirroring,
      etc. and allows inter-guest traffic and looks a bit like above
      sw switch)
  - external
    - need to possibly inform switch of incoming vport
And, we can have a hybrid. E.g., no reason one VF can't be shared by a
few guests.
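One way to keep the three concerns separate (guest NIC model, host-side connection, bridge) is as independent fields of an interface record, with the bridge cases as an enumeration. Keying on the backing device rather than assuming one guest per device also leaves room for the hybrid case. All names below are hypothetical and only mirror the list above; they are not libvirt's schema, and a real hybrid (macvlan on top of a VF) would need more than a single bridge kind per interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class NicModel(Enum):        # what the guest sees
    VIRTIO = "virtio"
    E1000 = "e1000"


class HostConnection(Enum):  # how qemu's backend reaches the host
    TAP = "tap"
    SOCKET = "socket"
    MACVTAP = "macvtap"


class BridgeKind(Enum):      # where the guest's frames get switched
    SW_LINUX_BRIDGE = "sw/linux-bridge"
    SW_MACVLAN = "sw/macvlan"
    HW_SRIOV_VEPA = "hw/sriov/vepa"      # embedded switch forwards to external switch
    HW_SRIOV_POLICY = "hw/sriov/policy"  # embedded switch with QoS, ACLs, mirroring
    HW_EXTERNAL = "hw/external"          # may need to be told about the new vport

    @property
    def is_hardware(self) -> bool:
        return self.value.startswith("hw/")


@dataclass(frozen=True)
class GuestInterface:
    guest: str
    model: NicModel
    connection: HostConnection
    bridge: BridgeKind
    backing_dev: str  # e.g. "br0", "eth0", or a VF's netdev name


def guests_by_backing_dev(ifaces: Iterable[GuestInterface]) -> Dict[str, List[str]]:
    """Group guests by backing device; more than one guest means it is shared."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for iface in ifaces:
        groups[iface.backing_dev].append(iface.guest)
    return dict(groups)
```

For example, two guests with macvtap interfaces on the same VF (say a hypothetical `eth2`) would both appear under `eth2`, while a guest on `br0` would be listed on its own.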