
* Anthony Liguori (aliguori@linux.vnet.ibm.com) wrote:
There are two modes worth supporting for vhost-net in libvirt. The first mode is where vhost-net backs to a tun/tap device. This behaves in very much the same way that -net tap behaves in qemu today. Basically, the difference is that the virtio backend is in the kernel instead of in qemu, so there should be some performance improvement.
Currently, libvirt invokes qemu with -net tap,fd=X where X is an already open fd to a tun/tap device. I suspect that after we merge vhost-net, libvirt could support vhost-net in this mode by just doing -net vhost,fd=X. I think the only real question for libvirt is whether to provide a user-visible switch to use vhost or to just always use vhost when it's available and it makes sense. Personally, I think the latter makes sense.
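To make the comparison concrete, here is a rough sketch of the two invocations being discussed. The fd number and the -net vhost spelling are placeholders; nothing here is settled syntax:

    # today: libvirt opens the tap device itself and hands qemu the fd
    qemu -net nic,model=virtio -net tap,fd=42

    # proposed: same pre-opened fd, but the virtio backend runs in the kernel
    qemu -net nic,model=virtio -net vhost,fd=42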
A user-visible switch doesn't sound useful. At a low level it's certainly worth being able to turn things on and off for testing/debugging, but that's probably not something a user should be burdened with in libvirt. But I don't understand your -net vhost,fd=X; wouldn't that still be -net tap,fd=X? IOW, vhost is an internal qemu implementation detail of the virtio backend (or, if you get your wish, of $nic_backend).
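In other words, something closer to the sketch below, where the backend stays tap and vhost is just a toggle. The vhost=on option is shown purely for illustration; the thread has not settled on any such syntax:

    # vhost as an internal detail of the tap/virtio path, not a separate backend
    qemu -net nic,model=virtio -net tap,fd=42,vhost=on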
The more interesting invocation of vhost-net though is one where the vhost-net device backs directly to a physical network card. In this mode, vhost should get considerably better performance than the current implementation. I don't know the syntax yet, but I think it's reasonable to assume that it will look something like -net tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
tap? We'd want either macvtap or a raw socket here.
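For reference, a macvtap-based setup might look roughly like this. The interface names, the VEPA mode, and the fd number are assumptions for illustration only:

    # create a macvtap endpoint on top of the physical NIC
    ip link add link eth0 name macvtap0 type macvtap mode vepa
    ip link set macvtap0 up

    # the matching character device is /dev/tapN, where N is macvtap0's
    # ifindex; open it and hand the fd to qemu the same way as a tap fd
    qemu -net nic,model=virtio -net tap,fd=42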
On most modern systems there is only a small number of network devices, so this model is not all that useful except when dealing with SR-IOV adapters. In that case, each physical device (PF) can be exposed as many virtual functions (VFs). There are a few restrictions, though. The biggest is that currently you can only change the number of VFs by reloading a kernel module, so it's really a parameter that must be set at startup time.
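As a concrete example of that restriction, changing the VF count today means reloading the PF driver. This assumes an ixgbe-style max_vfs module parameter; other drivers differ:

    # unload and reload the PF driver with the desired VF count
    rmmod ixgbe
    modprobe ixgbe max_vfs=8

    # the VFs then appear as additional PCI functions
    lspci | grep -i 'virtual function'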
I think there are a few ways libvirt could support vhost-net in this second mode. The simplest would be to introduce a new tag similar to <source network='br0'>. In fact, if you probed the device type for the network parameter, you could probably do something like <source network='eth0'> and have it Just Work.
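A hedged sketch of what that might look like in the domain XML. This is the proposal under discussion, not existing libvirt syntax:

    <interface type='network'>
      <!-- 'eth0' probed as a physical device rather than a bridge -->
      <source network='eth0'/>
      <model type='virtio'/>
    </interface>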
We'll need to keep track of more than just the other end of the device, though.
Another model would be to have libvirt see an SR-IOV adapter as a network pool, where libvirt handles all of the VF management. Considering how inflexible SR-IOV is today, I'm not sure whether this is the best model.
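If the pool model were pursued, the definition might look something like the sketch below. The schema is entirely hypothetical and only meant to illustrate libvirt handing out VFs from a PF:

    <network>
      <name>sriov-pool</name>
      <forward mode='hostdev'>
        <!-- libvirt enumerates the VFs behind this PF and assigns them -->
        <pf dev='eth0'/>
      </forward>
    </network>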
We already need to know the VF<->PF relationship. For example, we don't want to assign a VF to a guest and then the PF to another guest, for basic sanity reasons. As we get better ability to manage the embedded switch in an SR-IOV NIC, we will need to manage them as well. So we do need to have some concept of managing an SR-IOV adapter.

So I think we want to maintain a concept of the qemu backend (virtio, e1000, etc.), the fd that connects the qemu backend to the host (tap, socket, macvtap, etc.), and the bridge. The bridge bit gets a little complicated. We have the following bridge cases:

- sw bridge
  - normal existing setup, w/ Linux bridging code
  - macvlan
- hw bridge
  - on SR-IOV card
    - configured to simply fwd to an external hw bridge (like VEPA mode)
    - configured as a bridge w/ policies (QoS, ACL, port mirroring, etc.), allowing inter-guest traffic and looking a bit like the sw bridge above
  - external
    - need to possibly inform the switch of an incoming vport

And we can have a hybrid, e.g. no reason one VF can't be shared by a few guests.
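To show how those three pieces (qemu backend, host fd, bridge behaviour) might be expressed together, here is a rough macvtap-style sketch. The type='direct'/mode='vepa' form is used only to illustrate one of the hw-bridge cases above, not as settled libvirt syntax:

    <interface type='direct'>
      <!-- host connection: macvtap on eth0; bridge case: fwd to external hw (VEPA) -->
      <source dev='eth0' mode='vepa'/>
      <!-- qemu backend -->
      <model type='virtio'/>
    </interface>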