Hi Rich,
On Mon, 2009-01-26 at 11:39 +0000, Richard W.M. Jones wrote:
> On Mon, Jan 26, 2009 at 10:39:25AM +0000, Mark McLoughlin wrote:
> > IFF_VNET_HDR is a tun/tap flag that allows you to send and receive
> > large (i.e. GSO) packets and packets with partial checksums. Setting
> > the flag means that every packet is preceded by the same header which
> > virtio uses to communicate GSO/csum metadata.
>
> Translating this for people not familiar with the intricacies of
> recent Linux networking changes ...
>
> GSO = generic segmentation offload. In a bare-metal Linux install, the
> network driver can pass the job of splitting large packets over to the
> network card. In virtualized environments, the "network card" is, for
> example, a virtio backend running in the host. Because the network
> bridge runs entirely inside the host kernel, there are no physical
> limitations on packet size as there would be if it were real Ethernet,
> so we can use this mechanism to pass over-sized packets to the host.
>
> Another advantage is that you don't need to compute checksums over the
> packets which are sent this way.

Yep, see also:
http://blogs.gnome.org/markmc/2008/05/28/checksums-scatter-gather-io-and-...

> "VNET_HDR" as far as I can gather refers to the special header that
> virtio_net prepends to such over-sized packets. I'm not quite clear
> if userspace has to add this header, but if so presumably that is done
> inside qemu userspace(?).

VNET_HDR refers to the tap interface.
A tap interface is what qemu uses to inject ethernet frames into the
kernel networking stack. Normally, you just write() and read() each raw
frame with a single syscall per frame.
VNET_HDR is a flag for tap interfaces to say that frames we read() and
write() will have a struct virtio_net_hdr prepended so as to give the
kernel/qemu information about partial checksums and GSO.
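
To make the framing concrete, here is a rough sketch of what a single write() and read() look like once the flag is set. The helper names tap_write_frame() and tap_read_frame() are made up for illustration and are not qemu's actual code; the sketch assumes fd is a tap descriptor that already has IFF_VNET_HDR set, and it only covers the simplest case of a complete frame with no GSO and no partial checksum (an all-zero header).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/virtio_net.h>

/* Hypothetical helper: send one frame, preceded by a virtio_net_hdr.
 * An all-zero header means "no GSO, checksum already complete". */
ssize_t tap_write_frame(int fd, const void *frame, size_t len)
{
    struct virtio_net_hdr hdr;
    struct iovec iov[2];

    memset(&hdr, 0, sizeof(hdr));
    hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;

    iov[0].iov_base = &hdr;
    iov[0].iov_len = sizeof(hdr);
    iov[1].iov_base = (void *)frame;
    iov[1].iov_len = len;

    /* Header and frame go out together in a single syscall. */
    return writev(fd, iov, 2);
}

/* Hypothetical helper: receive one frame, splitting off the header.
 * Returns the frame length (without the header), or -1 on error. */
ssize_t tap_read_frame(int fd, struct virtio_net_hdr *hdr,
                       void *frame, size_t cap)
{
    struct iovec iov[2];
    ssize_t n;

    iov[0].iov_base = hdr;
    iov[0].iov_len = sizeof(*hdr);
    iov[1].iov_base = frame;
    iov[1].iov_len = cap;

    n = readv(fd, iov, 2);
    if (n < (ssize_t)sizeof(*hdr))
        return -1;
    return n - (ssize_t)sizeof(*hdr);
}
```

A real sender of GSO or partial-checksum packets would fill in gso_type, gso_size, hdr_len, csum_start and csum_offset instead of leaving them zero.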
We need to set this flag before we bring the interface up and add it to
a bridge. That's why libvirt has to do this rather than just leaving it
up to qemu.

> Libvirt sets the flag on the socket, passes the socket by number to
> qemu, and qemu needs to be able to query whether the flag was set. So
> the patch concerns itself with making sure that all the relevant bits
> of this are supported.
>
> Correct me if I'm wrong here ...

You're spot on, thanks for elaborating.
Cheers,
Mark.