
Hi Rich,

On Mon, 2009-01-26 at 11:39 +0000, Richard W.M. Jones wrote:
On Mon, Jan 26, 2009 at 10:39:25AM +0000, Mark McLoughlin wrote:
IFF_VNET_HDR is a tun/tap flag that allows you to send and receive large (i.e. GSO) packets and packets with partial checksums. Setting the flag means that every packet is preceded by the same header which virtio uses to communicate GSO/csum metadata.
Translating this for people not familiar with the intricacies of recent Linux networking changes ...
GSO = generic segmentation offload. In a bare-metal Linux install, the network driver can pass the job of splitting large packets over to the network card. In virtualized environments, the "network card" is, for example, a virtio backend running in the host. Because the network bridge runs entirely inside the host kernel, there are no physical limitations on packet size as there would be if it were real Ethernet, so we can use this mechanism to pass over-sized packets to the host. Another advantage is that you don't need to compute checksums over the packets which are sent this way.
Yep, see also: http://blogs.gnome.org/markmc/2008/05/28/checksums-scatter-gather-io-and-seg...
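To make the segmentation step concrete, here is a minimal Python sketch of the work that GSO lets the guest skip: cutting one over-sized payload into MSS-sized pieces. The `segment` helper and the 1448-byte MSS are illustrative assumptions, not anything taken from the kernel or from the patch, and real segmentation also rewrites the headers and checksum of every segment, which this ignores.

```python
def segment(payload: bytes, mss: int) -> list[bytes]:
    """Split one large payload into chunks of at most `mss` bytes.

    Hypothetical helper for illustration only: real GSO also fixes up
    the per-segment headers and checksums, which is omitted here.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# A 64 KiB payload with an assumed MSS of 1448 bytes.
big = bytes(65536)
chunks = segment(big, 1448)
```

With these numbers the payload splits into 46 segments, the last one 376 bytes long. With GSO the guest hands the host the single 64 KiB packet instead and the split happens later, if at all.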
"VNET_HDR" as far as I can gather refers to the special header that virtio_net prepends to such over-sized packets. I'm not quite clear if userspace has to add this header, but if so presumably that is done inside qemu userspace(?).
VNET_HDR refers to the tap interface. A tap interface is what qemu uses to inject ethernet frames into the kernel networking stack. Normally, you just write() and read() each raw frame with a single syscall per frame. VNET_HDR is a flag for tap interfaces to say that frames we read() and write() will have a struct virtio_net_hdr prepended so as to give the kernel/qemu information about partial checksums and GSO. We need to set this flag before we bring the interface up and add it to a bridge. That's why libvirt has to do this rather than just leaving it up to qemu.
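As a rough illustration of what "a struct virtio_net_hdr prepended" means for each read() and write(), the following Python sketch packs and unpacks the 10-byte legacy header layout. The helper names (`with_vnet_hdr`, `split_vnet_hdr`) are hypothetical, and the sketch assumes host-native byte order and the basic header without any later extensions.

```python
import struct

# Legacy struct virtio_net_hdr, 10 bytes:
#   u8 flags, u8 gso_type, u16 hdr_len, u16 gso_size,
#   u16 csum_start, u16 csum_offset
# "=" means native byte order with no padding (assumed here).
VNET_HDR = struct.Struct("=BBHHHH")

VIRTIO_NET_HDR_F_NEEDS_CSUM = 1   # checksum is only partially computed
VIRTIO_NET_HDR_GSO_NONE = 0       # ordinary frame, no segmentation needed
VIRTIO_NET_HDR_GSO_TCPV4 = 1      # over-sized TCP/IPv4 packet


def with_vnet_hdr(frame, flags=0, gso_type=VIRTIO_NET_HDR_GSO_NONE,
                  hdr_len=0, gso_size=0, csum_start=0, csum_offset=0):
    """Return the bytes to write() to a tap fd that has IFF_VNET_HDR set."""
    hdr = VNET_HDR.pack(flags, gso_type, hdr_len, gso_size,
                        csum_start, csum_offset)
    return hdr + frame


def split_vnet_hdr(buf):
    """Split a buffer read() from such a tap fd into (header fields, frame)."""
    fields = VNET_HDR.unpack_from(buf)
    return fields, buf[VNET_HDR.size:]


# Example: a large TCP/IPv4 frame with 14 + 20 + 20 bytes of headers,
# to be cut into 1448-byte segments, with the TCP checksum left partial.
frame = bytes(54 + 8000)  # placeholder contents, not a valid packet
buf = with_vnet_hdr(frame,
                    flags=VIRTIO_NET_HDR_F_NEEDS_CSUM,
                    gso_type=VIRTIO_NET_HDR_GSO_TCPV4,
                    hdr_len=54, gso_size=1448,
                    csum_start=34, csum_offset=16)
```

An ordinary frame still needs the header; it is simply all zeroes, which is what `with_vnet_hdr(frame)` produces with its defaults.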
Libvirt sets the flag on the socket, passes the socket by number to qemu, and qemu needs to be able to query whether the flag was set. So the patch concerns itself with making sure that all the relevant bits of this are supported.
Correct me if I'm wrong here ...
You're spot on, thanks for elaborating. Cheers, Mark.
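The two halves of the flow described above, setting the flag when the tap device is created and querying it later on an inherited descriptor, might look roughly like this in Python. This is a sketch under assumptions, not the libvirt or qemu code: the function names are hypothetical, the ioctl numbers are the x86/ARM encodings of TUNSETIFF and TUNGETIFF, and the 40-byte `struct ifreq` size assumes a 64-bit Linux host. Opening the device needs /dev/net/tun and sufficient privileges.

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h>. The ioctl numbers below are the
# x86/ARM encodings; some other architectures encode them differently.
TUNSETIFF = 0x400454CA
TUNGETIFF = 0x800454D2
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFF_VNET_HDR = 0x4000

# struct ifreq: 16-byte interface name, 16-bit flags, then padding
# up to 40 bytes (assumed size on 64-bit Linux).
IFREQ = struct.Struct("16sH22x")


def pack_ifreq(name, flags):
    """Build the ifreq buffer passed to TUNSETIFF."""
    raw = name.encode()
    if len(raw) > 15:
        raise ValueError("interface name too long")
    return IFREQ.pack(raw, flags)


def ifreq_flags(buf):
    """Extract the flags field from an ifreq buffer."""
    return IFREQ.unpack_from(buf)[1]


def open_tap_with_vnet_hdr(name):
    """Management side: create a tap device with IFF_VNET_HDR set."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF,
                    pack_ifreq(name, IFF_TAP | IFF_NO_PI | IFF_VNET_HDR))
    except OSError:
        os.close(fd)
        raise
    return fd


def fd_has_vnet_hdr(fd):
    """Emulator side: ask the kernel whether an inherited fd has the flag."""
    buf = fcntl.ioctl(fd, TUNGETIFF, bytes(IFREQ.size))
    return bool(ifreq_flags(buf) & IFF_VNET_HDR)
```

The query step matters because the process that receives the descriptor did not create it, so it cannot otherwise know whether each frame will carry the extra header.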