On Thu, Dec 17, 2009 at 03:39:05PM -0600, Anthony Liguori wrote:
>Chris Wright wrote:
>>
>>Doesn't sound useful. Low-level, sure worth being able to turn things
>>on and off for testing/debugging, but probably not something a user
>>should be burdened with in libvirt.
>>
>>But I don't understand your -net vhost,fd=X, that would still be -net
>>tap,fd=X, no? IOW, vhost is an internal qemu impl. detail of the virtio
>>backend (or if you get your wish, $nic_backend).
>>
>I don't want to get bogged down in a qemu-devel discussion on
>libvirt-devel :-)
>But from a libvirt perspective, I assume that it wants to open up
>/dev/vhost in order to not have to grant the qemu instance privileges
>which means that it needs to hand qemu the file descriptor to it.
>Given a file descriptor, I don't think qemu can easily tell whether it's
>a tun/tap fd or whether it's a vhost fd. Since they have different
>interfaces, we need libvirt to tell us which one it is. Whether that's
>-net tap,vhost or -net vhost, we can figure that part out on qemu-devel :-)
That is no problem: since we already do that kind of thing for TAP
devices, it is perfectly feasible for us to also do it for vhost FDs.
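To make that concrete, a rough sketch of the kind of fd handoff involved -
the /dev/vhost-net path and the -net option spelling below are assumptions
pending the qemu-devel discussion, not settled syntax:

  /* Illustrative only: open the fds in the privileged parent so qemu
   * never needs the privileges itself, and let qemu pick them up by
   * number after exec. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <net/if.h>
  #include <linux/if_tun.h>

  static int open_tap(const char *ifname)
  {
      struct ifreq ifr;
      int fd = open("/dev/net/tun", O_RDWR);
      if (fd < 0)
          return -1;
      memset(&ifr, 0, sizeof(ifr));
      ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
      strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
      if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
          close(fd);
          return -1;
      }
      return fd;
  }

  int main(void)
  {
      int tapfd   = open_tap("vnet0");               /* hypothetical tap name */
      int vhostfd = open("/dev/vhost-net", O_RDWR);  /* assumed device node   */
      char arg[64];

      if (tapfd < 0 || vhostfd < 0)
          return 1;
      /* hypothetical option spelling - to be settled on qemu-devel */
      snprintf(arg, sizeof(arg), "tap,fd=%d,vhost=on,vhostfd=%d", tapfd, vhostfd);
      execlp("qemu-kvm", "qemu-kvm", "-net", arg, (char *)NULL);
      return 1;
  }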
>>>The more interesting invocation of vhost-net though is one where the
>>>vhost-net device backs directly to a physical network card. In this
>>>mode, vhost should get considerably better performance than the
>>>current implementation. I don't know the syntax yet, but I think
>>>it's reasonable to assume that it will look something like -net
>>>tap,dev=eth0. The effect will be that eth0 is dedicated to the
>>>guest.
>>>
>>
>>tap? we'd want either macvtap or raw socket here.
>>
>I screwed up. I meant to say, -net vhost,dev=eth0. But maybe it
>doesn't matter if libvirt is the one that initializes the vhost device,
>sets up the raw socket (or macvtap), and hands us a file descriptor.
>In general, I think it's best to avoid as much network configuration in
>qemu as humanly possible, so I'd rather see libvirt configure the vhost
>device ahead of time and pass us an fd that we can start using.
Agreed, if we can avoid needing to give QEMU CAP_NET_ADMIN then
that is preferred - indeed when libvirt runs QEMU as root, we already
strip it of CAP_NET_ADMIN (and all other capabilities).
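For the raw socket case, the reason the handoff works is that creating a
packet socket bound to a physical NIC needs privileges, but using it
afterwards does not. A minimal sketch of the privileged side (the NIC name
is invented for illustration):

  /* Illustrative only: a privileged helper creates a raw AF_PACKET socket
   * bound to a physical NIC and hands the fd to an unprivileged qemu,
   * so qemu itself never needs CAP_NET_ADMIN / CAP_NET_RAW. */
  #include <string.h>
  #include <unistd.h>
  #include <arpa/inet.h>
  #include <sys/socket.h>
  #include <net/if.h>
  #include <netpacket/packet.h>
  #include <linux/if_ether.h>

  static int open_raw_nic(const char *ifname)
  {
      struct sockaddr_ll sll;
      int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
      if (fd < 0)
          return -1;

      memset(&sll, 0, sizeof(sll));
      sll.sll_family   = AF_PACKET;
      sll.sll_protocol = htons(ETH_P_ALL);
      sll.sll_ifindex  = if_nametoindex(ifname);
      if (sll.sll_ifindex == 0 ||
          bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
          close(fd);
          return -1;
      }
      return fd;   /* pass this down, e.g. by fd inheritance across exec */
  }

  int main(void)
  {
      int fd = open_raw_nic("eth0");   /* hypothetical NIC name */
      return fd < 0 ? 1 : 0;
  }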
>>>Another model would be to have libvirt see an SR-IOV adapter as a
>>>network pool whereas it handled all of the VF management.
>>>Considering how inflexible SR-IOV is today, I'm not sure whether
>>>this is the best model.
>>>
>>
>>We already need to know the VF<->PF relationship. For example, don't
>>want to assign a VF to a guest, then a PF to another guest for basic
>>sanity reasons. As we get better ability to manage the embedded switch
>>in an SR-IOV NIC we will need to manage them as well. So we do need
>>to have some concept of managing an SR-IOV adapter.
>>
>But we still need to support the notion of backing a VNIC to a NIC, no?
>If this just happens to also work with a naive usage of SR-IOV, is that
>so bad? :-)
>Long term, yes, I think you want to manage SR-IOV adapters as if they're
>a network pool. But since they're sufficiently inflexible right now,
>I'm not sure it's all that useful today.
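On the VF<->PF relationship point quoted above: on kernels with SR-IOV
support that mapping is already visible in sysfs (each PF's PCI device
directory carries virtfnN symlinks to its VFs, and each VF a physfn link
back), so a management layer can walk it with something like the sketch
below; the PCI address is made up:

  /* Illustrative only: enumerate the VFs of one PF via the virtfnN
   * symlinks under its sysfs PCI device directory. */
  #include <stdio.h>
  #include <limits.h>
  #include <unistd.h>

  int main(void)
  {
      const char *pf = "/sys/bus/pci/devices/0000:01:00.0"; /* hypothetical PF */
      char link[PATH_MAX], target[PATH_MAX];
      int i;

      for (i = 0; ; i++) {
          ssize_t len;
          snprintf(link, sizeof(link), "%s/virtfn%d", pf, i);
          len = readlink(link, target, sizeof(target) - 1);
          if (len < 0)
              break;                    /* no more VFs on this PF */
          target[len] = '\0';
          printf("VF %d of %s -> %s\n", i, pf, target);
      }
      return 0;
  }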
FYI, we have generic capabilities for creating & deleting host devices
via the virNodeDevCreate / virNodeDevDestroy APIs. We use this for
creating & deleting NPIV scsi adapters. If we need to support this for
some types of NICs too, that fits into the model fine.
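For reference, the full entry points are virNodeDeviceCreateXML /
virNodeDeviceDestroy; a rough sketch of the NPIV usage is below. The
parent HBA name and the WWNs are invented for illustration, and the exact
XML shape may differ:

  /* Illustrative only: create a vHBA through the node device APIs and
   * then tear it down again; error handling kept minimal. */
  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      virConnectPtr conn = virConnectOpen("qemu:///system");
      const char *xml =
          "<device>"
          "  <parent>scsi_host5</parent>"
          "  <capability type='scsi_host'>"
          "    <capability type='fc_host'>"
          "      <wwnn>20000000c9831b4b</wwnn>"
          "      <wwpn>10000000c9831b4b</wwpn>"
          "    </capability>"
          "  </capability>"
          "</device>";
      virNodeDevicePtr dev;

      if (!conn)
          return 1;
      dev = virNodeDeviceCreateXML(conn, xml, 0);
      if (dev) {
          printf("created %s\n", virNodeDeviceGetName(dev));
          virNodeDeviceDestroy(dev);    /* delete the vHBA again */
          virNodeDeviceFree(dev);
      }
      virConnectClose(conn);
      return 0;
  }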
>>So I think we want to maintain a concept of the qemu backend (virtio,
>>e1000, etc), the fd that connects the qemu backend to the host (tap,
>>socket, macvtap, etc), and the bridge. The bridge bit gets a little
>>complicated. We have the following bridge cases:
>>
>>- sw bridge
>>  - normal existing setup, w/ Linux bridging code
>>  - macvlan
>>- hw bridge
>>  - on SR-IOV card
>>    - configured to simply fwd to external hw bridge (like VEPA mode)
>>    - configured as a bridge w/ policies (QoS, ACL, port mirroring,
>>      etc. and allows inter-guest traffic and looks a bit like above
>>      sw switch)
>>  - external
>>    - need to possibly inform switch of incoming vport
>I've got mixed feelings here. With respect to sw vs. hw bridge, I
>really think that that's an implementation detail that should not be
>exposed to a user. A user doesn't typically want to think about whether
>they're using a hardware switch vs. software switch. Instead, they
>approach it from: I want to have this network topology, and these
>features enabled.
Agree there is a lot of low-level detail there, and I think it will be
very hard for users, or apps, to gain enough knowledge to make intelligent
decisions about which they should use. So I don't think we want to expose
all that detail. For a libvirt representation we need to consider it more
in terms of what capabilities each option provides, rather than what
implementation each option uses.
Regards,
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org        -o-        http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|