Re: [libvirt] RFC: Improve performance of macvtap device creation

Friday, 30 October 2015

On 10/29/2015 01:48 PM, Laine Stump wrote:
...
 On 10/29/2015 12:49 PM, Tony Krowiak wrote:
> For a guest domain defined with a large number of macvtap devices, it takes an
exceedingly long time to boot the guest. In a test of a guest domain configured with 82
macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl
calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403
times. I was able to isolate the source of the ioctl calls to
the*virNetDevMacVLanCreateWithVPortProfile*  function in*virnetdevmacvlan.c*. The macvtap
interface name is created by looping over a counter variable, starting with zero, and
appending the counter value to 'macvtap'.

 I've wondered ever since the first time I saw that code why it was 
 done that way, and why there had never been any performance 
 complaints. Lacking any complaints, I promptly forgot about it (until 
 the next time I went past the code for some other tangentially related 
 reason.)

 Since you're the first to complain, you have the honor of fixing it :-) Thank
you for that honor.
...

> With each iteration, a call is made to*virNetDevExists*  (SIOCGIFFLAGS ioctl) to
determine if a device with that name already exists, until a unique name is created. In
the test case cited above, to create an interface name for the 82nd macvtap device,
the*virNetDevExists*  function will be called for interface names 'macvtap0' to
'macvtap80' before it is determined that 'mavtap81' can be used. So if N
is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be
invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming
only one guest is being started, who knows how many times the ioctl may have to be called
in an installation running a large number of guests defined with macvtap devices.
>
> I was able to reduce the amount of time for starting a guest domain defined with 82
macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface
name suffixes previously used. I defined two static bit maps (virBitmap), one each for
macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the
index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an
interface with that name does not exist, the device is created and the bit at the index
used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device
is deleted, if the interface name has the pattern 'macvtap%d' or
'macvlan%d', the suffix is parsed into a bit index and used to clear the
(virBitMapClearBit) bit in the respective bitmap.

 This sounds fine, as long as 1) you recreate the bitmap whenever 
 libvirtd is restarted (while scanning through all the interfaces of 
 every domain; there is already code being executed in exactly the 
 right place - look for qemu_process.c:qemuProcessNotifyNets() and add 
 appropriate code inside the loop there), and 2) you retry some number 
 of times if a supposedly unused device name is actually in use (to 
 account for processes other than libvirt using the same naming 
 convention).

> I am not sure that is the best design because there is no way to track interface
names used to create macvtap devices outside of libvirt, for example using the ip
command.

 If you wanted to get *really* complicated, you could use netlink to 
 get a list of all network devices, or even monitor netlink traffic to 
 maintain your own cache, but I think that's serious overkill (until 
 proven otherwise). I agree, I think this would be overkill. I think it would
require that 
we track the complete interface names as opposed to maintaining a bitmap 
of interface name suffixes.
...
>   There may also be other issues I've not contemplated. I
included a couple of additional ideas below and am looking for comments or other
suggestions that I have not considered.
>
>   * Define a global counter variable initialized to 0, that gets
>     incremented each time an interface name is created, to keep track
>     of the last used interface name suffix. At some maximum value,
>     the counter will be set back to 0.
>

 There could be some merit to this, as it is simpler and likely faster. 
 You would need to maintain the counter somewhere in persistent storage 
 so it could be retrieved when libvirtd is restarted though. I have a problem with
this one, because certain scenarios could 
introduce performance issues, for example:

  * Guest1, defined with 1 macvtap device is started and the 'macvtap0'
    device is created
  * A plethora of guests are subsequently defined, such that there are
    no gaps between interface names 'macvtap0' and 'macvtap5100'
  * Guest1 is deleted, thus removing the 'macvtap0' device
  * Additional guests are defined until the counter recycles back to 0
  * GuestN is defined with more than one macvtap device. When guestN is
    started, the 'macvtap0' device will get created for it right off the
    bat, but then 5000 ioctl calls will be made until 'macvtap5200' is
    found to be available.

I don't know what the likelihood of such a scenario is, but we should 
probably code for such contingencies. What say you?
...

>   * Append a random number to 'macvlan' or 'macvtap' when creating
>     the interface name. Of course, the number of digits would have to
>     be limited so the interface name would not exceed the maximum
>     allowed.
>

 Well, that has the advantage that no persistent state information is 
 required. This one would be pretty easy to implement and as you said, would not 
require maintaining persistent state information. The only question I 
have with regard to this one is would users complain that the expected 
behavior has dramatically changed. Curently, the macvtap interface names 
are somewhat consecutive and look like 'macvtap0', 'macvtap1' ... 
macvtapN, with the gaps being filled in as new macvtap devices are 
created. With this idea, the device names would look like 
'macvtap83927611', 'macvtap91304510', 'macvtap18294667' .... Do
you 
think this would be a problem?
...

>   * Create the interface name in code that has more knowledge of the
>     environment and pass the name into the
>     *virNetDevMacVLanCreateWithVPortProfile* function via the
>     *tgifname* parameter. For example, the *qemuBuildCommandLine*
>     function in *qemu_command.c* contains the loop that iterates over
>     the network devices defined for the guest domain that ultimately
>     get created via the *virNetDevMacVLanCreateWithVPortProfile*
>     function. That function has access to the network device
>     configuration and at the very least could ensure none of the
>     names previously defined for the guest aren't used. I believe it
>     would be matter of creating a macvtap interface name - e.g.,
>     maybe a call to some function in *virnetdevmacvlan.c* - and
>     setting the name in the virDomainNetDef structure prior to
>     invoking *qemuBuildInterfaceCommandLine*?
>

 I don't quite follow what you're saying, but it sounds like you are 
 suggesting that we try to know enough about the environment that we 
 can predetermine an interface name. That won't work though - you can't 
 know for certain that some other program hasn't taken the name you 
 want until you try to create is. The name creation function in *virnetdevmacvlan.c*
would still check to 
see if a device with the name exists. I don't really like this idea 
anyway for a lot of other reasons.
...

> There are shortcomings in all of these ideas, so if you have a better 
> one, feel free to present it.

 Any of the first three is better than what we currently do. Note that 
 in the case of standard tap devices, the kernel itself handles the 
 creation of a unique name - if you call ioctl(TUNSETIFF) with a string 
 with "%d" in it and it finds the lowest numbered unused name and 
 returns that. For some reason, the macvtap authors didn't want to do that.
Correct me if I am wrong, but doing something like this would require 
changes in the kernel?

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] RFC: Improve performance of macvtap device creation