On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it
takes an exceedingly long time to boot the guest. In a test of a guest domain configured
with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the
ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked
3,403 times. I was able to isolate the source of the ioctl calls to
the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap
interface name is created by looping over a counter variable, starting with zero, and
appending the counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done
that way, and why there had never been any performance complaints.
Lacking any complaints, I promptly forgot about it (until the next time
I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-)
With each iteration, a call is made to*virNetDevExists*
(SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a
unique name is created. In the test case cited above, to create an interface name for the
82nd macvtap device, the*virNetDevExists* function will be called for interface names
'macvtap0' to 'macvtap80' before it is determined that 'mavtap81'
can be used. So if N is the number of macvtap interfaces defined for a guest, the
SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device
names. That's assuming only one guest is being started, who knows how many times the
ioctl may have to be called in an installation running a large number of guests defined
with macvtap devices.
I was able to reduce the amount of time for starting a guest domain defined with 82
macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface
name suffixes previously used. I defined two static bit maps (virBitmap), one each for
macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the
index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an
interface with that name does not exist, the device is created and the bit at the index
used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device
is deleted, if the interface name has the pattern 'macvtap%d' or
'macvlan%d', the suffix is parsed into a bit index and used to clear the
(virBitMapClearBit) bit in the respective bitmap.
This sounds fine, as long as 1) you recreate the bitmap whenever
libvirtd is restarted (while scanning through all the interfaces of
every domain; there is already code being executed in exactly the right
place - look for qemu_process.c:qemuProcessNotifyNets() and add
appropriate code inside the loop there), and 2) you retry some number of
times if a supposedly unused device name is actually in use (to account
for processes other than libvirt using the same naming convention).
I am not sure that is the best design because there is no way to track interface names
used to create macvtap devices outside of libvirt, for example using the ip command.
If you wanted to get *really* complicated, you could use netlink to get
a list of all network devices, or even monitor netlink traffic to
maintain your own cache, but I think that's serious overkill (until
proven otherwise).
There may also be other issues I've not contemplated. I
included a couple of additional ideas below and am looking for comments or other
suggestions that I have not considered.
* Define a global counter variable initialized to 0, that gets
incremented each time an interface name is created, to keep track
of the last used interface name suffix. At some maximum value, the
counter will be set back to 0.
There could be some merit to this, as it is simpler and likely faster.
You would need to maintain the counter somewhere in persistent storage
so it could be retrieved when libvirtd is restarted though.
* Append a random number to 'macvlan' or 'macvtap'
when creating the
interface name. Of course, the number of digits would have to be
limited so the interface name would not exceed the maximum allowed.
Well, that has the advantage that no persistent state information is
required.
* Create the interface name in code that has more knowledge of the
environment and pass the name into the
*virNetDevMacVLanCreateWithVPortProfile* function via the
*tgifname* parameter. For example, the *qemuBuildCommandLine*
function in *qemu_command.c* contains the loop that iterates over
the network devices defined for the guest domain that ultimately
get created via the *virNetDevMacVLanCreateWithVPortProfile*
function. That function has access to the network device
configuration and at the very least could ensure none of the names
previously defined for the guest aren't used. I believe it would
be matter of creating a macvtap interface name - e.g., maybe a
call to some function in *virnetdevmacvlan.c* - and setting the
name in the virDomainNetDef structure prior to invoking
*qemuBuildInterfaceCommandLine*?
I don't quite follow what you're saying, but it sounds like you are
suggesting that we try to know enough about the environment that we can
predetermine an interface name. That won't work though - you can't know
for certain that some other program hasn't taken the name you want until
you try to create is.
There are shortcomings in all of these ideas, so if you have a better
one, feel free to present it.
Any of the first three is better than what we currently do. Note that in
the case of standard tap devices, the kernel itself handles the creation
of a unique name - if you call ioctl(TUNSETIFF) with a string with "%d"
in it and it finds the lowest numbered unused name and returns that. For
some reason, the macvtap authors didn't want to do that.