[libvirt] RFC: Improve performance of macvtap device creation

For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'. With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices. I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap. I am not sure that is the best design because there is no way to track interface names used to create macvtap devices outside of libvirt, for example using the ip command. There may also be other issues I've not contemplated. I included a couple of additional ideas below and am looking for comments or other suggestions that I have not considered. * Define a global counter variable initialized to 0, that gets incremented each time an interface name is created, to keep track of the last used interface name suffix. At some maximum value, the counter will be set back to 0. * Append a random number to 'macvlan' or 'macvtap' when creating the interface name. Of course, the number of digits would have to be limited so the interface name would not exceed the maximum allowed. * Create the interface name in code that has more knowledge of the environment and pass the name into the *virNetDevMacVLanCreateWithVPortProfile* function via the *tgifname* parameter. For example, the *qemuBuildCommandLine* function in *qemu_command.c* contains the loop that iterates over the network devices defined for the guest domain that ultimately get created via the *virNetDevMacVLanCreateWithVPortProfile* function. That function has access to the network device configuration and at the very least could ensure none of the names previously defined for the guest aren't used. I believe it would be matter of creating a macvtap interface name - e.g., maybe a call to some function in *virnetdevmacvlan.c* - and setting the name in the virDomainNetDef structure prior to invoking *qemuBuildInterfaceCommandLine*? There are shortcomings in all of these ideas, so if you have a better one, feel free to present it.

On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.) Since you're the first to complain, you have the honor of fixing it :-)
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices.
I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap.
This sounds fine, as long as 1) you recreate the bitmap whenever libvirtd is restarted (while scanning through all the interfaces of every domain; there is already code being executed in exactly the right place - look for qemu_process.c:qemuProcessNotifyNets() and add appropriate code inside the loop there), and 2) you retry some number of times if a supposedly unused device name is actually in use (to account for processes other than libvirt using the same naming convention).
I am not sure that is the best design because there is no way to track interface names used to create macvtap devices outside of libvirt, for example using the ip command.
If you wanted to get *really* complicated, you could use netlink to get a list of all network devices, or even monitor netlink traffic to maintain your own cache, but I think that's serious overkill (until proven otherwise).
There may also be other issues I've not contemplated. I included a couple of additional ideas below and am looking for comments or other suggestions that I have not considered.
* Define a global counter variable initialized to 0, that gets incremented each time an interface name is created, to keep track of the last used interface name suffix. At some maximum value, the counter will be set back to 0.
There could be some merit to this, as it is simpler and likely faster. You would need to maintain the counter somewhere in persistent storage so it could be retrieved when libvirtd is restarted though.
* Append a random number to 'macvlan' or 'macvtap' when creating the interface name. Of course, the number of digits would have to be limited so the interface name would not exceed the maximum allowed.
Well, that has the advantage that no persistent state information is required.
* Create the interface name in code that has more knowledge of the environment and pass the name into the *virNetDevMacVLanCreateWithVPortProfile* function via the *tgifname* parameter. For example, the *qemuBuildCommandLine* function in *qemu_command.c* contains the loop that iterates over the network devices defined for the guest domain that ultimately get created via the *virNetDevMacVLanCreateWithVPortProfile* function. That function has access to the network device configuration and at the very least could ensure none of the names previously defined for the guest aren't used. I believe it would be matter of creating a macvtap interface name - e.g., maybe a call to some function in *virnetdevmacvlan.c* - and setting the name in the virDomainNetDef structure prior to invoking *qemuBuildInterfaceCommandLine*?
I don't quite follow what you're saying, but it sounds like you are suggesting that we try to know enough about the environment that we can predetermine an interface name. That won't work though - you can't know for certain that some other program hasn't taken the name you want until you try to create is.
There are shortcomings in all of these ideas, so if you have a better one, feel free to present it.
Any of the first three is better than what we currently do. Note that in the case of standard tap devices, the kernel itself handles the creation of a unique name - if you call ioctl(TUNSETIFF) with a string with "%d" in it and it finds the lowest numbered unused name and returns that. For some reason, the macvtap authors didn't want to do that.

On 29.10.2015 18:48, Laine Stump wrote:
On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-)
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices.
Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 ( contained v1.2.2~32) if two threads were starting a domain concurrently, they even competed with each other in that specific area of the code.
I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap.
This sounds fine, as long as 1) you recreate the bitmap whenever libvirtd is restarted (while scanning through all the interfaces of every domain; there is already code being executed in exactly the right place - look for qemu_process.c:qemuProcessNotifyNets() and add appropriate code inside the loop there), and 2) you retry some number of times if a supposedly unused device name is actually in use (to account for processes other than libvirt using the same naming convention).
How about re-using the approach we have for virPortAllocator? We maintain a bitmap of ports. On acquiring new port, we try to bind() to it. If we succeeded, we set the corresponding bit in the bitmap. Of course it may happen that a port in the host is already taken but our bitmap does not think so. That's okay. We just leave the corresponding bit alone => if we would set it as used, nobody will ever unset it. Moreover, we will try the port next time, and it may be free. Moreover, the bitmap is not saved anywhere, nor restored on daemon restart - this could be changed though. So what am I saying is practically the same as Laine, just extending his thoughts and giving you an example how to proceed further :) Michal

On Fri, Oct 30, 2015 at 11:49:12AM +0100, Michal Privoznik wrote:
On 29.10.2015 18:48, Laine Stump wrote:
On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-)
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices.
Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 ( contained v1.2.2~32) if two threads were starting a domain concurrently, they even competed with each other in that specific area of the code.
I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap.
This sounds fine, as long as 1) you recreate the bitmap whenever libvirtd is restarted (while scanning through all the interfaces of every domain; there is already code being executed in exactly the right place - look for qemu_process.c:qemuProcessNotifyNets() and add appropriate code inside the loop there), and 2) you retry some number of times if a supposedly unused device name is actually in use (to account for processes other than libvirt using the same naming convention).
How about re-using the approach we have for virPortAllocator? We maintain a bitmap of ports. On acquiring new port, we try to bind() to it. If we succeeded, we set the corresponding bit in the bitmap. Of course it may happen that a port in the host is already taken but our bitmap does not think so. That's okay. We just leave the corresponding bit alone => if we would set it as used, nobody will ever unset it. Moreover, we will try the port next time, and it may be free.
Moreover, the bitmap is not saved anywhere, nor restored on daemon restart - this could be changed though.
So what am I saying is practically the same as Laine, just extending his thoughts and giving you an example how to proceed further :)
Yeah, I think maintaining a bitmap of used device indexes or names is a fine idea for this. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'. I've wondered ever since the first time I saw that code why it was done
On 10/29/2015 12:49 PM, Tony Krowiak wrote: that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-)
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices. Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 ( contained v1.2.2~32) if two threads were starting a domain concurrently,
On 29.10.2015 18:48, Laine Stump wrote: they even competed with each other in that specific area of the code.
I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap. This sounds fine, as long as 1) you recreate the bitmap whenever libvirtd is restarted (while scanning through all the interfaces of every domain; there is already code being executed in exactly the right place - look for qemu_process.c:qemuProcessNotifyNets() and add appropriate code inside the loop there), and 2) you retry some number of times if a supposedly unused device name is actually in use (to account for processes other than libvirt using the same naming convention). How about re-using the approach we have for virPortAllocator? We maintain a bitmap of ports. On acquiring new port, we try to bind() to it. If we succeeded, we set the corresponding bit in the bitmap. Of course it may happen that a port in the host is already taken but our bitmap does not think so. That's okay. We just leave the corresponding bit alone => if we would set it as used, nobody will ever unset it. Moreover, we will try the port next time, and it may be free.
Moreover, the bitmap is not saved anywhere, nor restored on daemon restart - this could be changed though.
So what am I saying is practically the same as Laine, just extending his thoughts and giving you an example how to proceed further :) I appreciate the input. This is similar to the first solution I
On 10/30/2015 06:49 AM, Michal Privoznik wrote: proposed, which I actually implemented and tested. It is described above.
Michal

On 10/30/2015 11:56 AM, Tony Krowiak wrote:
On 10/30/2015 06:49 AM, Michal Privoznik wrote:
For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'. I've wondered ever since the first time I saw that code why it was done
On 10/29/2015 12:49 PM, Tony Krowiak wrote: that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-)
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices. Not only that, but unitl c0d162c68c2f19af8d55a435a9e372da33857048 ( contained v1.2.2~32) if two threads were starting a domain concurrently,
On 29.10.2015 18:48, Laine Stump wrote: they even competed with each other in that specific area of the code.
That's only for the veth devices used by lxc, which is a separate thing from the macvtap devices (rough description: a veth is a pair of netdevs where the gazinta of one is the gazouta of the other and vice versa, while a macvtap is a chardev on one side, and is a netdev permanently attached "to the side" of another netdev at a point where the two can't see each others' traffic, but can both see all traffic going in and out the original device). veth creation does suffer from the same problem though - each time you want to create a veth pair, libvirt will check for an unused device name by calling virNetDevExists() starting at vnet0, so it could benefit from the same code if it was made agnostic of basename (note that all standard tap devices would need to be marked in such a bitmap, since they also use the name "vnet%d"). (An aside: I'm a bit surprised to see that we are creating veths using "ip link add", since the same thing can be done with a netlink message and would remove the need to exec an external binary. Not that I haven't done the same thing myself when it was expedient :-)

On 10/29/2015 01:48 PM, Laine Stump wrote:
On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-) Thank you for that honor.
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices.
I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap.
This sounds fine, as long as 1) you recreate the bitmap whenever libvirtd is restarted (while scanning through all the interfaces of every domain; there is already code being executed in exactly the right place - look for qemu_process.c:qemuProcessNotifyNets() and add appropriate code inside the loop there), and 2) you retry some number of times if a supposedly unused device name is actually in use (to account for processes other than libvirt using the same naming convention).
I am not sure that is the best design because there is no way to track interface names used to create macvtap devices outside of libvirt, for example using the ip command.
If you wanted to get *really* complicated, you could use netlink to get a list of all network devices, or even monitor netlink traffic to maintain your own cache, but I think that's serious overkill (until proven otherwise). I agree, I think this would be overkill. I think it would require that we track the complete interface names as opposed to maintaining a bitmap of interface name suffixes.
There may also be other issues I've not contemplated. I included a couple of additional ideas below and am looking for comments or other suggestions that I have not considered.
* Define a global counter variable initialized to 0, that gets incremented each time an interface name is created, to keep track of the last used interface name suffix. At some maximum value, the counter will be set back to 0.
There could be some merit to this, as it is simpler and likely faster. You would need to maintain the counter somewhere in persistent storage so it could be retrieved when libvirtd is restarted though. I have a problem with this one, because certain scenarios could introduce performance issues, for example:
* Guest1, defined with 1 macvtap device is started and the 'macvtap0' device is created * A plethora of guests are subsequently defined, such that there are no gaps between interface names 'macvtap0' and 'macvtap5100' * Guest1 is deleted, thus removing the 'macvtap0' device * Additional guests are defined until the counter recycles back to 0 * GuestN is defined with more than one macvtap device. When guestN is started, the 'macvtap0' device will get created for it right off the bat, but then 5000 ioctl calls will be made until 'macvtap5200' is found to be available. I don't know what the likelihood of such a scenario is, but we should probably code for such contingencies. What say you?
* Append a random number to 'macvlan' or 'macvtap' when creating the interface name. Of course, the number of digits would have to be limited so the interface name would not exceed the maximum allowed.
Well, that has the advantage that no persistent state information is required.
This one would be pretty easy to implement and as you said, would not require maintaining persistent state information. The only question I have with regard to this one is would users complain that the expected behavior has dramatically changed. Curently, the macvtap interface names are somewhat consecutive and look like 'macvtap0', 'macvtap1' ... macvtapN, with the gaps being filled in as new macvtap devices are created. With this idea, the device names would look like 'macvtap83927611', 'macvtap91304510', 'macvtap18294667' .... Do you think this would be a problem?
* Create the interface name in code that has more knowledge of the environment and pass the name into the *virNetDevMacVLanCreateWithVPortProfile* function via the *tgifname* parameter. For example, the *qemuBuildCommandLine* function in *qemu_command.c* contains the loop that iterates over the network devices defined for the guest domain that ultimately get created via the *virNetDevMacVLanCreateWithVPortProfile* function. That function has access to the network device configuration and at the very least could ensure none of the names previously defined for the guest aren't used. I believe it would be matter of creating a macvtap interface name - e.g., maybe a call to some function in *virnetdevmacvlan.c* - and setting the name in the virDomainNetDef structure prior to invoking *qemuBuildInterfaceCommandLine*?
I don't quite follow what you're saying, but it sounds like you are suggesting that we try to know enough about the environment that we can predetermine an interface name. That won't work though - you can't know for certain that some other program hasn't taken the name you want until you try to create is.
The name creation function in *virnetdevmacvlan.c* would still check to see if a device with the name exists. I don't really like this idea anyway for a lot of other reasons.
There are shortcomings in all of these ideas, so if you have a better one, feel free to present it.
Any of the first three is better than what we currently do. Note that in the case of standard tap devices, the kernel itself handles the creation of a unique name - if you call ioctl(TUNSETIFF) with a string with "%d" in it and it finds the lowest numbered unused name and returns that. For some reason, the macvtap authors didn't want to do that.
Correct me if I am wrong, but doing something like this would require changes in the kernel?

On 10/30/2015 01:44 PM, Tony Krowiak wrote:
On 10/29/2015 01:48 PM, Laine Stump wrote:
On 10/29/2015 12:49 PM, Tony Krowiak wrote:
For a guest domain defined with a large number of macvtap devices, it takes an exceedingly long time to boot the guest. In a test of a guest domain configured with 82 macvtap devices, it took over two minutes for the guest to boot. An strace of the ioctl calls during guest start up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times. I was able to isolate the source of the ioctl calls to the*virNetDevMacVLanCreateWithVPortProfile* function in*virnetdevmacvlan.c*. The macvtap interface name is created by looping over a counter variable, starting with zero, and appending the counter value to 'macvtap'.
I've wondered ever since the first time I saw that code why it was done that way, and why there had never been any performance complaints. Lacking any complaints, I promptly forgot about it (until the next time I went past the code for some other tangentially related reason.)
Since you're the first to complain, you have the honor of fixing it :-) Thank you for that honor.
With each iteration, a call is made to*virNetDevExists* (SIOCGIFFLAGS ioctl) to determine if a device with that name already exists, until a unique name is created. In the test case cited above, to create an interface name for the 82nd macvtap device, the*virNetDevExists* function will be called for interface names 'macvtap0' to 'macvtap80' before it is determined that 'mavtap81' can be used. So if N is the number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS ioctl will be invoked (N x N + N)/2 times to find an unused macvtap device names. That's assuming only one guest is being started, who knows how many times the ioctl may have to be called in an installation running a large number of guests defined with macvtap devices.
I was able to reduce the amount of time for starting a guest domain defined with 82 macvtap devices from over 2 minutes to about 14 seconds by keeping track of the interface name suffixes previously used. I defined two static bit maps (virBitmap), one each for macvtap and macvlan device name suffixes. When a macvtap/macvlan device is created, the index of the next clear bit (virBitmapNextClearBit) is retrieved to create the name. If an interface with that name does not exist, the device is created and the bit at the index used to create the interface name is set (virBitmapSetBit). When a macvtap/macvlan device is deleted, if the interface name has the pattern 'macvtap%d' or 'macvlan%d', the suffix is parsed into a bit index and used to clear the (virBitMapClearBit) bit in the respective bitmap.
This sounds fine, as long as 1) you recreate the bitmap whenever libvirtd is restarted (while scanning through all the interfaces of every domain; there is already code being executed in exactly the right place - look for qemu_process.c:qemuProcessNotifyNets() and add appropriate code inside the loop there), and 2) you retry some number of times if a supposedly unused device name is actually in use (to account for processes other than libvirt using the same naming convention).
I am not sure that is the best design because there is no way to track interface names used to create macvtap devices outside of libvirt, for example using the ip command.
If you wanted to get *really* complicated, you could use netlink to get a list of all network devices, or even monitor netlink traffic to maintain your own cache, but I think that's serious overkill (until proven otherwise). I agree, I think this would be overkill. I think it would require that we track the complete interface names as opposed to maintaining a bitmap of interface name suffixes.
There may also be other issues I've not contemplated. I included a couple of additional ideas below and am looking for comments or other suggestions that I have not considered.
* Define a global counter variable initialized to 0, that gets incremented each time an interface name is created, to keep track of the last used interface name suffix. At some maximum value, the counter will be set back to 0.
There could be some merit to this, as it is simpler and likely faster. You would need to maintain the counter somewhere in persistent storage so it could be retrieved when libvirtd is restarted though. I have a problem with this one, because certain scenarios could introduce performance issues, for example:
* Guest1, defined with 1 macvtap device is started and the 'macvtap0' device is created * A plethora of guests are subsequently defined, such that there are no gaps between interface names 'macvtap0' and 'macvtap5100' * Guest1 is deleted, thus removing the 'macvtap0' device * Additional guests are defined until the counter recycles back to 0 * GuestN is defined with more than one macvtap device. When guestN is started, the 'macvtap0' device will get created for it right off the bat, but then 5000 ioctl calls will be made until 'macvtap5200' is found to be available.
I don't know what the likelihood of such a scenario is, but we should probably code for such contingencies. What say you?
Yes, you're correct - it would work very nicely unless/until it wrapped back around to 0, and after that we would be back in more or less the same situation we have now.
* Append a random number to 'macvlan' or 'macvtap' when creating the interface name. Of course, the number of digits would have to be limited so the interface name would not exceed the maximum allowed.
Well, that has the advantage that no persistent state information is required.
This one would be pretty easy to implement and as you said, would not require maintaining persistent state information. The only question I have with regard to this one is would users complain that the expected behavior has dramatically changed. Curently, the macvtap interface names are somewhat consecutive and look like 'macvtap0', 'macvtap1' ... macvtapN, with the gaps being filled in as new macvtap devices are created. With this idea, the device names would look like 'macvtap83927611', 'macvtap91304510', 'macvtap18294667' .... Do you think this would be a problem?
Depends on your definition of "problem". It would likely work just fine, and it's unlikely you'd get enough simultaneous macvtaps that you would have to do a lot of retries, but it would be *much* more difficult to talk about a particular device - "Which macvtap was that?" "91304510", vs "1" :-). Just or that reason I'm kind of against this idea (and the previous one as well).
* Create the interface name in code that has more knowledge of the environment and pass the name into the *virNetDevMacVLanCreateWithVPortProfile* function via the *tgifname* parameter. For example, the *qemuBuildCommandLine* function in *qemu_command.c* contains the loop that iterates over the network devices defined for the guest domain that ultimately get created via the *virNetDevMacVLanCreateWithVPortProfile* function. That function has access to the network device configuration and at the very least could ensure none of the names previously defined for the guest aren't used. I believe it would be matter of creating a macvtap interface name - e.g., maybe a call to some function in *virnetdevmacvlan.c* - and setting the name in the virDomainNetDef structure prior to invoking *qemuBuildInterfaceCommandLine*?
I don't quite follow what you're saying, but it sounds like you are suggesting that we try to know enough about the environment that we can predetermine an interface name. That won't work though - you can't know for certain that some other program hasn't taken the name you want until you try to create is.
The name creation function in *virnetdevmacvlan.c* would still check to see if a device with the name exists. I don't really like this idea anyway for a lot of other reasons.
Yeah, knowing about other interfaces for this particular guess is only a very small part of the equation. You really want to know about all interfaces for all guests, which is exactly what your proposed bitmap does.
There are shortcomings in all of these ideas, so if you have a better one, feel free to present it.
Any of the first three is better than what we currently do. Note that in the case of standard tap devices, the kernel itself handles the creation of a unique name - if you call ioctl(TUNSETIFF) with a string with "%d" in it and it finds the lowest numbered unused name and returns that. For some reason, the macvtap authors didn't want to do that.
Correct me if I am wrong, but doing something like this would require changes in the kernel?
Yes. You would have to allow macvtap creation to generate an automatic name when created, and have a method of changing the name of a macvtap device. (It's entirely possible both of these capabilities exist, but I haven't seen them). Anything that requires a kernel change I consider in the realm of "not feasible" (unless that really is the only way to get it done).
participants (4)
-
Daniel P. Berrange
-
Laine Stump
-
Michal Privoznik
-
Tony Krowiak