Re: [libvirt] [PATCH 0/4] improve virConnectListAllInterfaces()

25 Sep 2015

      On Fri, Sep 25, 2015 at 05:22:30PM +0100, Daniel P. Berrange wrote:
...
On Fri, Sep 25, 2015 at 11:13:52AM -0400, Laine Stump wrote:
...
There's a bit of background about this here:
https://www.redhat.com/archives/augeas-devel/2015-September/msg00001.html
In short, virt-manager is calling the virInterface APIs and that ties
up a libvirt thread (and CPU core) for a very long time on hosts that
have a large number of interfaces. These patches don't cure the
problem (I don't know that there really is a cure other than "Don't DO
that!"), but they do fix a couple of bugs I found while investigating,
and make a substantial improvement in the amount of time used by
virConnectListAllInterfaces().
One thing that I wondered about while investigating this - a big use
of CPU by virConnectListAllInterfaces() comes from the need to
retrieve the MAC address of every interface. The MAC addresses are
both
1) returned to the caller in the interface objects and
2) sent to the policykit ACL checking to decide which interfaces to include in
the list.
I'm 90% confident that
1) most callers don't really care that they're getting the MAC address
along with interface name (virt-manager, for example, follows up with
a virInterfaceGetXMLDesc() anyway), and
2) there is not even a single host *anywhere* that is using libvirt
policykit ACLs to limit the list of host interfaces viewable by a
client.
So we could add a flag to not return MAC addresses, which would allow
cutting down the time to list all devices to something < 1
second. But this would be at the expense of no longer having the
possibility to limit the list with policykit according to MAC
address. Since all host interface information is available to all
users via the file system, for example, I don't see this as a huge
issue, but it would change behavior, so I don't feel comfortable doing
it. I don't like the idea of a single API call taking > 1 minute to
return either, though. Does anyone have an opinion about this?
Ultimately 500 interfaces, each ifcfg-XXX file 300 bytes in size
on average is only 150 KB of data. Given the amount of data we
are consuming here, I think it is reasonable to expect we can
process that in a tiny fraction of a second. So there's clearly
a serious algorithmic / data structure flaw here if it's taking
minutes.
By the sounds of the thread you quote, it's in augeas itself, so I
think we really need to focus on addressing that root cause as a
priority rather than try to work around it.
As a side note, we might consider adding new API to netcf so that
we can fetch the entire interface set + macs in one api call to
netcf, though I doubt it'd change performance that much.
So, I instrumented the netcf and augeas code to check timings.

The aug_get calls take less than a millisecond, as do the various
other calls. I found the bulk of the time is actually coming from
the netcf function "get_augeas", which in turn calls "aug_load"
for every single damn netcf function call. So when we have 500
interfaces, we're telling augeas to load all the config files
1000 times. That's where the slowness is coming from....
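
To make that arithmetic concrete, here is a toy Python model of the
pattern. It is not the real netcf code: the class and method names are
hypothetical, and it assumes the caller makes two netcf calls per
interface (one lookup, one MAC fetch), which is what would turn 500
interfaces into 1000 loads.

```python
class ToyNetcf:
    """Toy stand-in for netcf; every public call reloads all config."""

    def __init__(self, n_interfaces):
        self.names = [f"eth{i}" for i in range(n_interfaces)]
        self.loads = 0  # how many times the full config set was re-read

    def _get_augeas(self):
        # Models get_augeas() -> aug_load(): re-parse every config file.
        self.loads += 1

    def lookup_by_name(self, name):
        self._get_augeas()
        return name

    def mac_string(self, name):
        self._get_augeas()
        # Fake MAC derived from the interface index, illustration only.
        idx = int(name[3:])
        return f"52:54:00:00:{idx >> 8:02x}:{idx & 0xff:02x}"


def list_all_interfaces(ncf):
    """Assumed caller pattern: one lookup plus one MAC fetch per interface."""
    return [(ncf.lookup_by_name(n), ncf.mac_string(n)) for n in ncf.names]


ncf = ToyNetcf(500)
result = list_all_interfaces(ncf)
print(ncf.loads)  # 1000
```

Each individual call is cheap, but since every load re-reads all 500
files, the total work grows roughly with the square of the interface
count.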

Either we need to stop loading config files on every function
call in netcf, or we need to add a netcf bulk data API call,
so that libvirt can load /all/ the data it needs in 1 single
API call.
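
A rough sketch of those two options, again as a toy Python model
rather than real netcf code. The "cache" flag and the
list_all_with_macs() bulk call are hypothetical names made up for
illustration; nothing like them exists in netcf today as far as this
thread goes.

```python
class ToyNetcf:
    """Toy stand-in for netcf with two possible fixes bolted on."""

    def __init__(self, n_interfaces, cache=False):
        self.names = [f"eth{i}" for i in range(n_interfaces)]
        self.cache = cache    # option 1: skip reloads once config is loaded
        self.loaded = False
        self.loads = 0        # how many times the full config set was re-read

    def _get_augeas(self):
        if self.cache and self.loaded:
            return            # reuse the already-loaded tree
        self.loads += 1
        self.loaded = True

    def _mac(self, name):
        # Fake MAC derived from the interface index, illustration only.
        idx = int(name[3:])
        return f"52:54:00:00:{idx >> 8:02x}:{idx & 0xff:02x}"

    def mac_string(self, name):
        self._get_augeas()
        return self._mac(name)

    def list_all_with_macs(self):
        # Option 2 (hypothetical bulk API): one load, all data returned.
        self._get_augeas()
        return [(n, self._mac(n)) for n in self.names]


# Option 1: per-interface calls, but the load is cached.
cached = ToyNetcf(500, cache=True)
macs = [cached.mac_string(n) for n in cached.names]
print(cached.loads)  # 1

# Option 2: no caching, but a single bulk call.
bulk = ToyNetcf(500)
pairs = bulk.list_all_with_macs()
print(bulk.loads)  # 1
```

Either way the number of loads stops depending on the number of
interfaces. A real cache would also need some way to notice config
files changing on disk, which this toy ignores.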

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|