[libvirt] [PATCHv2 0/3] Speed up hvsupport.pl

This reduces the script execution time from 14.7s to 60 ms. Since this was by far the longest taking target from the docs/ directory and make does not parallelize across Makefiles, the savings are similar for a complete build from a fresh git checkout. The second patch shaves off 8.2s, the third 6.4s and the last one shaves off half of the remaining time. v2: * renamed parseAPIXML to getAPIFilenames * documented getAPIFilenames * comment in apibuild.py above the line that generates the data consumed by hvsupport.pl * more verbose commit messages Ján Tomko (3): hvsupport: use a regex instead of XML::XPath hvsupport: construct the group regex upfront hvsupport: skip non-matching lines early bootstrap.conf | 1 - docs/apibuild.py | 1 + docs/hvsupport.pl | 69 +++++++++++++++++++++++++++++++++++++------------------ 3 files changed, 48 insertions(+), 23 deletions(-) -- 2.7.3

When generating the hvsupport.html.in file, we parse the -api.xml files generated by apibuild.py to know in which HTML file the API function is. Doing an XPath query for every single 'function' element in the file is inefficient. Since the XML file is generated by another of our build scripts (apibuild.py, using Python's standard 'output.write' XML library), just find the function name->file mapping by a regex upfront. Also add a note about this next to the line that generates it in apibuild.py and do not check if XML::XPath is installed in bootstrap since we no longer use it. --- bootstrap.conf | 1 - docs/apibuild.py | 1 + docs/hvsupport.pl | 33 ++++++++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/bootstrap.conf b/bootstrap.conf index edea8c3..0bfa794 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -209,7 +209,6 @@ gzip - libtool - patch - perl 5.5 -perl::XML::XPath - pkg-config - rpcgen - tar - diff --git a/docs/apibuild.py b/docs/apibuild.py index f5216ea..8728b27 100755 --- a/docs/apibuild.py +++ b/docs/apibuild.py @@ -2267,6 +2267,7 @@ class docBuilder: if name == debugsym and not quiet: print "=>", id + # NB: this is consumed by a regex in 'getAPIFilenames' in hvsupport.pl output.write(" <%s name='%s' file='%s' module='%s'>\n" % (id.type, name, self.modulename_file(id.header), self.modulename_file(id.module))) diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index 7a6f1ac..7dd7c3f 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -4,8 +4,6 @@ use strict; use warnings; use File::Find; -use XML::XPath; -use XML::XPath::XMLParser; die "syntax: $0 SRCDIR\n" unless int(@ARGV) == 1; @@ -45,6 +43,32 @@ find({ } }, no_chdir => 1}, $srcdir); +# Map API functions to the header and documentation files they're in +# so that we can generate proper hyperlinks to their documentation. +# +# The function names are grep'd from the XML output of apibuild.py. +sub getAPIFilenames { + my $filename = shift; + + my %files; + my $line; + + open FILE, "<", $filename or die "cannot read $filename: $!"; + + while (defined($line = <FILE>)) { + if ($line =~ /function name='([^']+)' file='([^']+)'/) { + $files{$1} = $2; + } + } + + close FILE; + + if (keys %files == 0) { + die "No functions found in $filename. Has the apibuild.py output changed?"; + } + return \%files; +} + sub parseSymsFile { my $apisref = shift; my $prefix = shift; @@ -55,7 +79,7 @@ sub parseSymsFile { my $vers; my $prevvers; - my $apixpath = XML::XPath->new(filename => $xmlfilename); + my $filenames = getAPIFilenames($xmlfilename); open FILE, "<$filename" or die "cannot read $filename: $!"; @@ -83,10 +107,9 @@ sub parseSymsFile { $prevvers = $vers; $vers = undef; } elsif ($line =~ /\s*(\w+)\s*;\s*$/) { - my $file = $apixpath->find("/api/symbols/function[\@name='$1']/\@file"); $$apisref{$1} = {}; $$apisref{$1}->{vers} = $vers; - $$apisref{$1}->{file} = $file; + $$apisref{$1}->{file} = $$filenames{$1}; } else { die "unexpected data $line\n"; } -- 2.7.3

On Tue, Jul 19, 2016 at 18:55:52 +0200, Ján Tomko wrote:
When generating the hvsupport.html.in file, we parse the -api.xml files generated by apibuild.py to know in which HTML file the API function is.
Doing an XPath query for every single 'function' element in the file is inefficient.
Since the XML file is generated by another of our build scripts (apibuild.py, using Python's standard 'output.write' XML library), just find the function name->file mapping by a regex upfront.
Also add a note about this next to the line that generates it in apibuild.py and do not check if XML::XPath is installed in bootstrap since we no longer use it. --- bootstrap.conf | 1 - docs/apibuild.py | 1 + docs/hvsupport.pl | 33 ++++++++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 6 deletions(-)
ACK

The %groups hash contains all the driver types (e.g. virHypervisorDriver or virSecretDriver). When searching for all the APIs that are implemented by a driver of that specific driver type, we keep iterating over the %groups hash on every line we look at, then matching against the driver type. This is inefficient because it prevents perl from caching the regex and it executes the regex once for every driver type, even though one regex matching excludes all the others, since all the driver types are different. Construct the regex containing all the driver types upfront to save about 6.4s (~98%) of the script execution time. --- docs/hvsupport.pl | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index 7dd7c3f..fca83ca 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -207,28 +207,27 @@ foreach my $src (@srcs) { open FILE, "<$src" or die "cannot read $src: $!"; + my $groups_regex = join("|", keys %groups); $ingrp = undef; my $impl; while (defined($line = <FILE>)) { if (!$ingrp) { - foreach my $grp (keys %groups) { - if ($line =~ /^\s*(?:static\s+)?$grp\s+(\w+)\s*=\s*{/ || - $line =~ /^\s*(?:static\s+)?$grp\s+NAME\(\w+\)\s*=\s*{/) { - $ingrp = $grp; - $impl = $src; - - if ($impl =~ m,.*/node_device_(\w+)\.c,) { - $impl = $1; - } else { - $impl =~ s,.*/(\w+?)_((\w+)_)?(\w+)\.c,$1,; - } - - if ($groups{$ingrp}->{drivers}->{$impl}) { - die "Group $ingrp already contains $impl"; - } - - $groups{$ingrp}->{drivers}->{$impl} = {}; + if ($line =~ /^\s*(?:static\s+)?($groups_regex)\s+(\w+)\s*=\s*{/ || + $line =~ /^\s*(?:static\s+)?($groups_regex)\s+NAME\(\w+\)\s*=\s*{/) { + $ingrp = $1; + $impl = $src; + + if ($impl =~ m,.*/node_device_(\w+)\.c,) { + $impl = $1; + } else { + $impl =~ s,.*/(\w+?)_((\w+)_)?(\w+)\.c,$1,; } + + if ($groups{$ingrp}->{drivers}->{$impl}) { + die "Group $ingrp already contains $impl"; + } + + $groups{$ingrp}->{drivers}->{$impl} = {}; } } else { -- 2.7.3

On Tue, Jul 19, 2016 at 18:55:53 +0200, Ján Tomko wrote:
The %groups hash contains all the driver types (e.g. virHypervisorDriver or virSecretDriver).
When searching for all the APIs that are implemented by a driver of that specific driver type, we keep iterating over the %groups hash on every line we look at, then matching against the driver type.
This is inefficient because it prevents perl from caching the regex and it executes the regex once for every driver type, even though one regex matching excludes all the others, since all the driver types are different.
Construct the regex containing all the driver types upfront to save about 6.4s (~98%) of the script execution time. --- docs/hvsupport.pl | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-)
ACK

Most of the lines we look at are not going to match one of the driver types contained in $groups_regex. Move on to the next line if it does not contain any of them early. This speeds up the script execution by 50%, since this simple regex does not have any capture groups. --- docs/hvsupport.pl | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index fca83ca..2ead2cf 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -212,6 +212,9 @@ foreach my $src (@srcs) { my $impl; while (defined($line = <FILE>)) { if (!$ingrp) { + # skip non-matching lines early to save time + next if not $line =~ /$groups_regex/; + if ($line =~ /^\s*(?:static\s+)?($groups_regex)\s+(\w+)\s*=\s*{/ || $line =~ /^\s*(?:static\s+)?($groups_regex)\s+NAME\(\w+\)\s*=\s*{/) { $ingrp = $1; -- 2.7.3

On Tue, Jul 19, 2016 at 18:55:54 +0200, Ján Tomko wrote:
Most of the lines we look at are not going to match one of the driver types contained in $groups_regex.
Move on to the next line if it does not contain any of them early. This speeds up the script execution by 50%, since this simple regex does not have any capture groups. --- docs/hvsupport.pl | 3 +++ 1 file changed, 3 insertions(+)
ACK
participants (2)
-
Ján Tomko
-
Peter Krempa