[libvirt] [PATCH 0/4] Speed up hvsupport.pl

This reduces the script execution time from 14.7s to 60 ms. Since this was by far the longest taking target from the docs/ directory and make does not parallelize across Makefiles, the savings are similar for a complete build from a fresh git checkout. The second patch shaves off 8.2s, the third 6.4s and the last one shaves off half of the remaining time. Ján Tomko (4): hvsupport: Introduce parseSymsFile hvsupport: drop XML::XPath hvsupport: construct regex up front hvsupport: skip non-matching lines early bootstrap.conf | 1 - docs/hvsupport.pl | 218 +++++++++++++++++++++--------------------------------- 2 files changed, 85 insertions(+), 134 deletions(-) -- 2.7.3

The code for parsing the different public syms files only differs in the filenames and version prefix. Unify it to a single subroutine. --- docs/hvsupport.pl | 168 ++++++++++++++++-------------------------------------- 1 file changed, 50 insertions(+), 118 deletions(-) diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index 44a30ce..7a6f1ac 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -44,136 +44,66 @@ find({ push @srcs, $_ if $_ !~ /vbox_driver\.c/; } }, no_chdir => 1}, $srcdir); -my $line; -# Get the list of all public APIs and their corresponding version +sub parseSymsFile { + my $apisref = shift; + my $prefix = shift; + my $filename = shift; + my $xmlfilename = shift; -my %apis; -open FILE, "<$symslibvirt" - or die "cannot read $symslibvirt: $!"; - -my $vers; -my $prevvers; -my $apixpath = XML::XPath->new(filename => "$srcdir/../docs/libvirt-api.xml"); -while (defined($line = <FILE>)) { - chomp $line; - next if $line =~ /^\s*#/; - next if $line =~ /^\s*$/; - next if $line =~ /^\s*(global|local):/; - if ($line =~ /^\s*LIBVIRT_(\d+\.\d+\.\d+)\s*{\s*$/) { - if (defined $vers) { - die "malformed syms file"; - } - $vers = $1; - } elsif ($line =~ /\s*}\s*;\s*$/) { - if (defined $prevvers) { - die "malformed syms file"; - } - $prevvers = $vers; - $vers = undef; - } elsif ($line =~ /\s*}\s*LIBVIRT_(\d+\.\d+\.\d+)\s*;\s*$/) { - if ($1 ne $prevvers) { - die "malformed syms file $1 != $vers"; - } - $prevvers = $vers; - $vers = undef; - } elsif ($line =~ /\s*(\w+)\s*;\s*$/) { - my $file = $apixpath->find("/api/symbols/function[\@name='$1']/\@file"); - $apis{$1} = {}; - $apis{$1}->{vers} = $vers; - $apis{$1}->{file} = $file; - } else { - die "unexpected data $line\n"; - } -} + my $line; + my $vers; + my $prevvers; -close FILE; + my $apixpath = XML::XPath->new(filename => $xmlfilename); + open FILE, "<$filename" + or die "cannot read $filename: $!"; -# And the same for the QEMU specific APIs - -open FILE, "<$symsqemu" - or die "cannot read $symsqemu: $!"; - -$prevvers = undef; -$vers = undef; -$apixpath = XML::XPath->new(filename => "$srcdir/../docs/libvirt-qemu-api.xml"); -while (defined($line = <FILE>)) { - chomp $line; - next if $line =~ /^\s*#/; - next if $line =~ /^\s*$/; - next if $line =~ /^\s*(global|local):/; - if ($line =~ /^\s*LIBVIRT_QEMU_(\d+\.\d+\.\d+)\s*{\s*$/) { - if (defined $vers) { - die "malformed syms file"; - } - $vers = $1; - } elsif ($line =~ /\s*}\s*;\s*$/) { - if (defined $prevvers) { - die "malformed syms file"; - } - $prevvers = $vers; - $vers = undef; - } elsif ($line =~ /\s*}\s*LIBVIRT_QEMU_(\d+\.\d+\.\d+)\s*;\s*$/) { - if ($1 ne $prevvers) { - die "malformed syms file $1 != $vers"; + while (defined($line = <FILE>)) { + chomp $line; + next if $line =~ /^\s*#/; + next if $line =~ /^\s*$/; + next if $line =~ /^\s*(global|local):/; + if ($line =~ /^\s*${prefix}_(\d+\.\d+\.\d+)\s*{\s*$/) { + if (defined $vers) { + die "malformed syms file"; + } + $vers = $1; + } elsif ($line =~ /\s*}\s*;\s*$/) { + if (defined $prevvers) { + die "malformed syms file"; + } + $prevvers = $vers; + $vers = undef; + } elsif ($line =~ /\s*}\s*${prefix}_(\d+\.\d+\.\d+)\s*;\s*$/) { + if ($1 ne $prevvers) { + die "malformed syms file $1 != $vers"; + } + $prevvers = $vers; + $vers = undef; + } elsif ($line =~ /\s*(\w+)\s*;\s*$/) { + my $file = $apixpath->find("/api/symbols/function[\@name='$1']/\@file"); + $$apisref{$1} = {}; + $$apisref{$1}->{vers} = $vers; + $$apisref{$1}->{file} = $file; + } else { + die "unexpected data $line\n"; } - $prevvers = $vers; - $vers = undef; - } elsif ($line =~ /\s*(\w+)\s*;\s*$/) { - my $file = $apixpath->find("/api/symbols/function[\@name='$1']/\@file"); - $apis{$1} = {}; - $apis{$1}->{vers} = $vers; - $apis{$1}->{file} = $file; - } else { - die "unexpected data $line\n"; } + + close FILE; } -close FILE; +my %apis; +# Get the list of all public APIs and their corresponding version +parseSymsFile(\%apis, "LIBVIRT", $symslibvirt, "$srcdir/../docs/libvirt-api.xml"); +# And the same for the QEMU specific APIs +parseSymsFile(\%apis, "LIBVIRT_QEMU", $symsqemu, "$srcdir/../docs/libvirt-qemu-api.xml"); # And the same for the LXC specific APIs - -open FILE, "<$symslxc" - or die "cannot read $symslxc: $!"; - -$prevvers = undef; -$vers = undef; -$apixpath = XML::XPath->new(filename => "$srcdir/../docs/libvirt-lxc-api.xml"); -while (defined($line = <FILE>)) { - chomp $line; - next if $line =~ /^\s*#/; - next if $line =~ /^\s*$/; - next if $line =~ /^\s*(global|local):/; - if ($line =~ /^\s*LIBVIRT_LXC_(\d+\.\d+\.\d+)\s*{\s*$/) { - if (defined $vers) { - die "malformed syms file"; - } - $vers = $1; - } elsif ($line =~ /\s*}\s*;\s*$/) { - if (defined $prevvers) { - die "malformed syms file"; - } - $prevvers = $vers; - $vers = undef; - } elsif ($line =~ /\s*}\s*LIBVIRT_LXC_(\d+\.\d+\.\d+)\s*;\s*$/) { - if ($1 ne $prevvers) { - die "malformed syms file $1 != $vers"; - } - $prevvers = $vers; - $vers = undef; - } elsif ($line =~ /\s*(\w+)\s*;\s*$/) { - my $file = $apixpath->find("/api/symbols/function[\@name='$1']/\@file"); - $apis{$1} = {}; - $apis{$1}->{vers} = $vers; - $apis{$1}->{file} = $file; - } else { - die "unexpected data $line\n"; - } -} - -close FILE; +parseSymsFile(\%apis, "LIBVIRT_LXC", $symslxc, "$srcdir/../docs/libvirt-lxc-api.xml"); # Some special things which aren't public APIs, @@ -206,6 +136,8 @@ $apis{virDomainMigrateConfirm3Params}->{vers} = "1.1.0"; # and driver struct fields. This lets us later match # update the driver impls with the public APis. +my $line; + # Group name -> hash of APIs { fields -> api name } my %groups; my $ingrp; -- 2.7.3

On 06/29/2016 02:40 AM, Ján Tomko wrote:
The code for parsing the different public syms files only differs in the filenames and version prefix.
Unify it to a single subroutine. --- docs/hvsupport.pl | 168 ++++++++++++++++-------------------------------------- 1 file changed, 50 insertions(+), 118 deletions(-)
ACK, John

Doing an XPath query for every single 'function' element in the file is inefficient. Since the XML file is generated by another of our build scripts (apibuild.py, using Python's standard 'output.write' XML library), just find the function name->file mapping by a regex upfront. --- bootstrap.conf | 1 - docs/hvsupport.pl | 30 +++++++++++++++++++++++++----- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/bootstrap.conf b/bootstrap.conf index 0db6b62..c7263f4 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -208,7 +208,6 @@ gzip - libtool - patch - perl 5.5 -perl::XML::XPath - pkg-config - rpcgen - tar - diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index 7a6f1ac..80a6f80 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -4,8 +4,6 @@ use strict; use warnings; use File::Find; -use XML::XPath; -use XML::XPath::XMLParser; die "syntax: $0 SRCDIR\n" unless int(@ARGV) == 1; @@ -45,6 +43,29 @@ find({ } }, no_chdir => 1}, $srcdir); +sub parseAPIXML { + my $xmlfilename = shift; + + my %files; + my $line; + + open FILE, "<", $xmlfilename or die "cannot read $xmlfilename: $!"; + + while (defined($line = <FILE>)) { + chomp $line; + if ($line =~ /function name='([^']+)' file='([^']+)'/) { + $files{$1} = $2; + } + } + + close FILE; + + if (keys %files == 0) { + die "No functions found in $xmlfilename. Has the apibuild.py output changed?"; + } + return \%files; +} + sub parseSymsFile { my $apisref = shift; my $prefix = shift; @@ -55,7 +76,7 @@ sub parseSymsFile { my $vers; my $prevvers; - my $apixpath = XML::XPath->new(filename => $xmlfilename); + my $filenames = parseAPIXML($xmlfilename); open FILE, "<$filename" or die "cannot read $filename: $!"; @@ -83,10 +104,9 @@ sub parseSymsFile { $prevvers = $vers; $vers = undef; } elsif ($line =~ /\s*(\w+)\s*;\s*$/) { - my $file = $apixpath->find("/api/symbols/function[\@name='$1']/\@file"); $$apisref{$1} = {}; $$apisref{$1}->{vers} = $vers; - $$apisref{$1}->{file} = $file; + $$apisref{$1}->{file} = $$filenames{$1}; } else { die "unexpected data $line\n"; } -- 2.7.3

On 06/29/2016 02:40 AM, Ján Tomko wrote:
Doing an XPath query for every single 'function' element in the file is inefficient.
Since the XML file is generated by another of our build scripts (apibuild.py, using Python's standard 'output.write' XML library), just find the function name->file mapping by a regex upfront. --- bootstrap.conf | 1 - docs/hvsupport.pl | 30 +++++++++++++++++++++++++----- 2 files changed, 25 insertions(+), 6 deletions(-)
I know less than zero about what XML::XPath did, but I got no errors during my test build. Might be nice to put a word or three in the parseAPIXML to describe what's being done. I don't find the code entirely self documenting. As near as I could tell you're searching the output XML files for "<function name='" and picking off the function name to be searched elsewhere (the syms files?) to ensure the function has been exported properly... Consider this a weak ACK. John

This lets perl cache the regex. Even though there are only eight groups, the speedup is more than that. --- docs/hvsupport.pl | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index 80a6f80..31821ad 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -204,28 +204,27 @@ foreach my $src (@srcs) { open FILE, "<$src" or die "cannot read $src: $!"; + my $grps = join("|", keys %groups); $ingrp = undef; my $impl; while (defined($line = <FILE>)) { if (!$ingrp) { - foreach my $grp (keys %groups) { - if ($line =~ /^\s*(?:static\s+)?$grp\s+(\w+)\s*=\s*{/ || - $line =~ /^\s*(?:static\s+)?$grp\s+NAME\(\w+\)\s*=\s*{/) { - $ingrp = $grp; - $impl = $src; - - if ($impl =~ m,.*/node_device_(\w+)\.c,) { - $impl = $1; - } else { - $impl =~ s,.*/(\w+?)_((\w+)_)?(\w+)\.c,$1,; - } - - if ($groups{$ingrp}->{drivers}->{$impl}) { - die "Group $ingrp already contains $impl"; - } - - $groups{$ingrp}->{drivers}->{$impl} = {}; + if ($line =~ /^\s*(?:static\s+)?($grps)\s+(\w+)\s*=\s*{/ || + $line =~ /^\s*(?:static\s+)?($grps)\s+NAME\(\w+\)\s*=\s*{/) { + $ingrp = $1; + $impl = $src; + + if ($impl =~ m,.*/node_device_(\w+)\.c,) { + $impl = $1; + } else { + $impl =~ s,.*/(\w+?)_((\w+)_)?(\w+)\.c,$1,; } + + if ($groups{$ingrp}->{drivers}->{$impl}) { + die "Group $ingrp already contains $impl"; + } + + $groups{$ingrp}->{drivers}->{$impl} = {}; } } else { -- 2.7.3

On 06/29/2016 02:40 AM, Ján Tomko wrote:
This lets perl cache the regex.
Even though there are only eight groups, the speedup is more than that.
eight groups? I'm glad someone knows what they are...
--- docs/hvsupport.pl | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-)
Again - this seems to work for the current good environment. Although I can confess I didn't try to add a dummy API and ensure that it got flagged. I assume you've done that... Another weak ACK, John

This speeds up the whole script almost twice. --- docs/hvsupport.pl | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/hvsupport.pl b/docs/hvsupport.pl index 31821ad..8b2f8c6 100755 --- a/docs/hvsupport.pl +++ b/docs/hvsupport.pl @@ -209,6 +209,7 @@ foreach my $src (@srcs) { my $impl; while (defined($line = <FILE>)) { if (!$ingrp) { + next if not $line =~ /$grps/; if ($line =~ /^\s*(?:static\s+)?($grps)\s+(\w+)\s*=\s*{/ || $line =~ /^\s*(?:static\s+)?($grps)\s+NAME\(\w+\)\s*=\s*{/) { $ingrp = $1; -- 2.7.3

On 06/29/2016 02:40 AM, Ján Tomko wrote:
This speeds up the whole script almost twice.
"This" ?? Might be nice to know what this is.
--- docs/hvsupport.pl | 1 + 1 file changed, 1 insertion(+)
I have to assume This is ignoring lines that don't contain '$grps' If so, ACK, but please add a few more words about what This code is doing. John
participants (2)
-
John Ferlan
-
Ján Tomko