Re: [libvirt] [PATCH v3 03/22] build-aux: rewrite po file minimizer in Python

26 Sep 2019

On Thu, Sep 26, 2019 at 12:39:39PM +0200, Erik Skultety wrote:
...
On Tue, Sep 24, 2019 at 03:58:44PM +0100, Daniel P. Berrangé wrote:
...
As part of an goal to eliminate Perl from libvirt build tools,
rewrite the minimize-po.pl tool in Python.
This was a straight conversion, manually going line-by-line to
change the syntax from Perl to Python. Thus the overall structure
of the file and approach is the same.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 Makefile.am              |  2 +-
 build-aux/minimize-po.pl | 37 -------------------------
 build-aux/minimize-po.py | 60 ++++++++++++++++++++++++++++++++++++++++
 po/Makefile.am           |  2 +-
 4 files changed, 62 insertions(+), 39 deletions(-)
 delete mode 100755 build-aux/minimize-po.pl
 create mode 100755 build-aux/minimize-po.py

diff --git a/Makefile.am b/Makefile.am
index 17448a914e..8f688d40d0 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -45,7 +45,7 @@ EXTRA_DIST = \
   build-aux/check-spacing.pl \
   build-aux/gitlog-to-changelog \
   build-aux/header-ifdef.pl \
-  build-aux/minimize-po.pl \
+  build-aux/minimize-po.py \
   build-aux/mock-noinline.pl \
   build-aux/prohibit-duplicate-header.pl \
   build-aux/useless-if-before-free \
diff --git a/build-aux/minimize-po.pl b/build-aux/minimize-po.pl
deleted file mode 100755
index 497533a836..0000000000
--- a/build-aux/minimize-po.pl
+++ /dev/null
@@ -1,37 +0,0 @@
-#!/usr/bin/perl
-
-my @block;
-my $msgstr = 0;
-my $empty = 0;
-my $unused = 0;
-my $fuzzy = 0;
-while (<>) {
-    if (/^$/) {
-        if (!$empty && !$unused && !$fuzzy) {
-            print @block;
-        }
-        @block = ();
-        $msgstr = 0;
-        $fuzzy = 0;
-        push @block, $_;
-    } else {
-        if (/^msgstr/) {
-            $msgstr = 1;
-            $empty = 1;
-        }
-        if (/^#.*fuzzy/) {
-            $fuzzy = 1;
-        }
-        if (/^#~ msgstr/) {
-            $unused = 1;
-        }
-        if ($msgstr && /".+"/) {
-            $empty = 0;
-        }
-        push @block, $_;
-    }
-}
-
-if (@block && !$empty && !$unused) {
I guess the fact !$fuzzy was missing in this condition was a bug that the new
python code doesn't suffer from, right?
Yeah it was a pre-existing bug, but harmless.
...
...
diff --git a/build-aux/minimize-po.py b/build-aux/minimize-po.py
new file mode 100755
index 0000000000..5046bacede
--- /dev/null
+++ b/build-aux/minimize-po.py
@@ -0,0 +1,60 @@
+#!/usr/bin/env python
+#
+# Copyright (C) 2018-2019 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+from __future__ import print_function
+
+import re
+import sys
+
+block = []
+msgstr = False
+empty = False
+unused = False
+fuzzy = False
+
+strprog = re.compile(r'''.*".+".*''')
question 1) what's the benefit of compiling a regex and using it only once? Btw
python does cache every pattern passed to re.match (and friends) so compilation
IMO hardly ever makes sense unless you're doing 1000s of searches for the same
pattern in which case the latency would naturally accumulate.
Ah, I've just seen the docs now

  "The compiled versions of the most recent patterns passed to 
   re.compile() and the module-level matching functions are 
   cached, so programs that use only a few regular expressions 
   at a time needn’t worry about compiling regular expressions."

so with this in mind, I can probably just remove the 'compile' step from
all the scripts in this series. I haven't used it consistently to start
with.
...
question 2) why do we need the '''.* and .*''' parts compared to the original
perl regex? I'll go ahead and assume it's because re.match matches at the
beginning of a string by default, in which case, shouldn't we use re.search
which matches anywhere (that's what perl does by default) instead?
Yeah, I didn't notice the 're.search' function existed & had the semantics
closer to Perl.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|