On Thu, Apr 12, 2018 at 02:28:22PM +0100, Daniel P. Berrangé wrote:
> Similar to the libvirt.pot, .po files contain line numbers and file
> names identifying where in the source a translatable string comes from.
> The source locations in the .po files are thrown away and replaced with
> content from the libvirt.pot whenever msgmerge is run, so this is not
> precious information that needs to be stored in git.
>
> When msgmerge processes a .po file, it will add in any msgids from the
> libvirt.pot that were not already present. Thus, if a particular msgid
> currently has no translation, it can be considered redundant and again
> does not need storing in git.
>
> When msgmerge processes a .po file and can't find an exact existing
> translation match, it will try to do fuzzy matching instead, marking such
> entries with a "#, fuzzy" comment to alert the translator to take a
> look and either discard, edit or accept the match. Looking at the
> existing fuzzy matches in .po files shows that the quality is awful,
> with many having a completely different set of printf format specifiers
> between the msgid and fuzzy msgstr entry. Fortunately when msgfmt
> generates the .gmo, the fuzzy entries are all ignored anyway. The fuzzy
> entries could be useful to translators if they were working on the .po
> files directly from git, but Libvirt outsourced translation to the
> Fedora Zanata system, so keeping fuzzy matches in git is not much help.
>
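As an illustration of the format specifier mismatch described above, here is a rough Python sketch that compares the printf conversions in a msgid and a msgstr. It is not part of the patch; the names SPEC, format_specs and specs_match are made up for this example, and the regex is only a heuristic that does not handle reordering through positional arguments such as %1$s.

```python
import re

# Heuristic pattern for a printf conversion: optional position, flags,
# width, precision and length modifier, then the conversion character.
SPEC = re.compile(
    r"%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|j|z|t|L)?([diouxXeEfgGcsp%])"
)


def format_specs(text):
    """Return the printf conversions found in text, in order, skipping '%%'."""
    return [m.group(0) for m in SPEC.finditer(text) if m.group(1) != "%"]


def specs_match(msgid, msgstr):
    """True if both strings use the same conversions in the same order."""
    return format_specs(msgid) == format_specs(msgstr)
```

A fuzzy entry pairing "apple %s" with a msgstr containing "%d" would be reported as a mismatch by specs_match, which is the kind of entry that would be unsafe to pass to printf if it were ever used.
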
> Finally, by default msgids are sorted based on source location. Thus, if
> a bit of code with translatable text is moved from one file to another,
> it may shift around in the .po file, despite the msgid not itself changing.
> If the msgids were sorted alphabetically, the .po files would have
> stable ordering when code is refactored.
>
> This patch takes advantage of the above observations to canonicalize
> and minimize the content stored for .po files in git. Instead of storing
> the real .po files, we now store .mini.po files.
>
> The .mini.po files are the same file format as .po files, but have no
> source location comments, are sorted alphabetically, and all fuzzy
> msgstrs and msgids with no translation are discarded. This cuts the size
> of content in the po directory from 109MB to 19MB.
>
> Users working from a libvirt git checkout who need the full .po files
> can run "make update-po", which merges the libvirt.pot and .mini.po
> file to create a .po file containing all the content previously stored
> in git.
>
> Conversely, if a full .po file has been modified, for example, by
> downloading new content from Zanata, the .mini.po files can be updated
> by running "make update-mini-po". The resulting diffs of the .mini.po
> file will clearly show the changed translations without any of the noise
> that previously obscured content. Being able to see content changes
> clearly actually identified a bug in the zanata python client where it
> was adding bogus "fuzzy" annotations to many messages:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=1564497
>
> Users working from libvirt releases should not see any difference in
> behaviour, since the tarballs only contain the full .po files, not the
> .mini.po files.
>
> As an added benefit, generating tarballs with "make dist" will no
> longer cause creation of dirty files in git, since it won't touch the
> .mini.po files, only the .po files which are no longer kept in git.
>
> To avoid creating a single commit 100+MB in size, each language is
> minimized separately in a following commit.
From a brief look at those, the few Slovak "translations" are all in
English and many of the translation team pages still point to transifex,
but I assume that data comes from Zanata.

Yeah, there are a few other languages too where, for unknown reasons, the
English has been duplicated into the translation. I could go clicky-clicky
and kill that in the Zanata UI, but there's a lot, so I want to figure out a
way to automatically extract that list of bad translations and cull them all
in one go via the API.
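
Extracting the list of suspect entries from a local .po file could look something like the Python sketch below. It only covers detection; the Zanata API side is deliberately left out since its calls are not described here. The names ENTRY and untranslated_copies are hypothetical, and the regex only handles entries whose msgid and msgstr each fit on a single line.

```python
import re

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def untranslated_copies(po_text):
    """Return msgids whose msgstr is an exact copy of the English msgid."""
    return [
        msgid
        for msgid, msgstr in ENTRY.findall(po_text)
        if msgid and msgid == msgstr
    ]
```

Some identical pairs are legitimate (a bare "%s", or a proper noun), so the resulting list would still want a quick review before anything is culled.
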
Good point about the translation URLs pointing to transifex. I'll submit
another patch for that too.
> Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
> ---
>  .gitignore               |  3 +++
>  build-aux/minimize-po.pl | 37 +++++++++++++++++++++++++++++++++
>  po/Makefile.am           | 30 ++++++++++++++-------------
>  po/README.md             | 53 +++++++++++++++++++++++++++++++++++++++++-------
>  4 files changed, 102 insertions(+), 21 deletions(-)
>  create mode 100755 build-aux/minimize-po.pl
>
Reviewed-by: Ján Tomko <jtomko(a)redhat.com>
Jano