Currently the server-side hook that runs "git diff --check"
to prevent pushing a change that adds trailing blanks is
stricter than our "make syntax-check" hook, since the former
rejects any change that adds blank lines at the end of a file,
while "make syntax-check" doesn't complain about that.
The two should be consistent.
One way is to make "make syntax-check" more strict.
If we were to do that, we'd have to choose between
cleaning existing files and exempting them from the new test.
Cleaning is easy and doesn't impact tests at all, so I prefer it.
Here's what would be involved:
- Remove 121 trailing newlines from 109 files by running this command:

    git ls-files -z | xargs -0 perl -pi -0777 -e 's/\n\n+$/\n/'

- Add a rule to cfg.mk so that "make syntax-check" warns about
  any new violations.  It might run something like this:

    git ls-files -z \
      | xargs -0 perl -ln -0777 -e '/\n(\n+)$/ and print "$ARGV: ".length $1'

That command prints the name of each offending file with its trailing
blank line count.  It takes well under a second on my system
(admittedly, with a hot cache), but it's not well optimized: it reads
each file into memory in full before processing it.
If it matters, we can come up with a more efficient (yet still portable)
way to compare the last two bytes of each file to "\n\n".
I went ahead and wrote a nearly minimal script to do that.
Rather than reading/processing all 27MB of sources,
this reads just the last 2 bytes of each of the 1048 files,
comparing those bytes to "\n\n" and printing the name when
there's a match:

    git ls-files -z \
      | xargs -0 perl -le '
        foreach my $f (@ARGV) {
          open F, "<", $f or (warn "failed to open $f: $!\n"), next;
          my $p = sysseek(F, -2, 2);
          # seek failure probably means file has < 2 bytes; ignore
          my $two;
          defined $p and $p = sysread F, $two, 2;
          close F;
          # ignore read failure
          $p && $two eq "\n\n" and (print $f), $fail = 1;
        }
        END { exit defined $fail ? 1 : 0 }'

Counting minor page faults, there's little difference
(2193 before, 1976 after).  Maximum memory consumption is probably
way down, but I didn't measure that.
With a hot cache, the latter takes 0.02 seconds elapsed,
and the former takes 0.09 seconds.
I'm leaning towards the simplicity of the former (the one-liner that
slurps each file), in spite of its cost.
I'll bet someone can come up with a simple *and* efficient script
(probably using sed) to list files with one or more trailing blank lines.
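
As one possible starting point for that, here is a minimal sketch that
uses tail and wc rather than sed.  The function name list_trailing_blank
is made up for illustration; it is not an existing rule.  Like the Perl
script above, it looks only at the last two bytes of each file, but it
forks two processes per file, so it may well be slower; I have not
measured it.

```shell
# list_trailing_blank FILE...
# Print the name of each FILE whose last two bytes are both newlines.
# Return 1 if any such file was found, 0 otherwise.
# (Hypothetical helper name, for illustration only.)
list_trailing_blank() {
    found=0
    for f in "$@"; do
        # Skip anything that is not a readable regular file.
        [ -f "$f" ] && [ -r "$f" ] || continue
        # tail -c 2 emits at most the last two bytes; wc -l counts the
        # newlines among them, so a count of 2 means the file ends
        # in "\n\n".
        n=$(tail -c 2 < "$f" | wc -l)
        if [ "$n" -eq 2 ]; then
            printf '%s\n' "$f"
            found=1
        fi
    done
    return "$found"
}
```

Saved in a file (say check-eof.sh, again a made-up name) with a final
line of

    list_trailing_blank "$@"

it could be run the same way as the commands above:

    git ls-files -z | xargs -0 sh check-eof.sh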