On a Friday in 2021, Tim Wiederhake wrote:
>This is a wrapper for codespell [1], a spell checker for source code.
>Codespell does not compare words to a dictionary, but rather works by
>checking words against a list of common typos, making it produce fewer
>false positives than other solutions.
>
>The script in this patch works around the lack of per-directory ignore
>lists and some oddities regarding capitalization in ignore lists.
>
>[1] (https://github.com/codespell-project/codespell/)
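To illustrate the approach described above for other reviewers: a typo-list checker only flags words it already knows to be misspellings, so identifiers it has never seen pass silently. A toy sketch follows. The `TYPOS` mapping and `find_typos()` helper are hypothetical (not codespell's data or API), and the entries are typos that came up in this thread:

```python
import re

# Hypothetical typo -> correction map; codespell ships a much larger list.
TYPOS = {
    "capablities": "capabilities",
    "encyption": "encryption",
    "progess": "progress",
}


def find_typos(text):
    """Return (word, suggestion) for every word that is a known typo."""
    findings = []
    for word in re.findall(r"[A-Za-z]+", text):
        suggestion = TYPOS.get(word.lower())
        if suggestion is not None:
            findings.append((word, suggestion))
    return findings


# "qemucapabilitiesdata" is not in the list, so it is not reported;
# a dictionary-based checker would flag it as a false positive.
print(find_typos("qemucapabilitiesdata lists capablities and progess"))
# [('capablities', 'capabilities'), ('progess', 'progress')]
```

The flip side is that a typo missing from the list is never reported, which is the price of the low false-positive rate.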
>
>RFC:
>Is there interest in having something like this in CI?
>Adding it as a job with 'allow_failure: true' would let us see
>how many false positives there are / how annoying it is.
>
>Examples of spelling mistakes that were found using codespell:
>4ad3c95f4bef5c7c9657de470fb74a4d14c8a331,
>785a11cec8693de7df024aae68975dd1799b646a,
>1452317b5c727eb17178942012f57f0c37631ae4.

Please drop the RFC part from the commit message.

>Signed-off-by: Tim Wiederhake <twiederh(a)redhat.com>
>---
> scripts/check-spelling.py | 115 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 115 insertions(+)
> create mode 100755 scripts/check-spelling.py
>
>diff --git a/scripts/check-spelling.py b/scripts/check-spelling.py
>new file mode 100755
>index 0000000000..01371c0d1e
>--- /dev/null
>+++ b/scripts/check-spelling.py
>@@ -0,0 +1,115 @@
>+#!/usr/bin/env python3
>+
>+import argparse
>+import re
>+import subprocess
>+import os
>+
>+
>+IGNORE_LIST = [
>+    # ignore all translation files
>+    ("/po/", []),
>+
>+    # ignore this script
>+    ("/scripts/check-spelling.py", []),
>+
>+    # 3rd-party: keycodemapdb
>+    ("/src/keycodemapdb/", []),
>+
>+    # 3rd-party: VirtualBox SDK
>+    ("/src/vbox/vbox_CAPI", [
>+        "aAdd",
>+        "aCount",
>+        "aLocation",
>+        "aNumber",
>+        "aParent",
>+        "progess"]),
>+
>+    # 3rd-party: qemu
>+    ("/tests/qemucapabilitiesdata/caps_", "encyption"),

You can completely skip checking the files we got from the 3rd party.
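An empty word list already means "ignore everything under this path" (see the `/po/` and keycodemapdb entries above), so the third-party entries can simply use it too. Below is a minimal sketch of that matching, assuming a finding is a `(filename, line, word, suggestion)` tuple as the output template in `main()` implies. `should_ignore()` is a hypothetical stand-in, since the patch's `ignore()` is not quoted here:

```python
# Hypothetical sketch: an empty word list means "ignore every word found
# under this path prefix"; otherwise only the listed words are ignored.
IGNORE_LIST = [
    ("/po/", []),
    ("/src/keycodemapdb/", []),
    ("/src/vbox/vbox_CAPI", []),
    ("/tests/qemucapabilitiesdata/", []),
]


def should_ignore(filename, line, word, suggestion):
    """Return True if the finding should be dropped.

    `line` and `suggestion` are unused; they are only accepted so the
    function can be called as should_ignore(*finding).
    """
    for prefix, words in IGNORE_LIST:
        if not filename.startswith(prefix):
            continue
        if not words or word in words:
            return True
    return False
```

It would be called the same way as in `main()`: `[f for f in findings if not should_ignore(*f)]`.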

I'm also getting:
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 17966, "have"?
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 18659, "have"?
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 20619, "have"?
("/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.replies", "hace"), # line 20871, "have"?

>+
[..]
>+def main():
>+    parser = argparse.ArgumentParser(description="Check spelling")
>+    parser.add_argument(
>+        "dir",
>+        help="Path to source directory",
>+        type=os.path.realpath)
>+    args = parser.parse_args()
>+
>+    findings = [f for f in check_spelling(args.dir) if not ignore(*f)]
>+    if findings:
>+        template = "(\"{0}\", \"{2}\"),\t# line {1}, \"{3}\"?"
>+        for finding in findings:
>+            print(template.format(*finding))
>+        exit("error: %s spelling errors" % len(findings))
>+
>+
>+if __name__ == "__main__":
>+    main()

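For reference, each line this template prints is an `IGNORE_LIST`-style tuple followed by a comment. A small demonstration, assuming the finding tuple is ordered `(filename, line, word, suggestion)` as the placeholders imply; the sample tuple is one of the findings I report below:

```python
import ast

# Format string copied verbatim from main() in the patch.
template = "(\"{0}\", \"{2}\"),\t# line {1}, \"{3}\"?"

# Assumed field order, inferred from the template placeholders.
finding = ("/src/qemu/qemu_process.c", 1225, "wee", "we")

line = template.format(*finding)
print(line)  # ("/src/qemu/qemu_process.c", "wee"),<TAB># line 1225, "we"?

# Everything before the tab is a Python tuple literal plus a trailing comma,
# so the printed line can be pasted straight into IGNORE_LIST.
entry = ast.literal_eval(line.split("\t")[0].rstrip(","))
print(entry)  # ('/src/qemu/qemu_process.c', 'wee')
```

Note that a pasted entry carries the word as a plain string rather than a list, like the `caps_` entry quoted earlier.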
I'm also getting:
("/src/qemu/qemu_process.c", "wee"), # line 1225, "we"?
("/src/qemu/qemu_process.c", "wee"), # line 2369, "we"?
("/.git/logs/HEAD", "capablities"), # line 459, "capabilities"?

.git should be ignored completely too.
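If the script builds its own file list, pruning `.git` while walking the tree keeps it from ever being scanned. A minimal sketch; `iter_source_files()` and `SKIP_DIRS` are hypothetical names, and emitting paths relative to the source directory with a leading `/` is an assumption based on the paths in the output above:

```python
import os

# Hypothetical: directories that never contain files worth spell-checking.
SKIP_DIRS = {".git"}


def iter_source_files(srcdir):
    """Yield file paths relative to srcdir (with a leading "/"), skipping SKIP_DIRS."""
    for root, dirnames, filenames in os.walk(srcdir):
        # Editing dirnames in place stops os.walk() from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(root, name)
            yield "/" + os.path.relpath(path, srcdir)
```

Pruning at walk time, rather than filtering findings afterwards, also avoids reading the (potentially large) contents of `.git` at all.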
Reviewed-by: Ján Tomko <jtomko(a)redhat.com>
Jano