Add translation CI validation#4519
Conversation
tr(f'Invalid configuration: {error}') evaluates the f-string before
tr() runs, so xgettext extracts the literal placeholder as the msgid
while runtime passes the formatted string - the two never match.
Switch to tr('...{}').format(...) and update msgid in base.pot.
Add translation-check workflow with two jobs: - validate-po: msgfmt --check on changed .po files, .mo sync warning, tr(f-string) anti-pattern grep on changed .py files - validate-pot: verify all tr() strings exist in base.pot when .py files change Workflow only triggers on .py/.po/.pot file changes. Add scripts/pot_tools.py developer utility (stats, list, add_missing) for managing base.pot.
|
I summon @svartkanin for discussion :-) |
Align check_pot_freshness.py and pot_tools.py with project indentation (tabs) and ruff format requirements. Sorry :-)
|
I'll have a look at it when I get some time |
| @@ -0,0 +1,56 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
Lets not roll our own parser, there is msgcmp to do this
| fresh_pot.unlink(missing_ok=True) | ||
|
|
||
|
|
||
| def cmd_add_missing(dry_run: bool = False) -> None: |
There was a problem hiding this comment.
Don't use manual appending msgcat and msgmerge
| BASE_POT = ARCHINSTALL_DIR / 'locales' / 'base.pot' | ||
|
|
||
|
|
||
| def extract_msgids(path: Path) -> set[str]: |
There was a problem hiding this comment.
why is this implemented twice?
Also please do not do this. You are hand rolling your own parser which firstly is not trivital and secondly doesn't work as it doesn't handle multi line messages.
| return ids | ||
|
|
||
|
|
||
| def generate_fresh_pot() -> Path: |
There was a problem hiding this comment.
This is already part of
We shouldn't create multiple scripts for doing the same thing. Either we enhance the existing one or we replace it.
|
There are a lot of things happening here. As pointed out there is already a helper script for minimal things . I'd prefer if we either enhance that or replace it.But given that there are a lot of moving pieces can we please do this iteratively with multiple smaller PRs. In addition, please do not roll your own parser. It is not trivial and prone to edge cases that are difficult to handle. There are already existing tools |
Address review feedback: use standard gettext msgcmp instead of hand-rolled parser for base.pot freshness check. Remove pot_tools.py that duplicated locales_generator.sh functionality.
|
Is it better now? |
| - 'archinstall/**/*.py' | ||
| - 'archinstall/locales/**' | ||
| - '.github/workflows/translation-check.yaml' | ||
| jobs: |
There was a problem hiding this comment.
We can simplify this even more, for example
jobs:
translations:
name: Validate translations
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Install gettext
run: sudo apt-get update && sudo apt-get install -y gettext
- name: Validate .po syntax
run: |
set -e
find archinstall/locales -name '*.po' -print0 \
| xargs -0 -n1 msgfmt --check --output-file=/dev/null
- name: Reject tr(f-string) anti-pattern
run: |
if grep -rnE "tr\(\s*f['\"]" archinstall --include='*.py'; then
echo "::error::Use tr('...{}').format(...) instead of tr(f'...')"
exit 1
fi
- name: Verify base.pot is up to date
run: |
cp archinstall/locales/base.pot /tmp/committed.pot
( cd archinstall && find . -type f -iname '*.py' \
| xargs xgettext --no-location --omit-header --keyword=tr \
-d base -o locales/base.pot )
if ! diff -q /tmp/committed.pot archinstall/locales/base.pot; then
echo "::error::base.pot is out of date - run archinstall/locales/locales_generator.sh all"
diff -u /tmp/committed.pot archinstall/locales/base.pot || true
exit 1
fi
All these checks are only in the GH action which makes them difficult to test, so I suggest to put these into the already existing locale_generator.sh file and call them from here. That means during dev they can actually be called and tested and there is a single source of truth.
At that point this whole action can be reduced to
jobs:
translations:
name: Validate translations
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
- name: Install gettext
run: sudo apt-get update && sudo apt-get install -y gettext
- name: Run translation checks
run: bash archinstall/locales/locales_generator.sh check
and the checks implemented in the bash file instead. As an example you can then have this in the script
...
cmd_check_no_tr_fstring() {
echo "Checking for tr(f-string) anti-pattern..."
if grep -rnE "tr\(\s*f['\"]" . --include='*.py'; then
echo "ERROR: use tr('...{}').format(...) instead of tr(f'...')" >&2
return 1
fi
}
cmd_check() {
cmd_check_no_tr_fstring
...
echo "All translation checks passed."
}
case "${1}" in
check) shift; cmd_check "$@" ;;
generate) shift; cmd_generate "$@" ;;
-h|--help) usage ;;
*) cmd_generate "$@" ;; # back-compat: bare `<lang>` means generate
esac
Fix broken localization in
global_menu.py:505:tr(f'Invalid configuration: {error}')evaluates the f-string beforetr()runs, so the translation catalog never matches. Switched totr('...{}').format(...)and updated the msgid inbase.pot.Add
scripts/pot_tools.pydeveloper utility:stats,list,add_missing [--dry-run]for managingbase.pot.Add
translation-checkCI workflow with two jobs:msgfmt --checkon changed.pofiles,.mosync warning. On .py changes: grep fortr(f"...")anti-pattern.tr()strings exist inbase.pot..py/.po/.potfile changes — no unnecessary runs.Context
The previous translation-check workflow (added 2023, removed in #4483) failed because it validated all 35 languages on every PR, causing merge conflicts between translators. This new workflow checks only changed files and separates
.povalidation from.potfreshness checks.