Skip to content

Add translation CI validation#4519

Open
Softer wants to merge 4 commits into
archlinux:masterfrom
Softer:feat-translation-ci
Open

Add translation CI validation#4519
Softer wants to merge 4 commits into
archlinux:masterfrom
Softer:feat-translation-ci

Conversation

@Softer
Copy link
Copy Markdown
Contributor

@Softer Softer commented May 4, 2026

Fix broken localization in global_menu.py:505: tr(f'Invalid configuration: {error}') evaluates the f-string before tr() runs, so the translation catalog never matches. Switched to tr('...{}').format(...) and updated the msgid in base.pot.

Add scripts/pot_tools.py developer utility: stats, list, add_missing [--dry-run] for managing base.pot.

Add translation-check CI workflow with two jobs:

  • validate-po (on .po changes): msgfmt --check on changed .po files, .mo sync warning. On .py changes: grep for tr(f"...") anti-pattern.
  • validate-pot (on .py changes): verify all tr() strings exist in base.pot.
  • Workflow triggers only on .py/.po/.pot file changes — no unnecessary runs.

Context

The previous translation-check workflow (added 2023, removed in #4483) failed because it validated all 35 languages on every PR, causing merge conflicts between translators. This new workflow checks only changed files and separates .po validation from .pot freshness checks.

Softer added 2 commits May 4, 2026 23:26
tr(f'Invalid configuration: {error}') evaluates the f-string before
tr() runs, so xgettext extracts the literal placeholder as the msgid
while runtime passes the formatted string - the two never match.
Switch to tr('...{}').format(...) and update msgid in base.pot.
Add translation-check workflow with two jobs:
- validate-po: msgfmt --check on changed .po files, .mo sync warning,
  tr(f-string) anti-pattern grep on changed .py files
- validate-pot: verify all tr() strings exist in base.pot when .py
  files change

Workflow only triggers on .py/.po/.pot file changes.

Add scripts/pot_tools.py developer utility (stats, list, add_missing)
for managing base.pot.
@Softer Softer requested a review from Torxed as a code owner May 4, 2026 20:30
@Softer
Copy link
Copy Markdown
Contributor Author

Softer commented May 4, 2026

I summon @svartkanin for discussion :-)

Align check_pot_freshness.py and pot_tools.py with project
indentation (tabs) and ruff format requirements.

Sorry :-)
@svartkanin
Copy link
Copy Markdown
Collaborator

I'll have a look at it when I get some time

Comment thread .github/scripts/check_pot_freshness.py Outdated
@@ -0,0 +1,56 @@
#!/usr/bin/env python3
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not roll our own parser, there is msgcmp to do this

Comment thread scripts/pot_tools.py Outdated
fresh_pot.unlink(missing_ok=True)


def cmd_add_missing(dry_run: bool = False) -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use manual appending msgcat and msgmerge

Comment thread scripts/pot_tools.py Outdated
BASE_POT = ARCHINSTALL_DIR / 'locales' / 'base.pot'


def extract_msgids(path: Path) -> set[str]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this implemented twice?
Also please do not do this. You are hand rolling your own parser which firstly is not trivital and secondly doesn't work as it doesn't handle multi line messages.

Comment thread scripts/pot_tools.py Outdated
return ids


def generate_fresh_pot() -> Path:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already part of

find . -type f -iname "*.py" | xargs xgettext --join-existing --no-location --omit-header --keyword='tr' -d base -o locales/base.pot

We shouldn't create multiple scripts for doing the same thing. Either we enhance the existing one or we replace it.

@svartkanin
Copy link
Copy Markdown
Collaborator

There are a lot of things happening here. As pointed out there is already a helper script for minimal things

find . -type f -iname "*.py" | xargs xgettext --join-existing --no-location --omit-header --keyword='tr' -d base -o locales/base.pot
. I'd prefer if we either enhance that or replace it.
But given that there are a lot of moving pieces can we please do this iteratively with multiple smaller PRs.

In addition, please do not roll your own parser. It is not trivial and prone to edge cases that are difficult to handle. There are already existing tools msg* available that allow you to do everything you want.

Address review feedback: use standard gettext msgcmp instead of
hand-rolled parser for base.pot freshness check. Remove pot_tools.py
that duplicated locales_generator.sh functionality.
@Softer
Copy link
Copy Markdown
Contributor Author

Softer commented May 13, 2026

Is it better now?

- 'archinstall/**/*.py'
- 'archinstall/locales/**'
- '.github/workflows/translation-check.yaml'
jobs:
Copy link
Copy Markdown
Collaborator

@svartkanin svartkanin May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify this even more, for example

jobs:
    translations:
        name: Validate translations
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6
            - name: Install gettext
              run: sudo apt-get update && sudo apt-get install -y gettext

            - name: Validate .po syntax
              run: |
                  set -e
                  find archinstall/locales -name '*.po' -print0 \
                      | xargs -0 -n1 msgfmt --check --output-file=/dev/null

            - name: Reject tr(f-string) anti-pattern
              run: |
                  if grep -rnE "tr\(\s*f['\"]" archinstall --include='*.py'; then
                      echo "::error::Use tr('...{}').format(...) instead of tr(f'...')"
                      exit 1
                  fi

            - name: Verify base.pot is up to date
              run: |
                  cp archinstall/locales/base.pot /tmp/committed.pot
                  ( cd archinstall && find . -type f -iname '*.py' \
                      | xargs xgettext --no-location --omit-header --keyword=tr \
                            -d base -o locales/base.pot )
                  if ! diff -q /tmp/committed.pot archinstall/locales/base.pot; then
                      echo "::error::base.pot is out of date - run archinstall/locales/locales_generator.sh all"
                      diff -u /tmp/committed.pot archinstall/locales/base.pot || true
                      exit 1
                  fi

All these checks are only in the GH action which makes them difficult to test, so I suggest to put these into the already existing locale_generator.sh file and call them from here. That means during dev they can actually be called and tested and there is a single source of truth.
At that point this whole action can be reduced to

jobs:
    translations:
        name: Validate translations
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6
            - name: Install gettext
              run: sudo apt-get update && sudo apt-get install -y gettext
            - name: Run translation checks
              run: bash archinstall/locales/locales_generator.sh check

and the checks implemented in the bash file instead. As an example you can then have this in the script

...
cmd_check_no_tr_fstring() {
	echo "Checking for tr(f-string) anti-pattern..."
	if grep -rnE "tr\(\s*f['\"]" . --include='*.py'; then
		echo "ERROR: use tr('...{}').format(...) instead of tr(f'...')" >&2
		return 1
	fi
}

cmd_check() {
	cmd_check_no_tr_fstring
     ...
	echo "All translation checks passed."
}

case "${1}" in
	check)    shift; cmd_check "$@" ;;
	generate) shift; cmd_generate "$@" ;;
	-h|--help) usage ;;
	*)        cmd_generate "$@" ;;  # back-compat: bare `<lang>` means generate
esac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants