ci: run unit tests on emerge/** PRs #444
Merged
The Unit tests workflow was gated to PRs targeting main, so it never ran on the emerge/temp_training lineage (e.g. #439). Enable it for emerge/** and port the suite, which was written against the main/puffer-4 config and layout:

- utest.yml: add emerge/** to the pull_request/push branch filters; install with `pip install -e .` under PUFFER_CPU=1 (this lineage has no [cpu] extra and builds the CUDA backend by default, so force the CPU build on the runner).
- test_drive_config.py: assert this lineage's loaded config defaults (rnn_name None, torch_deterministic False, policy backbone_hidden_size 512, rnn input/hidden 512, vec num_workers auto / num_envs 20).
- test_drive_map_types.py: the carla fixture directory is `carla`.
- tests/ini_parser: point CMakeLists at extern/inih-r62 (the vendored source location here) and add `set -euo pipefail` to build_n_test.sh so a build or ctest failure actually fails the step instead of exiting 0.

Verified locally on a CPU build: ini-parser 4/4, test_drive_config 4 pass / 1 skip, map-types 5/5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
This PR extends the unit-test CI workflow so it also runs for PRs and pushes targeting emerge/** branches, and updates the unit test suite to match the emerge lineage’s current config/layout so those checks pass and provide signal on that branch family.
Changes:
- Update `.github/workflows/utest.yml` to trigger on `main` and `emerge/**`, and adjust install/build steps for this lineage.
- Update config-driven unit test expectations in `tests/test_drive_config.py` to reflect current defaults (e.g., `torch_deterministic`, policy/RNN sizes, vec defaults).
- Fix map fixture name and ini-parser test wiring (`tests/test_drive_map_types.py`, `tests/ini_parser/*`), including stricter shell error handling.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `.github/workflows/utest.yml` | Run unit tests on main + emerge/**; adjust install/build environment for CI. |
| `tests/test_drive_config.py` | Update expected loaded config defaults for this branch lineage. |
| `tests/test_drive_map_types.py` | Update Carla fixture subdirectory name used by the smoke test. |
| `tests/ini_parser/CMakeLists.txt` | Point ini-parser CMake test to the inih vendoring location used by this lineage. |
| `tests/ini_parser/build_n_test.sh` | Make the ini-parser build/test script fail-fast (`set -euo pipefail`). |
Comment on lines +41 to 46

    pip install -e . --no-cache-dir
    env:
      TMPDIR: ${{ runner.temp }}/build
      PIP_NO_CACHE_DIR: 1
      PUFFER_CPU: 1

Comment on lines 47 to +50

    - name: Compile C extensions
      run: python setup.py build_ext --inplace --force
      env:
        PUFFER_CPU: 1
The value pins (num_agents, hidden sizes, vec counts, rnn_name, torch_deterministic, ...) re-encoded a snapshot of drive.ini/default.ini: they broke on every routine config tune and silently lagged schema changes (e.g. the [policy] hidden_size -> backbone_hidden_size rename). They tested the config file, not the loader. Keep the loader/parser behavior tests that don't rot: test_load_config (loads to a populated dict without raising), test_cli_override (CLI overrides ini), and the comment-handling tests. Remove the all-pins test_drive_ini_config and its never-executed ASSERTION_LEVEL>=3 tier. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
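As a rough illustration of the distinction drawn above (testing the loader rather than the config file), the sketch below checks loader behavior without pinning any value from the ini. It is not PufferLib's loader: `load_config`, the `--section.key` override syntax, and `EXAMPLE_INI` are hypothetical stand-ins built on the standard-library `configparser` and `argparse`.

```python
import argparse
import configparser
import tempfile
from pathlib import Path

# Hypothetical ini content; the checks below never assert on these values.
EXAMPLE_INI = """\
[env]
num_agents = 8 ; inline comment
# full-line comment
[train]
learning_rate = 0.001
"""


def load_config(ini_path, argv=()):
    """Stand-in loader (an assumption): parse the ini, then apply --section.key overrides."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
    with open(ini_path) as f:
        parser.read_file(f)
    config = {name: dict(parser[name]) for name in parser.sections()}

    # Expose every ini key as an optional CLI flag so the CLI can override it.
    cli = argparse.ArgumentParser()
    for section, values in config.items():
        for key in values:
            dotted = f"{section}.{key}"
            cli.add_argument(f"--{dotted}", dest=dotted, default=None)
    for dotted, value in vars(cli.parse_args(list(argv))).items():
        if value is not None:
            section, key = dotted.split(".", 1)
            config[section][key] = value
    return config


def check_loads_populated(path):
    # Loads without raising and every section has at least one key.
    config = load_config(path)
    assert config and all(config[section] for section in config)


def check_cli_override(path):
    # Pick whichever key comes first, so a config tune cannot break the test.
    base = load_config(path)
    section, key = next((s, k) for s in base for k in base[s])
    overridden = load_config(path, [f"--{section}.{key}", "sentinel"])
    assert overridden[section][key] == "sentinel"


def check_comments_stripped(path):
    # No comment marker may leak into a parsed value.
    config = load_config(path)
    values = [v for section in config.values() for v in section.values()]
    assert all(";" not in v and "#" not in v for v in values)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ini = Path(tmp) / "example.ini"
        ini.write_text(EXAMPLE_INI)
        check_loads_populated(ini)
        check_cli_override(ini)
        check_comments_stripped(ini)
```

Because none of the checks name a specific key or value, retuning the ini or renaming a key leaves them passing, which is the property the commit message is after.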
Wire the remaining Drive/ocean tests into CI for emerge/** and fix their
lineage drift (they targeted the main/puffer-4 paths and APIs):
- utest.yml: add the previously-orphaned pytest suites
(test_drive_scenario_length, test_eval_manager, test_validation_replay_html,
ocean/benchmark/{geometry,map_metrics,road_edges,ttc}).
- render-ci.yml / train-ci.yml: add emerge/** to the branch filters so
test_drive_render and test_drive_train run in the headless-render / training
environments those workflows already provide; install `pip install -e .` under
PUFFER_CPU=1 (no [cpu] extra on this lineage).
- test_drive_train: map_dir -> pufferlib/resources/drive/binaries/carla (the
resources live under pufferlib/ here).
- test_eval_manager: _run_rollout_loop now reads args["env"]; give the
done-state test an env section.
- test_validation_replay_html: render_backend html -> triage_html (renamed).
Excluded: test_c_advantage.cu (CUDA, needs a GPU runner) and test_env_binding
(a breakout test, not Drive).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These cover other environments (atari, pokemon_red, nmmo3, squared/pong, ...) and generic PufferLib infra (emulation flatten/namespace/nested, vectorization pool, sweep, policy_pool, record/rich/utils, ...) that the Drive/ocean work doesn't use. Keeps the Drive/ocean suite: test_drive_*, test_eval_manager, test_validation_replay_html, test_simulator_perf, ini_parser/, drive/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The standalone visualize binary is retired, so the old test_drive_render (which built and ran ./visualize) exercised dead code. Replace it with a smoke test of the modern egl -> ffmpeg mp4 render backend through the Python evaluator. EGL is Linux-only, so the test skips on macOS; render-ci.yml is repurposed to run it on Linux with ffmpeg + a headless GL stack, on main and emerge/**. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
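A minimal sketch of the platform gate described above, assuming the test decides whether to skip from the platform name and from `ffmpeg` being on `PATH`. The helper name `egl_mp4_skip_reason` and the `ffmpeg` check are illustrative assumptions, not code from this PR.

```python
import shutil
import sys


def egl_mp4_skip_reason(platform=sys.platform, which=shutil.which):
    """Return a skip reason, or None when the egl -> ffmpeg path can be attempted.

    `platform` and `which` are injectable so the gate itself can be unit tested.
    """
    if not platform.startswith("linux"):
        return "EGL render backend is Linux-only"
    if which("ffmpeg") is None:
        return "ffmpeg not found on PATH"
    return None


if __name__ == "__main__":
    reason = egl_mp4_skip_reason()
    print("would run" if reason is None else f"would skip: {reason}")
```

In a pytest suite the result could feed `pytest.mark.skipif(egl_mp4_skip_reason() is not None, reason=...)`. Note that this gate only says the backend can be attempted; it does not show that the GL stack renders real frames.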
Stop at target_steps=2000 (was 50000) — a handful of optimizer updates is enough to exercise env -> rollout -> optimize end-to-end — and point map_dir at pufferlib/resources/drive/binaries/carla (resources live under pufferlib/ on this lineage; the old relative path no longer resolves). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
puffernet is PufferLib's hand-written C neural-net inference, the same C path the now-retired visualize binary used; nothing in the Drive training/eval stack (all torch) exercises it. No workflow ran it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parametrize the HTML render smoke over both CPU-only HTML backends (triage_html, obs_html). obs_html's viz unpacks the NN observation, so the env config now also carries the obs-shape keys (action_type, dynamics_model, target_type, obs maxima); the env is built with those same values so the two agree. Assert the largest produced HTML is non-trivial, since obs_html also writes a small index.html. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
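The "largest produced HTML" assertion can be sketched as follows. Only the backend names `triage_html` and `obs_html` come from the commit message; `fake_render`, `largest_html_size`, and the `MIN_HTML_BYTES` threshold are hypothetical placeholders for the real evaluator call and whatever bound the test actually uses.

```python
import tempfile
from pathlib import Path

HTML_BACKENDS = ("triage_html", "obs_html")  # backend names from the commit message
MIN_HTML_BYTES = 1024  # hypothetical "non-trivial" threshold


def largest_html_size(out_dir):
    """Size in bytes of the largest .html file under out_dir, or 0 if there is none."""
    sizes = [p.stat().st_size for p in Path(out_dir).rglob("*.html")]
    return max(sizes, default=0)


def fake_render(backend, out_dir):
    """Stand-in for the real render call (an assumption): writes one large replay page,
    plus a small index.html for obs_html, mirroring the behavior described above."""
    out = Path(out_dir)
    (out / "replay.html").write_text("<html>" + "x" * 4096 + "</html>")
    if backend == "obs_html":
        (out / "index.html").write_text("<html></html>")


def check_backend(backend):
    with tempfile.TemporaryDirectory() as tmp:
        fake_render(backend, tmp)
        return largest_html_size(tmp)


if __name__ == "__main__":
    for backend in HTML_BACKENDS:
        assert check_backend(backend) >= MIN_HTML_BYTES
```

Checking the maximum rather than every file keeps the assertion valid for a backend that also emits a small index page.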
This lineage's Drive policy has no single hidden_size; it uses separate backbone/actor/critic hidden sizes. Set those (64) instead of hidden_size=64, which raised Drive.__init__() got an unexpected keyword argument 'hidden_size' and then hung the run until the 20-minute timeout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
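One way to surface this kind of keyword mismatch before a long run starts is to compare the config keys against the constructor signature. The sketch below is illustrative only: `ToyDrivePolicy` is a hypothetical class, and `actor_hidden_size` / `critic_hidden_size` are assumed names (the source names only `backbone_hidden_size` explicitly).

```python
import inspect


class ToyDrivePolicy:
    """Hypothetical policy with separate hidden sizes and no single hidden_size."""

    def __init__(self, backbone_hidden_size=512, actor_hidden_size=512, critic_hidden_size=512):
        self.backbone_hidden_size = backbone_hidden_size
        self.actor_hidden_size = actor_hidden_size
        self.critic_hidden_size = critic_hidden_size


def unexpected_kwargs(cls, kwargs):
    """Return the keys in kwargs that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter accepts anything, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


if __name__ == "__main__":
    print(unexpected_kwargs(ToyDrivePolicy, {"hidden_size": 64}))
    print(unexpected_kwargs(ToyDrivePolicy, {"backbone_hidden_size": 64}))
```

Running such a check in the test setup would report the stale key by name rather than leaving the failure to surface inside the run.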
The egl mp4 render initialized on the GH runner but produced an empty mp4 — software Mesa EGL on a CPU-only runner isn't rendering real frames for raylib, and the renderer also looks for assets at resources/drive/* (a main-era relative path; this lineage's assets live under pufferlib/resources/drive/). Getting it green would need a working headless GL stack (and likely a GPU runner), which isn't worth pursuing on free CI here. The render *pipeline* is already covered by the HTML backends (triage_html, obs_html) in utest; egl mp4 stays a local/GPU check. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
vcharraut approved these changes May 26, 2026
Why
The Unit tests workflow (`utest.yml`), which covers the C ini-parser, `test_drive_config.py`, and the carla/nuplan/womd map smoke test, only triggered on PRs targeting `main`.

We also update the tests to pass.