ci: run unit tests on emerge/** PRs by eugenevinitsky · Pull Request #444 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-05-25T23:33:42Z

Why

The Unit tests workflow (utest.yml), which covers the C ini-parser, test_drive_config.py, and the carla/nuplan/womd map smoke test, only triggered on PRs targeting main:

on:
  pull_request:
    branches: [ main ]

This PR also updates the tests so they pass on this lineage.
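To make the effect of the branch filter concrete, here is a rough sketch of how a `branches:` list decides whether a PR's base branch triggers the workflow. It approximates only the `*` and `**` wildcards of the GitHub Actions filter syntax (`*` stops at `/`, `**` does not). The helper names are hypothetical, and this is not GitHub's actual matcher.

```python
import re


def branch_filter_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified branch filter into a regex.

    Only `*` and `**` are handled; other filter characters
    (`?`, `+`, `[]`, `!`) are treated as literals in this sketch.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # `**` may cross `/`
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # `*` stops at `/`
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def workflow_triggers(base_branch: str, filters: list) -> bool:
    """True if any filter matches the whole base branch name."""
    return any(branch_filter_to_regex(f).fullmatch(base_branch) for f in filters)


OLD_FILTERS = ["main"]
NEW_FILTERS = ["main", "emerge/**"]

print(workflow_triggers("emerge/temp_training", OLD_FILTERS))
print(workflow_triggers("emerge/temp_training", NEW_FILTERS))
```

Under the old filter a PR into emerge/temp_training matches nothing, which is why the suite never ran there.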

The Unit tests workflow was gated to PRs targeting main, so it never ran on the emerge/temp_training lineage (e.g. #439). Enable it for emerge/** and port the suite, which was written against the main/puffer-4 config and layout:

- utest.yml: add emerge/** to the pull_request/push branch filters; install with `pip install -e .` under PUFFER_CPU=1 (this lineage has no [cpu] extra and builds the CUDA backend by default, so force the CPU build on the runner).
- test_drive_config.py: assert this lineage's loaded config defaults (rnn_name None, torch_deterministic False, policy backbone_hidden_size 512, rnn input/hidden 512, vec num_workers auto / num_envs 20).
- test_drive_map_types.py: the carla fixture directory is `carla`.
- tests/ini_parser: point CMakeLists at extern/inih-r62 (the vendored source location here) and add `set -euo pipefail` to build_n_test.sh so a build or ctest failure actually fails the step instead of exiting 0.

Verified locally on a CPU build: ini-parser 4/4, test_drive_config 4 pass / 1 skip, map-types 5/5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
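The `set -euo pipefail` point can be checked with a small sketch. It assumes `bash` is on PATH and uses `false` as a stand-in for a failing cmake or ctest call. The script bodies are illustrative, not the contents of build_n_test.sh.

```python
import subprocess


def run_script(body: str) -> int:
    """Run a bash snippet and return its exit status."""
    return subprocess.run(["bash", "-c", body], capture_output=True).returncode


# `false` stands in for a failing build or test command.
LENIENT = "false\necho 'tests done'"
STRICT = "set -euo pipefail\n" + LENIENT

# Without `set -e`, the script's status is that of its last command (echo),
# so the failure is masked. With it, bash exits at the first failing command.
print("lenient exit status:", run_script(LENIENT))
print("strict exit status:", run_script(STRICT))
```

A CI step only sees the script's exit status, so the lenient form reports success even though the stand-in build command failed.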

Copilot

Pull request overview

This PR extends the unit-test CI workflow so it also runs for PRs and pushes targeting emerge/** branches, and updates the unit test suite to match the emerge lineage’s current config/layout so those checks pass and provide signal on that branch family.

Changes:

Update .github/workflows/utest.yml to trigger on main and emerge/**, and adjust install/build steps for this lineage.
Update config-driven unit test expectations in tests/test_drive_config.py to reflect current defaults (e.g., torch_deterministic, policy/RNN sizes, vec defaults).
Fix map fixture name and ini-parser test wiring (tests/test_drive_map_types.py, tests/ini_parser/*) including stricter shell error handling.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`.github/workflows/utest.yml`	Run unit tests on `main` + `emerge/**`; adjust install/build environment for CI.
`tests/test_drive_config.py`	Update expected loaded config defaults for this branch lineage.
`tests/test_drive_map_types.py`	Update Carla fixture subdirectory name used by the smoke test.
`tests/ini_parser/CMakeLists.txt`	Point ini-parser CMake test to the inih vendoring location used by this lineage.
`tests/ini_parser/build_n_test.sh`	Make the ini-parser build/test script fail-fast (`set -euo pipefail`).

+          pip install -e . --no-cache-dir
        env:
          TMPDIR: ${{ runner.temp }}/build
          PIP_NO_CACHE_DIR: 1
+          PUFFER_CPU: 1



      - name: Compile C extensions
        run: python setup.py build_ext --inplace --force
+        env:
+          PUFFER_CPU: 1


The value pins (num_agents, hidden sizes, vec counts, rnn_name, torch_deterministic, ...) re-encoded a snapshot of drive.ini/default.ini: they broke on every routine config tune and silently lagged schema changes (e.g. the [policy] hidden_size -> backbone_hidden_size rename). They tested the config file, not the loader.

Keep the loader/parser behavior tests that don't rot: test_load_config (loads to a populated dict without raising), test_cli_override (CLI overrides ini), and the comment-handling tests. Remove the all-pins test_drive_ini_config and its never-executed ASSERTION_LEVEL>=3 tier.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
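As an illustration of the distinction, the sketch below tests loader behavior against a small inline fixture rather than pinning values from drive.ini. `load_config` here is a hypothetical stand-in built on the standard-library `configparser`, not the project's loader. The override format (`"section.key"`) is also an assumption, and only the test names echo the ones above.

```python
import configparser


def load_config(ini_text, cli_overrides=None):
    """Hypothetical stand-in loader: parse ini text into nested dicts,
    then apply CLI overrides given as {"section.key": value}."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
    parser.read_string(ini_text)
    config = {name: dict(parser[name]) for name in parser.sections()}
    for dotted, value in (cli_overrides or {}).items():
        section, key = dotted.split(".", 1)
        config.setdefault(section, {})[key] = str(value)
    return config


# Inline fixture: the tests own these values, so tuning the real ini
# file cannot break them.
SAMPLE_INI = """
[policy]
; full-line comment
backbone_hidden_size = 512  ; inline comment
"""


def test_load_config():
    config = load_config(SAMPLE_INI)
    assert config and all(config.values())  # populated, no exception


def test_cli_override():
    config = load_config(SAMPLE_INI, {"policy.backbone_hidden_size": 64})
    assert config["policy"]["backbone_hidden_size"] == "64"


def test_comments_stripped():
    config = load_config(SAMPLE_INI)
    assert config["policy"]["backbone_hidden_size"] == "512"
```

These tests fail only when parsing, override precedence, or comment handling changes, which is the behavior the loader is responsible for.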

Wire the remaining Drive/ocean tests into CI for emerge/** and fix their lineage drift (they targeted the main/puffer-4 paths and APIs):

- utest.yml: add the previously-orphaned pytest suites (test_drive_scenario_length, test_eval_manager, test_validation_replay_html, ocean/benchmark/{geometry,map_metrics,road_edges,ttc}).
- render-ci.yml / train-ci.yml: add emerge/** to the branch filters so test_drive_render and test_drive_train run in the headless-render / training environments those workflows already provide; install `pip install -e .` under PUFFER_CPU=1 (no [cpu] extra on this lineage).
- test_drive_train: map_dir -> pufferlib/resources/drive/binaries/carla (the resources live under pufferlib/ here).
- test_eval_manager: _run_rollout_loop now reads args["env"]; give the done-state test an env section.
- test_validation_replay_html: render_backend html -> triage_html (renamed).

Excluded: test_c_advantage.cu (CUDA, needs a GPU runner) and test_env_binding (a breakout test, not Drive).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

These cover other environments (atari, pokemon_red, nmmo3, squared/pong, ...) and generic PufferLib infra (emulation flatten/namespace/nested, vectorization pool, sweep, policy_pool, record/rich/utils, ...) that the Drive/ocean work doesn't use. Keeps the Drive/ocean suite: test_drive_*, test_eval_manager, test_validation_replay_html, test_simulator_perf, ini_parser/, drive/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The standalone visualize binary is retired, so the old test_drive_render (which built and ran ./visualize) exercised dead code. Replace it with a smoke test of the modern egl -> ffmpeg mp4 render backend through the Python evaluator. EGL is Linux-only, so the test skips on macOS; render-ci.yml is repurposed to run it on Linux with ffmpeg + a headless GL stack, on main and emerge/**.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
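One way to express the platform skip is a small helper that returns a skip reason, sketched below. The function name is hypothetical, and the ffmpeg check is an assumption added for illustration; the message above only states that the test skips on macOS.

```python
import shutil
import sys


def egl_mp4_skip_reason(platform=sys.platform, ffmpeg_path=None):
    """Return why the egl -> mp4 smoke test should be skipped, or None to run it."""
    if not platform.startswith("linux"):
        return "EGL render backend is Linux-only"
    # Assumed extra guard: the mp4 encode step needs an ffmpeg binary.
    if (ffmpeg_path or shutil.which("ffmpeg")) is None:
        return "ffmpeg not found on PATH"
    return None


# In a pytest module this could feed a skip marker, for example:
#   reason = egl_mp4_skip_reason()
#   pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")
print(egl_mp4_skip_reason("darwin"))
```

Keeping the decision in a plain function makes the skip logic itself testable on any platform.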

Stop at target_steps=2000 (was 50000), since a handful of optimizer updates is enough to exercise env -> rollout -> optimize end-to-end, and point map_dir at pufferlib/resources/drive/binaries/carla (resources live under pufferlib/ on this lineage; the old relative path no longer resolves).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

puffernet is PufferLib's hand-written C neural-net inference, the same C path the now-retired visualize binary used; nothing in the Drive training/eval stack (all torch) exercises it. No workflow ran it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Parametrize the HTML render smoke over both CPU-only HTML backends (triage_html, obs_html). obs_html's viz unpacks the NN observation, so the env config now also carries the obs-shape keys (action_type, dynamics_model, target_type, obs maxima); the env is built with those same values so the two agree. Assert the largest produced HTML is non-trivial, since obs_html also writes a small index.html.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
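The "largest HTML is non-trivial" check could look roughly like the sketch below. The helper name and the byte threshold are hypothetical, and the actual test may locate its output files differently.

```python
from pathlib import Path

# Hypothetical threshold; the real test's cutoff is not stated here.
MIN_HTML_BYTES = 10_000


def largest_html_size(out_dir) -> int:
    """Return the byte size of the largest .html file under out_dir (0 if none)."""
    return max((p.stat().st_size for p in Path(out_dir).rglob("*.html")), default=0)


# In a test parametrized over ("triage_html", "obs_html"), the final
# assertion would then be something like:
#   assert largest_html_size(output_dir) > MIN_HTML_BYTES
```

Checking the largest file rather than any file means a small index.html alone does not satisfy the assertion.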

This lineage's Drive policy has no single hidden_size; it uses separate backbone/actor/critic hidden sizes. Set those (64) instead of hidden_size=64, which raised `Drive.__init__() got an unexpected keyword argument 'hidden_size'` and then hung the run until the 20-minute timeout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
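The failure mode is ordinary Python keyword checking, which the sketch below reproduces with a stand-in class. `DrivePolicySketch`, the actor/critic parameter names, and the default values are assumptions for illustration. Only backbone_hidden_size and the error text come from this PR.

```python
class DrivePolicySketch:
    """Hypothetical stand-in for a policy with per-component hidden sizes."""

    def __init__(self, backbone_hidden_size=512, actor_hidden_size=512,
                 critic_hidden_size=512):
        # Defaults are placeholders, not the project's values.
        self.sizes = (backbone_hidden_size, actor_hidden_size, critic_hidden_size)


def try_build(**policy_kwargs):
    """Return (policy, None) on success or (None, error message) on a bad keyword."""
    try:
        return DrivePolicySketch(**policy_kwargs), None
    except TypeError as err:
        return None, str(err)


_, error = try_build(hidden_size=64)  # old main-era keyword
print(error)

policy, _ = try_build(backbone_hidden_size=64, actor_hidden_size=64,
                      critic_hidden_size=64)
print(policy.sizes)
```

Validating the keyword set in the test process before launching training would surface this error immediately instead of after a timeout.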

The egl mp4 render initialized on the GH runner but produced an empty mp4: software Mesa EGL on a CPU-only runner does not render real frames for raylib, and the renderer also looks for assets at resources/drive/* (a main-era relative path; this lineage's assets live under pufferlib/resources/drive/). Getting it green would need a working headless GL stack (and likely a GPU runner), which isn't worth pursuing on free CI here. The render *pipeline* is already covered by the HTML backends (triage_html, obs_html) in utest; egl mp4 stays a local/GPU check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 25, 2026 23:33

Copilot started reviewing on behalf of eugenevinitsky May 25, 2026 23:33

Copilot AI reviewed May 25, 2026

Eugene Vinitsky and others added 9 commits May 25, 2026 20:02

vcharraut approved these changes May 26, 2026

eugenevinitsky merged commit ba4ac7c into emerge/temp_training May 26, 2026
13 checks passed

eugenevinitsky deleted the ev/ci-utest-emerge branch May 26, 2026 11:37

eugenevinitsky merged 10 commits into emerge/temp_training from ev/ci-utest-emerge

3 participants
