ci: run unit tests on emerge/** PRs #444

Merged
eugenevinitsky merged 10 commits into emerge/temp_training from ev/ci-utest-emerge
May 26, 2026

Conversation

@eugenevinitsky eugenevinitsky commented May 25, 2026

Why

The Unit tests workflow (utest.yml) — C ini-parser, test_drive_config.py, and the carla/nuplan/womd map smoke test — only triggered on PRs targeting main:

on:
  pull_request:
    branches: [ main ]

This PR also updates the tests so that they pass on this lineage.

The Unit tests workflow was gated to PRs targeting main, so it never ran on
the emerge/temp_training lineage (e.g. #439). Enable it for emerge/** and port
the suite, which was written against the main/puffer-4 config and layout:

- utest.yml: add emerge/** to the pull_request/push branch filters; install
  with `pip install -e .` under PUFFER_CPU=1 (this lineage has no [cpu] extra
  and builds the CUDA backend by default, so force the CPU build on the runner).
- test_drive_config.py: assert this lineage's loaded config defaults
  (rnn_name None, torch_deterministic False, policy backbone_hidden_size 512,
  rnn input/hidden 512, vec num_workers auto / num_envs 20).
- test_drive_map_types.py: the carla fixture directory is `carla`.
- tests/ini_parser: point CMakeLists at extern/inih-r62 (the vendored source
  location here) and add `set -euo pipefail` to build_n_test.sh so a build or
  ctest failure actually fails the step instead of exiting 0.

Verified locally on a CPU build: ini-parser 4/4, test_drive_config 4 pass /
1 skip, map-types 5/5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
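
The effect of adding `set -euo pipefail` to build_n_test.sh can be illustrated with a small sketch. The script body below is a hypothetical stand-in, not the contents of build_n_test.sh, and the sketch assumes `bash` is on the PATH. It only shows how the exit status changes when an early step fails.

```python
import subprocess


def run_script(body: str) -> int:
    """Run a bash script body and return its exit status."""
    return subprocess.run(["bash", "-c", body], capture_output=True).returncode


# Hypothetical script: the first step fails, the second succeeds.
steps = "false\necho 'later step ran anyway'"

# Without fail-fast, bash reports the status of the last command (0).
lenient = run_script(steps)

# With fail-fast, bash stops at the first failing command and exits non-zero.
strict = run_script("set -euo pipefail\n" + steps)

print(lenient, strict)
```

A CI step only fails when its command exits non-zero, so the lenient variant would report success even though the first step failed.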
Copilot AI review requested due to automatic review settings May 25, 2026 23:33

Copilot AI left a comment

Pull request overview

This PR extends the unit-test CI workflow so it also runs for PRs and pushes targeting emerge/** branches, and updates the unit test suite to match the emerge lineage’s current config/layout so those checks pass and provide signal on that branch family.

Changes:

  • Update .github/workflows/utest.yml to trigger on main and emerge/**, and adjust install/build steps for this lineage.
  • Update config-driven unit test expectations in tests/test_drive_config.py to reflect current defaults (e.g., torch_deterministic, policy/RNN sizes, vec defaults).
  • Fix map fixture name and ini-parser test wiring (tests/test_drive_map_types.py, tests/ini_parser/*) including stricter shell error handling.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/utest.yml | Run unit tests on main + emerge/**; adjust install/build environment for CI. |
| tests/test_drive_config.py | Update expected loaded config defaults for this branch lineage. |
| tests/test_drive_map_types.py | Update Carla fixture subdirectory name used by the smoke test. |
| tests/ini_parser/CMakeLists.txt | Point ini-parser CMake test to the inih vendoring location used by this lineage. |
| tests/ini_parser/build_n_test.sh | Make the ini-parser build/test script fail-fast (set -euo pipefail). |

Comment on lines +41 to 46
    pip install -e . --no-cache-dir
  env:
    TMPDIR: ${{ runner.temp }}/build
    PIP_NO_CACHE_DIR: 1
    PUFFER_CPU: 1

Comment on lines 47 to +50
- name: Compile C extensions
  run: python setup.py build_ext --inplace --force
  env:
    PUFFER_CPU: 1
Eugene Vinitsky and others added 9 commits May 25, 2026 20:02
The value pins (num_agents, hidden sizes, vec counts, rnn_name,
torch_deterministic, ...) re-encoded a snapshot of drive.ini/default.ini: they
broke on every routine config tune and silently lagged schema changes (e.g. the
[policy] hidden_size -> backbone_hidden_size rename). They tested the config
file, not the loader.

Keep the loader/parser behavior tests that don't rot: test_load_config (loads
to a populated dict without raising), test_cli_override (CLI overrides ini), and
the comment-handling tests. Remove the all-pins test_drive_ini_config and its
never-executed ASSERTION_LEVEL>=3 tier.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
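
The distinction drawn in the commit above, testing the loader rather than the config file, can be sketched as follows. This is a minimal illustration, not the repository's code: `load_config`, the `--section.key=value` override syntax, and the ini contents are all hypothetical. The point is that the tests own a throwaway fixture, so tuning drive.ini cannot break them.

```python
import configparser
import tempfile
from pathlib import Path


def load_config(ini_path, argv=()):
    """Hypothetical loader: read an ini file, then apply CLI overrides."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
    parser.read(ini_path)
    config = {name: dict(parser[name]) for name in parser.sections()}
    for arg in argv:  # assumed form: --section.key=value
        key, value = arg.lstrip("-").split("=", 1)
        section, option = key.split(".", 1)
        config.setdefault(section, {})[option] = value
    return config


def check_load_config(ini_path):
    # Behavior: loading yields populated sections without raising.
    config = load_config(ini_path)
    assert config and all(config.values())


def check_cli_override(ini_path):
    # Behavior: a CLI value wins over the ini value.
    config = load_config(ini_path, ["--train.seed=7"])
    assert config["train"]["seed"] == "7"


def check_inline_comment(ini_path):
    # Behavior: trailing comments are stripped from values.
    assert load_config(ini_path)["train"]["seed"] == "1"


with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "fixture.ini"
    fixture.write_text("[train]\nseed = 1  ; default seed\n[env]\nnum_agents = 32\n")
    check_load_config(fixture)
    check_cli_override(fixture)
    check_inline_comment(fixture)
    loaded = load_config(fixture, ["--train.seed=7"])
```

None of these checks mention a value from the project's real config, which is what keeps them from rotting on a routine tune or a key rename.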
Wire the remaining Drive/ocean tests into CI for emerge/** and fix their
lineage drift (they targeted the main/puffer-4 paths and APIs):

- utest.yml: add the previously-orphaned pytest suites
  (test_drive_scenario_length, test_eval_manager, test_validation_replay_html,
  ocean/benchmark/{geometry,map_metrics,road_edges,ttc}).
- render-ci.yml / train-ci.yml: add emerge/** to the branch filters so
  test_drive_render and test_drive_train run in the headless-render / training
  environments those workflows already provide; install `pip install -e .` under
  PUFFER_CPU=1 (no [cpu] extra on this lineage).
- test_drive_train: map_dir -> pufferlib/resources/drive/binaries/carla (the
  resources live under pufferlib/ here).
- test_eval_manager: _run_rollout_loop now reads args["env"]; give the
  done-state test an env section.
- test_validation_replay_html: render_backend html -> triage_html (renamed).

Excluded: test_c_advantage.cu (CUDA, needs a GPU runner) and test_env_binding
(a breakout test, not Drive).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The tests dropped here cover other environments (atari, pokemon_red, nmmo3, squared/pong, ...) and
generic PufferLib infra (emulation flatten/namespace/nested, vectorization pool,
sweep, policy_pool, record/rich/utils, ...) that the Drive/ocean work doesn't use.
Keeps the Drive/ocean suite: test_drive_*, test_eval_manager,
test_validation_replay_html, test_simulator_perf, ini_parser/, drive/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The standalone visualize binary is retired, so the old test_drive_render (which
built and ran ./visualize) exercised dead code. Replace it with a smoke test of
the modern egl -> ffmpeg mp4 render backend through the Python evaluator. EGL is
Linux-only, so the test skips on macOS; render-ci.yml is repurposed to run it on
Linux with ffmpeg + a headless GL stack, on main and emerge/**.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop at target_steps=2000 (was 50000) — a handful of optimizer updates is enough
to exercise env -> rollout -> optimize end-to-end — and point map_dir at
pufferlib/resources/drive/binaries/carla (resources live under pufferlib/ on this
lineage; the old relative path no longer resolves).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
puffernet is PufferLib's hand-written C neural-net inference, the same C path the
now-retired visualize binary used; nothing in the Drive training/eval stack (all
torch) exercises it. No workflow ran it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parametrize the HTML render smoke over both CPU-only HTML backends (triage_html,
obs_html). obs_html's viz unpacks the NN observation, so the env config now also
carries the obs-shape keys (action_type, dynamics_model, target_type, obs
maxima); the env is built with those same values so the two agree. Assert the
largest produced HTML is non-trivial, since obs_html also writes a small
index.html.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
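
The "largest produced HTML is non-trivial" assertion from the commit above can be sketched like this. The renderer is a hypothetical stand-in for the real evaluator, the output file names are invented, and the 1 KiB threshold is an assumption rather than the value used in the test.

```python
import tempfile
from pathlib import Path

MIN_HTML_BYTES = 1024  # assumed threshold, not taken from the PR


def fake_render(backend: str, out_dir: Path) -> None:
    """Hypothetical stand-in for the evaluator's HTML render backends."""
    (out_dir / "replay.html").write_text("<html>" + "x" * 5000 + "</html>")
    if backend == "obs_html":
        # obs_html also writes a small index page next to the real output.
        (out_dir / "index.html").write_text("<a href='replay.html'>replay</a>")


def largest_html_size(out_dir: Path) -> int:
    """Size in bytes of the largest .html file under out_dir, or 0 if none."""
    return max((p.stat().st_size for p in out_dir.rglob("*.html")), default=0)


results = {}
for backend in ("triage_html", "obs_html"):
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        fake_render(backend, out)
        results[backend] = largest_html_size(out)
        assert results[backend] > MIN_HTML_BYTES
```

Checking the largest file rather than any single file keeps the assertion valid for both backends, since a small index.html would otherwise fail a size check on its own.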
This lineage's Drive policy has no single hidden_size; it uses separate
backbone/actor/critic hidden sizes. Set those (64) instead of hidden_size=64,
which raised `Drive.__init__() got an unexpected keyword argument 'hidden_size'`
and then hung the run until the 20-minute timeout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The egl mp4 render initialized on the GH runner but produced an empty mp4 —
software Mesa EGL on a CPU-only runner isn't rendering real frames for raylib,
and the renderer also looks for assets at resources/drive/* (a main-era relative
path; this lineage's assets live under pufferlib/resources/drive/). Getting it
green would need a working headless GL stack (and likely a GPU runner), which
isn't worth pursuing on free CI here.

The render *pipeline* is already covered by the HTML backends (triage_html,
obs_html) in utest; egl mp4 stays a local/GPU check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@eugenevinitsky eugenevinitsky merged commit ba4ac7c into emerge/temp_training May 26, 2026
13 checks passed
@eugenevinitsky eugenevinitsky deleted the ev/ci-utest-emerge branch May 26, 2026 11:37
3 participants