Oss readiness #51

Merged
JuhaoLiang1997 merged 5 commits into main from oss-readiness
May 19, 2026

@JuhaoLiang1997
Collaborator

Summary

Type of change

  • New platform support
  • Bug fix (runner, validator, leaderboard, or tooling)
  • Suite definition change
  • Schema change
  • Leaderboard / UI improvement
  • Documentation
  • Other:

Testing

# Commands used to verify

Checklist

  • I have read CONTRIBUTING.md
  • My change does not break existing result.json files (or I have explained the migration path)
  • If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
  • If changing the schema: validate_submission.py updated and all existing results still validate
  • If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
  • I have updated relevant documentation

Related issues

JuhaoLiang1997 and others added 2 commits May 19, 2026 11:22
Phase 1 — Legal & attribution
- Align license: pyproject.toml + README badge → Apache-2.0 (matches LICENSE).
- Add NOTICE summarising bundled third-party data and upstream terms.
- Add License & attribution sections to datasets/README.md and each
  datasets/sharegpt_*_v1/README.md (CC BY 4.0, upstream link).
- Add schema/accuracy_subset.README.md documenting the MMLU subset (MIT).

Phase 2 — Contributor experience & validation
- Fix doc drift in DEVELOPMENT.md, README.md, runners/README.md,
  suites/README.md, runners/template/runner.py (rename
  SUPPORTED_QUANTIZATIONS → SUPPORTED_QUANTIZATION_BACKENDS in
  *editable* files only; existing runner.py hashes untouched).
- Add schema/suite.schema.json + runners/validate_suites.py and wire
  both into validate_pr.yml / generate_leaderboard.yml.
- Add .github/ISSUE_TEMPLATE/new_suite.md for community suite proposals.
- CONTRIBUTING.md: add local leaderboard preview instructions.
- .gitignore: ignore node_modules/, .cursor/, .aider*, .envrc, .direnv/.

Phase 3 — Code quality & CI
- runners/benchmark_runner.py:
  * Remove dead code (stub format_prompt, dead spec-decoding branch,
    redundant acc_result init, duplicated _build_result_json block).
  * Extract helpers (_prepare_load_context, _score_accuracy_questions,
    _write_accuracy_artifacts) shared between accuracy scenarios.
  * Replace inference dispatch if/elif ladder with _SCENARIO_REGISTRY
    (ScenarioSpec dataclass: inference_kind, use_async, merge_key…).
  * _MERGE_SCENARIO_KEYS now derived from the registry. Net −111 lines.
- leaderboard: split SUITE_META into
  leaderboard/site/assets/data/suite-meta.js, data.js re-exports it
  (data.js 1010 → 800 lines).
- validate_pr.yml: add python-tests job (serve + openclaw_skill pytest).
- pyproject.toml: setuptools.packages.find now lists loadgen/runners/
  serve/openclaw_skill explicitly and excludes tests*.

README hero & citation
- Embed docs/assets/framework-overview.png under nav links and
  docs/assets/chip-cloud.png in a new "Currently on the leaderboard"
  section.
- Expand BibTeX author list in the Citation section.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

github-actions Bot commented May 19, 2026

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

JuhaoLiang1997 and others added 3 commits May 19, 2026 11:27
- serve/server.py: import uvicorn lazily inside start_server() so that
  importing the module (e.g. from tests, or to expose the ASGI app)
  does not require uvicorn to be installed.
- validate_pr.yml: add numpy to the python-tests install list — pulled
  in transitively by loadgen, needed once serve.server imports
  runners.benchmark_runner during test collection.
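
A minimal sketch of the lazy-import pattern described for serve/server.py. The parameter list of start_server here is an assumption; only the function name and the deferred uvicorn import come from the commit message.

```python
def start_server(app, host="127.0.0.1", port=8000):
    """Serve `app` with uvicorn (hypothetical signature, for illustration)."""
    # Deferred import: importing this module (for tests, or to expose the
    # ASGI app) must not fail when uvicorn is not installed.
    import uvicorn

    uvicorn.run(app, host=host, port=port)
```

The cost is that a missing uvicorn surfaces as an ImportError at the first start_server() call instead of at module import time.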

Co-authored-by: Cursor <cursoragent@cursor.com>
… NotImplementedError

Pre-existing breakage in serve/tests/test_server.py — never caught
because python-tests was not wired into CI until this branch.

- test_server.py imports TokenStreamingMockRunner from mock_runner,
  but the class did not exist (4 ImportError collection errors).
- test_fallback_when_no_token_stream expects MockRunner to *not*
  implement true token streaming so the server's single-chunk
  fallback path runs. MockRunner used to yield word-by-word, so the
  test asserted len(content_chunks) == 1 but got more (1 AssertionError).

Fix to match the RunnerProtocol contract (runners/protocol.py:67) —
true token streaming is optional, runners signal "not supported" by
raising NotImplementedError:

- MockRunner.inference_fn_token_stream now raises NotImplementedError
  (with a trailing unreachable yield so the function shape stays an
  async generator, matching the protocol).
- Add TokenStreamingMockRunner(MockRunner) that overrides the method
  to yield word-by-word with a small async delay — used by the four
  tests that exercise the multi-chunk SSE path.
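
The contract above might look roughly like this. It is a sketch under assumptions: the method signature, the response_text attribute, and the collect_chunks helper (a stand-in for the server's fallback logic) are hypothetical and not copied from the repository.

```python
import asyncio


class MockRunner:
    response_text = "Hello from token stream."

    async def inference_fn_token_stream(self, prompt):
        # Signal "true token streaming not supported".
        raise NotImplementedError("token streaming not supported")
        yield  # unreachable; keeps this function an async generator


class TokenStreamingMockRunner(MockRunner):
    async def inference_fn_token_stream(self, prompt):
        for i, word in enumerate(self.response_text.split(" ")):
            await asyncio.sleep(0)  # small async delay between pieces
            yield word if i == 0 else " " + word


async def collect_chunks(runner, prompt):
    """Hypothetical stand-in for the server's streaming path."""
    chunks = []
    try:
        async for piece in runner.inference_fn_token_stream(prompt):
            chunks.append(piece)
    except NotImplementedError:
        # Single-chunk fallback: emit the whole response at once.
        chunks = [runner.response_text]
    return chunks
```

Because the method is an async generator, the NotImplementedError is raised on the first iteration rather than when the method is called, so the caller has to wrap the `async for` loop, not just the call.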

Co-authored-by: Cursor <cursoragent@cursor.com>
…not trailing

test_token_stream_reassembles_correctly concatenates every content
delta and expects exact equality with the response_text. Yielding
"word + ' '" tacks an extra trailing space onto the reassembled string,
so the assertion failed:

    got:      'Hello from token stream. '
    expected: 'Hello from token stream.'

Switch to a leading-space separator (space before every word *after*
the first). Concatenation now round-trips exactly, and the shape
matches how real BPE / SentencePiece tokenizers stream pieces (the
first token has no preceding space; subsequent ones do).
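
The difference between the two separators can be reproduced in isolation. The function names below are hypothetical; only the behaviour mirrors the commit message.

```python
def trailing_space_pieces(text):
    """Old behaviour: every piece carries a trailing space."""
    return [word + " " for word in text.split(" ")]


def leading_space_pieces(text):
    """Fixed behaviour: a space before every word after the first."""
    return [
        word if i == 0 else " " + word
        for i, word in enumerate(text.split(" "))
    ]


text = "Hello from token stream."
# Trailing separator leaves one extra space at the end.
assert "".join(trailing_space_pieces(text)) == text + " "
# Leading separator round-trips exactly.
assert "".join(leading_space_pieces(text)) == text
```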

Co-authored-by: Cursor <cursoragent@cursor.com>
@JuhaoLiang1997 JuhaoLiang1997 merged commit 6bd3bf1 into main May 19, 2026
5 checks passed
@JuhaoLiang1997 JuhaoLiang1997 deleted the oss-readiness branch May 19, 2026 04:05