Implement tool registry and executor for agent actions (CS-10479) by habdelra · Pull Request #4232 · cardstack/boxel

habdelra · 2026-03-22T20:07:59Z

Summary

Depends on Define FactoryAgent interface and OpenRouter integration (CS-10476) #4229 — that PR must be reviewed and merged first (this branch is based on it)
Adds ToolRegistry with 18 built-in tool manifests across three categories: script (4), boxel-cli (6), and realm-api (8)
Adds ToolExecutor with dispatch logic that routes invoke_tool actions to the correct sub-executor (ScriptToolExecutor, BoxelCliToolExecutor, RealmApiToolExecutor)
Enforces safety constraints: source realm protection, allowed-realm-only targeting, unregistered tool rejection, configurable timeout, and audit logging
realm-create sends iconURL/backgroundURL defaults and updates Matrix account data (app.boxel.realms) after creation
realm-auth tool obtains per-realm JWTs for all accessible realms
realm-read documents card+source vs card+json accept header behavior (file extensions for source, no extensions for instance)
Live integration tests verify tools against a running realm server (skip gracefully when harness isn't up)

New files

File	Purpose
`scripts/lib/factory-tool-registry.ts`	Static manifest of all available tools
`scripts/lib/factory-tool-executor.ts`	Executor with three sub-executors, safety layer, icon/background defaults, Matrix account data update
`scripts/factory-tools-smoke.ts`	CLI smoke test (no services required)
`tests/factory-tool-registry.test.ts`	Unit tests for the registry
`tests/factory-tool-executor.test.ts`	Unit tests for the executor
`tests/factory-tool-executor.integration.test.ts`	Mock-server integration tests
`tests/factory-tool-executor.live.test.ts`	Live tests against harness realm server (auto-skip when not running)

Try it out

No running services required for the smoke test:

cd packages/software-factory
pnpm install
pnpm factory:tools-smoke

You should see:

=== Tool Registry ===

Registered tools: 18

  script (4):
    - search-realm  [json]  required: realm  optional: type-name, type-module, eq, contains, sort, size, page
    - pick-ticket  [json]  required: realm  optional: status, project, agent, module
    - get-session  [json]  optional: realm
    - run-realm-tests  [json]  optional: realm-path, realm-url, spec-dir, fixtures-dir, endpoint, scratch-root

  boxel-cli (6):
    - boxel-sync  [text]  required: path  optional: prefer, dry-run
    - boxel-push  [text]  required: local-dir, realm-url  optional: delete, dry-run
    - boxel-pull  [text]  required: realm-url, local-dir  optional: delete, dry-run
    - boxel-status  [text]  required: path  optional: all, pull
    - boxel-create  [text]  required: endpoint, name
    - boxel-history  [text]  required: path  optional: message

  realm-api (8):
    - realm-read  [json]  required: realm-url, path  optional: accept
    - realm-write  [json]  required: realm-url, path, content  optional: content-type
    - realm-delete  [json]  required: realm-url, path
    - realm-atomic  [json]  required: realm-url, operations
    - realm-search  [json]  required: realm-url, query
    - realm-create  [json]  required: realm-server-url, name, endpoint  optional: iconURL, backgroundURL
    - realm-server-session  [json]  required: realm-server-url, openid-token
    - realm-auth  [json]  required: realm-server-url

  ✓ has script tools    ✓ has boxel-cli tools    ✓ has realm-api tools    ✓ all names unique
  ✓ valid args -> no errors    ✓ missing required arg -> error    ✓ unknown tool -> error
  ✓ rejects unregistered tool    ✓ rejects source realm    ✓ rejects unknown realm
  ✓ realm-read exitCode=0    ✓ realm-search exitCode=0    ✓ realm-write exitCode=0

  16 passed, 0 failed

To run the full test suite (161 tests — unit + mock-server + live):

pnpm test:node

Live tests (realm-read, realm-search, realm-create against real server) auto-skip when the harness isn't running. To run them, start the harness first:

# Terminal 1
pnpm serve:support
# Terminal 2
pnpm cache:prepare && pnpm serve:realm
# Terminal 3
pnpm test:node    # live tests will now execute

Test plan

pnpm factory:tools-smoke — 16/16 checks pass
pnpm test:node — 161/161 tests pass (live tests skip when harness not running)
pnpm lint:js — clean on all files
pnpm lint:format — clean on all files
Verify safety: unregistered tools, source realm targeting, and unknown realm targeting are all rejected (automated)
Verify realm-api request building produces correct URLs, methods, and headers (automated)

🤖 Generated with Claude Code

Implement the core FactoryAgent interface that decouples the orchestration loop from any specific LLM. This is the foundational ticket for the software factory execution loop. - Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill, ToolManifest, TestResult, ToolResult, and placeholder card types - Implement OpenRouterFactoryAgent with dual-path routing: - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI) - Proxy path via realm server _request-forward (production, with billing) - Env var takes precedence over config over proxy - Implement MockFactoryAgent for deterministic testing - Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback - Add action validation, response parsing with markdown fence stripping - Add retry-once with error correction on malformed LLM responses - Add smoke-test script (pnpm factory:agent-smoke) for manual verification - 42 tests: unit tests + integration tests with mock HTTP servers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix all prettier and qunit lint errors - Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing proxy with empty env var in CI) - Pass authorization through as-is to avoid Bearer Bearer double-prefix - Use new URL() for proxy URL construction (safe without trailing slash) - Reject non-object toolArgs in validation instead of silently dropping - Add tests for blank API key handling and invalid toolArgs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add ToolRegistry with 19 built-in tool manifests across three categories (script, boxel-cli, realm-api) and ToolExecutor with dispatch logic, safety constraints, timeout handling, and audit logging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 03e4058c40

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

packages/software-factory/scripts/lib/factory-tool-executor.ts

Copilot

Pull request overview

Introduces a tool registry + execution layer in packages/software-factory so agent invoke_tool actions can be validated, safety-checked, dispatched to the correct backend (script/boxel-cli/realm-api), and logged—along with new QUnit coverage and smoke-test CLIs.

Changes:

Add ToolRegistry with built-in manifests (script, boxel-cli, realm-api) and basic arg validation.
Add ToolExecutor with safety constraints, dispatch logic, timeout handling, and audit logging.
Add smoke-test CLIs and expand the node test index with new QUnit suites (including agent/OpenRouter tests included on this branch).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
packages/software-factory/tests/index.ts	Registers the new tool/agent test modules in the node test suite.
packages/software-factory/tests/factory-tool-registry.test.ts	Unit tests for registry construction, lookup, and arg validation.
packages/software-factory/tests/factory-tool-executor.test.ts	Unit tests for executor safety checks, realm-api request building, logging, and timeouts.
packages/software-factory/tests/factory-agent.test.ts	Unit tests for agent action validation/parsing and OpenRouter agent behavior.
packages/software-factory/tests/factory-agent.integration.test.ts	Integration tests with local HTTP servers for proxy/direct OpenRouter request flows.
packages/software-factory/scripts/lib/factory-tool-registry.ts	Defines built-in tool manifests and implements `ToolRegistry`.
packages/software-factory/scripts/lib/factory-tool-executor.ts	Implements `ToolExecutor` with sub-executors and realm-api request builder.
packages/software-factory/scripts/lib/factory-agent.ts	Adds agent types, validation/parsing helpers, and OpenRouter + mock agents.
packages/software-factory/scripts/factory-tools-smoke.ts	Standalone CLI smoke test for registry/executor behavior.
packages/software-factory/scripts/factory-agent-smoke.ts	Standalone CLI smoke test for OpenRouter round-trip.
packages/software-factory/package.json	Adds `factory:agent-smoke` and `factory:tools-smoke` scripts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/software-factory/scripts/lib/factory-tool-registry.ts

packages/software-factory/tests/factory-tool-executor.test.ts

packages/software-factory/scripts/lib/factory-tool-executor.ts

- realm-create: send JSON:API body with data.type/attributes.endpoint/name - realm-server-session: send OpenID token, capture Authorization header JWT - enforceRealmSafety: always validate realm-api targets even without sourceRealmUrl - ToolRegistry constructor: detect and throw on duplicate tool names - validateArgs: reject whitespace-only strings for required args - executeRealmApi: handle empty JSON response bodies gracefully Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verify that the realm JWT / realm-server JWT is correctly sent as the Authorization header for every realm-api tool (read, write, delete, search, atomic, create, reindex). Also tests the end-to-end flow: realm-server-session → mint JWT → use JWT in realm-create. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Integration tests spin up a real HTTP server and verify the executor sends correct URLs, methods, headers, and bodies for all 9 realm-api tools. Also verifies safety constraints reject before any HTTP request reaches the server (unregistered tools, source realm, unknown realm). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add realm-auth tool: POST _realm-auth to get per-realm JWTs - Remove realm-mtimes and realm-reindex (agent won't use directly) - realm-create: default iconURL from name, random backgroundURL, update Matrix account data (app.boxel.realms) after create - realm-read: document card+source vs card+json accept header behavior - Add live integration test suite (pnpm test:live) for real server contract - Update unit and mock-server integration tests for all changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra · 2026-03-22T22:32:16Z

TODO before merge: The iconURLForName and getRandomBackgroundURL functions in factory-tool-executor.ts are duplicated from packages/host/app/lib/utils.ts. These should be moved to @cardstack/runtime-common so both packages share one copy. Filed as a follow-up — the functions are tiny and pure JS so the duplication is low-risk for now.

Live tests now mint JWTs directly using the harness secret seed (matching src/harness.ts pattern) instead of requiring Matrix login. They skip gracefully when the realm server isn't running and are part of the normal test:node suite — no separate pnpm script needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Live tests throw a clear error message when the realm server isn't running instead of silently skipping. They run via pnpm test:live (requires serve:support + cache:prepare + serve:realm), separate from pnpm test:node which doesn't need external services. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Starts the harness (serve:support, cache:prepare, serve:realm) before running live integration tests in CI. Live tests fail-hard when the realm server isn't running with a clear error message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- realm-read and realm-search use REALM_SECRET_SEED for JWT minting - realm-create live test blocked by CS-10472 (orphaned process teardown) - Revert CI harness steps (process lifecycle not reliable yet) - Clean up unused REALM_SERVER_SECRET_SEED references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra · 2026-03-22T23:19:32Z

@codex review

chatgpt-codex-connector · 2026-03-22T23:19:40Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/software-factory/scripts/lib/factory-tool-executor.ts

packages/software-factory/tests/factory-tool-executor.live.test.ts

packages/software-factory/scripts/lib/factory-tool-executor.ts

The harness indexes cards that reference @cardstack/boxel-icons modules, so the icons server must be running on port 4206 before cache:prepare. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ndex ref - Validate realm/realm-url args against allowed-realm list for ALL tool categories (not just realm-api), preventing SSRF and token exfiltration - Validate realm-server-url by origin match against allowed realm origins - Remove stale realm-reindex reference from validateDestructiveOps Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ol-registry-and-executor-for-agent-actions # Conflicts: # packages/software-factory/tests/index.ts

backspace

This predates this PR but I noticed that “Upload Playwright traces” is a step in the CI job, is there a plan to serve the Playwright report?

habdelra · 2026-03-23T21:52:29Z

This predates this PR but I noticed that “Upload Playwright traces” is a step in the CI job, is there a plan to serve the Playwright report?

sure we can. i'll add a ticket for that

habdelra and others added 3 commits March 22, 2026 15:21

habdelra requested a review from Copilot March 22, 2026 20:09

Copilot started reviewing on behalf of habdelra March 22, 2026 20:09 View session

chatgpt-codex-connector bot reviewed Mar 22, 2026

View reviewed changes

packages/software-factory/scripts/lib/factory-tool-executor.ts Outdated Show resolved Hide resolved

packages/software-factory/scripts/lib/factory-tool-executor.ts Outdated Show resolved Hide resolved

packages/software-factory/scripts/lib/factory-tool-executor.ts Show resolved Hide resolved

Copilot AI reviewed Mar 22, 2026

View reviewed changes

habdelra and others added 4 commits March 22, 2026 17:31

habdelra and others added 4 commits March 22, 2026 18:35

habdelra requested a review from Copilot March 22, 2026 23:19

Copilot started reviewing on behalf of habdelra March 22, 2026 23:19 View session

habdelra force-pushed the cs-10479-implement-tool-registry-and-executor-for-agent-actions branch from 32519b2 to 0a1bb67 Compare March 22, 2026 23:23

Copilot AI reviewed Mar 22, 2026

View reviewed changes

packages/software-factory/scripts/lib/factory-tool-executor.ts Outdated Show resolved Hide resolved

packages/software-factory/tests/factory-tool-executor.live.test.ts Show resolved Hide resolved

packages/software-factory/scripts/lib/factory-tool-executor.ts Outdated Show resolved Hide resolved

Document boxel-icons server prerequisite for cache:prepare

3c97e4e

The harness indexes cards that reference @cardstack/boxel-icons modules, so the icons server must be running on port 4206 before cache:prepare. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra force-pushed the cs-10479-implement-tool-registry-and-executor-for-agent-actions branch from 0a1bb67 to 3c97e4e Compare March 22, 2026 23:23

habdelra requested a review from a team March 23, 2026 00:53

Merge remote-tracking branch 'origin/main' into cs-10479-implement-to…

b334c36

…ol-registry-and-executor-for-agent-actions # Conflicts: # packages/software-factory/tests/index.ts

backspace approved these changes Mar 23, 2026

View reviewed changes

Conversation

habdelra commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New files

Try it out

Test plan

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

habdelra commented Mar 22, 2026

Uh oh!

habdelra commented Mar 22, 2026

Uh oh!

chatgpt-codex-connector bot commented Mar 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

backspace left a comment

Choose a reason for hiding this comment

Uh oh!

habdelra commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

habdelra commented Mar 22, 2026 •

edited

Loading