Skip to content

Implement tool registry and executor for agent actions (CS-10479)#4232

Open
habdelra wants to merge 14 commits intomainfrom
cs-10479-implement-tool-registry-and-executor-for-agent-actions
Open

Implement tool registry and executor for agent actions (CS-10479)#4232
habdelra wants to merge 14 commits intomainfrom
cs-10479-implement-tool-registry-and-executor-for-agent-actions

Conversation

@habdelra
Copy link
Contributor

@habdelra habdelra commented Mar 22, 2026

Summary

  • Depends on Define FactoryAgent interface and OpenRouter integration (CS-10476) #4229 — that PR must be reviewed and merged first (this branch is based on it)
  • Adds ToolRegistry with 18 built-in tool manifests across three categories: script (4), boxel-cli (6), and realm-api (8)
  • Adds ToolExecutor with dispatch logic that routes invoke_tool actions to the correct sub-executor (ScriptToolExecutor, BoxelCliToolExecutor, RealmApiToolExecutor)
  • Enforces safety constraints: source realm protection, allowed-realm-only targeting, unregistered tool rejection, configurable timeout, and audit logging
  • realm-create sends iconURL/backgroundURL defaults and updates Matrix account data (app.boxel.realms) after creation
  • realm-auth tool obtains per-realm JWTs for all accessible realms
  • realm-read documents card+source vs card+json accept header behavior (file extensions for source, no extensions for instance)
  • Live integration tests verify tools against a running realm server (skip gracefully when harness isn't up)

New files

File Purpose
scripts/lib/factory-tool-registry.ts Static manifest of all available tools
scripts/lib/factory-tool-executor.ts Executor with three sub-executors, safety layer, icon/background defaults, Matrix account data update
scripts/factory-tools-smoke.ts CLI smoke test (no services required)
tests/factory-tool-registry.test.ts Unit tests for the registry
tests/factory-tool-executor.test.ts Unit tests for the executor
tests/factory-tool-executor.integration.test.ts Mock-server integration tests
tests/factory-tool-executor.live.test.ts Live tests against harness realm server (auto-skip when not running)

Try it out

No running services required for the smoke test:

cd packages/software-factory
pnpm install
pnpm factory:tools-smoke

You should see:

=== Tool Registry ===

Registered tools: 18

  script (4):
    - search-realm  [json]  required: realm  optional: type-name, type-module, eq, contains, sort, size, page
    - pick-ticket  [json]  required: realm  optional: status, project, agent, module
    - get-session  [json]  optional: realm
    - run-realm-tests  [json]  optional: realm-path, realm-url, spec-dir, fixtures-dir, endpoint, scratch-root

  boxel-cli (6):
    - boxel-sync  [text]  required: path  optional: prefer, dry-run
    - boxel-push  [text]  required: local-dir, realm-url  optional: delete, dry-run
    - boxel-pull  [text]  required: realm-url, local-dir  optional: delete, dry-run
    - boxel-status  [text]  required: path  optional: all, pull
    - boxel-create  [text]  required: endpoint, name
    - boxel-history  [text]  required: path  optional: message

  realm-api (8):
    - realm-read  [json]  required: realm-url, path  optional: accept
    - realm-write  [json]  required: realm-url, path, content  optional: content-type
    - realm-delete  [json]  required: realm-url, path
    - realm-atomic  [json]  required: realm-url, operations
    - realm-search  [json]  required: realm-url, query
    - realm-create  [json]  required: realm-server-url, name, endpoint  optional: iconURL, backgroundURL
    - realm-server-session  [json]  required: realm-server-url, openid-token
    - realm-auth  [json]  required: realm-server-url

  ✓ has script tools    ✓ has boxel-cli tools    ✓ has realm-api tools    ✓ all names unique
  ✓ valid args -> no errors    ✓ missing required arg -> error    ✓ unknown tool -> error
  ✓ rejects unregistered tool    ✓ rejects source realm    ✓ rejects unknown realm
  ✓ realm-read exitCode=0    ✓ realm-search exitCode=0    ✓ realm-write exitCode=0

  16 passed, 0 failed

To run the full test suite (161 tests — unit + mock-server + live):

pnpm test:node

Live tests (realm-read, realm-search, realm-create against real server) auto-skip when the harness isn't running. To run them, start the harness first:

# Terminal 1
pnpm serve:support
# Terminal 2
pnpm cache:prepare && pnpm serve:realm
# Terminal 3
pnpm test:node    # live tests will now execute

Test plan

  • pnpm factory:tools-smoke — 16/16 checks pass
  • pnpm test:node — 161/161 tests pass (live tests skip when harness not running)
  • pnpm lint:js — clean on all files
  • pnpm lint:format — clean on all files
  • Verify safety: unregistered tools, source realm targeting, and unknown realm targeting are all rejected (automated)
  • Verify realm-api request building produces correct URLs, methods, and headers (automated)

🤖 Generated with Claude Code

habdelra and others added 3 commits March 22, 2026 15:21
Implement the core FactoryAgent interface that decouples the orchestration
loop from any specific LLM. This is the foundational ticket for the
software factory execution loop.

- Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill,
  ToolManifest, TestResult, ToolResult, and placeholder card types
- Implement OpenRouterFactoryAgent with dual-path routing:
  - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI)
  - Proxy path via realm server _request-forward (production, with billing)
  - Env var takes precedence over config over proxy
- Implement MockFactoryAgent for deterministic testing
- Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback
- Add action validation, response parsing with markdown fence stripping
- Add retry-once with error correction on malformed LLM responses
- Add smoke-test script (pnpm factory:agent-smoke) for manual verification
- 42 tests: unit tests + integration tests with mock HTTP servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix all prettier and qunit lint errors
- Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing
  proxy with empty env var in CI)
- Pass authorization through as-is to avoid Bearer Bearer double-prefix
- Use new URL() for proxy URL construction (safe without trailing slash)
- Reject non-object toolArgs in validation instead of silently dropping
- Add tests for blank API key handling and invalid toolArgs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ToolRegistry with 19 built-in tool manifests across three categories
(script, boxel-cli, realm-api) and ToolExecutor with dispatch logic,
safety constraints, timeout handling, and audit logging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 03e4058c40

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a tool registry + execution layer in packages/software-factory so agent invoke_tool actions can be validated, safety-checked, dispatched to the correct backend (script/boxel-cli/realm-api), and logged—along with new QUnit coverage and smoke-test CLIs.

Changes:

  • Add ToolRegistry with built-in manifests (script, boxel-cli, realm-api) and basic arg validation.
  • Add ToolExecutor with safety constraints, dispatch logic, timeout handling, and audit logging.
  • Add smoke-test CLIs and expand the node test index with new QUnit suites (including agent/OpenRouter tests included on this branch).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
packages/software-factory/tests/index.ts Registers the new tool/agent test modules in the node test suite.
packages/software-factory/tests/factory-tool-registry.test.ts Unit tests for registry construction, lookup, and arg validation.
packages/software-factory/tests/factory-tool-executor.test.ts Unit tests for executor safety checks, realm-api request building, logging, and timeouts.
packages/software-factory/tests/factory-agent.test.ts Unit tests for agent action validation/parsing and OpenRouter agent behavior.
packages/software-factory/tests/factory-agent.integration.test.ts Integration tests with local HTTP servers for proxy/direct OpenRouter request flows.
packages/software-factory/scripts/lib/factory-tool-registry.ts Defines built-in tool manifests and implements ToolRegistry.
packages/software-factory/scripts/lib/factory-tool-executor.ts Implements ToolExecutor with sub-executors and realm-api request builder.
packages/software-factory/scripts/lib/factory-agent.ts Adds agent types, validation/parsing helpers, and OpenRouter + mock agents.
packages/software-factory/scripts/factory-tools-smoke.ts Standalone CLI smoke test for registry/executor behavior.
packages/software-factory/scripts/factory-agent-smoke.ts Standalone CLI smoke test for OpenRouter round-trip.
packages/software-factory/package.json Adds factory:agent-smoke and factory:tools-smoke scripts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

habdelra and others added 4 commits March 22, 2026 17:31
- realm-create: send JSON:API body with data.type/attributes.endpoint/name
- realm-server-session: send OpenID token, capture Authorization header JWT
- enforceRealmSafety: always validate realm-api targets even without sourceRealmUrl
- ToolRegistry constructor: detect and throw on duplicate tool names
- validateArgs: reject whitespace-only strings for required args
- executeRealmApi: handle empty JSON response bodies gracefully

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that the realm JWT / realm-server JWT is correctly sent as
the Authorization header for every realm-api tool (read, write, delete,
search, atomic, create, reindex). Also tests the end-to-end flow:
realm-server-session → mint JWT → use JWT in realm-create.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integration tests spin up a real HTTP server and verify the executor
sends correct URLs, methods, headers, and bodies for all 9 realm-api
tools. Also verifies safety constraints reject before any HTTP request
reaches the server (unregistered tools, source realm, unknown realm).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add realm-auth tool: POST _realm-auth to get per-realm JWTs
- Remove realm-mtimes and realm-reindex (agent won't use directly)
- realm-create: default iconURL from name, random backgroundURL,
  update Matrix account data (app.boxel.realms) after create
- realm-read: document card+source vs card+json accept header behavior
- Add live integration test suite (pnpm test:live) for real server contract
- Update unit and mock-server integration tests for all changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra
Copy link
Contributor Author

TODO before merge: The iconURLForName and getRandomBackgroundURL functions in factory-tool-executor.ts are duplicated from packages/host/app/lib/utils.ts. These should be moved to @cardstack/runtime-common so both packages share one copy. Filed as a follow-up — the functions are tiny and pure JS so the duplication is low-risk for now.

habdelra and others added 4 commits March 22, 2026 18:35
Live tests now mint JWTs directly using the harness secret seed
(matching src/harness.ts pattern) instead of requiring Matrix login.
They skip gracefully when the realm server isn't running and are part
of the normal test:node suite — no separate pnpm script needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Live tests throw a clear error message when the realm server isn't
running instead of silently skipping. They run via pnpm test:live
(requires serve:support + cache:prepare + serve:realm), separate from
pnpm test:node which doesn't need external services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Starts the harness (serve:support, cache:prepare, serve:realm) before
running live integration tests in CI. Live tests fail-hard when the
realm server isn't running with a clear error message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- realm-read and realm-search use REALM_SECRET_SEED for JWT minting
- realm-create live test blocked by CS-10472 (orphaned process teardown)
- Revert CI harness steps (process lifecycle not reliable yet)
- Clean up unused REALM_SERVER_SECRET_SEED references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from Copilot March 22, 2026 23:19
@habdelra
Copy link
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@habdelra habdelra force-pushed the cs-10479-implement-tool-registry-and-executor-for-agent-actions branch from 32519b2 to 0a1bb67 Compare March 22, 2026 23:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The harness indexes cards that reference @cardstack/boxel-icons modules,
so the icons server must be running on port 4206 before cache:prepare.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra force-pushed the cs-10479-implement-tool-registry-and-executor-for-agent-actions branch from 0a1bb67 to 3c97e4e Compare March 22, 2026 23:23
…ndex ref

- Validate realm/realm-url args against allowed-realm list for ALL tool
  categories (not just realm-api), preventing SSRF and token exfiltration
- Validate realm-server-url by origin match against allowed realm origins
- Remove stale realm-reindex reference from validateDestructiveOps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team March 23, 2026 00:53
…ol-registry-and-executor-for-agent-actions

# Conflicts:
#	packages/software-factory/tests/index.ts
Copy link
Contributor

@backspace backspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predates this PR but I noticed that “Upload Playwright traces” is a step in the CI job, is there a plan to serve the Playwright report?

@habdelra
Copy link
Contributor Author

This predates this PR but I noticed that “Upload Playwright traces” is a step in the CI job, is there a plan to serve the Playwright report?

sure we can. i'll add a ticket for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants