Implement tool registry and executor for agent actions (CS-10479)#4232
Implement tool registry and executor for agent actions (CS-10479)#4232
Conversation
Implement the core FactoryAgent interface that decouples the orchestration loop from any specific LLM. This is the foundational ticket for the software factory execution loop. - Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill, ToolManifest, TestResult, ToolResult, and placeholder card types - Implement OpenRouterFactoryAgent with dual-path routing: - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI) - Proxy path via realm server _request-forward (production, with billing) - Env var takes precedence over config over proxy - Implement MockFactoryAgent for deterministic testing - Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback - Add action validation, response parsing with markdown fence stripping - Add retry-once with error correction on malformed LLM responses - Add smoke-test script (pnpm factory:agent-smoke) for manual verification - 42 tests: unit tests + integration tests with mock HTTP servers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix all prettier and qunit lint errors - Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing proxy with empty env var in CI) - Pass authorization through as-is to avoid Bearer Bearer double-prefix - Use new URL() for proxy URL construction (safe without trailing slash) - Reject non-object toolArgs in validation instead of silently dropping - Add tests for blank API key handling and invalid toolArgs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ToolRegistry with 19 built-in tool manifests across three categories (script, boxel-cli, realm-api) and ToolExecutor with dispatch logic, safety constraints, timeout handling, and audit logging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 03e4058c40
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Introduces a tool registry + execution layer in packages/software-factory so agent invoke_tool actions can be validated, safety-checked, dispatched to the correct backend (script/boxel-cli/realm-api), and logged—along with new QUnit coverage and smoke-test CLIs.
Changes:
- Add
ToolRegistrywith built-in manifests (script, boxel-cli, realm-api) and basic arg validation. - Add
ToolExecutorwith safety constraints, dispatch logic, timeout handling, and audit logging. - Add smoke-test CLIs and expand the node test index with new QUnit suites (including agent/OpenRouter tests included on this branch).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/software-factory/tests/index.ts | Registers the new tool/agent test modules in the node test suite. |
| packages/software-factory/tests/factory-tool-registry.test.ts | Unit tests for registry construction, lookup, and arg validation. |
| packages/software-factory/tests/factory-tool-executor.test.ts | Unit tests for executor safety checks, realm-api request building, logging, and timeouts. |
| packages/software-factory/tests/factory-agent.test.ts | Unit tests for agent action validation/parsing and OpenRouter agent behavior. |
| packages/software-factory/tests/factory-agent.integration.test.ts | Integration tests with local HTTP servers for proxy/direct OpenRouter request flows. |
| packages/software-factory/scripts/lib/factory-tool-registry.ts | Defines built-in tool manifests and implements ToolRegistry. |
| packages/software-factory/scripts/lib/factory-tool-executor.ts | Implements ToolExecutor with sub-executors and realm-api request builder. |
| packages/software-factory/scripts/lib/factory-agent.ts | Adds agent types, validation/parsing helpers, and OpenRouter + mock agents. |
| packages/software-factory/scripts/factory-tools-smoke.ts | Standalone CLI smoke test for registry/executor behavior. |
| packages/software-factory/scripts/factory-agent-smoke.ts | Standalone CLI smoke test for OpenRouter round-trip. |
| packages/software-factory/package.json | Adds factory:agent-smoke and factory:tools-smoke scripts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- realm-create: send JSON:API body with data.type/attributes.endpoint/name - realm-server-session: send OpenID token, capture Authorization header JWT - enforceRealmSafety: always validate realm-api targets even without sourceRealmUrl - ToolRegistry constructor: detect and throw on duplicate tool names - validateArgs: reject whitespace-only strings for required args - executeRealmApi: handle empty JSON response bodies gracefully Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that the realm JWT / realm-server JWT is correctly sent as the Authorization header for every realm-api tool (read, write, delete, search, atomic, create, reindex). Also tests the end-to-end flow: realm-server-session → mint JWT → use JWT in realm-create. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integration tests spin up a real HTTP server and verify the executor sends correct URLs, methods, headers, and bodies for all 9 realm-api tools. Also verifies safety constraints reject before any HTTP request reaches the server (unregistered tools, source realm, unknown realm). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add realm-auth tool: POST _realm-auth to get per-realm JWTs - Remove realm-mtimes and realm-reindex (agent won't use directly) - realm-create: default iconURL from name, random backgroundURL, update Matrix account data (app.boxel.realms) after create - realm-read: document card+source vs card+json accept header behavior - Add live integration test suite (pnpm test:live) for real server contract - Update unit and mock-server integration tests for all changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
TODO before merge: The |
Live tests now mint JWTs directly using the harness secret seed (matching src/harness.ts pattern) instead of requiring Matrix login. They skip gracefully when the realm server isn't running and are part of the normal test:node suite — no separate pnpm script needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Live tests throw a clear error message when the realm server isn't running instead of silently skipping. They run via pnpm test:live (requires serve:support + cache:prepare + serve:realm), separate from pnpm test:node which doesn't need external services. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Starts the harness (serve:support, cache:prepare, serve:realm) before running live integration tests in CI. Live tests fail-hard when the realm server isn't running with a clear error message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- realm-read and realm-search use REALM_SECRET_SEED for JWT minting - realm-create live test blocked by CS-10472 (orphaned process teardown) - Revert CI harness steps (process lifecycle not reliable yet) - Clean up unused REALM_SERVER_SECRET_SEED references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@codex review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
32519b2 to
0a1bb67
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The harness indexes cards that reference @cardstack/boxel-icons modules, so the icons server must be running on port 4206 before cache:prepare. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0a1bb67 to
3c97e4e
Compare
…ndex ref - Validate realm/realm-url args against allowed-realm list for ALL tool categories (not just realm-api), preventing SSRF and token exfiltration - Validate realm-server-url by origin match against allowed realm origins - Remove stale realm-reindex reference from validateDestructiveOps Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ol-registry-and-executor-for-agent-actions # Conflicts: # packages/software-factory/tests/index.ts
backspace
left a comment
There was a problem hiding this comment.
This predates this PR but I noticed that “Upload Playwright traces” is a step in the CI job, is there a plan to serve the Playwright report?
sure we can. i'll add a ticket for that |
Summary
ToolRegistrywith 18 built-in tool manifests across three categories: script (4), boxel-cli (6), and realm-api (8)ToolExecutorwith dispatch logic that routesinvoke_toolactions to the correct sub-executor (ScriptToolExecutor,BoxelCliToolExecutor,RealmApiToolExecutor)realm-createsendsiconURL/backgroundURLdefaults and updates Matrix account data (app.boxel.realms) after creationrealm-authtool obtains per-realm JWTs for all accessible realmsrealm-readdocumentscard+sourcevscard+jsonaccept header behavior (file extensions for source, no extensions for instance)New files
scripts/lib/factory-tool-registry.tsscripts/lib/factory-tool-executor.tsscripts/factory-tools-smoke.tstests/factory-tool-registry.test.tstests/factory-tool-executor.test.tstests/factory-tool-executor.integration.test.tstests/factory-tool-executor.live.test.tsTry it out
No running services required for the smoke test:
cd packages/software-factory pnpm install pnpm factory:tools-smokeYou should see:
To run the full test suite (161 tests — unit + mock-server + live):
Live tests (realm-read, realm-search, realm-create against real server) auto-skip when the harness isn't running. To run them, start the harness first:
Test plan
pnpm factory:tools-smoke— 16/16 checks passpnpm test:node— 161/161 tests pass (live tests skip when harness not running)pnpm lint:js— clean on all filespnpm lint:format— clean on all files🤖 Generated with Claude Code