
Define FactoryAgent interface and OpenRouter integration (CS-10476) #4229

Open
habdelra wants to merge 5 commits into main from
cs-10476-define-factoryagent-interface-and-openrouter-integration-v2

@habdelra habdelra commented Mar 22, 2026

Note: PR #4222 (CS-10449: Implement project artifact bootstrap from a brief) should be reviewed and merged first — this PR builds on the same software-factory package and the test index will need its imports once both land.

Summary

  • Define the core FactoryAgent interface and types (AgentContext, AgentAction, ToolManifest, TestResult, etc.) that decouple the orchestration loop from any specific LLM
  • Implement OpenRouterFactoryAgent with dual-path routing: direct API key (local dev) or realm server _request-forward proxy (production with billing)
  • Implement MockFactoryAgent for deterministic testing
  • Add smoke-test script for manual CLI verification
  • 42 new tests (unit + integration with mock HTTP servers)
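
To illustrate the decoupling described in the first bullet, here is a minimal sketch of an orchestration loop that depends only on an agent interface. The field shapes of AgentContext and AgentAction are assumptions (this page names the types but does not show them), and ScriptedAgent and runLoop are hypothetical stand-ins, not the PR's MockFactoryAgent or its loop.

```typescript
// Hypothetical shapes: the PR names these types, but their fields are assumed here.
type AgentAction =
  | { type: 'create_file'; path: string; content: string; realm: string }
  | { type: 'create_test'; path: string; content: string; realm: string }
  | { type: 'done' };

interface AgentContext {
  brief: string; // assumed field: a description of the work to plan
}

interface FactoryAgent {
  plan(context: AgentContext): Promise<AgentAction[]>;
}

// A scripted agent replays canned responses, so loop tests never touch a network.
class ScriptedAgent implements FactoryAgent {
  private calls = 0;
  private readonly script: AgentAction[][];

  constructor(script: AgentAction[][]) {
    this.script = script;
  }

  async plan(_context: AgentContext): Promise<AgentAction[]> {
    // Replay the next scripted response; answer "done" once the script runs out.
    const next: AgentAction[] = this.script[this.calls] ?? [{ type: 'done' }];
    this.calls += 1;
    return next;
  }
}

// The loop sees only the FactoryAgent interface, never a concrete LLM client.
async function runLoop(
  agent: FactoryAgent,
  context: AgentContext,
  maxRounds = 10,
): Promise<AgentAction[]> {
  const applied: AgentAction[] = [];
  for (let round = 0; round < maxRounds; round++) {
    for (const action of await agent.plan(context)) {
      if (action.type === 'done') return applied;
      applied.push(action);
    }
  }
  return applied;
}
```

Swapping ScriptedAgent for a network-backed implementation would not require touching runLoop, which is the property the interface is meant to provide.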

Try it out

After checking out this branch, you can verify the full round-trip to an LLM:

cd packages/software-factory

# Set your OpenRouter API key and run the smoke test:
OPENROUTER_API_KEY=sk-or-v1-YOUR_KEY_HERE \
  pnpm factory:agent-smoke \
  --realm-server-url https://realms-staging.stack.cards/

# Optionally override the model (defaults to anthropic/claude-sonnet-4):
OPENROUTER_API_KEY=sk-or-v1-YOUR_KEY_HERE \
  pnpm factory:agent-smoke \
  --realm-server-url https://realms-staging.stack.cards/ \
  --model anthropic/claude-opus-4

Expected output:

Model: anthropic/claude-sonnet-4                  ← resolved from default constant
Realm server: https://realms-staging.stack.cards/ ← from your --realm-server-url flag

Sending plan() request...

Received 3 action(s):                             ← count will vary (real LLM response)
[
  {                                               ┐
    "type": "create_file",                        │
    "path": "HelloWorld/hello-world.gts",         │ ← LLM-generated actions
    "content": "export class HelloWorld ...",      │   (content, paths, and number
    "realm": "target"                             │    of actions will differ on
  },                                              │    every run)
  {                                               │
    "type": "create_test",                        │
    "path": "TestSpec/hello-world.spec.ts",       │
    "content": "test('renders hello', ...",       │
    "realm": "test"                               │
  },                                              │
  {                                               │
    "type": "done"                                │
  }                                               ┘
]

Smoke test passed.                                ← confirms response was valid JSON
                                                    and parsed as AgentAction[]

The lines between [ and ] are the raw AgentAction[] returned by the LLM — the exact actions, file paths, and content will be different every time since it's a real model response. What matters is:

  1. You see a valid JSON array printed
  2. Each action has a valid type (for example create_file, update_file, create_test, update_test, invoke_tool, or done)
  3. The script exits with "Smoke test passed." (meaning the response was successfully parsed and validated)
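
The three checks above can be sketched as a small validator. This is an illustration under assumptions, not the code in factory-agent.ts: the set of type names is limited to those that appear on this page, and validateActions and ValidationResult are hypothetical names.

```typescript
// Action types that appear on this page; the real list may be longer.
const KNOWN_ACTION_TYPES = new Set([
  'create_file',
  'update_file',
  'create_test',
  'update_test',
  'update_ticket',
  'invoke_tool',
  'done',
]);

type ValidationResult = { ok: true } | { ok: false; errors: string[] };

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Checks that the parsed response is an array whose items each carry a known type.
function validateActions(value: unknown): ValidationResult {
  if (!Array.isArray(value)) {
    return { ok: false, errors: ['response is not a JSON array'] };
  }
  const errors: string[] = [];
  value.forEach((action, index) => {
    if (!isPlainObject(action)) {
      errors.push(`action ${index} is not an object`);
      return;
    }
    const type = action.type;
    if (typeof type !== 'string' || !KNOWN_ACTION_TYPES.has(type)) {
      errors.push(`action ${index} has an unknown type`);
    }
    // Assumption: toolArgs, when present, must be an object and is rejected otherwise.
    if ('toolArgs' in action && !isPlainObject(action.toolArgs)) {
      errors.push(`action ${index} has non-object toolArgs`);
    }
  });
  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Collecting every error instead of stopping at the first one gives a corrective retry prompt more to work with, though that is a design choice of this sketch rather than something the PR states.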

Run the tests:

cd packages/software-factory
pnpm test:node

Test plan

  • 37 unit tests: action validation, response parsing, model resolution, mock agent, message assembly, API path selection
  • 5 integration tests: full round-trip for both proxy and direct paths using local mock HTTP servers, error handling, retry on malformed response
  • TypeScript compiles cleanly (tsc --noEmit)
  • Manual smoke test with real OpenRouter API key

🤖 Generated with Claude Code

Linear: https://linear.app/cardstack/issue/CS-10476

Implement the core FactoryAgent interface that decouples the orchestration
loop from any specific LLM. This is the foundational ticket for the
software factory execution loop.

- Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill,
  ToolManifest, TestResult, ToolResult, and placeholder card types
- Implement OpenRouterFactoryAgent with dual-path routing:
  - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI)
  - Proxy path via realm server _request-forward (production, with billing)
  - Precedence: the env var wins over config, and config wins over the proxy
- Implement MockFactoryAgent for deterministic testing
- Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback
- Add action validation, response parsing with markdown fence stripping
- Add retry-once with error correction on malformed LLM responses
- Add smoke-test script (pnpm factory:agent-smoke) for manual verification
- 42 tests: unit tests + integration tests with mock HTTP servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
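
Three behaviours from the commit message above (the model fallback chain, markdown fence stripping, and retry-once on a malformed response) can be sketched roughly as follows. Everything here is an assumption about shape: resolveModel, stripMarkdownFence, planWithRetry, and the Complete callback type are hypothetical names, the env value is passed in rather than read from a specific variable, and treating blank strings as unset is a guess.

```typescript
// Default named in the PR description.
const FACTORY_DEFAULT_MODEL = 'anthropic/claude-sonnet-4';

// CLI value wins, then the env value, then the default. Blank strings count as unset.
function resolveModel(cliModel: string | undefined, envModel: string | undefined): string {
  return cliModel?.trim() || envModel?.trim() || FACTORY_DEFAULT_MODEL;
}

// Three backticks, built at runtime so this example stays easy to embed in markdown.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(
  '^' + FENCE + '[\\w-]*\\s*\\n([\\s\\S]*?)\\n?' + FENCE + '$',
);

// Removes a surrounding markdown code fence if the model wrapped its answer in one.
function stripMarkdownFence(text: string): string {
  const trimmed = text.trim();
  const match = trimmed.match(FENCED_BLOCK);
  return match ? match[1].trim() : trimmed;
}

// Hypothetical transport: takes the conversation so far, returns the model's reply.
type Complete = (messages: string[]) => Promise<string>;

async function planWithRetry(complete: Complete, prompt: string): Promise<unknown> {
  const first = await complete([prompt]);
  try {
    return JSON.parse(stripMarkdownFence(first));
  } catch (err) {
    // One corrective retry: show the model its reply and the parse error.
    const correction =
      `Your previous reply was not valid JSON (${String(err)}). ` +
      'Reply with only a JSON array of actions.';
    const second = await complete([prompt, first, correction]);
    // A second failure is not caught, so it propagates to the caller.
    return JSON.parse(stripMarkdownFence(second));
  }
}
```

Passing the CLI and env values in as arguments keeps resolveModel a pure function, which makes the precedence order easy to unit test without touching the process environment.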

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5759f0f53f



Copilot AI left a comment

Pull request overview

Introduces a new “FactoryAgent” abstraction in packages/software-factory and provides an OpenRouter-backed implementation (with direct-key and realm-server-proxy routing), plus a mock agent and new unit/integration tests to validate parsing, validation, and request routing.

Changes:

  • Added FactoryAgent types/utilities (action validation + response parsing) and OpenRouterFactoryAgent / MockFactoryAgent.
  • Added a CLI smoke-test script for manual end-to-end verification against OpenRouter.
  • Added unit + integration tests (including local HTTP stubs) and wired them into the test index.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/software-factory/scripts/lib/factory-agent.ts | Defines core agent types, action parsing/validation, OpenRouter implementation (direct/proxy), and mock agent. |
| packages/software-factory/scripts/factory-agent-smoke.ts | Adds a manual CLI smoke test for exercising OpenRouterFactoryAgent.plan(). |
| packages/software-factory/tests/factory-agent.test.ts | Adds unit tests for action validation, response parsing, model resolution, and request path selection. |
| packages/software-factory/tests/factory-agent.integration.test.ts | Adds integration tests using local HTTP servers for both proxy and direct call paths (including retry behavior). |
| packages/software-factory/tests/index.ts | Registers the new agent test modules. |
| packages/software-factory/package.json | Adds factory:agent-smoke script entry. |


- Fix all prettier and qunit lint errors
- Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing
  proxy with empty env var in CI)
- Pass authorization through as-is to avoid Bearer Bearer double-prefix
- Use new URL() for proxy URL construction (safe without trailing slash)
- Reject non-object toolArgs in validation instead of silently dropping
- Add tests for blank API key handling and invalid toolArgs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
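
Two of the fixes in this commit (treating a blank OPENROUTER_API_KEY as missing, and building the proxy URL with new URL()) can be sketched as below. This is an illustration under assumptions: selectApiPath, nonBlank, and the ApiPath type are hypothetical names, "config" is assumed to carry an optional API key, and the env is passed in as a plain object rather than read from the process.

```typescript
type ApiPath =
  | { kind: 'direct'; apiKey: string }
  | { kind: 'proxy'; url: string };

// Trims the value and maps empty or whitespace-only strings to undefined.
function nonBlank(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Precedence from the first commit message: env var, then config, then proxy.
function selectApiPath(
  env: Record<string, string | undefined>,
  configApiKey: string | undefined, // assumption: config may hold a key
  realmServerUrl: string,
): ApiPath {
  const apiKey = nonBlank(env.OPENROUTER_API_KEY) ?? nonBlank(configApiKey);
  if (apiKey) {
    return { kind: 'direct', apiKey };
  }
  // new URL() resolves the path against the base, so an origin-only base
  // works with or without a trailing slash.
  return { kind: 'proxy', url: new URL('_request-forward', realmServerUrl).href };
}
```

Note that new URL() resolution replaces the last path segment when the base has a path and no trailing slash, so the "safe without trailing slash" property in this sketch holds for origin-only realm server URLs such as the ones used on this page.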
…ryagent-interface-and-openrouter-integration-v2

# Conflicts:
#	packages/software-factory/tests/index.ts
@backspace

❯ pnpm factory:agent-smoke -- \
  --realm-server-url https://realms-staging.stack.cards/

> @cardstack/software-factory@1.0.0 factory:agent-smoke /Users/b/Documents/Cardstack/Code/boxel-motion/packages/software-factory
> NODE_NO_WARNINGS=1 ts-node --transpileOnly scripts/factory-agent-smoke.ts -- --realm-server-url https://realms-staging.stack.cards/

Smoke test failed: TypeError: Unexpected argument '--realm-server-url'. This command does not take positional arguments

I confirmed I’m on 6205d52

pnpm passes '--' through to ts-node which forwards it to the script,
so process.argv contains ['--', '--realm-server-url', ...]. parseArgs
with strict: true rejects this. Strip the leading '--' like the
factory-entrypoint already does.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
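
The failure and the fix described in this commit can be reproduced with a short sketch using Node's util.parseArgs. The helper names stripLeadingSeparator and parseSmokeArgs are hypothetical, and the option list is an assumption based on the two flags shown on this page; the actual script may define more.

```typescript
import { parseArgs } from 'node:util';

// Drops a single leading "--" so it is not treated as the option terminator.
function stripLeadingSeparator(args: string[]): string[] {
  return args[0] === '--' ? args.slice(1) : args;
}

// Parses the smoke-test flags. With strict: true and no positionals allowed,
// an unstripped leading "--" would turn every later token into a positional
// argument and make parseArgs throw.
function parseSmokeArgs(argv: string[]) {
  const { values } = parseArgs({
    args: stripLeadingSeparator(argv),
    options: {
      'realm-server-url': { type: 'string' },
      model: { type: 'string' },
    },
    strict: true,
    allowPositionals: false,
  });
  return values;
}
```

A caller would typically pass process.argv.slice(2), so both `pnpm factory:agent-smoke -- --realm-server-url ...` and the same command without the separator parse to the same values.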
@habdelra

Fixed in dcea592. The -- separator that pnpm passes through to ts-node was being forwarded as a literal argument to parseArgs, which rejected it under strict: true. The script now strips the leading -- the same way factory-entrypoint.ts already does.

Verified locally:

$ OPENROUTER_API_KEY=sk-or-... pnpm factory:agent-smoke -- --realm-server-url http://localhost:4201/

Model: anthropic/claude-sonnet-4
Realm server: http://localhost:4201/

Sending plan() request...

Received 4 action(s):
[
  { "type": "create_file", "path": "hello.py", ... },
  { "type": "create_test", "path": "test_hello.py", ... },
  { "type": "update_ticket", ... },
  { "type": "done" }
]

Smoke test passed.

The -- separator is not needed since pnpm doesn't consume
--realm-server-url or --model. The script still strips a leading --
defensively in case users include it out of habit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>