Create prompt templates for agent communication (CS-10477) by habdelra · Pull Request #4230 · cardstack/boxel

habdelra · 2026-03-22T20:07:46Z

Summary

- Add markdown prompt templates in packages/software-factory/prompts/ that define how the orchestrator communicates with the LLM. They are standalone files that can be iterated on without code changes.
- Implement PromptLoader with {{variable}}, {{#each}}, and {{#if}}/{{else}} interpolation: a minimal mustache-like expansion with no template engine dependency.
- Replace the stub buildMessages() in OpenRouterFactoryAgent with template-based one-shot prompt assembly.
- Add a factory:prompt-smoke CLI script for reviewing assembled prompts without an API key or server.
- Add 37 new unit tests covering interpolation, prompt assembly, and integration.
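
The interpolation listed in the summary can be sketched in a few lines. This is a minimal illustration and not the PR's PromptLoader: the names `interpolate`, `lookup`, and `Data`, the `{{this}}` binding inside `{{#each}}`, and the choice to treat empty arrays as falsy are all assumptions made for this example. It also does not handle nested blocks of the same kind.

```typescript
// Hypothetical sketch of mustache-like expansion; not the PR's implementation.
type Data = Record<string, unknown>;

// Resolve a dot path such as "ticket.id" against the data object.
function lookup(data: Data, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (value, key) =>
      value !== null && typeof value === "object"
        ? (value as Data)[key]
        : undefined,
    data,
  );
}

function interpolate(template: string, data: Data): string {
  // 1. {{#each list}}body{{/each}}: render the body once per item.
  let out = template.replace(
    /\{\{#each ([\w.]+)\}\}([\s\S]*?)\{\{\/each\}\}/g,
    (_match: string, path: string, body: string) => {
      const list = lookup(data, path);
      if (!Array.isArray(list)) return "";
      return list
        .map((item) =>
          interpolate(body, {
            ...data,
            // Object items expose their fields; any item is also {{this}}.
            ...(item !== null && typeof item === "object" ? (item as Data) : {}),
            this: item,
          }),
        )
        .join("");
    },
  );

  // 2. {{#if cond}}a{{else}}b{{/if}}: keep one branch (assumed truthiness rule).
  out = out.replace(
    /\{\{#if ([\w.]+)\}\}([\s\S]*?)\{\{\/if\}\}/g,
    (_match: string, path: string, body: string) => {
      const [whenTrue = "", whenFalse = ""] = body.split("{{else}}");
      const value = lookup(data, path);
      const truthy = Array.isArray(value) ? value.length > 0 : Boolean(value);
      return truthy ? whenTrue : whenFalse;
    },
  );

  // 3. {{path}}: substitute simple variables and dot paths; missing values render as "".
  return out.replace(/\{\{([\w.]+)\}\}/g, (_match: string, path: string) => {
    const value = lookup(data, path);
    return value === undefined || value === null ? "" : String(value);
  });
}
```

With this sketch, `interpolate("Ticket {{ticket.id}}", { ticket: { id: "T-1" } })` yields `Ticket T-1`. Regex-based expansion keeps the example dependency-free, which matches the stated goal, but a real loader would likely need nesting support and stricter error reporting.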

Depends on

#4229 (CS-10476: Define FactoryAgent interface and OpenRouter integration) must be reviewed and merged first. This PR is based on that branch.

Files

Prompt templates (packages/software-factory/prompts/):

| File | Purpose |
| --- | --- |
| `system.md` | Role, rules, output schema, skills, tools |
| `ticket-implement.md` | First pass: project context + ticket + implementation instructions |
| `ticket-test.md` | Generate tests for existing implementation |
| `ticket-iterate.md` | Self-contained: ticket context + previous actions + test failures + fix instructions |
| `action-schema.md` | Canonical AgentAction[] JSON schema |
| `examples/create-card.md` | Example: creating a card definition + instance |
| `examples/create-test.md` | Example: generating a test spec |
| `examples/iterate-fix.md` | Example: fixing code after test failure |

Implementation:

| File | Purpose |
| --- | --- |
| `scripts/lib/factory-prompt-loader.ts` | `PromptLoader` interface, `FilePromptLoader`, interpolation engine, message assembly functions |
| `scripts/lib/factory-agent.ts` | Updated `buildMessages()` to use templates |
| `scripts/factory-prompt-smoke.ts` | CLI smoke test for reviewing prompts |
| `tests/factory-prompt-loader.test.ts` | 37 new unit tests |

Try it out

No API keys, servers, or network access required — the smoke test assembles prompts with sample data and prints exactly what the LLM would receive.

Run all three stages (implement → iterate → test):

cd packages/software-factory
pnpm factory:prompt-smoke

Run a specific stage:

pnpm factory:prompt-smoke -- --stage implement   # first pass
pnpm factory:prompt-smoke -- --stage iterate      # fix after test failure
pnpm factory:prompt-smoke -- --stage test         # generate tests

What you should see:

Each stage prints a [SYSTEM] message and a [USER] message separated by decorated headers. For example, the implement stage output starts with:

════════════════════════════════════════════════════════════════════════
  STAGE: implement (first pass)
════════════════════════════════════════════════════════════════════════

── [SYSTEM] ──────────────────────────
# Role

You are a software factory agent. You implement Boxel cards and tests in
target realms based on ticket descriptions and project context.

# Output Format
...

followed by the full system prompt (role, rules, action schema, skills, tools), then:

── [USER] ──────────────────────────
# Project

Build a sticky note card application for Boxel

Success criteria:
- StickyNote card renders with title and body
...

# Current Ticket

ID: Ticket/define-sticky-note-core
Summary: Define the core StickyNote CardDef
...

Each stage ends with a character-count summary such as:

📊  System: 3174 chars | User: 1482 chars

The iterate stage additionally shows previous actions (with code blocks) and test failure output embedded in the user prompt, demonstrating the self-contained one-shot design.
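
The one-shot design and the character-count summary can be illustrated with a short sketch. The names `ChatMessage`, `assembleOneShot`, and `summarize` are hypothetical and are not taken from the PR. The sketch only shows the shape described above: exactly one system message and one user message per call, plus a summary line in the same format as the smoke-test output.

```typescript
// Hypothetical sketch; the PR's assembly functions may differ.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// One-shot: every call sends exactly [system, user], with no prior turns.
function assembleOneShot(systemPrompt: string, userPrompt: string): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
}

// Character counts per role, measured in UTF-16 code units (String.length).
function summarize(messages: ChatMessage[]): string {
  const chars = (role: ChatMessage["role"]): number =>
    messages
      .filter((message) => message.role === role)
      .reduce((total, message) => total + message.content.length, 0);
  return `System: ${chars("system")} chars | User: ${chars("user")} chars`;
}
```

Because the iterate stage carries previous actions and test failures inside the user message, the same two-message shape works for every stage.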

Run unit tests:

pnpm test:node

All 123 tests should pass (37 new + 86 existing).

Test plan

- Unit tests for PromptLoader interpolation (simple vars, dot paths, {{#each}}, {{#if}}/{{else}})
- Unit tests for one-shot message assembly at each loop stage
- Test that ticket-iterate prompt includes previous actions and test results
- Snapshot test for assembled system prompt with sample skills and tools
- Verify factory:prompt-smoke runs cleanly and prints all three stages
- Existing factory-agent tests updated and passing

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

boxel/packages/software-factory/scripts/lib/factory-agent.ts

Lines 379 to 380 in 24bf1b0

    
async plan(context: AgentContext): Promise<AgentAction[]> {
  let messages = this.buildMessages(context);

Thread failed-test context into plan() retries

The new iterate prompt is only selected when both previousActions and iteration are passed, but the public entry point still calls buildMessages(context) with no extra arguments. That means a real follow-up plan(context) after a failed test run will never hit ticket-iterate; it falls back to ticket-implement and loses the prior actions plus failure details that the fix-up loop needs. Compared with the previous implementation, this regresses all test-failure retries from self-contained repair prompts to essentially a fresh first pass.

Copilot

Pull request overview

This PR introduces a template-driven prompt system for the software-factory agent so the orchestrator’s LLM messages can be iterated via standalone markdown files rather than hardcoded strings.

Changes:

- Added markdown prompt templates under packages/software-factory/prompts/ (system + implement/iterate/test stages + action schema + examples).
- Implemented PromptLoader/FilePromptLoader and a minimal mustache-like interpolator with {{var}}, {{#each}}, and {{#if}}/{{else}}.
- Updated OpenRouterFactoryAgent.buildMessages() to assemble one-shot [system,user] messages from templates and added a factory:prompt-smoke CLI plus unit tests.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| packages/software-factory/scripts/lib/factory-prompt-loader.ts | Implements file-backed prompt loading, interpolation, and prompt assembly helpers. |
| packages/software-factory/scripts/lib/factory-agent.ts | Switches `buildMessages()` from stubbed strings to template-based prompt construction. |
| packages/software-factory/prompts/system.md | Defines the shared system prompt (role/rules/realms/skills/tools + action schema). |
| packages/software-factory/prompts/ticket-implement.md | Template for initial implementation pass user prompt. |
| packages/software-factory/prompts/ticket-iterate.md | Template for iteration/fix pass user prompt (previous actions + failures + tool results). |
| packages/software-factory/prompts/ticket-test.md | Template for test generation pass user prompt. |
| packages/software-factory/prompts/action-schema.md | Provides the canonical action schema text embedded into the system prompt. |
| packages/software-factory/prompts/examples/* | Adds example input/output snippets for reference. |
| packages/software-factory/scripts/factory-prompt-smoke.ts | Adds CLI to print assembled prompts without any network/API key. |
| packages/software-factory/tests/factory-prompt-loader.test.ts | Adds unit/integration tests for interpolation and prompt assembly. |
| packages/software-factory/tests/factory-agent.test.ts | Updates tests to reflect new template-based message assembly and iterate-mode signature. |
| packages/software-factory/tests/index.ts | Registers the new test file in the suite. |
| packages/software-factory/package.json | Adds `factory:prompt-smoke` script. |

habdelra · 2026-03-22T21:38:17Z

Re: P1 — Thread failed-test context into plan() retries

Fixed in 3451842. Added previousActions and iteration as optional fields on AgentContext:

export interface AgentContext {
  // ...existing fields...
  /** Actions from the previous plan() call, fed back for iteration prompts. */
  previousActions?: AgentAction[];
  /** Current iteration number (1-based), set by the orchestrator. */
  iteration?: number;
}

plan() now threads these through to buildMessages():

async plan(context: AgentContext): Promise<AgentAction[]> {
  let messages = this.buildMessages(
    context,
    context.previousActions,
    context.iteration,
  );
  // ...
}

The orchestrator sets context.previousActions and context.iteration before calling plan(), and the iterate template receives real data instead of empty defaults.
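
To make the selection rule concrete, here is a small sketch of how a context might map to a user template. It is an assumption-laden illustration, not the code in factory-agent.ts: the `Sketch*` types are trimmed-down stand-ins for the real interfaces, `planPrompt` is a hypothetical name, and the defaults (an empty action list and iteration 1) are guesses at what "sensible defaults" could mean. It covers only the implement and iterate templates, because the thread does not say how the test stage is chosen.

```typescript
// Hypothetical, trimmed-down shapes; the real AgentContext has more fields.
interface SketchAction {
  type: string;
  path?: string;
}

interface SketchTestResult {
  status: "passed" | "failed";
  failures?: string[];
}

interface SketchContext {
  testResults?: SketchTestResult;
  previousActions?: SketchAction[];
  iteration?: number; // 1-based, set by the orchestrator
}

type UserTemplate = "ticket-implement" | "ticket-iterate";

interface PromptPlan {
  template: UserTemplate;
  previousActions: SketchAction[];
  iteration: number;
}

// Assumed rule: test results present means an iterate pass; otherwise a first pass.
function planPrompt(context: SketchContext): PromptPlan {
  if (!context.testResults) {
    return { template: "ticket-implement", previousActions: [], iteration: 1 };
  }
  return {
    template: "ticket-iterate",
    // Fall back to assumed defaults when the orchestrator did not set these.
    previousActions: context.previousActions ?? [],
    iteration: context.iteration ?? 1,
  };
}
```

Carrying `previousActions` and `iteration` on the context, rather than as extra arguments, means a caller that only knows about `plan(context)` still reaches the iterate template with real data.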

Added an integration-style test (plan() uses iterate template when context has previousActions and testResults) that stubs fetch, calls plan() with a full iteration context, and verifies the LLM request body contains the iterate prompt with previous actions, iteration number, and test failure details.

backspace

I ran the suggested commands and saw the suggested output. The actual volume of words here is massive though, am I meant to be reading this all? I only skimmed

backspace · 2026-03-23T15:21:01Z

packages/software-factory/prompts/system.md

+
+# Realms
+
+- Target realm: {{targetRealmUrl}}


Where does this come from? I ran with BOXEL_ENVIRONMENT=hello so I’d expect to see http://realm-server.hello.localhost/user/personal but I got http://localhost:4201/user/personal/

But I know I’m probably the only one using this at the moment, so when the time comes, I’ll add support for this.

The target realm is the realm that you specified to create the project in. I don't think the software factory is aware of the Boxel environment work. I can add a ticket for that, though.

Implement the core FactoryAgent interface that decouples the orchestration loop from any specific LLM. This is the foundational ticket for the software factory execution loop.

- Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill, ToolManifest, TestResult, ToolResult, and placeholder card types
- Implement OpenRouterFactoryAgent with dual-path routing:
  - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI)
  - Proxy path via realm server _request-forward (production, with billing)
  - Env var takes precedence over config over proxy
- Implement MockFactoryAgent for deterministic testing
- Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback
- Add action validation, response parsing with markdown fence stripping
- Add retry-once with error correction on malformed LLM responses
- Add smoke-test script (pnpm factory:agent-smoke) for manual verification
- 42 tests: unit tests + integration tests with mock HTTP servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
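
The "CLI > env > FACTORY_DEFAULT_MODEL" fallback mentioned in this commit message can be sketched as a simple precedence chain. Everything concrete below is an assumption: the function name `resolveModelSketch`, the environment variable name `FACTORY_MODEL`, and the placeholder default value are invented for illustration, since the commit message does not state them.

```typescript
// Placeholder value; the real FACTORY_DEFAULT_MODEL is not shown in this thread.
const FACTORY_DEFAULT_MODEL = "example/default-model";

// Hypothetical sketch of CLI > env > default precedence.
function resolveModelSketch(
  cliModel: string | undefined,
  env: Record<string, string | undefined>,
): string {
  // 1. An explicit CLI value wins when it is non-blank.
  const fromCli = cliModel?.trim();
  if (fromCli) return fromCli;

  // 2. Otherwise use the environment (variable name is an assumption).
  const fromEnv = env["FACTORY_MODEL"]?.trim();
  if (fromEnv) return fromEnv;

  // 3. Fall back to the built-in default.
  return FACTORY_DEFAULT_MODEL;
}
```

Passing `env` in as a parameter, instead of reading `process.env` directly, keeps the sketch easy to test without touching global state.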

- Fix all prettier and qunit lint errors
- Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing proxy with empty env var in CI)
- Pass authorization through as-is to avoid Bearer Bearer double-prefix
- Use new URL() for proxy URL construction (safe without trailing slash)
- Reject non-object toolArgs in validation instead of silently dropping
- Add tests for blank API key handling and invalid toolArgs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
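
The blank-key handling interacts with the direct-versus-proxy routing from the earlier commit. A rough sketch follows, under stated assumptions: `Route`, `nonBlank`, and `chooseRoute` are hypothetical names, and the `apiKey` field on the config object is a guess at how "config" supplies a key. Only the `OPENROUTER_API_KEY` variable name comes from the commit messages.

```typescript
// Hypothetical sketch of "env var over config over proxy" with blank keys ignored.
type Route = { kind: "direct"; apiKey: string } | { kind: "proxy" };

// Blank or whitespace-only values count as missing.
function nonBlank(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function chooseRoute(
  env: Record<string, string | undefined>,
  config: { apiKey?: string }, // assumed config shape
): Route {
  const apiKey = nonBlank(env["OPENROUTER_API_KEY"]) ?? nonBlank(config.apiKey);
  // With no usable key, fall through to the proxy path.
  return apiKey ? { kind: "direct", apiKey } : { kind: "proxy" };
}
```

Treating an empty `OPENROUTER_API_KEY` as missing matters in CI, where a variable may be defined but empty. Without the trim, the agent would pick the direct path and send an empty credential.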

Add markdown prompt templates and a PromptLoader that assembles one-shot LLM messages from templates + runtime context. Each plan() call sends exactly [system, user], with no multi-turn conversation.

- prompts/: system.md, ticket-implement.md, ticket-test.md, ticket-iterate.md, action-schema.md, and examples/
- PromptLoader: reads, caches, and interpolates templates with {{variable}}, {{#each}}, {{#if}}/{{else}} support
- OpenRouterFactoryAgent.buildMessages() now uses template-based assembly
- factory:prompt-smoke script for reviewing assembled prompts
- 37 new unit tests for interpolation, assembly, and integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…formatting

- buildMessages() now keys off context.testResults alone (with sensible defaults for previousActions/iteration) so iterate template is used whenever test results are present
- assembleImplementPrompt() includes tool results when present, so invoke_tool output is not silently dropped on re-plan
- Tool results propagate outputFormat from tool manifests; templates use ```text or ```json fences based on the tool's declared format
- Extract shared buildToolResultsData() helper used by both implement and iterate assembly functions
- Remove indentation from closing template tags ({{/each}}, {{/if}}) to avoid stray whitespace in rendered prompts
- Add prompts/ to .prettierignore (template syntax conflicts with prettier's markdown list formatting)
- 5 new tests covering the above changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add previousActions and iteration as optional fields on AgentContext so the orchestrator can set them and plan() threads them to buildMessages(). This ensures the iterate template is used with real data during the fix-up loop, not just sensible defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra · 2026-03-23T22:23:49Z

> I ran the suggested commands and saw the suggested output. The actual volume of words here is massive though, am I meant to be reading this all? I only skimmed

Skimming is fine; the point is that they are there. We will tweak these prompts as we iterate on the factory.

…t-templates-for-agent-communication # Conflicts: # packages/software-factory/scripts/factory-agent-smoke.ts # packages/software-factory/scripts/lib/factory-agent.ts # packages/software-factory/tests/factory-agent.test.ts

…t-templates-for-agent-communication # Conflicts: # packages/software-factory/package.json

habdelra requested a review from Copilot March 22, 2026 20:08

Copilot started reviewing on behalf of habdelra March 22, 2026 20:08

chatgpt-codex-connector bot reviewed Mar 22, 2026

packages/software-factory/scripts/lib/factory-agent.ts (resolved)

Copilot AI reviewed Mar 22, 2026

habdelra requested a review from a team March 22, 2026 21:40

habdelra changed the base branch from cs-10476-define-factoryagent-interface-and-openrouter-integration-v2 to main March 23, 2026 15:17

backspace approved these changes Mar 23, 2026

habdelra and others added 5 commits March 23, 2026 17:31

habdelra force-pushed the cs-10477-create-prompt-templates-for-agent-communication branch from 3451842 to 0a9d061 March 23, 2026 21:33

jurgenwerk approved these changes Mar 24, 2026

habdelra added 2 commits March 24, 2026 09:33

Merge remote-tracking branch 'origin/main' into cs-10477-create-promp…

48c2475

habdelra merged commit 23ebd63 into main Mar 24, 2026
18 checks passed

habdelra merged 7 commits into main from cs-10477-create-prompt-templates-for-agent-communication

4 participants
