Implement skill loader and resolver for agent context (CS-10478) by habdelra · Pull Request #4231 · cardstack/boxel

habdelra · 2026-03-22T20:07:54Z

Depends on #4229 — that PR must be reviewed and merged first. This branch is based on cs-10476-define-factoryagent-interface-and-openrouter-integration-v2.

Summary

DefaultSkillResolver maps ticket context → skill names using keyword matching. Always loads boxel-development + boxel-file-structure; conditionally adds ember-best-practices (for .gts/component work), software-factory-operations (for delivery workflow), and CLI skills (boxel-sync, boxel-restore, etc.) based on ticket content. KnowledgeArticle tags can also specify additional skills.
SkillLoader reads SKILL.md + references/ or rules/+AGENTS.md from disk with multi-directory search (package-local .agents/skills/ first, then monorepo root for shared skills like ember-best-practices). Caches for the factory run duration.
enforceSkillBudget() sorts skills by priority, drops lowest-priority skills when a token budget is exceeded, and filters boxel-development references by ticket keyword relevance.
Smoke script (pnpm factory:skill-smoke) for quick CLI verification — no servers or API keys needed.
38 QUnit tests covering resolution, loading, caching, budget enforcement, fallback directories, and re-resolution across tickets.

Files

File	Purpose
`scripts/lib/factory-skill-loader.ts`	Core implementation (SkillResolver, SkillLoader, budget enforcement)
`scripts/factory-skill-smoke.ts`	CLI smoke script
`tests/factory-skill-loader.test.ts`	38 QUnit unit tests
`tests/index.ts`	Register test file
`package.json`	Add `factory:skill-smoke` script

Try it out

From packages/software-factory/, run:

1. Unit tests (38 tests)

pnpm test:node

You should see all tests pass, including the new factory-skill-loader tests:

ok 63 factory-skill-loader > DefaultSkillResolver > always includes boxel-development and boxel-file-structure
ok 64 factory-skill-loader > DefaultSkillResolver > includes ember-best-practices when ticket mentions .gts
...
ok 100 factory-skill-loader > re-resolution on new ticket > cache can be cleared between tickets for fresh loading
1..116
# pass 116
# fail 0

2. Smoke test — basic skill discovery

pnpm factory:skill-smoke

Expected output — resolves skills for 4 sample tickets, then lists all 10 discoverable skills:

=== Skill Loader & Resolver Smoke Test ===

--- Generic card work (base case) ---
  Ticket: Create a contact card
  Resolved skills: [boxel-development, boxel-file-structure]
  Loaded: 2/2 skills
    - boxel-development: ~24036 tokens + 19 reference(s)
    - boxel-file-structure: ~1964 tokens

--- .gts component work (triggers ember-best-practices) ---
  Ticket: Build a dashboard component
  Resolved skills: [boxel-development, boxel-file-structure, ember-best-practices]
  Loaded: 3/3 skills
    - boxel-development: ~24036 tokens + 19 reference(s)
    - boxel-file-structure: ~1964 tokens
    - ember-best-practices: ~60336 tokens + 1 reference(s)

...

--- All discoverable skills ---
  boxel-development: ~24036 tokens (19 refs)
  boxel-file-structure: ~1964 tokens
  ember-best-practices: ~60336 tokens (1 refs)
  software-factory-operations: ~787 tokens
  boxel-sync: ~439 tokens
  boxel-track: ~787 tokens
  boxel-watch: ~488 tokens
  boxel-restore: ~523 tokens
  boxel-repair: ~415 tokens
  boxel-setup: ~615 tokens
  Total: ~90390 tokens across 10 skills

Smoke test passed.

3. Smoke test — with token budget

pnpm factory:skill-smoke --max-tokens 30000

Same output as above, but each ticket section now shows budget enforcement. Notice ember-best-practices (~60k tokens) gets dropped when it won't fit:

--- .gts component work (triggers ember-best-practices) ---
  ...
  Budget (30000 tokens): 2/3 skills kept (8310/86336 tokens)
  Dropped: [ember-best-practices]

4. Smoke test — custom ticket text

pnpm factory:skill-smoke --ticket-text "Create a .gts component with styling"

Resolves skills for your custom ticket description instead of the built-in samples.

Test plan

Unit tests for skill resolution with different ticket types
Unit tests for skill loading (SKILL.md, references/, rules/ patterns)
Test context budget enforcement (drop low-priority skills when over limit)
Test caching behavior (loaded once, reused across iterations)
Test re-resolution when advancing to a new ticket
Test fallback directory loading (monorepo root for ember-best-practices)
Smoke script verifies end-to-end against real skill files on disk

🤖 Generated with Claude Code

Implement the core FactoryAgent interface that decouples the orchestration loop from any specific LLM. This is the foundational ticket for the software factory execution loop. - Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill, ToolManifest, TestResult, ToolResult, and placeholder card types - Implement OpenRouterFactoryAgent with dual-path routing: - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI) - Proxy path via realm server _request-forward (production, with billing) - Env var takes precedence over config over proxy - Implement MockFactoryAgent for deterministic testing - Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback - Add action validation, response parsing with markdown fence stripping - Add retry-once with error correction on malformed LLM responses - Add smoke-test script (pnpm factory:agent-smoke) for manual verification - 42 tests: unit tests + integration tests with mock HTTP servers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix all prettier and qunit lint errors - Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing proxy with empty env var in CI) - Pass authorization through as-is to avoid Bearer Bearer double-prefix - Use new URL() for proxy URL construction (safe without trailing slash) - Reject non-object toolArgs in validation instead of silently dropping - Add tests for blank API key handling and invalid toolArgs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add SkillResolver and SkillLoader modules that determine which skills to load for each ticket and read them from .agents/skills/ into AgentContext. - DefaultSkillResolver maps ticket context to skill names using keyword matching (always loads boxel-development + boxel-file-structure, adds ember-best-practices for .gts work, CLI skills for sync/restore, etc.) - SkillLoader reads SKILL.md + references/ from disk with multi-directory search (package-local then monorepo root for shared skills like ember-best-practices) - enforceSkillBudget drops low-priority skills when token budget exceeded - Skills cached for factory run duration, re-resolved between tickets - Smoke script (pnpm factory:skill-smoke) for CLI verification - 38 QUnit tests covering resolution, loading, caching, budget, fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9a3294661

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

packages/software-factory/scripts/lib/factory-skill-loader.ts

Copilot

Pull request overview

Adds the “skills” discovery pipeline for the software-factory agent context: resolving which skills apply to a ticket, loading their on-disk content (including references / compiled rules), and enforcing a token budget so the agent prompt stays bounded.

Changes:

Introduces DefaultSkillResolver, SkillLoader, token estimation, and enforceSkillBudget() in a new library module.
Adds a CLI smoke script (factory:skill-smoke) to validate resolver/loader behavior without servers or API keys.
Adds comprehensive QUnit coverage and wires the new test module into the existing node test index.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
packages/software-factory/scripts/lib/factory-skill-loader.ts	Core skill resolution/loading logic, token estimation, and budget enforcement.
packages/software-factory/scripts/factory-skill-smoke.ts	CLI smoke test for end-to-end skill discovery/loading and budget behavior.
packages/software-factory/tests/factory-skill-loader.test.ts	New unit tests covering resolution, loading, caching, fallback dirs, and budgeting.
packages/software-factory/tests/index.ts	Registers the new test module.
packages/software-factory/package.json	Adds `factory:skill-smoke` script.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/software-factory/scripts/lib/factory-skill-loader.ts

packages/software-factory/scripts/factory-skill-smoke.ts

… schema fields - Fix P1: Track reference filenames alongside content (NamedReference type) so filterBoxelDevelopmentRefs uses real filenames instead of fragile index-based reconstruction. Fixes misalignment from files like dev-file-editing.md not being in the keyword map. - Fix P1: Reference filtering now happens at load time (always applied when a ticket is provided) rather than only during budget enforcement. This prevents 19 reference files from bloating prompts by default. - Fix P2: extractKnowledgeSkillTags now checks knowledgeBase (Project schema) and relatedKnowledge (Ticket schema) in addition to the generic knowledge field. - Fix: Use word-boundary regex for plain keywords to avoid false positives (e.g. "sync" no longer matches "async", "track" no longer matches "stacktrace"). Dotted tokens like ".gts" still use substring. - Fix: Update enforceSkillBudget docstring to accurately describe the greedy behavior (skips large skills, may include smaller later ones). - Fix: Validate --max-tokens in smoke script (reject NaN/negative). - Fix: Prettier formatting issues that caused CI lint failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verifies that boxel-development references are filtered by ticket relevance at load time even when enforceSkillBudget is called with no budget (maxTokens=undefined). This covers the P1 review comment about callers that omit maxSkillTokens still getting bloated prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-and-resolver # Conflicts: # packages/software-factory/tests/index.ts

habdelra and others added 3 commits March 22, 2026 15:21

habdelra requested a review from Copilot March 22, 2026 20:09

Copilot started reviewing on behalf of habdelra March 22, 2026 20:10 View session

chatgpt-codex-connector bot reviewed Mar 22, 2026

View reviewed changes

packages/software-factory/scripts/lib/factory-skill-loader.ts Show resolved Hide resolved

packages/software-factory/scripts/lib/factory-skill-loader.ts Outdated Show resolved Hide resolved

packages/software-factory/scripts/lib/factory-skill-loader.ts Outdated Show resolved Hide resolved

Copilot AI reviewed Mar 22, 2026

View reviewed changes

habdelra and others added 2 commits March 22, 2026 17:32

habdelra changed the base branch from cs-10476-define-factoryagent-interface-and-openrouter-integration-v2 to main March 23, 2026 15:19

Merge remote-tracking branch 'origin/main' into cs-10478-skill-loader…

4523f22

…-and-resolver # Conflicts: # packages/software-factory/tests/index.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement skill loader and resolver for agent context (CS-10478)#4231

Implement skill loader and resolver for agent context (CS-10478)#4231
habdelra wants to merge 6 commits intomainfrom
cs-10478-skill-loader-and-resolver

habdelra commented Mar 22, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

habdelra commented Mar 22, 2026

Summary

Files

Try it out

1. Unit tests (38 tests)

2. Smoke test — basic skill discovery

3. Smoke test — with token budget

4. Smoke test — custom ticket text

Test plan

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants