Skip to content

Implement skill loader and resolver for agent context (CS-10478)#4231

Open
habdelra wants to merge 6 commits intomainfrom
cs-10478-skill-loader-and-resolver
Open

Implement skill loader and resolver for agent context (CS-10478)#4231
habdelra wants to merge 6 commits intomainfrom
cs-10478-skill-loader-and-resolver

Conversation

@habdelra
Copy link
Contributor

Depends on #4229 — that PR must be reviewed and merged first. This branch is based on cs-10476-define-factoryagent-interface-and-openrouter-integration-v2.

Summary

  • DefaultSkillResolver maps ticket context → skill names using keyword matching. Always loads boxel-development + boxel-file-structure; conditionally adds ember-best-practices (for .gts/component work), software-factory-operations (for delivery workflow), and CLI skills (boxel-sync, boxel-restore, etc.) based on ticket content. KnowledgeArticle tags can also specify additional skills.
  • SkillLoader reads SKILL.md + references/ or rules/+AGENTS.md from disk with multi-directory search (package-local .agents/skills/ first, then monorepo root for shared skills like ember-best-practices). Caches for the factory run duration.
  • enforceSkillBudget() sorts skills by priority, drops lowest-priority skills when a token budget is exceeded, and filters boxel-development references by ticket keyword relevance.
  • Smoke script (pnpm factory:skill-smoke) for quick CLI verification — no servers or API keys needed.
  • 38 QUnit tests covering resolution, loading, caching, budget enforcement, fallback directories, and re-resolution across tickets.

Files

File Purpose
scripts/lib/factory-skill-loader.ts Core implementation (SkillResolver, SkillLoader, budget enforcement)
scripts/factory-skill-smoke.ts CLI smoke script
tests/factory-skill-loader.test.ts 38 QUnit unit tests
tests/index.ts Register test file
package.json Add factory:skill-smoke script

Try it out

From packages/software-factory/, run:

1. Unit tests (38 tests)

pnpm test:node

You should see all tests pass, including the new factory-skill-loader tests:

ok 63 factory-skill-loader > DefaultSkillResolver > always includes boxel-development and boxel-file-structure
ok 64 factory-skill-loader > DefaultSkillResolver > includes ember-best-practices when ticket mentions .gts
...
ok 100 factory-skill-loader > re-resolution on new ticket > cache can be cleared between tickets for fresh loading
1..116
# pass 116
# fail 0

2. Smoke test — basic skill discovery

pnpm factory:skill-smoke

Expected output — resolves skills for 4 sample tickets, then lists all 10 discoverable skills:

=== Skill Loader & Resolver Smoke Test ===

--- Generic card work (base case) ---
  Ticket: Create a contact card
  Resolved skills: [boxel-development, boxel-file-structure]
  Loaded: 2/2 skills
    - boxel-development: ~24036 tokens + 19 reference(s)
    - boxel-file-structure: ~1964 tokens

--- .gts component work (triggers ember-best-practices) ---
  Ticket: Build a dashboard component
  Resolved skills: [boxel-development, boxel-file-structure, ember-best-practices]
  Loaded: 3/3 skills
    - boxel-development: ~24036 tokens + 19 reference(s)
    - boxel-file-structure: ~1964 tokens
    - ember-best-practices: ~60336 tokens + 1 reference(s)

...

--- All discoverable skills ---
  boxel-development: ~24036 tokens (19 refs)
  boxel-file-structure: ~1964 tokens
  ember-best-practices: ~60336 tokens (1 refs)
  software-factory-operations: ~787 tokens
  boxel-sync: ~439 tokens
  boxel-track: ~787 tokens
  boxel-watch: ~488 tokens
  boxel-restore: ~523 tokens
  boxel-repair: ~415 tokens
  boxel-setup: ~615 tokens
  Total: ~90390 tokens across 10 skills

Smoke test passed.

3. Smoke test — with token budget

pnpm factory:skill-smoke --max-tokens 30000

Same output as above, but each ticket section now shows budget enforcement. Notice ember-best-practices (~60k tokens) gets dropped when it won't fit:

--- .gts component work (triggers ember-best-practices) ---
  ...
  Budget (30000 tokens): 2/3 skills kept (8310/86336 tokens)
  Dropped: [ember-best-practices]

4. Smoke test — custom ticket text

pnpm factory:skill-smoke --ticket-text "Create a .gts component with styling"

Resolves skills for your custom ticket description instead of the built-in samples.

Test plan

  • Unit tests for skill resolution with different ticket types
  • Unit tests for skill loading (SKILL.md, references/, rules/ patterns)
  • Test context budget enforcement (drop low-priority skills when over limit)
  • Test caching behavior (loaded once, reused across iterations)
  • Test re-resolution when advancing to a new ticket
  • Test fallback directory loading (monorepo root for ember-best-practices)
  • Smoke script verifies end-to-end against real skill files on disk

🤖 Generated with Claude Code

habdelra and others added 3 commits March 22, 2026 15:21
Implement the core FactoryAgent interface that decouples the orchestration
loop from any specific LLM. This is the foundational ticket for the
software factory execution loop.

- Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill,
  ToolManifest, TestResult, ToolResult, and placeholder card types
- Implement OpenRouterFactoryAgent with dual-path routing:
  - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI)
  - Proxy path via realm server _request-forward (production, with billing)
  - Env var takes precedence over config over proxy
- Implement MockFactoryAgent for deterministic testing
- Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback
- Add action validation, response parsing with markdown fence stripping
- Add retry-once with error correction on malformed LLM responses
- Add smoke-test script (pnpm factory:agent-smoke) for manual verification
- 42 tests: unit tests + integration tests with mock HTTP servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix all prettier and qunit lint errors
- Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing
  proxy with empty env var in CI)
- Pass authorization through as-is to avoid Bearer Bearer double-prefix
- Use new URL() for proxy URL construction (safe without trailing slash)
- Reject non-object toolArgs in validation instead of silently dropping
- Add tests for blank API key handling and invalid toolArgs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add SkillResolver and SkillLoader modules that determine which skills to
load for each ticket and read them from .agents/skills/ into AgentContext.

- DefaultSkillResolver maps ticket context to skill names using keyword
  matching (always loads boxel-development + boxel-file-structure, adds
  ember-best-practices for .gts work, CLI skills for sync/restore, etc.)
- SkillLoader reads SKILL.md + references/ from disk with multi-directory
  search (package-local then monorepo root for shared skills like
  ember-best-practices)
- enforceSkillBudget drops low-priority skills when token budget exceeded
- Skills cached for factory run duration, re-resolved between tickets
- Smoke script (pnpm factory:skill-smoke) for CLI verification
- 38 QUnit tests covering resolution, loading, caching, budget, fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9a3294661

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the “skills” discovery pipeline for the software-factory agent context: resolving which skills apply to a ticket, loading their on-disk content (including references / compiled rules), and enforcing a token budget so the agent prompt stays bounded.

Changes:

  • Introduces DefaultSkillResolver, SkillLoader, token estimation, and enforceSkillBudget() in a new library module.
  • Adds a CLI smoke script (factory:skill-smoke) to validate resolver/loader behavior without servers or API keys.
  • Adds comprehensive QUnit coverage and wires the new test module into the existing node test index.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/software-factory/scripts/lib/factory-skill-loader.ts Core skill resolution/loading logic, token estimation, and budget enforcement.
packages/software-factory/scripts/factory-skill-smoke.ts CLI smoke test for end-to-end skill discovery/loading and budget behavior.
packages/software-factory/tests/factory-skill-loader.test.ts New unit tests covering resolution, loading, caching, fallback dirs, and budgeting.
packages/software-factory/tests/index.ts Registers the new test module.
packages/software-factory/package.json Adds factory:skill-smoke script.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

habdelra and others added 2 commits March 22, 2026 17:32
… schema fields

- Fix P1: Track reference filenames alongside content (NamedReference type)
  so filterBoxelDevelopmentRefs uses real filenames instead of fragile
  index-based reconstruction. Fixes misalignment from files like
  dev-file-editing.md not being in the keyword map.
- Fix P1: Reference filtering now happens at load time (always applied
  when a ticket is provided) rather than only during budget enforcement.
  This prevents 19 reference files from bloating prompts by default.
- Fix P2: extractKnowledgeSkillTags now checks knowledgeBase (Project
  schema) and relatedKnowledge (Ticket schema) in addition to the
  generic knowledge field.
- Fix: Use word-boundary regex for plain keywords to avoid false
  positives (e.g. "sync" no longer matches "async", "track" no longer
  matches "stacktrace"). Dotted tokens like ".gts" still use substring.
- Fix: Update enforceSkillBudget docstring to accurately describe the
  greedy behavior (skips large skills, may include smaller later ones).
- Fix: Validate --max-tokens in smoke script (reject NaN/negative).
- Fix: Prettier formatting issues that caused CI lint failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies that boxel-development references are filtered by ticket
relevance at load time even when enforceSkillBudget is called with no
budget (maxTokens=undefined). This covers the P1 review comment about
callers that omit maxSkillTokens still getting bloated prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra changed the base branch from cs-10476-define-factoryagent-interface-and-openrouter-integration-v2 to main March 23, 2026 15:19
…-and-resolver

# Conflicts:
#	packages/software-factory/tests/index.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants