Implement skill loader and resolver for agent context (CS-10478)#4231
Implement skill loader and resolver for agent context (CS-10478)#4231
Conversation
Implement the core FactoryAgent interface that decouples the orchestration loop from any specific LLM. This is the foundational ticket for the software factory execution loop. - Define types: FactoryAgentConfig, AgentContext, AgentAction, ResolvedSkill, ToolManifest, TestResult, ToolResult, and placeholder card types - Implement OpenRouterFactoryAgent with dual-path routing: - Direct path via OPENROUTER_API_KEY env var (simplest for local dev/CI) - Proxy path via realm server _request-forward (production, with billing) - Env var takes precedence over config over proxy - Implement MockFactoryAgent for deterministic testing - Add resolveFactoryModel() with CLI > env > FACTORY_DEFAULT_MODEL fallback - Add action validation, response parsing with markdown fence stripping - Add retry-once with error correction on malformed LLM responses - Add smoke-test script (pnpm factory:agent-smoke) for manual verification - 42 tests: unit tests + integration tests with mock HTTP servers Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix all prettier and qunit lint errors - Trim and treat blank OPENROUTER_API_KEY as missing (avoids bypassing proxy with empty env var in CI) - Pass authorization through as-is to avoid Bearer Bearer double-prefix - Use new URL() for proxy URL construction (safe without trailing slash) - Reject non-object toolArgs in validation instead of silently dropping - Add tests for blank API key handling and invalid toolArgs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add SkillResolver and SkillLoader modules that determine which skills to load for each ticket and read them from .agents/skills/ into AgentContext. - DefaultSkillResolver maps ticket context to skill names using keyword matching (always loads boxel-development + boxel-file-structure, adds ember-best-practices for .gts work, CLI skills for sync/restore, etc.) - SkillLoader reads SKILL.md + references/ from disk with multi-directory search (package-local then monorepo root for shared skills like ember-best-practices) - enforceSkillBudget drops low-priority skills when token budget exceeded - Skills cached for factory run duration, re-resolved between tickets - Smoke script (pnpm factory:skill-smoke) for CLI verification - 38 QUnit tests covering resolution, loading, caching, budget, fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a9a3294661
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Adds the “skills” discovery pipeline for the software-factory agent context: resolving which skills apply to a ticket, loading their on-disk content (including references / compiled rules), and enforcing a token budget so the agent prompt stays bounded.
Changes:
- Introduces
DefaultSkillResolver,SkillLoader, token estimation, andenforceSkillBudget()in a new library module. - Adds a CLI smoke script (
factory:skill-smoke) to validate resolver/loader behavior without servers or API keys. - Adds comprehensive QUnit coverage and wires the new test module into the existing node test index.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/software-factory/scripts/lib/factory-skill-loader.ts | Core skill resolution/loading logic, token estimation, and budget enforcement. |
| packages/software-factory/scripts/factory-skill-smoke.ts | CLI smoke test for end-to-end skill discovery/loading and budget behavior. |
| packages/software-factory/tests/factory-skill-loader.test.ts | New unit tests covering resolution, loading, caching, fallback dirs, and budgeting. |
| packages/software-factory/tests/index.ts | Registers the new test module. |
| packages/software-factory/package.json | Adds factory:skill-smoke script. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… schema fields - Fix P1: Track reference filenames alongside content (NamedReference type) so filterBoxelDevelopmentRefs uses real filenames instead of fragile index-based reconstruction. Fixes misalignment from files like dev-file-editing.md not being in the keyword map. - Fix P1: Reference filtering now happens at load time (always applied when a ticket is provided) rather than only during budget enforcement. This prevents 19 reference files from bloating prompts by default. - Fix P2: extractKnowledgeSkillTags now checks knowledgeBase (Project schema) and relatedKnowledge (Ticket schema) in addition to the generic knowledge field. - Fix: Use word-boundary regex for plain keywords to avoid false positives (e.g. "sync" no longer matches "async", "track" no longer matches "stacktrace"). Dotted tokens like ".gts" still use substring. - Fix: Update enforceSkillBudget docstring to accurately describe the greedy behavior (skips large skills, may include smaller later ones). - Fix: Validate --max-tokens in smoke script (reject NaN/negative). - Fix: Prettier formatting issues that caused CI lint failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies that boxel-development references are filtered by ticket relevance at load time even when enforceSkillBudget is called with no budget (maxTokens=undefined). This covers the P1 review comment about callers that omit maxSkillTokens still getting bloated prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-and-resolver # Conflicts: # packages/software-factory/tests/index.ts
Summary
DefaultSkillResolvermaps ticket context → skill names using keyword matching. Always loadsboxel-development+boxel-file-structure; conditionally addsember-best-practices(for.gts/component work),software-factory-operations(for delivery workflow), and CLI skills (boxel-sync,boxel-restore, etc.) based on ticket content.KnowledgeArticletags can also specify additional skills.SkillLoaderreadsSKILL.md+references/orrules/+AGENTS.mdfrom disk with multi-directory search (package-local.agents/skills/first, then monorepo root for shared skills likeember-best-practices). Caches for the factory run duration.enforceSkillBudget()sorts skills by priority, drops lowest-priority skills when a token budget is exceeded, and filtersboxel-developmentreferences by ticket keyword relevance.pnpm factory:skill-smoke) for quick CLI verification — no servers or API keys needed.Files
scripts/lib/factory-skill-loader.tsscripts/factory-skill-smoke.tstests/factory-skill-loader.test.tstests/index.tspackage.jsonfactory:skill-smokescriptTry it out
From
packages/software-factory/, run:1. Unit tests (38 tests)
You should see all tests pass, including the new
factory-skill-loadertests:2. Smoke test — basic skill discovery
Expected output — resolves skills for 4 sample tickets, then lists all 10 discoverable skills:
3. Smoke test — with token budget
Same output as above, but each ticket section now shows budget enforcement. Notice
ember-best-practices(~60k tokens) gets dropped when it won't fit:4. Smoke test — custom ticket text
pnpm factory:skill-smoke --ticket-text "Create a .gts component with styling"Resolves skills for your custom ticket description instead of the built-in samples.
Test plan
🤖 Generated with Claude Code