# Prompt quality auditing: self-validation protocol for assembled prompts #134
## Summary
Add a prompt quality auditing protocol that validates the assembled prompt itself — checking structural completeness, coherence, and alignment before the prompt is used.
## Motivation

GitHub Spec Kit introduced a powerful concept: "unit tests for requirements" — checklists that test whether the requirements themselves are well-written, not whether the implementation works. Examples:

- ✅ "Are error handling requirements specified for all failure modes?" (tests the spec)
- ❌ "Test error handling works" (tests the implementation)
PromptKit should apply this same rigor to its own output: the assembled prompt. Before a user loads a prompt into an LLM session, we should be able to validate:

- Are all `{{param}}` placeholders filled? (no leftover template variables)
- Does the persona's domain align with the template's domain?
- Are non-goals specified? (prevents scope creep)
- Are all referenced protocols actually loaded?
- Does the format's `produces` artifact type match the template's `output_contract`?
- Are there conflicting instructions between protocols?
- Is the prompt within reasonable token bounds for the target model?
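The first check above (no leftover template variables) is mechanical enough to sketch. This is a minimal illustration, not PromptKit's actual implementation; the placeholder syntax and function name are assumptions.

```python
import re

# Hypothetical sketch of the "all {{param}} placeholders filled" check:
# scan the assembled prompt for any double-brace placeholder that was
# never substituted during assembly.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")

def find_unfilled_params(assembled_prompt: str) -> list[str]:
    """Return the names of template placeholders still present in the prompt."""
    return sorted(set(PLACEHOLDER_RE.findall(assembled_prompt)))

prompt = "You are a {{persona}}. Task: summarize the report."
leftover = find_unfilled_params(prompt)
# leftover == ["persona"], so this prompt should fail validation
```

A non-empty result would be reported as a completeness warning rather than a hard error, since some workflows intentionally leave parameters for later stages.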
## Proposed Design

- New protocol: `guardrails/prompt-self-validation` — a meta-protocol that checks the assembled prompt for structural issues
- Checklist approach: Following Spec Kit's pattern, produce a quality checklist:
  - Completeness: All params filled, all sections present
  - Coherence: Persona domain matches task domain
  - Consistency: No conflicting protocol instructions
  - Measurability: Output format has concrete structure requirements
  - Scope: Non-goals are defined
- CLI integration: `npx promptkit assemble` could run validation automatically and emit warnings
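To make the checklist idea concrete, here is a rough sketch of a checklist runner such a CLI hook could call. Every identifier below is hypothetical (this is not PromptKit's real API), and the two example checks are deliberately simplistic stand-ins for the completeness and scope categories.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical checklist runner: each named check inspects the assembled
# prompt text and returns a warning message, or None if it passes. The
# runner collects warnings instead of raising, matching the proposed
# "emit warnings" CLI behavior.
@dataclass
class PromptAudit:
    checks: list = field(default_factory=list)

    def add(self, name: str, fn: Callable[[str], Optional[str]]) -> None:
        self.checks.append((name, fn))

    def run(self, prompt: str) -> list[str]:
        warnings = []
        for name, fn in self.checks:
            msg = fn(prompt)
            if msg:
                warnings.append(f"[{name}] {msg}")
        return warnings

audit = PromptAudit()
# Completeness: naive stand-in for the leftover-placeholder check
audit.add("completeness", lambda p: "unfilled {{param}} placeholder" if "{{" in p else None)
# Scope: naive stand-in for "non-goals are defined"
audit.add("scope", lambda p: "no Non-goals section" if "Non-goals" not in p else None)

for warning in audit.run("You are a {{persona}}. Do the task."):
    print(warning)
# Flags both a leftover placeholder and the missing Non-goals section.
```

Real checks (coherence, consistency, token bounds) would need access to the persona, template, and protocol metadata rather than just the prompt string, so the check signature would likely take a richer assembly context.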
## Credit

This pattern is inspired by GitHub Spec Kit's "unit tests for requirements" concept in their `checklist-template.md`. The insight — testing the specification quality rather than the implementation — maps directly to testing prompt quality rather than LLM output quality.