-
Notifications
You must be signed in to change notification settings - Fork 0
Feature Integration Testing Spec
Feature ID: integration-testing
Status: Implemented - Validation Framework Enhancement Required
Created: 2025-01-23
Updated: 2025-01-24
Source:
- CRD-integration-testing.md
-
IQ_OQ_PQ_IntegrationTesting.md
Review Date: 2025-01-24
Reviewer: Claude (SDD Integration)
Create a comprehensive validation framework for CCH (Claude Context Hooks) that combines end-to-end integration testing with Installation Qualification (IQ), Operational Qualification (OQ), and Performance Qualification (PQ). This framework ensures CCH works correctly across all supported platforms and meets strict engineering standards for AI governance tools.
CCH is an AI policy enforcement tool where failures can have severe consequences:
- A missed block on a force push could corrupt production repositories
- Failed context injection might lead AI agents to make uninformed, dangerous decisions
- Incomplete audit logs could create compliance gaps or security blind spots
Current testing gaps:
- Unit tests validate individual components but not end-to-end behavior
- No cross-platform installation validation (IQ)
- No operational scenario testing with evidence capture (OQ)
- Performance benchmarks exist but lack formal qualification process (PQ)
- Integration tests use soft assertions that rarely fail
From the IQ/OQ/PQ Guide:
"When you build tools designed to enforce governance and oversight in high-stakes environments, those tools must themselves embody the same quality standards they're meant to protect."
This means:
- Shift left on quality: Validation from day one, not an afterthought
- Alignment: Tools that enforce rules must be built following strict rules
- Evidence-based trust: Every validation generates auditable artifacts
- Cross-platform reliability: Works everywhere, every time
| ID | Requirement | Priority |
|---|---|---|
| IQ-001 | Validate installation on macOS Apple Silicon (ARM64) | High |
| IQ-002 | Validate installation on macOS Intel (x86_64) | High |
| IQ-003 | Validate installation on Windows (path handling, registry) | High |
| IQ-004 | Validate installation on Linux (multiple distros) | High |
| IQ-005 | Verify cch --version returns correct version |
High |
| IQ-006 | Verify cch init creates .claude/hooks.yaml correctly |
High |
| IQ-007 | Verify log directory creation at ~/.claude/logs/
|
High |
| IQ-008 | Verify CCH hook registration with Claude settings.json | High |
| IQ-009 | Generate IQ evidence report with timestamps and artifacts | Medium |
| IQ-010 | Automate IQ via CI/CD with platform-specific runners | Medium |
| ID | Requirement | Priority |
|---|---|---|
| OQ-001 | Test rule evaluation for PreToolUse events |
High |
| OQ-002 | Test rule evaluation for PostToolUse events |
High |
| OQ-003 | Test rule evaluation for PermissionRequest events |
High |
| OQ-004 | Test rule evaluation for SessionStart events |
Medium |
| OQ-005 | Verify block action prevents tool execution |
High |
| OQ-006 | Verify inject action adds context to Claude |
High |
| OQ-007 | Verify warn mode logs but allows operations |
High |
| OQ-008 | Verify run action executes validator scripts |
Medium |
| OQ-009 | Test complex rule conditions (tool + directory + regex) | High |
| OQ-010 | Verify JSON audit log format and required fields | High |
| OQ-011 | Generate OQ evidence with test case results | Medium |
| ID | Requirement | Priority |
|---|---|---|
| PQ-001 | Cold start time < 15ms (release build) | High |
| PQ-002 | Event processing < 50ms per event | High |
| PQ-003 | Memory usage < 10MB RSS under normal operation | High |
| PQ-004 | Sustained throughput > 100 events/second | Medium |
| PQ-005 | No memory leaks over 7-day stress test | Medium |
| PQ-006 | Performance consistent across all platforms (+/- 20%) | Medium |
| PQ-007 | Timing fields in responses accurate (processing_ms) | High |
| PQ-008 | Performance benchmarks with evidence capture | Medium |
| ID | Requirement | Priority |
|---|---|---|
| FR-001 | Test framework uses Bash scripts for simplicity and portability | High |
| FR-002 | Master test runner (run-all.sh) orchestrates all test cases |
High |
| FR-003 | Shared helper functions library (test-helpers.sh) |
High |
| FR-004 | Tests invoke real Claude CLI via claude -p with --allowedTools
|
High |
| FR-005 | Each test case has setup, install, execute, verify, cleanup phases | High |
| FR-006 | Test results saved as JSON in results/ directory |
Medium |
| FR-007 | Tests fail gracefully with clear messages when Claude CLI missing | High |
| FR-008 | Quick test mode skips slow tests for rapid feedback | Low |
| FR-009 | Strict assertion mode for CI/CD (fail on any issue) | High |
| FR-010 | Timeout on Claude CLI calls to prevent hanging | High |
| ID | Test Case | Purpose |
|---|---|---|
| IQ-TC-001 | binary-exists | Verify CCH binary is in PATH |
| IQ-TC-002 | version-check | Verify --version returns expected format |
| IQ-TC-003 | init-creates-config | Verify cch init creates hooks.yaml |
| IQ-TC-004 | install-registers-hook | Verify cch install updates settings.json |
| IQ-TC-005 | log-directory-created | Verify log directory is created |
| IQ-TC-006 | validate-passes | Verify cch validate passes on valid config |
| ID | Test Case | Purpose |
|---|---|---|
| OQ-TC-001 | block-force-push | Verify CCH blocks git push --force operations |
| OQ-TC-002 | context-injection | Verify CCH injects context for .cdk.ts files |
| OQ-TC-003 | session-logging | Verify CCH creates JSON Lines audit logs |
| OQ-TC-004 | permission-explanations | Verify CCH provides context on permission requests |
| OQ-TC-005 | multi-condition-rule | Verify rules with multiple conditions work |
| OQ-TC-006 | warn-mode-allows | Verify warn mode logs but doesn't block |
| ID | Test Case | Purpose |
|---|---|---|
| PQ-TC-001 | cold-start-version | Measure cold start time with --version
|
| PQ-TC-002 | cold-start-help | Measure cold start time with --help
|
| PQ-TC-003 | event-processing | Measure event processing latency |
| PQ-TC-004 | timing-in-response | Verify timing fields in response |
| PQ-TC-005 | throughput-with-rules | Measure throughput with 20+ rules |
| PQ-TC-006 | memory-baseline | Measure memory usage baseline |
As a CCH developer
I want to run IQ tests across all platforms via CI/CD
So that I can ensure CCH installs correctly everywhere
Acceptance Criteria:
- IQ tests run on macOS (ARM64 + Intel), Windows, and Linux
- Each platform generates IQ evidence report
- CI/CD fails if any IQ check fails
As a CCH developer
I want to verify that blocking rules prevent dangerous operations with evidence
So that I can demonstrate CCH works as designed
Acceptance Criteria:
- TC-001 verifies
git push --forceis blocked - CCH logs show the block action and reason
- OQ evidence includes event payloads and log entries
As a CCH developer
I want to verify that context files are injected correctly
So that I can ensure Claude receives the additional context
Acceptance Criteria:
- TC-002 verifies context injection for
.cdk.tsfiles - Injected context appears in CCH logs
- Evidence includes before/after comparison
As a CCH developer
I want to run PQ benchmarks with evidence capture
So that I can demonstrate CCH meets <50ms latency requirements
Acceptance Criteria:
- PQ tests measure p50, p95, p99 latencies
- Evidence includes benchmark data and statistical analysis
- CI/CD fails if performance exceeds thresholds
As a release manager
I want to collect IQ/OQ/PQ evidence for each release
So that I can provide auditable validation reports
Acceptance Criteria:
- Evidence stored in
docs/validation/{iq,oq,pq}/{date}/ - Sign-off template available for approval
- Evidence linked to specific release version
| ID | Criterion | Metric |
|---|---|---|
| SC-001 | All IQ tests pass on 4 platforms | 100% pass rate |
| SC-002 | All OQ test cases pass | 100% pass rate |
| SC-003 | PQ benchmarks meet requirements | All < 50ms threshold |
| SC-004 | Evidence generated for each phase | Artifacts in docs/validation/ |
| SC-005 | CI/CD integration complete | Tests run on every PR |
| SC-006 | Strict mode catches real failures | No false positives/negatives |
docs/validation/iq/{date}/
├── report.md # Summary report
├── install.log # Installation output
├── version.txt # Version verification
├── config-perms.txt # File permissions
├── hooks.yaml # Configuration copy
└── platform-specific/
├── codesign.txt # macOS code signing
└── registry.txt # Windows registry
docs/validation/oq/{date}/
├── report.md # Summary report
├── test-results.json # All test case results
├── test-cases/
│ ├── OQ-TC-001/
│ │ ├── hooks.yaml # Test configuration
│ │ ├── event.json # Input event
│ │ ├── log.json # CCH log entry
│ │ └── result.json # Pass/fail with details
│ └── OQ-TC-002/
│ └── ...
└── coverage-summary.md # Test coverage metrics
docs/validation/pq/{date}/
├── report.md # Summary report
├── benchmark-results.json # Raw benchmark data
├── latency-analysis.md # Statistical analysis
├── memory-profile.txt # Memory usage data
└── platform-comparison.md # Cross-platform comparison
| Edge Case | Expected Behavior |
|---|---|
| Claude CLI not installed | Test suite fails with clear error message |
| CCH binary not built | Test runner auto-builds via dependencies |
| Permission prompt appears | Tests timeout after 60 seconds |
| Log directory doesn't exist | Tests create directory as needed |
| Previous test artifacts exist | Tests clean up before and after |
| Claude refuses dangerous command | Test passes with note (expected safe behavior) |
| Platform-specific path issues | Normalize paths in test helpers |
| GAP-ID | Description | Impact | Resolution |
|---|---|---|---|
| GAP-001 | Soft assertions everywhere | Tests rarely fail even when hooks don't trigger | Add --strict mode |
| GAP-002 | No CI/CD workflow | Tests not automated in GitHub Actions | Create workflow file |
| GAP-003 | No timeout on Claude calls | Tests can hang indefinitely | Add 60s timeout |
test/integration/
├── README.md # Documentation
├── run-all.sh # Master test runner
├── lib/
│ └── test-helpers.sh # Shared bash functions
├── use-cases/
│ ├── 01-block-force-push/ # OQ-TC-001
│ ├── 02-context-injection/ # OQ-TC-002
│ ├── 03-session-logging/ # OQ-TC-003
│ └── 04-permission-explanations/ # OQ-TC-004
└── results/ # Test outputs (gitignored)
cch_cli/tests/
├── iq_installation.rs # IQ tests in Rust
├── oq_us1_blocking.rs # OQ blocking tests
├── oq_us2_injection.rs # OQ injection tests
├── oq_us3_validators.rs # OQ validator tests
├── oq_us4_permissions.rs # OQ permission tests
├── oq_us5_logging.rs # OQ logging tests
└── pq_performance.rs # PQ benchmark tests
docs/validation/
├── README.md # Validation framework docs
├── iq/ # Installation evidence
├── oq/ # Operational evidence
├── pq/ # Performance evidence
└── sign-off/
└── TEMPLATE-validation-report.md
| Dependency | Purpose |
|---|---|
| Claude CLI | Tool invocation (claude -p) |
| CCH binary | Hook processing |
| Bash shell | Test execution |
| Cargo | Rust test framework for IQ/OQ/PQ |
| GitHub Actions | CI/CD automation |
| jq (optional) | JSON parsing in tests |
integration-test:
desc: Run CCH + Claude CLI integration tests
aliases: [itest]
deps: [build]
cmds:
- ./test/integration/run-all.sh
integration-test-quick:
desc: Run quick integration tests (skip slow ones)
aliases: [itest-quick]
deps: [build]
cmds:
- ./test/integration/run-all.sh --quick
iq-test:
desc: Run Installation Qualification tests
cmds:
- cargo test --release iq_ -- --nocapture
oq-test:
desc: Run Operational Qualification tests
cmds:
- cargo test --release oq_ -- --nocapture
pq-test:
desc: Run Performance Qualification tests
cmds:
- cargo test --release pq_ -- --nocapture
validation-all:
desc: Run full IQ/OQ/PQ validation suite
deps: [build]
cmds:
- task iq-test
- task oq-test
- task pq-test
- task integration-test| PRD Requirement | Spec Requirement | Test Case |
|---|---|---|
| Installation works on all platforms | IQ-001 through IQ-004 | IQ-TC-001 through IQ-TC-006 |
| Block dangerous operations | OQ-005 | OQ-TC-001 |
| Inject context files | OQ-006 | OQ-TC-002 |
| Create audit logs | OQ-010 | OQ-TC-003 |
| <50ms latency | PQ-002 | PQ-TC-003 |
| Cross-platform consistency | PQ-006 | All PQ tests on all platforms |
- Tests use
set -euo pipefailfor strict error handling - The
using-claude-code-cliskill was added to AGENTS.md to support this feature - PQ tests automatically adjust thresholds for debug vs release builds (10x multiplier)
- Evidence collection should be automated via CI/CD artifacts
- Sign-off process required for regulated environments