CHANGELOG.md (4 additions, 0 deletions)
@@ -8,6 +8,10 @@ Please choose versions by [Semantic Versioning](http://semver.org/).
* MINOR version when you add functionality in a backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.

## unreleased

- feat: `/vault-cli:plan-task` Step 5's E2E verify subtask check now also rejects *vague* verify subtasks. The body must describe both *what to do* and *what to expect* — at least one concrete shape (procedure to execute, observable to check, or artifact to inspect) plus a result a reader could independently confirm. Bare promises like *"Verify the endpoint"* fail; procedure-only steps like *"run a check on the endpoint"* also fail (no expected result); concrete steps like *"curl /widgets, confirm 200 + body matches schema"* pass. LLM quality call (no verb list or regex). Closes the *vague-verify* hole that PR #15's *missing-verify* fix left open.

## v0.73.0

- feat: `/vault-cli:plan-task` Step 5 now enforces five planning-gate checks instead of two. Adds three new non-negotiables: an e2e verify subtask for shipping-class tasks (rejects all 9 dishonest-tick phrases from `task-writing.md:122-134`); subtask-goal alignment (every `# Tasks` checkbox must map to a `# Success Criteria` outcome or be the verify subtask, else flagged as scope-creep); and a soft KISS warning when `# Tasks` has > 8 checkboxes (owner can still proceed). Step 7's phase-transition gate now requires all four hard non-negotiables to pass, not just the original two. Closes a gap where plan-task let tasks pass while missing verification subtasks (e.g. BRO-20548 closed without an e2e check).
commands/plan-task.md (23 additions, 12 deletions)
@@ -66,18 +66,29 @@ Five checks beyond the auditor's general scoring — first four are hard (any fa

- **Success Criteria defined** — `# Success Criteria` section exists with ≥ 2 binary checkboxes.
- **Subtasks reach the goal** — `# Tasks` section (or equivalent) lists concrete steps that, if completed, produce the SC outcomes. If subtasks are missing or vague ("Implement feature" alone), flag.
- **E2E verify subtask present** — for shipping-class tasks (PR / release / plugin update / agent / deploy / library publish; or subtasks reference a git repo / marketplace / registry — see `task-writing.md` "Shipping Checklist"), `# Tasks` must include a subtask that runs the shipped artifact in its real environment. Two sub-checks on that subtask:

1. **No dishonest-tick phrases.** Reject if the body contains a case-insensitive substring match of any phrase from `task-writing.md:122-134`:
- *"deferred to first use"*
- *"deferred — will validate"*
- *"will check next session"*
- *"will verify on first use"*
- *"first deployment will test"*
- *"trust the audit"*
- *"trust CI"*
- *"trust the tests"*
- *"will validate later"*

2. **Concrete procedure, not just a promise.** The body must describe HOW verification happens AND what result counts as success: a reader must know both *what to do* and *what to expect*. Three shapes count as concrete; any one of them satisfies the shape requirement, and combinations are stronger:
- a **procedure to execute** — `curl /widgets`, `kubectl get pod foo`, `open the rendered page`, `run make docs-build`, `gh release list`
- an **observable to check** — `HTTP 200`, `exit 0`, `log contains "X"`, `table renders without overflow`, `tag v0.74.0 exists`
- an **artifact to inspect** — `output matches schema docs/widget-response.schema.json`, `marketplace.json version equals git tag`, `rendered README has working Code-Of-Conduct link`

A verify subtask passes when its body covers (a) at least one of the three shapes AND (b) a result a reader could independently confirm. **Both clauses required** — a procedure without an expected result is still vague. Vague fails: *"Verify the endpoint"* names a target but no action and no expected result; *"Verify it works"* names neither; *"run a check on the endpoint"* names a procedure shape but no expected result (the (b) clause fails). Concrete passes — HTTP: *"curl /widgets, confirm 200 + body matches schema"*; CLI: *"run `scenarios/release.md`, confirm exit 0"*; doc: *"open the rendered README, confirm the install table renders + Code-Of-Conduct link works"*; K8s: *"kubectl get pod foo, confirm Running + log contains 'startup complete'"*.

LLM quality call (no verb list, no regex) — the rule above IS the anchor. Re-read it when in doubt; the procedure / observable / artifact taxonomy defines what concrete means here.

Skip this whole check for non-shipping-class tasks (pure research, decision, doc-only with no published artifact).
- **Subtask-goal alignment** — every `# Tasks` checkbox must either (a) map by topic to ≥ 1 `# Success Criteria` outcome, or (b) be the e2e verify subtask. Flag any orphan as a scope-creep candidate; in step 6 the owner can link it to an SC, move it to `# Out of Scope`, or split it into a separate task.
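
The two parts of the gate that the spec describes mechanically, the `# Success Criteria` checkbox count and the case-insensitive dishonest-tick phrase match, can be sketched in Python. This is a hypothetical helper, not code taken from the plugin: it assumes task notes use GitHub-style `- [ ]` / `- [x]` checkboxes under level-1 `#` headings, and every function name here is illustrative. The concreteness sub-check and subtask-goal alignment remain LLM quality calls, so they are deliberately left out.

```python
import re

# Phrases copied from the sub-check 1 list (task-writing.md:122-134).
DISHONEST_TICK_PHRASES = [
    "deferred to first use",
    "deferred — will validate",
    "will check next session",
    "will verify on first use",
    "first deployment will test",
    "trust the audit",
    "trust CI",
    "trust the tests",
    "will validate later",
]

# Assumption: checkboxes are GitHub-style task-list items ("- [ ]" / "- [x]").
CHECKBOX = re.compile(r"^\s*[-*]\s+\[[ xX]\]\s+(?P<text>.+)$")


def section_lines(note: str, heading: str) -> list[str]:
    """Lines under `# <heading>`, stopping at the next level-1 heading."""
    out: list[str] = []
    inside = False
    for line in note.splitlines():
        if line.startswith("# "):
            # A new level-1 heading either opens the wanted section or closes it.
            inside = line[2:].strip() == heading
            continue
        if inside:
            out.append(line)
    return out


def checkboxes(note: str, heading: str) -> list[str]:
    """Text of every checkbox item in the named section."""
    items = []
    for line in section_lines(note, heading):
        m = CHECKBOX.match(line)
        if m:
            items.append(m.group("text").strip())
    return items


def success_criteria_defined(note: str) -> bool:
    """Hard check: `# Success Criteria` has at least 2 checkboxes.
    Whether each item is binary is still a judgment call, not checked here."""
    return len(checkboxes(note, "Success Criteria")) >= 2


def dishonest_phrases(subtask_body: str) -> list[str]:
    """Sub-check 1: case-insensitive substring match against the phrase list."""
    body = subtask_body.casefold()
    return [p for p in DISHONEST_TICK_PHRASES if p.casefold() in body]


note = """# Success Criteria
- [ ] /widgets returns 200
- [ ] response matches schema

# Tasks
- [ ] Add /widgets handler
- [ ] Verify the endpoint, will validate later
"""

print(success_criteria_defined(note))  # True
for task in checkboxes(note, "Tasks"):
    print(task, dishonest_phrases(task))
# Add /widgets handler []
# Verify the endpoint, will validate later ['will validate later']
```

The second subtask is rejected by sub-check 1. It would also fail sub-check 2 (no procedure shape, no confirmable result), but that verdict belongs to the LLM quality call, which this sketch intentionally does not attempt with a verb list or regex.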

**Soft:**