code-yeongyu · madgegja · Jun 7, 2026
diff --git a/README.md b/README.md
@@ -61,7 +61,7 @@ LazyCodex installs these as OmO commands for Codex. Invoke them with the
 | --- | --- | --- |
 | `$ulw-loop` | `$ulw-loop "task" [--completion-promise=TEXT] [--strategy=reset\|continue]` | Self-referential loop that runs until Oracle-verified completion. Caps at 500 iterations in ultrawork mode, 100 in normal mode. |
 | `$ulw-plan` | `$ulw-plan "what to build"` | Prometheus strategic planner. Writes a plan to `plans/<slug>.md`. Never writes product code. |
-| `$start-work` | `$start-work [plan-name] [--worktree <path>]` | Executes a plan until every checkbox is done. Prints **ORCHESTRATION COMPLETE**. |
+| `$start-work` | `$start-work [plan-name] [--worktree <path>]` | Executes a plan until every checkbox is done, then requires the global post-implementation review and debugging gate to pass before printing **ORCHESTRATION COMPLETE**. |
 
 Full documentation lives at [lazycodex.ai/docs](https://lazycodex.ai/docs).
 
@@ -86,7 +86,8 @@ Use `$ulw-plan` when the work needs decisions before implementation. It writes a
 plan to `plans/<slug>.md` and does not touch product code.
 
 Use `$start-work` when a plan is ready. It executes the checklist with durable
-Boulder progress and stops only when the plan is complete.
+Boulder progress and stops only when the plan is complete and the global
+post-implementation review plus debugging gate has passed.
 
 Use `$ulw-loop` when the task should keep moving until the result is verified by
 evidence instead of a hopeful status update.
@@ -100,9 +101,9 @@ actual work:
 | --- | --- |
 | `/init-deep` | Hierarchical project memory through `AGENTS.md` |
 | `$ulw-plan` | Decision-complete planning before code changes |
-| `$start-work` | Durable plan execution with Boulder progress |
+| `$start-work` | Durable plan execution with Boulder progress, post-implementation review, and debugging gate |
 | `$ulw-loop` | Verified completion for open-ended tasks |
-| `review-work` | Multi-angle post-implementation review |
+| `review-work` | Multi-angle post-implementation review that blocks completion when any lane fails or is inconclusive |
 | `remove-ai-slops` | Behavior-preserving cleanup of AI-looking code |
 | `frontend-ui-ux` | Polished UI surfaces |
 | `programming` | Strict TypeScript, Rust, Python, or Go discipline |

diff --git a/packages/web/content/docs/skills.md b/packages/web/content/docs/skills.md
@@ -4,21 +4,21 @@ LazyCodex is most useful as a harness for complex codebases: project memory, pla
 
 Start with `/init-deep` when the repository is too large or too old to explain from memory. It generates hierarchical `AGENTS.md` context so agents can find the right files before they change code.
 
-Use `$ulw-plan` when the work needs decisions before implementation, `$start-work` when a plan should be executed, and `$ulw-loop` when you want the agent to keep going until the result is verified.
+Use `$ulw-plan` when the work needs decisions before implementation, `$start-work` when a plan should be executed through a final review/debugging gate, and `$ulw-loop` when you want the agent to keep going until the result is verified.
 
 ### Feature coverage
 
 The three command pillars stay simple:
 
 - `$ulw-loop` keeps moving until verified completion
 - `$ulw-plan` turns fuzzy work into a decision-complete plan
-- `$start-work` executes a plan with durable Boulder progress
+- `$start-work` executes a plan with durable Boulder progress, post-implementation review, and a debugging gate
 
 Skills add specialist judgment around those pillars:
 
 | Skill | Use it for |
 | --- | --- |
-| `review-work` | Multi-angle post-implementation review |
+| `review-work` | Multi-angle post-implementation review that blocks completion when any lane fails or is inconclusive |
 | `remove-ai-slops` | Behavior-preserving cleanup of AI-looking code |
 | `frontend-ui-ux` | Designed UI work instead of generic layout filling |
 | `programming` | Strict TypeScript, Rust, Python, or Go discipline |

diff --git a/packages/web/content/docs/start-work.md b/packages/web/content/docs/start-work.md
@@ -6,6 +6,7 @@
 - A Stop-hook re-injects the next turn until the plan is complete
 - Independent sub-tasks fan out to parallel subagents
 - Strict TDD plus five evidence gates: plan reread, automated verification, manual-QA, adversarial QA, cleanup
+- A final Global Review and Debugging Gate runs `review-work`, records a debugging audit, and blocks completion or PR handoff on failed or inconclusive lanes
 - Progress is recorded to a ledger
 
 ### Syntax
@@ -16,4 +17,4 @@ $start-work [plan-name] [--worktree <absolute-path>]
 
 ### Done
 
-It prints an `ORCHESTRATION COMPLETE` block when every checkbox is checked.
+It prints an `ORCHESTRATION COMPLETE` block only when every checkbox is checked and the global post-implementation review plus debugging gate has passed.
diff --git a/packages/web/lib/commands.ts b/packages/web/lib/commands.ts
@@ -33,11 +33,12 @@ export const COMMANDS: readonly LazyCommand[] = [
     name: "$start-work",
     glyph: "work",
     syntax: "$start-work [plan-name] [--worktree <path>]",
-    summary: "Executes a Prometheus plan until every checkbox is done.",
+    summary: "Executes a Prometheus plan through every checkbox and the final review/debugging gate.",
     facts: [
       "Durable Boulder state survives across turns",
       "Parallel subagents, strict TDD + 5 evidence gates",
-      "Prints ORCHESTRATION COMPLETE when finished",
+      "Global review + debugging gate blocks completion and PR handoff",
+      "Prints ORCHESTRATION COMPLETE only after the gate passes",
     ],
   },
 ] as const;
diff --git a/packages/web/lib/docs-content.generated.ts b/packages/web/lib/docs-content.generated.ts
@@ -2,9 +2,9 @@
 export const DOC_SOURCES: Record<string, string> = {
   "overview.md": "<p>LazyCodex packages <a href=\"https://github.com/code-yeongyu/oh-my-openagent\">oh-my-openagent</a> (OmO) inside Codex as the agent harness for complex codebases. Think <a href=\"https://github.com/LazyVim/LazyVim\">LazyVim</a> for <a href=\"https://github.com/folke/lazy.nvim\">lazy.nvim</a>, but for Codex.</p>\n<h3>What you get</h3>\n<p>OmO gives Codex a full agent harness: discipline agents (Sisyphus orchestrates Hephaestus, Oracle, and Librarian), parallel execution, multi-model routing, a skills system, hooks and lifecycle, and verification defaults. LazyCodex packages that harness as a repeatable Codex setup.</p>\n<h3>The harness workflow</h3>\n<p>Use <code>{your prompt} ultrawork</code> when the job needs project memory, planning, parallel agents, and verified completion to run as one coordinated loop.</p>\n<h3>How it fits together</h3>\n<p>LazyCodex is a thin distribution layer. The core engine is <a href=\"https://github.com/code-yeongyu/oh-my-openagent\">OmO</a>. LazyCodex is maintained by <a href=\"https://sisyphuslabs.ai\">Sisyphus Labs</a>.</p>\n<p>Credit: The LazyCodex name idea is inspired by <a href=\"https://github.com/LazyVim/LazyVim\">LazyVim</a>. The Ultragoal and UltraQA ideas are inspired by <a href=\"https://github.com/Yeachan-Heo/oh-my-codex\">oh-my-codex</a>, reimplemented from concept for this Codex setup.</p>\n<ul>\n<li><a href=\"https://github.com/code-yeongyu/lazycodex\">LazyCodex on GitHub</a></li>\n<li><a href=\"https://github.com/code-yeongyu/oh-my-openagent\">OmO on GitHub</a></li>\n</ul>\n",
   "installation.md": "<p>One command installs the OmO agent harness for Codex without a global package install.</p>\n<h3>Install</h3>\n<pre><code class=\"language-bash\">npx lazycodex-ai install\n</code></pre>\n<p>This is exactly equivalent to <code>npx --yes --package oh-my-openagent omo install --platform=codex</code>.</p>\n<h3>Autonomous one-liner</h3>\n<pre><code class=\"language-bash\">npx lazycodex-ai install --no-tui --codex-autonomous\n</code></pre>\n<h3>Prerequisites</h3>\n<ul>\n<li><a href=\"https://bun.sh\">Bun</a></li>\n<li>The <a href=\"https://github.com/openai/codex\">OpenAI Codex CLI</a></li>\n</ul>\n<blockquote>\n<p>Do NOT use <code>npm install -g</code> or <code>bun add -g</code>. Always invoke via <code>npx</code>.</p>\n</blockquote>\n<h3>Let an agent do it</h3>\n<p>It is strongly recommended to let an LLM agent run the install and walk the setup for you. The agent handles subscription detection, model selection, and provider auth automatically.</p>\n",
-  "skills.md": "<p>LazyCodex is most useful as a harness for complex codebases: project memory, planning, execution, verified completion, skills, hooks, model routing, and diagnostics.</p>\n<h3>Built-in workflows</h3>\n<p>Start with <code>/init-deep</code> when the repository is too large or too old to explain from memory. It generates hierarchical <code>AGENTS.md</code> context so agents can find the right files before they change code.</p>\n<p>Use <code>$ulw-plan</code> when the work needs decisions before implementation, <code>$start-work</code> when a plan should be executed, and <code>$ulw-loop</code> when you want the agent to keep going until the result is verified.</p>\n<h3>Feature coverage</h3>\n<p>The three command pillars stay simple:</p>\n<ul>\n<li><code>$ulw-loop</code> keeps moving until verified completion</li>\n<li><code>$ulw-plan</code> turns fuzzy work into a decision-complete plan</li>\n<li><code>$start-work</code> executes a plan with durable Boulder progress</li>\n</ul>\n<p>Skills add specialist judgment around those pillars:</p>\n<table>\n<thead>\n<tr>\n<th>Skill</th>\n<th>Use it for</th>\n</tr>\n</thead>\n<tbody><tr>\n<td><code>review-work</code></td>\n<td>Multi-angle post-implementation review</td>\n</tr>\n<tr>\n<td><code>remove-ai-slops</code></td>\n<td>Behavior-preserving cleanup of AI-looking code</td>\n</tr>\n<tr>\n<td><code>frontend-ui-ux</code></td>\n<td>Designed UI work instead of generic layout filling</td>\n</tr>\n<tr>\n<td><code>programming</code></td>\n<td>Strict TypeScript, Rust, Python, or Go discipline</td>\n</tr>\n<tr>\n<td><code>LSP</code></td>\n<td>Diagnostics, definitions, references, symbols, and renames</td>\n</tr>\n<tr>\n<td><code>AST-grep</code></td>\n<td>Structural search and rewrite across code</td>\n</tr>\n<tr>\n<td><code>rules</code></td>\n<td>Project instructions from AGENTS, rules, and instruction files</td>\n</tr>\n<tr>\n<td><code>comment-checker</code></td>\n<td>Feedback after edit-like operations</td>\n</tr>\n</tbody></table>\n<h3>Where skills live</h3>\n<p>OmO can load skills from project and user locations such as <code>.opencode/skills</code>, <code>~/.config/opencode/skills</code>, <code>.claude/skills</code>, <code>.agents/skills</code>, and <code>~/.agents/skills</code>.</p>\n<p>LazyCodex installs the Codex Light setup with:</p>\n<pre><code class=\"language-bash\">npx lazycodex-ai install\n</code></pre>\n<p>That installer wires the Codex marketplace plugin as <code>omo@sisyphuslabs</code> while keeping the public package alias easy to remember.</p>\n",
+  "skills.md": "<p>LazyCodex is most useful as a harness for complex codebases: project memory, planning, execution, verified completion, skills, hooks, model routing, and diagnostics.</p>\n<h3>Built-in workflows</h3>\n<p>Start with <code>/init-deep</code> when the repository is too large or too old to explain from memory. It generates hierarchical <code>AGENTS.md</code> context so agents can find the right files before they change code.</p>\n<p>Use <code>$ulw-plan</code> when the work needs decisions before implementation, <code>$start-work</code> when a plan should be executed through a final review/debugging gate, and <code>$ulw-loop</code> when you want the agent to keep going until the result is verified.</p>\n<h3>Feature coverage</h3>\n<p>The three command pillars stay simple:</p>\n<ul>\n<li><code>$ulw-loop</code> keeps moving until verified completion</li>\n<li><code>$ulw-plan</code> turns fuzzy work into a decision-complete plan</li>\n<li><code>$start-work</code> executes a plan with durable Boulder progress, post-implementation review, and a debugging gate</li>\n</ul>\n<p>Skills add specialist judgment around those pillars:</p>\n<table>\n<thead>\n<tr>\n<th>Skill</th>\n<th>Use it for</th>\n</tr>\n</thead>\n<tbody><tr>\n<td><code>review-work</code></td>\n<td>Multi-angle post-implementation review that blocks completion when any lane fails or is inconclusive</td>\n</tr>\n<tr>\n<td><code>remove-ai-slops</code></td>\n<td>Behavior-preserving cleanup of AI-looking code</td>\n</tr>\n<tr>\n<td><code>frontend-ui-ux</code></td>\n<td>Designed UI work instead of generic layout filling</td>\n</tr>\n<tr>\n<td><code>programming</code></td>\n<td>Strict TypeScript, Rust, Python, or Go discipline</td>\n</tr>\n<tr>\n<td><code>LSP</code></td>\n<td>Diagnostics, definitions, references, symbols, and renames</td>\n</tr>\n<tr>\n<td><code>AST-grep</code></td>\n<td>Structural search and rewrite across code</td>\n</tr>\n<tr>\n<td><code>rules</code></td>\n<td>Project instructions from AGENTS, rules, and instruction files</td>\n</tr>\n<tr>\n<td><code>comment-checker</code></td>\n<td>Feedback after edit-like operations</td>\n</tr>\n</tbody></table>\n<h3>Where skills live</h3>\n<p>OmO can load skills from project and user locations such as <code>.opencode/skills</code>, <code>~/.config/opencode/skills</code>, <code>.claude/skills</code>, <code>.agents/skills</code>, and <code>~/.agents/skills</code>.</p>\n<p>LazyCodex installs the Codex Light setup with:</p>\n<pre><code class=\"language-bash\">npx lazycodex-ai install\n</code></pre>\n<p>That installer wires the Codex marketplace plugin as <code>omo@sisyphuslabs</code> while keeping the public package alias easy to remember.</p>\n",
   "ultrawork.md": "<p>ultrawork is the headline mode. Typing <code>ultrawork</code> (or the short alias <code>ulw</code>) anywhere in your prompt activates maximum-precision, outcome-first, evidence-driven orchestration.</p>\n<blockquote>\n<p>&quot;Plan, execute, verify, and keep the evidence attached.&quot;</p>\n</blockquote>\n<h3>Usage</h3>\n<pre><code class=\"language-bash\">ulw add authentication\n</code></pre>\n<h3>What it enforces</h3>\n<ul>\n<li>Strict TDD: RED → GREEN → SURFACE → CLEAN</li>\n<li>At least 3 realistic QA scenarios</li>\n<li>Real manual-QA channels (HTTP call, tmux, browser)</li>\n<li>A binding verification gate that loops until the work is genuinely done</li>\n</ul>\n",
   "ulw-loop.md": "<p><code>$ulw-loop</code> is a self-referential development loop that runs until verified completion.</p>\n<h3>How it works</h3>\n<p>The agent works continuously and emits <code>&lt;promise&gt;DONE&lt;/promise&gt;</code> when it believes the task is complete, but that does NOT end the loop. An Oracle must verify the result first. The loop ends only after the system confirms Oracle verified it. If verification fails, it continues with the message: &quot;Oracle verification failed. Continuing ULTRAWORK loop.&quot;</p>\n<h3>Syntax</h3>\n<pre><code class=\"language-bash\">$ulw-loop &quot;task description&quot; [--completion-promise=TEXT] [--strategy=reset|continue]\n</code></pre>\n<h3>Limits</h3>\n<p>The iteration cap is 500 in ultrawork mode (100 in normal mode).</p>\n",
   "ulw-plan.md": "<p><code>$ulw-plan</code> is the strategic planning consultant (Prometheus). It turns an idea into a decision-complete work plan. It is a planner, NOT an implementer. When you say &quot;do X&quot; it produces a plan for X and never writes product code.</p>\n<h3>The flow</h3>\n<ol>\n<li>Socratic interview</li>\n<li>Parallel codebase exploration</li>\n<li>Metis gap analysis</li>\n<li>Writes the plan to <code>plans/&lt;slug&gt;.md</code></li>\n<li>Optional Momus high-accuracy review</li>\n</ol>\n<h3>Output</h3>\n<p>Questions, research, and a work plan you can hand to <a href=\"#start-work\"><code>$start-work</code></a>.</p>\n",
-  "start-work.md": "<p><code>$start-work</code> executes a Prometheus work plan until every top-level checkbox is done.</p>\n<h3>How it works</h3>\n<ul>\n<li>Durable Boulder state in <code>.omo/boulder.json</code> survives across turns and sessions</li>\n<li>A Stop-hook re-injects the next turn until the plan is complete</li>\n<li>Independent sub-tasks fan out to parallel subagents</li>\n<li>Strict TDD plus five evidence gates: plan reread, automated verification, manual-QA, adversarial QA, cleanup</li>\n<li>Progress is recorded to a ledger</li>\n</ul>\n<h3>Syntax</h3>\n<pre><code class=\"language-bash\">$start-work [plan-name] [--worktree &lt;absolute-path&gt;]\n</code></pre>\n<h3>Done</h3>\n<p>It prints an <code>ORCHESTRATION COMPLETE</code> block when every checkbox is checked.</p>\n"
+  "start-work.md": "<p><code>$start-work</code> executes a Prometheus work plan until every top-level checkbox is done.</p>\n<h3>How it works</h3>\n<ul>\n<li>Durable Boulder state in <code>.omo/boulder.json</code> survives across turns and sessions</li>\n<li>A Stop-hook re-injects the next turn until the plan is complete</li>\n<li>Independent sub-tasks fan out to parallel subagents</li>\n<li>Strict TDD plus five evidence gates: plan reread, automated verification, manual-QA, adversarial QA, cleanup</li>\n<li>A final Global Review and Debugging Gate runs <code>review-work</code>, records a debugging audit, and blocks completion or PR handoff on failed or inconclusive lanes</li>\n<li>Progress is recorded to a ledger</li>\n</ul>\n<h3>Syntax</h3>\n<pre><code class=\"language-bash\">$start-work [plan-name] [--worktree &lt;absolute-path&gt;]\n</code></pre>\n<h3>Done</h3>\n<p>It prints an <code>ORCHESTRATION COMPLETE</code> block only when every checkbox is checked and the global post-implementation review plus debugging gate has passed.</p>\n"
 };
diff --git a/packages/web/lib/site-config.ts b/packages/web/lib/site-config.ts
@@ -33,7 +33,7 @@ export const SITE_CONFIG = {
       },
       {
         label: "Plans before edits",
-        text: "$ulw-plan turns ambiguous work into a decision-complete plan, then $start-work executes it with durable Boulder progress.",
+        text: "$ulw-plan turns ambiguous work into a decision-complete plan, then $start-work executes it with durable Boulder progress and blocks completion on the global review/debugging gate.",
       },
       {
         label: "Evidence at the end",

diff --git a/plugins/omo/components/rules/bundled-rules/hephaestus.md b/plugins/omo/components/rules/bundled-rules/hephaestus.md
@@ -98,6 +98,27 @@ omo-codex bundles three read-only Codex subagent roles in `CODEX_HOME/agents/`:
 - **Verify.** Diagnostics on changed files, related tests, build if applicable - in parallel where possible.
 - **Manually QA.** Drive the artifact through its surface (Manual QA Gate). Then write the final message.
 
+# Post-implementation Global Review and Debugging Gate
+
+For significant implementation work, PR creation, or PR handoff, completion is
+blocked until a global review and debugging gate passes.
+
+1. Run `review-work` after implementation verification. All five lanes must
+   PASS. Failed, timed-out, missing-deliverable, ack-only, `BLOCKED:`, or
+   inconclusive lanes block completion.
+2. Run a debugging-oriented audit against the changed surface: name at least
+   three plausible runtime failure hypotheses, run distinguishing checks, and
+   record the evidence that each was ruled out or confirmed.
+3. If review or debugging finds a real issue, use the `debugging` skill to
+   confirm root cause with runtime evidence, add a failing test or reproduction,
+   fix minimally, and rerun the gate.
+4. Redact or mask secrets and sensitive user data before writing evidence to a
+   ledger, PR body, or handoff. Never include raw tokens, credentials, auth
+   headers, cookies, API keys, env dumps, private logs, or PII; use concise
+   summaries, lengths, hashes, or short non-sensitive prefixes instead.
+5. For PR work, refresh branch/PR state after the gate and include only
+   redacted review/debugging evidence in the PR body or handoff.
+
 # Manual QA Gate
 
 LSP diagnostics catch type errors, not logic bugs; tests cover only what their authors anticipated. **"Done" requires you have personally used the deliverable through its matching surface and observed it working** within this turn. The surface determines the tool:
@@ -170,6 +191,7 @@ Done when ALL of:
 - LSP diagnostics clean on every file you changed.
 - Build (if applicable) exits 0; tests pass, or pre-existing failures are explicitly named with the reason.
 - The artifact has been driven through its matching surface in this turn (Manual QA Gate).
+- Significant implementation work, PR creation, and PR handoff have passed the Post-implementation Global Review and Debugging Gate.
 - The final message reports what you did, what you verified, what you could not verify (with the reason), and any pre-existing issues you noticed but did not touch.
 
 When you think you are done: re-read the original request and your intent line. Did every committed action complete? Run verification once more on changed files in parallel. Then report.
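
Reviewer note: the pass/block rule that the `hephaestus.md` hunk adds (all five `review-work` lanes must PASS, at least three debugging hypotheses need recorded evidence, and a confirmed issue forces a fix and a rerun) can be read as a small decision function. The TypeScript below is a minimal sketch of that reading, not code from this PR. Every type, constant, and function name in it (`LaneResult`, `Hypothesis`, `evaluateGate`, `maskSecret`) is hypothetical, and the status strings are an assumed encoding of the blocking conditions the rule lists.

```typescript
// Hypothetical model of the gate; none of these names come from the PR.
type LaneStatus =
  | "pass"
  | "fail"
  | "timeout"
  | "missing-deliverable"
  | "ack-only"
  | "blocked"
  | "inconclusive";

interface LaneResult {
  lane: string;
  status: LaneStatus;
}

interface Hypothesis {
  description: string;
  outcome: "ruled-out" | "confirmed" | "unchecked";
  evidence: string; // already-redacted summary of the distinguishing check
}

interface GateDecision {
  passed: boolean;
  reasons: string[];
}

const REQUIRED_LANES = 5; // "All five lanes must PASS"
const MIN_HYPOTHESES = 3; // "at least three plausible runtime failure hypotheses"

function evaluateGate(
  lanes: readonly LaneResult[],
  hypotheses: readonly Hypothesis[],
): GateDecision {
  const reasons: string[] = [];

  // Any missing or non-passing lane blocks completion.
  if (lanes.length < REQUIRED_LANES) {
    reasons.push(`expected ${REQUIRED_LANES} review lanes, got ${lanes.length}`);
  }
  for (const { lane, status } of lanes) {
    if (status !== "pass") reasons.push(`lane ${lane}: ${status}`);
  }

  // The debugging audit needs enough hypotheses, each with recorded evidence.
  if (hypotheses.length < MIN_HYPOTHESES) {
    reasons.push(`expected at least ${MIN_HYPOTHESES} hypotheses, got ${hypotheses.length}`);
  }
  for (const h of hypotheses) {
    if (h.outcome === "unchecked" || h.evidence.trim() === "") {
      reasons.push(`no evidence recorded: ${h.description}`);
    } else if (h.outcome === "confirmed") {
      reasons.push(`confirmed issue, fix and rerun the gate: ${h.description}`);
    }
  }

  return { passed: reasons.length === 0, reasons };
}

// Report a secret by length only, never by value (one of the options in rule 4).
function maskSecret(value: string): string {
  return `[redacted, length ${value.length}]`;
}
```

Under this reading, a caller would print `ORCHESTRATION COMPLETE` only when `evaluateGate(...).passed` is true, and would otherwise surface `reasons` in the ledger. Returning a list of reasons instead of a bare boolean keeps the blocking lanes and unproven hypotheses visible in the handoff.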