docs(cookbook): revamp eval-correction-loop + add Mermaid component #660

Open
SuhaniNagpal7 wants to merge 2 commits into dev from docs/eval-correction-loop-revamp

SuhaniNagpal7 commented May 21, 2026

What this changes

1. New Mermaid MDX component (src/components/docs/Mermaid.astro)

Drop <Mermaid code={`flowchart LR ...`} /> into any MDX page and it renders. The component is registered in vite-docs-transform.mjs, so authors don't need to import it manually. It loads mermaid@11 from jsdelivr lazily (zero cost on pages without diagrams), re-renders on theme toggle via a MutationObserver watching data-theme / .dark, and dedupes its script when a page contains multiple diagrams.
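
As a rough illustration of the lazy-load and theme logic described above, here is a minimal TypeScript sketch. It is not the code in Mermaid.astro: `loadOnce` and `pickTheme` are hypothetical names, and the memoized loader is a stand-in for the single-load behavior that this PR gets from Astro's hoisted script bundling. The sketch also assumes only Mermaid's built-in `default` and `dark` themes are used.

```typescript
// Hypothetical helpers; none of these names come from Mermaid.astro.

// Wrap an async loader so that every caller shares one in-flight promise.
// Several diagrams on a page would then trigger a single fetch of the library.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      // Forget a failed attempt so a later call can retry.
      pending = load().catch((err: unknown) => {
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Map the docs theme markers (data-theme attribute or .dark class) to a
// Mermaid built-in theme name. Assumes "dark" is the attribute value used.
function pickTheme(dataTheme: string | null, hasDarkClass: boolean): "dark" | "default" {
  return dataTheme === "dark" || hasDarkClass ? "dark" : "default";
}
```

In the browser, `load` would be a dynamic `import()` of the mermaid@11 build on jsdelivr, and a MutationObserver callback would call `pickTheme` before re-rendering each diagram. Pages without a diagram never call the loader, which is what keeps the cost at zero there.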

2. eval-correction-loop cookbook revamp

Restructures /docs/cookbook/evaluation/eval-correction-loop to match the cookbook playbook:

  • Kebab-case frontmatter with structured last-tested-with (Python + package versions), time-to-complete, difficulty, code-repo-url, og-image, canonical.
  • New "What you'll build" section names the concrete artifacts (custom eval template, agreement metric, versioning rule).
  • "Why this matters" rewritten with a production-incident lead (unauthorized refunds caught by finance three weeks later) instead of generic framing.
  • 5-step Mermaid flowchart of the correction loop replaces the prose-only summary.
  • Troubleshooting table gains a "Verify" column.
  • "Next steps" rewritten as a technical ladder (gate CI → multi-eval composite → drift monitoring) instead of a card grid.
  • Em dashes purged per project rule.
  • Real Evaluations dashboard screenshot from S3 added after Step 4.

Why now

The Mermaid component unblocks any future doc/cookbook PR that needs an architecture or flow diagram (PR #641 already imports it for the Imagine page). The eval-correction-loop revamp is the first cookbook to consume it end-to-end, so this PR ships the component and a proof-of-use together.

Test plan

  • pnpm dev → visit /docs/cookbook/evaluation/eval-correction-loop. Confirm the 5-step Mermaid flowchart renders, not a raw code block.
  • Toggle the docs theme. Confirm the diagram re-renders with the new theme's variables.
  • Load any docs page that does NOT use <Mermaid>. DevTools Network shows no mermaid chunk requested (lazy bundle confirmed).
  • pnpm astro build passes; eval-correction-loop page emits a data-mermaid wrapper in the HTML.
  • S3 screenshot URL in Step 4 loads (network 200).
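
The build check in the fourth bullet could be automated. A minimal sketch, assuming the wrapper is an element carrying a `data-mermaid` attribute (the attribute name comes from this PR; `countMermaidWrappers` and the regex are illustrative, not part of the change):

```typescript
// Hypothetical helper: count opening tags that carry a data-mermaid attribute.
// A regex is enough for a smoke test; it is not a full HTML parser.
function countMermaidWrappers(html: string): number {
  // Match "<tag ... data-mermaid" followed by whitespace, "=", ">" or "/",
  // so that attributes like data-mermaid-foo are not counted.
  const matches = html.match(/<[a-z][^>]*\sdata-mermaid(?=[\s=>\/])/gi);
  return matches === null ? 0 : matches.length;
}
```

Feeding it the built page's HTML (the output path depends on the Astro config and is not stated here) and asserting a count of at least 1 would catch a regression where the diagram falls back to a raw code block.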

Scope notes

Three other cookbook files (semantic-caching, debug-traces-from-ide, docker-compose-quickstart) have related local edits but are intentionally left out of this PR. They'll land in a follow-up once their real dashboard screenshots are captured.

Suhani Nagpal added 2 commits May 21, 2026 13:37
Lazy-loads mermaid@11 from jsdelivr CDN the first time a <Mermaid>
appears on a page. Registered in the auto-import map so cookbook
authors can drop <Mermaid code={`...`} /> into MDX without manual
imports. Theme-aware (re-renders on dark/light toggle) and
deduplicated across multiple diagrams per page via Astro's hoisted
<script> bundling.

- Frontmatter expanded to kebab-case schema (slug, author, products,
  frameworks, difficulty, tags, og-image, canonical, last-tested-date,
  structured last-tested-with, code-repo-url, page-type).
- Rewrote description for plain-language clarity ("Build a fi.evals
  evaluator that matches your team's judgments...").
- Added "What you'll build" section with 5 concrete artifact bullets.
- Expanded "Why this matters" with concrete production-incident lead
  (unauthorized refunds) and the four playbook elements: bad outcome,
  why standard tooling misses it, which FAGI product helps, metric
  that proves the fix.
- Added Mermaid flowchart of the 5-step correction loop.
- Em dashes purged (5 occurrences) per docs style.
- Troubleshooting table gained "Verify" column.
- Replaced "Explore further" reference cards with a 4-item technical
  next-steps ladder (CI gate, 80/20 holdout, trace-pipeline promotion,
  quarterly recalibration).
- Added real Evaluations dashboard screenshot after Step 4 with
  caption explaining how the registered template appears in the list.

nik13 commented May 25, 2026

Can you add proper comments in the PR describing what changes you made?
