Skip to content

feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264

Merged
kodiakhq[bot] merged 13 commits into
mainfrom
alex/HDX-mcp-prompts-overhaul
May 20, 2026
Merged

feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264
kodiakhq[bot] merged 13 commits into
mainfrom
alex/HDX-mcp-prompts-overhaul

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

@alex-fedotyev alex-fedotyev commented May 12, 2026

Summary

Drew shipped the schemas in #2273 (dashboard-level filters, per-tile onClick, drill-down navigation). This PR ships the prompts and examples that teach an MCP-using agent how to use those primitives correctly. Without it, an LLM building a HyperDX dashboard via MCP has the API but doesn't know when or how to call it.

What lands

create_dashboard design checklist. A ten-rule scannable preamble at the top of the prompt covers RED columns with aliases, per-series numberFormat for durations, groupByColumnsOnLeft for inventory tables, dashboard-level filters instead of per-tile where literals, one-metric-per-tile for metric sources, and containers and tabs for grouping. Replaces the wall-of-JSON canonical example in the old prompt so the model picks the right pattern in one read.

Four verified dashboard_examples patterns. service_inventory, service_detail, log_analytics, backend_dependencies. Each carries a "When to use" header and a "Why this shape" note. Each was built and rendered on a live dev stack with realistic synthetic data (90K traces across 8 services, 17K logs with severity distribution) before landing. That verification surfaced two design issues now captured as explicit rules: chart-level numberFormat on a table mixing counts and durations formats every value as a duration (per-series numberFormat is the fix), and metric tiles take exactly one select item because the renderer destructures select[0] and ignores the rest (multi-metric authoring needs one tile per metric).

query_guide additions. New sections for DASHBOARD FILTERS (canonical { type, name, expression, sourceId, where?, whereLanguage? } shape), NUMBER FORMAT (per-series vs chart-level distinction), and PER-TILE TYPE CONSTRAINTS (the metric one-select rule).

hyperdx_save_dashboard filters parameter. Widened the tool's input handling to accept both ExternalDashboardFilter and ExternalDashboardFilterWithId shapes via the same body schemas the v2 REST handler already uses (createDashboardBodySchema, updateDashboardBodySchema). MCP and REST surfaces stay in lockstep; the existing convertExternalFiltersToInternal helper handles the conversion without a translation layer.

service_inventory row-click wiring. Concrete worked example of how to compose Drew's onClick primitive: the Services table in the service_inventory example carries an onClick that drills into the partner Service Detail dashboard. Uses mode: "template" with the constant template "Service Detail" so the destination resolves by name at click time and there's no order-of-save dependency between the two dashboards.

Post-Claude-run prompt refinements. Three follow-up commits capture bugs the v1 prompts surfaced when Claude actually authored dashboards against them. The groupBy alias gap (LLM kept emitting SELECT toStartOfMinute(t) AS time without aliasing the groupBy column). A broader lucene-gotcha note. Inlined concrete examples for the patterns that were ambiguous on first pass.

Voice pass. Em-dashes dropped from buildSourceSummary and from the dashboard authoring prompt content. Snapshot test guards regressions.

Co-authorship and rebase

The branch composes Drew's MCP onClick schemas and prompts commit (78e72652) on top of his already-merged #2273. Drew authored that commit; the service_inventory row-click wiring on top is the concrete demonstration of the foundation he built. Drew's commit is preserved in the chain to keep attribution intact. Drew's chore: Remove IS_DASHBOARD_LINKING_ENABLED toggle dropped during rebase as already-upstream via #2273.

Schema design note

mcpDashboardFilterSchema and mcpFiltersParam use the inline-schema design from main (the version Drew's #2273 shipped). An earlier version of this branch defined mcpFiltersParam as z.array(externalDashboardFilterSchemaWithId.partial({ id: true })) to reuse the canonical REST schema, but that lost Drew's id-management description (concrete LLM guidance on filter-id handling for create vs update flows). After rebase, the refactor commit aligned back to main's design to preserve those semantics. The wider input type on saveDashboard.ts still ships, so REST/MCP type parity is unaffected.

Test plan

  • yarn workspace @hyperdx/api tsc --noEmit clean
  • yarn workspace @hyperdx/api lint clean (prettier + eslint + tsc + openapi lint)
  • prose-lint.py clean on every changed file
  • Built every example dashboard on a live dev stack and confirmed every tile renders with realistic synthetic data
  • Snapshot tests guard: design checklist present, four patterns listed, each example carries "When to use", zero em-dashes in any prompt string
  • Filter round-trip via MCP: save with two filters, get back, update with one renamed filter, fetch, assert shape
  • Filter rejection: save with bogus sourceId returns 4xx
  • Backward compat: save without filters returns filters: []
  • Green CI across lint, unit, integration, knip, e2e shards 1-4, ClickHouse Bundle Build, Vercel (handled on push)

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 12, 2026

🦋 Changeset detected

Latest commit: 00d360d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 20, 2026 4:08pm

Request Review

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 12, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Large diff: 1405 production lines changed (threshold: 1000)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 5
  • Production lines changed: 1405 (+ 916 in test files, excluded from tier calculation)
  • Branch: alex/HDX-mcp-prompts-overhaul
  • Author: alex-fedotyev

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

PR Review

This PR is mostly prompt content + a new filters parameter on hyperdx_save_dashboard. The prompt rewrite is well-snapshot-tested, the filter round-trip path mirrors the REST v2 schemas, and the onClick validation tests survived intact. One concrete behavior concern below.

  • ⚠️ assignFilterIds (saveDashboard.ts:126-140) silently mints a fresh ObjectId for any filter missing an id on UPDATE → contradicts the prompt's own "MUST carry an id" contract (content.ts:36-44, schemas.ts:641-650). If an LLM forgets to carry the id of an existing filter, it gets a new id and silently orphans its savedFilterValues. The REST PUT path rejects this; the MCP path absorbs it. Either fail loud when an unknown id is missing on a filter that doesn't already exist on the dashboard, or at least diff against existingFilterIds (already in scope at saveDashboard.ts:501-503) and only auto-mint when the body has more filters than persisted. Worth covering with a test of "update payload omits id on a filter that exists in persisted set" → expect either error or id preserved.

  • ℹ️ Minor: mcpDashboardFilterSchema.id is plain z.string().optional() with no objectIdSchema validation (schemas.ts:637-650), while sourceId on the same schema uses objectIdSchema. A malformed id passed on UPDATE flows straight through assignFilterIds (since length > 0) into Mongo. Low risk, but tightening this would catch typos before they persist.

  • ℹ️ drilldown_links example carries <TARGET_DASHBOARD_ID> as a placeholder string (content.ts:656). The example header tells the user to swap it, but an agent that copies verbatim will hit Could not find the following onClick dashboard IDs: <TARGET_DASHBOARD_ID> rather than a schema error — fine if that's the intended UX, since the message is actionable.

No security or critical correctness issues. Ship-ready modulo the filter-id auto-mint footgun.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

E2E Test Results

All tests passed • 178 passed • 3 skipped • 1229s

Status Count
✅ Passed 178
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

Deep Review

Scope: PR #2264 against base 205919f5 — MCP dashboard prompts overhaul + hyperdx_save_dashboard filter input widening (7 files, +1472/−864).

Reviewers ran: correctness, testing, maintainability, project-standards, agent-native, learnings, api-contract, kieran-typescript, adversarial.

✅ No critical issues found.

🟡 P2 — recommended

  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:126 — On update, assignFilterIds mints a fresh ObjectId for any filter missing an id; convertExternalFiltersToInternal then mints another (since the just-minted id is not in existingFilterIds), so every "forgot ids" update rotates every filter id. The schema warning at schemas.ts:637-650 claims this "orphans savedFilterValues," but savedFilterValues is keyed by {type, condition} not by filter id, so the documented harm cannot occur — the real harm (silent id churn that invalidates id-keyed UI state and the table-tile onClick filter→expression binding contract this PR ships) is not surfaced to the agent.
    • Fix: Either reject update payloads that omit id on a filter the dashboard still has, or match by (expression, name, sourceId) against existingDashboard.filters to reuse the prior id; then rewrite the schema description to match the chosen behavior.
    • correctness, api-contract, adversarial
  • packages/api/src/mcp/tools/dashboards/schemas.ts:637 — The MCP filter schema's .describe() and the new prompt content (content.ts:38) teach the model to "preserve savedFilterValues bound to that id," but the hyperdx_save_dashboard inputSchema never exposes savedFilterValues (or savedQuery / savedQueryLanguage), and saveDashboard.ts:288 hard-codes resolveSavedQueryLanguage({savedQuery: undefined, savedQueryLanguage: undefined}); the agent is told to defend a field it cannot author. The pre-existing parity gap is unchanged, but this PR's prompt rewrite is the first place the gap becomes a documented promise.
    • Fix: Either expose savedQuery, savedQueryLanguage, and savedFilterValues on the MCP inputSchema (plumbing them into $set on update) or scrub the prompt + schema descriptions of references to fields the tool cannot drive.
    • agent-native, kieran-typescript
  • packages/api/src/mcp/__tests__/dashboards.test.ts:1359 — The new normalization helpers stripFilterIds and assignFilterIds are exercised only through happy-path integration tests; the dangerous case (update payload that omits ids on previously-saved filters) is not covered, and the PR-description claim "save with bogus sourceId returns 4xx" has no test that drives a filter (vs. tile) sourceId. A regression that flipped the orphan-vs-preserve behavior would still pass.
    • Fix: Add a save→update round-trip that resubmits both existing filters without ids and asserts the contract you chose (reject, or id-stable by expression), plus a test that saves with filters: [{ ..., sourceId: "<unknown>" }] and asserts isError: true.
    • testing, correctness
🔵 P3 nitpicks (8)
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:137assignFilterIds performs an ObjectId allocation that convertExternalFiltersToInternal always discards on update, since the just-minted id never matches existingFilterIds; the helper exists only to satisfy updateDashboardBodySchema's id-required check.
    • Fix: Collapse the two id-mint sites into one — either drop assignFilterIds and let convertExternalFiltersToInternal own id generation, or have convertExternalFiltersToInternal skip its mint when the id was already assigned at the boundary.
    • correctness, api-contract
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:299 — On create, savedFilterValues: parsed.data.savedFilterValues is spread unconditionally, but the MCP inputSchema does not expose savedFilterValues, so this is always undefined; the update path correctly guards on !== undefined. Harmless today, but the inconsistency masks the missing-input bug if the schema later exposes the field.
    • Fix: Spread savedFilterValues conditionally on create to match the update path.
    • correctness
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:114inputFilters is typed (ExternalDashboardFilter | ExternalDashboardFilterWithId)[], but ExternalDashboardFilter is just ExternalDashboardFilterWithId with id omitted; the union forces four as casts inside the two helpers.
    • Fix: Type the param as a single shape with id?: string, then destructure { id, ...rest } without casts.
    • kieran-typescript
  • packages/api/src/mcp/tools/dashboards/schemas.ts:635mcpDashboardFilterSchema re-declares every field of externalDashboardFilterSchemaWithId and is not .strict(), while the REST counterpart is; field additions in common-utils will silently drift between the two surfaces.
    • Fix: Build the MCP shape from externalDashboardFilterSchemaWithId.partial({ id: true }) and layer per-field .describe() on top, or add a compile-time TypeEqual assert.
    • maintainability, api-contract
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:457filterChangedHeatmapTiles intentionally skips the heatmap source-kind gate when a tile's sourceId and displayType are unchanged, so a heatmap whose source was externally narrowed from trace to something else persists across any MCP-driven update; pre-existing REST behavior, now reachable from the MCP surface.
    • Fix: Either accept the trade-off and document it (test that pins the behavior), or run the source-kind gate against the persisted heatmap set on every update.
    • adversarial
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:288 — Helper getSourceConnectionMismatches runs on the REST POST/PUT path but is not exported from utils/dashboards.ts, so the MCP save path cannot invoke it; a raw-SQL tile whose sourceId and connectionId belong to different connections will be persisted via MCP and rejected on the next REST round-trip.
    • Fix: Export the helper and call it from both createDashboard and updateDashboard in saveDashboard.ts.
    • agent-native
  • packages/api/src/mcp/prompts/dashboards/content.ts:712 — The raw-SQL macro reference list ($__timeFilter, $__timeInterval, $__filters, …) appears in three places (schemas.ts:585, content.ts:712, content.ts:871); a future macro rename has to touch all three to avoid teaching the model an inconsistent macro set.
    • Fix: Extract a single SQL_MACRO_REFERENCE constant and import it at all three sites.
    • maintainability
  • packages/api/src/mcp/__tests__/dashboards.test.ts:1414 — The new having field on table tiles has a single save-and-assert-from-response test; there is no get-dashboard round-trip and no .max(10000) rejection assertion, so a body-schema regression that drops having from persistence would pass.
    • Fix: Add a save→get assertion on having and a rejection test for >10000-char input.
    • testing

Reviewers (9): correctness, testing, maintainability, project-standards, agent-native, learnings, api-contract, kieran-typescript, adversarial.

Testing gaps:

  • No direct unit tests for stripFilterIds / assignFilterIds (empty array, mixed-id, empty-string-id, undefined branches).
  • No regression test for the MCP-vs-REST parity gap on getSourceConnectionMismatches.
  • No assertion that save-without-filters returns filters: [] (the backward-compat claim in the PR body).

@teeohhem
Copy link
Copy Markdown
Contributor

@alex-fedotyev I noticed some recent changes. Is this ready to review?

alex-fedotyev and others added 5 commits May 19, 2026 20:47
…ilters on save tool

The `create_dashboard` prompt now leads with a ten-rule design checklist
(RED columns with aliases, per-series numberFormat for durations,
groupByColumnsOnLeft for inventory tables, dashboard-level filters
instead of per-tile where literals, one-metric-per-tile for metric
sources, containers and tabs for grouping). The wall-of-JSON canonical
example is gone; the four dashboard_examples patterns carry the
concrete shapes.

The dashboard_examples set is replaced with four verified patterns
(service_inventory, service_detail, log_analytics,
backend_dependencies) plus the existing infrastructure_sql. Each
non-SQL example leads with a "When to use" header and a "Why this
shape" note so the model picks by intent, not by surface keyword
match. Examples were built and rendered on a live dev stack before
landing, which surfaced two design issues now captured as rules in
the prompt:

- chart-level numberFormat on a table that mixes counts and durations
  formats every value as a duration; per-series numberFormat is the fix.
- metric tiles take exactly one select item (renderer destructures
  select[0] and ignores the rest).

The query_guide prompt gains a DASHBOARD FILTERS section that
documents the filters: [{ type, name, expression, sourceId, where?,
whereLanguage? }] shape, a NUMBER FORMAT section that explains the
per-series vs. chart-level distinction, and a PER-TILE TYPE
CONSTRAINTS note about the metric one-select rule.

hyperdx_save_dashboard now accepts `filters` on its input schema,
reusing externalDashboardFilterSchemaWithId so the MCP and REST
surfaces stay in lockstep and the existing
convertExternalFiltersToInternal helper handles the conversion
without translation. Filters round-trip through create, get, and
update.

Voice pass: every prompt string is now em-dash-free, with a snapshot
test guarding regressions.

Drill-down behavior (onClick from a service inventory row into the
service detail dashboard) is a separate follow-up PR; landing the
prompt rewrite first.
The buildSourceSummary helper is the source-list block that
buildCreateDashboardPrompt prepends to the prompt body. It still
carried em-dashes from before the voice pass, so the assembled
`create_dashboard` prompt that the MCP transport returns to a client
contained eight em-dashes even though every builder in content.ts is
clean. Caught by an end-to-end check that fetched the prompt through
the live MCP transport and counted em-dashes in the response payload.
…rompts

claude-review flagged:
- `service_detail` example uses `having: "StatusMessage != ''"` on a
  table tile, but mcpTableTileSchema.config does not include `having`.
  Zod's `.strip()` silently drops the field, so an LLM following the
  example through MCP would save a table that includes empty-message
  rows. Added `having: z.string().max(10000).optional()` mirroring
  externalDashboardTableChartConfigSchema and a round-trip test.
- Heatmap tile's numberFormat describe still referenced
  `output: "time"` while the rest of the file uses `output: "duration"`.
  Aligned to "duration" for consistency with the prompt guidance.

deep-review additionally flagged:
- query_guide told the model that metric tiles take "1 select item with
  metricName + metricType", but mcpTileSelectItemSchema does not expose
  those fields, so the keys are silently stripped. Replaced the metric
  authoring guidance with an explicit "not currently exposed via the MCP
  schema; use raw SQL for infrastructure metrics" note. Same edit on
  the design-checklist rule 9 (replaced with a "replace, not merge"
  update-semantic note that addresses a separate review concern about
  partial-payload data loss on update).
- mcpFiltersParam advertised id as optional via .partial({ id: true }),
  but createDashboardBodySchema rejects id (strict, no id) and
  updateDashboardBodySchema requires id. So an LLM copy-pasting a
  filter from get-dashboard into create, or adding a brand-new filter
  on update without an id, would hit a confusing strict-validation
  rejection. saveDashboard.ts now normalizes per flow: stripFilterIds()
  on create, assignFilterIds() on update.
- Em-dash regression check did not cover buildQueryGuidePrompt (largest
  prompt, most likely to regress) or buildSourceSummary (helpers.ts
  carried em-dashes through the initial PR until fix `3539649e`). Added
  assertions for both.
- Bad-sourceId test asserted only `text.toContain('source')`, which
  matches almost any error string. Tightened to assert
  "Could not find source IDs" plus the literal bad id.
- Numbered-rule check used `prompt.toContain('${i}.')` which would
  match substrings like "1.2s" or "0.000000001". Anchored the check
  to line start (`/^${i}\\. /m`) and scoped it to the DESIGN CHECKLIST
  section.
- DASHBOARD FILTERS and NUMBER FORMAT body content was not asserted
  beyond the heading. Now asserts substantive content (QUERY_EXPRESSION
  / expression / sourceId for filters; factor: 0.000000001 / duration /
  per-series for number format).
- Filter round-trip did not cover the freshly-generated-id branch of
  convertExternalFiltersToInternal at the MCP layer. Added a test that
  ships a new no-id filter on update and asserts saveDashboard assigns
  a fresh id.
…oard

The service_inventory pattern's Services table now carries an onClick that
drills into the partner "Service Detail" dashboard. Using mode: "template"
with the constant template "Service Detail" resolves the target by name
at click time, so the canonical flow (save both dashboards in one shot)
works without needing to thread the detail dashboard's ID back into the
inventory tile.

The onClick filter's expression "ServiceName" matches the service_detail
filter declaration, so the destination dashboard's "Service" dropdown
auto-populates with the clicked row's value rather than asking the user
to re-select it.

Snapshot test asserts the template target, filter expression, and
template field shape so a future refactor cannot silently break the
inventory -> detail drill-down link.
Worked Claude through five starter dashboards on a fresh stack; this pass
folds the patterns it got wrong back into the prompts.

DESIGN CHECKLIST changes:

- Rule 2 (alias) now reads ALIAS EVERY SELECT ITEM, not ALIAS EVERY
  AGGREGATION. Claude landed three number tiles with no alias on the
  quantile() select item; rule 2 as written felt table-specific.
- Rule 10 (containers) is now REQUIRED at five-plus tiles, not a soft
  hint. Claude built five dashboards averaging ten tiles each with
  zero containers when the rule was a SHOULD.
- New rule 11 VALIDATE EVERY TILE AFTER SAVE, separated from the
  workflow step so it has its own anchor in the checklist.
- New rule 12 NO TITLE-RECAP MARKDOWN TILE. Claude added a 24x2
  "About this dashboard" markdown tile to every starter dashboard; the
  renderer styles markdown headings at title-bar scale so the tile
  ate a row of vertical space before the first KPI showed.

TILE TYPE GUIDE: markdown line rewritten to actively discourage the
title-recap habit (use containers/tabs for sectioning, skip the
markdown tile entirely on starters, no headings inside markdown bodies
if you must add one).

QUERY GUIDE additions:

- LUCENE FILTER SYNTAX gains a GOTCHA block: Lucene comparison
  operators and wildcards on dotted attribute paths
  (http.status_code:>=500, http.route:*) parse and save without error
  but fail silently at query time. Workaround: switch to SQL with
  bracket access. This was the cause of every "Error loading chart"
  Claude hit during its run.
- PER-TILE TYPE CONSTRAINTS expanded for metric sources: builder tile
  authoring on metric sources currently renders as "Both table name
  and UUID are empty" at query time. Use a raw SQL tile with explicit
  table reference (otel_metrics_gauge / otel_metrics_sum /
  otel_metrics_histogram) until the builder catches up. Discovery note
  added because metric sources don't publish mapAttributeKeys the way
  log and trace sources do.
- TABLE TILE LINKING gains a GROUPBY ALIASES AND ROW-CLICK TEMPLATES
  subsection. Builder tiles don't expose a groupBy alias today, so
  groupBy: SpanAttributes['http.route'] produces a column name no
  Handlebars template can reference as {{Route}}. Workaround:
  configType: "sql" with explicit AS Route.
- COMMON MISTAKES rule 8 (validate-after-save) tightened to "every
  tile, not just one". Rule 13 (missing alias) and rule 14 (Lucene
  comparison on map paths) added with concrete wrong/correct examples.
  Rules 11/12 onClick entries were duplicate-numbered in the
  cherry-pick; renumbered to 15-18.
- New REFERENCES section at the end of the query guide pins three
  ClickStack / ClickHouse doc URLs: search syntax for Lucene, the SQL
  reference for the where field with whereLanguage:"sql", and the
  sql-visualizations page for the $__timeFilter / $__timeInterval /
  $__filters macros used in raw SQL tiles.

DEFAULT TIME WINDOW: short note in the create prompt that dashboards
open at 15 minutes and the user-side time picker is currently the only
way to widen the window. Tells the agent to surface this to the user
rather than padding individual tile time ranges to compensate.

Snapshot tests extended to lock the new rules in place: design
checklist now iterates 1 to 12, asserts on the strengthened phrasings
("ALIAS EVERY SELECT ITEM", "REQUIRED at five or more tiles",
"VALIDATE EVERY TILE AFTER SAVE", "NO TITLE-RECAP MARKDOWN TILE"),
plus per-section coverage tests for the GROUPBY ALIASES workaround,
the Lucene gotcha (in both LUCENE FILTER SYNTAX and COMMON MISTAKES),
the metric-source workaround text, and the three reference URLs.
…ond pass

Watched Claude run on a fresh stack with the previous prompt revision;
this commit closes the remaining gaps surfaced by that run.

DESIGN CHECKLIST changes:

- Rule 2 (alias) now ships a copy-pasteable number-tile example next
  to the rule body. The prior phrasing got better tables but Claude
  still dropped aliases on every builder number tile across three
  dashboards (P50/P95/P99 latency, Server Requests, Failed Spans, Log
  Errors). Concrete shape next to the text is what makes it stick.
- Rule 10 (containers) now ships a copy-pasteable containers/tabs
  block. The prior REQUIRED phrasing did not bite at all; Claude built
  five dashboards with 9-10 tiles each and zero containers. The model
  reads "REQUIRED" as a soft hint when the next concrete shape is two
  files away in the dashboard_examples prompt. Inline the shape and
  the tile -> containerId wiring directly in the rule.

WORKFLOW changes:

- Now six steps. Step 2 says "list existing dashboards with
  hyperdx_get_dashboard, fetch one or two, learn local idioms before
  designing", which Claude's reflection asked for. Step 4 says
  "sketch tiles, THEN group them into 2-4 containers before assembling
  the save payload" so the container decision happens at the same
  level as picking source kind, not buried in a checklist.

QUERY GUIDE changes:

- LUCENE FILTER SYNTAX gotcha broadened from "comparison and wildcard
  on dotted map-attribute paths" to "ALL Lucene operations on
  map-attribute paths". Claude saw db.system:mongodb (simple equality)
  also fail to translate to SpanAttributes['db.system'] on this round.
  The previous phrasing said "simple equality on a dotted path is
  fine"; that was wrong. The corrected guidance is: use SQL with
  bracket access for any operation on map-attribute keys.

- Added a FUZZY-MATCH NOTE alongside the gotcha: Lucene field:value
  on top-level string columns translates to ilike(field, '%value%'),
  not equality. That is fine for free-text columns but surprising for
  enum-like ones (SpanKind, SeverityText, StatusCode). Claude hit this
  on SpanKind:Server matching broader strings than intended. The
  canonical fix is whereLanguage: "sql" with =.

- COMMON MISTAKES rule 14 rewritten to cover any-operation, with
  db.system:mongodb as the equality example.
- Rule 15 added for the fuzzy-substring trap on top-level enum-like
  columns.
- Rule 16 added for the empty-string trap on map-attribute groupBy:
  ClickHouse map columns return '' for unset keys, so groupBy on a
  sparse attribute produces an empty bucket alongside real values.
  Workaround: where: "SpanAttributes['<key>'] != ''" alongside the
  groupBy. Comes from Claude's MongoDB dashboard work where the empty
  collection-name bucket was the loudest visual noise.
- Renumbered the onClick mistakes from 15-18 to 17-20 to make room.

Tests extended to assert: workflow has six numbered steps and names
the read-existing and group-into-containers steps explicitly; rule 2
carries the Server Requests example; rule 10 carries the kpis/trends/
errors shape; LUCENE FILTER SYNTAX mentions both wildcard and equality
gotchas plus the FUZZY-MATCH NOTE; COMMON MISTAKES has dedicated rules
for the map-attr-any-operation gotcha, the enum-like fuzzy-match
trap, and the empty-string map-attr trap. All 6 example patterns
still save and query cleanly end-to-end (48 tiles checked through
hyperdx_query_tile).
Adds a new rule 3 GROUP BY HAS NO ALIAS HOOK that calls out the
schema limit on the v2 chart config: groupBy is a single expression
string, so the renderer uses it verbatim as the table column header
and as the raw column name in CSV export, tooltips, orderBy, and
onClick template references. A table grouped by
SpanAttributes['http.route'] renders arrayElement(SpanAttributes,
'http.route') as the header. The rule pushes the model toward a
top-level column when one carries the same semantic
(SpanName / ServiceName / SeverityText) and falls back to a raw
SQL tile with AS alias when only a Map attribute is available.

Second-pass dashboards consistently produced ugly Map-expression
headers on App-Performance tables even though every alias rule on
the select side was already in the checklist; the gap was the
missing acknowledgement that groupBy has no alias hook.

Also fixes a pre-existing assertion in dashboards.test.ts left
over from the rewrite: the test expected the phrase "not
currently exposed via the MCP" which is no longer in the
PER-TILE TYPE CONSTRAINTS section. Anchors on the surviving
"Authoring builder tiles on a metric source is not reliable"
and "MCP select-item shape does not carry" phrasing instead.
Drew's #2273 shipped `mcpDashboardFilterSchema` + `mcpFiltersParam`
with explicit id-management semantics in the description (telling
the LLM exactly how to handle filter ids on create vs update).
The pre-rebase version of this branch defined `mcpFiltersParam` as
`z.array(externalDashboardFilterSchemaWithId.partial({ id: true }))`
instead, which loses Drew's id description.

Aligning back to main's design keeps the richer guidance the LLM
benefits from and avoids overriding a schema another PR shipped
upstream four days earlier. The wider `(ExternalDashboardFilter |
ExternalDashboardFilterWithId)[]` input type on saveDashboard.ts
still ships, so REST/MCP parity is unaffected.

`externalDashboardFilterSchemaWithId` is no longer used in
schemas.ts; the import is dropped.
Five em-dashes (U+2014) in the dashboard authoring prompt
content surface to the LLM and to human reviewers. Per project
voice rules, replace with colon / period equivalents that read
the same to readers and to LLMs.

Affected:
- filter-id self-describing rule (two occurrences)
- pattern-mining query_guide bullet list (where / sampleSize /
  bodyExpression)
Deep-review (P2 #1, #2264 comment on 2026-05-12):
the rewritten mcpOnClickDashboardSchema.filters description dropped
the warning that an onClick filter whose expression is not declared
on the destination dashboard is dropped at click time. Restore one
sentence on the onClick schema description and the matching wording
in the dashboards prompt content.

Also drop three em-dashes from mcpDashboardFilterSchema descriptions
introduced by 9444c2c; project voice rules ban em-dashes in
reviewer-facing prose.
@alex-fedotyev alex-fedotyev force-pushed the alex/HDX-mcp-prompts-overhaul branch from 94bdfb2 to 3c7a316 Compare May 19, 2026 20:53
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

alex-fedotyev commented May 19, 2026

Yes, ready for review. Rebased on main (was 10 behind, conflicting) and pushed 3c7a316.

What I did in this pass:

  • Rebased on main. One real conflict at content.ts COMMON MISTAKES item 4 from the hyperdx_query to per-displayType split in refactor(mcp): split hyperdx_query into 5 display-type-specific tools #2294. Resolved by keeping the broader scope (number / pie / heatmap) and folding in the new hyperdx_table auto-upgrade note as a clarification line. CI re-running on the rebased branch.
  • Addressed the deep-review correctness finding on mcpOnClickDashboardSchema.filters (the description had dropped the silent-drop warning). Restored a sentence on the schema description and the matching content.ts:994 filters wording. Also dropped three em-dashes from mcpDashboardFilterSchema descriptions introduced by 9444c2c to keep the file voice-clean.

What's NOT in this pass, deliberately, since it's an unblock pass and not a full sweep:

  • assignFilterIds foreign-id handling: real API-contract decision (reject vs warn), deserves its own change.
  • Three test-only follow-ups: having round-trip persistence assertion, empty-string id branch coverage, filters empty-array vs omitted semantics.
  • Structural parity test between mcpDashboardFilterSchema and externalDashboardFilterSchemaWithId.
  • All P3 nits.

If you'd rather I roll the assignFilterIds fix into this PR before review, happy to; otherwise I'll track it as a follow-up so this one stays focused on the prompt-rewrite + dashboard filters surface.

(Edited: rewrote the deferred-items list to drop #N notation that GitHub was auto-linking to unrelated issues.)

.optional()
.describe(
'Display formatting for bucket values. Example: { output: "time", factor: 0.000000001 } ' +
'Display formatting for bucket values. Example: { output: "duration", factor: 0.000000001 } ' +
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't time be better here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept output: "duration" intentionally. The heatmap chart at packages/app/src/components/DBHeatmapChart.tsx:1091 branches on numberFormat?.output === 'duration' and calls formatDurationMsCompact(value * factor * 1000), which renders bucket values as "1.2s", "45ms", "3µs" (trace-waterfall style). output: "time" falls through to numbro's time formatter which is clock-style HH:MM:SS, wrong for sub-second latencies. The design checklist in the same prompt (rule 8) also prescribes { output: "duration", factor: 0.000000001 } for heatmaps, so the schema example now lines up with that.

});
});

it('should reject a filter whose sourceId does not exist', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably already have a test for this, if not, it should be a generic sourceId validation test not specific to filters

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, dropped in b8115f9. The tile-level missing-source test at line 295 exercises the same getMissingSources(sources, tilesWithId, filters) codepath in saveDashboard.ts:217, so the filter-specific variant was duplicate coverage.

expect(text).toContain(missingSourceId);
});

it('should round-trip a dashboard with no filters (backward compat)', async () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this - other tests should cover this. No backward compat needed yet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped in b8115f9. Existing create/update tests in the same file save dashboards without a filters field and implicitly cover the empty-array default.

Copy link
Copy Markdown
Member

@brandon-pereira brandon-pereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, lgtm. a couple nits

Two of the four filter tests Brandon flagged as redundant on
#2264:

- The filter-specific missing-source test duplicated the
  existing tile-level test at line 295. saveDashboard's
  getMissingSources(sources, tilesWithId, filters) walks both
  in one pass, so a tile-level miss exercises the same
  codepath.

- The no-filters round-trip was framed as a backward-compat
  guard, but the create/update suite already saves dashboards
  without a filters field and implicitly covers the empty-array
  default.
@kodiakhq kodiakhq Bot merged commit c3a8aa5 into main May 20, 2026
18 checks passed
@kodiakhq kodiakhq Bot deleted the alex/HDX-mcp-prompts-overhaul branch May 20, 2026 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants