feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264
Conversation
🦋 Changeset detectedLatest commit: 00d360d The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🔴 Tier 4 — CriticalTouches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD. Why this tier:
Review process: Deep review from a domain expert. Synchronous walkthrough may be required. Stats
|
PR ReviewThis PR is mostly prompt content + a new
No security or critical correctness issues. Ship-ready modulo the filter-id auto-mint footgun. |
E2E Test Results✅ All tests passed • 178 passed • 3 skipped • 1229s
Tests ran across 4 shards in parallel. |
Deep ReviewScope: PR #2264 against base Reviewers ran: correctness, testing, maintainability, project-standards, agent-native, learnings, api-contract, kieran-typescript, adversarial. ✅ No critical issues found. 🟡 P2 — recommended
🔵 P3 nitpicks (8)
Reviewers (9): correctness, testing, maintainability, project-standards, agent-native, learnings, api-contract, kieran-typescript, adversarial. Testing gaps:
|
cd37010 to
be223fb
Compare
e409a29 to
5ed6ca9
Compare
5ed6ca9 to
94bdfb2
Compare
|
@alex-fedotyev I noticed some recent changes. Is this ready to review? |
…ilters on save tool
The `create_dashboard` prompt now leads with a ten-rule design checklist
(RED columns with aliases, per-series numberFormat for durations,
groupByColumnsOnLeft for inventory tables, dashboard-level filters
instead of per-tile where literals, one-metric-per-tile for metric
sources, containers and tabs for grouping). The wall-of-JSON canonical
example is gone; the four dashboard_examples patterns carry the
concrete shapes.
The dashboard_examples set is replaced with four verified patterns
(service_inventory, service_detail, log_analytics,
backend_dependencies) plus the existing infrastructure_sql. Each
non-SQL example leads with a "When to use" header and a "Why this
shape" note so the model picks by intent, not by surface keyword
match. Examples were built and rendered on a live dev stack before
landing, which surfaced two design issues now captured as rules in
the prompt:
- chart-level numberFormat on a table that mixes counts and durations
formats every value as a duration; per-series numberFormat is the fix.
- metric tiles take exactly one select item (renderer destructures
select[0] and ignores the rest).
The query_guide prompt gains a DASHBOARD FILTERS section that
documents the filters: [{ type, name, expression, sourceId, where?,
whereLanguage? }] shape, a NUMBER FORMAT section that explains the
per-series vs. chart-level distinction, and a PER-TILE TYPE
CONSTRAINTS note about the metric one-select rule.
hyperdx_save_dashboard now accepts `filters` on its input schema,
reusing externalDashboardFilterSchemaWithId so the MCP and REST
surfaces stay in lockstep and the existing
convertExternalFiltersToInternal helper handles the conversion
without translation. Filters round-trip through create, get, and
update.
Voice pass: every prompt string is now em-dash-free, with a snapshot
test guarding regressions.
Drill-down behavior (onClick from a service inventory row into the
service detail dashboard) is a separate follow-up PR; landing the
prompt rewrite first.
The buildSourceSummary helper is the source-list block that buildCreateDashboardPrompt prepends to the prompt body. It still carried em-dashes from before the voice pass, so the assembled `create_dashboard` prompt that the MCP transport returns to a client contained eight em-dashes even though every builder in content.ts is clean. Caught by an end-to-end check that fetched the prompt through the live MCP transport and counted em-dashes in the response payload.
…rompts
claude-review flagged:
- `service_detail` example uses `having: "StatusMessage != ''"` on a
table tile, but mcpTableTileSchema.config does not include `having`.
Zod's `.strip()` silently drops the field, so an LLM following the
example through MCP would save a table that includes empty-message
rows. Added `having: z.string().max(10000).optional()` mirroring
externalDashboardTableChartConfigSchema and a round-trip test.
- Heatmap tile's numberFormat describe still referenced
`output: "time"` while the rest of the file uses `output: "duration"`.
Aligned to "duration" for consistency with the prompt guidance.
deep-review additionally flagged:
- query_guide told the model that metric tiles take "1 select item with
metricName + metricType", but mcpTileSelectItemSchema does not expose
those fields, so the keys are silently stripped. Replaced the metric
authoring guidance with an explicit "not currently exposed via the MCP
schema; use raw SQL for infrastructure metrics" note. Same edit on
the design-checklist rule 9 (replaced with a "replace, not merge"
update-semantic note that addresses a separate review concern about
partial-payload data loss on update).
- mcpFiltersParam advertised id as optional via .partial({ id: true }),
but createDashboardBodySchema rejects id (strict, no id) and
updateDashboardBodySchema requires id. So an LLM copy-pasting a
filter from get-dashboard into create, or adding a brand-new filter
on update without an id, would hit a confusing strict-validation
rejection. saveDashboard.ts now normalizes per flow: stripFilterIds()
on create, assignFilterIds() on update.
- Em-dash regression check did not cover buildQueryGuidePrompt (largest
prompt, most likely to regress) or buildSourceSummary (helpers.ts
carried em-dashes through the initial PR until fix `3539649e`). Added
assertions for both.
- Bad-sourceId test asserted only `text.toContain('source')`, which
matches almost any error string. Tightened to assert
"Could not find source IDs" plus the literal bad id.
- Numbered-rule check used `prompt.toContain('${i}.')` which would
match substrings like "1.2s" or "0.000000001". Anchored the check
to line start (`/^${i}\\. /m`) and scoped it to the DESIGN CHECKLIST
section.
- DASHBOARD FILTERS and NUMBER FORMAT body content was not asserted
beyond the heading. Now asserts substantive content (QUERY_EXPRESSION
/ expression / sourceId for filters; factor: 0.000000001 / duration /
per-series for number format).
- Filter round-trip did not cover the freshly-generated-id branch of
convertExternalFiltersToInternal at the MCP layer. Added a test that
ships a new no-id filter on update and asserts saveDashboard assigns
a fresh id.
…oard The service_inventory pattern's Services table now carries an onClick that drills into the partner "Service Detail" dashboard. Using mode: "template" with the constant template "Service Detail" resolves the target by name at click time, so the canonical flow (save both dashboards in one shot) works without needing to thread the detail dashboard's ID back into the inventory tile. The onClick filter's expression "ServiceName" matches the service_detail filter declaration, so the destination dashboard's "Service" dropdown auto-populates with the clicked row's value rather than asking the user to re-select it. Snapshot test asserts the template target, filter expression, and template field shape so a future refactor cannot silently break the inventory -> detail drill-down link.
Worked Claude through five starter dashboards on a fresh stack; this pass
folds the patterns it got wrong back into the prompts.
DESIGN CHECKLIST changes:
- Rule 2 (alias) now reads ALIAS EVERY SELECT ITEM, not ALIAS EVERY
AGGREGATION. Claude landed three number tiles with no alias on the
quantile() select item; rule 2 as written felt table-specific.
- Rule 10 (containers) is now REQUIRED at five-plus tiles, not a soft
hint. Claude built five dashboards averaging ten tiles each with
zero containers when the rule was a SHOULD.
- New rule 11 VALIDATE EVERY TILE AFTER SAVE, separated from the
workflow step so it has its own anchor in the checklist.
- New rule 12 NO TITLE-RECAP MARKDOWN TILE. Claude added a 24x2
"About this dashboard" markdown tile to every starter dashboard; the
renderer styles markdown headings at title-bar scale so the tile
ate a row of vertical space before the first KPI showed.
TILE TYPE GUIDE: markdown line rewritten to actively discourage the
title-recap habit (use containers/tabs for sectioning, skip the
markdown tile entirely on starters, no headings inside markdown bodies
if you must add one).
QUERY GUIDE additions:
- LUCENE FILTER SYNTAX gains a GOTCHA block: Lucene comparison
operators and wildcards on dotted attribute paths
(http.status_code:>=500, http.route:*) parse and save without error
but fail silently at query time. Workaround: switch to SQL with
bracket access. This was the cause of every "Error loading chart"
Claude hit during its run.
- PER-TILE TYPE CONSTRAINTS expanded for metric sources: builder tile
authoring on metric sources currently renders as "Both table name
and UUID are empty" at query time. Use a raw SQL tile with explicit
table reference (otel_metrics_gauge / otel_metrics_sum /
otel_metrics_histogram) until the builder catches up. Discovery note
added because metric sources don't publish mapAttributeKeys the way
log and trace sources do.
- TABLE TILE LINKING gains a GROUPBY ALIASES AND ROW-CLICK TEMPLATES
subsection. Builder tiles don't expose a groupBy alias today, so
groupBy: SpanAttributes['http.route'] produces a column name no
Handlebars template can reference as {{Route}}. Workaround:
configType: "sql" with explicit AS Route.
- COMMON MISTAKES rule 8 (validate-after-save) tightened to "every
tile, not just one". Rule 13 (missing alias) and rule 14 (Lucene
comparison on map paths) added with concrete wrong/correct examples.
Rules 11/12 onClick entries were duplicate-numbered in the
cherry-pick; renumbered to 15-18.
- New REFERENCES section at the end of the query guide pins three
ClickStack / ClickHouse doc URLs: search syntax for Lucene, the SQL
reference for the where field with whereLanguage:"sql", and the
sql-visualizations page for the $__timeFilter / $__timeInterval /
$__filters macros used in raw SQL tiles.
DEFAULT TIME WINDOW: short note in the create prompt that dashboards
open at 15 minutes and the user-side time picker is currently the only
way to widen the window. Tells the agent to surface this to the user
rather than padding individual tile time ranges to compensate.
Snapshot tests extended to lock the new rules in place: design
checklist now iterates 1 to 12, asserts on the strengthened phrasings
("ALIAS EVERY SELECT ITEM", "REQUIRED at five or more tiles",
"VALIDATE EVERY TILE AFTER SAVE", "NO TITLE-RECAP MARKDOWN TILE"),
plus per-section coverage tests for the GROUPBY ALIASES workaround,
the Lucene gotcha (in both LUCENE FILTER SYNTAX and COMMON MISTAKES),
the metric-source workaround text, and the three reference URLs.
…ond pass Watched Claude run on a fresh stack with the previous prompt revision; this commit closes the remaining gaps surfaced by that run. DESIGN CHECKLIST changes: - Rule 2 (alias) now ships a copy-pasteable number-tile example next to the rule body. The prior phrasing got better tables but Claude still dropped aliases on every builder number tile across three dashboards (P50/P95/P99 latency, Server Requests, Failed Spans, Log Errors). Concrete shape next to the text is what makes it stick. - Rule 10 (containers) now ships a copy-pasteable containers/tabs block. The prior REQUIRED phrasing did not bite at all; Claude built five dashboards with 9-10 tiles each and zero containers. The model reads "REQUIRED" as a soft hint when the next concrete shape is two files away in the dashboard_examples prompt. Inline the shape and the tile -> containerId wiring directly in the rule. WORKFLOW changes: - Now six steps. Step 2 says "list existing dashboards with hyperdx_get_dashboard, fetch one or two, learn local idioms before designing", which Claude's reflection asked for. Step 4 says "sketch tiles, THEN group them into 2-4 containers before assembling the save payload" so the container decision happens at the same level as picking source kind, not buried in a checklist. QUERY GUIDE changes: - LUCENE FILTER SYNTAX gotcha broadened from "comparison and wildcard on dotted map-attribute paths" to "ALL Lucene operations on map-attribute paths". Claude saw db.system:mongodb (simple equality) also fail to translate to SpanAttributes['db.system'] on this round. The previous phrasing said "simple equality on a dotted path is fine"; that was wrong. The corrected guidance is: use SQL with bracket access for any operation on map-attribute keys. - Added a FUZZY-MATCH NOTE alongside the gotcha: Lucene field:value on top-level string columns translates to ilike(field, '%value%'), not equality. That is fine for free-text columns but surprising for enum-like ones (SpanKind, SeverityText, StatusCode). Claude hit this on SpanKind:Server matching broader strings than intended. The canonical fix is whereLanguage: "sql" with =. - COMMON MISTAKES rule 14 rewritten to cover any-operation, with db.system:mongodb as the equality example. - Rule 15 added for the fuzzy-substring trap on top-level enum-like columns. - Rule 16 added for the empty-string trap on map-attribute groupBy: ClickHouse map columns return '' for unset keys, so groupBy on a sparse attribute produces an empty bucket alongside real values. Workaround: where: "SpanAttributes['<key>'] != ''" alongside the groupBy. Comes from Claude's MongoDB dashboard work where the empty collection-name bucket was the loudest visual noise. - Renumbered the onClick mistakes from 15-18 to 17-20 to make room. Tests extended to assert: workflow has six numbered steps and names the read-existing and group-into-containers steps explicitly; rule 2 carries the Server Requests example; rule 10 carries the kpis/trends/ errors shape; LUCENE FILTER SYNTAX mentions both wildcard and equality gotchas plus the FUZZY-MATCH NOTE; COMMON MISTAKES has dedicated rules for the map-attr-any-operation gotcha, the enum-like fuzzy-match trap, and the empty-string map-attr trap. All 6 example patterns still save and query cleanly end-to-end (48 tiles checked through hyperdx_query_tile).
Adds a new rule 3 GROUP BY HAS NO ALIAS HOOK that calls out the schema limit on the v2 chart config: groupBy is a single expression string, so the renderer uses it verbatim as the table column header and as the raw column name in CSV export, tooltips, orderBy, and onClick template references. A table grouped by SpanAttributes['http.route'] renders arrayElement(SpanAttributes, 'http.route') as the header. The rule pushes the model toward a top-level column when one carries the same semantic (SpanName / ServiceName / SeverityText) and falls back to a raw SQL tile with AS alias when only a Map attribute is available. Second-pass dashboards consistently produced ugly Map-expression headers on App-Performance tables even though every alias rule on the select side was already in the checklist; the gap was the missing acknowledgement that groupBy has no alias hook. Also fixes a pre-existing assertion in dashboards.test.ts left over from the rewrite: the test expected the phrase "not currently exposed via the MCP" which is no longer in the PER-TILE TYPE CONSTRAINTS section. Anchors on the surviving "Authoring builder tiles on a metric source is not reliable" and "MCP select-item shape does not carry" phrasing instead.
Drew's #2273 shipped `mcpDashboardFilterSchema` + `mcpFiltersParam` with explicit id-management semantics in the description (telling the LLM exactly how to handle filter ids on create vs update). The pre-rebase version of this branch defined `mcpFiltersParam` as `z.array(externalDashboardFilterSchemaWithId.partial({ id: true }))` instead, which loses Drew's id description. Aligning back to main's design keeps the richer guidance the LLM benefits from and avoids overriding a schema another PR shipped upstream four days earlier. The wider `(ExternalDashboardFilter | ExternalDashboardFilterWithId)[]` input type on saveDashboard.ts still ships, so REST/MCP parity is unaffected. `externalDashboardFilterSchemaWithId` is no longer used in schemas.ts; the import is dropped.
Five em-dashes (U+2014) in the dashboard authoring prompt content surface to the LLM and to human reviewers. Per project voice rules, replace with colon / period equivalents that read the same to readers and to LLMs. Affected: - filter-id self-describing rule (two occurrences) - pattern-mining query_guide bullet list (where / sampleSize / bodyExpression)
Deep-review (P2 #1, #2264 comment on 2026-05-12): the rewritten mcpOnClickDashboardSchema.filters description dropped the warning that an onClick filter whose expression is not declared on the destination dashboard is dropped at click time. Restore one sentence on the onClick schema description and the matching wording in the dashboards prompt content. Also drop three em-dashes from mcpDashboardFilterSchema descriptions introduced by 9444c2c; project voice rules ban em-dashes in reviewer-facing prose.
94bdfb2 to
3c7a316
Compare
|
Yes, ready for review. Rebased on What I did in this pass:
What's NOT in this pass, deliberately, since it's an unblock pass and not a full sweep:
If you'd rather I roll the (Edited: rewrote the deferred-items list to drop |
| .optional() | ||
| .describe( | ||
| 'Display formatting for bucket values. Example: { output: "time", factor: 0.000000001 } ' + | ||
| 'Display formatting for bucket values. Example: { output: "duration", factor: 0.000000001 } ' + |
There was a problem hiding this comment.
Wouldn't time be better here?
There was a problem hiding this comment.
Kept output: "duration" intentionally. The heatmap chart at packages/app/src/components/DBHeatmapChart.tsx:1091 branches on numberFormat?.output === 'duration' and calls formatDurationMsCompact(value * factor * 1000), which renders bucket values as "1.2s", "45ms", "3µs" (trace-waterfall style). output: "time" falls through to numbro's time formatter which is clock-style HH:MM:SS, wrong for sub-second latencies. The design checklist in the same prompt (rule 8) also prescribes { output: "duration", factor: 0.000000001 } for heatmaps, so the schema example now lines up with that.
| }); | ||
| }); | ||
|
|
||
| it('should reject a filter whose sourceId does not exist', async () => { |
There was a problem hiding this comment.
We probably already have a test for this, if not, it should be a generic sourceId validation test not specific to filters
There was a problem hiding this comment.
Good call, dropped in b8115f9. The tile-level missing-source test at line 295 exercises the same getMissingSources(sources, tilesWithId, filters) codepath in saveDashboard.ts:217, so the filter-specific variant was duplicate coverage.
| expect(text).toContain(missingSourceId); | ||
| }); | ||
|
|
||
| it('should round-trip a dashboard with no filters (backward compat)', async () => { |
There was a problem hiding this comment.
No need for this - other tests should cover this. No backward compat needed yet
There was a problem hiding this comment.
Dropped in b8115f9. Existing create/update tests in the same file save dashboards without a filters field and implicitly cover the empty-array default.
brandon-pereira
left a comment
There was a problem hiding this comment.
overall, lgtm. a couple nits
Two of the four filter tests Brandon flagged as redundant on #2264: - The filter-specific missing-source test duplicated the existing tile-level test at line 295. saveDashboard's getMissingSources(sources, tilesWithId, filters) walks both in one pass, so a tile-level miss exercises the same codepath. - The no-filters round-trip was framed as a backward-compat guard, but the create/update suite already saves dashboards without a filters field and implicitly covers the empty-array default.
Summary
Drew shipped the schemas in #2273 (dashboard-level filters, per-tile onClick, drill-down navigation). This PR ships the prompts and examples that teach an MCP-using agent how to use those primitives correctly. Without it, an LLM building a HyperDX dashboard via MCP has the API but doesn't know when or how to call it.
What lands
create_dashboarddesign checklist. A ten-rule scannable preamble at the top of the prompt covers RED columns with aliases, per-seriesnumberFormatfor durations,groupByColumnsOnLeftfor inventory tables, dashboard-level filters instead of per-tilewhereliterals, one-metric-per-tile for metric sources, and containers and tabs for grouping. Replaces the wall-of-JSON canonical example in the old prompt so the model picks the right pattern in one read.Four verified
dashboard_examplespatterns.service_inventory,service_detail,log_analytics,backend_dependencies. Each carries a "When to use" header and a "Why this shape" note. Each was built and rendered on a live dev stack with realistic synthetic data (90K traces across 8 services, 17K logs with severity distribution) before landing. That verification surfaced two design issues now captured as explicit rules: chart-levelnumberFormaton a table mixing counts and durations formats every value as a duration (per-seriesnumberFormatis the fix), and metric tiles take exactly one select item because the renderer destructuresselect[0]and ignores the rest (multi-metric authoring needs one tile per metric).query_guideadditions. New sections forDASHBOARD FILTERS(canonical{ type, name, expression, sourceId, where?, whereLanguage? }shape),NUMBER FORMAT(per-series vs chart-level distinction), andPER-TILE TYPE CONSTRAINTS(the metric one-select rule).hyperdx_save_dashboardfilters parameter. Widened the tool's input handling to accept bothExternalDashboardFilterandExternalDashboardFilterWithIdshapes via the same body schemas the v2 REST handler already uses (createDashboardBodySchema,updateDashboardBodySchema). MCP and REST surfaces stay in lockstep; the existingconvertExternalFiltersToInternalhelper handles the conversion without a translation layer.service_inventoryrow-click wiring. Concrete worked example of how to compose Drew's onClick primitive: the Services table in theservice_inventoryexample carries anonClickthat drills into the partner Service Detail dashboard. Usesmode: "template"with the constant template "Service Detail" so the destination resolves by name at click time and there's no order-of-save dependency between the two dashboards.Post-Claude-run prompt refinements. Three follow-up commits capture bugs the v1 prompts surfaced when Claude actually authored dashboards against them. The groupBy alias gap (LLM kept emitting
SELECT toStartOfMinute(t) AS timewithout aliasing the groupBy column). A broader lucene-gotcha note. Inlined concrete examples for the patterns that were ambiguous on first pass.Voice pass. Em-dashes dropped from
buildSourceSummaryand from the dashboard authoring prompt content. Snapshot test guards regressions.Co-authorship and rebase
The branch composes Drew's MCP
onClickschemas and prompts commit (78e72652) on top of his already-merged #2273. Drew authored that commit; theservice_inventoryrow-click wiring on top is the concrete demonstration of the foundation he built. Drew's commit is preserved in the chain to keep attribution intact. Drew'schore: Remove IS_DASHBOARD_LINKING_ENABLED toggledropped during rebase as already-upstream via #2273.Schema design note
mcpDashboardFilterSchemaandmcpFiltersParamuse the inline-schema design from main (the version Drew's #2273 shipped). An earlier version of this branch definedmcpFiltersParamasz.array(externalDashboardFilterSchemaWithId.partial({ id: true }))to reuse the canonical REST schema, but that lost Drew's id-management description (concrete LLM guidance on filter-id handling for create vs update flows). After rebase, the refactor commit aligned back to main's design to preserve those semantics. The wider input type onsaveDashboard.tsstill ships, so REST/MCP type parity is unaffected.Test plan
yarn workspace @hyperdx/api tsc --noEmitcleanyarn workspace @hyperdx/api lintclean (prettier + eslint + tsc + openapi lint)prose-lint.pyclean on every changed filesourceIdreturns 4xxfiltersreturnsfilters: []