Skip to content

feat: emit Lucene filters for KV items direct_read optimization#2301

Merged
kodiakhq[bot] merged 12 commits into
mainfrom
aaron/filters-direct-read
May 19, 2026
Merged

feat: emit Lucene filters for KV items direct_read optimization#2301
kodiakhq[bot] merged 12 commits into
mainfrom
aaron/filters-direct-read

Conversation

@knudtty
Copy link
Copy Markdown
Contributor

@knudtty knudtty commented May 18, 2026

Summary

Sidebar and dashboard filters now emit Lucene conditions instead of raw SQL, enabling the KV items direct_read optimization on Map columns (e.g. LogAttributes, ResourceAttributes). This routes filter equality checks through the Lucene serializer's eq() path which rewrites to has(KvItemsColumn, concat(key, sep, value)) when a text(tokenizer=array) index exists.

  • filtersToQuery emits type:'lucene' for all filters (range stays SQL)
  • parseLuceneFilter uses the existing Lucene parser + decodeSpecialTokens
  • Removed stringifyKeys option (Lucene serializer handles JSON toString)
  • Added AGENTS.md guidance on reusing existing utility functions

Steps:

  1. yarn dev
  2. Add the following to your table:
`ResourceAttributeItems` Array(String) MATERIALIZED arrayMap(x -> concat(x .1, '=', x.2), CAST(ResourceAttributes, 'Array(Tuple(String, String))')) CODEC (ZSTD (1)),
  `LogAttributeItems` Array(String) MATERIALIZED arrayMap(x -> concat(x.1, '=', x.2), CAST(LogAttributes, 'Array(Tuple(String, String))')) CODEC(ZSTD(1)),
  `ScopeAttributeItems` Array(String) MATERIALIZED arrayMap(x -> concat(x.1, '=', x.2), CAST(ScopeAttributes, 'Array(Tuple(String, String))')) CODEC(ZSTD(1)),
  INDEX idx_res_attr_items ResourceAttributeItems TYPE text(tokenizer = 'array'),
  INDEX idx_scope_attr_items ScopeAttributeItems TYPE text(tokenizer = 'array'),
  INDEX idx_log_attr_items LogAttributeItems TYPE text(tokenizer = 'array'),
  1. Apply a filter on a map attribute
  2. Verify the generated SQL has the optimization applied (You should see has(ResourceAttributeItems, concat('key', '=', 'value')

References

  • Linear Issue: Closes HDX-4271

Sidebar and dashboard filters now emit Lucene conditions instead of raw
SQL, enabling the KV items direct_read optimization on Map columns
(e.g. LogAttributes, ResourceAttributes). This routes filter equality
checks through the Lucene serializer's eq() path which rewrites to
has(KvItemsColumn, concat(key, sep, value)) when a text(tokenizer=array)
index exists.

- filtersToQuery emits type:'lucene' for all filters (range stays SQL)
- parseLuceneFilter uses the existing Lucene parser + decodeSpecialTokens
- Removed stringifyKeys option (Lucene serializer handles JSON toString)
- Added AGENTS.md guidance on reusing existing utility functions
@vercel
Copy link
Copy Markdown

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 19, 2026 8:57pm

Request Review

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 18, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

  • Diff size: 338 production lines changed (Tier 2 max: < 250)
  • Cross-layer change: touches frontend (packages/app) + backend (packages/api) + shared utils (packages/common-utils)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats
  • Production files changed: 6
  • Production lines changed: 338 (+ 964 in test files, excluded from tier calculation)
  • Branch: aaron/filters-direct-read
  • Author: knudtty

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

PR Review

Solid change overall — clean refactor to route filter equality through the Lucene path so KV has(...) direct_read can kick in. Tests are thorough (round-trip + direct_read SQL assertions). A few items worth addressing:

  • ⚠️ useDashboardFilters.setFilterValue drops passthrough filters (packages/app/src/hooks/useDashboardFilters.tsx:37). It destructures only { filters } from parseQuery and returns filtersToQuery(filterValues), so any unparseable Lucene condition in the URL is silently lost the moment a user toggles a dashboard filter. The migration effect a few lines below does preserve them, and useSearchPageFilterState.updateFilterQuery does too — make this consistent: pull passthroughFilters out of parseQuery and append them to the return value.
  • ⚠️ coerceBooleanValue (packages/common-utils/src/filters.ts) lossily converts the literal strings "true"/"false" to booleans on Lucene round-trip. The test file documents this as a "known limitation," but it's a real behavior regression for users filtering on string-typed columns that legitimately contain those values (e.g., a status string column). At minimum, scope the coercion to fields known to be boolean, or rely on the column type from metadata rather than blanket-coercing.
  • ⚠️ The migration useEffect in both useSearchPageFilterState and useDashboardFilters mutates the URL on mount via onFilterChange / setFilterQueries. Depending on the underlying nuqs history option, this can push an extra entry into browser history (an immediate back-button no-op for the user). Consider using history: 'replace' semantics for the migration write, or verify the existing setter already replaces.
  • ℹ️ PR description says "range stays SQL" but filters.ts actually emits Lucene range syntax field:[min TO max] — update the PR body so reviewers/changelog readers aren't misled.
  • ℹ️ The new parseLuceneFilter accepts a numeric range only when both bounds parse as floats; non-numeric range bounds (e.g. dates as ISO strings) are silently dropped without going to passthroughFilters. Probably fine since the writer side only emits numbers, but worth a comment so future callers don't expect symmetric round-tripping.

Nothing here is a merge blocker beyond the first two items, which I'd address before shipping since both can silently corrupt user-saved URL state.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

Deep Review

Scope: 13 files changed, ~990 net additions. Sidebar/dashboard filters now emit Lucene conditions instead of SQL IN clauses; legacy SQL filters auto-migrate on load. Touches packages/common-utils/src/filters.ts, packages/app/src/searchFilters.tsx, packages/app/src/hooks/useDashboardFilters.tsx, packages/app/src/ChartUtils.tsx, and widens an external-API Zod schema.

🔴 P0/P1 — must fix

  • packages/app/src/hooks/useDashboardFilters.tsx:88-94 — Dashboard migration fires whenever any type:'sql' filter is in the URL but does not check that parseQuery actually extracted anything. A SQL filter that extractInClauses skips (any condition with =, <, >, LIKE, IS NULL, or top-level OR) yields {filters:{}, passthroughFilters:[]}, so setFilterQueries([...filtersToQuery({}), ...[]]) overwrites the URL with [] and the filter is destroyed on page load. The matching guard in useSearchPageFilterState (Object.keys(parsedQuery.filters).length > 0) is missing here.

    • Fix: Gate the migration on Object.keys(parsed).length > 0 || passthroughFilters.length > 0, and route any SQL filter that produced zero parsed clauses into passthroughFilters from parseQuery itself so the data is preserved end-to-end.
    • correctness, adversarial, reliability, api-contract
  • packages/app/src/searchFilters.tsx:337,427-436parseQuery's SQL branch only collects IN/NOT IN/BETWEEN clauses; anything else is silently dropped and never appended to passthroughFilters (which only catches Lucene parse failures at line 318-322). The instant the user toggles any sidebar facet, updateFilterQuery re-emits [...filtersToQuery(newFilters), ...parsedQuery.passthroughFilters] and the unparseable SQL filter disappears. Same root cause as the dashboard variant above; affects any savedFilterValues (now newly accepting type:'sql' from external API per zod.ts:148-152) whose condition uses an operator outside the supported set.

    • Fix: When a type:'sql' filter produces zero extracted IN/BETWEEN clauses, push the original filter into passthroughFilters so it round-trips through updateFilterQuery intact.
    • correctness, adversarial, reliability

🟡 P2 — recommended

  • packages/app/src/hooks/useDashboardFilters.tsx:24-41setFilterValue calls parseQuery(prev ?? []) and destructures only filters, dropping passthroughFilters on every facet update. The search hook (searchFilters.tsx:431-434) preserves them; the dashboard hook does not. Any unparseable Lucene filter that survived initial parse is lost the moment the user clicks a checkbox.

    • Fix: Destructure and append: return [...filtersToQuery(filterValues), ...passthroughFilters];.
    • correctness
  • packages/common-utils/src/filters.ts:74-97collectFromAst walks both branches of a BinaryAST without preserving the operator. A Lucene filter like service:"a" OR level:"b" is collected as two separate fields, then re-emitted by filtersToQuery as two Filter[] entries that the downstream serializer AND-joins. Saved Lucene filters with cross-field OR semantics silently flip to AND on the next interaction. Same-field OR survives because terms collapse into one included set.

    • Fix: When collectFromAst encounters an OR BinaryAST whose branches cover different fields, route the original condition through passthroughFilters instead of decomposing it.
    • correctness
  • packages/common-utils/src/filters.ts:99-105coerceBooleanValue unconditionally converts the strings "true"/"false" to booleans. For a String-typed ClickHouse column whose values literally include those strings (e.g. LogAttributes['enabled']='true'), FilterState ends up with Set{true} while the option list and Set.has(option.value) comparison still use strings, so the checkbox renders unchecked despite the filter being active and the user can no longer toggle it off cleanly. The test at __tests__/filters.test.ts:359-371 documents this as a "known limitation," but the UI side-effect has no coverage.

    • Fix: Don't coerce in parseLuceneFilter; let callers carry the raw string and only convert at the column boundary using column-type metadata, or treat boolean as a separate filter kind end-to-end.
    • correctness, kieran-typescript
  • packages/api/src/routers/external-api/v2/dashboards.ts:93-100 — Hand-written OpenAPI JSDoc still declares SavedFilterValue.type as enum: [sql] with description: "Currently only \"sql\" is supported" and the condition field as SQL filter condition. ... "column IN ('value')". The Zod schema (packages/api/src/utils/zod.ts:148-152) now accepts ['sql', 'lucene']. Generated SDKs and spec-driven clients won't know 'lucene' is valid.

    • Fix: Update the OpenAPI block to enum: [sql, lucene], broaden the description, and add a lucene example (e.g. ServiceName:"hdx-oss-dev-api").
    • api-contract
  • packages/app/src/ChartUtils.tsx:929-943buildEventsSearchUrl's group-filter path was rewritten from SqlString.escape SQL IN clauses to filtersToQuery-emitted Lucene. There is no unit test in packages/app/src/__tests__/ChartUtils.test.ts exercising this function — the only coverage is the e2e dashboard-table-linking.spec.ts, which exercises table-row clicks but not the multi-column groupFilters aggregation path or the value == null guard branch.

    • Fix: Add unit tests covering single-column, multi-column, and null-value group-filter cases against the new Lucene output.
    • testing
🔵 P3 nitpicks (10)
  • packages/common-utils/src/core/linkUrlBuilder.ts:40-46 — JSDoc for renderFilterTemplates still says "Render the filter entries into {expression} IN (v1, v2, ...) SQL conditions" but the function now emits Lucene via filtersToQuery.
    • Fix: Rewrite the JSDoc to describe Lucene output.
    • maintainability
  • packages/app/src/searchFilters.tsx:412-425, packages/app/src/hooks/useDashboardFilters.tsx:84-95 — Two near-identical SQL→Lucene migration effects with the divergent length > 0 guard noted above.
    • Fix: Extract a shared migrateSqlFilters(filters) helper in packages/common-utils/src/filters.ts so the guard logic lives in one place.
    • maintainability
  • packages/common-utils/src/filters.ts:37, packages/app/src/searchFilters.tsx:447,494,511, packages/app/src/hooks/useDashboardFilters.tsx:27,55parseKeyPath(x).join('.') is repeated six times across three files with inline comments explaining the same invariant.
    • Fix: Export a single normalizeFilterKey(key) helper from filters.ts and replace the inline calls.
    • maintainability
  • packages/app/src/searchFilters.tsx:419hasMigratedRef.current = true is set before the synchronous onFilterChange call; a thrown callback leaves the ref set and migration permanently skipped.
    • Fix: Move the assignment after onFilterChange returns, or wrap in try/finally that resets on error.
    • correctness, reliability
  • packages/app/src/searchFilters.tsx:316-337parseQuery handles 'lucene' and 'sql' explicitly but silently drops 'sql_ast' filters (no current upstream caller, but the Filter union allows them).
    • Fix: Add an explicit if (filter.type === 'sql_ast') { passthroughFilters.push(filter); continue; } branch.
    • kieran-typescript
  • packages/common-utils/src/filters.ts:84-90 — Range bounds parsed via parseFloat accept "2024-01-01" as 2024, silently flattening date-string ranges captured from migrated SQL BETWEEN to numeric.
    • Fix: Require strict numeric parsing (String(parseFloat(x)) === x.trim()) before accepting a range, otherwise route to passthrough.
    • correctness
  • packages/common-utils/src/filters.ts:79-83 — Implicit-field Lucene terms (bare quoted strings with no field:) get field = '<implicit>' from the parser and are re-emitted as a filter against a column literally named <implicit>.
    • Fix: Treat <implicit> as unparseable and route the original filter to passthrough.
    • correctness
  • packages/app/src/__tests__/searchFilters.test.ts:1065,1085,1108 — Three sites cast condition as { condition: string } instead of using the 'condition' in f discriminated-union narrowing already in filters.test.ts:5-7.
    • Fix: Replace the casts with the existing getCondition helper or 'condition' in condition guard.
    • kieran-typescript, project-standards
  • .changeset/filters-direct-read.md — Lists @hyperdx/common-utils and @hyperdx/app but not @hyperdx/api, even though packages/api/src/utils/zod.ts was widened.
    • Fix: Add an entry for @hyperdx/api (the fixed group with app may bump it implicitly, but explicit listing avoids ambiguity).
    • project-standards
  • packages/common-utils/src/filters.ts:141-145, packages/app/src/searchFilters.tsx:324-328 — Both getEntry patterns use byField.set(...) followed by byField.get(field)! with a non-null assertion.
    • Fix: Refactor to the let entry = map.get(k); if (!entry) { entry = {...}; map.set(k, entry); } return entry; idiom.
    • kieran-typescript

Reviewers (11): correctness, testing, maintainability, adversarial, kieran-typescript, api-contract, performance, reliability, project-standards, agent-native, learnings-researcher.

Testing gaps:

  • No unit test for ChartUtils.tsx buildEventsSearchUrl group-filter Lucene emission (only e2e covers it for table-row clicks).
  • No migration test for SQL filters using =/</>/OR/LIKE — the data-loss path is unprotected by regression.
  • No round-trip test for cross-field Lucene OR semantics.
  • No test that a String column with a literal "true"/"false" option round-trips through setFilterValue without breaking the checkbox selected-state comparison.
  • No test for parseQuery with a sql_ast-typed filter or with setFilterRange/clearFilter against a bracket-notation key.
  • No integration test verifying the emitted has(LogAttributeItems, concat(...)) SQL actually hits the text(tokenizer='array') skip index against a real ClickHouse instance.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

E2E Test Results

All tests passed • 178 passed • 3 skipped • 1226s

Status Count
✅ Passed 178
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 18, 2026

🦋 Changeset detected

Latest commit: 0715ffb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@hyperdx/common-utils Minor
@hyperdx/app Minor
@hyperdx/api Minor
@hyperdx/otel-collector Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@knudtty knudtty requested review from a team and bot-hyperdx and removed request for a team May 19, 2026 19:56
@knudtty knudtty requested review from a team and dhable and removed request for a team and bot-hyperdx May 19, 2026 19:56
@kodiakhq kodiakhq Bot merged commit 1df7583 into main May 19, 2026
19 checks passed
@kodiakhq kodiakhq Bot deleted the aaron/filters-direct-read branch May 19, 2026 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-3 Standard — full human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants