feat: deterministic sampling with adaptive sample size by alex-fedotyev · Pull Request #1849 · hyperdxio/hyperdx

alex-fedotyev · 2026-03-05T00:26:02Z

Summary

Replaces non-deterministic ORDER BY rand() with deterministic cityHash64(SpanId) sampling and introduces sampling configuration constants.

What this PR does

Deterministic sampling: ORDER BY cityHash64(SpanId) instead of rand() — same data always produces the same sample, so results are stable across re-renders
Named constants: SAMPLE_SIZE, STABLE_SAMPLE_EXPR replace hardcoded 1000 and 'rand()' in query configs
Adaptive sizing foundation: computeEffectiveSampleSize() function with MIN_SAMPLE_SIZE/MAX_SAMPLE_SIZE/SAMPLE_RATIO constants, exported and tested (6 unit tests)
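The adaptive sizing function is described only by its constants and a "clamp 500-5000" note, so the following is a minimal sketch of how such a function could behave. The constant values come from the diff; the proportional formula, the rounding, and the fallback for a missing count are assumptions, not the PR's confirmed implementation.

```typescript
// Constant values as shown in the PR diff.
const MIN_SAMPLE_SIZE = 500;
const MAX_SAMPLE_SIZE = 5000;
const SAMPLE_RATIO = 0.01;
const SAMPLE_SIZE = 1000; // static size the queries currently use

// Hypothetical sketch: sample SAMPLE_RATIO of the rows, clamped to [MIN, MAX].
// The real computeEffectiveSampleSize() may use a different formula.
function computeEffectiveSampleSize(totalCount: number): number {
  if (!Number.isFinite(totalCount) || totalCount <= 0) {
    // Assumed fallback when no usable count is available.
    return SAMPLE_SIZE;
  }
  const proportional = Math.round(totalCount * SAMPLE_RATIO);
  return Math.min(MAX_SAMPLE_SIZE, Math.max(MIN_SAMPLE_SIZE, proportional));
}

console.log(computeEffectiveSampleSize(10_000)); // 500 (raised to the minimum)
console.log(computeEffectiveSampleSize(200_000)); // 2000 (1% of the rows)
console.log(computeEffectiveSampleSize(10_000_000)); // 5000 (capped at the maximum)
```

Under these assumptions the sample grows with the data but stays within bounds that keep the query cheap, which is why a lightweight count() query is needed before it can be wired in.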

What this PR does NOT do (follow-up)

Count query for adaptive sizing: Wiring computeEffectiveSampleSize into the actual queries requires adding a lightweight count() query. This is deferred to keep this PR focused on the deterministic sampling change.
Dynamic column detection: STABLE_SAMPLE_EXPR uses SpanId which is trace-specific. Event Deltas currently only renders on the traces search page where SpanId is always present. If the feature expands to logs/metrics, this should be parameterized per source (documented in code comment).

Test plan

Same data + same hover always highlights the same heatmap cells (deterministic)
Run npx jest src/components/__tests__/deltaChartSampling.test.ts — 6 tests pass

🤖 Generated with Claude Code

vercel · 2026-03-05T00:26:07Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	Mar 5, 2026 4:24pm

changeset-bot · 2026-03-05T00:26:07Z

🦋 Changeset detected

Latest commit: 980ab65

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@hyperdx/app	Patch
@hyperdx/api	Patch
@hyperdx/otel-collector	Patch

github-actions · 2026-03-05T00:26:19Z

Claude finished @alex-fedotyev's task in 58s

Code Review

⚠️ Dead code / incomplete wiring — STABLE_SAMPLE_EXPR, computeEffectiveSampleSize, and all sampling constants in deltaChartUtils.ts are only referenced in the test file. They're never imported into DBDeltaChart.tsx or any query builder, so the deterministic sampling and adaptive sizing described in the PR description aren't actually active yet. → Wire these into the actual query (e.g. the ORDER BY / LIMIT in the delta chart's ClickHouse query), or hold back until they're connected.
🔒 Hardcoded SpanId column — STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)' will silently produce wrong results (or a ClickHouse error) on log/metric sources that don't have a SpanId column. → Use the source's spanIdExpression dynamically, or pick a universally-present column (e.g. cityHash64(timestamp, RowID)) as the stable hash key.
⚠️ Unrelated changes in the same PR — The onAddFilter prop added to DBDeltaChart.tsx (lines 47–55, 221–229, 421, 458) has nothing to do with sampling. This mixes two separate features, making review and rollback harder. → Split into its own PR, or clearly call this out in the description.

github-actions · 2026-03-05T00:28:49Z

E2E Test Results

✅ All tests passed • 86 passed • 3 skipped • 941s

Status	Count
✅ Passed	86
❌ Failed	0
⚠️ Flaky	2
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

- Replace ORDER BY rand() with cityHash64(SpanId) for deterministic sampling
- Replace hardcoded LIMIT 1000 with SAMPLE_SIZE constant
- Add computeEffectiveSampleSize() for adaptive sizing (clamp 500-5000)
- Add STABLE_SAMPLE_EXPR, SAMPLE_RATIO constants
- 6 unit tests for computeEffectiveSampleSize

Closes #1827

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-05T01:04:14Z

Claude finished @alex-fedotyev's task in 39s

Code Review

❌ computeEffectiveSampleSize is imported but never called (DBDeltaChart.tsx:25) → All three queries still use the hardcoded SAMPLE_SIZE (1000) constant; the adaptive sizing is dead code. The count query described in the PR summary also doesn't exist. Either wire up the count query + call computeEffectiveSampleSize with its result, or remove the import and the function until it's ready.
🔒 STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)' hardcodes a trace-only column (deltaChartUtils.ts:306) → This will cause a ClickHouse error or silent wrong results on log/metric sources that don't have SpanId. Use a universally-present column (e.g., cityHash64(toString(timestamp))) or derive the expression dynamically from the source's schema.

…scope

- Remove dead import of computeEffectiveSampleSize (adaptive sizing via count query will be added in a follow-up)
- Add comment explaining STABLE_SAMPLE_EXPR uses SpanId because Event Deltas is currently trace-specific; should be parameterized if the feature expands to logs/metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-05T01:08:05Z

Claude finished @alex-fedotyev's task in 37s

Code Review

❌ computeEffectiveSampleSize is never called: all three queries use the static SAMPLE_SIZE = 1000. The PR description promises "adaptive sample sizing", but it is not wired up. Either call it with a count query result or remove it until ready.
⚠️ STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)' is trace-only (deltaChartUtils.ts:309) — will cause a ClickHouse error on log/metric sources without a SpanId column. Use a universally-present column like cityHash64(toString(timestamp)), or accept the current scope limitation with a clear code comment (one already exists, which is good, but the risk is real if the component is ever reused).

alex-fedotyev · 2026-03-05T01:13:30Z

@claude - repeat code review against the updated PR description and give aligned feedback

github-actions · 2026-03-05T01:14:45Z

Claude finished @alex-fedotyev's task in 1m 17s

Code Review (aligned with PR description)

Read changed files
Post aligned review

The latest commit (70023920) cleaned up the previously flagged unused import of computeEffectiveSampleSize. Here's a review aligned with what the PR actually claims to do and explicitly defers:

What this PR does well:

All 3 queries in DBDeltaChart.tsx correctly swap ORDER BY rand() → ORDER BY cityHash64(SpanId) DESC and 1000 → SAMPLE_SIZE — the core deterministic sampling is live and working.
Named constants replace magic numbers throughout, making future adjustments trivial.
The SpanId scope limitation is clearly documented inline (deltaChartUtils.ts:298–304) — no silent failure risk for the current use case (traces-only page).
computeEffectiveSampleSize is exported, tested (6 passing tests), and not imported in production code — this correctly matches the PR's stated intent to defer the count query wiring.

One genuine concern not previously raised:

⚠️ ORDER BY … DESC LIMIT is a biased top-N, not a statistical sample (DBDeltaChart.tsx:141, 196, 205) — ORDER BY cityHash64(SpanId) DESC LIMIT 1000 always returns the same 1000 rows: those with the largest hash values. For the stated goal of stable hover highlights across re-renders, this is exactly right. But it's worth knowing this isn't a statistically representative sample — if the data is large enough that 1000 rows is a meaningful subset, the selected rows are always a fixed population (highest-hash spans), not a random draw. This could matter if, e.g., certain span types cluster in hash space. For the current heatmap use case it's acceptable, but worth a brief comment like // always selects the same top-N rows by hash — biased but stable across renders.
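To make the "stable but not random" point concrete, here is a small sketch that mimics ORDER BY hash(id) DESC LIMIT n in plain TypeScript. FNV-1a is used purely as a stand-in for ClickHouse's cityHash64 (an assumption for illustration; the two produce different values), and the span IDs are made up.

```typescript
// FNV-1a (32-bit), a stand-in for cityHash64. Only determinism matters here.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Rough equivalent of: ORDER BY hash(id) DESC LIMIT n
function topNByHash(ids: string[], n: number): string[] {
  return [...ids]
    .sort((a, b) => fnv1a(b) - fnv1a(a) || (a < b ? -1 : a > b ? 1 : 0))
    .slice(0, n);
}

// Hypothetical span IDs, queried twice with the rows arriving in different orders.
const spans = Array.from({ length: 20 }, (_, i) => `span-${i}`);
const first = topNByHash(spans, 5);
const second = topNByHash([...spans].reverse(), 5);

// The same five IDs are selected both times, in the same order.
console.log(first.join(',') === second.join(',')); // true
```

The selection depends only on the IDs, never on a random draw, so repeated renders highlight the same rows. The flip side is the bias noted above: rows with low hash values are never selected while the data stays the same.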

Previous review concerns vs. current state:

Concern	Status
`computeEffectiveSampleSize` imported but unused	✅ Fixed in latest commit
Adaptive sizing not wired	✅ Explicitly deferred — PR description is clear
`STABLE_SAMPLE_EXPR` / `SpanId` scope	✅ Documented with code comment; scoped risk is acceptable
Unrelated `onAddFilter` changes	Not present in current diff — may have been on an older version

Overall the implementation matches the PR description accurately. The only call-out is the "biased sample" nuance, which is a minor documentation gap rather than a bug.

github-actions · 2026-03-05T15:19:29Z

Claude finished @alex-fedotyev's task in 22s

Code Review

✅ No critical issues found.

The implementation matches the PR description accurately:

ORDER BY cityHash64(SpanId) DESC LIMIT 1000 is correctly wired into all 3 queries
computeEffectiveSampleSize is exported/tested but intentionally not called yet (deferred per PR description)
SpanId scope limitation is documented inline

One minor note: ORDER BY cityHash64(SpanId) DESC LIMIT 1000 is a deterministic top-N (always the same 1000 rows), not a statistical sample. This is correct for stable hover highlights but worth knowing if representative sampling is ever needed.

pulpdrew · 2026-03-05T15:55:07Z

packages/app/src/components/deltaChartUtils.ts

+export const MIN_SAMPLE_SIZE = 500;
+export const MAX_SAMPLE_SIZE = 5000;
+export const SAMPLE_RATIO = 0.01;
+export const STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)';


This should refer to the source's spanIdExpression - it may not always be SpanId. If the user has a custom schema, I suspect this would break.

alex-fedotyev · 2026-03-05

Good catch, fixed in 980ab65. Replaced the hardcoded STABLE_SAMPLE_EXPR constant with getStableSampleExpression(spanIdExpression?), which accepts the source's spanIdExpression (passed from DBSearchHeatmapChart, which already has the source) and falls back to rand() when it is not available.
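As a rough illustration of the helper described in this reply, a sketch might look like the following. Only the function name, its optional parameter, and the rand() fallback come from the thread; the whitespace trimming and the buildSampleClause helper are hypothetical additions for demonstration.

```typescript
// Hypothetical sketch; the implementation merged in 980ab65 may differ.
function getStableSampleExpression(spanIdExpression?: string): string {
  const expr = spanIdExpression?.trim(); // trimming is an assumption
  // Without a span ID expression there is nothing stable to hash,
  // so fall back to non-deterministic sampling.
  return expr ? `cityHash64(${expr})` : 'rand()';
}

// Illustrative only: how the expression could slot into the sampling clause.
function buildSampleClause(
  spanIdExpression: string | undefined,
  limit: number,
): string {
  const orderExpr = getStableSampleExpression(spanIdExpression);
  return `ORDER BY ${orderExpr} DESC LIMIT ${limit}`;
}

console.log(buildSampleClause('SpanId', 1000));
// ORDER BY cityHash64(SpanId) DESC LIMIT 1000
console.log(buildSampleClause(undefined, 1000));
// ORDER BY rand() DESC LIMIT 1000
```

Deriving the expression from the source keeps custom schemas working, at the cost of losing determinism whenever no span ID expression is configured.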

Address review feedback from @pulpdrew:

- Replace hardcoded STABLE_SAMPLE_EXPR constant with getStableSampleExpression() that accepts the source's spanIdExpression, falling back to rand()
- Pass spanIdExpression from source through DBSearchHeatmapChart -> DBDeltaChart
- Apply JSDoc per-constant style for better editor hover docs
- Add 4 tests for getStableSampleExpression (10 total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-05T16:22:08Z

Claude finished @alex-fedotyev's task in 38s

Code Review

✅ No critical issues. The implementation matches the PR description accurately.

getStableSampleExpression(spanIdExpression?) correctly uses the source's spanIdExpression (passed from DBSearchHeatmapChart.tsx:122) and falls back to rand() — the previously flagged hardcoded SpanId concern is resolved.
All 3 queries in DBDeltaChart.tsx use stableSampleExpr and SAMPLE_SIZE — deterministic sampling is wired and active.
computeEffectiveSampleSize is exported + tested but intentionally not called — matches the PR's stated deferral.
JSDoc comments are now on each constant as suggested by the reviewer.

One minor note (not blocking): ORDER BY cityHash64(spanId) DESC LIMIT 1000 is a deterministic top-N, not a statistically representative sample — it always selects the same highest-hash rows. This is the correct behavior for stable hover highlights, but worth a brief inline comment if representative sampling is ever a future goal.

Resolve conflicts: keep both onAddFilter (filter-actions) and spanIdExpression (sampling, merged via #1849) props.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot had a problem deploying to Preview March 5, 2026 00:27 Failure

alex-fedotyev force-pushed the event-deltas/sampling branch from 841885d to ff88d90 Compare March 5, 2026 01:03

vercel bot deployed to Preview March 5, 2026 01:07 View deployment

alex-fedotyev self-assigned this Mar 5, 2026

vercel bot deployed to Preview March 5, 2026 01:11 View deployment

Merge branch 'main' into event-deltas/sampling

aa2102f

vercel bot deployed to Preview March 5, 2026 15:21 View deployment

alex-fedotyev marked this pull request as ready for review March 5, 2026 15:33

alex-fedotyev requested review from a team and pulpdrew and removed request for a team March 5, 2026 15:35

pulpdrew reviewed Mar 5, 2026

vercel bot deployed to Preview March 5, 2026 16:24 View deployment

pulpdrew approved these changes Mar 5, 2026

alex-fedotyev added the automerge label Mar 5, 2026

kodiakhq bot merged commit 68ef3d6 into main Mar 5, 2026
20 of 21 checks passed

kodiakhq bot deleted the event-deltas/sampling branch March 5, 2026 16:52
