feat: deterministic sampling with adaptive sample size#1849
feat: deterministic sampling with adaptive sample size#1849kodiakhq[bot] merged 4 commits intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🦋 Changeset detectedLatest commit: 980ab65 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Claude finished @alex-fedotyev's task in 58s —— View job Code Review
|
E2E Test Results✅ All tests passed • 86 passed • 3 skipped • 941s
Tests ran across 4 shards in parallel. |
- Replace ORDER BY rand() with cityHash64(SpanId) for deterministic sampling - Replace hardcoded LIMIT 1000 with SAMPLE_SIZE constant - Add computeEffectiveSampleSize() for adaptive sizing (clamp 500-5000) - Add STABLE_SAMPLE_EXPR, SAMPLE_RATIO constants - 6 unit tests for computeEffectiveSampleSize Closes #1827 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
841885d to
ff88d90
Compare
|
Claude finished @alex-fedotyev's task in 39s —— View job Code Review
|
…scope - Remove dead import of computeEffectiveSampleSize (adaptive sizing via count query will be added in a follow-up) - Add comment explaining STABLE_SAMPLE_EXPR uses SpanId because Event Deltas is currently trace-specific; should be parameterized if the feature expands to logs/metrics Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Claude finished @alex-fedotyev's task in 37s —— View job Code Review
|
|
@claude - repeat code review against the updated PR description and give aligned feedback |
|
Claude finished @alex-fedotyev's task in 1m 17s —— View job Code Review (aligned with PR description)
The latest commit ( What this PR does well:
One genuine concern not previously raised:
Previous review concerns vs. current state:
Overall the implementation matches the PR description accurately. The only call-out is the "biased sample" nuance, which is a minor documentation gap rather than a bug. |
|
Claude finished @alex-fedotyev's task in 22s —— View job Code Review✅ No critical issues found. The implementation matches the PR description accurately:
One minor note: |
| export const MIN_SAMPLE_SIZE = 500; | ||
| export const MAX_SAMPLE_SIZE = 5000; | ||
| export const SAMPLE_RATIO = 0.01; | ||
| export const STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)'; |
There was a problem hiding this comment.
This should refer to the source's spanIdExpression - it may not always be SpanId. If the user has a custom schema, I suspect this would break.
There was a problem hiding this comment.
Good catch — fixed in 980ab65. Replaced the hardcoded STABLE_SAMPLE_EXPR constant with getStableSampleExpression(spanIdExpression?) that accepts the source's spanIdExpression (passed from DBSearchHeatmapChart which already has the source). Falls back to rand() when not available.
Address review feedback from @pulpdrew: - Replace hardcoded STABLE_SAMPLE_EXPR constant with getStableSampleExpression() that accepts the source's spanIdExpression, falling back to rand() - Pass spanIdExpression from source through DBSearchHeatmapChart -> DBDeltaChart - Apply JSDoc per-constant style for better editor hover docs - Add 4 tests for getStableSampleExpression (10 total) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Claude finished @alex-fedotyev's task in 38s —— View job Code Review✅ No critical issues. The implementation matches the PR description accurately.
One minor note (not blocking): |
Resolve conflicts: keep both onAddFilter (filter-actions) and spanIdExpression (sampling, merged via #1849) props. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Closes #1827
Replaces non-deterministic
ORDER BY rand()with deterministiccityHash64(SpanId)sampling and introduces sampling configuration constants.What this PR does
ORDER BY cityHash64(SpanId)instead ofrand()— same data always produces the same sample, so results are stable across re-rendersSAMPLE_SIZE,STABLE_SAMPLE_EXPRreplace hardcoded1000and'rand()'in query configscomputeEffectiveSampleSize()function withMIN_SAMPLE_SIZE/MAX_SAMPLE_SIZE/SAMPLE_RATIOconstants, exported and tested (6 unit tests)What this PR does NOT do (follow-up)
computeEffectiveSampleSizeinto the actual queries requires adding a lightweightcount()query. This is deferred to keep this PR focused on the deterministic sampling change.STABLE_SAMPLE_EXPRusesSpanIdwhich is trace-specific. Event Deltas currently only renders on the traces search page whereSpanIdis always present. If the feature expands to logs/metrics, this should be parameterized per source (documented in code comment).Test plan
npx jest src/components/__tests__/deltaChartSampling.test.ts— 6 tests pass🤖 Generated with Claude Code