
feat: improved attribute sorting with entropy scoring#1854

Draft
alex-fedotyev wants to merge 4 commits into main from event-deltas/attribute-sorting

@alex-fedotyev
Contributor

@alex-fedotyev alex-fedotyev commented Mar 5, 2026

Summary

Closes #1826

Replaces the basic max-delta sorting for event delta attributes with smarter scoring algorithms:

  • Proportional comparison scoring (computeComparisonScore): Normalizes each group's percentages to sum to 100% before computing max delta, so fields with identical proportional distributions score 0 regardless of coverage rate differences. This makes the sorting resilient to different sample sizes between outlier and inlier groups.
  • Shannon entropy scoring (computeEntropyScore): Returns [0, 1] where 1 = maximally useful (low entropy, dominant value among several) and 0 = not useful (single value, empty, or perfectly uniform). Prepared for future use in distribution mode.
  • Semantic boost (semanticBoost): Provides a tiebreaker boost for well-known OTel attributes (e.g., service.name, http.status_code, error) to surface the most operationally relevant fields.
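
The proportional comparison described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual computeComparisonScore: the `ValueShares` shape, the helper names, and the treatment of an empty group (left as all zeros) are assumptions made for the example.

```typescript
// Hypothetical input shape: attribute value -> share (or count) of events
// in the group that carry that value. Not the PR's actual types.
type ValueShares = Record<string, number>;

// Rescale a group's values so they sum to 100. An empty or all-zero group
// stays empty (assumption: the PR may handle this case differently).
function normalizeTo100(group: ValueShares): ValueShares {
  let sum = 0;
  for (const key of Object.keys(group)) sum += group[key];
  const out: ValueShares = {};
  if (sum === 0) return out;
  for (const key of Object.keys(group)) out[key] = (group[key] / sum) * 100;
  return out;
}

// Largest absolute difference between the two normalized distributions.
function comparisonScoreSketch(outlier: ValueShares, inlier: ValueShares): number {
  const o = normalizeTo100(outlier);
  const i = normalizeTo100(inlier);
  let maxDelta = 0;
  for (const key of Object.keys(o)) {
    maxDelta = Math.max(maxDelta, Math.abs(o[key] - (i[key] ?? 0)));
  }
  for (const key of Object.keys(i)) {
    maxDelta = Math.max(maxDelta, Math.abs((o[key] ?? 0) - i[key]));
  }
  return maxDelta;
}

// Same 3:1 proportions, different coverage (values sum to 40 vs 80).
const sameShape = comparisonScoreSketch({ GET: 30, POST: 10 }, { GET: 60, POST: 20 });
// Different proportions: 90/10 vs 50/50.
const differentShape = comparisonScoreSketch({ GET: 90, POST: 10 }, { GET: 50, POST: 50 });
console.log(sameShape, differentShape);
```

In this sketch the first pair scores 0 because both groups have the same proportions once rescaled, while the second pair scores about 40, so it would sort higher.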

Currently uses computeComparisonScore for sorting. Entropy scoring will be integrated when #1824 (always-on distribution mode) merges.
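
Until entropy scoring is wired in, one formula that matches the behavior described for computeEntropyScore is one minus the normalized Shannon entropy. The sketch below assumes that formula; the PR's actual implementation is not shown in this conversation, and the `ValueCounts` shape and function name are hypothetical.

```typescript
// Hypothetical input: attribute value -> occurrence count within one group.
type ValueCounts = Record<string, number>;

// Assumed formula: 1 - H / ln(n), where H is the Shannon entropy (in nats)
// of the value distribution and n is the number of distinct values.
// The ratio H / ln(n) is the same in any log base.
function entropyScoreSketch(counts: ValueCounts): number {
  const values = Object.keys(counts)
    .map((key) => counts[key])
    .filter((count) => count > 0);
  const total = values.reduce((sum, count) => sum + count, 0);
  // Empty or single-valued fields carry no signal.
  if (values.length <= 1 || total === 0) return 0;

  let entropy = 0;
  for (const count of values) {
    const p = count / total;
    entropy -= p * Math.log(p);
  }
  const maxEntropy = Math.log(values.length);
  // Clamp to guard against floating-point drift just outside [0, 1].
  return Math.min(1, Math.max(0, 1 - entropy / maxEntropy));
}

console.log(entropyScoreSketch({}));                             // empty field
console.log(entropyScoreSketch({ ok: 100 }));                    // single value
console.log(entropyScoreSketch({ a: 25, b: 25, c: 25, d: 25 })); // uniform
console.log(entropyScoreSketch({ a: 97, b: 1, c: 1, d: 1 }));    // dominant value
```

Under this assumption, empty, single-value, and uniform fields all land at or near 0, and a field with one dominant value among several scores close to 1.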

Test plan

  • 15 unit tests covering all three scoring functions pass
  • Verify attribute order improves in practice: fields with proportionally different distributions should rank higher than fields that differ only due to sample size
  • Confirm no regression in existing event delta chart behavior

🤖 Generated with Claude Code

…l comparison

Replace the basic max-delta sorting with proportional normalization
(computeComparisonScore) that normalizes group sizes before computing
deltas. Add Shannon entropy scoring (computeEntropyScore) for future
distribution mode, and semantic boost for well-known OTel attributes.

Closes #1826

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
hyperdx-oss | Ready | Preview, Comment | Mar 5, 2026 10:09pm


@changeset-bot

changeset-bot bot commented Mar 5, 2026

🦋 Changeset detected

Latest commit: 0aadc28

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/app Patch
@hyperdx/api Patch
@hyperdx/otel-collector Patch


@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Claude finished @alex-fedotyev's task in 50s


PR Review

✅ The core algorithms (computeComparisonScore, computeEntropyScore) are mathematically correct and well-tested.

Issues:

  • ⚠️ semanticBoost is dead code: it is defined, exported, and tested, but never imported or called in DBDeltaChart.tsx. The PR description says it "Provides a tiebreaker boost", but it is never applied to the sort score at DBDeltaChart.tsx:257. Either integrate it (e.g., sortScore + semanticBoost(key) * 0.1) or note that it is intentionally deferred like computeEntropyScore.

  • ⚠️ Raw delta fallback is scale-inconsistent — when one group is empty (outlierSum === 0 || inlierSum === 0), computeComparisonScore returns the raw count (e.g., 50 for a count of 50), while the normalized path returns values in [0, 100]. Fields with an empty inlier group but large outlier counts could be scored unexpectedly high or low relative to other fields. Consider normalizing to [0, 100] in this case too (e.g., return 100 if one group is fully absent).

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

E2E Test Results

All tests passed • 86 passed • 3 skipped • 919s

Status Count
✅ Passed 86
❌ Failed 0
⚠️ Flaky 2
⏭️ Skipped 3

Tests ran across 4 shards in parallel.


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Claude finished @alex-fedotyev's task in 31s


PR Review

  • Read changed files
  • Provide review feedback

Issues:

  • ⚠️ semanticBoost is dead code: it is exported and tested but never applied in DBDeltaChart.tsx:257. The PR description calls it a "tiebreaker boost", but sortScore is only computeComparisonScore(...) with no boost applied. Either integrate it (sortScore + semanticBoost(key) * 0.1) or explicitly mark it as deferred like computeEntropyScore.

  • ⚠️ computeComparisonScore empty-group fallback is scale-inconsistent — when one group is empty, it returns a raw count delta (e.g., 50 for 50 occurrences), while the normal path returns values in [0, 100]. A field with a large outlier count and empty inlier group will be scored orders of magnitude higher than fields scored on the normalized path. Fix: return 100 when one group sum is zero and the other is nonzero (or clamp to [0, 100]).
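
To make the scale mismatch concrete, here is a small sketch of the two fallbacks for the case where only one group has data. Both functions are hypothetical reconstructions from the review text, not code from the PR, and `PresentCounts` is an assumed shape.

```typescript
// Hypothetical input: attribute value -> raw occurrence count in the one
// group that has data (the other group is empty).
type PresentCounts = Record<string, number>;

// Raw fallback as the review describes it: the largest raw count is the
// score, so it grows with traffic volume and is not bounded by 100.
function rawFallback(present: PresentCounts): number {
  let max = 0;
  for (const key of Object.keys(present)) max = Math.max(max, present[key]);
  return max;
}

// Normalized alternative: express the largest count as a percentage of the
// group total, so the result stays in [0, 100] like the normalized path.
function normalizedFallback(present: PresentCounts): number {
  let sum = 0;
  for (const key of Object.keys(present)) sum += present[key];
  if (sum === 0) return 0;
  return (rawFallback(present) / sum) * 100;
}

const busyField: PresentCounts = { "/checkout": 5000, "/cart": 5000 };
console.log(rawFallback(busyField), normalizedFallback(busyField));
```

For this input the raw fallback yields 5000 while the normalized one yields 50, which is comparable to scores from the normalized path.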

✅ Core algorithms (computeComparisonScore normalized path, computeEntropyScore) are mathematically correct and well-tested.

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Claude encountered an error


I'll analyze this and get back to you.

Two fixes from code review:

1. computeComparisonScore empty-group fallback: when one group has no
   data, normalize the present group's values to percentages [0,100]
   instead of using raw counts. This prevents single-value fields with
   high raw counts from inflating their score relative to multi-value
   fields. Fixes scale inconsistency with distribution mode in #1855.

2. Wire semanticBoost into DBDeltaChart sort as a tiebreaker (0.1
   weight, only when baseScore > 0). Previously exported and tested
   but never called.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
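
The tiebreaker wiring from item 2 of this commit message could look roughly like the sketch below. The 0.1 weight and the baseScore > 0 guard come from the message; the `ScoredAttribute` shape, the boost range, and the sample values are assumptions for illustration only.

```typescript
// Hypothetical scored attribute; in the PR the base score comes from
// computeComparisonScore and the boost from semanticBoost.
interface ScoredAttribute {
  key: string;
  baseScore: number; // assumed range [0, 100]
  boost: number;     // assumed range [0, 1]
}

const BOOST_WEIGHT = 0.1; // weight named in the commit message

// Add the boost only when the field already shows some difference, so a
// well-known attribute with no signal is not lifted above zero.
function sortScore(attr: ScoredAttribute): number {
  return attr.baseScore > 0 ? attr.baseScore + attr.boost * BOOST_WEIGHT : 0;
}

function sortAttributes(attrs: ScoredAttribute[]): ScoredAttribute[] {
  // Copy before sorting so the caller's array is left untouched.
  return attrs.slice().sort((a, b) => sortScore(b) - sortScore(a));
}

const ranked = sortAttributes([
  { key: "custom.flag", baseScore: 40, boost: 0 },
  { key: "service.name", baseScore: 40, boost: 1 },
  { key: "http.status_code", baseScore: 0, boost: 1 },
  { key: "custom.region", baseScore: 41, boost: 0 },
]);
console.log(ranked.map((a) => a.key));
```

With a weight this small relative to the base score, the boost only reorders fields whose base scores are tied or nearly tied: service.name moves ahead of custom.flag but stays behind custom.region.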
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Claude encountered an error


I'll analyze this and get back to you.

@alex-fedotyev
Contributor Author

@claude - review code

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Claude encountered an error


I'll analyze this and get back to you.

@alex-fedotyev
Contributor Author

Code Review

✅ No critical issues found. Solid scoring implementation with good test coverage (15 tests).

  • ⚠️ mergeValueStatisticsMaps removed from imports but still used in useMemo (DBDeltaChart.tsx) — the import was dropped in the diff but the function is still called inside the useMemo for building mergedArray in the hidden/visible property split. Verify this doesn't cause a build error (it may be imported transitively or the diff may be misleading due to merge with main).

  • ℹ️ computeEntropyScore exported but only used in tests. It is intentionally deferred for #1824 (Event Deltas: Always-on attribute distribution mode (no selection required)) integration per the TODO comment. Acceptable for now, but it should be wired in when #1855 (feat: always-on attribute distribution mode) merges.

  • ℹ️ semanticBoost suffix matching could produce false positives on attributes like SpanAttributes.custom.error.type.something if a user has deeply nested attribute keys where a boosted suffix appears mid-path. The endsWith check mitigates this for typical OTel naming, but it is worth noting. Not a blocker.
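
As a rough illustration of the suffix-matching concern, the sketch below compares a plain endsWith check with a stricter variant that also requires a "." boundary. The suffix list, the dotted key format, and both function names are assumptions; the PR's actual semanticBoost matching is not shown in this conversation.

```typescript
// Hypothetical boost list; the PR's actual list and weights are not shown here.
const BOOSTED_SUFFIXES = ["service.name", "http.status_code", "error"];

// Plain suffix check, as the review describes.
function matchesSuffix(key: string): boolean {
  return BOOSTED_SUFFIXES.some((suffix) => key.endsWith(suffix));
}

// Stricter variant: the suffix must be the whole key or start at a "." boundary.
function matchesSuffixAtBoundary(key: string): boolean {
  return BOOSTED_SUFFIXES.some(
    (suffix) => key === suffix || key.endsWith("." + suffix),
  );
}

const samples = [
  "SpanAttributes.http.status_code",            // boosted by both checks
  "SpanAttributes.custom.error.type.something", // suffix is mid-path: neither matches
  "SpanAttributes.custom.terror",               // matches only the plain check
];
for (const key of samples) {
  console.log(key, matchesSuffix(key), matchesSuffixAtBoundary(key));
}
```

In this sketch a mid-path occurrence is already rejected by endsWith alone; the boundary check additionally rejects keys whose last segment merely ends with a boosted word.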

Overall: clean implementation, good separation of scoring functions from sort integration, well-documented with TODO for entropy integration.

Development

Successfully merging this pull request may close these issues.

Event Deltas: Improved attribute sorting with entropy scoring and proportional comparison
