feat: improved attribute sorting with entropy scoring #1854
alex-fedotyev wants to merge 4 commits into main
…l comparison

Replace the basic max-delta sorting with proportional normalization (computeComparisonScore) that normalizes group sizes before computing deltas. Add Shannon entropy scoring (computeEntropyScore) for future distribution mode, and a semantic boost for well-known OTel attributes.

Closes #1826

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
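For readers skimming the thread, here is a minimal sketch of what proportional normalization before a max-delta comparison could look like. Only the name computeComparisonScore comes from this PR; the `ValueCounts` shape, the `normalizeToPercent` helper, and the function body are assumptions made for illustration, not the code in this diff.

```typescript
// Hypothetical shape: attribute value -> count (or percentage) within one group.
type ValueCounts = Record<string, number>;

// Rescale a group's values so they sum to 100.
// An empty or all-zero group normalizes to an empty map.
function normalizeToPercent(group: ValueCounts): ValueCounts {
  const keys = Object.keys(group);
  let total = 0;
  for (const key of keys) total += group[key];
  const out: ValueCounts = {};
  if (total <= 0) return out;
  for (const key of keys) out[key] = (group[key] / total) * 100;
  return out;
}

// Sketch: the largest per-value difference between the two normalized
// distributions, in percentage points (0 to 100).
function computeComparisonScore(
  outliers: ValueCounts,
  inliers: ValueCounts,
): number {
  const a = normalizeToPercent(outliers);
  const b = normalizeToPercent(inliers);
  const keys = Object.keys(a).concat(Object.keys(b));
  let maxDelta = 0;
  for (const key of keys) {
    const delta = Math.abs((a[key] ?? 0) - (b[key] ?? 0));
    if (delta > maxDelta) maxDelta = delta;
  }
  return maxDelta;
}
```

Under these assumptions, two groups with the same proportions but different sample sizes (say 10/30 versus 100/300) score 0, while a group that is entirely missing leaves the present group's largest percentage as the score.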
|
🦋 Changeset detected
Latest commit: 0aadc28
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
|
Claude finished @alex-fedotyev's task in 50s
PR Review
✅ The core algorithms (
Issues:
|
E2E Test Results
✅ All tests passed • 86 passed • 3 skipped • 919s
Tests ran across 4 shards in parallel. |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Claude finished @alex-fedotyev's task in 31s
PR Review
Issues:
✅ Core algorithms ( |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
Two fixes from code review:

1. computeComparisonScore empty-group fallback: when one group has no data, normalize the present group's values to percentages [0, 100] instead of using raw counts. This prevents single-value fields with high raw counts from inflating their score relative to multi-value fields. Fixes scale inconsistency with distribution mode in #1855.

2. Wire semanticBoost into the DBDeltaChart sort as a tiebreaker (0.1 weight, only when baseScore > 0). Previously exported and tested but never called.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
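The tiebreaker wiring in fix 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the DBDeltaChart code: the `ScoredField` interface, the `WELL_KNOWN` list, the `sortKey` and `sortFields` helpers, and the 0-or-1 return value of `semanticBoost` are all hypothetical. Only the 0.1 weight and the `baseScore > 0` condition come from the commit message.

```typescript
// Hypothetical shape for a scored attribute; not the component's real types.
interface ScoredField {
  name: string;
  baseScore: number; // e.g. the comparison score, in [0, 100]
}

// Assumed list of well-known attributes; the PR's actual list is not shown.
const WELL_KNOWN = ['service.name', 'http.status_code', 'error'];

// Sketch: 1 for a well-known attribute, 0 otherwise.
function semanticBoost(name: string): number {
  return WELL_KNOWN.indexOf(name) >= 0 ? 1 : 0;
}

// Weight taken from the commit message.
const BOOST_WEIGHT = 0.1;

// The boost is added only when the field already has a non-zero base score,
// so it can break ties but cannot promote a field with no signal.
function sortKey(field: ScoredField): number {
  return field.baseScore > 0
    ? field.baseScore + BOOST_WEIGHT * semanticBoost(field.name)
    : field.baseScore;
}

// Sort descending by boosted score without mutating the input array.
function sortFields(fields: ScoredField[]): ScoredField[] {
  return fields.slice().sort((x, y) => sortKey(y) - sortKey(x));
}
```

With a weight this small relative to a [0, 100] base score, the boost only reorders fields whose base scores are nearly equal, which matches the "tiebreaker" intent described above.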
|
@claude - review code |
|
Code Review
✅ No critical issues found. Solid scoring implementation with good test coverage (15 tests).
Overall: clean implementation, good separation of scoring functions from sort integration, well-documented with TODO for entropy integration. |
Summary
Closes #1826
Replaces the basic max-delta sorting for event delta attributes with smarter scoring algorithms:
- computeComparisonScore: Normalizes each group's percentages to sum to 100% before computing the max delta, so fields with identical proportional distributions score 0 regardless of coverage rate differences. This makes the sorting resilient to different sample sizes between outlier and inlier groups.
- computeEntropyScore: Returns a score in [0, 1], where 1 = maximally useful (low entropy, dominant value among several) and 0 = not useful (single value, empty, or perfectly uniform). Prepared for future use in distribution mode.
- semanticBoost: Provides a tiebreaker boost for well-known OTel attributes (e.g., service.name, http.status_code, error) to surface the most operationally relevant fields.

Currently uses computeComparisonScore for sorting. Entropy scoring will be integrated when #1824 (always-on distribution mode) merges.

Test plan
🤖 Generated with Claude Code
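
As a rough illustration of the entropy score described in the summary, one way to get the stated behavior (0 for a single value, an empty field, or a perfectly uniform distribution; close to 1 when one value dominates among several) is one minus normalized Shannon entropy. The name computeEntropyScore comes from this PR; the input shape and the formula below are assumptions, and the actual implementation may differ.

```typescript
// Sketch: 1 - H / H_max, where H is the Shannon entropy of the value
// distribution and H_max = log(n) is the entropy of a uniform
// distribution over the n distinct values actually observed.
function computeEntropyScore(counts: Record<string, number>): number {
  // Keep only values that were actually observed.
  const values = Object.keys(counts)
    .map((key) => counts[key])
    .filter((v) => v > 0);

  // Empty or single-value fields carry no distribution signal.
  if (values.length < 2) return 0;

  const total = values.reduce((sum, v) => sum + v, 0);

  let entropy = 0;
  for (const v of values) {
    const p = v / total;
    // The log base cancels in the ratio below, so natural log is fine.
    entropy -= p * Math.log(p);
  }

  const maxEntropy = Math.log(values.length);

  // Clamp to [0, 1] to absorb floating-point rounding near the ends.
  return Math.min(1, Math.max(0, 1 - entropy / maxEntropy));
}
```

Normalizing by log(n) keeps fields with different numbers of distinct values on the same [0, 1] scale, which is what makes the score usable as a sort key across attributes.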