Summary
Replace the basic max-delta sorting with Shannon entropy scoring for distribution mode and proportional normalization for comparison mode, ensuring the most useful attributes for outlier detection appear first.
Problem
Distribution mode: The original max(pct) - mean(pcts) skewness score doesn't handle multi-modal or power-law distributions well, and has no awareness of well-known OTel attributes.
Comparison mode: The original max(abs(outlierPct - inlierPct)) uses raw percentages where each group's denominator is totalRows of that group. When selection (500 rows) and background (1500 rows) have different sizes, a field with identical proportions (100% "message" in both) shows artificial deltas like |80% - 27%| = 53, pushing it above genuinely different fields.
Changes
Distribution mode scoring
computeEntropyScore(valuePercentages): normalized Shannon entropy score, 1 - H(p)/log2(N), where N is the number of distinct values. Returns 0 for single-value or uniform fields and approaches 1 for highly skewed ones. Handles power-law distributions naturally.
computeDistributionScore(valuePercentages): the original skewness score, kept as an alternative and selectable via the DISTRIBUTION_SCORING constant.
semanticBoost(key): returns 1 for well-known OTel attribute suffixes (service.name, http.method, http.status_code, error, deployment.environment, rpc.method, db.system, etc.). Applied as a 0.1 tiebreaker only when baseScore > 0, so single-value boosted attributes still score 0.
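For reviewers, here is a minimal sketch of the entropy formula described above. It is not the code in deltaChartUtils.ts: the name entropyScoreSketch, the Map input shape, and the choice to ignore zero-percentage values when counting N are all assumptions made for illustration.

```typescript
// Sketch only. Assumes valuePercentages maps each distinct value to its
// percentage of rows; the real input shape may differ.
function entropyScoreSketch(valuePercentages: Map<string, number>): number {
  // Collect positive percentages; values with 0% contribute nothing to H(p).
  const pcts: number[] = [];
  valuePercentages.forEach((pct) => {
    if (pct > 0) pcts.push(pct);
  });

  const n = pcts.length;
  // log2(1) = 0, so the ratio is undefined for a single value: score it 0.
  if (n <= 1) return 0;

  // Re-normalize so the probabilities sum to 1 even if percentages do not sum to 100.
  const total = pcts.reduce((sum, pct) => sum + pct, 0);

  let entropy = 0;
  for (const pct of pcts) {
    const p = pct / total;
    entropy -= p * Math.log2(p);
  }

  // Uniform fields have H(p) = log2(N), giving 0; skewed fields approach 1.
  return 1 - entropy / Math.log2(n);
}

const uniformScore = entropyScoreSketch(new Map([["a", 50], ["b", 50]]));
const skewedScore = entropyScoreSketch(new Map([["a", 99], ["b", 1]]));
const singleScore = entropyScoreSketch(new Map([["a", 100]]));
```

In this sketch a 50/50 field and a single-value field both score 0, while a 99/1 field scores above 0.9, which matches the behavior described for computeEntropyScore.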
Comparison mode scoring
computeComparisonScore(outlierValues, inlierValues): normalizes each group's percentages to sum to 100% before computing the max delta. Fields with identical proportional distributions (e.g. 100% "message" in both groups) score 0 regardless of coverage-rate differences. Falls back to the raw delta when one group has no data for a property.
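The normalization step can be sketched roughly as follows. This is an illustration under assumptions, not the shipped implementation: the helper names, the Map input shape, and the reading of "no data" as an empty or zero-sum group are all hypothetical.

```typescript
type ValuePercentages = Map<string, number>;

// Rescale a group's percentages so they sum to 100. Returns an empty map
// when the group has no data (an assumed interpretation of "no data").
function normalizeToHundred(values: ValuePercentages): ValuePercentages {
  let total = 0;
  values.forEach((pct) => {
    total += pct;
  });
  const normalized: ValuePercentages = new Map();
  if (total <= 0) return normalized;
  values.forEach((pct, value) => {
    normalized.set(value, (pct / total) * 100);
  });
  return normalized;
}

// Largest absolute per-value difference; a value missing from one side counts as 0.
function maxDelta(a: ValuePercentages, b: ValuePercentages): number {
  const keys = new Set<string>();
  a.forEach((_pct, value) => keys.add(value));
  b.forEach((_pct, value) => keys.add(value));
  let max = 0;
  keys.forEach((value) => {
    const delta = Math.abs((a.get(value) ?? 0) - (b.get(value) ?? 0));
    if (delta > max) max = delta;
  });
  return max;
}

function comparisonScoreSketch(
  outlierValues: ValuePercentages,
  inlierValues: ValuePercentages,
): number {
  const outlier = normalizeToHundred(outlierValues);
  const inlier = normalizeToHundred(inlierValues);
  // Fall back to the raw delta when either group has nothing to normalize.
  if (outlier.size === 0 || inlier.size === 0) {
    return maxDelta(outlierValues, inlierValues);
  }
  return maxDelta(outlier, inlier);
}

// The example from the Problem section: same proportions, different coverage.
const sameShape = comparisonScoreSketch(
  new Map([["message", 80]]),
  new Map([["message", 27]]),
);
```

With the 80% vs 27% "message" example, the raw delta is 53, whereas the normalized score in this sketch is 0 because both groups become 100% "message".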
Sort integration
- Distribution mode:
sortScore = baseScore + (baseScore > 0 ? semanticBoost(key) * 0.1 : 0)
- Comparison mode:
sortScore = computeComparisonScore(outlierCount, inlierCount)
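As a rough illustration of the distribution-mode formula, the sketch below ranks three properties. The suffix list contains only the attributes named in this description (the real BOOSTED_ATTRIBUTE_SUFFIXES may be longer), suffix matching via endsWith is an assumption, and the property keys and base scores are made up.

```typescript
// Only the suffixes listed in this description; the actual constant may differ.
const BOOSTED_SUFFIXES_SKETCH = [
  "service.name",
  "http.method",
  "http.status_code",
  "error",
  "deployment.environment",
  "rpc.method",
  "db.system",
];

// Returns 1 when the key ends with a boosted suffix, else 0 (assumed matching rule).
function semanticBoostSketch(key: string): number {
  return BOOSTED_SUFFIXES_SKETCH.some((suffix) => key.endsWith(suffix)) ? 1 : 0;
}

// The boost only breaks ties among informative fields: a base score of 0 stays 0.
function distributionSortScore(key: string, baseScore: number): number {
  return baseScore + (baseScore > 0 ? semanticBoostSketch(key) * 0.1 : 0);
}

// Hypothetical properties with hypothetical base scores.
const ranked = [
  { key: "SpanAttributes.custom.tag", baseScore: 0.5 },
  { key: "SpanAttributes.http.method", baseScore: 0 },
  { key: "ResourceAttributes.service.name", baseScore: 0.5 },
]
  .map((prop) => ({
    ...prop,
    sortScore: distributionSortScore(prop.key, prop.baseScore),
  }))
  .sort((a, b) => b.sortScore - a.sortScore);
```

Here the boosted service.name key moves ahead of the custom tag despite an equal base score, while the single-value http.method key stays at 0 and sorts last.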
Files
packages/app/src/components/deltaChartUtils.ts (computeEntropyScore, computeComparisonScore, computeDistributionScore, semanticBoost, DISTRIBUTION_SCORING, BOOSTED_ATTRIBUTE_SUFFIXES)
packages/app/src/components/DBDeltaChart.tsx (sort logic in useMemo)
packages/app/src/components/__tests__/DBDeltaChart.test.ts (entropy tests, comparison tests, boost integration tests)
Dependencies
Test plan
Context
Part of the Event Deltas improvement series. Reference implementation in PR #1797.