Skip to content

fix(scripts): trend annotations fall back to nearest non-null prior release#1120

Open
carlos-alm wants to merge 7 commits into
mainfrom
fix/incremental-report-trend-fallback
Open

fix(scripts): trend annotations fall back to nearest non-null prior release#1120
carlos-alm wants to merge 7 commits into
mainfrom
fix/incremental-report-trend-fallback

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • scripts/update-incremental-report.ts compared each release against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5, where both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank — hiding real regressions.
  • 3.9.6 wasm full-build was 14.0s vs 3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower), but the table showed no arrow at all.
  • Replace the per-release lookup with findPrevMetric — a per-metric walk back through history that skips releases without a value for that specific metric.
  • Apply the same fallback in engineRow (the version-history table) and the regression-warning block.
  • Regenerated generated/benchmarks/INCREMENTAL-BENCHMARKS.md so the existing rows reflect the fallback. Notable previously-blank rows now annotated: 3.9.6 (vs 3.9.4), 3.8.0 native (vs 3.6.0 native), 3.6.0 wasm (vs 3.4.1 wasm).
  • Added regression test tests/unit/incremental-report-trend-fallback.test.ts gated on the CODEGRAPH_INCREMENTAL_REPORT_PATH env override so the script can target a temp file from tests.

Note: PR #1119 (closing #1042) introduces the same env override. Whichever lands first, the other rebases trivially.

Closes #1043

Test plan

  • npx vitest run tests/unit/incremental-report-trend-fallback.test.ts — both cases pass
  • Visual inspection of the regenerated INCREMENTAL-BENCHMARKS.md confirms previously-blank trend cells now show the expected fallback annotations
  • npm run lint — clean
  • CI green

…nnotations

scripts/update-incremental-report.ts compared each release's metrics
against the immediately prior release only. When that prior release had
null build/rebuild metrics (e.g. 3.9.5 had null after both engine
workers were SIGKILL'd on timeout), the trend cell was silently emitted
blank, hiding real regressions: 3.9.6 wasm full-build was 14.0s vs
3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower),
but the table showed no arrow at all.

Replace the per-release prev lookup with findPrevMetric — a per-metric
walk back through history that skips releases without a value for that
specific metric. Apply consistently in engineRow (the table) and the
regression-warning block. Regenerated the report so existing trend
annotations now reflect the fallback (3.9.6, 3.8.0 native, 3.6.0 wasm).

Closes #1043
@claude
Copy link
Copy Markdown

claude Bot commented May 14, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete

Final Assessment

Root Cause Analysis

Problem Being Solved: The incremental benchmark script was showing blank trend arrows when comparing against releases with null metrics (e.g., when workers timed out and stored null values). For example, version 3.9.6 had 86% slower WASM builds and 589% slower no-ops compared to 3.9.4, but showed no trend arrows because it only looked at the immediate prior release (3.9.5) which had null metrics.

Why This Approach: The PR introduces findPrevMetric() which walks back through release history to find the nearest non-null value for each specific metric. This ensures trend annotations reflect actual performance changes rather than being hidden by intermediate releases with incomplete data.

Risk Assessment: Low risk. This is a reporting/analytics fix with comprehensive test coverage. No runtime or CLI behavior changes.

Backlog Compliance

  • Zero-dep: ✓ — No new dependencies added
  • Foundation-aligned: ✓ — Pure reporting improvement, maintains zero-cost core
  • Problem-fit: 2 — Nice improvement for development observability, but tangential to core agent navigation problem
  • Breaking: No — Only changes generated report content, no CLI/API changes
  • Tier: 1

Code Quality Review

Implementation Quality ⭐⭐⭐⭐☆

The findPrevMetric() function is well-designed:

  • Clean separation of concerns - single purpose function
  • Proper null handling and dev version skipping (lines 64-65)
  • Applied consistently across all metric types (full build, no-op, one-file, resolve metrics)
  • Maintains backward compatibility with existing report format

Test Coverage ⭐⭐⭐⭐⭐

Excellent test implementation (tests/unit/incremental-report-trend-fallback.test.ts):

  • Comprehensive scenarios: Tests both fallback behavior and empty history edge case
  • Realistic data: Uses actual version patterns that mirror real release structure
  • Isolation: Uses temp files and proper cleanup with CODEGRAPH_INCREMENTAL_REPORT_PATH env override
  • Assertions: Verifies specific trend percentages (↑100%, ↑1200%, ↑20%) and negative cases
  • Edge cases: Handles first release (no prior history) correctly

Code Changes Analysis

scripts/update-incremental-report.ts (lines 58-69, 92-97, 101-104, 194-228)

Clean function extraction: findPrevMetric() is pure and reusable
Consistent application: Applied to all metric types (fullBuildMs, noopRebuildMs, etc.)
Regression detection: Also updated to use fallback logic (lines 194-228)
Dev version handling: Properly skips dev versions when finding prior metrics

Test Structure

Node.js compatibility: Uses --experimental-strip-types flag with version detection
Proper isolation: Temp directories, env overrides, cleanup in afterEach
Realistic fixtures: Hand-crafted release history mimicking actual benchmark data

Generated File Updates

The regenerated INCREMENTAL-BENCHMARKS.md shows the fix working correctly:

  • Line 12: 3.9.6 wasm: ↑86% (vs 3.9.4, not blank due to null 3.9.5)
  • Line 25: 3.8.0 native: ↓50% (vs 3.6.0, skipping the gaps)
  • Previously blank trend cells now show meaningful annotations

Minor Observations

  • Env override pattern: The CODEGRAPH_INCREMENTAL_REPORT_PATH pattern is clean and follows existing conventions
  • Error handling: Graceful fallback when no prior metric exists (return null)
  • Performance: No performance impact - just changes lookup logic during report generation

Critical Concerns

None identified. This is a solid bug fix with excellent test coverage.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-implemented fix for a real reporting issue. Clean code, comprehensive tests, and no breaking changes. The solution properly handles edge cases and maintains backward compatibility. This directly improves the usefulness of performance trend data without any downside risk.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR fixes trend annotations in the incremental benchmark report so that null-metric releases (where engine workers were killed) no longer silently break the comparison chain. A new findPrevMetric helper walks back through history to find the nearest non-null value for each individual metric, replacing the coarser findPrevRelease approach that returned the whole prior release object.

  • findPrevMetric replaces findPrevRelease in engineRow, the table-building loop, and the regression-warning block, so any single metric gap (e.g. a crashed WASM worker) no longer blanks all trend cells for that release.
  • trend() gains a previous === 0 guard (matching the existing guard in checkRegression) to prevent a ↑Infinity% annotation if a zero baseline ever appears in history.
  • A new integration test seeds a null-metric intermediate release, runs the script against a temp file via CODEGRAPH_INCREMENTAL_REPORT_PATH, and asserts the expected fallback annotations on the skipped-over release.

Confidence Score: 5/5

Safe to merge — the change is additive (a fallback that only activates on null-metric gaps), the zero-division guard is a pure defensive tightening, and the new test exercises both the fallback path and the empty-history path end-to-end against a real script invocation.

All three paths that consume findPrevMetric (table rows, regression detection, resolve-metric trends) behave correctly and consistently. The previous two review concerns — the previous === 0 guard and the duplicate findPrevMetric calls per release — are both addressed in this version. No correctness issues remain.

No files require special attention.

Important Files Changed

Filename Overview
scripts/update-incremental-report.ts Core logic rewritten to use per-metric history walkback; zero-division guard added to trend(); resolve-metric baselines hoisted out of engineRow to avoid duplicate scans per release. Logic is correct and consistent across table-building and regression-detection paths.
tests/unit/incremental-report-trend-fallback.test.ts New integration test covering the two key scenarios: fallback over a null-metric release and first-ever release with no history. Assertions directly verify trend annotation strings in the rendered markdown.
generated/benchmarks/INCREMENTAL-BENCHMARKS.md Regenerated output; previously-blank trend cells for 3.9.6, 3.8.0 native, and 3.6.0 wasm now carry the correct fallback annotations. No logic lives here.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[For each release i in history] --> B[Pre-compute prevResolveNative\nprevResolveJs via findPrevMetric]
    B --> C[engineRow for native]
    B --> D[engineRow for wasm]
    C --> E{engine key present?}
    D --> E
    E -->|No| F[return null — skip row]
    E -->|Yes| G[findPrevMetric for\nfullBuildMs / noopRebuildMs\noneFileRebuildMs]
    G --> H{Walk history fromIdx+1}
    H -->|version === 'dev'| H
    H -->|getter returns null| H
    H -->|getter returns non-null v| I[return v]
    H -->|end of history| J[return null]
    I --> K[trend current vs v]
    J --> K
    K -->|v == null or v === 0| L[return '' blank cell]
    K -->|abs pct lt 2| M[return ' ~']
    K -->|pct significant| N[return ↑/↓ pct%]
Loading

Reviews (8): Last reviewed commit: "Merge branch 'main' into fix/incremental..." | Re-trigger Greptile

Comment on lines 130 to 135
for (let i = 0; i < history.length; i++) {
const h = history[i];
const prev = findPrevRelease(history, i);

const nativeRow = engineRow(h, prev, 'native');
const wasmRow = engineRow(h, prev, 'wasm');
const nativeRow = engineRow(history, i, 'native');
const wasmRow = engineRow(history, i, 'wasm');
if (nativeRow) md += nativeRow + '\n';
if (wasmRow) md += wasmRow + '\n';
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The natT/jsT resolve trends are computed inside engineRow, which is called once for each engine type per release. Since h.resolve is release-level (not engine-specific), findPrevMetric for nativeBatchMs and jsFallbackMs is invoked twice per release — returning identical results both times. This is harmless but wasteful for long history arrays. Consider hoisting the resolve trend calls to the call site, alongside the nativeRow/wasmRow pair, so they are computed once and passed in.

Suggested change
for (let i = 0; i < history.length; i++) {
const h = history[i];
const prev = findPrevRelease(history, i);
const nativeRow = engineRow(h, prev, 'native');
const wasmRow = engineRow(h, prev, 'wasm');
const nativeRow = engineRow(history, i, 'native');
const wasmRow = engineRow(history, i, 'wasm');
if (nativeRow) md += nativeRow + '\n';
if (wasmRow) md += wasmRow + '\n';
}
for (let i = 0; i < history.length; i++) {
// Resolve metrics are release-level; pre-compute their trends once per
// release so engineRow doesn't redundantly walk history twice.
const prevNative = findPrevMetric(history, i, (x) => x.resolve?.nativeBatchMs);
const prevJs = findPrevMetric(history, i, (x) => x.resolve?.jsFallbackMs);
const nativeRow = engineRow(history, i, 'native', prevNative, prevJs);
const wasmRow = engineRow(history, i, 'wasm', prevNative, prevJs);
if (nativeRow) md += nativeRow + '\n';
if (wasmRow) md += wasmRow + '\n';
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 628d37d — hoisted findPrevMetric for the release-level resolve metrics (nativeBatchMs, jsFallbackMs) to the call site and pass them into engineRow, so they are computed once per release instead of twice.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

Codegraph Impact Analysis

3 functions changed2 callers affected across 1 files

  • findPrevMetric in scripts/update-incremental-report.ts:62 (2 transitive callers)
  • trend in scripts/update-incremental-report.ts:72 (2 transitive callers)
  • engineRow in scripts/update-incremental-report.ts:89 (1 transitive callers)

…lookup (#1120)

- trend(): treat previous === 0 the same as null to avoid emitting
  '↑Infinity%' if a historical entry stores 0 for a metric.
  Matches the existing previous === 0 guard in checkRegression.
- engineRow(): accept pre-computed prevResolveNative/prevResolveJs from
  the caller instead of recomputing them inside the function. Resolve
  metrics are release-level (not engine-specific), so the previous code
  walked history twice per release (once for native, once for wasm)
  returning identical results both times.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed Greptile review feedback (commit 628d37d):

  • Hoisted resolve trend lookup (line 130-135): pre-compute prevResolveNative / prevResolveJs once per release at the call site, and pass them into engineRow. Resolve metrics are release-level, so we no longer walk history twice per release (once per engine).
  • Zero-baseline guard in trend() (line 72-80, from the Comments Outside Diff section): trend() now early-returns '' when previous === 0, matching the existing guard in checkRegression. Prevents ↑Infinity% if a historical metric is genuinely zero.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

follow-up: trend annotations should fall back to nearest non-null prior release

1 participant