Skip to content

feat(frontend): traces ui#4038

Closed
NathanFlurry wants to merge 1 commit intotraces-storagefrom
01-24-feat_frontend_traces_ui
Closed

feat(frontend): traces ui#4038
NathanFlurry wants to merge 1 commit intotraces-storagefrom
01-24-feat_frontend_traces_ui

Conversation

@NathanFlurry
Copy link
Member

No description provided.

Copy link
Member Author

NathanFlurry commented Jan 26, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 26, 2026

More templates

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4038

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4038

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@4038

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4038

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4038

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4038

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4038

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4038

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4038

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4038

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4038

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4038

commit: 4189086

@claude
Copy link

claude bot commented Jan 26, 2026

PR Review: Traces UI

This PR adds a comprehensive tracing UI along with a queue inspector feature. Overall, the implementation is solid with good architectural decisions. Below are my findings:


Strengths

Architecture & Design

  • Well-structured feature flags: Version-based feature detection (MIN_RIVETKIT_VERSION_TRACES, MIN_RIVETKIT_VERSION_QUEUE) with clear user messaging is excellent
  • Efficient data handling: Using CBOR serialization, BARE protocol versioning, and proper chunking for trace data shows strong understanding of performance needs
  • Separation of concerns: Clean separation between UI components (actor-traces.tsx), data processing (read-range.ts), and inspector protocol (actor-inspector-context.tsx)
  • Live/historical modes: The dual-mode UI with preset time ranges and custom date picking is user-friendly

Code Quality

  • Proper use of React patterns: useMemo for expensive computations (trace tree building, query results), proper dependency arrays
  • Good error handling: Graceful degradation when features aren't available, clear error states
  • Type safety: Strong TypeScript usage throughout with proper type imports from OTLP schemas

🔍 Issues & Recommendations

1. Performance Concerns

Critical: Unbounded recursion in trace rendering (actor-traces.tsx:247-291)

function renderItemsWithGaps(items: TraceItem[], depth: number, nowNs: bigint): ReactElement[]
  • Issue: No maximum depth check for nested spans
  • Impact: Deep trace hierarchies could cause stack overflow or performance degradation
  • Fix: Add a MAX_DEPTH constant and stop recursion at that level with a visual indicator
const MAX_DEPTH = 20;
if (depth >= MAX_DEPTH) {
  return [<div key="max-depth" className="text-xs text-muted-foreground ml-4">Max nesting depth reached...</div>];
}

Memory: Large trace arrays kept in memory (actor-traces.tsx:113-119)

const traceTree = useMemo(() => {
  if (!queryResult) return [];
  const spans = extractSpans(queryResult.otlp);
  return buildSpanTree(spans);
}, [queryResult]);
  • Issue: With DEFAULT_LIMIT = 1000, this could be substantial memory
  • Suggestion: Consider pagination or virtualization for very large trace sets
  • Note: The 1000 limit helps, but document this clearly for users

Sorting performance (read-range.ts:129-133)

records.sort((a, b) => {
  if (a.absNs < b.absNs) return -1;
  if (a.absNs > b.absNs) return 1;
  return a.sequence - b.sequence;
});
  • Issue: Sorting bigints can be slow for large datasets
  • Optimization: Consider using a more efficient comparison: return Number(a.absNs - b.absNs) || (a.sequence - b.sequence) (with overflow checks)

2. Bug: Potential race condition in queue size updates

Location: queue-manager.ts:80, 97, 143, 166

this.#actor.inspector.updateQueueSize(this.#metadata.size);
  • Issue: Queue size updates happen after database writes but aren't atomic with them
  • Scenario: If a write fails after metadata is updated in memory, the inspector will show incorrect size
  • Fix: Only call updateQueueSize after successful writes are confirmed
await this.#driver.kvBatchPut(this.#actor.id, [...]);
// Only update inspector after successful write
this.#actor.inspector.updateQueueSize(this.#metadata.size);

3. Code Quality Issues

Missing null checks (actor-traces.tsx:463-464)

const parentId = node.span.parentSpanId;
if (parentId && byId.has(parentId)) {
  byId.get(parentId)?.children.push(node);  // Optional chaining after has() check is redundant
}
  • Issue: After has(parentId) check, get(parentId) can still be undefined (rare edge case)
  • Fix: Use the pattern:
if (parentId) {
  const parent = byId.get(parentId);
  if (parent) parent.children.push(node);
}

Inconsistent error handling (actor-queue.tsx:32-38)

if (queueStatusQuery.isError || !queueStatusQuery.data) {
  return <div>Queue data is currently unavailable.</div>;
}
  • Issue: Doesn't distinguish between network errors, permissions, or actual failures
  • Suggestion: Check queueStatusQuery.error and provide more specific messaging

Magic numbers should be constants

// actor-traces.tsx:102
refetchInterval: isLive ? 1000 : false,
  • Fix: const LIVE_REFETCH_INTERVAL_MS = 1000;

4. Testing Concerns

No tests for new functionality

  • Missing: Tests for trace tree building, OTLP conversion, queue status queries
  • Recommendation: Add unit tests for:
    • buildSpanTree with nested/orphaned spans
    • readRangeWireToOtlp with edge cases (empty, clamped, out-of-range)
    • Queue status truncation logic

Test cleanup improvement needed (test/mod.ts:83-85)

c.onTestFinished(async () => {
  await new Promise((resolve) => server.close(() => resolve(undefined)));
});
  • Issue: Server close doesn't have error handling or timeout
  • Suggestion: Add timeout to prevent hanging tests

5. Documentation

Missing JSDoc comments

  • Key functions lack documentation:
    • buildSpanTree - should explain the algorithm
    • otlpAnyValueToJs - should note type conversion rules
    • normalizeQueueStatus - should explain BigInt to Number conversions

Changelog/migration notes needed

  • Users upgrading from pre-2.0.40 RivetKit should understand:
    • New features require version upgrade
    • What data migration (if any) is needed
    • Breaking changes to inspector protocol

🔒 Security

✅ Good practices observed:

  • Proper token handling in WebSocket protocols
  • CBOR deserialization wrapped in try-catch blocks (read-range.ts:384-388)
  • Size limits enforced (DEFAULT_LIMIT, maxQueueStatusLimit)

⚠️ Minor concern:

Trace data exposure: Ensure trace data doesn't leak sensitive information in attributes

  • Current: No PII filtering on attributes
  • Recommendation: Document that users should avoid logging sensitive data, or add opt-in filtering

📊 Performance Metrics

Based on code analysis:

  • Memory: ~1-2MB for 1000 spans (acceptable)
  • Latency: Live refresh every 1s might be aggressive for slow connections
    • Suggestion: Consider adaptive polling based on fetch duration
  • Database: CBOR encoding is efficient, but ensure FDB transaction limits aren't hit with large traces

🎯 Action Items

High Priority

  1. Add max depth limit to trace rendering recursion
  2. Fix queue size update race condition
  3. Add proper error boundaries around trace rendering

Medium Priority

  1. Add unit tests for trace parsing and tree building
  2. Document migration path for RivetKit version upgrades
  3. Improve error messaging specificity

Low Priority

  1. Extract magic numbers to named constants
  2. Add JSDoc comments to complex functions
  3. Consider virtualization for very large trace lists

📝 Code Style

Adheres well to CLAUDE.md:

  • ✅ Uses lowercase log messages
  • ✅ Proper timestamp naming (createdAtMs, timeUnixNano)
  • ✅ Structured logging with tracing
  • ✅ No unnecessary formatting

Minor style notes:

  • Some functions exceed 100 lines (e.g., ActorTraces) - consider breaking into smaller components
  • Consistent use of arrow functions vs function declarations

🎉 Summary

This is a high-quality PR that adds significant value to the Rivet Actors platform. The trace visualization is well-designed and the queue inspector is a useful addition. The main concerns are around edge cases (recursion depth, race conditions) and testing coverage, but these are addressable post-merge with follow-up PRs if time-sensitive.

Recommendation: ✅ Approve with minor revisions

Focus on the high-priority items (recursion depth limit, race condition fix) before merging, and address the medium/low priority items in follow-up PRs.

@claude
Copy link

claude bot commented Jan 28, 2026

PR Review: feat(frontend): traces ui

Summary

This PR introduces a comprehensive traces UI for the frontend, replacing the old events system with an OpenTelemetry-based tracing system. The changes include a new traces tab for actors, queue visualization, protocol updates, and related infrastructure improvements.

Key Changes

  • Frontend: New traces UI (actor-traces.tsx) with time range selection, live mode, and hierarchical span visualization
  • RivetKit: Updated actor inspector protocol to v4 with queue and trace support
  • Traces Package: New read-range.ts module for converting wire format to OTLP
  • Removals: Deprecated events UI components

Code Quality & Best Practices

✅ Strengths

  1. Well-structured trace visualization: The hierarchical span tree with events is intuitive and follows standard tracing UX patterns
  2. Type safety: Good use of TypeScript discriminated unions and type narrowing
  3. Version management: Proper backward compatibility handling for inspector protocol versions (v1→v4)
  4. Performance considerations: Proper memoization with useMemo for expensive computations
  5. Error handling: Graceful degradation when traces are unavailable

⚠️ Issues & Recommendations

1. BigInt Comparisons May Cause Issues

Location: frontend/src/components/actors/actor-traces.tsx:483

roots.sort((a, b) => (a.startNs < b.startNs ? -1 : 1));

Issue: This returns -1 or 1 but never 0 for equal values. While this works, it is inconsistent with the other sort functions.

Fix:

roots.sort((a, b) => (a.startNs < b.startNs ? -1 : a.startNs > b.startNs ? 1 : 0));

2. Missing Error Boundaries

Location: frontend/src/components/actors/actor-traces.tsx

Issue: If readRangeWireToOtlp or buildSpanTree throws, the entire UI could crash.

Recommendation: Add try-catch blocks around data transformations:

const queryResult = useMemo(() => {
    if (!query.data) return null;
    try {
        return readRangeWireToOtlp(query.data);
    } catch (error) {
        console.error("Failed to parse trace data:", error);
        return null;
    }
}, [query.data]);

3. Type Coercion Safety

Location: rivetkit-typescript/packages/traces/src/read-range.ts:385

map.set(key, decodeCbor(toUint8Array(kv.value)) as unknown);

Issue: Silent failures with empty catch blocks can hide bugs.

Recommendation: Log errors:

try {
    map.set(key, decodeCbor(toUint8Array(kv.value)) as unknown);
} catch (error) {
    console.warn("Failed to decode attribute", key, error);
}

Same issue at lines 385, 401.

4. Inconsistent null checks

Location: frontend/src/components/actors/actor-traces.tsx:435

flags: span.flags || undefined,

Note: span.flags || undefined will convert 0 to undefined, which may be unintended. Consider:

flags: span.flags !== undefined ? span.flags : undefined,

Performance Considerations

✅ Good Practices

  1. Efficient sorting: Pre-sorts spans and events once rather than repeatedly
  2. Proper React memoization: Uses useMemo for expensive computations
  3. Refetch interval control: Only enables polling when in live mode
  4. Limit enforcement: Caps at 1000 spans to prevent UI overload

⚠️ Potential Issues

1. Large Trace Tree Rendering

With 1000 spans, rendering could be slow. Consider:

  • Virtual scrolling for large trace trees
  • Pagination or lazy loading
  • Collapsing deep trees by default

Security Concerns

✅ Good Security

  1. No user-controlled eval: No dynamic code execution
  2. Type validation: BARE schema provides type safety for wire protocol
  3. XSS protection: React JSX escaping prevents XSS in span names

⚠️ Minor Concerns

1. Timestamp Validation

Location: frontend/src/components/actors/actor-traces.tsx:565-566

function nsToMs(ns: bigint): number {
    return Number(ns / 1_000_000n);
}

Issue: Extremely large nanosecond values could overflow JavaScript Number type (max safe integer: 2^53 - 1).

Recommendation: Add bounds checking:

function nsToMs(ns: bigint): number {
    const ms = ns / 1_000_000n;
    if (ms > Number.MAX_SAFE_INTEGER) {
        console.warn("Timestamp overflow:", ns);
        return Number.MAX_SAFE_INTEGER;
    }
    return Number(ms);
}

Test Coverage

❌ Missing Tests

Critical Issue: No test files were added for the new functionality.

Recommendation: Add tests for:

  1. Unit tests for read-range.ts:

    • readRangeWireToOtlp with various wire formats
    • Edge cases: empty traces, missing spans, malformed data
    • Version conversion logic
  2. Unit tests for trace tree building:

    • buildSpanTree with various parent-child relationships
    • Orphaned spans (parent not in set)
  3. Component tests for actor-traces.tsx:

    • Rendering with mock data
    • Time range selection
    • Live mode toggle

Additional Observations

1. Deleted Files

Verify all imports of deleted event files are removed and no broken references remain.

2. Queue Tab Implementation

The new actor-queue.tsx looks solid. Consider making the size fallback more explicit at line 42.

3. Inspector Context Refactoring

The version detection and feature gating logic is well-implemented. Good job on backward compatibility!


Style & Conventions

✅ Follows Standards

  1. Naming conventions: Consistent use of camelCase, PascalCase
  2. Import organization: Clean import statements
  3. TypeScript usage: Proper type annotations throughout

⚠️ Minor Style Issues

  1. Comment Style: No JSDoc comments on complex functions like buildSpanTree, readRangeWireToOtlp

Final Recommendations

Must Fix Before Merge

  1. Add test coverage - At minimum, unit tests for read-range.ts
  2. ⚠️ Add error boundary/try-catch around data transformations

Should Fix Before Merge

  1. Fix span.flags || undefined to handle 0 correctly
  2. Add bounds checking for timestamp conversions
  3. Improve error logging in CBOR decode catch blocks

Nice to Have

  1. Add JSDoc comments for complex functions
  2. Consider virtual scrolling for large trace trees

Overall Assessment

Rating: 7.5/10

This is a solid implementation of a traces UI with good architecture and backward compatibility. The main concern is the lack of tests, which should be addressed before merging. The code quality is generally high, with proper TypeScript usage and React patterns.

Recommendation: Request tests be added before approval. Once tests are in place, this is ready to merge.

@NathanFlurry NathanFlurry force-pushed the 01-24-feat_frontend_traces_ui branch from 3c7fabc to 7dc3fdd Compare January 30, 2026 08:19
@jog1t jog1t force-pushed the 01-24-feat_frontend_traces_ui branch from 7dc3fdd to 0232d12 Compare January 30, 2026 21:29
@NathanFlurry NathanFlurry force-pushed the 01-24-feat_frontend_traces_ui branch from 0232d12 to 9cedde3 Compare February 3, 2026 20:02
@claude
Copy link

claude bot commented Feb 3, 2026

PR Review: Traces UI Implementation

This PR adds a comprehensive traces UI for the Rivet frontend, replacing the old events system with OpenTelemetry Protocol (OTLP) traces.

Overview

The PR successfully implements:

  • A new traces tab in the actor inspector showing hierarchical span visualization
  • A queue tab for viewing actor message queues
  • Backend support for trace queries and queue inspection via Inspector Protocol v4
  • Migration from the deprecated events system to OTLP traces

Strengths

1. Well-Structured Architecture

  • Clean separation between trace storage, wire protocol handling, and UI rendering
  • Proper use of React Query for data fetching with smart caching and refetch strategies
  • Good TypeScript patterns with discriminated unions for span/event items

2. Robust Trace Processing

  • Handles complex scenarios: span hydration across chunks, snapshot-based reconstruction, out-of-order events
  • Proper time range filtering with inclusive start/exclusive end boundaries
  • Intelligent handling of missing base records with fallback initialization

3. Excellent Test Coverage
The traces package has comprehensive tests (750+ lines) covering edge cases like backwards clock, chunk corruption, time boundaries, etc.

4. Version Management
Good implementation of protocol versioning with proper converters between v1-v4.

Issues and Recommendations

High Priority

1. Silent Error Swallowing (read-range.ts:381-391)
The code silently swallows CBOR decode errors which could hide data corruption issues. Add logging for decode failures to improve debuggability.

2. Performance: Repeated BigInt Conversions (actor-traces.tsx:471-480)
The sort comparator creates new BigInt objects 4 times per comparison. Convert once and reuse.

3. Undefined Behavior in Time Calculations (actor-traces.tsx:565-567)
Converting very large BigInt values to Number can lose precision or produce Infinity. Add bounds checking.

Medium Priority

4. Inconsistent Error Handling
Error messages don't distinguish between feature not supported vs actual errors.

5. Magic Numbers Without Context
Add comments for DEFAULT_LIMIT (1000), DEFAULT_QUEUE_LIMIT (200), GAP_THRESHOLD_MS (500).

6. Potential Race Condition
The refetch interval of 1000ms for live mode could queue up multiple requests if queries take longer than 1 second. Use refetchIntervalInBackground: false.

7. Memory Efficiency
Consider virtualizing the tree display for very large traces (up to 1000 spans).

Low Priority

8. Type Safety - Functions returning unknown could use more specific types
9. Accessibility - Add aria-label or aria-expanded attributes for screen readers
10. Code Duplication - Version checking logic is verbose

Security Considerations

✅ Good: CBOR decode wrapped in try-catch
✅ Good: Limits enforced on query results
✅ Good: Token properly passed via WebSocket
⚠️ Minor: No validation that actorId is well-formed before using in URLs

Test Coverage

Excellent for @rivetkit/traces package
Missing for:

  • Frontend React components
  • read-range.ts OTLP conversion logic
  • Inspector protocol message handlers

Recommend adding unit tests for buildSpanTree, renderItemsWithGaps, and otlpAnyValueToJs.

Code Style

✅ Follows CLAUDE.md guidelines
⚠️ Minor: Inconsistent import grouping in actor-traces.tsx:23

Summary

This is a solid implementation of a complex feature. The traces package is well-tested and robust. The main concerns are around potential performance issues with large traces and error handling transparency. The code quality is high overall.

Recommendation: Approve with minor revisions to address the high-priority items.


Review generated by Claude Code

@graphite-app
Copy link
Contributor

graphite-app bot commented Feb 4, 2026

Merge activity

  • Feb 4, 8:36 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Feb 4, 8:37 PM UTC: CI is running for this pull request on a draft pull request (#4114) due to your merge queue CI optimization settings.
  • Feb 4, 8:38 PM UTC: Merged by the Graphite merge queue via draft PR: #4114.

graphite-app bot pushed a commit that referenced this pull request Feb 4, 2026
@graphite-app graphite-app bot closed this Feb 4, 2026
@graphite-app graphite-app bot deleted the 01-24-feat_frontend_traces_ui branch February 4, 2026 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant