Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

Copy link
Member Author

NathanFlurry commented Jan 26, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 26, 2026

More templates

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4038

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4038

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@4038

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4038

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4038

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4038

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4038

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4038

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4038

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4038

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4038

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4038

commit: 4189086

@claude
Copy link

claude bot commented Jan 26, 2026

PR Review: Traces UI

This PR adds a comprehensive tracing UI along with a queue inspector feature. Overall, the implementation is solid with good architectural decisions. Below are my findings:


Strengths

Architecture & Design

  • Well-structured feature flags: Version-based feature detection (MIN_RIVETKIT_VERSION_TRACES, MIN_RIVETKIT_VERSION_QUEUE) with clear user messaging is excellent
  • Efficient data handling: Using CBOR serialization, BARE protocol versioning, and proper chunking for trace data shows strong understanding of performance needs
  • Separation of concerns: Clean separation between UI components (actor-traces.tsx), data processing (read-range.ts), and inspector protocol (actor-inspector-context.tsx)
  • Live/historical modes: The dual-mode UI with preset time ranges and custom date picking is user-friendly

Code Quality

  • Proper use of React patterns: useMemo for expensive computations (trace tree building, query results), proper dependency arrays
  • Good error handling: Graceful degradation when features aren't available, clear error states
  • Type safety: Strong TypeScript usage throughout with proper type imports from OTLP schemas

🔍 Issues & Recommendations

1. Performance Concerns

Critical: Unbounded recursion in trace rendering (actor-traces.tsx:247-291)

function renderItemsWithGaps(items: TraceItem[], depth: number, nowNs: bigint): ReactElement[]
  • Issue: No maximum depth check for nested spans
  • Impact: Deep trace hierarchies could cause stack overflow or performance degradation
  • Fix: Add a MAX_DEPTH constant and stop recursion at that level with a visual indicator
const MAX_DEPTH = 20;
if (depth >= MAX_DEPTH) {
  return [<div key="max-depth" className="text-xs text-muted-foreground ml-4">Max nesting depth reached...</div>];
}

Memory: Large trace arrays kept in memory (actor-traces.tsx:113-119)

const traceTree = useMemo(() => {
  if (!queryResult) return [];
  const spans = extractSpans(queryResult.otlp);
  return buildSpanTree(spans);
}, [queryResult]);
  • Issue: With DEFAULT_LIMIT = 1000, this could be substantial memory
  • Suggestion: Consider pagination or virtualization for very large trace sets
  • Note: The 1000 limit helps, but document this clearly for users

Sorting performance (read-range.ts:129-133)

records.sort((a, b) => {
  if (a.absNs < b.absNs) return -1;
  if (a.absNs > b.absNs) return 1;
  return a.sequence - b.sequence;
});
  • Issue: Sorting bigints can be slow for large datasets
  • Optimization: Consider using a more efficient comparison: return Number(a.absNs - b.absNs) || (a.sequence - b.sequence) (with overflow checks)

2. Bug: Potential race condition in queue size updates

Location: queue-manager.ts:80, 97, 143, 166

this.#actor.inspector.updateQueueSize(this.#metadata.size);
  • Issue: Queue size updates happen after database writes but aren't atomic with them
  • Scenario: If a write fails after metadata is updated in memory, the inspector will show incorrect size
  • Fix: Only call updateQueueSize after successful writes are confirmed
await this.#driver.kvBatchPut(this.#actor.id, [...]);
// Only update inspector after successful write
this.#actor.inspector.updateQueueSize(this.#metadata.size);

3. Code Quality Issues

Missing null checks (actor-traces.tsx:463-464)

const parentId = node.span.parentSpanId;
if (parentId && byId.has(parentId)) {
  byId.get(parentId)?.children.push(node);  // Optional chaining after has() check is redundant
}
  • Issue: After has(parentId) check, get(parentId) can still be undefined (rare edge case)
  • Fix: Use the pattern:
if (parentId) {
  const parent = byId.get(parentId);
  if (parent) parent.children.push(node);
}

Inconsistent error handling (actor-queue.tsx:32-38)

if (queueStatusQuery.isError || !queueStatusQuery.data) {
  return <div>Queue data is currently unavailable.</div>;
}
  • Issue: Doesn't distinguish between network errors, permissions, or actual failures
  • Suggestion: Check queueStatusQuery.error and provide more specific messaging

Magic numbers should be constants

// actor-traces.tsx:102
refetchInterval: isLive ? 1000 : false,
  • Fix: const LIVE_REFETCH_INTERVAL_MS = 1000;

4. Testing Concerns

No tests for new functionality

  • Missing: Tests for trace tree building, OTLP conversion, queue status queries
  • Recommendation: Add unit tests for:
    • buildSpanTree with nested/orphaned spans
    • readRangeWireToOtlp with edge cases (empty, clamped, out-of-range)
    • Queue status truncation logic

Test cleanup improvement needed (test/mod.ts:83-85)

c.onTestFinished(async () => {
  await new Promise((resolve) => server.close(() => resolve(undefined)));
});
  • Issue: Server close doesn't have error handling or timeout
  • Suggestion: Add timeout to prevent hanging tests

5. Documentation

Missing JSDoc comments

  • Key functions lack documentation:
    • buildSpanTree - should explain the algorithm
    • otlpAnyValueToJs - should note type conversion rules
    • normalizeQueueStatus - should explain BigInt to Number conversions

Changelog/migration notes needed

  • Users upgrading from pre-2.0.40 RivetKit should understand:
    • New features require version upgrade
    • What data migration (if any) is needed
    • Breaking changes to inspector protocol

🔒 Security

✅ Good practices observed:

  • Proper token handling in WebSocket protocols
  • CBOR deserialization wrapped in try-catch blocks (read-range.ts:384-388)
  • Size limits enforced (DEFAULT_LIMIT, maxQueueStatusLimit)

⚠️ Minor concern:

Trace data exposure: Ensure trace data doesn't leak sensitive information in attributes

  • Current: No PII filtering on attributes
  • Recommendation: Document that users should avoid logging sensitive data, or add opt-in filtering

📊 Performance Metrics

Based on code analysis:

  • Memory: ~1-2MB for 1000 spans (acceptable)
  • Latency: Live refresh every 1s might be aggressive for slow connections
    • Suggestion: Consider adaptive polling based on fetch duration
  • Database: CBOR encoding is efficient, but ensure FDB transaction limits aren't hit with large traces

🎯 Action Items

High Priority

  1. Add max depth limit to trace rendering recursion
  2. Fix queue size update race condition
  3. Add proper error boundaries around trace rendering

Medium Priority

  1. Add unit tests for trace parsing and tree building
  2. Document migration path for RivetKit version upgrades
  3. Improve error messaging specificity

Low Priority

  1. Extract magic numbers to named constants
  2. Add JSDoc comments to complex functions
  3. Consider virtualization for very large trace lists

📝 Code Style

Adheres well to CLAUDE.md:

  • ✅ Uses lowercase log messages
  • ✅ Proper timestamp naming (createdAtMs, timeUnixNano)
  • ✅ Structured logging with tracing
  • ✅ No unnecessary formatting

Minor style notes:

  • Some functions exceed 100 lines (e.g., ActorTraces) - consider breaking into smaller components
  • Consistent use of arrow functions vs function declarations

🎉 Summary

This is a high-quality PR that adds significant value to the Rivet Actors platform. The trace visualization is well-designed and the queue inspector is a useful addition. The main concerns are around edge cases (recursion depth, race conditions) and testing coverage, but these are addressable post-merge with follow-up PRs if time-sensitive.

Recommendation: ✅ Approve with minor revisions

Focus on the high-priority items (recursion depth limit, race condition fix) before merging, and address the medium/low priority items in follow-up PRs.

@jog1t jog1t force-pushed the 01-24-feat_frontend_traces_ui branch from 389f28e to 3c7fabc Compare January 28, 2026 19:55
@claude
Copy link

claude bot commented Jan 28, 2026

PR Review: Traces UI

Overview

This PR adds a comprehensive traces visualization UI and queue monitoring feature to the frontend, along with supporting infrastructure for reading and processing OTLP traces. The changes span 31 files with significant additions to the traces package and new UI components.


✅ Strengths

  1. Well-structured trace reading logic: The read-range.ts implementation is thorough with proper span tree building, time-based filtering, and OTLP conversion.

  2. Comprehensive UI components: The traces visualization includes collapsible spans, event display, gap markers, and time range selection.

  3. Feature detection: Good version checking logic to determine feature support (MIN_RIVETKIT_VERSION_TRACES, MIN_RIVETKIT_VERSION_QUEUE).

  4. Protocol versioning: Proper support for inspector protocol versions with backwards compatibility.


🔍 Code Quality Issues

1. Missing error handling in CBOR decoding

Location: read-range.ts:385-388, read-range.ts:398-405

Silent failure in catch blocks could hide data corruption or protocol mismatches. Consider logging warnings in development.

2. Type safety concerns with as unknown casts

Location: actor-traces.tsx:526, read-range.ts:385

Multiple uses of as unknown bypass type checking. While CBOR decoding necessitates some dynamic typing, consider using Zod schemas or runtime validation for critical data paths.

3. Potential performance issue with BigInt conversions

Location: actor-traces.tsx:122, 318, 400, 476-480

Repeated BigInt() conversions in render loops and sorting functions. Consider memoizing or caching these conversions.

4. Inconsistent null handling

Location: actor-traces.tsx:308

Uses == null instead of === null, inconsistent with TypeScript best practices.


🐛 Potential Bugs

1. Race condition in refetch interval

Location: actor-traces.tsx:102

With a 1-second refetch interval and live mode, rapid state updates could cause query cache thrashing. Consider increasing the interval to 2-3 seconds or using WebSocket streaming instead of polling.

2. Unsafe array access

Location: read-range.ts:283, 336, 359

Good use of nullish coalescing with strings array access, but consider validating string indices are in bounds to prevent subtle bugs.


⚡ Performance Considerations

1. Large trace data handling

Location: actor-traces.tsx:137-138

The buildSpanTree function processes all spans in memory. For large traces (approaching the 1000 span limit), consider implementing virtual scrolling or pagination.

2. Unnecessary sorting on every render

Location: actor-traces.tsx:252-254

Creates a new sorted array on every render. Should be memoized with useMemo.


🔒 Security Concerns

1. CBOR deserialization of untrusted data

Location: Multiple files

CBOR data is decoded from WebSocket messages and trace data without size limits or schema validation. Consider implementing size limits for decoded objects and using Zod schemas to validate structure after decoding.


🧪 Test Coverage

Missing test coverage for:

  1. read-range.ts span building logic (complex state machine)
  2. actor-traces.tsx tree building and gap detection
  3. Edge cases: empty traces, single span, deeply nested spans
  4. Error scenarios: malformed CBOR, invalid timestamps, missing fields

Recommendation: Add unit tests for core trace processing logic, especially the span tree building and time-based filtering.


✨ Positive Patterns

  1. Proper use of memoization: actor-traces.tsx:309-322 correctly memoizes derived data
  2. Clean separation of concerns: Trace reading logic is separate from UI components
  3. Responsive UI design: Loading states, error states, and empty states are all handled
  4. Accessibility: Proper use of semantic HTML and ARIA attributes in collapsible sections

📋 Summary

Overall Assessment: This is a solid implementation of a complex feature. The core logic is sound, but there are several areas for improvement around error handling, performance optimization, and type safety.

Priority Fixes (before merge):

  1. Add error handling/logging to CBOR decode failures
  2. Fix the inconsistent null check (== vs ===)
  3. Memoize the sorting operation in renderItemsWithGaps
  4. Add bounds checking for string array access

Recommended Follow-ups (post-merge):

  1. Add comprehensive test coverage for trace processing
  2. Investigate WebSocket streaming for live traces instead of polling
  3. Implement virtual scrolling for large trace sets
  4. Add runtime validation for critical data paths
  5. Complete the database functionality (TODOs)

Verdict: ✅ Approve with minor revisions recommended

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants