
Conversation

@NathanFlurry
Member

No description provided.

@railway-app

railway-app bot commented Jan 21, 2026

🚅 Deployed to the rivet-pr-3989 environment in rivet-frontend

Service             Status                        Web   Updated (UTC)
website             ❌ Build Failed (View Logs)   Web   Jan 30, 2026 at 8:21 am
frontend-inspector  ❌ Build Failed (View Logs)   Web   Jan 28, 2026 at 1:13 am
frontend-cloud      ❌ Build Failed (View Logs)   Web   Jan 28, 2026 at 1:13 am

Member Author

NathanFlurry commented Jan 21, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Jan 21, 2026

Code Review - Actor Messages & Queues Feature

This PR adds message queue functionality to RivetKit actors. Overall this is a solid implementation with good architectural decisions.

Strengths

  1. Well-structured implementation with clean separation between queue manager and user-facing API
  2. Comprehensive test coverage for edge cases (timeouts, aborts, size limits)
  3. Proper custom error types with helpful messages
  4. Good use of versioned schemas for persistence
  5. Smart batching of storage operations to reduce I/O
  6. Metadata corruption recovery mechanism

Critical Issues

Race Condition in Waiter Resolution (queue-manager.ts:300-325)

The loop in maybeResolveWaiters could allow multiple waiters to claim the same messages. Each waiter calls drainMessages, which loads all queue messages fresh from storage. Without tracking claimed messages within the resolution cycle, concurrent waiters matching the same queue names could receive duplicate messages.

Recommendation: Track which messages have been claimed during the current resolution cycle.
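
For example, a minimal sketch of claim tracking (the QueueMessage and Waiter shapes here are illustrative, not the actual types in queue-manager.ts):

interface QueueMessage {
    id: bigint;
    name: string;
    body: unknown;
}

interface Waiter {
    names: Set<string>;
    count: number;
    resolve: (messages: QueueMessage[]) => void;
}

// Hand each message to at most one waiter per resolution cycle.
function resolveWaiters(waiters: Waiter[], messages: QueueMessage[]): void {
    const claimed = new Set<bigint>();
    for (const waiter of waiters) {
        const matched: QueueMessage[] = [];
        for (const message of messages) {
            if (matched.length >= waiter.count) break;
            if (claimed.has(message.id)) continue; // already given to an earlier waiter
            if (!waiter.names.has(message.name)) continue;
            claimed.add(message.id);
            matched.push(message);
        }
        if (matched.length > 0) waiter.resolve(matched);
    }
}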

Memory Scaling (queue-manager.ts:246-279)

Every queue operation loads ALL messages into memory via kvListPrefix. For actors with hundreds or thousands of queued messages, this creates unnecessary memory pressure and latency.

Recommendation: Implement pagination or use the existing maxQueueSize as a hard limit on loading.
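
One possible shape, assuming the KV driver can be extended with a limit option (the real kvListPrefix signature may differ):

interface KvEntry {
    key: Uint8Array;
    value: Uint8Array;
}

// Hypothetical driver surface: a prefix scan that stops after `limit`
// entries instead of materializing every message in the queue.
interface KvDriver {
    kvListPrefix(prefix: Uint8Array, opts?: { limit?: number }): Promise<KvEntry[]>;
}

async function loadQueueMessagesBounded(
    kv: KvDriver,
    prefix: Uint8Array,
    maxQueueSize: number,
): Promise<KvEntry[]> {
    // Messages are keyed by monotonically increasing IDs, so a bounded
    // prefix scan still yields the oldest messages in FIFO order.
    return kv.kvListPrefix(prefix, { limit: maxQueueSize });
}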

Important Issues

Silent Decode Failures (queue-manager.ts:266-271)

When a message fails to decode, the failure is only logged and the undecodable entry is left in storage. These ghost messages occupy space in the queue but can never be delivered or removed. Over time this could exhaust the queue capacity.

Recommendation: Track decode failures and implement dead-letter handling or automatic cleanup.
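
A sketch of one dead-letter approach (the key layout and KV helpers here are assumptions, not the PR's actual API):

interface RawEntry {
    key: Uint8Array;
    value: Uint8Array;
}

interface Kv {
    kvPut(key: Uint8Array, value: Uint8Array): Promise<void>;
    kvDelete(key: Uint8Array): Promise<void>;
}

const DEAD_LETTER_PREFIX = new TextEncoder().encode("queue:dead:");

async function handleDecodeFailure(kv: Kv, entry: RawEntry, error: unknown): Promise<void> {
    console.warn("failed to decode queue message, moving to dead-letter", error);
    // Preserve the raw bytes for later inspection instead of leaving a
    // ghost message that counts against queue capacity forever.
    const deadKey = new Uint8Array(DEAD_LETTER_PREFIX.length + entry.key.length);
    deadKey.set(DEAD_LETTER_PREFIX);
    deadKey.set(entry.key, DEAD_LETTER_PREFIX.length);
    await kv.kvPut(deadKey, entry.value);
    await kv.kvDelete(entry.key);
}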

Unpersisted Size Corrections (queue-manager.ts:274-277)

When metadata size mismatches actual message count, the code corrects it in memory but doesn't persist the fix. This means every subsequent load will detect and re-correct the same discrepancy.

Recommendation: Persist corrected metadata when detecting mismatches.
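
Something along these lines, assuming a putMetadata helper (not shown in this diff):

interface QueueMetadata {
    size: number;
}

interface MetadataStore {
    putMetadata(metadata: QueueMetadata): Promise<void>;
}

async function correctSize(
    store: MetadataStore,
    metadata: QueueMetadata,
    actualCount: number,
): Promise<void> {
    if (metadata.size === actualCount) return;
    metadata.size = actualCount;
    // Write the corrected value back so the mismatch is detected at
    // most once instead of on every subsequent load.
    await store.putMetadata(metadata);
}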

Timeout Cleanup (queue-manager.ts:183-203)

The timeout handler removes the waiter and resolves the promise but doesn't clean up the abort signal listeners. If the actor aborts after timeout, the onStop listener will attempt to reject an already-resolved promise.

Recommendation: Remove event listeners in all exit paths (timeout, abort, success).
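
The usual pattern is a single cleanup function shared by every exit path; a simplified sketch (waitForMessage and the delivery path are illustrative):

// Once either exit path runs, cleanup() disarms the other, so the
// abort listener can never fire against an already-settled promise.
function waitForMessage(
    signal: AbortSignal,
    timeoutMs: number,
): Promise<string | undefined> {
    return new Promise((resolve, reject) => {
        const onAbort = () => {
            cleanup();
            reject(new Error("actor aborted"));
        };
        const timer = setTimeout(() => {
            cleanup();
            resolve(undefined); // timed out without a message
        }, timeoutMs);
        const cleanup = () => {
            clearTimeout(timer);
            signal.removeEventListener("abort", onAbort);
        };
        signal.addEventListener("abort", onAbort);
        // A successful delivery would likewise call cleanup() before
        // resolving with the message body.
    });
}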

Minor Issues

  1. Queue name validation missing in createQueueProxy - should validate format/length
  2. Variable naming could be clearer (limitedCount -> normalizedCount)
  3. Missing JSDoc on several public methods
  4. Complex ternary logic in queue.ts:48-51 could be flattened with early returns (see the sketch below)
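
On point 4, the actual logic at queue.ts:48-51 isn't reproduced here, but as an illustration of the early-return style with a hypothetical count normalizer:

// Hypothetical stand-in for a nested ternary like:
//   count === undefined ? 1 : count < 1 ? 1 : count > max ? max : count
function normalizeCount(count: number | undefined, max: number): number {
    if (count === undefined) return 1;
    if (count < 1) return 1;
    return Math.min(count, max);
}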

Security & Performance

Security: CBOR serializability is validated, size limits are enforced, and abort signals are handled correctly ✅

Performance:

  • Good use of BigInt IDs with proper byte ordering ✅
  • Efficient batch operations ✅
  • Concern: Multiple serialization passes for large messages (JSON → CBOR → Bare)
  • Concern: No caching of frequently accessed metadata

Test Coverage

Good coverage of basic scenarios but missing:

  • Concurrent waiter resolution tests
  • Message decode failure scenarios
  • Metadata corruption recovery verification
  • Performance tests with large queues

Recommendation

Approve with revisions - The race condition in maybeResolveWaiters should be fixed before merge as it could cause message loss or duplication in production. The memory scaling issue should also be addressed for actors with large queues.

The implementation follows good TypeScript patterns and the overall architecture is sound. Focus on the concurrency edge cases and resource scaling concerns.

@NathanFlurry NathanFlurry marked this pull request as ready for review January 22, 2026 05:36
@claude

claude bot commented Jan 30, 2026

PR Review: Actor Messages & Queues

Summary

This PR adds a comprehensive queue messaging system to RivetKit actors, allowing asynchronous message passing between actors and clients. The implementation includes queue management, persistence, configurable limits, and comprehensive test coverage.

Code Quality & Best Practices

✅ Strengths

  1. Well-structured implementation: The queue system is cleanly separated into distinct concerns:

    • QueueManager for internal queue operations
    • ActorQueue for the public API surface
    • QueueProxy for ergonomic client-side access
  2. Proper error handling: Custom error types follow the existing pattern:

    • QueueFull, QueueMessageTooLarge, QueueMessageInvalid, ActorAborted
    • All errors are properly marked as public: true for client visibility
    • Metadata is included where relevant
  3. Good use of TypeScript: Strong typing throughout with proper type overloads for the next() API

  4. Comprehensive test coverage: The test suite covers client-to-actor messaging, self-messaging, timeouts, abort signals, and size limits

  5. Proper resource cleanup: Timeout handles and abort listeners are properly cleaned up

⚠️ Potential Issues

1. Race Condition in #maybeResolveWaiters() (queue-manager.ts:300-325)

The current implementation loads messages for each waiter sequentially. If multiple waiters are waiting for the same queue name, each #drainMessages call loads the entire queue again via #loadQueueMessages(), which is inefficient.

Recommendation: Load messages once per cycle, then distribute them to waiters in priority order (FIFO based on registration time).
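
A sketch of the load-once shape (the Waiter fields and loadAll helper are illustrative stand-ins for the private members in queue-manager.ts):

interface QueueMessage {
    id: bigint;
    name: string;
    body: unknown;
}

interface Waiter {
    registeredAt: number; // used for FIFO fairness
    names: Set<string>;
    count: number;
    resolve: (messages: QueueMessage[]) => void;
}

async function resolveWaitersOnce(
    waiters: Waiter[],
    loadAll: () => Promise<QueueMessage[]>,
): Promise<void> {
    const pool = await loadAll(); // single KV scan per cycle
    const ordered = [...waiters].sort((a, b) => a.registeredAt - b.registeredAt);
    for (const waiter of ordered) {
        const matched: QueueMessage[] = [];
        for (let i = 0; i < pool.length && matched.length < waiter.count; i++) {
            if (waiter.names.has(pool[i].name)) {
                matched.push(...pool.splice(i, 1));
                i--; // account for the removed element
            }
        }
        if (matched.length > 0) waiter.resolve(matched);
    }
}

Distributing from a shared pool also guarantees no message is handed to two waiters.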

2. Inefficient Queue Loading Pattern

The #loadQueueMessages() method is called frequently and does a full KV prefix scan, deserializes all messages, and sorts them by ID each time. For actors with many queued messages, this could be expensive.

Recommendation: Consider caching loaded messages or implementing an in-memory index.
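
For example, a write-through cache where storage remains the source of truth (method names are illustrative):

interface QueueMessage {
    id: bigint;
    name: string;
    body: unknown;
}

class MessageCache {
    #messages: QueueMessage[] | undefined;

    // The first call pays for the prefix scan + decode + sort; later
    // calls return the in-memory copy.
    async get(loadAll: () => Promise<QueueMessage[]>): Promise<QueueMessage[]> {
        if (this.#messages === undefined) {
            const loaded = await loadAll();
            this.#messages = loaded.sort((a, b) => (a.id < b.id ? -1 : 1));
        }
        return this.#messages;
    }

    // Call after a successful KV write in enqueue.
    insert(message: QueueMessage): void {
        this.#messages?.push(message); // IDs are monotonic, so append keeps order
    }

    // Call after a successful KV delete in drain.
    remove(ids: Set<bigint>): void {
        if (this.#messages) {
            this.#messages = this.#messages.filter((m) => !ids.has(m.id));
        }
    }
}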

3. Metadata Consistency (queue-manager.ts:274-277)

Silent correction of size mismatches during read operations could mask underlying issues:

if (this.#metadata.size !== decoded.length) {
    this.#metadata.size = decoded.length;
    this.#actor.inspector.updateQueueSize(this.#metadata.size);
}

Recommendation: Log a warning when this correction occurs.
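
Roughly (extracted into a standalone function for illustration; console.warn stands in for whatever structured logger the actor exposes):

function correctQueueSize(
    metadata: { size: number },
    decodedCount: number,
    updateInspector: (size: number) => void,
): void {
    if (metadata.size !== decodedCount) {
        // Surface the inconsistency instead of silently papering over it.
        console.warn("queue metadata size mismatch, correcting", {
            recorded: metadata.size,
            actual: decodedCount,
        });
        metadata.size = decodedCount;
        updateInspector(metadata.size);
    }
}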

4. Missing Validation in enqueue()

The method validates queue size limits and message size limits, but doesn't validate the name parameter (empty strings, length limits, special characters).

Recommendation: Add basic validation for queue names.
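
For example (the length limit and character set are illustrative; real code would presumably throw the PR's typed errors rather than plain Error):

const MAX_QUEUE_NAME_LENGTH = 128; // assumed limit, not from the PR

function validateQueueName(name: string): void {
    if (name.length === 0) {
        throw new Error("queue name must not be empty");
    }
    if (name.length > MAX_QUEUE_NAME_LENGTH) {
        throw new Error(`queue name exceeds ${MAX_QUEUE_NAME_LENGTH} characters`);
    }
    if (!/^[A-Za-z0-9_.-]+$/.test(name)) {
        throw new Error("queue name contains invalid characters");
    }
}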

5. Type Annotation Inconsistency (fixtures/queue.ts)

Explicit type annotations on map callbacks are unnecessary:

return (messages ?? []).map(
    (message: { name: string; body: unknown }) => ({
        name: message.name,
        body: message.body,
    }),
);

Recommendation: Remove explicit type annotation and rely on type inference.
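
That is, assuming messages is already typed where it is produced:

return (messages ?? []).map((message) => ({
    name: message.name,
    body: message.body,
}));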

Performance Considerations

  1. Queue Message Loading: Loading all messages into memory on every operation could be expensive for large queues. Consider pagination or max queue size defaults.

  2. Batched Operations: Good use of kvBatchPut and kvBatchDelete, though the split at lines 291-295 (a separate delete batch followed by a put batch) could be combined at the driver level.

  3. BigInt for Message IDs: Good for preventing overflow, acceptable overhead.

Security Concerns

✅ Good

  • Configurable limits prevent DoS via queue flooding (maxQueueSize, maxQueueMessageSize)
  • isCborSerializable check prevents injection of non-serializable types

⚠️ Concerns

  • Queue names aren't validated - could be exploited if used in logging/error messages
  • No limit on concurrent waiters per actor - potential resource exhaustion
  • Long-lived waiters with indefinite timeouts consume memory

Recommendation: Add a limit on concurrent waiters per actor.
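
A minimal sketch of such a cap (the limit and registry shape are illustrative):

const MAX_WAITERS_PER_ACTOR = 128; // illustrative default, not from the PR

interface Waiter {
    resolve: (messages: unknown[]) => void;
}

class WaiterRegistry {
    #waiters: Waiter[] = [];

    register(waiter: Waiter): void {
        if (this.#waiters.length >= MAX_WAITERS_PER_ACTOR) {
            // Fail fast instead of letting waiters accumulate unbounded.
            throw new Error(`too many concurrent queue waiters (max ${MAX_WAITERS_PER_ACTOR})`);
        }
        this.#waiters.push(waiter);
    }

    remove(waiter: Waiter): void {
        const i = this.#waiters.indexOf(waiter);
        if (i !== -1) this.#waiters.splice(i, 1);
    }
}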

Test Coverage

✅ Good Coverage

  • Basic send/receive, multi-queue patterns, timeout/abort handling, size limits, self-messaging

🔍 Missing Test Cases

  1. Concurrent operations (multiple clients sending simultaneously)
  2. Waiter fairness (order of message distribution)
  3. Metadata corruption recovery (#rebuildMetadata() path)
  4. Edge cases: empty queue name, very long names, operations during shutdown
  5. Performance tests for large queues (100+ messages)

Minor Issues

  1. Magic Numbers: queue-manager.ts:163 - Math.max(1, count) - the 1 minimum should be a named constant

  2. Inconsistent Return Types: The next() overloads return different types (single string → object, array → array), which could be confusing

  3. Documentation: Add JSDoc comments for timeout behavior edge cases

Recommendations Summary

High Priority

  1. Fix race condition in #maybeResolveWaiters() - load messages once per cycle
  2. Add queue name validation
  3. Add logging when metadata size corrections occur
  4. Add tests for concurrent operations and edge cases

Medium Priority

  1. Performance optimizations for large queues
  2. Add limit on concurrent waiters per actor
  3. Document timeout behavior more clearly
  4. Add performance/stress tests

Low Priority

  1. Extract magic numbers to named constants
  2. Add inline comments for complex logic
  3. Improve waiter distribution algorithm for fairness

Conclusion

This is a solid implementation with good separation of concerns, proper error handling, and decent test coverage. The main concerns are around performance for large queues and potential race conditions in waiter resolution.

The code follows RivetKit patterns well and integrates cleanly with the existing actor system. With the recommended fixes, this will be a valuable addition to the framework.

Recommendation: Request changes for the race condition and validation issues, then approve after fixes.
