feat(rivetkit): workflows #3734

NathanFlurry · 2026-01-02T20:03:50Z

No description provided.

vercel · 2026-01-02T20:03:54Z

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments

Project	Deployment	Review	Updated (UTC)
rivet-cloud	Ignored		Jan 2, 2026 8:07pm
rivet-inspector	Ignored	Preview	Jan 2, 2026 8:07pm
rivet-site	Ignored	Preview	Jan 2, 2026 8:07pm
rivetkit-serverless	Skipped		Jan 2, 2026 8:07pm

NathanFlurry · 2026-01-02T20:04:10Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

feat(frontend): traces ui #4038 : 2 dependent PRs (#4032 , #4051 )
feat(rivetkit): traces #4037
feat(rivetkit): integrate workflows in to actors #4006
feat(rivetkit): workflows #3734 👈 (View in Graphite)
feat(rivetkit): add actor run handler #4005 : 1 other dependent PR (#4008 )
feat: actor messages & queues #3989 : 1 other dependent PR (#4002 )
main

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-01-22T17:49:47Z

PR Review: Workflow Engine for RivetKit

Overview

This PR introduces a comprehensive durable workflow engine for RivetKit TypeScript. This is a major addition (~9,257 lines) that implements a reentrant execution framework for building fault-tolerant, long-running workflows.

Summary Assessment

Overall: Strong implementation with production-ready patterns

This is high-quality code that demonstrates deep understanding of durable execution semantics. The implementation includes sophisticated features like:

Deterministic replay with history divergence detection
Automatic retry with exponential backoff
Rollback/compensation handlers
Race and join primitives for parallel execution
Message passing for external coordination
Loop state management with history trimming

Code Quality Analysis

Strengths ✅

Excellent Error Handling Strategy
- Custom error hierarchy is well-designed (CriticalError, RollbackError, EvictedError, etc.)
- Clear distinction between retriable vs non-retriable errors
- Proper error propagation through the execution stack
Strong Type Safety
- Comprehensive TypeScript types with good use of discriminated unions
- Type-safe branch outputs in join/race operations
- Well-defined interfaces for extensibility
Deterministic Execution
- History-based replay prevents non-deterministic behavior
- Name registry pattern ensures consistent location keys
- Input persistence for deterministic replays (lines 690-703 in index.ts)
Memory Management
- Loop history trimming to prevent unbounded growth (context.ts:718-763)
- Lazy loading of entry metadata (storage.ts:165-189)
- Dirty tracking for efficient flushes
Testing Coverage
- Tests cover both "yield" and "live" modes comprehensively
- Edge cases like retry exhaustion, timeouts, and eviction are tested
- Good use of test fixtures (CountingDriver)
Documentation
- Excellent inline comments explaining non-obvious behavior
- Comprehensive QUICKSTART.md and architecture.md
- Clear JSDoc annotations

Issues & Concerns 🔍

1. Critical: Race Condition in Message Consumption (High Priority)

Location: storage.ts:307-328 (consumeMessage)

export async function consumeMessage(
	storage: Storage,
	driver: EngineDriver,
	messageName: string,
): Promise<Message | null> {
	const index = storage.messages.findIndex(
		(message) => message.name === messageName,
	);
	if (index === -1) {
		return null;
	}

	const message = storage.messages[index];

	// Delete from driver first - if this fails, memory is unchanged
	await driver.delete(buildMessageKey(message.id));

	// Only remove from memory after successful driver deletion
	storage.messages.splice(index, 1);

	return message;
}

Problem: If the workflow crashes between deleting from KV and removing from memory, the message will be lost on next load (since it's deleted from KV but storage.messages still has it in memory).

Impact: Message loss in crash scenarios.

Recommendation: Consider a two-phase approach:

Mark message as "consumed" in KV first
Remove from memory
Delete from KV in background or during flush

Alternatively, accept this as "at-most-once" delivery and document it clearly.

2. Performance: Excessive Metadata Loading (Medium Priority)

Location: context.ts:355-359

Every retry attempt loads metadata from driver. If metadata is already loaded in storage.entryMetadata, this becomes redundant network calls. The loadMetadata function does cache, but consider loading all entry metadata upfront during loadStorage() for frequently-retried workflows.

3. Timeout Behavior May Be Surprising (Medium Priority)

Location: context.ts:464-480

The comment correctly notes that timeouts don't cancel underlying operations. This can lead to resource leaks if step operations don't respect ctx.abortSignal. Consider:

Documenting this more prominently in user-facing docs
Providing a helper to wrap operations with proper cancellation
Warning in logs when timeout occurs

4. Missing Input Validation (Low Priority)

Several functions lack input validation:

loop(): commitInterval must be > 0 (note in code but not enforced at context.ts:630)
listenN(): limit should be > 0
step(): timeout values could be validated

5. Inconsistent Error Messages (Low Priority)

Some errors are user-facing while others are internal. Consider standardizing error messages with actionable guidance.

Security Considerations 🔒

No Sensitive Data Leakage: ✅ Good - error metadata filtering (index.ts:833-849) only includes serializable primitives
Input Validation: ⚠️ Missing validation could allow malformed data into KV
Resource Limits: ⚠️ No built-in limits on:
- Number of loop iterations (could grow unbounded)
- Message queue size
- History entry count before trimming
Consider adding configurable limits to prevent abuse.

Performance Considerations ⚡

Batch Operations: ✅ Good use of batching in flush() (storage.ts:194-278)
Ephemeral Steps: ✅ Excellent optimization (context.ts:417-424)
Name Registry: ✅ Efficient deduplication
Concerns:
- list() operations load all entries into memory (storage.ts:122-128)
- For workflows with thousands of entries, this could be expensive
- Consider pagination or streaming for large histories

Test Coverage Analysis 🧪

Strengths:

Both "yield" and "live" modes tested
Good coverage of edge cases (timeouts, retries, eviction)
Parameterized tests reduce duplication

Gaps:

No tests for concurrent workflow execution
Limited testing of rollback scenarios
No chaos/fault injection tests (simulate KV failures)
Missing tests for history divergence recovery

Recommendation: Add tests for KV failures, partial batch writes, concurrent messages, and loop iteration limits.

Architecture Review 🏗️

Isolation Model

The driver design assumes external isolation (Durable Objects, actors). This is well-documented and appropriate.

Consideration: What happens if isolation is violated? (e.g., two processes run same workflow ID)

Could lead to history corruption
Consider adding optimistic locking or version checks

Serialization Strategy

Uses BARE schema for binary serialization. Good choice for compact storage, schema evolution, and type safety.

Note: Ensure BARE schema changes are backward compatible or migrations are handled.

Recommendations Summary

Must Fix (Before Merge)

❗ Document message delivery semantics - Is it at-most-once? exactly-once? Clarify in docs and potentially in error handling.

Should Fix (High Priority)

Add resource limits (configurable max iterations, message queue size)
Improve error messages with actionable guidance
Add tests for concurrent scenarios and KV failures

Nice to Have (Medium Priority)

Optimize metadata loading for retry-heavy workflows
Add warnings when timeouts occur without cancellation support
Add integration guide for implementing custom drivers
Consider adding metrics/observability hooks

Documentation Improvements

Add example of implementing a custom driver
Document recovery scenarios (what happens if KV is corrupted?)
Add migration guide for workflow code changes
Document performance characteristics and scaling limits

Conclusion

This is excellent work and demonstrates a mature understanding of durable execution patterns. The code quality is high, error handling is sophisticated, and the API is well-designed.

Recommendation: APPROVE with minor fixes

The identified issues are mostly documentation gaps and potential edge cases. The core implementation is solid and ready for production use with proper driver implementations.

Key Action Items:

Clarify message delivery semantics (documentation)
Add resource limits to prevent unbounded growth
Expand test coverage for failure scenarios
Document driver implementation guide

Great job on this implementation! 🎉

Reviewed using Rivet CLAUDE.md guidelines and TypeScript best practices.

claude · 2026-01-28T19:59:31Z

Pull Request Review: Workflow Engine for RivetKit

This PR introduces a comprehensive durable workflow engine to RivetKit. The implementation is well-architected with strong TypeScript types, comprehensive test coverage, and excellent documentation.

Strengths

Architecture & Design

Clean separation of concerns with isolated KV namespaces per workflow instance
Well-designed driver abstraction that allows multiple backend implementations
Deterministic replay mechanism through history tracking
Strong error handling with specialized error types for different failure modes

Code Quality

Comprehensive TypeScript typing throughout
Clear separation between public API and internal implementation
Good use of comments explaining complex logic
Consistent code style and patterns

Test Coverage

Extensive test suite covering multiple scenarios
Tests run in both yield and live modes
Good edge case coverage

Documentation

Excellent QUICKSTART.md with clear examples
Detailed architecture.md explaining design decisions
Multiple focused docs on specific features

Issues & Recommendations

High Priority

No Upper Bound Validation for Loop Config (context.ts:630) - Add validation for commitInterval and historyKeep parameters
Timeout Does Not Cancel Underlying Operations (context.ts:464-480) - Document this limitation prominently in public API docs
Missing Input Validation - Add validation for sleep duration, listenN limit, and step timeout parameters

Medium Priority

Console Warning in Production Code (context.ts:1747-1750) - Consider adding optional logger to driver interface
Error Serialization Limited (index.ts:815-862) - Consider supporting arrays and nested objects for richer error context
Potential Performance Issue in validateComplete (context.ts:245-267) - Consider optimizing for workflows with many entries

Low Priority

Magic Numbers - Use named constants for hardcoded values like retry delays

Security & Performance

Uses crypto.randomUUID() for secure ID generation
Batch writes for efficiency
Lazy metadata loading
Loop history pruning prevents unbounded growth

Final Verdict

This is high-quality code with excellent architecture. The implementation is production-ready with minor improvements recommended.

Recommendation: APPROVE

Should Fix Soon:

Add validation for loop config parameters
Add input validation for negative durations/limits
Consider replacing console.warn with configurable logging

Estimated Effort: ~4-6 hours for all improvements

Great work! The workflow engine is a solid foundation for building reliable distributed systems.

vercel bot had a problem deploying to Preview – rivetkit-serverless January 2, 2026 20:04 Failure

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from a1796cf to 4ca7516 Compare January 2, 2026 20:07

vercel bot temporarily deployed to Preview – rivetkit-serverless January 2, 2026 20:07 Inactive

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from 4ca7516 to cff619f Compare January 15, 2026 18:21

NathanFlurry marked this pull request as ready for review January 15, 2026 18:21

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from cff619f to 3700d45 Compare January 15, 2026 18:54

NathanFlurry changed the base branch from main to graphite-base/3734 January 22, 2026 04:11

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from 3700d45 to fbdbcd3 Compare January 22, 2026 04:11

NathanFlurry changed the base branch from graphite-base/3734 to actor-messages January 22, 2026 04:11

NathanFlurry mentioned this pull request Jan 22, 2026

feat: actor messages & queues #3989

Open

NathanFlurry changed the base branch from actor-messages to graphite-base/3734 January 22, 2026 05:36

This was referenced Jan 22, 2026

feat(rivetkit): add run #4002

Open

feat(rivetkit): c.keepAwake #4003

Open

feat(rivetkit): add clearAlarm #4004

Draft

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from fbdbcd3 to 246d677 Compare January 22, 2026 06:31

NathanFlurry changed the base branch from graphite-base/3734 to 01-21-feat_rivetkit_add_actor_run_handler January 22, 2026 06:31

This was referenced Jan 22, 2026

feat(rivetkit): add actor run handler #4005

Open

feat(rivetkit): integrate workflows in to actors #4006

Open

NathanFlurry changed the base branch from 01-21-feat_rivetkit_add_actor_run_handler to graphite-base/3734 January 22, 2026 07:14

This was referenced Jan 22, 2026

feat(examples): effect #4008

Open

feat(rivetkit/effect): effect support #4009

Open

NathanFlurry force-pushed the graphite-base/3734 branch from 70b8762 to 2fe5b7d Compare January 22, 2026 17:45

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from 246d677 to a3fbdd7 Compare January 22, 2026 17:45

NathanFlurry changed the base branch from graphite-base/3734 to 01-21-feat_rivetkit_add_actor_run_handler January 22, 2026 17:45

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from a3fbdd7 to b4b061f Compare January 24, 2026 23:35

NathanFlurry force-pushed the 01-21-feat_rivetkit_add_actor_run_handler branch from 2fe5b7d to 329640c Compare January 24, 2026 23:35

NathanFlurry mentioned this pull request Jan 24, 2026

workflow example #4032

Open

This was referenced Jan 24, 2026

feat(rivetkit): add icon & name customization #4033

Open

feat(rivetkit): traces #4037

Open

feat(frontend): traces ui #4038

Open

jog1t mentioned this pull request Jan 27, 2026

refactor(rivetkit): make traces server only lib #4051

Open

NathanFlurry force-pushed the 01-21-feat_rivetkit_add_actor_run_handler branch from 329640c to 4e8ca02 Compare January 28, 2026 01:12

NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from b4b061f to 8a67460 Compare January 28, 2026 01:12

feat(rivetkit): workflows

a5391dd

jog1t force-pushed the 01-21-feat_rivetkit_add_actor_run_handler branch from 4e8ca02 to f1e4e30 Compare January 28, 2026 19:55

jog1t force-pushed the 01-02-feat_rivetkit_workflows branch from 8a67460 to a5391dd Compare January 28, 2026 19:55

jog1t mentioned this pull request Jan 28, 2026

feat(dashboard): traces timeline ui #4056

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rivetkit): workflows #3734

feat(rivetkit): workflows #3734

Uh oh!

NathanFlurry commented Jan 2, 2026

Uh oh!

vercel bot commented Jan 2, 2026 •

edited

Loading

Uh oh!

NathanFlurry commented Jan 2, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 22, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(rivetkit): workflows #3734

Are you sure you want to change the base?

feat(rivetkit): workflows #3734

Uh oh!

Conversation

NathanFlurry commented Jan 2, 2026

Uh oh!

vercel bot commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanFlurry commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

claude bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Workflow Engine for RivetKit

Overview

Summary Assessment

Code Quality Analysis

Strengths ✅

Issues & Concerns 🔍

1. Critical: Race Condition in Message Consumption (High Priority)

2. Performance: Excessive Metadata Loading (Medium Priority)

3. Timeout Behavior May Be Surprising (Medium Priority)

4. Missing Input Validation (Low Priority)

5. Inconsistent Error Messages (Low Priority)

Security Considerations 🔒

Performance Considerations ⚡

Test Coverage Analysis 🧪

Architecture Review 🏗️

Isolation Model

Serialization Strategy

Recommendations Summary

Must Fix (Before Merge)

Should Fix (High Priority)

Nice to Have (Medium Priority)

Documentation Improvements

Conclusion

Key Action Items:

Uh oh!

claude bot commented Jan 28, 2026

Pull Request Review: Workflow Engine for RivetKit

Strengths

Issues & Recommendations

High Priority

Medium Priority

Low Priority

Security & Performance

Final Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Jan 2, 2026 •

edited

Loading

NathanFlurry commented Jan 2, 2026 •

edited

Loading

claude bot commented Jan 22, 2026 •

edited

Loading