feat(rivetkit): traces #4037
base: 01-21-feat_rivetkit_integrate_workflows_in_to_actors
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
How to use the Graphite Merge Queue: Add the label merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite.
More templates
@rivetkit/cloudflare-workers
@rivetkit/db
@rivetkit/framework-base
@rivetkit/next-js
@rivetkit/react
rivetkit
@rivetkit/sql-loader
@rivetkit/traces
@rivetkit/workflow-engine
@rivetkit/virtual-websocket
@rivetkit/engine-runner
@rivetkit/engine-runner-protocol
import { AsyncLocalStorage } from "node:async_hooks";
import { Buffer } from "node:buffer";
import { randomBytes } from "node:crypto";
import { performance } from "node:perf_hooks";
import { decode as decodeCbor, encode as encodeCbor } from "cbor-x";
import { pack, unpack } from "fdb-tuple";
import {
	CHUNK_VERSIONED,
	CURRENT_VERSION,
	READ_RANGE_VERSIONED,
	encodeRecord,
	type ActiveSpanRef,
	type Attributes,
	type Chunk,
	type KeyValue,
	type ReadRangeWire,
	type Record as TraceRecord,
	type RecordBody,
	type SpanEnd,
	type SpanEvent,
	type SpanId,
	type SpanLink,
	type SpanRecordKey,
	type SpanSnapshot,
	type SpanStart,
	type SpanStatus,
	SpanStatusCode,
	type SpanUpdate,
	type StringId,
	type TraceId,
} from "../schemas/versioned.js";
import {
	anyValueFromJs,
	hexFromBytes,
	type OtlpExportTraceServiceRequestJson,
	type OtlpKeyValue,
	type OtlpResource,
	type OtlpSpan,
	type OtlpSpanEvent,
	type OtlpSpanLink,
	type OtlpSpanStatus,
} from "./otlp.js";
import type {
	EndSpanOptions,
	EventOptions,
	ReadRangeOptions,
	ReadRangeResult,
	SpanHandle,
	SpanStatusInput,
	StartSpanOptions,
	Traces,
	TracesDriver,
	TracesOptions,
	UpdateSpanOptions,
} from "./types.js";
The imports need to be sorted alphabetically. Biome linter expects imports to be sorted in a specific order. Run 'biome check --write .' to automatically fix this issue.
Spotted by Graphite Agent (based on CI logs)
846c70c to 7504a0b
2bdce1c to 9524546
9524546 to e9cb986
PR Review: feat(rivetkit): traces

This PR introduces a new traces storage and API package (@rivetkit/traces) for RivetKit, implementing an efficient local tracing system with OTLP v1 JSON export capabilities. Overall, this is a well-architected implementation with comprehensive test coverage.

🎯 Architecture & Design

Strengths

Concerns

🐛 Potential Bugs

Critical

1. Race condition in write chain (traces.ts:559-567)

function enqueueWrite(pending: PendingChunk): void {
	writeChain = writeChain.then(async () => {
		await driver.set(pending.key, pending.bytes);
		const index = pendingChunks.indexOf(pending);
		if (index !== -1) {
			pendingChunks.splice(index, 1);
		}
	});
}

If

2. Time anchor drift (traces.ts:205-208, 220-225)

3. Unsafe bigint to number conversion (traces.ts:1147-1153)

function toNumber(value: bigint): number {
	const asNumber = Number(value);
	if (!Number.isSafeInteger(asNumber)) {
		throw new Error("Value exceeds safe integer range");
	}
	return asNumber;
}

This is used for

Medium Priority

4. Missing flush error handling (traces.ts:777-784)

async function flush(): Promise<boolean> {
	const didFlush = flushChunk();
	if (didFlush) {
		resetChunkState(currentChunk.bucketStartSec);
	}
	await writeChain; // No error handling
	return didFlush;
}

If a driver.set() call fails in the write chain, the error is uncaught. flush() should either propagate the error or log it and continue.

5. Attribute sanitization inconsistency (traces.ts:294-320)

6. listRange implementation in ActorTracesDriver (traces-driver.ts:85-114)

⚡ Performance Considerations

Good Practices

Optimization Opportunities

1. CBOR encoding overhead (traces.ts:285-287, 437-441)

// Instead of encoding each attribute separately
const encoded = encodeCbor({ attr1, attr2, attr3 });

2. Chunk size estimation (traces.ts:506-507, 520-523)

const encodedRecord = encodeRecord(record);
if (currentChunk.sizeBytes + encodedRecord.length > targetChunkBytes) {
	flushChunk();
	resetChunkState(recordBucketStart);
	// Reuse encodedRecord here instead of rebuilding body and re-encoding
}

3. Active span snapshot linear search (traces.ts:760-771)

🔒 Security Concerns

1. Unbounded attribute values (traces.ts:294-320)

if (typeof value === 'string' && value.length > MAX_ATTRIBUTE_STRING_LENGTH) {
	return undefined; // or truncate
}

2. Denial of service via span creation (traces.ts:620-679)
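
The two security concerns above could be mitigated along the following lines. This is a minimal sketch under stated assumptions, not code from the PR: only the name `MAX_ATTRIBUTE_STRING_LENGTH` comes from the review, while its value, `MAX_ACTIVE_SPANS`, `clampAttributeValue`, and `SpanBudget` are hypothetical and chosen for illustration.

```typescript
// Assumed limits: the review names MAX_ATTRIBUTE_STRING_LENGTH but gives no value.
const MAX_ATTRIBUTE_STRING_LENGTH = 4096;
const MAX_ACTIVE_SPANS = 10000; // hypothetical cap on concurrently open spans

type AttributeValue = string | number | boolean;

// Truncate oversized strings; pass every other value through unchanged.
function clampAttributeValue(
	value: AttributeValue,
	maxLength: number = MAX_ATTRIBUTE_STRING_LENGTH,
): AttributeValue {
	if (typeof value === "string" && value.length > maxLength) {
		return value.slice(0, maxLength);
	}
	return value;
}

// Hypothetical guard that refuses new spans once the cap on open spans is reached.
class SpanBudget {
	private active = 0;
	private readonly limit: number;

	constructor(limit: number = MAX_ACTIVE_SPANS) {
		this.limit = limit;
	}

	// Returns false instead of throwing so the caller can decide to drop or sample.
	tryAcquire(): boolean {
		if (this.active >= this.limit) {
			return false;
		}
		this.active += 1;
		return true;
	}

	// Call when a span ends; never lets the counter go negative.
	release(): void {
		if (this.active > 0) {
			this.active -= 1;
		}
	}
}
```

Truncation here counts UTF-16 code units, so a cut can land inside a surrogate pair; whether to truncate or drop the attribute (as the reviewed snippet does with `return undefined`) is a policy choice the review leaves open.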

✅ Test Coverage

Excellent Coverage

The test file (traces.test.ts) has 750 lines of comprehensive tests covering:

Missing Test Cases

📝 Code Quality & Style

Adherence to CLAUDE.md

Suggestions

🎨 API Design

Well Designed

Suggestions

interface StartSpanOptions {
	// ... existing options
	// TODO(v2): Support continuing remote traces
	// injectTraceContext?: { traceId: Uint8Array; parentSpanId: Uint8Array };
}

📋 Minor Issues

✨ Recommendations

High Priority

Medium Priority

Low Priority

🎉 Overall Assessment

This is a strong implementation with thoughtful design decisions. The architecture is well-suited for long-running Rivet Actors with heavy write loads. The main concerns are around error handling, potential race conditions, and DoS protection. With the critical issues addressed, this will be a solid foundation for tracing in RivetKit.

Recommendation: Request changes for critical issues, then approve.
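
The write-chain and flush error-handling concerns raised in this review can be illustrated with a small sketch. It assumes a driver exposing `set(key, bytes)` as in the reviewed snippet; the `Driver` and `PendingChunk` shapes, `createWriteQueue`, the `onError` callback, and the choice to push onto `pendingChunks` inside `enqueueWrite` are hypothetical and are not taken from the PR.

```typescript
// Assumed shapes: only driver.set(key, bytes) and pendingChunks appear in the review.
interface Driver {
	set(key: Uint8Array, value: Uint8Array): Promise<void>;
}

interface PendingChunk {
	key: Uint8Array;
	bytes: Uint8Array;
}

function createWriteQueue(
	driver: Driver,
	onError: (error: unknown, pending: PendingChunk) => void,
) {
	const pendingChunks: PendingChunk[] = [];
	let writeChain: Promise<void> = Promise.resolve();

	function enqueueWrite(pending: PendingChunk): void {
		pendingChunks.push(pending);
		writeChain = writeChain.then(async () => {
			try {
				await driver.set(pending.key, pending.bytes);
				const index = pendingChunks.indexOf(pending);
				if (index !== -1) {
					pendingChunks.splice(index, 1);
				}
			} catch (error) {
				// The failed chunk stays in pendingChunks and the error is reported.
				// The chain does not reject (as long as onError itself does not throw),
				// so later writes still run.
				onError(error, pending);
			}
		});
	}

	// Resolves once every write enqueued so far has settled.
	function idle(): Promise<void> {
		return writeChain;
	}

	return { enqueueWrite, idle, pendingChunks };
}
```

Catching inside each link keeps one failed `driver.set()` from rejecting every later write. The alternative the review mentions, propagating the error out of `flush()`, would instead let the chain reject and leave recovery to the caller.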
e9cb986 to 7ff09e8
b18c3ae to 7e0a15d
PR Review: feat(rivetkit): traces

Overview

This PR introduces a comprehensive distributed tracing system for RivetKit actors. The implementation adds a new

Positive Aspects

Architecture & Design

Implementation Quality

Issues & Concerns

🔴 Critical Issues

1. Actor Instance Integration - Missing Error Handling

// rivetkit/src/actor/instance/mod.ts:886-889
#initializeTraces() {
	this.#traces = createTraces({
		driver: new ActorTracesDriver(this.driver, this.#actorId),
	});
}

Issue: No error handling for trace initialization. If trace initialization fails, the actor will crash without proper cleanup.

Recommendation: Wrap in try-catch and add resource configuration.

2. Trace Span Memory Leak Risk

The action execution at

Recommendation: Move span creation inside the try block or use a more robust pattern.

3. Race Condition in Flush Chain

// traces/src/traces.ts:559-567
function enqueueWrite(pending: PendingChunk): void {
	writeChain = writeChain.then(async () => {
		await driver.set(pending.key, pending.bytes);
		const index = pendingChunks.indexOf(pending);
		if (index !== -1) {
			pendingChunks.splice(index, 1);
		}
	});
}

Issue: If multiple flushes happen rapidly, the

Recommendation: Add error handling to prevent breaking the write chain.

🟡 High Priority Issues

4. Unclosed Spans on Actor Shutdown

The

Recommendation: Add

5. Missing Resource Configuration

Traces are initialized without resource metadata. According to the spec, resource data should be attached at export time for proper span identification.

Recommendation: Add resource configuration with actor.id, actor.name, actor.region attributes.

6. Inefficient Driver Implementation

// rivetkit/src/actor/instance/traces-driver.ts:85-114
async listRange(...) {
	const entries = await this.#driver.kvListPrefix(this.#actorId, this.#prefix);
	const filtered = entries.filter(...)
}

Issue: Fetches ALL trace keys, then filters them in memory. This is very inefficient for large datasets.

Recommendation: Check whether the underlying driver supports native range queries, or document the performance implications.

🟢 Medium Priority Issues

7. Missing Trace Context Validation

8. Logger Patching Side Effects

The logger monkey-patching at

9. Snapshot Threshold Tuning

Fixed thresholds (256 KB, 300s interval) may not be optimal for all use cases. Consider making them configurable.

🔵 Low Priority / Style Issues

10. Inconsistent Error Messages

Error at

11. Magic Numbers

Add comments explaining constants like

12. Silent CBOR Failures

Silent

Security Considerations

✅ No major security issues identified

Performance Considerations

Strengths:

Concerns:

Test Coverage Assessment

Good coverage of basic functionality. Missing tests for:

Recommendations Summary

Must Fix Before Merge:

Should Fix Soon:

Nice to Have:

Overall Assessment

This is a well-designed and implemented feature with good documentation. The core tracing library is solid, but the integration with ActorInstance needs some refinement around error handling and lifecycle management.

Recommendation: Approve with required changes (items 1-4 above).

The design choices (custom binary format, snapshot mechanism, single-writer model) are appropriate for the use case of long-running actor spans. The implementation quality is generally high, with most issues being in the integration layer rather than the core traces library. Great work on the comprehensive specification and test coverage! 🎉
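
The span-leak recommendation in this review ("move span creation inside the try block or use a more robust pattern") can be sketched as a small wrapper. This is an illustration under assumptions, not the RivetKit API: `SpanLike`, `SpanOutcome`, and `withSpan` are hypothetical names, and the real span handle in @rivetkit/traces may expose a different shape for ending a span.

```typescript
// Hypothetical minimal span shape; the package's SpanHandle may differ.
type SpanOutcome = "ok" | "error";

interface SpanLike {
	end(options: { status: SpanOutcome }): void;
}

// Runs fn inside a span and always ends the span, whether fn resolves or throws.
async function withSpan<T>(
	startSpan: () => SpanLike,
	fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
	const span = startSpan();
	let status: SpanOutcome = "ok";
	try {
		return await fn(span);
	} catch (error) {
		status = "error";
		throw error; // rethrow so callers still see the original failure
	} finally {
		// finally runs on both paths, so the span cannot be left open by fn
		span.end({ status });
	}
}
```

Because `startSpan()` is the only statement outside the `try`, nothing can throw between opening the span and entering the block that guarantees it is closed.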

No description provided.