diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 4adcfea..9efe428 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,6 +1,6 @@ # Architecture: @git-stunts/empty-graph -A graph database substrate living entirely within Git commits, using the "Empty Tree" pattern for invisible storage and Roaring Bitmaps for high-performance indexing. +A "hidden" graph database. No files, just Git commits, using the "Empty Tree" pattern for invisible storage and Roaring Bitmaps for high-performance indexing. ## 🧱 Core Concepts diff --git a/CHANGELOG.md b/CHANGELOG.md index dbc1b79..d864db5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,13 +7,71 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [3.0.0] - 2025-01-30 + ### Added -- **Node Query API**: New methods for convenient node access - - `getNode(sha)` - Returns full GraphNode with all metadata (sha, author, date, message, parents) - - `hasNode(sha)` - Boolean existence check without loading full node data - - `countNodes(ref)` - Count nodes reachable from a ref without loading all nodes into memory -- **Batch Operations**: `createNodes(nodes)` - Create multiple nodes in a single operation with placeholder parent refs -- **LRU Cache**: Loaded shards now use an LRU cache to bound memory usage + +#### Managed Mode & Durability +- **`EmptyGraph.open()`** - New static factory for creating managed graphs with automatic durability guarantees +- **`GraphRefManager`** - New service for ref/anchor management +- **Anchor commits** - Automatic creation of anchor commits to prevent GC of disconnected subgraphs +- **`graph.sync(sha)`** - Manual ref synchronization for `autoSync: 'manual'` mode +- **`graph.anchor(ref, shas)`** - Power user method for explicit anchor creation + +#### Batching API +- **`graph.beginBatch()`** - Start a batch for efficient bulk writes +- **`GraphBatch.createNode()`** - Create nodes without per-write ref updates +- **`GraphBatch.commit()`** - Single 
octopus anchor for all batch nodes +- **`graph.compactAnchors()`** - Utility to compact anchor chains into single octopus + +#### Validation & Error Handling +- **`EmptyMessageError`** - New error type for empty message validation (code: `EMPTY_MESSAGE`) +- Empty messages now rejected at write time (prevents "ghost nodes") + +#### Index Improvements +- **Canonical JSON checksums** - Deterministic checksums for cross-engine compatibility +- **Shard version 2** - New format with backward compatibility for v1 +- **`SUPPORTED_SHARD_VERSIONS`** - Reader accepts both v1 and v2 shards + +#### Performance +- **`isAncestor()`** - New method on GitGraphAdapter for ancestry checking +- **Fast-forward detection** - `syncHead()` skips anchor creation for linear history +- **Octopus anchoring** - Batch.commit() creates single anchor with N parents + +#### Cancellation +- AbortSignal propagation added to all TraversalService methods +- AbortSignal support in StreamingBitmapIndexBuilder finalization + +#### Node Query API +- **`getNode(sha)`** - Returns full GraphNode with all metadata (sha, author, date, message, parents) +- **`hasNode(sha)`** - Boolean existence check without loading full node data +- **`countNodes(ref)`** - Count nodes reachable from a ref without loading all nodes into memory + +#### Batch Operations +- **`createNodes(nodes)`** - Create multiple nodes in a single operation with placeholder parent refs + +#### Caching & Resilience +- **LRU Cache** - Loaded shards now use an LRU cache to bound memory usage +- **Retry Logic** - `GitGraphAdapter` now retries transient Git failures with exponential backoff and decorrelated jitter + - Uses `@git-stunts/alfred` resilience library + - Retries on: "cannot lock ref", "resource temporarily unavailable", "connection timed out" + - Configurable via `retryOptions` constructor parameter +- **CachedValue Utility** - Reusable TTL-based caching utility in `src/domain/utils/CachedValue.js` +- **Memory Warning** - 
`BitmapIndexReader` logs a warning when ID-to-SHA cache exceeds 1M entries (~40MB) + +### Changed +- `SHARD_VERSION` bumped from 1 to 2 (v1 still readable) +- **TraversalService** - Refactored path reconstruction into unified `_walkPredecessors()` and `_walkSuccessors()` helpers +- **HealthCheckService** - Now uses `CachedValue` utility instead of inline caching logic + +### Fixed +- **Durability bug** - Nodes created via `createNode()` were not reachable from any ref, making them vulnerable to Git GC +- **Ghost nodes** - Empty messages allowed at write time but rejected during iteration + +### Documentation +- Added `SEMANTICS.md` - Durability contract and anchor commit semantics +- Updated `README.md` - Durability warning, mode selection guide, new API docs +- Added **Memory Considerations** section documenting memory requirements for large graphs ## [2.5.0] - 2026-01-29 diff --git a/Dockerfile b/Dockerfile index 2230e84..bd78199 100644 --- a/Dockerfile +++ b/Dockerfile @@ -6,9 +6,12 @@ RUN apt-get update && apt-get install -y \ g++ \ && rm -rf /var/lib/apt/lists/* WORKDIR /app -COPY package*.json ./ +# Copy alfred first (dependency) +COPY alfred ../alfred +# Copy empty-graph +COPY empty-graph/package*.json ./ RUN npm install -COPY . . +COPY empty-graph . ENV GIT_STUNTS_DOCKER=1 # Default to tests, but can be overridden for benchmark CMD ["npm", "test"] diff --git a/README.md b/README.md index 11d4066..2cfcf49 100644 --- a/README.md +++ b/README.md @@ -65,70 +65,17 @@ Because all commits point to the "Empty Tree" (`4b825dc642cb6eb9a060e54bf8d69288 Let's pump the brakes... Just because you *can* store a graph in Git doesn't mean you *should*. Here's an honest assessment. -### When EmptyGraph Makes Sense - -#### You need offline-first graph data. - -Git works without a network. Clone the repo, query locally, sync when you reconnect. Perfect for edge computing, field work, or airplane mode. - -#### You want Git-native replication. 
- -Your graph automatically inherits Git's distributed model. Fork it. Push to multiple remotes. Merge branches of graph data (carefully). No separate replication infrastructure. - -#### Your graph is append-mostly. - -Git loves immutable data. Add nodes, add edges, never delete? Perfect fit. *The reflog even lets you recover "deleted" nodes.* - -#### You're already in a Git ecosystem. - -If your workflow is Git-centric (CI/CD, GitOps, infrastructure-as-code), adding a graph that lives in Git means one less system to manage. - -#### You need an audit trail for free. - -Every mutation is a commit. `git log` is your audit log. `git blame` tells you when a node was added. `git bisect` can find when a relationship broke. - -#### The graph is small-to-medium (< 10M nodes). - -The bitmap index handles millions of nodes comfortably. At 1M nodes, you're looking at ~150-200MB of index data. That's fine. - -#### You value simplicity over features. - -No query language to learn. No cluster to manage. No connection pools. It's just JavaScript and Git. - -### When EmptyGraph Is A Bad Idea - -You should probably consider a more legit and powerful solution if: - -#### You need ACID transactions. - -Git commits are atomic, but there's no rollback, no isolation levels, no multi-statement transactions. If you need "transfer money from A to B" semantics, *please* use a real database. - -#### You need real-time updates. - -Git has no pubsub. No change streams. No WebSocket notifications. Polling `git fetch` is your only option, and it's not fast. - -#### You need complex queries. - -"Find all users who bought product X and also reviewed product Y in the last 30 days" - this requires a query planner, indexes, and probably Cypher or Gremlin. EmptyGraph gives you raw traversal primitives, not a query language (... *yet*). - -#### Your graph is write-heavy. - -Every write is a `git commit-tree` + `git commit`. That's fast, but not "10,000 writes per second" fast. 
Write-heavy workloads need a database that is designed for writes. - -#### You need to delete data (for real). - -GDPR "right to be forgotten"? Git's immutability works against you. Yes, you can rewrite history with `git filter-branch`, but it's painful and breaks every clone. - -#### The graph is huge (> 100M nodes). - -At some point, you're fighting Git's assumptions. Pack files get unwieldy. Index shards multiply. Clone times become brutal. Neo4j, DGraph, or TigerGraph exist for a reason. - -#### You need fine-grained access control. - -Git repos are all-or-nothing. Either you can clone it or you can't. There's no "user A can see nodes 1-100 but not 101-200." If you need row-level security, look elsewhere. - -> [!note] -> There *is* a trick to accomplish this, and I'll post it in a blog post sometime. You can run a [git startgate](https://github.com/flyingrobots/git-stargate) that uses git receive hooks + encryption to achieve "distributed opaque data", but it's too hacky to include in this project and you might want to question why you want to have private data live in git in the first place. 
+| Scenario | ✅ Good Fit | ❌ Bad Fit | Notes | +| ------------------------ | ---------------------------------------------------- | --------------------------------------- | ---------------------------------------------------------------- | +| **Network connectivity** | Offline-first, edge computing, field work | Real-time updates needed | Git has no pubsub/WebSocket; polling `git fetch` is slow | +| **Replication model** | Git-native (fork, push, merge branches) | Fine-grained access control | Git repos are all-or-nothing; no row-level security | +| **Write patterns** | Append-mostly, immutable data | Write-heavy (10k+ writes/sec) | Every write = `git commit-tree` + `git commit` | +| **Existing ecosystem** | Already Git-centric (CI/CD, GitOps, IaC) | Team unfamiliar with Git | Debugging corrupt indexes after force-push requires Git fluency | +| **Audit requirements** | Need free audit trail (`git log`, `blame`, `bisect`) | Need true ACID transactions | Git commits are atomic but no rollback/isolation levels | +| **Graph size** | Small-to-medium (< 10M nodes) | Huge (> 100M nodes) | 1M nodes ≈ 150-200MB index; beyond 100M, pack files get unwieldy | +| **Query complexity** | Raw traversal primitives, simple patterns | Complex multi-hop queries with filters | No query planner; Cypher/Gremlin needed for complex queries | +| **Data deletion** | Rarely delete (reflog recovers "deleted" nodes) | GDPR compliance / right to be forgotten | `git filter-branch` is painful and breaks all clones | +| **Philosophy** | Value simplicity over features | Need enterprise DB features | No query language, no cluster, no connection pools—just JS + Git | #### Your team doesn't know Git. @@ -177,6 +124,10 @@ That's the stunt. Take something everyone has, use it for something no one inten npm install @git-stunts/empty-graph @git-stunts/plumbing ``` +## Durability + +> **Warning**: If you don't use managed mode or call `sync()`/`anchor()`, Git GC can prune unreachable nodes. 
See [SEMANTICS.md](./SEMANTICS.md) for details.
+
 ## Quick Start
 
 ```javascript
@@ -187,13 +138,15 @@
 import EmptyGraph, { GitGraphAdapter } from '@git-stunts/empty-graph';
 
 const plumbing = new GitPlumbing({ cwd: './my-db' });
 const persistence = new GitGraphAdapter({ plumbing });
 
-// Create the graph with injected adapter
-const graph = new EmptyGraph({ persistence });
+// Open graph in managed mode (recommended)
+const graph = await EmptyGraph.open({
+  persistence,
+  ref: 'refs/empty-graph/events',
+  mode: 'managed', // default - automatic durability
+});
 
-// Create a node (commit)
+// Create nodes - automatically synced to ref
 const parentSha = await graph.createNode({ message: 'First Entry' });
-
-// Create a child node
 const childSha = await graph.createNode({
   message: 'Second Entry',
   parents: [parentSha]
@@ -202,15 +155,107 @@ const childSha = await graph.createNode({
 
 // Read data
 const message = await graph.readNode(childSha);
 
-// List linear history (small graphs)
-const nodes = await graph.listNodes({ ref: childSha, limit: 50 });
-
 // Stream large graphs (millions of nodes)
-for await (const node of graph.iterateNodes({ ref: childSha })) {
+for await (const node of graph.iterateNodes({ ref: 'refs/empty-graph/events' })) {
   console.log(node.message);
 }
 ```
 
+## Choosing a Mode
+
+### Beginner (Recommended)
+
+Use `EmptyGraph.open()` with managed mode for automatic durability:
+
+```javascript
+const graph = await EmptyGraph.open({
+  persistence,
+  ref: 'refs/empty-graph/events',
+  mode: 'managed', // default
+});
+
+// Every write is automatically made durable
+await graph.createNode({ message: 'Safe from GC' });
+```
+
+### Batch Writer
+
+For bulk imports, use batching to reduce ref update overhead:
+
+```javascript
+const tx = graph.beginBatch();
+for (const item of items) {
+  await tx.createNode({ message: JSON.stringify(item) });
+}
+await tx.commit(); // Single ref update
+```
+
+### Power User
+
+For custom ref management, stay in managed mode but defer ref updates with `autoSync: 'manual'`:
+```javascript +const graph = await EmptyGraph.open({ + persistence, + ref: 'refs/my-graph', + mode: 'managed', + autoSync: 'manual', +}); + +// Create nodes without automatic ref updates +const sha1 = await graph.createNode({ message: 'Node 1' }); +const sha2 = await graph.createNode({ message: 'Node 2' }); + +// Explicit sync when ready +await graph.sync(sha2); + +// Or use anchor() for fine-grained control +await graph.anchor('refs/my-graph', [sha1, sha2]); +``` + +### Direct Constructor (Legacy) + +For backward compatibility, you can still use the constructor directly: + +```javascript +const graph = new EmptyGraph({ persistence }); + +// But you must manage durability yourself! +const sha = await graph.createNode({ message: 'May be GC\'d!' }); +``` + +## How Durability Works + +EmptyGraph nodes are Git commits. Git garbage collection (GC) prunes commits that are not reachable from any ref. Without ref management, your data can be silently deleted. + +In **managed mode**, EmptyGraph automatically maintains reachability using **anchor commits**: + +- **Linear history**: Fast-forward updates (no anchor needed) +- **Disconnected roots**: Creates an anchor commit with parents `[old_tip, new_commit]` +- **Batch imports**: Single octopus anchor with all tips as parents + +Anchor commits have the message `{"_type":"anchor"}` and are filtered from graph traversals—they are infrastructure, not domain data. + +See [docs/ANCHORING.md](./docs/ANCHORING.md) for the full algorithm and [SEMANTICS.md](./SEMANTICS.md) for the durability contract. 
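The decision rule above (create, fast-forward, or anchor) can be sketched as a small function. This is an illustrative model only, not the library's implementation: the in-memory `store`, its `refs`/`commits` maps, and the fake SHA strings are hypothetical stand-ins for real Git objects, and `isAncestor` here is a toy version of the documented `GitGraphAdapter.isAncestor()`.

```javascript
// Sketch of the managed-mode sync decision over an in-memory stand-in for Git.
function syncHead(store, ref, newSha) {
  const tip = store.refs.get(ref);
  if (!tip) {
    store.refs.set(ref, newSha); // first write: just create the ref
    return { updated: true, anchor: false };
  }
  if (tip === newSha) {
    return { updated: false, anchor: false }; // idempotent no-op
  }
  if (isAncestor(store, tip, newSha)) {
    store.refs.set(ref, newSha); // linear history: fast-forward, no anchor
    return { updated: true, anchor: false };
  }
  // Divergent history: anchor commit keeps both tips reachable
  const anchorSha = `anchor-${store.nextId++}`;
  store.commits.set(anchorSha, {
    message: '{"_type":"anchor"}',
    parents: [tip, newSha],
  });
  store.refs.set(ref, anchorSha);
  return { updated: true, anchor: true };
}

// Toy ancestry walk: is `maybeAncestor` reachable from `sha` via parents?
function isAncestor(store, maybeAncestor, sha) {
  const stack = [sha];
  const seen = new Set();
  while (stack.length) {
    const cur = stack.pop();
    if (cur === maybeAncestor) return true;
    if (seen.has(cur)) continue;
    seen.add(cur);
    const commit = store.commits.get(cur);
    if (commit) stack.push(...commit.parents);
  }
  return false;
}
```

Note how the anchor case matches the commit structure described above: the previous tip and the new commit both become parents, so neither can be garbage-collected.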
+ +## Performance Considerations + +Anchor commit overhead depends on your write pattern: + +| Pattern | Anchor Overhead | Notes | +|---------|-----------------|-------| +| Linear history | Zero | Fast-forward updates | +| Disconnected roots (`autoSync: 'onWrite'`) | O(N) chained anchors | One anchor per disconnected write | +| Batch imports (`beginBatch()`) | O(1) octopus anchor | Single anchor regardless of batch size | + +**Recommendations:** + +- Use `beginBatch()` for bulk imports to avoid anchor chains +- Call `compactAnchors()` periodically to consolidate chained anchors into one octopus +- For streaming writes with disconnected roots, consider batching or periodic compaction + +See [docs/ANCHORING.md](./docs/ANCHORING.md) for traversal complexity analysis. + ## Interactive Demo Try EmptyGraph hands-on with our Docker-based interactive demo. It creates a sample e-commerce event graph and demonstrates traversal, event sourcing projections, and path finding. @@ -287,9 +332,33 @@ Path: 0148a1e4 → 6771a15f → 20744421 → 6025e6ca → d2abe22c → fb285001 ### `EmptyGraph` +#### `static async open({ persistence, ref, mode?, autoSync?, ... })` + +Opens a managed graph with automatic durability guarantees. **This is the recommended way to create an EmptyGraph instance.** + +**Parameters:** +- `persistence` (GitGraphAdapter): Adapter implementing `GraphPersistencePort` & `IndexStoragePort` +- `ref` (string): The ref to manage (e.g., `'refs/empty-graph/events'`) +- `mode` ('managed' | 'manual', optional): Durability mode. Defaults to `'managed'` +- `autoSync` ('onWrite' | 'manual', optional): When to sync refs. Defaults to `'onWrite'` +- `maxMessageBytes` (number, optional): Maximum message size. 
Defaults to 1MB
+- `logger` (LoggerPort, optional): Logger for structured logging
+- `clock` (ClockPort, optional): Clock for timing operations
+
+**Returns:** `Promise<EmptyGraph>` - Configured graph instance
+
+**Example:**
+```javascript
+const graph = await EmptyGraph.open({
+  persistence,
+  ref: 'refs/empty-graph/events',
+  mode: 'managed',
+});
+```
+
 #### `constructor({ persistence, clock?, healthCacheTtlMs? })`
 
-Creates a new EmptyGraph instance.
+Creates a new EmptyGraph instance (legacy API). Prefer `EmptyGraph.open()` for automatic durability.
 
 **Parameters:**
 - `persistence` (GitGraphAdapter): Adapter implementing `GraphPersistencePort` & `IndexStoragePort`
@@ -344,6 +413,87 @@ const shas = await graph.createNodes([
 ]);
 ```
 
+#### `beginBatch()`
+
+Begins a batch operation for efficient bulk writes. Delays ref updates until `commit()` is called. **Requires managed mode.**
+
+**Returns:** `GraphBatch` - A batch context
+
+**Example:**
+```javascript
+const tx = graph.beginBatch();
+const a = await tx.createNode({ message: 'Node A' });
+const b = await tx.createNode({ message: 'Node B', parents: [a] });
+const result = await tx.commit(); // Single ref update
+console.log(result.count); // 2
+console.log(result.anchor); // SHA if anchor was created, undefined otherwise
+```
+
+#### `async sync(sha)`
+
+Manually syncs the ref to make a node reachable. Only needed when `autoSync='manual'`.
+
+**Parameters:**
+- `sha` (string): The SHA to sync to the managed ref
+
+**Returns:** `Promise<{ updated: boolean, anchor: boolean, sha: string }>`
+
+**Throws:** `Error` if not in managed mode or sha is not provided
+
+**Example:**
+```javascript
+const graph = await EmptyGraph.open({
+  persistence,
+  ref: 'refs/my-graph',
+  mode: 'managed',
+  autoSync: 'manual',
+});
+
+const sha = await graph.createNode({ message: 'My node' });
+await graph.sync(sha); // Explicitly make durable
+```
+
+#### `async anchor(ref, shas)`
+
+Creates an anchor commit to make SHAs reachable from a ref.
This is an advanced method for power users who want fine-grained control over ref management.
+
+**Parameters:**
+- `ref` (string): The ref to update
+- `shas` (string | string[]): SHA(s) to anchor
+
+**Returns:** `Promise<string>` - The anchor commit SHA
+
+**Example:**
+```javascript
+// Anchor a single disconnected node
+const anchorSha = await graph.anchor('refs/my-graph', nodeSha);
+
+// Anchor multiple nodes at once
+const multiAnchorSha = await graph.anchor('refs/my-graph', [sha1, sha2, sha3]);
+```
+
+#### `async compactAnchors(ref?)`
+
+Consolidates chained anchor commits into a single octopus anchor. Use this to clean up after many incremental writes that created disconnected roots.
+
+**Parameters:**
+- `ref` (string, optional): The ref to compact. Defaults to the managed ref.
+
+**Returns:** `Promise<{ compacted: boolean, oldAnchors: number, tips: number }>`
+- `compacted`: Whether compaction occurred
+- `oldAnchors`: Number of anchor commits replaced
+- `tips`: Number of real node tips in the new octopus anchor
+
+**Example:**
+```javascript
+// After many incremental writes with disconnected roots
+const result = await graph.compactAnchors();
+console.log(`Replaced ${result.oldAnchors} anchors with 1 octopus anchor`);
+console.log(`Now tracking ${result.tips} tips`);
+```
+
+See [docs/ANCHORING.md](./docs/ANCHORING.md) for details on when compaction is beneficial.
+
 #### `async readNode(sha)`
 
 Reads a node's message.
@@ -956,6 +1106,15 @@ const children = await graph.getChildren(someSha);
 | Iterate Nodes (large) | O(n) | Streaming, constant memory |
 | Bitmap Index Lookup | O(1) | With `BitmapIndexService` |
 
+## Memory Considerations
+
+The `BitmapIndexReader` caches all SHA-to-ID mappings in memory for O(1) lookups. Each entry consumes approximately 40 bytes (SHA string + numeric ID). For a graph with 10 million nodes, this equates to roughly 400MB of memory.
+
+A warning is logged when the cache exceeds 1 million entries to help identify memory pressure early.
For very large graphs (>10M nodes), consider: +- Pagination strategies to limit active working sets +- External indexing solutions (Redis, SQLite) +- Periodic index rebuilds to remove unreachable nodes + ## Architecture EmptyGraph follows hexagonal architecture (ports & adapters): diff --git a/SEMANTICS.md b/SEMANTICS.md new file mode 100644 index 0000000..ae8b5a8 --- /dev/null +++ b/SEMANTICS.md @@ -0,0 +1,95 @@ +# EmptyGraph Durability Semantics + +This document defines the official durability contract for EmptyGraph. + +## Core Durability Contract + +**A write is durable if and only if it becomes reachable from the graph ref.** + +Unreachable commits may be pruned by Git garbage collection at any time. +EmptyGraph provides mechanisms to ensure writes remain reachable. + +## Modes + +### Managed Mode (Default) + +In managed mode, EmptyGraph guarantees durability for all writes. + +- Every write operation updates the graph ref (or creates an anchor commit) +- Reachability from the ref is maintained automatically +- Users do not need to manage refs or call sync manually + +### Manual Mode + +In manual mode, EmptyGraph provides no automatic ref management. + +- Writes create commits but do not update refs +- User is responsible for calling `sync()` to persist reachability +- User may manage refs directly via Git commands +- **Warning**: Uncommitted writes are subject to garbage collection + +## Anchor Commits + +Anchor commits solve the reachability problem for disconnected graphs. + +### When Anchors Are Created + +An anchor commit is created when a new node is not a descendant of the +current ref tip. 
This occurs when: + +- Creating a disconnected root node +- Importing commits from external sources +- Merging unrelated graph histories + +### Anchor Structure + +Anchor commits have the following properties: + +- **Parents**: `[old_tip, new_commit, ...]` - includes both the previous ref + tip and all newly unreachable commits +- **Payload**: `{"_type":"anchor"}` - marker identifying the commit as an anchor +- **Purpose**: Maintains reachability without affecting graph semantics + +Anchor commits are internal bookkeeping and should be transparent to +graph traversal operations. + +## Sync Algorithm + +The `sync()` operation ensures a commit becomes reachable from the graph ref. + +``` +sync(ref, new_commit): + if ref does not exist: + set ref → new_commit + + else if ref_tip is ancestor of new_commit: + fast-forward ref → new_commit + + else: + anchor = create_commit( + parents: [ref_tip, new_commit], + payload: {"_type":"anchor"} + ) + set ref → anchor +``` + +### Cases + +| Condition | Action | Result | +|-----------|--------|--------| +| Ref missing | Create ref | `ref → new_commit` | +| Linear history | Fast-forward | `ref → new_commit` | +| Divergent history | Anchor | `ref → anchor → [old_tip, new_commit]` | + +## Guarantees + +1. In managed mode, any successfully returned write is durable +2. Anchor commits preserve all previously reachable history +3. The sync algorithm is idempotent for the same inputs +4. Graph semantics are unaffected by anchor commits + +## Non-Guarantees + +1. In manual mode, writes may be lost to garbage collection +2. Anchor commit ordering is not semantically meaningful +3. Concurrent writes may create multiple anchors (all valid) diff --git a/docker-compose.yml b/docker-compose.yml index 1116907..c2ea259 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,11 +1,15 @@ services: test: - build: . + build: + context: .. + dockerfile: empty-graph/Dockerfile environment: - GIT_STUNTS_DOCKER=1 benchmark: - build: . 
+    build:
+      context: ..
+      dockerfile: empty-graph/Dockerfile
     volumes:
       - ./benchmarks:/app/benchmarks
     environment:
diff --git a/docs/ANCHORING.md b/docs/ANCHORING.md
new file mode 100644
index 0000000..90dea5a
--- /dev/null
+++ b/docs/ANCHORING.md
@@ -0,0 +1,108 @@
+# Anchor Commits: How EmptyGraph Maintains Durability
+
+## The Problem
+
+Git garbage collection (GC) prunes commits that are not reachable from any ref. EmptyGraph nodes are Git commits, so without careful ref management, your data can be silently deleted.
+
+## The Solution: Anchor Commits
+
+EmptyGraph uses "anchor commits" to ensure all nodes remain reachable from the graph's managed ref.
+
+### What is an Anchor Commit?
+
+An anchor commit is a special commit with:
+- Message: `{"_type":"anchor"}`
+- Parents: The commits that need to be kept reachable
+- Tree: The empty tree (same as all EmptyGraph nodes)
+
+Anchor commits are infrastructure—they don't represent domain data and are filtered from graph queries.
+
+### When Are Anchors Created?
+
+**Linear history (no anchor needed):**
+```
+Before: ref → A
+After:  ref → A ← B   (B has parent A)
+Result: Fast-forward, no anchor
+```
+
+**Disconnected root (anchor needed):**
+```
+Before: ref → A
+After:  ref → ANCHOR
+              /    \
+             A      B   (B has no connection to A)
+```
+
+### Anchoring Strategies
+
+#### 1. Chained Anchors (per-write sync)
+
+Each disconnected write creates one anchor with 2 parents:
+
+```
+ref → A3 → A2 → A1 → ...
+       \     \     \
+        D     C     B   (real nodes)
+```
+
+- **Pro**: Simple, stateless, works for incremental writes
+- **Con**: O(N) anchor commits for N disconnected tips
+
+#### 2. Octopus Anchors (batch mode)
+
+Single anchor with N parents for all tips:
+
+```
+ref → ANCHOR
+      / | \  \
+     A  B  C  D   (all real nodes as direct parents)
+```
+
+- **Pro**: O(1) anchor overhead regardless of structure
+- **Con**: Requires knowing all tips upfront
+
+#### 3. Hybrid (what EmptyGraph does)
+
+- **`autoSync: 'onWrite'`**: Uses chained anchors with fast-forward optimization
+- **`beginBatch()`**: Uses octopus anchor on commit()
+- **`compactAnchors()`**: Rewrites chains to octopus for cleanup
+
+## Traversal Complexity Impact
+
+### If Anchors Are Filtered (correct behavior)
+
+| Metric | Chained | Octopus |
+|--------|---------|---------|
+| Domain nodes visible | N (real only) | N (real only) |
+| Domain traversal | O(V + E) | O(V + E) |
+| No impact on domain graph | ✓ | ✓ |
+
+### Index Rebuild Overhead
+
+| Metric | Chained | Octopus |
+|--------|---------|---------|
+| Log commits to iterate | N + O(N) anchors | N + O(1) anchors |
+| Overhead | ~2x in worst case | ~none |
+
+This is why `Batch.commit()` uses octopus—bulk imports avoid the 2x penalty.
+
+## The Sync Algorithm
+
+```
+syncHead(ref, newSha):
+  1. Read current ref tip R
+  2. If R is null: set ref → newSha (first write)
+  3. If R == newSha: no-op (idempotent)
+  4. If R is ancestor of newSha: fast-forward ref → newSha
+  5. Else: create anchor(R, newSha), set ref → anchor
+```
+
+Step 4 is the key optimization—linear history creates zero anchors.
+
+## Best Practices
+
+1. **Use managed mode** for automatic durability
2. **Use batching** for bulk imports (octopus = efficient)
+3. **Call `compactAnchors()`** periodically if you have many incremental writes
+4. **Don't worry about anchors** in your domain logic—they're invisible to domain traversals
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..16416bd
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,457 @@
+# EmptyGraph Architecture
+
+## Overview
+
+EmptyGraph is a graph database built on Git. It uses Git commits pointing to the empty tree as nodes, with commit messages as payloads and parent relationships as edges.
+ +This architecture enables: +- Content-addressable storage with built-in deduplication +- Git's proven durability and integrity guarantees +- Standard Git tooling compatibility +- Distributed replication via git push/pull + +## Design Principles + +### Hexagonal Architecture (Ports & Adapters) + +The codebase follows hexagonal architecture to isolate domain logic from infrastructure concerns: + +- **Ports** define abstract interfaces for external dependencies +- **Adapters** implement ports for specific technologies (Git, console, etc.) +- **Domain services** contain pure business logic with injected dependencies + +This enables: +- Easy testing via mock adapters +- Swappable infrastructure (different Git implementations, logging backends) +- Clear separation of concerns + +### Domain-Driven Design + +The domain layer models the graph database concepts: +- `GraphNode` - Immutable value object representing a node +- `GraphService` - Node CRUD operations +- `TraversalService` - Graph algorithms (BFS, DFS, shortest path) +- `BitmapIndexBuilder/Reader` - High-performance indexing + +### Immutable Entities + +`GraphNode` instances are frozen after construction. The `parents` array is also frozen to prevent accidental mutation. This aligns with Git's immutable commit model. + +### Dependency Injection + +All services accept their dependencies via constructor options: +- Persistence adapters +- Loggers +- Clocks +- Parsers + +This enables testing with mocks and flexible runtime configuration. 
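As a sketch of this injection style, a service can be exercised entirely against an in-memory fake. Everything below is hypothetical and simplified for illustration: `InMemoryPersistence` is not the real adapter, and the two-method port shape is far narrower than the actual `GraphPersistencePort`.

```javascript
// Hypothetical in-memory adapter satisfying a simplified persistence port.
class InMemoryPersistence {
  constructor() {
    this.commits = new Map();
    this.nextId = 1;
  }
  async createCommit({ message, parents = [] }) {
    const sha = `sha-${this.nextId++}`; // fake content address
    this.commits.set(sha, { message, parents });
    return sha;
  }
  async readCommit(sha) {
    return this.commits.get(sha).message;
  }
}

// A service that receives its collaborators instead of constructing them.
class ExampleGraphService {
  constructor({ persistence, logger = { info() {} } }) {
    this.persistence = persistence;
    this.logger = logger; // defaults to a no-op logger, as in tests
  }
  async createNode({ message, parents = [] }) {
    const sha = await this.persistence.createCommit({ message, parents });
    this.logger.info({ event: 'node-created', sha });
    return sha;
  }
  async readNode(sha) {
    return this.persistence.readCommit(sha);
  }
}
```

Swapping `InMemoryPersistence` for a Git-backed adapter requires no change to the service, which is the point of the port boundary.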
+ +## Layer Diagram + +``` ++-------------------------------------------------------------+ +| EmptyGraph | <- Facade +| (index.js) | ++-------------------------------------------------------------+ +| Domain Services | +| +-------------+ +---------------+ +--------------------+ | +| | GraphService| | IndexRebuild | | TraversalService | | +| | | | Service | | | | +| +-------------+ +---------------+ +--------------------+ | +| +-------------+ +---------------+ +--------------------+ | +| | GraphRef | | BitmapIndex | | BitmapIndex | | +| | Manager | | Builder | | Reader | | +| +-------------+ +---------------+ +--------------------+ | +| +-------------+ +---------------+ +--------------------+ | +| | HealthCheck | | GitLogParser | | Streaming | | +| | Service | | | | BitmapIndexBuilder | | +| +-------------+ +---------------+ +--------------------+ | ++-------------------------------------------------------------+ +| Ports | +| +-------------------+ +---------------------------+ | +| | GraphPersistence | | IndexStoragePort | | +| | Port | | | | +| +-------------------+ +---------------------------+ | +| +-------------------+ +---------------------------+ | +| | LoggerPort | | ClockPort | | +| +-------------------+ +---------------------------+ | ++-------------------------------------------------------------+ +| Adapters | +| +-------------------+ +---------------------------+ | +| | GitGraphAdapter | | ConsoleLogger | | +| | | | NoOpLogger | | +| +-------------------+ +---------------------------+ | +| +-------------------+ | +| | PerformanceClock | | +| | GlobalClock | | +| +-------------------+ | ++-------------------------------------------------------------+ +``` + +## Directory Structure + +``` +src/ ++-- domain/ +| +-- entities/ # Immutable domain objects +| | +-- GraphNode.js # Node value object (sha, message, parents) +| +-- services/ # Business logic +| | +-- GraphService.js # Node CRUD operations +| | +-- GraphRefManager.js # Ref/anchor management +| | 
+-- IndexRebuildService.js # Index orchestration +| | +-- BitmapIndexBuilder.js # In-memory index construction +| | +-- BitmapIndexReader.js # O(1) index queries +| | +-- StreamingBitmapIndexBuilder.js # Memory-bounded building +| | +-- TraversalService.js # Graph algorithms +| | +-- HealthCheckService.js # K8s-style probes +| | +-- GitLogParser.js # Binary stream parsing +| +-- errors/ # Domain-specific errors +| | +-- IndexError.js +| | +-- ShardLoadError.js +| | +-- ShardCorruptionError.js +| | +-- ShardValidationError.js +| | +-- TraversalError.js +| | +-- OperationAbortedError.js +| | +-- EmptyMessageError.js +| +-- utils/ # Domain utilities +| +-- LRUCache.js # Shard caching +| +-- MinHeap.js # Priority queue for A* +| +-- CachedValue.js # TTL-based caching +| +-- cancellation.js # AbortSignal utilities ++-- infrastructure/ +| +-- adapters/ # Port implementations +| +-- GitGraphAdapter.js # Git operations via @git-stunts/plumbing +| +-- ConsoleLogger.js # Structured JSON logging +| +-- NoOpLogger.js # Silent logger for tests +| +-- PerformanceClockAdapter.js # Node.js timing +| +-- GlobalClockAdapter.js # Bun/Deno/Browser timing ++-- ports/ # Abstract interfaces + +-- GraphPersistencePort.js # Git commit/ref operations + +-- IndexStoragePort.js # Blob/tree storage + +-- LoggerPort.js # Structured logging contract + +-- ClockPort.js # Timing abstraction +``` + +## Key Components + +### Facade: EmptyGraph + +The main entry point (`index.js`) provides: +- Simplified API over domain services +- `open()` factory for managed mode with automatic durability +- Batch API for efficient bulk writes +- Health check endpoints (K8s liveness/readiness) +- Index management (rebuild, load, save) + +### Domain Services + +#### GraphService + +Core node operations: +- `createNode()` - Create a single node +- `createNodes()` - Bulk creation with placeholder references (`$0`, `$1`) +- `readNode()` / `getNode()` - Retrieve node data +- `hasNode()` - Existence check +- 
`iterateNodes()` - Streaming iterator for large graphs +- `countNodes()` - Efficient count via `git rev-list --count` + +Message validation enforces size limits (default 1MB) and non-empty content. + +#### GraphRefManager + +Manages ref reachability for durability: +- `readHead()` - Get current ref SHA +- `syncHead()` - Ensure node is reachable from ref +- `createAnchor()` - Create octopus merge for disconnected nodes + +The sync algorithm: +1. If ref missing: create ref pointing to new SHA +2. If ref already at target: no-op +3. If current tip is ancestor of new SHA: fast-forward +4. Otherwise: create anchor commit with both as parents + +#### IndexRebuildService + +Orchestrates index creation: +- **In-memory mode**: Fast, O(N) memory, single serialization pass +- **Streaming mode**: Memory-bounded, flushes to storage periodically + +Supports cancellation via `AbortSignal` and progress callbacks. + +#### TraversalService + +Graph algorithms using O(1) bitmap lookups: +- `bfs()` / `dfs()` - Traversal generators +- `ancestors()` / `descendants()` - Transitive closures +- `findPath()` - Any path between nodes +- `shortestPath()` - Bidirectional BFS for efficiency +- `weightedShortestPath()` - Dijkstra with custom edge weights +- `aStarSearch()` - A* with heuristic guidance +- `bidirectionalAStar()` - A* from both ends +- `topologicalSort()` - Kahn's algorithm with cycle detection +- `commonAncestors()` - Find shared ancestors of multiple nodes + +All traversals support: +- `maxNodes` / `maxDepth` limits +- Cancellation via `AbortSignal` +- Direction control (forward/reverse) + +#### BitmapIndexBuilder / BitmapIndexReader + +Roaring bitmap-based indexes for O(1) neighbor lookups: + +**Builder**: +- `registerNode()` - Assign numeric ID to SHA +- `addEdge()` - Record parent/child relationship +- `serialize()` - Output sharded JSON structure + +**Reader**: +- `setup()` - Configure with shard OID mappings +- `getParents()` / `getChildren()` - O(1) lookups +- Lazy loading 
with LRU cache for bounded memory +- Checksum validation with strict/non-strict modes + +#### StreamingBitmapIndexBuilder + +Memory-bounded variant of BitmapIndexBuilder: +- Flushes bitmap data to storage when threshold exceeded +- SHA-to-ID mappings remain in memory (required for consistency) +- Merges chunks at finalization via bitmap OR operations + +### Ports (Interfaces) + +#### GraphPersistencePort + +Git operations contract: +- `commitNode()` - Create commit pointing to empty tree +- `showNode()` / `getNodeInfo()` - Retrieve commit data +- `logNodesStream()` - Stream commit history +- `updateRef()` / `readRef()` / `deleteRef()` - Ref management +- `isAncestor()` - Ancestry testing for fast-forward detection +- `countNodes()` - Efficient count +- `ping()` - Health check + +Also includes blob/tree operations for index storage. + +#### IndexStoragePort + +Index persistence contract: +- `writeBlob()` / `readBlob()` - Blob I/O +- `writeTree()` / `readTreeOids()` - Tree I/O +- `updateRef()` / `readRef()` - Index ref management + +#### LoggerPort + +Structured logging contract: +- `debug()`, `info()`, `warn()`, `error()` - Log levels +- `child()` - Create scoped logger with inherited context + +#### ClockPort + +Timing abstraction: +- `now()` - High-resolution timestamp (ms) +- `timestamp()` - ISO 8601 wall-clock time + +### Adapters (Implementations) + +#### GitGraphAdapter + +Implements both `GraphPersistencePort` and `IndexStoragePort`: +- Uses `@git-stunts/plumbing` for git command execution +- Retry logic with exponential backoff for transient errors +- Input validation to prevent command injection +- NUL-terminated output parsing for reliability + +#### ConsoleLogger / NoOpLogger + +- `ConsoleLogger`: Structured JSON output with configurable levels +- `NoOpLogger`: Zero-overhead silent logger for tests + +#### PerformanceClockAdapter / GlobalClockAdapter + +- `PerformanceClockAdapter`: Uses Node.js `perf_hooks` +- `GlobalClockAdapter`: Uses global 
`performance` for Bun/Deno/browsers + +## Data Flow + +### Write Path + +``` +createNode() -> GraphService.createNode() + -> persistence.commitNode() + -> GraphRefManager.syncHead() (managed mode) + -> persistence.updateRef() +``` + +### Read Path (with index) + +``` +getParents() -> BitmapIndexReader._getEdges() + -> _getOrLoadShard() (lazy load) + -> storage.readBlob() + -> Validate checksum + -> RoaringBitmap32.deserialize() + -> Map IDs to SHAs +``` + +### Index Rebuild + +``` +rebuildIndex() -> IndexRebuildService.rebuild() + -> GraphService.iterateNodes() + -> BitmapIndexBuilder.registerNode() / addEdge() + -> builder.serialize() + -> storage.writeBlob() (per shard, parallel) + -> storage.writeTree() +``` + +## The Empty Tree Trick + +All EmptyGraph nodes are Git commits pointing to the empty tree: + +``` +SHA: 4b825dc642cb6eb9a060e54bf8d69288fbee4904 +``` + +This is the well-known SHA of an empty Git tree, automatically available in every repository. + +**How it works:** +- **Data**: Stored in commit message (arbitrary payload up to 1MB default) +- **Edges**: Commit parent relationships (directed, multi-parent supported) +- **Identity**: Commit SHA (content-addressable) + +**Benefits:** +- Introduces no files into the repository working tree +- Content-addressable with automatic deduplication +- Git's proven durability and integrity (SHA verification) +- Standard tooling compatibility (`git log`, `git show`, etc.) +- Distributed replication via `git push`/`git pull` + +## Index Structure + +The bitmap index enables O(1) neighbor lookups. It is stored as a Git tree with sharded JSON blobs: + +``` +index-tree/ ++-- meta_00.json # SHA->ID mappings for prefix "00" ++-- meta_01.json # SHA->ID mappings for prefix "01" ++-- ... ++-- meta_ff.json # SHA->ID mappings for prefix "ff" ++-- shards_fwd_00.json # Forward edges (parent->children) for prefix "00" ++-- shards_rev_00.json # Reverse edges (child->parents) for prefix "00" ++-- ... 
++-- shards_fwd_ff.json ++-- shards_rev_ff.json +``` + +**Shard envelope format:** +```json +{ + "version": 2, + "checksum": "sha256-hex-of-data", + "data": { ... actual content ... } +} +``` + +**Meta shard content:** +```json +{ + "00a1b2c3d4e5f6789...": 0, + "00d4e5f6a7b8c9012...": 42 +} +``` + +**Edge shard content:** +```json +{ + "00a1b2c3d4e5f6789...": "OjAAAAEAAAAAAAEAEAAAABAAAA==" +} +``` + +Values are base64-encoded Roaring bitmaps containing numeric IDs of connected nodes. + +## Durability Model + +See [SEMANTICS.md](../SEMANTICS.md) and [ANCHORING.md](./ANCHORING.md) for full details. + +### Key Points + +1. **Reachability requirement**: Nodes must be reachable from a ref to survive Git GC +2. **Managed mode**: Automatic reachability via `GraphRefManager.syncHead()` +3. **Fast-forward optimization**: Linear history avoids anchor commits +4. **Anchor commits**: Octopus merges for disconnected subgraphs +5. **Batch API**: Efficient bulk imports with single octopus anchor + +### Sync Algorithm + +``` +syncHead(ref, newSha): + current = readRef(ref) + + if current is null: + updateRef(ref, newSha) # First write + + else if current == newSha: + return # No-op (idempotent) + + else if isAncestor(current, newSha): + updateRef(ref, newSha) # Fast-forward + + else: + anchor = commitNode({ + message: '{"_type":"anchor"}', + parents: [current, newSha] + }) + updateRef(ref, anchor) # Anchor merge +``` + +## Performance Characteristics + +| Operation | Complexity | Notes | +|-----------|------------|-------| +| Write (createNode) | O(1) | Append-only commit | +| Read (readNode) | O(1) | Direct SHA lookup | +| Unindexed traversal | O(N) | Linear scan via git log | +| Indexed lookup | O(1) | Bitmap query + ID resolution | +| Index rebuild | O(N) | One-time scan | +| Index load | O(1) initial | Lazy shard loading | + +**Memory characteristics:** +| Scenario | Approximate Memory | +|----------|-------------------| +| Cold start (no index) | Near-zero | +| Single shard 
loaded | 0.5-2 MB per prefix |
+| Full index (1M nodes) | 150-200 MB |
+
+## Error Handling
+
+Domain-specific error types enable precise error handling:
+
+- `ShardLoadError` - Storage I/O failure
+- `ShardCorruptionError` - Invalid shard format
+- `ShardValidationError` - Version/checksum mismatch
+- `TraversalError` - Algorithm failures (no path, cycle detected)
+- `OperationAbortedError` - Cancellation via AbortSignal
+- `EmptyMessageError` - Empty message validation failure
+
+## Cancellation Support
+
+Long-running operations support `AbortSignal` for cooperative cancellation:
+
+```javascript
+const controller = new AbortController();
+setTimeout(() => controller.abort(), 5000);
+
+for await (const node of graph.iterateNodes({
+  ref: 'HEAD',
+  signal: controller.signal
+})) {
+  // Process node
+}
+```
+
+Supported operations:
+- `iterateNodes()`
+- `rebuildIndex()`
+- All traversal methods (BFS, DFS, shortest path, etc.)
diff --git a/docs/WARP-ROADMAP.md b/docs/WARP-ROADMAP.md
new file mode 100644
index 0000000..68ac6a7
--- /dev/null
+++ b/docs/WARP-ROADMAP.md
@@ -0,0 +1,306 @@
+# EmptyGraph WARP Roadmap
+
+> From single-writer event log to multi-writer confluent graph database
+
+## Current State: v3.0.0
+
+Summary of what we have:
+- Managed mode with automatic durability (anchor commits)
+- Fast-forward optimization for linear history
+- Octopus anchoring for batch imports
+- Bitmap indexes for O(1) traversal
+- Single-writer model
+- Payloads in commit messages (JSON)
+
+## Target: v4.0.0 — Multi-Writer Deterministic Fold
+
+### Goals
+- Multiple writers can append independently
+- Deterministic merge (same patches → same state)
+- Payloads move from commit message to tree blobs
+- Trailer-based commit metadata
+- Checkpoints for fast state recovery
+
+### Phase 1: Plumbing (Foundation)
+
+#### 1.1 Ref Layout
+- `refs/empty-graph/<graph>/writers/<writer-id>` — per-writer patch chain
+- `refs/empty-graph/<graph>/checkpoints/head` — latest checkpoint
+- 
`refs/empty-graph/<graph>/coverage` — anchor covering all writers (reuses v3 anchor logic)
+
+#### 1.2 Trailer Codec Integration
+- Integrate @git-stunts/trailer-codec
+- Define patch commit format:
+  ```
+  empty-graph:patch
+
+  eg-kind: patch
+  eg-graph: <graph>
+  eg-writer: <writer-id>
+  eg-lamport: <lamport>
+  eg-patch-oid: <oid>
+  eg-schema: 1
+  ```
+
+#### 1.3 Tree-Based Payloads
+- Add `commitNodeWithTree(treeOid, parents, message)` to GitGraphAdapter
+- Patch data stored as `patch.cbor` blob in commit tree
+- Commit message becomes lightweight header (trailers only)
+
+#### 1.4 CBOR Encoding
+- Add cbor dependency
+- Define PatchV1 schema:
+  ```
+  PatchV1 {
+    schema: 1,
+    writer: string,
+    lamport: u64,
+    ops: [OpV1],
+    base_checkpoint?: Oid
+  }
+  ```
+
+### Phase 2: Reducer v1 (Deterministic Fold)
+
+#### 2.1 Operation Types
+- `NodeAdd { node: NodeId }`
+- `NodeTombstone { node: NodeId }`
+- `EdgeAdd { from: NodeId, to: NodeId, label: string }`
+- `EdgeTombstone { from: NodeId, to: NodeId, label: string }`
+- `PropSet { node: NodeId, key: string, value: ValueRef }`
+
+#### 2.2 EventId (Total Order)
+```
+EventId = (lamport, writer_id, patch_sha, op_index)
+```
+Lexicographic comparison gives deterministic global order.
+
+#### 2.3 State Model (LWW Registers)
+- `node_alive[node] = LWW` — winner by max EventId
+- `edge_alive[EdgeKey] = LWW` — winner by max EventId
+- `prop[node][key] = LWW` — winner by max EventId
+
+#### 2.4 Merge Algorithm
+```
+reduce(patches):
+  1. Collect all patch commits since checkpoint frontier
+  2. Decode patches, expand to ops with EventIds
+  3. Sort ops by EventId (total order)
+  4. Apply sequentially to state
+  5. 
Return state_hash for verification
+```
+
+#### 2.5 Frontier Tracking
+- Checkpoint stores `frontier: Map<writer-id, patch-sha>`
+- Reducer walks each writer's chain from frontier to head
+
+### Phase 3: Checkpoints
+
+#### 3.1 Checkpoint Commit Format
+```
+empty-graph:checkpoint
+
+eg-kind: checkpoint
+eg-graph: <graph>
+eg-state-hash: <hash>
+eg-frontier-oid: <oid>
+eg-index-oid: <oid>
+eg-schema: 1
+```
+
+#### 3.2 Checkpoint Tree Contents
+- `state.cbor` or `state/` (chunked for large graphs)
+- `frontier.cbor` — writer frontiers
+- `index/` — bitmap index shards (reuse existing)
+
+#### 3.3 Incremental Rebuild
+- Load checkpoint state
+- Reduce only patches since frontier
+- Much faster than reducing from genesis
+
+### Phase 4: API Surface
+
+#### 4.1 Multi-Writer Graph
+```js
+const graph = await EmptyGraph.openMultiWriter({
+  persistence,
+  graphName: 'events',
+  writerId: 'node-1',
+});
+```
+
+#### 4.2 Patch Creation
+```js
+const patch = graph.createPatch();
+patch.addNode('user:alice');
+patch.addEdge('user:alice', 'group:admins', 'member-of');
+patch.setProperty('user:alice', 'name', 'Alice');
+await patch.commit();
+```
+
+#### 4.3 State Materialization
+```js
+const state = await graph.materialize(); // reduce all patches
+const stateAt = await graph.materializeAt(checkpointOid); // from checkpoint
+```
+
+#### 4.4 Sync Operations
+```js
+await graph.syncCoverage(); // update coverage anchor
+await graph.createCheckpoint(); // snapshot current state
+```
+
+### Phase 5: Testing & Validation
+
+#### 5.1 Determinism Tests
+- Two replicas with same patches produce identical state_hash
+- Order of patch arrival doesn't affect final state
+
+#### 5.2 Conflict Resolution Tests
+- Concurrent NodeAdd + NodeTombstone → deterministic winner
+- Concurrent PropSet on same key → LWW by EventId
+
+#### 5.3 Performance Benchmarks
+- Reduce 10k patches
+- Checkpoint creation time
+- Incremental reduce from checkpoint
+
+---
+
+## Target: v5.0.0 — True Lattice Confluence
+
+### Goals
+- Order-independent merge 
(commutative, associative, idempotent)
+- CRDT-based state types
+- Merge is algebraic join, not sequential fold
+- Formal verification possible
+
+### Phase 6: CRDT State Types
+
+#### 6.1 Upgrade Node State
+- From: `LWW` (last-write-wins)
+- To: `OR-Set` (Observed-Remove Set)
+  - NodeAdd adds a dot (EventId)
+  - NodeTombstone removes observed dots
+  - Concurrent add/remove: add wins (CRDT semantics)
+
+#### 6.2 Upgrade Edge State
+- From: `LWW`
+- To: `OR-Set` with EdgeKey
+
+#### 6.3 Keep Property State
+- `LWW` is already a valid CRDT
+- Join = max by EventId (commutative, associative, idempotent)
+
+### Phase 7: Causal Context
+
+#### 7.1 Version Vectors
+- Each writer tracks: `VV: Map<writer-id, counter>`
+- Patch includes: `context: VV` (what writer had seen)
+
+#### 7.2 Dots
+- Each op gets: `dot: (writer_id, counter)`
+- Enables "remove what I've seen" semantics
+
+#### 7.3 PatchV2 Schema
+```
+PatchV2 {
+  schema: 2,
+  writer: string,
+  lamport: u64,
+  context: VersionVector,
+  ops: [OpV2], // each op has dot
+}
+```
+
+### Phase 8: Join-Based Merge
+
+#### 8.1 Replace Fold with Join
+- From: Sort ops, apply sequentially
+- To: Merge incoming deltas in any order
+
+```
+merge(state, patch):
+  for op in patch.ops:
+    state = join(state, op) // commutative!
+  return state
+```
+
+#### 8.2 State as Lattice
+- Nodes: OR-Set lattice
+- Edges: OR-Set lattice
+- Props: LWW-Register lattice (per node, per key)
+
+### Phase 9: Verification
+
+#### 9.1 Content-Addressed State
+- `state_hash = hash(canonical_state_bytes)`
+- Checkpoint commit ID may vary (metadata differences)
+- State hash must match across replicas
+
+#### 9.2 Merge Receipts
+- Record which patches were merged
+- Proof of correct merge (for auditing)
+
+### Phase 10: Footprints (Echo-style)
+
+#### 10.1 Read/Write Sets
+- Each patch declares: `footprint: { reads: [...], writes: [...] 
}` +- Enables fast conflict detection without full reduce + +#### 10.2 Partial Acceptance +- If footprints don't overlap: auto-merge +- If footprints conflict: policy decides (reject, rebase, etc.) + +--- + +## Migration Strategy + +### Backward Compatibility +- v4 reducer can read v3 commits (message-based payloads) +- v3 API remains available for single-writer use cases +- Gradual migration: new graphs use v4, old graphs stay v3 + +### Upgrade Path +``` +v3.0.0 (current) + │ + ├── Add trailer-codec, CBOR deps + ├── Add commitNodeWithTree() + ├── Add per-writer refs + │ +v4.0.0 (multi-writer, LWW fold) + │ + ├── Add version vectors + ├── Upgrade to OR-Set + ├── Switch to join-based merge + │ +v5.0.0 (true CRDT confluence) +``` + +--- + +## Task Estimates + +| Phase | Description | Complexity | +|-------|-------------|------------| +| 1.1-1.4 | Plumbing | Medium | +| 2.1-2.5 | Reducer v1 | High | +| 3.1-3.3 | Checkpoints | Medium | +| 4.1-4.4 | API Surface | Medium | +| 5.1-5.3 | Testing | Medium | +| 6.1-6.3 | CRDT Types | High | +| 7.1-7.3 | Causal Context | High | +| 8.1-8.2 | Join Merge | Medium | +| 9.1-9.2 | Verification | Medium | +| 10.1-10.2 | Footprints | High | + +--- + +## References + +- [SEMANTICS.md](../SEMANTICS.md) — Durability contract (v3) +- [ANCHORING.md](./ANCHORING.md) — Anchor mechanics (v3, reused in v4 coverage) +- [ARCHITECTURE.md](./ARCHITECTURE.md) — Current hexagonal architecture +- WARP Papers I-IV — Theoretical foundation +- @git-stunts/trailer-codec — Commit message encoding diff --git a/index.js b/index.js index 6b83132..af4939c 100644 --- a/index.js +++ b/index.js @@ -18,6 +18,7 @@ import NoOpLogger from './src/infrastructure/adapters/NoOpLogger.js'; import ConsoleLogger, { LogLevel } from './src/infrastructure/adapters/ConsoleLogger.js'; import PerformanceClockAdapter from './src/infrastructure/adapters/PerformanceClockAdapter.js'; import GlobalClockAdapter from './src/infrastructure/adapters/GlobalClockAdapter.js'; +import 
GraphRefManager from './src/domain/services/GraphRefManager.js'; import { IndexError, ShardLoadError, @@ -29,6 +30,105 @@ import { } from './src/domain/errors/index.js'; import { checkAborted, createTimeoutSignal } from './src/domain/utils/cancellation.js'; +/** + * Batch context for efficient bulk node creation. + * Delays ref updates until commit() is called. + */ +class GraphBatch { + constructor(graph) { + this._graph = graph; + this._createdShas = []; + this._committed = false; + } + + /** + * Creates a node without updating refs. + * @param {Object} options - Same as EmptyGraph.createNode() + * @returns {Promise} The created SHA + */ + async createNode(options) { + if (this._committed) { + throw new Error('Batch already committed'); + } + const sha = await this._graph.service.createNode(options); + this._createdShas.push(sha); + return sha; + } + + /** + * Finds SHAs that are tips (not ancestors of any other SHA in batch). + * @returns {Promise} Array of tip SHAs + * @private + */ + async _findDisconnectedTips() { + if (this._createdShas.length <= 1) { + return [...this._createdShas]; + } + + const tips = []; + for (const candidate of this._createdShas) { + let isAncestorOfAnother = false; + for (const other of this._createdShas) { + if (candidate !== other) { + if (await this._graph._persistence.isAncestor(candidate, other)) { + isAncestorOfAnother = true; + break; + } + } + } + if (!isAncestorOfAnother) { + tips.push(candidate); + } + } + return tips; + } + + /** + * Commits the batch, updating the ref once. 
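+   *
+   * Shape of the result (sketch, assuming the batch created disconnected
+   * tips): a single octopus anchor whose parents are the previous ref tip
+   * plus every disconnected tip from the batch:
+   *
+   *   ref -> anchor
+   *          parents: [previousTip, tipA, tipB, ...]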
+ * @returns {Promise<{count: number, anchor?: string}>} + */ + async commit() { + if (this._committed) { + throw new Error('Batch already committed'); + } + this._committed = true; + + if (this._createdShas.length === 0) { + return { count: 0 }; + } + + // Find disconnected tips among created SHAs + const tips = await this._findDisconnectedTips(); + + // Read current ref tip + const currentTip = await this._graph._persistence.readRef(this._graph._ref); + + // Build octopus: current tip (if exists) + all new tips + const parents = currentTip ? [currentTip, ...tips] : tips; + + // Create single octopus anchor + const anchorMessage = JSON.stringify({ _type: 'anchor' }); + const anchorSha = await this._graph._persistence.commitNode({ + message: anchorMessage, + parents, + }); + + // Update ref + await this._graph._persistence.updateRef(this._graph._ref, anchorSha); + + return { + count: this._createdShas.length, + anchor: anchorSha, + tips: tips.length, + }; + } + + /** @returns {string[]} SHAs created in this batch */ + get createdShas() { + return [...this._createdShas]; + } +} + export { GraphService, GitGraphAdapter, @@ -54,6 +154,12 @@ export { PerformanceClockAdapter, GlobalClockAdapter, + // Ref management + GraphRefManager, + + // Batching API + GraphBatch, + // Error types for integrity failure handling IndexError, ShardLoadError, @@ -116,6 +222,33 @@ export const DEFAULT_INDEX_REF = 'refs/empty-graph/index'; * const graph = new EmptyGraph({ persistence, logger }); */ export default class EmptyGraph { + /** + * Opens a managed graph with automatic durability guarantees. 
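+   *
+   * Usage sketch (`adapter` is an assumed instance implementing both
+   * persistence ports, e.g. a configured GitGraphAdapter):
+   *
+   * ```js
+   * const graph = await EmptyGraph.open({
+   *   persistence: adapter,
+   *   ref: 'refs/empty-graph/events',
+   * });
+   * const sha = await graph.createNode({ message: '{"type":"event"}' });
+   * // With the default autoSync: 'onWrite', sha is now reachable from the ref
+   * ```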
+ * + * @param {Object} options + * @param {GraphPersistencePort & IndexStoragePort} options.persistence - Adapter + * @param {string} options.ref - The ref to manage (e.g., 'refs/empty-graph/events') + * @param {'managed'|'manual'} [options.mode='managed'] - Durability mode + * @param {'onWrite'|'manual'} [options.autoSync='onWrite'] - When to sync refs + * @param {number} [options.maxMessageBytes] - Max message size + * @param {LoggerPort} [options.logger] - Logger + * @param {ClockPort} [options.clock] - Clock + * @returns {Promise} Configured graph instance + */ + static async open({ persistence, ref, mode = 'managed', autoSync = 'onWrite', ...rest }) { + const graph = new EmptyGraph({ persistence, ...rest }); + graph._ref = ref; + graph._mode = mode; + graph._autoSync = autoSync; + if (mode === 'managed') { + graph._refManager = new GraphRefManager({ + persistence, + logger: graph._logger.child({ component: 'GraphRefManager' }), + }); + } + return graph; + } + /** * Creates a new EmptyGraph instance. 
* @param {Object} options @@ -170,7 +303,14 @@ export default class EmptyGraph { * }); */ async createNode(options) { - return this.service.createNode(options); + const sha = await this.service.createNode(options); + + // In managed mode with autoSync='onWrite', sync the ref + if (this._mode === 'managed' && this._autoSync === 'onWrite' && this._refManager) { + await this._refManager.syncHead(this._ref, sha); + } + + return sha; } /** @@ -203,7 +343,54 @@ export default class EmptyGraph { * ]); */ async createNodes(nodes) { - return this.service.createNodes(nodes); + const shas = await this.service.createNodes(nodes); + + // In managed mode with autoSync='onWrite', sync with the last created node + if (this._mode === 'managed' && this._autoSync === 'onWrite' && this._refManager && shas.length > 0) { + // Sync with the last SHA - it should be reachable from all others if they're connected + // For disconnected nodes, multiple syncs may create anchors + const lastSha = shas[shas.length - 1]; + await this._refManager.syncHead(this._ref, lastSha); + } + + return shas; + } + + /** + * Manually syncs the ref to make all pending nodes reachable. + * Only needed when autoSync='manual'. + * + * @param {string} [sha] - Specific SHA to sync to. If not provided, uses last created node. + * @returns {Promise<{updated: boolean, anchor: boolean, sha: string}>} + */ + async sync(sha) { + if (!this._refManager) { + throw new Error('sync() requires managed mode. Use EmptyGraph.open() with mode="managed".'); + } + if (!sha) { + throw new Error('sha is required for sync()'); + } + return this._refManager.syncHead(this._ref, sha); + } + + /** + * Begins a batch operation for efficient bulk writes. + * + * Batch mode delays ref updates until commit() is called, + * avoiding per-node overhead for large imports. 
+ * + * @returns {GraphBatch} A batch context + * @example + * const tx = graph.beginBatch(); + * const a = await tx.createNode({ message: 'A' }); + * const b = await tx.createNode({ message: 'B', parents: [a] }); + * await tx.commit(); // Single ref update + */ + beginBatch() { + if (this._mode !== 'managed') { + throw new Error('beginBatch() requires managed mode. Use EmptyGraph.open() with mode="managed".'); + } + return new GraphBatch(this); } /** @@ -495,4 +682,117 @@ export default class EmptyGraph { async countNodes(ref) { return this.service.countNodes(ref); } + + /** + * Creates an anchor commit to make SHAs reachable from a ref. + * + * This is an advanced method for power users who want fine-grained + * control over ref management. In managed mode, this is handled + * automatically by createNode(). + * + * @param {string} ref - The ref to update + * @param {string|string[]} shas - SHA(s) to anchor + * @returns {Promise} The anchor commit SHA + * @example + * // Anchor a single disconnected node + * const anchorSha = await graph.anchor('refs/my-graph', nodeSha); + * + * @example + * // Anchor multiple nodes at once + * const anchorSha = await graph.anchor('refs/my-graph', [sha1, sha2, sha3]); + */ + async anchor(ref, shas) { + const shaArray = Array.isArray(shas) ? shas : [shas]; + + // Read current ref tip + const currentTip = await this._persistence.readRef(ref); + + // Build parents: current tip (if exists) + new SHAs + const parents = currentTip ? [currentTip, ...shaArray] : shaArray; + + // Create anchor commit + const anchorMessage = JSON.stringify({ _type: 'anchor' }); + const anchorSha = await this._persistence.commitNode({ + message: anchorMessage, + parents, + }); + + // Update ref to point to anchor + await this._persistence.updateRef(ref, anchorSha); + + return anchorSha; + } + + /** + * Compacts anchor chains into a single octopus anchor. 
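+   *
+   * Effect on the ref (sketch):
+   *
+   *   before: ref -> anchorN -> ... -> anchor1 -> nodes
+   *   after:  ref -> anchor(parents: [tipA, tipB, ...]) -> nodes
+   *
+   * The old anchor chain becomes unreachable from the ref and is
+   * eventually garbage-collected by Git.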
+ * + * Walks from the ref through commits, identifies real (non-anchor) tips, + * and creates a fresh octopus anchor with those tips as parents. + * Useful for cleaning up after many incremental writes. + * + * @param {string} [ref] - The ref to compact (defaults to graph's managed ref) + * @returns {Promise<{compacted: boolean, oldAnchors: number, tips: number, newAnchor?: string}>} + * @example + * // After many incremental writes + * const result = await graph.compactAnchors(); + * console.log(`Replaced ${result.oldAnchors} anchors with 1`); + */ + async compactAnchors(ref) { + const targetRef = ref || this._ref; + if (!targetRef) { + throw new Error('compactAnchors() requires a ref. Use EmptyGraph.open() or pass ref parameter.'); + } + + const currentTip = await this._persistence.readRef(targetRef); + if (!currentTip) { + return { compacted: false, oldAnchors: 0, tips: 0 }; + } + + // Collect all commits reachable from ref, separate anchors from real nodes + const anchors = []; + const realNodes = []; + + for await (const node of this.iterateNodes({ ref: targetRef, limit: 1000000 })) { + if (node.message.startsWith('{"_type":"anchor"')) { + anchors.push(node.sha); + } else { + realNodes.push(node); + } + } + + // If no anchors, nothing to compact + if (anchors.length === 0) { + return { compacted: false, oldAnchors: 0, tips: realNodes.length }; + } + + // Find tips: real nodes that have no children among real nodes + const hasChild = new Set(); + for (const node of realNodes) { + for (const parent of node.parents) { + hasChild.add(parent); + } + } + const tips = realNodes.filter(n => !hasChild.has(n.sha)).map(n => n.sha); + + if (tips.length === 0) { + return { compacted: false, oldAnchors: anchors.length, tips: 0 }; + } + + // Create single octopus anchor with all tips + const anchorMessage = JSON.stringify({ _type: 'anchor' }); + const newAnchor = await this._persistence.commitNode({ + message: anchorMessage, + parents: tips, + }); + + // Update ref to point 
to new anchor + await this._persistence.updateRef(targetRef, newAnchor); + + return { + compacted: true, + oldAnchors: anchors.length, + tips: tips.length, + newAnchor, + }; + } } diff --git a/package.json b/package.json index caec2bd..2b93756 100644 --- a/package.json +++ b/package.json @@ -48,6 +48,7 @@ "author": "James Ross ", "license": "Apache-2.0", "dependencies": { + "@git-stunts/alfred": "file:../alfred", "@git-stunts/plumbing": "^2.8.0", "@git-stunts/trailer-codec": "^2.1.1", "roaring": "^2.7.0", diff --git a/src/domain/errors/EmptyMessageError.js b/src/domain/errors/EmptyMessageError.js new file mode 100644 index 0000000..39a7049 --- /dev/null +++ b/src/domain/errors/EmptyMessageError.js @@ -0,0 +1,48 @@ +import IndexError from './IndexError.js'; + +/** + * Error thrown when a message is empty or contains only whitespace. + * + * This error indicates that an operation received an empty message + * where content was required. + * + * @class EmptyMessageError + * @extends IndexError + * + * @property {string} name - The error name ('EmptyMessageError') + * @property {string} code - Error code ('EMPTY_MESSAGE') + * @property {string} operation - The operation that failed due to empty message + * @property {Object} context - Serializable context object for debugging + * + * @example + * if (!message || message.trim() === '') { + * throw new EmptyMessageError('Message cannot be empty', { + * operation: 'createNode', + * context: { nodeType: 'commit' } + * }); + * } + */ +export default class EmptyMessageError extends IndexError { + /** + * Creates a new EmptyMessageError. 
+ * + * @param {string} message - Human-readable error message + * @param {Object} [options={}] - Error options + * @param {string} [options.operation] - The operation that failed + * @param {Object} [options.context={}] - Additional context for debugging + */ + constructor(message, options = {}) { + const context = { + ...options.context, + operation: options.operation, + }; + + super(message, { + code: 'EMPTY_MESSAGE', + context, + }); + + this.name = 'EmptyMessageError'; + this.operation = options.operation; + } +} diff --git a/src/domain/errors/index.js b/src/domain/errors/index.js index 6ce5000..bfa4761 100644 --- a/src/domain/errors/index.js +++ b/src/domain/errors/index.js @@ -4,10 +4,11 @@ * @module domain/errors */ +export { default as EmptyMessageError } from './EmptyMessageError.js'; export { default as IndexError } from './IndexError.js'; -export { default as ShardLoadError } from './ShardLoadError.js'; +export { default as OperationAbortedError } from './OperationAbortedError.js'; export { default as ShardCorruptionError } from './ShardCorruptionError.js'; +export { default as ShardLoadError } from './ShardLoadError.js'; export { default as ShardValidationError } from './ShardValidationError.js'; export { default as StorageError } from './StorageError.js'; export { default as TraversalError } from './TraversalError.js'; -export { default as OperationAbortedError } from './OperationAbortedError.js'; diff --git a/src/domain/services/BitmapIndexBuilder.js b/src/domain/services/BitmapIndexBuilder.js index bce8fe3..f0350d7 100644 --- a/src/domain/services/BitmapIndexBuilder.js +++ b/src/domain/services/BitmapIndexBuilder.js @@ -8,17 +8,37 @@ const { RoaringBitmap32 } = roaring; * Increment when changing the shard structure. * @const {number} */ -export const SHARD_VERSION = 1; +export const SHARD_VERSION = 2; + +/** + * Produces a canonical JSON string with deterministic key ordering. 
+ * Recursively sorts object keys alphabetically to ensure consistent + * output across different JavaScript engines. + * + * @param {*} obj - The value to stringify + * @returns {string} Canonical JSON string + */ +const canonicalStringify = (obj) => { + if (obj === null || typeof obj !== 'object') { + return JSON.stringify(obj); + } + if (Array.isArray(obj)) { + return '[' + obj.map(canonicalStringify).join(',') + ']'; + } + const keys = Object.keys(obj).sort(); + return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalStringify(obj[k])).join(',') + '}'; +}; /** * Computes a SHA-256 checksum of the given data. - * Used to verify shard integrity on load. + * Uses canonical JSON stringification for deterministic output + * across different JavaScript engines. * * @param {Object} data - The data object to checksum * @returns {string} Hex-encoded SHA-256 hash */ const computeChecksum = (data) => { - const json = JSON.stringify(data); + const json = canonicalStringify(data); return createHash('sha256').update(json).digest('hex'); }; diff --git a/src/domain/services/BitmapIndexReader.js b/src/domain/services/BitmapIndexReader.js index 0fab230..768aa14 100644 --- a/src/domain/services/BitmapIndexReader.js +++ b/src/domain/services/BitmapIndexReader.js @@ -7,11 +7,12 @@ import LRUCache from '../utils/LRUCache.js'; const { RoaringBitmap32 } = roaring; /** - * Shard format version for forward compatibility. - * Must match the version used by BitmapIndexBuilder. - * @const {number} + * Supported shard format versions for backward compatibility. + * Version 1: Original format using JSON.stringify for checksums + * Version 2: Uses canonicalStringify for deterministic checksums + * @const {number[]} */ -const SHARD_VERSION = 1; +const SUPPORTED_SHARD_VERSIONS = [1, 2]; /** * Default maximum number of shards to cache. 
@@ -19,15 +20,35 @@ const SHARD_VERSION = 1; */ const DEFAULT_MAX_CACHED_SHARDS = 100; +/** + * Produces a canonical JSON string with deterministic key ordering. + * Recursively sorts object keys alphabetically to ensure consistent + * output across different JavaScript engines. + * + * @param {*} obj - The value to stringify + * @returns {string} Canonical JSON string + */ +const canonicalStringify = (obj) => { + if (obj === null || typeof obj !== 'object') { + return JSON.stringify(obj); + } + if (Array.isArray(obj)) { + return '[' + obj.map(canonicalStringify).join(',') + ']'; + } + const keys = Object.keys(obj).sort(); + return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalStringify(obj[k])).join(',') + '}'; +}; + /** * Computes a SHA-256 checksum of the given data. * Used to verify shard integrity on load. * * @param {Object} data - The data object to checksum + * @param {number} [version=2] - Shard version (1 uses JSON.stringify, 2+ uses canonicalStringify) * @returns {string} Hex-encoded SHA-256 hash */ -const computeChecksum = (data) => { - const json = JSON.stringify(data); +const computeChecksum = (data, version = 2) => { + const json = version === 1 ? JSON.stringify(data) : canonicalStringify(data); return createHash('sha256').update(json).digest('hex'); }; @@ -219,6 +240,16 @@ export default class BitmapIndexReader { } } + const entryCount = this._idToShaCache.length; + if (entryCount > 1_000_000) { + this.logger.warn('ID-to-SHA cache has high memory usage', { + operation: '_buildIdToShaMapping', + entryCount, + estimatedMemoryBytes: entryCount * 40, + message: `Cache contains ${entryCount} entries (~40 bytes per entry). 
Consider pagination or streaming for very large graphs.`, + }); + } + return this._idToShaCache; } @@ -249,15 +280,16 @@ export default class BitmapIndexReader { reason: 'missing_or_invalid_data', }); } - if (envelope.version !== SHARD_VERSION) { - throw new ShardValidationError('Version mismatch', { + if (!SUPPORTED_SHARD_VERSIONS.includes(envelope.version)) { + throw new ShardValidationError('Unsupported version', { shardPath: path, - expected: SHARD_VERSION, + expected: SUPPORTED_SHARD_VERSIONS, actual: envelope.version, field: 'version', }); } - const actualChecksum = computeChecksum(envelope.data); + // Use version-appropriate checksum computation for backward compatibility + const actualChecksum = computeChecksum(envelope.data, envelope.version); if (envelope.checksum !== actualChecksum) { throw new ShardValidationError('Checksum mismatch', { shardPath: path, diff --git a/src/domain/services/GraphRefManager.js b/src/domain/services/GraphRefManager.js new file mode 100644 index 0000000..97cd73d --- /dev/null +++ b/src/domain/services/GraphRefManager.js @@ -0,0 +1,213 @@ +import { performance } from 'perf_hooks'; +import NoOpLogger from '../../infrastructure/adapters/NoOpLogger.js'; + +/** + * Domain service for managing graph ref reachability. + * + * This service implements the core durability contract for EmptyGraph by ensuring + * that commits remain reachable from the graph ref and are not subject to Git + * garbage collection. 
+ * + * Key responsibilities: + * - Reading and updating the graph ref + * - Creating anchor commits when needed to maintain reachability + * - Implementing the sync algorithm from SEMANTICS.md + * + * @example + * // Basic usage + * const refManager = new GraphRefManager({ persistence: gitAdapter }); + * const result = await refManager.syncHead('refs/empty-graph/index', newCommitSha); + * + * @example + * // With logging enabled + * const refManager = new GraphRefManager({ persistence: gitAdapter, logger: consoleLogger }); + */ +export default class GraphRefManager { + /** + * Creates a new GraphRefManager instance. + * + * @param {Object} options - Configuration options + * @param {Object} options.persistence - Persistence adapter implementing GraphPersistencePort. + * Required methods: readRef, updateRef, commitNode, isAncestor + * @param {import('../../ports/LoggerPort.js').default} [options.logger] - Logger for structured logging. + * Defaults to NoOpLogger (no logging). Inject ConsoleLogger or custom logger for output. + */ + constructor({ persistence, logger = new NoOpLogger() }) { + if (!persistence) { + throw new Error('GraphRefManager requires a persistence adapter'); + } + this.persistence = persistence; + this.logger = logger; + } + + /** + * Reads the current SHA that a ref points to.
+ * + * @param {string} ref - The ref name (e.g., 'refs/empty-graph/index') + * @returns {Promise<string|null>} The SHA the ref points to, or null if ref doesn't exist + * + * @example + * const sha = await refManager.readHead('refs/empty-graph/index'); + * if (sha) { + * console.log(`Current tip: ${sha}`); + * } else { + * console.log('Ref does not exist yet'); + * } + */ + async readHead(ref) { + const startTime = performance.now(); + const sha = await this.persistence.readRef(ref); + const durationMs = performance.now() - startTime; + + this.logger.debug('Read head', { + operation: 'readHead', + ref, + sha, + exists: sha !== null, + durationMs, + }); + + return sha; + } + + /** + * Synchronizes the ref to include a new commit, ensuring reachability. + * + * Implements the sync algorithm from SEMANTICS.md: + * 1. If ref does not exist: create ref pointing to newTipSha + * 2. If ref already points to newTipSha: no-op (returns updated: false) + * 3. If ref tip is ancestor of newTipSha: fast-forward ref to newTipSha + * 4. Otherwise: create anchor commit with parents [ref_tip, newTipSha], update ref to anchor + * + * Uses persistence.isAncestor() for fast-forward detection, which delegates to + * `git merge-base --is-ancestor` to check reachability.
+ * + * @param {string} ref - The ref name to sync (e.g., 'refs/empty-graph/index') + * @param {string} newTipSha - The SHA of the commit that needs to become reachable + * @returns {Promise<{updated: boolean, anchor: boolean, sha: string}>} Sync result: + * - updated: true if ref was changed, false if already at target + * - anchor: true if an anchor commit was created, false if direct update or no-op + * - sha: the SHA the ref now points to (either newTipSha or the anchor SHA) + * + * @example + * // First write to a new graph + * const result = await refManager.syncHead('refs/empty-graph/index', firstCommitSha); + * // result: { updated: true, anchor: false, sha: firstCommitSha } + * + * @example + * // Write that requires an anchor + * const result = await refManager.syncHead('refs/empty-graph/index', disconnectedCommitSha); + * // result: { updated: true, anchor: true, sha: anchorSha } + */ + async syncHead(ref, newTipSha) { + const startTime = performance.now(); + const currentTip = await this.readHead(ref); + + if (!currentTip) { + // No ref exists - create it + await this.persistence.updateRef(ref, newTipSha); + const durationMs = performance.now() - startTime; + this.logger.debug('Ref created', { + operation: 'syncHead', + ref, + sha: newTipSha, + anchor: false, + durationMs, + }); + return { updated: true, anchor: false, sha: newTipSha }; + } + + if (currentTip === newTipSha) { + // Already pointing here + return { updated: false, anchor: false, sha: currentTip }; + } + + // NEW: Fast-forward check - if current tip is ancestor of new commit, just move ref + if (await this.persistence.isAncestor(currentTip, newTipSha)) { + await this.persistence.updateRef(ref, newTipSha); + const durationMs = performance.now() - startTime; + this.logger.debug('Ref fast-forwarded', { + operation: 'syncHead', + ref, + sha: newTipSha, + previousTip: currentTip, + anchor: false, + durationMs, + }); + return { updated: true, anchor: false, sha: newTipSha }; + } + + // Divergent 
history - create anchor + const anchorSha = await this.createAnchor([currentTip, newTipSha]); + await this.persistence.updateRef(ref, anchorSha); + const durationMs = performance.now() - startTime; + this.logger.debug('Anchor created', { + operation: 'syncHead', + ref, + sha: anchorSha, + parents: [currentTip, newTipSha], + anchor: true, + durationMs, + }); + return { updated: true, anchor: true, sha: anchorSha }; + } + + /** + * Creates an anchor commit that unifies multiple parent commits. + * + * Anchor commits are internal bookkeeping to maintain reachability. + * They have a special payload marker and should be transparent to + * graph traversal operations. + * + * @param {string[]} parents - Array of parent SHAs to include in the anchor + * @returns {Promise<string>} The SHA of the created anchor commit + * + * @example + * // Create anchor unifying old tip and new disconnected commit + * const anchorSha = await refManager.createAnchor([oldTipSha, newCommitSha]); + */ + async createAnchor(parents) { + const startTime = performance.now(); + + // Anchor commits have a JSON payload marking their type + const message = JSON.stringify({ _type: 'anchor' }); + + const anchorSha = await this.persistence.commitNode({ + message, + parents, + }); + + const durationMs = performance.now() - startTime; + this.logger.debug('Anchor created', { + operation: 'createAnchor', + anchorSha, + parentCount: parents.length, + parents, + durationMs, + }); + + return anchorSha; + } + + /** + * Checks if potentialAncestor is reachable from descendant. + * + * @deprecated This method is no longer used by syncHead(), which now delegates + * directly to persistence.isAncestor(). This stub remains for backwards + * compatibility but always returns false.
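The four-way decision in `syncHead` can be exercised in isolation with a synchronous sketch. Everything here is an invention for the example (`syncHeadDecision`, the toy DAG, the injected callbacks); the real method is async and goes through the persistence adapter:

```javascript
// Synchronous sketch of the syncHead decision logic. isAncestor and
// createAnchor are injected so the four cases can be tested in isolation.
function syncHeadDecision({ currentTip, newTipSha, isAncestor, createAnchor }) {
  if (!currentTip) {
    return { updated: true, anchor: false, sha: newTipSha };   // case 1: create ref
  }
  if (currentTip === newTipSha) {
    return { updated: false, anchor: false, sha: currentTip }; // case 2: no-op
  }
  if (isAncestor(currentTip, newTipSha)) {
    return { updated: true, anchor: false, sha: newTipSha };   // case 3: fast-forward
  }
  const anchorSha = createAnchor([currentTip, newTipSha]);     // case 4: anchor
  return { updated: true, anchor: true, sha: anchorSha };
}

// Toy DAG: a <- b, while c is disconnected.
const parents = { b: ['a'], a: [], c: [] };
const isAncestor = (anc, desc) => {
  const stack = [desc];
  while (stack.length) {
    const sha = stack.pop();
    if (sha === anc) return true;
    stack.push(...(parents[sha] ?? []));
  }
  return false;
};
const createAnchor = (ps) => { parents.anchor = ps; return 'anchor'; };

const ff = syncHeadDecision({ currentTip: 'a', newTipSha: 'b', isAncestor, createAnchor });
const div = syncHeadDecision({ currentTip: 'b', newTipSha: 'c', isAncestor, createAnchor });
// ff  -> { updated: true, anchor: false, sha: 'b' }      (fast-forward)
// div -> { updated: true, anchor: true, sha: 'anchor' }  (divergent, anchored)
```

The anchor case is what keeps a disconnected subgraph reachable from the ref without rewriting either history.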
+ * + * @param {string} potentialAncestor - SHA to check as potential ancestor + * @param {string} descendant - SHA to check as potential descendant + * @returns {Promise<boolean>} Always returns false (stub implementation) + */ + async isAncestor(potentialAncestor, descendant) { + this.logger.debug('isAncestor stub called (always returns false)', { + operation: 'isAncestor', + potentialAncestor, + descendant, + result: false, + }); + + return false; + } +} diff --git a/src/domain/errors/index.js b/src/domain/services/GraphService.js index fa6752d..5efc794 100644 --- a/src/domain/services/GraphService.js +++ b/src/domain/services/GraphService.js @@ -3,6 +3,7 @@ import GitLogParser, { RECORD_SEPARATOR } from './GitLogParser.js'; import GraphNode from '../entities/GraphNode.js'; import NoOpLogger from '../../infrastructure/adapters/NoOpLogger.js'; import { checkAborted } from '../utils/cancellation.js'; +import EmptyMessageError from '../errors/EmptyMessageError.js'; /** Default maximum message size in bytes (1MB) */ export const DEFAULT_MAX_MESSAGE_BYTES = 1048576; @@ -75,6 +76,9 @@ export default class GraphService { if (typeof message !== 'string') { throw new Error('message must be a string'); } + if (message.length === 0) { + throw new EmptyMessageError('message must be non-empty', { operation: 'createNode' }); + } // Validate message size const messageBytes = Buffer.byteLength(message, 'utf-8'); if (messageBytes > this.maxMessageBytes) { @@ -319,6 +323,12 @@ export default class GraphService { if (typeof message !== 'string') { throw new Error(`Node at index ${index}: message must be a string`); } + if (message.length === 0) { + throw new EmptyMessageError(`Node at index ${index}: message must be non-empty`, { + operation: 'createNodes', + context: { index } + }); + } const messageBytes = Buffer.byteLength(message, 'utf-8'); if (messageBytes > this.maxMessageBytes) { diff --git a/src/domain/services/HealthCheckService.js
b/src/domain/services/HealthCheckService.js index a582049..a8f5197 100644 --- a/src/domain/services/HealthCheckService.js +++ b/src/domain/services/HealthCheckService.js @@ -1,4 +1,5 @@ import NoOpLogger from '../../infrastructure/adapters/NoOpLogger.js'; +import CachedValue from '../utils/CachedValue.js'; /** * Default TTL for health check cache in milliseconds. @@ -53,16 +54,17 @@ export default class HealthCheckService { constructor({ persistence, clock, cacheTtlMs = DEFAULT_CACHE_TTL_MS, logger = new NoOpLogger() }) { this._persistence = persistence; this._clock = clock; - this._cacheTtlMs = cacheTtlMs; this._logger = logger; /** @type {import('./BitmapIndexReader.js').default|null} */ this._indexReader = null; - // Cache state - this._cachedHealth = null; - this._cacheTimestamp = 0; - this._cachedTimestampIso = null; + // Health check cache + this._healthCache = new CachedValue({ + clock, + ttlMs: cacheTtlMs, + compute: () => this._computeHealth(), + }); } /** @@ -72,29 +74,7 @@ export default class HealthCheckService { */ setIndexReader(reader) { this._indexReader = reader; - this._invalidateCache(); - } - - /** - * Invalidates the cached health result. - * Call this when state changes (e.g., index loaded/unloaded). - */ - _invalidateCache() { - this._cachedHealth = null; - this._cacheTimestamp = 0; - this._cachedTimestampIso = null; - } - - /** - * Checks if the cached health result is still valid. 
- * @returns {boolean} - * @private - */ - _isCacheValid() { - if (!this._cachedHealth) { - return false; - } - return this._clock.now() - this._cacheTimestamp < this._cacheTtlMs; + this._healthCache.invalidate(); } /** @@ -149,16 +129,32 @@ export default class HealthCheckService { * @property {number} [shardCount] - Number of shards (if loaded) */ async getHealth() { - // Return cached result if valid - if (this._isCacheValid()) { - return { - ...this._cachedHealth, - cachedAt: this._cachedTimestampIso, - }; + const { value, cachedAt, fromCache } = await this._healthCache.getWithMetadata(); + + if (cachedAt) { + return { ...value, cachedAt }; } - const start = this._clock.now(); + // Log only for fresh computations + if (!fromCache) { + this._logger.debug('Health check completed', { + operation: 'getHealth', + status: value.status, + repositoryStatus: value.components.repository.status, + indexStatus: value.components.index.status, + }); + } + + return value; + } + /** + * Computes health by checking all components. + * This is called by CachedValue when the cache is stale. 
+ * @returns {Promise} + * @private + */ + async _computeHealth() { // Check repository health const repositoryHealth = await this._checkRepository(); @@ -168,29 +164,13 @@ export default class HealthCheckService { // Determine overall status const status = this._computeOverallStatus(repositoryHealth, indexHealth); - const health = { + return { status, components: { repository: repositoryHealth, index: indexHealth, }, }; - - // Cache the result - this._cachedHealth = health; - this._cacheTimestamp = this._clock.now(); - this._cachedTimestampIso = this._clock.timestamp(); - - const durationMs = this._clock.now() - start; - this._logger.debug('Health check completed', { - operation: 'getHealth', - status, - durationMs, - repositoryStatus: repositoryHealth.status, - indexStatus: indexHealth.status, - }); - - return health; } /** diff --git a/src/domain/services/IndexRebuildService.js b/src/domain/services/IndexRebuildService.js index 9316dcc..a5abbeb 100644 --- a/src/domain/services/IndexRebuildService.js +++ b/src/domain/services/IndexRebuildService.js @@ -203,7 +203,7 @@ export default class IndexRebuildService { } } - return builder.finalize(); + return builder.finalize({ signal }); } /** diff --git a/src/domain/services/StreamingBitmapIndexBuilder.js b/src/domain/services/StreamingBitmapIndexBuilder.js index 9ccaf49..3eca5de 100644 --- a/src/domain/services/StreamingBitmapIndexBuilder.js +++ b/src/domain/services/StreamingBitmapIndexBuilder.js @@ -6,6 +6,7 @@ const { RoaringBitmap32 } = roaring; import ShardCorruptionError from '../errors/ShardCorruptionError.js'; import ShardValidationError from '../errors/ShardValidationError.js'; import NoOpLogger from '../../infrastructure/adapters/NoOpLogger.js'; +import { checkAborted } from '../utils/cancellation.js'; /** * Current shard format version. @@ -302,13 +303,16 @@ export default class StreamingBitmapIndexBuilder { /** * Processes bitmap shards, merging chunks if necessary. 
* + * @param {Object} [options] - Options + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation + * @returns {Promise<string[]>} Array of tree entry strings + * @private + */ - async _processBitmapShards() { + async _processBitmapShards({ signal } = {}) { return Promise.all( Array.from(this.flushedChunks.entries()).map(async ([path, oids]) => { - const finalOid = oids.length === 1 ? oids[0] : await this._mergeChunks(oids); + checkAborted(signal, 'processBitmapShards'); + const finalOid = oids.length === 1 ? oids[0] : await this._mergeChunks(oids, { signal }); return `100644 blob ${finalOid}\t${path}`; }) ); @@ -326,9 +330,11 @@ export default class StreamingBitmapIndexBuilder { * Meta shards and bitmap shards are processed in parallel using Promise.all * since they are independent (prefix-based partitioning). * + * @param {Object} [options] - Finalization options + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @returns {Promise<string>} OID of the created tree containing the index */ - async finalize() { + async finalize({ signal } = {}) { this.logger.debug('Finalizing index', { operation: 'finalize', nodeCount: this.shaToId.size, @@ -336,11 +342,15 @@ export default class StreamingBitmapIndexBuilder { flushCount: this.flushCount, }); + checkAborted(signal, 'finalize'); await this.flush(); + checkAborted(signal, 'finalize'); const idShards = this._buildMetaShards(); const metaEntries = await this._writeMetaShards(idShards); - const bitmapEntries = await this._processBitmapShards(); + + checkAborted(signal, 'finalize'); + const bitmapEntries = await this._processBitmapShards({ signal }); const flatEntries = [...metaEntries, ...bitmapEntries]; const treeOid = await this.storage.writeTree(flatEntries); @@ -506,16 +516,19 @@ export default class StreamingBitmapIndexBuilder { * Throws ShardValidationError on version mismatch or ShardCorruptionError on checksum mismatch.
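The `checkAborted(signal, op)` calls threaded through `finalize()` and the traversal loops follow a standard cooperative-cancellation pattern: long loops poll the signal at checkpoints instead of on every step. A runnable sketch (the stand-in helper and its error name are assumptions; the real helper lives in `domain/utils/cancellation.js`):

```javascript
// Minimal stand-in for the checkAborted helper. Assumption: the real one
// throws an OperationAbortedError when signal.aborted is set.
function checkAborted(signal, operation) {
  if (signal?.aborted) {
    const err = new Error(`Operation aborted: ${operation}`);
    err.name = 'OperationAbortedError';
    throw err;
  }
}

const controller = new AbortController();

// A long loop that polls the signal every 1000 iterations, mirroring the
// `% 1000 === 0` checkpoints threaded through the traversal methods.
function countTo(limit, { signal, abortAt }) {
  for (let n = 0; n < limit; n++) {
    if (n % 1000 === 0) checkAborted(signal, 'countTo');
    if (n === abortAt) controller.abort(); // simulate an external abort mid-run
  }
  return limit;
}

let abortedName = null;
try {
  countTo(10_000, { signal: controller.signal, abortAt: 2_500 });
} catch (err) {
  abortedName = err.name;
}
// The loop stops at the next checkpoint (n === 3000), not instantly:
// polling trades abort latency for near-zero per-iteration overhead.
```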
* * @param {string[]} oids - Blob OIDs of chunks to merge + * @param {Object} [options] - Options + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation + * @returns {Promise<string>} OID of merged shard blob + * @throws {ShardValidationError} If a chunk has an unsupported version + * @throws {ShardCorruptionError} If a chunk's checksum does not match + * @private + */ - async _mergeChunks(oids) { + async _mergeChunks(oids, { signal } = {}) { // Load all chunks and merge bitmaps by SHA const merged = {}; for (const oid of oids) { + checkAborted(signal, 'mergeChunks'); const chunk = await this._loadAndValidateChunk(oid); for (const [sha, base64Bitmap] of Object.entries(chunk)) { diff --git a/src/domain/services/TraversalService.js b/src/domain/services/TraversalService.js index c017392..8375f1d 100644 --- a/src/domain/services/TraversalService.js +++ b/src/domain/services/TraversalService.js @@ -10,6 +10,7 @@ import NoOpLogger from '../../infrastructure/adapters/NoOpLogger.js'; import TraversalError from '../errors/TraversalError.js'; import MinHeap from '../utils/MinHeap.js'; +import { checkAborted } from '../utils/cancellation.js'; /** * @typedef {'forward' | 'reverse'} TraversalDirection @@ -92,6 +93,7 @@ export default class TraversalService { * @param {number} [options.maxNodes=100000] - Maximum nodes to visit * @param {number} [options.maxDepth=1000] - Maximum depth to traverse * @param {TraversalDirection} [options.direction='forward'] - Traversal direction + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @yields {TraversalNode} * * @example * for await (const node of traversal.bfs({ start: sha })) { * console.log(`${node.sha} at depth ${node.depth}`); * } */ - async *bfs({ start, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, direction = 'forward' }) { + async *bfs({ start, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, direction = 'forward', signal }) { const visited = new Set(); const
queue = [{ sha: start, depth: 0, parent: null }]; let nodesYielded = 0; @@ -107,6 +109,10 @@ export default class TraversalService { this._logger.debug('BFS started', { start, direction, maxNodes, maxDepth }); while (queue.length > 0 && nodesYielded < maxNodes) { + if (nodesYielded % 1000 === 0) { + checkAborted(signal, 'bfs'); + } + const current = queue.shift(); if (visited.has(current.sha)) { continue; } @@ -137,9 +143,10 @@ export default class TraversalService { * @param {number} [options.maxNodes=100000] - Maximum nodes to visit * @param {number} [options.maxDepth=1000] - Maximum depth to traverse * @param {TraversalDirection} [options.direction='forward'] - Traversal direction + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @yields {TraversalNode} */ - async *dfs({ start, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, direction = 'forward' }) { + async *dfs({ start, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, direction = 'forward', signal }) { const visited = new Set(); const stack = [{ sha: start, depth: 0, parent: null }]; let nodesYielded = 0; @@ -147,6 +154,10 @@ export default class TraversalService { this._logger.debug('DFS started', { start, direction, maxNodes, maxDepth }); while (stack.length > 0 && nodesYielded < maxNodes) { + if (nodesYielded % 1000 === 0) { + checkAborted(signal, 'dfs'); + } + const current = stack.pop(); if (visited.has(current.sha)) { continue; } @@ -177,10 +188,11 @@ export default class TraversalService { * @param {string} options.sha - Starting node SHA * @param {number} [options.maxNodes=100000] - Maximum nodes to visit * @param {number} [options.maxDepth=1000] - Maximum depth to traverse + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @yields {TraversalNode} */ - async *ancestors({ sha, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH }) { - yield* this.bfs({ start: sha, maxNodes, maxDepth, direction: 'reverse' 
}); + async *ancestors({ sha, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, signal }) { + yield* this.bfs({ start: sha, maxNodes, maxDepth, direction: 'reverse', signal }); } /** @@ -190,10 +202,11 @@ export default class TraversalService { * @param {string} options.sha - Starting node SHA * @param {number} [options.maxNodes=100000] - Maximum nodes to visit * @param {number} [options.maxDepth=1000] - Maximum depth to traverse + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @yields {TraversalNode} */ - async *descendants({ sha, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH }) { - yield* this.bfs({ start: sha, maxNodes, maxDepth, direction: 'forward' }); + async *descendants({ sha, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, signal }) { + yield* this.bfs({ start: sha, maxNodes, maxDepth, direction: 'forward', signal }); } /** @@ -204,9 +217,10 @@ export default class TraversalService { * @param {string} options.to - Target node SHA * @param {number} [options.maxNodes=100000] - Maximum nodes to visit * @param {number} [options.maxDepth=1000] - Maximum search depth + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @returns {Promise} */ - async findPath({ from, to, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH }) { + async findPath({ from, to, maxNodes = DEFAULT_MAX_NODES, maxDepth = DEFAULT_MAX_DEPTH, signal }) { if (from === to) { return { found: true, path: [from], length: 0 }; } @@ -218,6 +232,10 @@ export default class TraversalService { const queue = [{ sha: from, depth: 0 }]; while (queue.length > 0 && visited.size < maxNodes) { + if (visited.size % 1000 === 0) { + checkAborted(signal, 'findPath'); + } + const current = queue.shift(); if (current.depth > maxDepth) { continue; } @@ -254,9 +272,10 @@ export default class TraversalService { * @param {string} options.from - Source node SHA * @param {string} options.to - Target node SHA * 
@param {number} [options.maxDepth=1000] - Maximum search depth + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @returns {Promise} */ - async shortestPath({ from, to, maxDepth = DEFAULT_MAX_DEPTH }) { + async shortestPath({ from, to, maxDepth = DEFAULT_MAX_DEPTH, signal }) { if (from === to) { return { found: true, path: [from], length: 0 }; } @@ -274,6 +293,8 @@ export default class TraversalService { let bwdFrontier = [to]; for (let depth = 0; depth < maxDepth; depth++) { + checkAborted(signal, 'shortestPath'); + // Check if frontiers are exhausted if (fwdFrontier.length === 0 && bwdFrontier.length === 0) { break; @@ -338,10 +359,11 @@ export default class TraversalService { * @param {string} options.to - Target SHA * @param {Function} [options.weightProvider] - Callback (fromSha, toSha) => number, defaults to 1 * @param {string} [options.direction='children'] - 'children' or 'parents' + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @returns {Promise<{path: string[], totalCost: number}>} * @throws {TraversalError} If no path exists between from and to */ - async weightedShortestPath({ from, to, weightProvider = () => 1, direction = 'children' }) { + async weightedShortestPath({ from, to, weightProvider = () => 1, direction = 'children', signal }) { this._logger.debug('weightedShortestPath started', { from, to, direction }); // Initialize distances map with Infinity for all except `from` (0) @@ -359,6 +381,10 @@ export default class TraversalService { const visited = new Set(); while (!pq.isEmpty()) { + if (visited.size % 1000 === 0) { + checkAborted(signal, 'weightedShortestPath'); + } + const current = pq.extractMin(); // Skip if already visited @@ -427,10 +453,11 @@ export default class TraversalService { * @param {Function} [options.weightProvider] - (fromSha, toSha) => number, defaults to 1 * @param {Function} [options.heuristicProvider] - (sha, targetSha) => number, defaults to 0 (becomes 
Dijkstra) * @param {string} [options.direction='children'] - 'children' or 'parents' + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @returns {Promise<{path: string[], totalCost: number, nodesExplored: number}>} * @throws {TraversalError} If no path exists */ - async aStarSearch({ from, to, weightProvider = () => 1, heuristicProvider = () => 0, direction = 'children' }) { + async aStarSearch({ from, to, weightProvider = () => 1, heuristicProvider = () => 0, direction = 'children', signal }) { this._logger.debug('aStarSearch started', { from, to, direction }); // Epsilon for tie-breaking: small enough not to affect ordering by f, @@ -462,6 +489,10 @@ export default class TraversalService { let nodesExplored = 0; while (!pq.isEmpty()) { + if (nodesExplored % 1000 === 0) { + checkAborted(signal, 'aStarSearch'); + } + const current = pq.extractMin(); // Skip if already visited @@ -529,6 +560,7 @@ export default class TraversalService { * @param {Function} [options.weightProvider] - (fromSha, toSha) => number * @param {Function} [options.forwardHeuristic] - (sha, targetSha) => number, for forward search * @param {Function} [options.backwardHeuristic] - (sha, targetSha) => number, for backward search + * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation * @returns {Promise<{path: string[], totalCost: number, nodesExplored: number}>} * @throws {TraversalError} If no path exists between from and to */ @@ -538,6 +570,7 @@ export default class TraversalService { weightProvider = () => 1, forwardHeuristic = () => 0, backwardHeuristic = () => 0, + signal, }) { this._logger.debug('bidirectionalAStar started', { from, to }); @@ -571,6 +604,10 @@ export default class TraversalService { let nodesExplored = 0; while (!fwdHeap.isEmpty() || !bwdHeap.isEmpty()) { + if (nodesExplored % 1000 === 0) { + checkAborted(signal, 'bidirectionalAStar'); + } + // Get minimum f-values from each frontier const fwdMinF = 
fwdHeap.isEmpty() ? Infinity : fwdHeap.peekPriority(); const bwdMinF = bwdHeap.isEmpty() ? Infinity : bwdHeap.peekPriority(); @@ -698,45 +735,82 @@ export default class TraversalService { } /** - * Reconstructs path from bidirectional A* search. - * @param {Map} fwdPrevious - Forward search predecessor map - * @param {Map} bwdNext - Backward search successor map - * @param {string} from - Start node - * @param {string} to - End node - * @param {string} meeting - Meeting point - * @returns {string[]} Complete path from start to end + * Unified helper to reconstruct a path by walking a predecessor map backwards. + * + * Walks from `to` back to `from` using the provided predecessor map, + * building the path in order from start to end. + * + * @param {Map} predecessorMap - Maps each node to its predecessor + * @param {string} from - Start node (path reconstruction stops here) + * @param {string} to - End node (path reconstruction starts here) + * @param {string} [context='Path'] - Context label for error logging + * @returns {string[]} Path from `from` to `to` * @private */ - _reconstructBidirectionalAStarPath(fwdPrevious, bwdNext, from, to, meeting) { - // Build forward path (from -> meeting) - const forwardPath = []; - let current = meeting; + _walkPredecessors(predecessorMap, from, to, context = 'Path') { + const path = [to]; + let current = to; while (current !== from) { - forwardPath.unshift(current); - current = fwdPrevious.get(current); - if (current === undefined) { - // Should never happen if algorithm is correct, but guard against infinite loop - this._logger.error('Forward path reconstruction failed: missing predecessor', { from, to, meeting, path: forwardPath }); + const prev = predecessorMap.get(current); + if (prev === undefined) { + // Guard against infinite loop if algorithm has a bug + this._logger.error(`${context} reconstruction failed: missing predecessor`, { from, to, path }); break; } + current = prev; + path.unshift(current); } - if (current === 
from) { - forwardPath.unshift(from); - } + return path; + } - // Build backward path (meeting -> to), excluding meeting point (already in forwardPath) - current = meeting; + /** + * Unified helper to reconstruct a path by walking a successor map forwards. + * + * Walks from `from` to `to` using the provided successor map, + * building the path in order. + * + * @param {Map} successorMap - Maps each node to its successor + * @param {string} from - Start node (path reconstruction starts here) + * @param {string} to - End node (path reconstruction stops here) + * @param {string} [context='Path'] - Context label for error logging + * @returns {string[]} Path from `from` to `to` + * @private + */ + _walkSuccessors(successorMap, from, to, context = 'Path') { + const path = [from]; + let current = from; while (current !== to) { - current = bwdNext.get(current); - if (current === undefined) { - // Should never happen if algorithm is correct, but guard against infinite loop - this._logger.error('Backward path reconstruction failed: missing successor', { from, to, meeting, path: forwardPath }); + const next = successorMap.get(current); + if (next === undefined) { + // Guard against infinite loop if algorithm has a bug + this._logger.error(`${context} reconstruction failed: missing successor`, { from, to, path }); break; } - forwardPath.push(current); + current = next; + path.push(current); } + return path; + } - return forwardPath; + /** + * Reconstructs path from bidirectional A* search. 
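The unified `_walkPredecessors` helper is a plain backwards walk over a predecessor `Map`. A standalone version (logger guard omitted, logic otherwise as in the hunk above) behaves like this:

```javascript
// Standalone version of _walkPredecessors: rebuild the path by walking
// from `to` back to `from` through the predecessor map.
function walkPredecessors(predecessorMap, from, to) {
  const path = [to];
  let current = to;
  while (current !== from) {
    const prev = predecessorMap.get(current);
    if (prev === undefined) break; // guard against a missing link
    current = prev;
    path.unshift(current);
  }
  return path;
}

// Predecessors recorded by a search that found a -> b -> c -> d.
const prev = new Map([['d', 'c'], ['c', 'b'], ['b', 'a']]);
const path = walkPredecessors(prev, 'a', 'd');
// path -> ['a', 'b', 'c', 'd']
```

The same walk, run over the forward and backward maps and joined at the meeting node, is all the bidirectional reconstruction does.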
+   * @param {Map<string, string>} fwdPrevious - Forward search predecessor map
+   * @param {Map<string, string>} bwdNext - Backward search successor map
+   * @param {string} from - Start node
+   * @param {string} to - End node
+   * @param {string} meeting - Meeting point
+   * @returns {string[]} Complete path from start to end
+   * @private
+   */
+  _reconstructBidirectionalAStarPath(fwdPrevious, bwdNext, from, to, meeting) {
+    // Build forward path (from -> meeting) using predecessors
+    const forwardPath = this._walkPredecessors(fwdPrevious, from, meeting, 'Forward path');
+
+    // Build backward path (meeting -> to) using successors, excluding meeting (already included)
+    const backwardPath = this._walkSuccessors(bwdNext, meeting, to, 'Backward path');
+
+    // Combine paths, avoiding duplicate meeting point
+    return forwardPath.concat(backwardPath.slice(1));
   }
 
   /**
@@ -744,18 +818,7 @@ export default class TraversalService {
    * @private
    */
   _reconstructWeightedPath(previous, from, to) {
-    const path = [to];
-    let current = to;
-    while (current !== from) {
-      current = previous.get(current);
-      if (current === undefined) {
-        // Should never happen if algorithm is correct, but guard against infinite loop
-        this._logger.error('Path reconstruction failed: missing predecessor', { from, to, path });
-        break;
-      }
-      path.unshift(current);
-    }
-    return path;
+    return this._walkPredecessors(previous, from, to, 'Weighted path');
   }
 
   /**
@@ -763,18 +826,7 @@ export default class TraversalService {
    * @private
    */
   _reconstructPath(parentMap, from, to) {
-    const path = [to];
-    let current = to;
-    while (current !== from) {
-      current = parentMap.get(current);
-      if (current === undefined) {
-        // Should never happen if algorithm is correct, but guard against infinite loop
-        this._logger.error('Path reconstruction failed: missing predecessor', { from, to, path });
-        break;
-      }
-      path.unshift(current);
-    }
-    return path;
+    return this._walkPredecessors(parentMap, from, to, 'Path');
   }
 
   /**
@@ -813,10 +865,11 @@ export default class TraversalService {
    * @param {string} options.from - Source node SHA
    * @param {string} options.to - Target node SHA
    * @param {number} [options.maxDepth=1000] - Maximum search depth
+   * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation
    * @returns {Promise<boolean>}
    */
-  async isReachable({ from, to, maxDepth = DEFAULT_MAX_DEPTH }) {
-    const result = await this.findPath({ from, to, maxDepth });
+  async isReachable({ from, to, maxDepth = DEFAULT_MAX_DEPTH, signal }) {
+    const result = await this.findPath({ from, to, maxDepth, signal });
     return result.found;
   }
 
@@ -827,13 +880,14 @@ export default class TraversalService {
    * @param {string[]} options.shas - Array of node SHAs to find common ancestors for
    * @param {number} [options.maxResults=100] - Maximum ancestors to return
    * @param {number} [options.maxDepth=1000] - Maximum depth to search
+   * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation
    * @returns {Promise<string[]>} Array of common ancestor SHAs
    */
-  async commonAncestors({ shas, maxResults = 100, maxDepth = DEFAULT_MAX_DEPTH }) {
+  async commonAncestors({ shas, maxResults = 100, maxDepth = DEFAULT_MAX_DEPTH, signal }) {
     if (shas.length === 0) { return []; }
     if (shas.length === 1) {
       const ancestors = [];
-      for await (const node of this.ancestors({ sha: shas[0], maxNodes: maxResults, maxDepth })) {
+      for await (const node of this.ancestors({ sha: shas[0], maxNodes: maxResults, maxDepth, signal })) {
         ancestors.push(node.sha);
       }
       return ancestors;
@@ -846,8 +900,9 @@ export default class TraversalService {
     const requiredCount = shas.length;
 
     for (const sha of shas) {
+      checkAborted(signal, 'commonAncestors');
       const visited = new Set();
-      for await (const node of this.ancestors({ sha, maxDepth })) {
+      for await (const node of this.ancestors({ sha, maxDepth, signal })) {
         if (!visited.has(node.sha)) {
           visited.add(node.sha);
           ancestorCounts.set(node.sha, (ancestorCounts.get(node.sha) || 0) + 1);
@@ -879,10 +934,11 @@ export default class TraversalService {
    * @param {number} [options.maxNodes=100000] - Maximum nodes to yield
    * @param {TraversalDirection} [options.direction='forward'] - Direction determines dependency order
    * @param {boolean} [options.throwOnCycle=false] - If true, throws TraversalError when cycle detected
+   * @param {AbortSignal} [options.signal] - Optional AbortSignal for cancellation
    * @yields {TraversalNode}
    * @throws {TraversalError} If throwOnCycle is true and a cycle is detected
    */
-  async *topologicalSort({ start, maxNodes = DEFAULT_MAX_NODES, direction = 'forward', throwOnCycle = false }) {
+  async *topologicalSort({ start, maxNodes = DEFAULT_MAX_NODES, direction = 'forward', throwOnCycle = false, signal }) {
     this._logger.debug('topologicalSort started', { start, direction, maxNodes });
 
     // Phase 1: Discover all reachable nodes and compute in-degrees
@@ -895,6 +951,10 @@ export default class TraversalService {
     allNodes.add(start);
 
     while (queue.length > 0) {
+      if (allNodes.size % 1000 === 0) {
+        checkAborted(signal, 'topologicalSort');
+      }
+
       const sha = queue.shift();
       const neighbors = await this._getNeighbors(sha, direction);
       edges.set(sha, neighbors);
@@ -925,6 +985,10 @@ export default class TraversalService {
     const depthMap = new Map([[start, 0]]);
 
     while (ready.length > 0 && nodesYielded < maxNodes) {
+      if (nodesYielded % 1000 === 0) {
+        checkAborted(signal, 'topologicalSort');
+      }
+
       const sha = ready.shift();
       const depth = depthMap.get(sha) || 0;
 
diff --git a/src/domain/utils/CachedValue.js b/src/domain/utils/CachedValue.js
new file mode 100644
index 0000000..2ede99a
--- /dev/null
+++ b/src/domain/utils/CachedValue.js
@@ -0,0 +1,140 @@
+/**
+ * A TTL-based caching utility for a single value.
+ *
+ * Caches the result of an expensive operation and reuses it until
+ * the configured TTL expires. Supports manual invalidation.
+ *
+ * @class CachedValue
+ * @template T
+ *
+ * @example
+ * const cache = new CachedValue({
+ *   clock,
+ *   ttlMs: 5000,
+ *   compute: async () => await expensiveOperation(),
+ * });
+ *
+ * // First call computes the value
+ * const value1 = await cache.get();
+ *
+ * // Subsequent calls within TTL return cached value
+ * const value2 = await cache.get(); // Same as value1
+ *
+ * // Force recompute
+ * cache.invalidate();
+ * const value3 = await cache.get(); // Fresh value
+ */
+class CachedValue {
+  /**
+   * Creates a CachedValue instance.
+   *
+   * @param {Object} options
+   * @param {import('../../ports/ClockPort.js').default} options.clock - Clock port for timing
+   * @param {number} options.ttlMs - Time-to-live in milliseconds
+   * @param {() => T | Promise<T>} options.compute - Function to compute the value when cache is stale
+   * @throws {Error} If ttlMs is not a positive number
+   */
+  constructor({ clock, ttlMs, compute }) {
+    if (typeof ttlMs !== 'number' || ttlMs <= 0) {
+      throw new Error('CachedValue ttlMs must be a positive number');
+    }
+    if (typeof compute !== 'function') {
+      throw new Error('CachedValue compute must be a function');
+    }
+
+    /** @type {import('../../ports/ClockPort.js').default} */
+    this._clock = clock;
+
+    /** @type {number} */
+    this._ttlMs = ttlMs;
+
+    /** @type {() => T | Promise<T>} */
+    this._compute = compute;
+
+    /** @type {T|null} */
+    this._value = null;
+
+    /** @type {number} */
+    this._cachedAt = 0;
+
+    /** @type {string|null} */
+    this._cachedAtIso = null;
+  }
+
+  /**
+   * Gets the cached value, computing it if stale or not present.
+   *
+   * @returns {Promise<T>} The cached or freshly computed value
+   */
+  async get() {
+    if (this._isValid()) {
+      return this._value;
+    }
+
+    const value = await this._compute();
+    this._value = value;
+    this._cachedAt = this._clock.now();
+    this._cachedAtIso = this._clock.timestamp();
+
+    return value;
+  }
+
+  /**
+   * Gets the cached value with metadata about when it was cached.
+   *
+   * @returns {Promise<{value: T, cachedAt: string|null, fromCache: boolean}>}
+   */
+  async getWithMetadata() {
+    const wasValid = this._isValid();
+    const value = await this.get();
+
+    return {
+      value,
+      cachedAt: wasValid ? this._cachedAtIso : null,
+      fromCache: wasValid,
+    };
+  }
+
+  /**
+   * Invalidates the cached value, forcing recomputation on next get().
+   */
+  invalidate() {
+    this._value = null;
+    this._cachedAt = 0;
+    this._cachedAtIso = null;
+  }
+
+  /**
+   * Checks if the cached value is still valid.
+   *
+   * @returns {boolean} True if the cache is valid
+   * @private
+   */
+  _isValid() {
+    if (this._value === null) {
+      return false;
+    }
+    return this._clock.now() - this._cachedAt < this._ttlMs;
+  }
+
+  /**
+   * Gets the ISO timestamp of when the value was cached.
+   * Returns null if no value is cached.
+   *
+   * @returns {string|null}
+   */
+  get cachedAt() {
+    return this._cachedAtIso;
+  }
+
+  /**
+   * Checks if a value is currently cached (regardless of validity).
+   *
+   * @returns {boolean}
+   */
+  get hasValue() {
+    return this._value !== null;
+  }
+}
+
+export default CachedValue;
diff --git a/src/infrastructure/adapters/GitGraphAdapter.js b/src/infrastructure/adapters/GitGraphAdapter.js
index 2a0f227..22dbd04 100644
--- a/src/infrastructure/adapters/GitGraphAdapter.js
+++ b/src/infrastructure/adapters/GitGraphAdapter.js
@@ -1,5 +1,40 @@
+import { retry } from '@git-stunts/alfred';
 import GraphPersistencePort from '../../ports/GraphPersistencePort.js';
 
+/**
+ * Transient git errors that are safe to retry.
+ * @type {string[]}
+ */
+const TRANSIENT_ERROR_PATTERNS = [
+  'cannot lock ref',
+  'resource temporarily unavailable',
+  'connection timed out',
+];
+
+/**
+ * Determines if an error is transient and safe to retry.
+ * @param {Error} error - The error to check
+ * @returns {boolean} True if the error is transient
+ */
+function isTransientError(error) {
+  const message = (error.message || '').toLowerCase();
+  return TRANSIENT_ERROR_PATTERNS.some(pattern => message.includes(pattern));
+}
+
+/**
+ * Default retry options for git operations.
+ * Uses exponential backoff with decorrelated jitter.
+ * @type {import('@git-stunts/alfred').RetryOptions}
+ */
+const DEFAULT_RETRY_OPTIONS = {
+  retries: 3,
+  delay: 100,
+  maxDelay: 2000,
+  backoff: 'exponential',
+  jitter: 'decorrelated',
+  shouldRetry: isTransientError,
+};
+
 /**
  * Implementation of GraphPersistencePort using GitPlumbing.
  */
@@ -7,10 +42,22 @@ export default class GitGraphAdapter extends GraphPersistencePort {
   /**
    * @param {Object} options
    * @param {import('@git-stunts/plumbing').default} options.plumbing
+   * @param {import('@git-stunts/alfred').RetryOptions} [options.retryOptions] - Custom retry options
    */
-  constructor({ plumbing }) {
+  constructor({ plumbing, retryOptions = {} }) {
     super();
     this.plumbing = plumbing;
+    this._retryOptions = { ...DEFAULT_RETRY_OPTIONS, ...retryOptions };
+  }
+
+  /**
+   * Executes a git command with retry logic.
+   * @param {Object} options - Options to pass to plumbing.execute
+   * @returns {Promise<string>} Command output
+   * @private
+   */
+  async _executeWithRetry(options) {
+    return retry(() => this.plumbing.execute(options), this._retryOptions);
   }
 
   get emptyTree() {
@@ -25,13 +72,13 @@ export default class GitGraphAdapter extends GraphPersistencePort {
     const signArgs = sign ? ['-S'] : [];
     const args = ['commit-tree', this.emptyTree, ...parentArgs, ...signArgs, '-m', message];
 
-    const oid = await this.plumbing.execute({ args });
+    const oid = await this._executeWithRetry({ args });
     return oid.trim();
   }
 
   async showNode(sha) {
     this._validateOid(sha);
-    return await this.plumbing.execute({ args: ['show', '-s', '--format=%B', sha] });
+    return await this._executeWithRetry({ args: ['show', '-s', '--format=%B', sha] });
   }
 
   /**
@@ -45,7 +92,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
     // Format: SHA, author, date, parents (space-separated), then message
     // Using %x00 to separate fields for reliable parsing
     const format = '%H%x00%an <%ae>%x00%aI%x00%P%x00%B';
-    const output = await this.plumbing.execute({
+    const output = await this._executeWithRetry({
       args: ['show', '-s', `--format=${format}`, sha]
     });
 
@@ -75,7 +122,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
       args.push(`--format=${format}`);
     }
     args.push(ref);
-    return await this.plumbing.execute({ args });
+    return await this._executeWithRetry({ args });
   }
 
   /**
@@ -134,7 +181,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
   }
 
   async writeBlob(content) {
-    const oid = await this.plumbing.execute({
+    const oid = await this._executeWithRetry({
      args: ['hash-object', '-w', '--stdin'],
      input: content,
    });
@@ -142,7 +189,7 @@
   }
 
   async writeTree(entries) {
-    const oid = await this.plumbing.execute({
+    const oid = await this._executeWithRetry({
      args: ['mktree'],
      input: `${entries.join('\n')}\n`,
    });
@@ -161,7 +208,7 @@
 
   async readTreeOids(treeOid) {
     this._validateOid(treeOid);
-    const output = await this.plumbing.execute({
+    const output = await this._executeWithRetry({
      args: ['ls-tree', '-r', '-z', treeOid]
    });
@@ -202,7 +249,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
   async updateRef(ref, oid) {
     this._validateRef(ref);
     this._validateOid(oid);
-    await this.plumbing.execute({
+    await this._executeWithRetry({
      args: ['update-ref', ref, oid]
    });
   }
@@ -215,7 +262,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
   async readRef(ref) {
     this._validateRef(ref);
     try {
-      const oid = await this.plumbing.execute({
+      const oid = await this._executeWithRetry({
        args: ['rev-parse', ref]
      });
      return oid.trim();
@@ -241,7 +288,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
    */
   async deleteRef(ref) {
     this._validateRef(ref);
-    await this.plumbing.execute({
+    await this._executeWithRetry({
      args: ['update-ref', '-d', ref]
    });
   }
@@ -295,7 +342,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
   async nodeExists(sha) {
     this._validateOid(sha);
     try {
-      await this.plumbing.execute({ args: ['cat-file', '-e', sha] });
+      await this._executeWithRetry({ args: ['cat-file', '-e', sha] });
       return true;
     } catch {
       return false;
@@ -310,7 +357,7 @@ export default class GitGraphAdapter extends GraphPersistencePort {
   async ping() {
     const start = Date.now();
     try {
-      await this.plumbing.execute({ args: ['rev-parse', '--git-dir'] });
+      await this._executeWithRetry({ args: ['rev-parse', '--git-dir'] });
       const latencyMs = Date.now() - start;
       return { ok: true, latencyMs };
     } catch {
@@ -327,9 +374,30 @@ export default class GitGraphAdapter extends GraphPersistencePort {
    */
   async countNodes(ref) {
     this._validateRef(ref);
-    const output = await this.plumbing.execute({
+    const output = await this._executeWithRetry({
      args: ['rev-list', '--count', ref]
    });
     return parseInt(output.trim(), 10);
   }
+
+  /**
+   * Checks if one commit is an ancestor of another.
+   * Uses `git merge-base --is-ancestor` for efficient ancestry testing.
+   *
+   * @param {string} potentialAncestor - The commit that might be an ancestor
+   * @param {string} descendant - The commit that might be a descendant
+   * @returns {Promise<boolean>} True if potentialAncestor is an ancestor of descendant
+   */
+  async isAncestor(potentialAncestor, descendant) {
+    this._validateOid(potentialAncestor);
+    this._validateOid(descendant);
+    try {
+      await this._executeWithRetry({
+        args: ['merge-base', '--is-ancestor', potentialAncestor, descendant]
+      });
+      return true; // Exit code 0 means it IS an ancestor
+    } catch {
+      return false; // Exit code 1 means it is NOT an ancestor
+    }
+  }
+}
diff --git a/test/unit/domain/EmptyGraph.manual-mode.test.js b/test/unit/domain/EmptyGraph.manual-mode.test.js
new file mode 100644
index 0000000..ea00e8c
--- /dev/null
+++ b/test/unit/domain/EmptyGraph.manual-mode.test.js
@@ -0,0 +1,213 @@
+import { describe, it, expect, vi, beforeEach } from 'vitest';
+
+// Mock GitGraphAdapter to avoid loading @git-stunts/alfred which may not be available
+vi.mock('../../../src/infrastructure/adapters/GitGraphAdapter.js', () => ({
+  default: class MockGitGraphAdapter {},
+}));
+
+import EmptyGraph from '../../../index.js';
+
+describe('EmptyGraph manual mode', () => {
+  let mockPersistence;
+
+  beforeEach(() => {
+    let shaCounter = 0;
+    mockPersistence = {
+      commitNode: vi.fn().mockImplementation(async () => `sha-${shaCounter++}`),
+      showNode: vi.fn().mockResolvedValue('node-content'),
+      getNodeInfo: vi.fn().mockResolvedValue({
+        sha: 'abc123',
+        message: 'test message',
+        author: 'Test Author',
+        date: '2026-01-29 10:00:00 -0500',
+        parents: [],
+      }),
+      nodeExists: vi.fn().mockResolvedValue(true),
+      logNodesStream: vi.fn(),
+      readRef: vi.fn().mockResolvedValue(null),
+      updateRef: vi.fn().mockResolvedValue(undefined),
+    };
+  });
+
+  describe('manual mode createNode does not update ref', () => {
+    it('does not call updateRef when mode=manual and autoSync=manual', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'manual',
+        autoSync: 'manual',
+      });
+
+      await graph.createNode({ message: 'test node' });
+
+      // In manual mode, no ref update should occur
+      expect(mockPersistence.updateRef).not.toHaveBeenCalled();
+    });
+
+    it('creates the node successfully without ref management', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'manual',
+        autoSync: 'manual',
+      });
+
+      const sha = await graph.createNode({ message: 'test node' });
+
+      expect(sha).toBe('sha-0');
+      expect(mockPersistence.commitNode).toHaveBeenCalledWith({
+        message: 'test node',
+        parents: [],
+        sign: false,
+      });
+    });
+
+    it('does not update ref even after multiple createNode calls', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'manual',
+        autoSync: 'manual',
+      });
+
+      await graph.createNode({ message: 'node 1' });
+      await graph.createNode({ message: 'node 2' });
+      await graph.createNode({ message: 'node 3' });
+
+      expect(mockPersistence.updateRef).not.toHaveBeenCalled();
+    });
+  });
+
+  describe('sync() updates ref to specified SHA', () => {
+    it('updates ref to specified SHA in managed mode with autoSync=manual', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      // Create nodes A and B
+      const shaA = await graph.createNode({ message: 'Node A' });
+      const shaB = await graph.createNode({ message: 'Node B', parents: [shaA] });
+
+      // No ref updates should have happened yet (autoSync=manual)
+      expect(mockPersistence.updateRef).not.toHaveBeenCalled();
+
+      // Now sync to B's sha
+      const result = await graph.sync(shaB);
+
+      // Ref should now point to B (or an anchor including B)
+      expect(result.updated).toBe(true);
+      expect(mockPersistence.updateRef).toHaveBeenCalled();
+      // The sha should be either shaB directly or an anchor that includes it
+      expect(result.sha).toBeDefined();
+    });
+
+    it('returns anchor=false when ref did not exist', async () => {
+      mockPersistence.readRef.mockResolvedValue(null);
+
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      const sha = await graph.createNode({ message: 'First node' });
+      const result = await graph.sync(sha);
+
+      expect(result.anchor).toBe(false);
+      expect(result.sha).toBe(sha);
+    });
+
+    it('creates anchor when ref already points to different commit', async () => {
+      // Ref already points to some other commit
+      mockPersistence.readRef.mockResolvedValue('existing-tip-sha');
+
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      const sha = await graph.createNode({ message: 'Disconnected node' });
+      const result = await graph.sync(sha);
+
+      // Should create an anchor since the new node is disconnected from existing tip
+      expect(result.anchor).toBe(true);
+      expect(result.updated).toBe(true);
+    });
+  });
+
+  describe('sync() throws without managed mode', () => {
+    it('throws error when called on graph created via constructor', async () => {
+      // Create graph via constructor (not open()) - no managed mode
+      const graph = new EmptyGraph({ persistence: mockPersistence });
+
+      await expect(graph.sync('some-sha')).rejects.toThrow(
+        'sync() requires managed mode. Use EmptyGraph.open() with mode="managed".'
+      );
+    });
+
+    it('throws error when mode is manual', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'manual',
+        autoSync: 'manual',
+      });
+
+      await expect(graph.sync('some-sha')).rejects.toThrow(
+        'sync() requires managed mode. Use EmptyGraph.open() with mode="managed".'
+      );
+    });
+  });
+
+  describe('sync() throws without SHA argument', () => {
+    it('throws error when called without sha argument', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      await expect(graph.sync()).rejects.toThrow('sha is required for sync()');
+    });
+
+    it('throws error when called with undefined sha', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      await expect(graph.sync(undefined)).rejects.toThrow('sha is required for sync()');
+    });
+
+    it('throws error when called with null sha', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      await expect(graph.sync(null)).rejects.toThrow('sha is required for sync()');
+    });
+
+    it('throws error when called with empty string sha', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+        autoSync: 'manual',
+      });
+
+      await expect(graph.sync('')).rejects.toThrow('sha is required for sync()');
+    });
+  });
+});
diff --git a/test/unit/domain/services/BitmapIndexBuilder.test.js b/test/unit/domain/services/BitmapIndexBuilder.test.js
index 3cc6ca9..ec13159 100644
--- a/test/unit/domain/services/BitmapIndexBuilder.test.js
+++ b/test/unit/domain/services/BitmapIndexBuilder.test.js
@@ -1,7 +1,13 @@
 import { describe, it, expect } from 'vitest';
-import BitmapIndexBuilder from '../../../../src/domain/services/BitmapIndexBuilder.js';
+import BitmapIndexBuilder, { SHARD_VERSION } from '../../../../src/domain/services/BitmapIndexBuilder.js';
 
 describe('BitmapIndexBuilder', () => {
+  describe('SHARD_VERSION export', () => {
+    it('exports SHARD_VERSION as 2 (current format)', () => {
+      expect(SHARD_VERSION).toBe(2);
+    });
+  });
+
   describe('constructor', () => {
     it('creates an empty builder', () => {
       const builder = new BitmapIndexBuilder();
@@ -65,5 +71,34 @@ describe('BitmapIndexBuilder', () => {
       expect(envelope.checksum).toBeDefined();
       expect(typeof envelope.data['aabbcc']).toBe('string'); // base64 encoded
     });
+
+    it('writes v2 shards by default', () => {
+      const builder = new BitmapIndexBuilder();
+      builder.addEdge('aabbcc', 'ddeeff');
+
+      const tree = builder.serialize();
+
+      // Check meta shard
+      const metaEnvelope = JSON.parse(tree['meta_aa.json'].toString());
+      expect(metaEnvelope.version).toBe(2);
+
+      // Check forward shard
+      const fwdEnvelope = JSON.parse(tree['shards_fwd_aa.json'].toString());
+      expect(fwdEnvelope.version).toBe(2);
+
+      // Check reverse shard
+      const revEnvelope = JSON.parse(tree['shards_rev_dd.json'].toString());
+      expect(revEnvelope.version).toBe(2);
+    });
+
+    it('uses SHARD_VERSION constant for serialized version', () => {
+      const builder = new BitmapIndexBuilder();
+      builder.registerNode('testsha1');
+
+      const tree = builder.serialize();
+      const envelope = JSON.parse(tree['meta_te.json'].toString());
+
+      expect(envelope.version).toBe(SHARD_VERSION);
+    });
   });
 });
diff --git a/test/unit/domain/services/BitmapIndexReader.test.js b/test/unit/domain/services/BitmapIndexReader.test.js
index 143d6b4..a7d0a5d 100644
--- a/test/unit/domain/services/BitmapIndexReader.test.js
+++ b/test/unit/domain/services/BitmapIndexReader.test.js
@@ -1,9 +1,42 @@
 import { describe, it, expect, vi, beforeEach } from 'vitest';
 import { createHash } from 'crypto';
 import BitmapIndexReader from '../../../../src/domain/services/BitmapIndexReader.js';
-import BitmapIndexBuilder from '../../../../src/domain/services/BitmapIndexBuilder.js';
+import BitmapIndexBuilder, { SHARD_VERSION } from '../../../../src/domain/services/BitmapIndexBuilder.js';
 import { ShardLoadError, ShardCorruptionError, ShardValidationError } from '../../../../src/domain/errors/index.js';
 
+/**
+ * Creates a v1 shard envelope using JSON.stringify for checksum (legacy format).
+ */
+const createV1Shard = (data) => ({
+  version: 1,
+  checksum: createHash('sha256').update(JSON.stringify(data)).digest('hex'),
+  data,
+});
+
+/**
+ * Produces a canonical JSON string with deterministic key ordering.
+ * Mirrors the canonicalStringify function used in BitmapIndexBuilder.
+ */
+const canonicalStringify = (obj) => {
+  if (obj === null || typeof obj !== 'object') {
+    return JSON.stringify(obj);
+  }
+  if (Array.isArray(obj)) {
+    return '[' + obj.map(canonicalStringify).join(',') + ']';
+  }
+  const keys = Object.keys(obj).sort();
+  return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalStringify(obj[k])).join(',') + '}';
+};
+
+/**
+ * Creates a v2 shard envelope using canonicalStringify for checksum (current format).
+ */
+const createV2Shard = (data) => ({
+  version: 2,
+  checksum: createHash('sha256').update(canonicalStringify(data)).digest('hex'),
+  data,
+});
+
 describe('BitmapIndexReader', () => {
   let mockStorage;
   let reader;
@@ -341,6 +374,158 @@ describe('BitmapIndexReader', () => {
     });
   });
 
+  describe('shard versioning', () => {
+    it('accepts v1 shards for backward compatibility', async () => {
+      const v1Data = { 'abcd1234': 42 };
+      const v1Shard = createV1Shard(v1Data);
+
+      mockStorage.readBlob.mockResolvedValue(Buffer.from(JSON.stringify(v1Shard)));
+
+      reader.setup({
+        'meta_ab.json': 'v1-shard-oid'
+      });
+
+      const id = await reader.lookupId('abcd1234');
+      expect(id).toBe(42);
+    });
+
+    it('accepts v2 shards', async () => {
+      const v2Data = { 'abcd1234': 99 };
+      const v2Shard = createV2Shard(v2Data);
+
+      mockStorage.readBlob.mockResolvedValue(Buffer.from(JSON.stringify(v2Shard)));
+
+      reader.setup({
+        'meta_ab.json': 'v2-shard-oid'
+      });
+
+      const id = await reader.lookupId('abcd1234');
+      expect(id).toBe(99);
+    });
+
+    it('v2 checksum mismatch throws ShardValidationError in strict mode', async () => {
+      const strictReader = new BitmapIndexReader({ storage: mockStorage, strict: true });
+
+      // Create v2 shard with intentionally wrong checksum
+      const v2ShardWithBadChecksum = {
+        version: 2,
+        checksum: 'intentionally-wrong-checksum-value',
+        data: { 'abcd1234': 123 }
+      };
+
+      mockStorage.readBlob.mockResolvedValue(Buffer.from(JSON.stringify(v2ShardWithBadChecksum)));
+
+      strictReader.setup({
+        'meta_ab.json': 'bad-checksum-v2-oid'
+      });
+
+      await expect(strictReader.lookupId('abcd1234')).rejects.toThrow(ShardValidationError);
+
+      // Verify the error contains the expected context
+      try {
+        await strictReader.lookupId('abcd1234');
+      } catch (err) {
+        expect(err.field).toBe('checksum');
+        expect(err.shardPath).toBe('meta_ab.json');
+      }
+    });
+
+    it('v2 checksum mismatch logs warning in non-strict mode (graceful degradation)', async () => {
+      const mockLogger = {
+        debug: vi.fn(),
+        info: vi.fn(),
+        warn: vi.fn(),
+        error: vi.fn(),
+      };
+      const nonStrictReader = new BitmapIndexReader({
+        storage: mockStorage,
+        strict: false,
+        logger: mockLogger,
+      });
+
+      // Create v2 shard with intentionally wrong checksum
+      const v2ShardWithBadChecksum = {
+        version: 2,
+        checksum: 'intentionally-wrong-checksum-value',
+        data: { 'abcd1234': 123 }
+      };
+
+      mockStorage.readBlob.mockResolvedValue(Buffer.from(JSON.stringify(v2ShardWithBadChecksum)));
+
+      nonStrictReader.setup({
+        'meta_ab.json': 'bad-checksum-v2-oid'
+      });
+
+      // Should not throw, but return undefined (empty shard cached)
+      const result = await nonStrictReader.lookupId('abcd1234');
+      expect(result).toBeUndefined();
+
+      // Should have logged a warning
+      expect(mockLogger.warn).toHaveBeenCalledTimes(1);
+      expect(mockLogger.warn).toHaveBeenCalledWith('Shard validation warning', expect.objectContaining({
+        shardPath: 'meta_ab.json',
+        field: 'checksum',
+        error: 'Checksum mismatch',
+      }));
+    });
+
+    it('v1 checksum uses JSON.stringify, not canonicalStringify', async () => {
+      // This test verifies backward compatibility: v1 shards use JSON.stringify
+      // for checksum computation, which may produce different results than
+      // canonicalStringify for objects with unsorted keys.
+
+      // Data with keys in non-alphabetical order
+      const v1Data = { 'zebra': 1, 'alpha': 2 };
+
+      // v1 checksum computed with JSON.stringify (key order preserved)
+      const v1Checksum = createHash('sha256')
+        .update(JSON.stringify(v1Data))
+        .digest('hex');
+
+      const v1Shard = {
+        version: 1,
+        checksum: v1Checksum,
+        data: v1Data
+      };
+
+      mockStorage.readBlob.mockResolvedValue(Buffer.from(JSON.stringify(v1Shard)));
+
+      reader.setup({
+        'meta_ze.json': 'v1-key-order-oid'
+      });
+
+      // Should succeed because v1 uses JSON.stringify for verification
+      const id = await reader.lookupId('zebra');
+      expect(id).toBe(1);
+    });
+
+    it('v2 checksum uses canonicalStringify for deterministic ordering', async () => {
+      // Data with keys in non-alphabetical order
+      const v2Data = { 'zebra': 1, 'alpha': 2 };
+
+      // v2 checksum computed with canonicalStringify (keys sorted)
+      const v2Checksum = createHash('sha256')
+        .update(canonicalStringify(v2Data))
+        .digest('hex');
+
+      const v2Shard = {
+        version: 2,
+        checksum: v2Checksum,
+        data: v2Data
+      };
+
+      mockStorage.readBlob.mockResolvedValue(Buffer.from(JSON.stringify(v2Shard)));
+
+      reader.setup({
+        'meta_ze.json': 'v2-canonical-oid'
+      });
+
+      // Should succeed because v2 uses canonicalStringify for verification
+      const id = await reader.lookupId('zebra');
+      expect(id).toBe(1);
+    });
+  });
+
   describe('LRU cache eviction', () => {
     it('evicts least recently used shards when exceeding maxCachedShards', async () => {
       // Create reader with small cache size
diff --git a/test/unit/domain/services/GraphService.test.js b/test/unit/domain/services/GraphService.test.js
index 08ea118..2851797 100644
--- a/test/unit/domain/services/GraphService.test.js
+++ b/test/unit/domain/services/GraphService.test.js
@@ -1,6 +1,7 @@
 import { describe, it, expect, vi, beforeEach } from 'vitest';
 import GraphService from '../../../../src/domain/services/GraphService.js';
 import GraphNode from '../../../../src/domain/entities/GraphNode.js';
+import EmptyMessageError from '../../../../src/domain/errors/EmptyMessageError.js';
 
 describe('GraphService', () => {
   let service;
@@ -160,6 +161,59 @@ describe('GraphService', () => {
     });
   });
 
+  describe('empty message validation', () => {
+    it('createNode throws EmptyMessageError for empty message', async () => {
+      await expect(service.createNode({ message: '' }))
+        .rejects.toThrow(EmptyMessageError);
+    });
+
+    it('EmptyMessageError has code EMPTY_MESSAGE', async () => {
+      try {
+        await service.createNode({ message: '' });
+        expect.fail('Expected EmptyMessageError to be thrown');
+      } catch (error) {
+        expect(error).toBeInstanceOf(EmptyMessageError);
+        expect(error.code).toBe('EMPTY_MESSAGE');
+      }
+    });
+
+    it('EmptyMessageError has operation createNode', async () => {
+      try {
+        await service.createNode({ message: '' });
+        expect.fail('Expected EmptyMessageError to be thrown');
+      } catch (error) {
+        expect(error).toBeInstanceOf(EmptyMessageError);
+        expect(error.operation).toBe('createNode');
+      }
+    });
+
+    it('createNodes throws EmptyMessageError for empty message', async () => {
+      await expect(service.createNodes([{ message: '' }]))
+        .rejects.toThrow(EmptyMessageError);
+    });
+
+    it('createNodes EmptyMessageError includes index in context', async () => {
+      try {
+        await service.createNodes([
+          { message: 'Valid' },
+          { message: '' }, // Empty at index 1
+        ]);
+        expect.fail('Expected EmptyMessageError to be thrown');
+      } catch (error) {
+        expect(error).toBeInstanceOf(EmptyMessageError);
+        expect(error.code).toBe('EMPTY_MESSAGE');
+        expect(error.operation).toBe('createNodes');
+        expect(error.context.index).toBe(1);
+      }
+    });
+
+    it('createNodes does not call persistence for empty message', async () => {
+      await expect(service.createNodes([{ message: '' }]))
+        .rejects.toThrow(EmptyMessageError);
+      expect(mockPersistence.commitNode).not.toHaveBeenCalled();
+    });
+  });
+
   describe('readNode()', () => {
     it('delegates to persistence.showNode', async () => {
       const content = await service.readNode('some-sha');
diff --git a/test/unit/domain/services/ManagedModeDurability.test.js b/test/unit/domain/services/ManagedModeDurability.test.js
new file mode 100644
index 0000000..7f17e41
--- /dev/null
+++ b/test/unit/domain/services/ManagedModeDurability.test.js
@@ -0,0 +1,580 @@
+import { describe, it, expect, vi, beforeEach } from 'vitest';
+
+// Mock GitGraphAdapter to avoid loading @git-stunts/alfred which may not be available
+vi.mock('../../../../src/infrastructure/adapters/GitGraphAdapter.js', () => ({
+  default: class MockGitGraphAdapter {},
+}));
+
+import EmptyGraph from '../../../../index.js';
+import GraphRefManager from '../../../../src/domain/services/GraphRefManager.js';
+
+/**
+ * Tests for the managed mode durability feature.
+ *
+ * Managed mode ensures that all created nodes remain reachable from the graph ref,
+ * protecting them from git garbage collection. This is implemented through:
+ *
+ * 1. Automatic ref updates after each createNode() call
+ * 2. Anchor commits when nodes have disconnected roots
+ * 3. Fast-forward updates when possible (descendant relationship)
+ */
+describe('Managed Mode Durability', () => {
+  let mockPersistence;
+
+  beforeEach(() => {
+    // Track ref state for realistic behavior
+    let currentRef = null;
+    let commitCounter = 0;
+
+    mockPersistence = {
+      emptyTree: '4b825dc642cb6eb9a060e54bf8d69288fbee4904',
+      commitNode: vi.fn().mockImplementation(async () => {
+        return `sha${(++commitCounter).toString().padStart(8, '0')}`;
+      }),
+      showNode: vi.fn().mockResolvedValue('node-content'),
+      getNodeInfo: vi.fn().mockResolvedValue({
+        sha: 'abc123',
+        message: 'test message',
+        author: 'Test Author',
+        date: '2026-01-29 10:00:00 -0500',
+        parents: ['parent1'],
+      }),
+      nodeExists: vi.fn().mockResolvedValue(true),
+      logNodesStream: vi.fn(),
+      readRef: vi.fn().mockImplementation(async () => currentRef),
+      updateRef: vi.fn().mockImplementation(async (_ref, oid) => {
+        currentRef = oid;
+      }),
+      countNodes: vi.fn().mockResolvedValue(1),
+    };
+  });
+
+  describe('managed mode creates reachable nodes', () => {
+    it('creates ref pointing to node on first write', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+      });
+
+      const sha = await graph.createNode({ message: 'First node' });
+
+      // Verify ref was created pointing to the new node
+      expect(mockPersistence.updateRef).toHaveBeenCalledWith(
+        'refs/empty-graph/test',
+        sha
+      );
+    });
+
+    it('ref points to reachable commit after createNode', async () => {
+      const graph = await EmptyGraph.open({
+        persistence: mockPersistence,
+        ref: 'refs/empty-graph/test',
+        mode: 'managed',
+      });
+
+      await graph.createNode({ message: 'Node A' });
+
+      // The ref should have been updated
+      expect(mockPersistence.updateRef).toHaveBeenCalled();
+
+      // Reading the ref should return the SHA we created
+      const refSha = await mockPersistence.readRef('refs/empty-graph/test');
+      expect(refSha).toBe('sha00000001');
+    });
+
+    it('nodes are created
with proper parents', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + // Create a root node + const rootSha = await graph.createNode({ message: 'Root' }); + + // Create a child node with explicit parent + await graph.createNode({ message: 'Child', parents: [rootSha] }); + + // Verify child was created with parent + expect(mockPersistence.commitNode).toHaveBeenNthCalledWith(2, { + message: 'Child', + parents: [rootSha], + sign: false, + }); + }); + }); + + describe('managed mode with disconnected roots creates anchor', () => { + it('creates anchor when adding disconnected root', async () => { + // Start with an existing ref + let currentRef = 'existingsha001'; + mockPersistence.readRef.mockImplementation(async () => currentRef); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + currentRef = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + // Create a new disconnected root (no parents) + await graph.createNode({ message: 'New root' }); + + // An anchor commit should have been created + // The anchor has parents: [currentTip, newNode] + const anchorCall = mockPersistence.commitNode.mock.calls.find( + call => call[0].message.includes('"_type":"anchor"') + ); + expect(anchorCall).toBeDefined(); + expect(anchorCall[0].parents).toContain('existingsha001'); + }); + + it('both disconnected roots are reachable from ref', async () => { + // Track all commits and their parents for reachability checking + const commits = new Map(); + let refTip = null; + let counter = 0; + + mockPersistence.commitNode.mockImplementation(async ({ message, parents }) => { + const sha = `sha${(++counter).toString().padStart(8, '0')}`; + commits.set(sha, { message, parents }); + return sha; + }); + mockPersistence.readRef.mockImplementation(async () => refTip); + 
mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refTip = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + // Create first root + const shaA = await graph.createNode({ message: 'Root A' }); + + // Create second disconnected root + const shaB = await graph.createNode({ message: 'Root B' }); + + // Helper to check if a SHA is reachable from the ref tip + function isReachable(targetSha, startSha = refTip, visited = new Set()) { + if (!startSha || visited.has(startSha)) return false; + if (startSha === targetSha) return true; + visited.add(startSha); + + const commit = commits.get(startSha); + if (!commit) return false; + + return commit.parents.some(p => isReachable(targetSha, p, visited)); + } + + // Both A and B should be reachable from the ref + expect(isReachable(shaA)).toBe(true); + expect(isReachable(shaB)).toBe(true); + }); + + it('anchor commit has correct structure', async () => { + let refTip = 'existingsha001'; + mockPersistence.readRef.mockImplementation(async () => refTip); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refTip = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + await graph.createNode({ message: 'New disconnected node' }); + + // Find the anchor commit + const anchorCall = mockPersistence.commitNode.mock.calls.find( + call => call[0].message.includes('_type') + ); + + // Anchor should have JSON message with _type: 'anchor' + const anchorMessage = JSON.parse(anchorCall[0].message); + expect(anchorMessage._type).toBe('anchor'); + + // Anchor should have both the old tip and new node as parents + expect(anchorCall[0].parents.length).toBe(2); + }); + }); + + describe('managed mode with descendant does fast-forward', () => { + it('creates anchor for descendant with current implementation (isAncestor TODO)', 
async () => { + // Note: The current implementation of isAncestor always returns false, + // so even when B is a descendant of A, an anchor is created. + // This is safe but not optimal. When isAncestor is properly implemented + // using `git merge-base --is-ancestor`, this test should be updated to + // verify fast-forward behavior (no anchor created). + + // Track commits + const commits = new Map(); + let refTip = null; + let counter = 0; + + mockPersistence.commitNode.mockImplementation(async ({ message, parents }) => { + const sha = `sha${(++counter).toString().padStart(8, '0')}`; + commits.set(sha, { message, parents }); + return sha; + }); + mockPersistence.readRef.mockImplementation(async () => refTip); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refTip = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + // Create root A + const shaA = await graph.createNode({ message: 'Root A' }); + + // Create B with A as parent (B is descendant of A) + const shaB = await graph.createNode({ message: 'Child B', parents: [shaA] }); + + // With current implementation, an anchor is created even for fast-forward cases + // The ref tip will be the anchor, not shaB directly + const anchorCalls = mockPersistence.commitNode.mock.calls.filter( + call => call[0].message.includes('_type') + ); + expect(anchorCalls.length).toBe(1); + + // Both A and B should be reachable from the anchor + const anchor = commits.get(refTip); + expect(anchor).toBeDefined(); + expect(anchor.parents).toContain(shaA); // Previous tip + expect(anchor.parents).toContain(shaB); // New node + }); + + it('ref points to the descendant node after update', async () => { + let refTip = null; + let counter = 0; + + mockPersistence.commitNode.mockImplementation(async () => { + return `sha${(++counter).toString().padStart(8, '0')}`; + }); + mockPersistence.readRef.mockImplementation(async () => 
refTip); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refTip = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + const shaA = await graph.createNode({ message: 'A' }); + const shaB = await graph.createNode({ message: 'B', parents: [shaA] }); + + // The ref should have been updated to include B + // (Either directly to B, or to an anchor that includes B) + const finalRef = await mockPersistence.readRef('refs/empty-graph/test'); + expect(finalRef).toBeDefined(); + + // updateRef should have been called for B (or an anchor containing B) + expect(mockPersistence.updateRef).toHaveBeenCalled(); + }); + }); + + describe('manual mode does not auto-update ref', () => { + it('does not update ref on createNode', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'manual', + }); + + await graph.createNode({ message: 'Node A' }); + + // In manual mode, updateRef should NOT be called automatically + expect(mockPersistence.updateRef).not.toHaveBeenCalled(); + }); + + it('nodes are still created but ref is unchanged', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'manual', + }); + + const sha = await graph.createNode({ message: 'Node A' }); + + // Node was created + expect(mockPersistence.commitNode).toHaveBeenCalled(); + expect(sha).toBe('sha00000001'); + + // But ref was not updated + expect(mockPersistence.updateRef).not.toHaveBeenCalled(); + }); + + it('requires managed mode with autoSync=manual for explicit sync()', async () => { + // To use sync() manually, you need managed mode with autoSync='manual' + // This is tested in the existing EmptyGraph.manual-mode.test.js + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + autoSync: 
'manual', + }); + + const sha = await graph.createNode({ message: 'Node A' }); + + // Ref not updated yet (autoSync is manual) + expect(mockPersistence.updateRef).not.toHaveBeenCalled(); + + // Explicit sync + await graph.sync(sha); + + // Now ref should be updated + expect(mockPersistence.updateRef).toHaveBeenCalledWith( + 'refs/empty-graph/test', + sha + ); + }); + + it('sync() throws in manual mode without ref manager', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'manual', + }); + + await graph.createNode({ message: 'Node A' }); + + // sync() requires managed mode + await expect(graph.sync('somesha')).rejects.toThrow('requires managed mode'); + }); + }); + + describe('GraphRefManager unit tests', () => { + let refManager; + let refState; + + beforeEach(() => { + refState = { tip: null }; + + mockPersistence.readRef.mockImplementation(async () => refState.tip); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refState.tip = oid; + }); + + refManager = new GraphRefManager({ persistence: mockPersistence }); + }); + + it('readHead returns null when ref does not exist', async () => { + refState.tip = null; + + const sha = await refManager.readHead('refs/empty-graph/test'); + + expect(sha).toBeNull(); + }); + + it('readHead returns SHA when ref exists', async () => { + refState.tip = 'abc123def456'; + + const sha = await refManager.readHead('refs/empty-graph/test'); + + expect(sha).toBe('abc123def456'); + }); + + it('syncHead creates ref when it does not exist', async () => { + refState.tip = null; + + const result = await refManager.syncHead('refs/empty-graph/test', 'newsha123'); + + expect(result).toEqual({ + updated: true, + anchor: false, + sha: 'newsha123', + }); + expect(refState.tip).toBe('newsha123'); + }); + + it('syncHead returns same SHA when ref already points to it', async () => { + refState.tip = 'existingsha'; + + const result = await 
refManager.syncHead('refs/empty-graph/test', 'existingsha'); + + expect(result).toEqual({ + updated: true, + anchor: false, + sha: 'existingsha', + }); + }); + + it('syncHead creates anchor when ref exists and points elsewhere', async () => { + refState.tip = 'oldtip123'; + let anchorSha; + + mockPersistence.commitNode.mockImplementation(async ({ message, parents }) => { + if (message.includes('anchor')) { + anchorSha = 'anchor999'; + return anchorSha; + } + return 'regular123'; + }); + + const result = await refManager.syncHead('refs/empty-graph/test', 'newtip456'); + + // With current implementation (isAncestor always false), anchor is created + expect(result.anchor).toBe(true); + expect(mockPersistence.commitNode).toHaveBeenCalledWith({ + message: JSON.stringify({ _type: 'anchor' }), + parents: ['oldtip123', 'newtip456'], + }); + }); + + it('createAnchor creates commit with anchor marker', async () => { + mockPersistence.commitNode.mockResolvedValue('anchorsha789'); + + const sha = await refManager.createAnchor(['parent1', 'parent2']); + + expect(sha).toBe('anchorsha789'); + expect(mockPersistence.commitNode).toHaveBeenCalledWith({ + message: JSON.stringify({ _type: 'anchor' }), + parents: ['parent1', 'parent2'], + }); + }); + + it('isAncestor currently returns false (TODO: implement)', async () => { + // Document current behavior - always returns false + const result = await refManager.isAncestor('ancestor', 'descendant'); + + expect(result).toBe(false); + }); + }); + + describe('batch operations in managed mode', () => { + it('beginBatch returns a batch context', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + const batch = graph.beginBatch(); + + expect(batch).toBeDefined(); + expect(typeof batch.createNode).toBe('function'); + expect(typeof batch.commit).toBe('function'); + }); + + it('batch.createNode does not update ref', async () => { + const graph = await 
EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + const batch = graph.beginBatch(); + await batch.createNode({ message: 'Batched node' }); + + // Ref should not be updated during batch + expect(mockPersistence.updateRef).not.toHaveBeenCalled(); + }); + + it('batch.commit updates ref once for all nodes', async () => { + let refTip = null; + mockPersistence.readRef.mockImplementation(async () => refTip); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refTip = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + const batch = graph.beginBatch(); + const sha1 = await batch.createNode({ message: 'Node 1' }); + const sha2 = await batch.createNode({ message: 'Node 2' }); + const sha3 = await batch.createNode({ message: 'Node 3' }); + + // Still no ref update + expect(mockPersistence.updateRef).not.toHaveBeenCalled(); + + // Commit the batch + const result = await batch.commit(); + + // Now ref should be updated + expect(mockPersistence.updateRef).toHaveBeenCalled(); + expect(result.count).toBe(3); + }); + + it('batch tracks created SHAs', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + const batch = graph.beginBatch(); + const sha1 = await batch.createNode({ message: 'Node 1' }); + const sha2 = await batch.createNode({ message: 'Node 2' }); + + expect(batch.createdShas).toEqual([sha1, sha2]); + }); + + it('beginBatch throws in manual mode', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'manual', + }); + + expect(() => graph.beginBatch()).toThrow('requires managed mode'); + }); + }); + + describe('createNodes bulk operation in managed mode', () => { + it('syncs ref after createNodes completes', async () => { + let refTip 
= null; + mockPersistence.readRef.mockImplementation(async () => refTip); + mockPersistence.updateRef.mockImplementation(async (_ref, oid) => { + refTip = oid; + }); + + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'managed', + }); + + const shas = await graph.createNodes([ + { message: 'Root' }, + { message: 'Child', parents: ['$0'] }, + ]); + + // Ref should be updated to include the last SHA + expect(mockPersistence.updateRef).toHaveBeenCalled(); + expect(shas).toHaveLength(2); + }); + + it('does not sync ref in manual mode', async () => { + const graph = await EmptyGraph.open({ + persistence: mockPersistence, + ref: 'refs/empty-graph/test', + mode: 'manual', + }); + + await graph.createNodes([ + { message: 'Root' }, + { message: 'Child', parents: ['$0'] }, + ]); + + // Ref should NOT be updated in manual mode + expect(mockPersistence.updateRef).not.toHaveBeenCalled(); + }); + }); +}); diff --git a/test/unit/domain/utils/CachedValue.test.js b/test/unit/domain/utils/CachedValue.test.js new file mode 100644 index 0000000..afe5115 --- /dev/null +++ b/test/unit/domain/utils/CachedValue.test.js @@ -0,0 +1,362 @@ +import { describe, it, expect, vi } from 'vitest'; +import CachedValue from '../../../../src/domain/utils/CachedValue.js'; + +/** + * Creates a mock clock for testing. 
+ * @returns {Object} Mock clock with controllable time + */ +function createMockClock() { + let currentTime = 0; + return { + now: () => currentTime, + timestamp: () => new Date(currentTime).toISOString(), + advance: (ms) => { + currentTime += ms; + }, + setTime: (ms) => { + currentTime = ms; + }, + }; +} + +describe('CachedValue', () => { + describe('constructor', () => { + it('creates cache with valid options', () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + expect(cache.hasValue).toBe(false); + expect(cache.cachedAt).toBeNull(); + }); + + it('throws when ttlMs is not a positive number', () => { + const clock = createMockClock(); + const compute = () => 'value'; + + expect(() => new CachedValue({ clock, ttlMs: 0, compute })).toThrow( + 'CachedValue ttlMs must be a positive number', + ); + expect(() => new CachedValue({ clock, ttlMs: -1, compute })).toThrow( + 'CachedValue ttlMs must be a positive number', + ); + expect(() => new CachedValue({ clock, ttlMs: 'invalid', compute })).toThrow( + 'CachedValue ttlMs must be a positive number', + ); + expect(() => new CachedValue({ clock, ttlMs: null, compute })).toThrow( + 'CachedValue ttlMs must be a positive number', + ); + }); + + it('throws when compute is not a function', () => { + const clock = createMockClock(); + + expect(() => new CachedValue({ clock, ttlMs: 5000, compute: 'not a function' })).toThrow( + 'CachedValue compute must be a function', + ); + expect(() => new CachedValue({ clock, ttlMs: 5000, compute: null })).toThrow( + 'CachedValue compute must be a function', + ); + }); + }); + + describe('get', () => { + it('computes value on first call', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValue('computed'); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + const value = await cache.get(); + + expect(value).toBe('computed'); + 
expect(compute).toHaveBeenCalledTimes(1); + }); + + it('returns cached value within TTL', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValue('computed'); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + await cache.get(); + clock.advance(4999); // Just under TTL + const value = await cache.get(); + + expect(value).toBe('computed'); + expect(compute).toHaveBeenCalledTimes(1); + }); + + it('recomputes value after TTL expires', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValueOnce('first').mockResolvedValueOnce('second'); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + const first = await cache.get(); + clock.advance(5001); // Just over TTL + const second = await cache.get(); + + expect(first).toBe('first'); + expect(second).toBe('second'); + expect(compute).toHaveBeenCalledTimes(2); + }); + + it('supports synchronous compute functions', async () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'sync value', + }); + + const value = await cache.get(); + + expect(value).toBe('sync value'); + }); + + it('supports async compute functions', async () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: async () => { + return 'async value'; + }, + }); + + const value = await cache.get(); + + expect(value).toBe('async value'); + }); + }); + + describe('getWithMetadata', () => { + it('returns fromCache false on first call', async () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + const result = await cache.getWithMetadata(); + + expect(result.value).toBe('value'); + expect(result.fromCache).toBe(false); + expect(result.cachedAt).toBeNull(); + }); + + it('returns fromCache true and cachedAt for cached results', async () => { + const clock = 
createMockClock(); + clock.setTime(1000); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + await cache.get(); + clock.advance(1000); + const result = await cache.getWithMetadata(); + + expect(result.value).toBe('value'); + expect(result.fromCache).toBe(true); + expect(result.cachedAt).toBe(new Date(1000).toISOString()); + }); + }); + + describe('invalidate', () => { + it('clears cached value', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValue('computed'); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + await cache.get(); + cache.invalidate(); + + expect(cache.hasValue).toBe(false); + expect(cache.cachedAt).toBeNull(); + }); + + it('forces recomputation on next get', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValueOnce('first').mockResolvedValueOnce('second'); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + const first = await cache.get(); + cache.invalidate(); + const second = await cache.get(); + + expect(first).toBe('first'); + expect(second).toBe('second'); + expect(compute).toHaveBeenCalledTimes(2); + }); + }); + + describe('hasValue', () => { + it('returns false before first computation', () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + expect(cache.hasValue).toBe(false); + }); + + it('returns true after computation', async () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + await cache.get(); + + expect(cache.hasValue).toBe(true); + }); + + it('returns true even after TTL expires (value is stale but still cached)', async () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + await cache.get(); + clock.advance(10000); // Way past TTL + + 
// Value is stale but still present + expect(cache.hasValue).toBe(true); + }); + + it('returns false after invalidate', async () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + await cache.get(); + cache.invalidate(); + + expect(cache.hasValue).toBe(false); + }); + }); + + describe('cachedAt', () => { + it('returns null before first computation', () => { + const clock = createMockClock(); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + expect(cache.cachedAt).toBeNull(); + }); + + it('returns ISO timestamp after computation', async () => { + const clock = createMockClock(); + clock.setTime(1609459200000); // 2021-01-01T00:00:00.000Z + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + await cache.get(); + + expect(cache.cachedAt).toBe('2021-01-01T00:00:00.000Z'); + }); + + it('updates timestamp after recomputation', async () => { + const clock = createMockClock(); + clock.setTime(1000); + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => 'value', + }); + + await cache.get(); + const firstCachedAt = cache.cachedAt; + + clock.advance(6000); // Past TTL + await cache.get(); + const secondCachedAt = cache.cachedAt; + + expect(firstCachedAt).not.toBe(secondCachedAt); + expect(secondCachedAt).toBe(new Date(7000).toISOString()); + }); + }); + + describe('edge cases', () => { + it('handles null return value from compute', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValue(null); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + const value = await cache.get(); + + expect(value).toBeNull(); + // Note: hasValue returns false for null since we check _value === null + expect(cache.hasValue).toBe(false); + }); + + it('handles compute function that throws', async () => { + const clock = createMockClock(); + const compute = 
vi.fn().mockRejectedValue(new Error('compute failed')); + const cache = new CachedValue({ clock, ttlMs: 5000, compute }); + + await expect(cache.get()).rejects.toThrow('compute failed'); + expect(cache.hasValue).toBe(false); + }); + + it('handles very small TTL', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValueOnce('first').mockResolvedValueOnce('second'); + const cache = new CachedValue({ clock, ttlMs: 1, compute }); + + await cache.get(); + clock.advance(2); + const second = await cache.get(); + + expect(second).toBe('second'); + expect(compute).toHaveBeenCalledTimes(2); + }); + + it('handles very large TTL', async () => { + const clock = createMockClock(); + const compute = vi.fn().mockResolvedValue('value'); + const cache = new CachedValue({ clock, ttlMs: Number.MAX_SAFE_INTEGER, compute }); + + await cache.get(); + clock.advance(1000000000); + await cache.get(); + + expect(compute).toHaveBeenCalledTimes(1); + }); + + it('caches object values correctly', async () => { + const clock = createMockClock(); + const obj = { nested: { deeply: true }, array: [1, 2, 3] }; + const cache = new CachedValue({ + clock, + ttlMs: 5000, + compute: () => obj, + }); + + const value = await cache.get(); + + expect(value).toBe(obj); + expect(value.nested.deeply).toBe(true); + expect(value.array).toEqual([1, 2, 3]); + }); + }); +});
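The CachedValue tests above pin down the full contract: lazy compute, TTL-based staleness against an injected clock, `getWithMetadata()` reporting `fromCache`/`cachedAt`, explicit `invalidate()`, and `null` treated as "no value". As a reading aid, here is a minimal sketch reconstructed purely from the behavior these tests assert; the actual `src/domain/utils/CachedValue.js` may differ in details, and the error messages and field names below are taken from the test expectations, not from the implementation.

```javascript
// Illustrative sketch only: a CachedValue reconstructed from the test
// expectations above. Not the shipped implementation.
class CachedValue {
  constructor({ clock, ttlMs, compute }) {
    if (typeof ttlMs !== 'number' || !(ttlMs > 0)) {
      throw new Error('CachedValue ttlMs must be a positive number');
    }
    if (typeof compute !== 'function') {
      throw new Error('CachedValue compute must be a function');
    }
    this._clock = clock;      // injected: { now(): ms since epoch }
    this._ttlMs = ttlMs;
    this._compute = compute;  // sync or async; awaited either way
    this._value = null;
    this._cachedAtMs = null;
  }

  // Per the tests, a cached null counts as "no value".
  get hasValue() {
    return this._value !== null;
  }

  get cachedAt() {
    return this._cachedAtMs === null
      ? null
      : new Date(this._cachedAtMs).toISOString();
  }

  _isFresh() {
    return (
      this._cachedAtMs !== null &&
      this._clock.now() - this._cachedAtMs <= this._ttlMs
    );
  }

  async get() {
    return (await this.getWithMetadata()).value;
  }

  async getWithMetadata() {
    if (this.hasValue && this._isFresh()) {
      return { value: this._value, fromCache: true, cachedAt: this.cachedAt };
    }
    const value = await this._compute(); // a throw here leaves the cache empty
    this._value = value;
    this._cachedAtMs = value === null ? null : this._clock.now();
    return { value, fromCache: false, cachedAt: null };
  }

  invalidate() {
    this._value = null;
    this._cachedAtMs = null;
  }
}
```

Note the design choice the tests document explicitly: `hasValue` stays `true` for a stale-but-present value (staleness is only checked on read), while a computed `null` is indistinguishable from an empty cache.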