Closed
2 changes: 1 addition & 1 deletion ARCHITECTURE.md
@@ -1,6 +1,6 @@
# Architecture: @git-stunts/empty-graph

-A graph database substrate living entirely within Git commits, using the "Empty Tree" pattern for invisible storage and Roaring Bitmaps for high-performance indexing.
+A "hidden" graph database. No files, just Git commits, using the "Empty Tree" pattern for invisible storage and Roaring Bitmaps for high-performance indexing.

## 🧱 Core Concepts

70 changes: 64 additions & 6 deletions CHANGELOG.md
@@ -7,13 +7,71 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [3.0.0] - 2025-01-30

Comment on lines +10 to +11

⚠️ Potential issue | 🟡 Minor

Fix release date to keep chronology consistent.
Line 10 lists 2025-01-30, which is out of sequence with the 2026 entries below.

✏️ Proposed fix
-## [3.0.0] - 2025-01-30
+## [3.0.0] - 2026-01-30
🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 10 - 11, Update the release date in the changelog
header "## [3.0.0] - 2025-01-30" so it is chronologically consistent with the
2026 entries below (for example change to "2026-01-30" or the correct 2026 date
you intend); edit the version header line only to reflect the correct year while
preserving the "## [3.0.0]" text.
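
The chronology problem flagged above can be caught mechanically. The following is a minimal sketch, not part of this PR: `findOutOfOrderReleases` is a hypothetical helper, and it assumes release headers follow the `## [x.y.z] - YYYY-MM-DD` shape seen in this changelog, listed newest first.

```javascript
// Hypothetical helper (not from the PR): reports adjacent release headers
// where the entry listed first carries an earlier date than the one below it.
function findOutOfOrderReleases(changelogText) {
  // Assumed header shape: "## [x.y.z] - YYYY-MM-DD"; "[Unreleased]" is skipped.
  const headerPattern = /^## \[(\d+\.\d+\.\d+)\] - (\d{4}-\d{2}-\d{2})$/gm;
  const releases = [...changelogText.matchAll(headerPattern)].map((m) => ({
    version: m[1],
    date: m[2],
  }));

  const problems = [];
  for (let i = 0; i + 1 < releases.length; i++) {
    const newer = releases[i];
    const older = releases[i + 1];
    // ISO 8601 dates sort correctly as plain strings.
    if (newer.date < older.date) {
      problems.push({ newer, older });
    }
  }
  return problems;
}

// The headers from this diff, in the order they appear.
const sample = [
  '## [Unreleased]',
  '',
  '## [3.0.0] - 2025-01-30',
  '',
  '## [2.5.0] - 2026-01-29',
].join('\n');

console.log(findOutOfOrderReleases(sample));
```

A check like this could run in CI or a pre-commit hook; an empty result means every release is dated on or after the one listed below it.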

### Added
- **Node Query API**: New methods for convenient node access
  - `getNode(sha)` - Returns full GraphNode with all metadata (sha, author, date, message, parents)
  - `hasNode(sha)` - Boolean existence check without loading full node data
  - `countNodes(ref)` - Count nodes reachable from a ref without loading all nodes into memory
- **Batch Operations**: `createNodes(nodes)` - Create multiple nodes in a single operation with placeholder parent refs
- **LRU Cache**: Loaded shards now use an LRU cache to bound memory usage

#### Managed Mode & Durability
- **`EmptyGraph.open()`** - New static factory for creating managed graphs with automatic durability guarantees
- **`GraphRefManager`** - New service for ref/anchor management
- **Anchor commits** - Automatic creation of anchor commits to prevent GC of disconnected subgraphs
- **`graph.sync(sha)`** - Manual ref synchronization for `autoSync: 'manual'` mode
- **`graph.anchor(ref, shas)`** - Power user method for explicit anchor creation

#### Batching API
- **`graph.beginBatch()`** - Start a batch for efficient bulk writes
- **`GraphBatch.createNode()`** - Create nodes without per-write ref updates
- **`GraphBatch.commit()`** - Single octopus anchor for all batch nodes
- **`graph.compactAnchors()`** - Utility to compact anchor chains into single octopus

#### Validation & Error Handling
- **`EmptyMessageError`** - New error type for empty message validation (code: `EMPTY_MESSAGE`)
- Empty messages now rejected at write time (prevents "ghost nodes")

#### Index Improvements
- **Canonical JSON checksums** - Deterministic checksums for cross-engine compatibility
- **Shard version 2** - New format with backward compatibility for v1
- **`SUPPORTED_SHARD_VERSIONS`** - Reader accepts both v1 and v2 shards

#### Performance
- **`isAncestor()`** - New method on GitGraphAdapter for ancestry checking
- **Fast-forward detection** - `syncHead()` skips anchor creation for linear history
- **Octopus anchoring** - Batch.commit() creates single anchor with N parents

#### Cancellation
- AbortSignal propagation added to all TraversalService methods
- AbortSignal support in StreamingBitmapIndexBuilder finalization

#### Node Query API
- **`getNode(sha)`** - Returns full GraphNode with all metadata (sha, author, date, message, parents)
- **`hasNode(sha)`** - Boolean existence check without loading full node data
- **`countNodes(ref)`** - Count nodes reachable from a ref without loading all nodes into memory

#### Batch Operations
- **`createNodes(nodes)`** - Create multiple nodes in a single operation with placeholder parent refs

#### Caching & Resilience
- **LRU Cache** - Loaded shards now use an LRU cache to bound memory usage
- **Retry Logic** - `GitGraphAdapter` now retries transient Git failures with exponential backoff and decorrelated jitter
  - Uses `@git-stunts/alfred` resilience library
  - Retries on: "cannot lock ref", "resource temporarily unavailable", "connection timed out"
  - Configurable via `retryOptions` constructor parameter
- **CachedValue Utility** - Reusable TTL-based caching utility in `src/domain/utils/CachedValue.js`
- **Memory Warning** - `BitmapIndexReader` logs a warning when ID-to-SHA cache exceeds 1M entries (~40MB)

### Changed
- `SHARD_VERSION` bumped from 1 to 2 (v1 still readable)
- **TraversalService** - Refactored path reconstruction into unified `_walkPredecessors()` and `_walkSuccessors()` helpers
- **HealthCheckService** - Now uses `CachedValue` utility instead of inline caching logic

### Fixed
- **Durability bug** - Nodes created via `createNode()` were not reachable from any ref, making them vulnerable to Git GC
- **Ghost nodes** - Empty messages allowed at write time but rejected during iteration

### Documentation
- Added `SEMANTICS.md` - Durability contract and anchor commit semantics
- Updated `README.md` - Durability warning, mode selection guide, new API docs
- Added **Memory Considerations** section documenting memory requirements for large graphs

## [2.5.0] - 2026-01-29

7 changes: 5 additions & 2 deletions Dockerfile
@@ -6,9 +6,12 @@ RUN apt-get update && apt-get install -y \
g++ \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY package*.json ./
# Copy alfred first (dependency)
COPY alfred ../alfred
# Copy empty-graph
COPY empty-graph/package*.json ./
RUN npm install
COPY . .
COPY empty-graph .
Comment on lines +9 to +14

⚠️ Potential issue | 🔴 Critical

CI build failure: alfred directory not found in build context.

The pipeline reports: `COPY alfred ../alfred could not be performed. '/alfred' not found in build context.`

The Dockerfile expects alfred to exist as a sibling directory when using the parent directory as build context. This works locally in a monorepo structure but fails in CI, likely because:

  1. CI only checks out the empty-graph repository, not the parent monorepo
  2. The alfred package isn't available in the CI build environment

Potential solutions:

  1. If using a monorepo: Ensure CI checks out the entire monorepo, not just empty-graph
  2. If separate repos: Publish @git-stunts/alfred to npm and remove the local dependency
  3. For CI-only workaround: Add a CI step to clone/copy alfred before the Docker build
#!/bin/bash
# Verify the expected directory structure and CI configuration
echo "=== Checking for monorepo indicators ==="
# Look for workspace configuration
jq '.workspaces // "no workspaces defined"' package.json 2>/dev/null

# Check if there's a parent package.json that might define workspaces
if [ -f ../package.json ]; then
  echo "=== Parent package.json found ==="
  jq '.workspaces // "no workspaces"' ../package.json
fi

# Check CI workflow files for context
echo "=== CI workflow configuration ==="
fd -e yml -e yaml . .github/workflows --exec cat {} \; 2>/dev/null | head -100
🧰 Tools
🪛 GitHub Actions: CI

[error] 10-10: Docker build failed: COPY alfred ../alfred could not be performed. '/alfred' not found in build context.

🤖 Prompt for AI Agents
In `@Dockerfile` around lines 9 - 14, The Dockerfile's COPY alfred ../alfred step
fails in CI because the alfred directory isn't in the build context; update the
build to remove the local sibling dependency or ensure CI provides it: either
make CI check out the monorepo before docker build (so the Dockerfile's COPY
alfred ../alfred succeeds), or stop copying the local package and depend on the
published package (publish `@git-stunts/alfred` to npm and update package.json in
empty-graph), or add a CI step that clones/copies the alfred repo into the build
context prior to running the Dockerfile; locate the COPY alfred ../alfred and
related COPY empty-graph / RUN npm install lines in the Dockerfile and apply one
of these fixes accordingly.
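
Whichever fix is chosen, a preflight check can make the failure mode explicit before `docker build` runs. The sketch below is an illustration only: `missingContextPaths` is a hypothetical helper, and the directory names `alfred` and `empty-graph` are taken from the `COPY` lines in this diff. The demo builds a throwaway directory that mimics a CI checkout containing only `empty-graph`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not from the PR): returns the entries of
// requiredPaths that do not exist under the given build context directory.
function missingContextPaths(contextDir, requiredPaths) {
  return requiredPaths.filter(
    (relativePath) => !fs.existsSync(path.join(contextDir, relativePath))
  );
}

// Demo: simulate a CI checkout that contains empty-graph but not alfred.
const contextDir = fs.mkdtempSync(path.join(os.tmpdir(), 'build-context-'));
fs.mkdirSync(path.join(contextDir, 'empty-graph'));

const missing = missingContextPaths(contextDir, ['alfred', 'empty-graph']);
if (missing.length > 0) {
  console.error(`Missing from build context: ${missing.join(', ')}`);
}

// Clean up the throwaway directory.
fs.rmSync(contextDir, { recursive: true, force: true });
```

In a real pipeline the context directory would be the one passed to `docker build`, and a non-empty result would fail the job with a clearer message than the `COPY` error.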

ENV GIT_STUNTS_DOCKER=1
# Default to tests, but can be overridden for benchmark
CMD ["npm", "test"]