From 4263ab079af5e2ee0fd351bbf2f1cf2850ab2475 Mon Sep 17 00:00:00 2001 From: Bogdan Fazakas Date: Tue, 27 Jan 2026 14:25:27 +0200 Subject: [PATCH 1/6] Added docs --- FIX_DATABASE_RETRIEVE_ERROR.md | 117 ++ GITHUB_ISSUE.md | 46 + INDEXER_ARCHITECTURE_ANALYSIS.md | 1315 +++++++++++++++++ INDEXER_DOCS_README.md | 346 +++++ INDEXER_FLOW_DIAGRAMS.md | 712 +++++++++ INDEXER_MEETING_SUMMARY.md | 491 ++++++ INDEXER_USE_CASES_AND_FLOWS.md | 2379 ++++++++++++++++++++++++++++++ 7 files changed, 5406 insertions(+) create mode 100644 FIX_DATABASE_RETRIEVE_ERROR.md create mode 100644 GITHUB_ISSUE.md create mode 100644 INDEXER_ARCHITECTURE_ANALYSIS.md create mode 100644 INDEXER_DOCS_README.md create mode 100644 INDEXER_FLOW_DIAGRAMS.md create mode 100644 INDEXER_MEETING_SUMMARY.md create mode 100644 INDEXER_USE_CASES_AND_FLOWS.md diff --git a/FIX_DATABASE_RETRIEVE_ERROR.md b/FIX_DATABASE_RETRIEVE_ERROR.md new file mode 100644 index 000000000..2f86f7bc0 --- /dev/null +++ b/FIX_DATABASE_RETRIEVE_ERROR.md @@ -0,0 +1,117 @@ +# Fix for "Cannot read properties of undefined (reading 'retrieve')" Error + +**Date:** 2026-01-15 +**Status:** ✅ Fixed + +## Problem Description + +The error occurred when the `findDDO` command tried to retrieve a DDO from the database: + +``` +2026-01-15T09:07:40.161Z error: CORE: ❌ Error: 'Cannot read properties of undefined (reading 'retrieve')' +was caught while getting DDO info for id: did:op:bb83b4b7f86b9523523be931a763aaa3a20dc9d3d46c96feb1940e86fde278ac +``` + +### Root Cause + +The issue occurred when the database configuration was invalid or incomplete. In such cases: +1. The `Database` class would be partially initialized +2. The `ddo` property (and other database properties like `indexer`, `logs`, `ddoState`) would be `undefined` +3. Code attempting to call methods like `database.ddo.retrieve()` would fail with "Cannot read properties of undefined" + +This happened because the database initialization in `/src/components/database/index.ts` only creates the `ddo`, `indexer`, `logs`, `order`, and `ddoState` properties when `hasValidDBConfiguration(config)` returns `true` (lines 65-108). + +## Solution + +Added defensive checks before accessing database properties in all affected files. The fix ensures that: +1. The database object exists +2. The specific database property (e.g., `ddo`, `indexer`, `logs`) exists +3. Returns appropriate error responses (HTTP 503 - Service Unavailable) when database is not available + +## Files Modified + +### 1. `/src/components/core/utils/findDdoHandler.ts` +**Function:** `findDDOLocally()` +- Added check for `database` and `database.ddo` before calling `retrieve()` +- Returns `undefined` with a warning log if database is not available + +### 2. `/src/components/core/handler/queryHandler.ts` +**Functions:** +- `QueryHandler.handle()` - Added check for `database.ddo` +- `QueryDdoStateHandler.handle()` - Added check for `database.ddoState` +- Returns HTTP 503 error if database is not available + +### 3. `/src/components/core/handler/ddoHandler.ts` +**Functions:** +- `GetDdoHandler.handle()` - Added check for `database.ddo` +- `FindDdoHandler.handle()` (sink function) - Added check before checking if DDO exists locally +- `findAndFormatDdo()` - Added check for `database.ddo` +- Returns HTTP 503 error if database is not available + +### 4. `/src/components/core/handler/policyServer.ts` +**Function:** `PolicyServerInitializeHandler.handle()` +- Added check for `database.ddo` before retrieving DDO +- Returns HTTP 503 error if database is not available + +### 5. `/src/components/httpRoutes/logs.ts` +**Route:** `POST /log/:id` +- Added check for `database.logs` before retrieving log +- Returns HTTP 503 error if database is not available + +### 6. `/src/components/core/utils/statusHandler.ts` +**Function:** `getIndexerBlockInfo()` +- Added check for `database.indexer` before retrieving block info +- Returns '0' with a warning log if indexer database is not available + +## Pattern Applied + +Before (unsafe): +```typescript +const ddo = await node.getDatabase().ddo.retrieve(id) +``` + +After (safe): +```typescript +const database = node.getDatabase() +if (!database || !database.ddo) { + // Handle error appropriately + return { + stream: null, + status: { httpStatus: 503, error: 'DDO database is not available' } + } +} +const ddo = await database.ddo.retrieve(id) +``` + +## Testing + +To verify the fix: +1. Run the node with an invalid database configuration +2. Try to execute a `findDDO` command +3. The system should now return a proper error message instead of crashing + +Expected behavior: +- HTTP 503 response with message: "DDO database is not available" +- Logs should show warning messages about unavailable database +- No more "Cannot read properties of undefined" errors + +## Impact + +- **Backwards Compatible:** Yes, no breaking changes +- **Error Handling:** Improved - now provides meaningful error messages +- **Stability:** Significantly improved - prevents crashes when database is not fully initialized +- **Performance:** No impact - only adds lightweight null checks + +## Related Files + +- `/src/OceanNode.ts` - Defines `getDatabase()` method +- `/src/components/database/index.ts` - Database initialization logic +- `/src/components/database/DatabaseFactory.ts` - Database factory pattern + +## Configuration Note + +To ensure full database functionality, make sure the following environment variable is properly configured: +- `DB_URL` - Required for DDO, Indexer, Logs, Order, and DDO State databases + +Without a valid `DB_URL`, only the Nonce, C2D, Auth Token, and Config databases will be initialized. + diff --git a/GITHUB_ISSUE.md b/GITHUB_ISSUE.md new file mode 100644 index 000000000..35343923c --- /dev/null +++ b/GITHUB_ISSUE.md @@ -0,0 +1,46 @@ +## Bug: Cannot read properties of undefined (reading 'retrieve') in findDDO command + +### Description +The ocean-node crashes with `TypeError: Cannot read properties of undefined (reading 'retrieve')` when attempting to execute the `findDDO` command if the database is not fully initialized. + +### Steps to Reproduce +1. Run ocean-node without a valid `DB_URL` environment variable +2. Execute a `findDDO` command with any DID +3. Observe the crash + +### Error Log +``` +2026-01-15T09:07:40.160Z debug: CORE: Unable to find DDO locally. Proceeding to call findDDO +2026-01-15T09:07:40.161Z info: CORE: Checking received command data for Command "findDDO": { + "id": "did:op:bb83b4b7f86b9523523be931a763aaa3a20dc9d3d46c96feb1940e86fde278ac", + "command": "findDDO", + "force": false +} +2026-01-15T09:07:40.161Z error: CORE: ❌ Error: 'Cannot read properties of undefined (reading 'retrieve')' was caught while getting DDO info for id: did:op:bb83b4b7f86b9523523be931a763aaa3a20dc9d3d46c96feb1940e86fde278ac +``` + +### Root Cause +When `DB_URL` is invalid or missing, the `Database` class only initializes essential databases (Nonce, C2D, Auth Token, Config). Properties like `ddo`, `indexer`, `logs`, `ddoState`, and `order` remain `undefined`. Code accessing these properties without null checks throws `TypeError`. + +### Impact +- Node crashes when handling DDO-related commands without proper database configuration +- Poor error messages that don't indicate the actual problem (missing DB configuration) +- Affects multiple handlers: `findDDO`, `getDDO`, `query`, `policyServer`, etc. + +### Solution +Add defensive null checks before accessing database properties in: +- `findDdoHandler.ts` - `findDDOLocally()` +- `queryHandler.ts` - `QueryHandler` and `QueryDdoStateHandler` +- `ddoHandler.ts` - `GetDdoHandler`, `FindDdoHandler`, `findAndFormatDdo()` +- `policyServer.ts` - `PolicyServerInitializeHandler` +- `logs.ts` - `/log/:id` route +- `statusHandler.ts` - `getIndexerBlockInfo()` + +Return HTTP 503 with clear error message: "DDO database is not available" instead of crashing. + +### Expected Behavior After Fix +- Node returns HTTP 503 with descriptive error message +- Logs warning about unavailable database +- No crashes when database is not fully initialized +- Backwards compatible with existing functionality + diff --git a/INDEXER_ARCHITECTURE_ANALYSIS.md b/INDEXER_ARCHITECTURE_ANALYSIS.md new file mode 100644 index 000000000..cb7dcca03 --- /dev/null +++ b/INDEXER_ARCHITECTURE_ANALYSIS.md @@ -0,0 +1,1315 @@ +# Ocean Node Indexer - Architecture Analysis & Refactoring Proposal + +**Date:** January 14, 2026 +**Purpose:** Architecture review and refactoring direction for the Ocean Node Indexer component + +--- + +## 1. CURRENT ARCHITECTURE OVERVIEW + +### 1.1 High-Level Components + +The Indexer system consists of the following main components: + +``` +OceanIndexer (Main Coordinator) + ├── Worker Threads (crawlerThread.ts) - One per supported chain + │ ├── Block Crawler + │ ├── Event Retrieval + │ └── Reindex Queue Manager + ├── Processor (processor.ts) - Event processing orchestrator + │ └── Event Processors (processors/*.ts) - Specific event handlers + └── Database Layer + ├── Indexer State (last indexed block per chain) + ├── DDO Storage (asset metadata) + ├── Order Storage + └── State Tracking (ddoState) +``` + +### 1.2 Component Responsibilities + +#### **OceanIndexer** (`index.ts`) + +- Main coordinator class +- Manages worker threads (one per blockchain network) +- Handles job queue for admin commands (reindex operations) +- Event emitter for DDO and crawling events +- Version management and reindexing triggers + +#### **CrawlerThread** (`crawlerThread.ts`) + +- Runs in separate Worker Thread per chain +- Infinite loop polling blockchain for new blocks +- Retrieves logs/events from block ranges +- Manages reindex queue (per transaction) +- Updates last indexed block in database + +#### **Processor** (`processor.ts`) + +- Orchestrates event processing +- Routes events to specific processors +- Handles validator checks (metadata validators, access lists) +- Manages event filtering + +#### **Event Processors** (`processors/*.ts`) + +- Specific handlers for each event type: + - MetadataEventProcessor (METADATA_CREATED, METADATA_UPDATED) + - MetadataStateEventProcessor (METADATA_STATE) + - OrderStartedEventProcessor + - OrderReusedEventProcessor + - Dispenser processors (Created, Activated, Deactivated) + - Exchange processors (Created, Activated, Deactivated, RateChanged) + +--- + +## 2. HOW BLOCK PARSING WORKS + +### 2.1 Block Crawling Flow + +``` +┌─────────────────────────────────────────────────────────────┐ +│ 1. INITIALIZATION (per chain) │ +│ - Get deployment block from contract addresses │ +│ - Get last indexed block from database │ +│ - Start block = max(deploymentBlock, lastIndexedBlock) │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 2. MAIN LOOP (infinite while true) │ +│ - Get current network height │ +│ - Calculate blocks to process (min of chunkSize and │ +│ remaining blocks) │ +│ - If networkHeight > startBlock: process chunk │ +│ - Else: sleep for interval (default 30s) │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 3. EVENT RETRIEVAL (retrieveChunkEvents) │ +│ - provider.getLogs({ │ +│ fromBlock: lastIndexedBlock + 1, │ +│ toBlock: lastIndexedBlock + chunkSize, │ +│ topics: [EVENT_HASHES] // All supported events │ +│ }) │ +│ - Returns array of Log objects │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 4. PROCESS BLOCKS (processBlocks) │ +│ - Call processChunkLogs(logs, signer, provider, chainId) │ +│ - Update last indexed block in database │ +│ - Emit events for newly indexed assets │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 5. ADAPTIVE CHUNK SIZING │ +│ - On error: chunkSize = floor(chunkSize / 2) │ +│ - After 3 successful calls: revert to original chunkSize │ +│ - Minimum chunkSize = 1 │ +└─────────────────────────────────────────────────────────────┘ +``` + +### 2.2 Key Implementation Details + +**Location:** `crawlerThread.ts` - `processNetworkData()` + +```typescript +// Main crawling loop characteristics: +- Infinite loop with lockProccessing flag +- Dynamic chunk sizing (adaptive to RPC failures) +- Retry mechanism with configurable interval +- Reindex queue processing after each chunk +- One-shot CRAWLING_STARTED event emission +``` + +**Event Retrieval:** `utils.ts` - `retrieveChunkEvents()` + +- Uses ethers `provider.getLogs()` with topic filters +- Filters by all known Ocean Protocol event hashes +- Single RPC call per chunk +- Throws error on failure (caught by crawler for retry) + +--- + +## 3. HOW EVENT STORAGE WORKS + +### 3.1 Event Processing Pipeline + +``` +Raw Log (ethers.Log) + ↓ +┌──────────────────────────────────────┐ +│ 1. EVENT IDENTIFICATION │ +│ - Match log.topics[0] with │ +│ EVENT_HASHES lookup table │ +│ - Route to appropriate processor │ +└──────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────┐ +│ 2. VALIDATION LAYER │ +│ - Check if NFT deployed by │ +│ Ocean Factory │ +│ - Validate metadata proofs │ +│ - Check allowedValidators list │ +│ - Check access list memberships │ +│ - Check authorized publishers │ +└──────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────┐ +│ 3. EVENT-SPECIFIC PROCESSING │ +│ - Decode event data from receipt │ +│ - For Metadata events: │ +│ • Decrypt/decompress DDO │ +│ • Validate DDO hash │ +│ • Check purgatory status │ +│ • Fetch pricing info │ +│ • Check policy server │ +│ - For Order events: │ +│ • Update order count stats │ +│ • Create order record │ +│ - For Pricing events: │ +│ • Update pricing arrays │ +└──────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────┐ +│ 4. DATABASE PERSISTENCE │ +│ - DDO Database (Elasticsearch/ │ +│ Typesense) │ +│ - DDO State (validation tracking) │ +│ - Order Database │ +│ - Indexer State (last block) │ +└──────────────────────────────────────┘ +``` + +### 3.2 Storage Schemas + +**Indexer State:** + +```typescript +{ + id: chainId (string), + lastIndexedBlock: number +} +``` + +**DDO Storage:** + +- Full DDO document stored (as per Ocean Protocol DDO spec) +- Enhanced with `indexedMetadata`: + ```typescript + { + nft: { state, address, name, symbol, owner, created, tokenURI }, + event: { txid, from, contract, block, datetime }, + stats: [{ + datatokenAddress, name, symbol, serviceId, + orders: number, + prices: [{ type, price, contract, token, exchangeId? }] + }], + purgatory: { state: boolean } + } + ``` + +**DDO State Tracking:** + +```typescript +{ + chainId: number, + did: string, + nft: string, + txId: string, + valid: boolean, + error: string // if validation failed +} +``` + +**Order Storage:** + +```typescript +{ + type: 'startOrder' | 'reuseOrder', + timestamp: Date, + consumer: address, + payer: address, + datatokenAddress: address, + nftAddress: address, + did: string, + startOrderId: string +} +``` + +--- + +## 4. PAIN POINTS & ISSUES + +### 4.1 Architecture Complexity + +**Issue:** Mixed concerns and tight coupling + +- `crawlerThread.ts` handles: + - Block crawling logic + - Network communication + - Database updates + - Message passing + - Reindex queue management + - Error handling and retry logic + +**Impact:** Hard to test, debug, and modify individual components + +--- + +### 4.2 Worker Thread Architecture + +**Issue:** Complex inter-thread communication + +- Parent-child message passing using `parentPort.postMessage()` +- Shared state management through message queues +- Two separate queues: `INDEXING_QUEUE` (parent) and `REINDEX_QUEUE` (worker) +- Race conditions possible with `lockProccessing` flag + +**Code smell:** + +```typescript +// In crawlerThread.ts +parentPort.on('message', (message) => { + if (message.method === INDEXER_MESSAGES.START_CRAWLING) { ... } + else if (message.method === INDEXER_MESSAGES.REINDEX_TX) { ... } + else if (message.method === INDEXER_MESSAGES.REINDEX_CHAIN) { ... } + else if (message.method === INDEXER_MESSAGES.STOP_CRAWLING) { ... } +}) +``` + +**Impact:** + +- Hard to reason about state +- Difficult to add new features +- Testing requires mocking Worker Threads + +--- + +### 4.3 Error Handling & Recovery + +**Issue:** Multiple retry mechanisms at different levels + +1. Crawler level: `retryCrawlerWithDelay()` with max 10 retries +2. Chunk retrieval: adaptive chunk sizing on error +3. Block processing: sleep and retry on error +4. Individual RPC calls: `withRetrial()` helper with 5 retries + +**Problems:** + +- No centralized error tracking +- Unclear recovery state after multiple failures +- Potential for infinite loops or deadlocks +- No circuit breaker pattern + +--- + +### 4.4 Event Processing Complexity + +**Issue:** Monolithic `processChunkLogs()` function + +- 180+ lines in single function +- Nested validation logic for metadata events +- Multiple external contract calls during validation +- Synchronous processing (one event at a time) + +**Code complexity example:** + +```typescript +// From processor.ts lines 79-162 +if (event.type === EVENTS.METADATA_CREATED || ...) { + if (checkMetadataValidated) { + const txReceipt = await provider.getTransactionReceipt(...) + const metadataProofs = fetchEventFromTransaction(...) + if (!metadataProofs) { continue } + + const validators = metadataProofs.map(...) + const allowed = allowedValidators.filter(...) + if (!allowed.length) { continue } + + if (allowedValidatorsList && validators.length > 0) { + isAllowed = false + for (const accessListAddress of allowedValidatorsList[chain]) { + const accessListContract = new ethers.Contract(...) + for (const metaproofValidator of validators) { + const balance = await accessListContract.balanceOf(...) + // ... more nested logic + } + } + if (!isAllowed) { continue } + } + } +} +``` + +**Impact:** + +- Hard to read and maintain +- Performance bottleneck (serial processing) +- Difficult to add new validation rules +- Error in one validation affects all events + +--- + +### 4.5 Metadata Decryption Complexity + +**Issue:** `decryptDDO()` method in BaseProcessor (400+ lines) + +- Handles HTTP, P2P, and local decryption +- Complex nonce management +- Signature verification inline +- Multiple error paths +- Retry logic embedded + +**Impact:** + +- Single Responsibility Principle violated +- Hard to test different decryption strategies +- Error messages unclear about failure point + +--- + +### 4.6 Database Abstraction Issues + +**Issue:** Direct database calls throughout processors + +```typescript +const { ddo: ddoDatabase, ddoState, order: orderDatabase } = await getDatabase() +``` + +**Problems:** + +- Tight coupling to database implementation +- Transaction management unclear +- No batch operations +- No caching strategy +- Multiple database calls per event + +--- + +### 4.7 State Management + +**Issue:** Global mutable state + +```typescript +// In index.ts +let INDEXING_QUEUE: ReindexTask[] = [] +const JOBS_QUEUE: JobStatus[] = [] +const runningThreads: Map = new Map() + +// In crawlerThread.ts +let REINDEX_BLOCK: number = null +const REINDEX_QUEUE: ReindexTask[] = [] +let stoppedCrawling: boolean = false +let startedCrawling: boolean = false +``` + +**Impact:** + +- Hard to test +- Race conditions +- Unclear ownership +- Memory leaks potential + +--- + +### 4.8 Lack of Observability + +**Issue:** Limited monitoring and metrics + +- No performance metrics (events/sec, blocks/sec) +- No latency tracking +- No failure rate monitoring +- Logger used but no structured metrics +- Hard to debug production issues + +--- + +### 4.9 Testing Challenges + +**Issue:** Integration test heavy, unit tests sparse + +- Worker threads hard to unit test +- Database dependencies in all tests +- Long-running integration tests +- No mocking strategy for blockchain + +--- + +### 4.10 Configuration & Deployment + +**Issue:** Environment-dependent behavior + +- RPC URLs in environment variables +- Chunk sizes configurable but defaults unclear +- Interval timing hardcoded in multiple places +- No configuration validation + +--- + +## 5. REFACTORING PROPOSAL - HIGH-LEVEL ARCHITECTURE + +### 5.1 Design Principles + +1. **Separation of Concerns**: Each component has one clear responsibility +2. **Dependency Inversion**: Depend on abstractions, not implementations +3. **Testability**: Every component unit testable in isolation +4. **Observability**: Built-in metrics and monitoring +5. **Resilience**: Explicit error handling with circuit breakers +6. **Maintainability**: Clear code structure, documented patterns + +--- + +### 5.2 Proposed Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ IndexerOrchestrator │ +│ - Coordinates all indexing operations │ +│ - Manages lifecycle of chain indexers │ +│ - Handles configuration and health checks │ +└─────────────────────────────────────────────────────────────────┘ + │ + ┌─────────────────────┼─────────────────────┐ + ↓ ↓ ↓ +┌──────────────┐ ┌──────────────┐ ┌──────────────┐ +│ChainIndexer 1│ │ChainIndexer 2│ │ChainIndexer N│ +│ (per chain) │ │ (per chain) │ ... │ (per chain) │ +└──────────────┘ └──────────────┘ └──────────────┘ + │ + ├──> BlockScanner (fetches block ranges) + │ │ + │ └──> RPC Client (with retry & fallback) + │ + ├──> EventExtractor (filters & decodes events) + │ + ├──> ValidationPipeline + │ ├──> FactoryValidator + │ ├──> MetadataValidator + │ ├──> PublisherValidator + │ └──> PolicyValidator + │ + ├──> EventProcessor + │ ├──> MetadataProcessor + │ ├──> OrderProcessor + │ └──> PricingProcessor + │ + └──> StateManager + ├──> ProgressTracker (last indexed block) + ├──> EventStore (processed events) + └──> ReindexQueue +``` + +--- + +### 5.3 Component Details + +#### **5.3.1 IndexerOrchestrator** + +**Responsibility:** Top-level coordinator + +```typescript +class IndexerOrchestrator { + private chainIndexers: Map + private config: IndexerConfig + private eventBus: EventBus + private metrics: MetricsCollector + + async start(): Promise + async stop(): Promise + async reindexChain(chainId: number, fromBlock?: number): Promise + async reindexTransaction(chainId: number, txHash: string): Promise + getStatus(): IndexerStatus +} +``` + +**Benefits:** + +- Single entry point +- Clear lifecycle management +- Easy to add new chains +- Health check support + +--- + +#### **5.3.2 ChainIndexer** + +**Responsibility:** Manages indexing for a single blockchain + +```typescript +class ChainIndexer { + private chainId: number + private scanner: BlockScanner + private extractor: EventExtractor + private pipeline: ValidationPipeline + private processor: EventProcessor + private stateManager: StateManager + private running: boolean + + async start(): Promise + async stop(): Promise + async processBlockRange(from: number, to: number): Promise +} +``` + +**Benefits:** + +- Self-contained per chain +- No worker threads needed (use async/await) +- Easy to test +- Clear dependencies + +--- + +#### **5.3.3 BlockScanner** + +**Responsibility:** Fetch blocks and logs from RPC + +```typescript +interface BlockScanner { + getLatestBlock(): Promise + getLogs(fromBlock: number, toBlock: number, topics: string[]): Promise +} + +class EthersBlockScanner implements BlockScanner { + private rpcClient: ResilientRpcClient + + // Implementation with retry and fallback +} + +class ResilientRpcClient { + private providers: JsonRpcProvider[] + private circuitBreaker: CircuitBreaker + private metrics: MetricsCollector + + async execute(fn: (provider: JsonRpcProvider) => Promise): Promise +} +``` + +**Benefits:** + +- Encapsulates RPC communication +- Retry/fallback logic in one place +- Easy to mock for testing +- Circuit breaker prevents cascade failures + +--- + +#### **5.3.4 EventExtractor** + +**Responsibility:** Decode and categorize events + +```typescript +class EventExtractor { + private eventRegistry: EventRegistry + + extractEvents(logs: Log[]): CategorizedEvents + decodeEvent(log: Log): DecodedEvent +} + +interface CategorizedEvents { + metadata: MetadataEvent[] + orders: OrderEvent[] + pricing: PricingEvent[] + unknown: Log[] +} +``` + +**Benefits:** + +- Single responsibility +- Stateless and pure +- Easy to test +- Clear input/output + +--- + +#### **5.3.5 ValidationPipeline** + +**Responsibility:** Chain of validators for events + +```typescript +interface Validator { + validate(event: DecodedEvent, context: ValidationContext): Promise +} + +class ValidationPipeline { + private validators: Validator[] + + async validate(event: DecodedEvent): Promise + addValidator(validator: Validator): void +} + +// Specific validators +class FactoryValidator implements Validator +class MetadataProofValidator implements Validator +class PublisherValidator implements Validator +class AccessListValidator implements Validator +class PolicyServerValidator implements Validator +``` + +**Benefits:** + +- Chain of Responsibility pattern +- Each validator is independent +- Easy to add/remove validators +- Parallel validation possible +- Clear failure points + +--- + +#### **5.3.6 EventProcessor** + +**Responsibility:** Transform validated events into domain models + +```typescript +interface EventHandler { + handle(event: T): Promise +} + +class EventProcessor { + private handlers: Map + + async process(event: DecodedEvent): Promise +} + +// Specific handlers +class MetadataCreatedHandler implements EventHandler +class OrderStartedHandler implements EventHandler +class DispenserActivatedHandler implements EventHandler +``` + +**Benefits:** + +- Strategy pattern for different event types +- Stateless handlers +- Easy to test +- Clear transformations + +--- + +#### **5.3.7 StateManager** + +**Responsibility:** Manage persistence and state + +```typescript +interface StateManager { + getLastIndexedBlock(chainId: number): Promise + setLastIndexedBlock(chainId: number, block: number): Promise + + saveDDO(ddo: DDO): Promise + saveOrder(order: Order): Promise + updatePricing(pricing: PricingUpdate): Promise + + // Batch operations + saveBatch(entities: DomainEntity[]): Promise +} + +class TransactionalStateManager implements StateManager { + private ddoRepository: DDORepository + private orderRepository: OrderRepository + private progressRepository: ProgressRepository + + async transaction(fn: (repos: Repositories) => Promise): Promise +} +``` + +**Benefits:** + +- Repository pattern +- Transaction support +- Batch operations for performance +- Easy to swap implementations +- Mockable for tests + +--- + +### 5.4 Data Flow Example + +**Processing a Metadata Created Event:** + +``` +1. ChainIndexer.processBlockRange(1000, 1010) + ↓ +2. BlockScanner.getLogs(1000, 1010, [...topics]) + → Returns: [Log, Log, Log, ...] + ↓ +3. EventExtractor.extractEvents(logs) + → Returns: CategorizedEvents { metadata: [event1], orders: [], ... } + ↓ +4. For each metadata event: + ValidationPipeline.validate(event) + ├─> FactoryValidator.validate() + ├─> MetadataProofValidator.validate() + ├─> PublisherValidator.validate() + └─> PolicyServerValidator.validate() + → Returns: ValidationResult { valid: true, ... } + ↓ +5. EventProcessor.process(event) + → MetadataCreatedHandler.handle(event) + ├─> Decrypt DDO + ├─> Fetch pricing info + └─> Build DDO entity + → Returns: DDO + ↓ +6. StateManager.saveDDO(ddo) + → Persisted to database + ↓ +7. EventBus.emit('ddo.created', ddo) + → Notify listeners +``` + +--- + +## 6. MIGRATION STRATEGY + +### 6.1 Phase 1: Foundation (Week 1-2) + +**Goals:** + +- Introduce new abstractions without breaking existing code +- Add comprehensive tests + +**Tasks:** + +1. Create `ResilientRpcClient` wrapper +2. Implement `BlockScanner` interface +3. Add metrics collection infrastructure +4. Write unit tests for new components + +**Deliverables:** + +- `ResilientRpcClient` with circuit breaker +- `BlockScanner` implementation +- Test coverage > 80% + +--- + +### 6.2 Phase 2: Validation Extraction (Week 3-4) + +**Goals:** + +- Extract validation logic into pipeline +- Reduce complexity of processor.ts + +**Tasks:** + +1. Create `Validator` interface +2. Implement individual validators +3. Build `ValidationPipeline` +4. Refactor `processChunkLogs()` to use pipeline + +**Deliverables:** + +- 5+ validator implementations +- Validation pipeline with tests +- Reduced complexity in processor.ts + +--- + +### 6.3 Phase 3: Event Processing (Week 5-6) + +**Goals:** + +- Separate event handling from validation +- Introduce domain models + +**Tasks:** + +1. Create `EventHandler` interface +2. Implement handlers for each event type +3. Introduce domain entities (separate from database models) +4. Refactor processors to use handlers + +**Deliverables:** + +- Event handler implementations +- Domain model layer +- Clearer separation of concerns + +--- + +### 6.4 Phase 4: State Management (Week 7-8) + +**Goals:** + +- Decouple from database implementation +- Add transaction support + +**Tasks:** + +1. Create repository interfaces +2. Implement transactional state manager +3. Add batch operation support +4. Migrate database calls to repositories + +**Deliverables:** + +- Repository layer +- Transaction support +- Batch operations +- Performance improvements + +--- + +### 6.5 Phase 5: Remove Worker Threads (Week 9-10) + +**Goals:** + +- Simplify architecture +- Remove inter-thread communication + +**Tasks:** + +1. Implement `ChainIndexer` class +2. Replace worker threads with async loops +3. Migrate message passing to direct method calls +4. Update job queue management + +**Deliverables:** + +- No worker threads +- Simplified code +- Better error handling +- Improved testability + +--- + +### 6.6 Phase 6: Observability & Monitoring (Week 11-12) + +**Goals:** + +- Add comprehensive monitoring +- Improve debugging capabilities + +**Tasks:** + +1. Add structured logging +2. Implement metrics collection +3. Add health check endpoints +4. Create monitoring dashboards + +**Deliverables:** + +- Prometheus metrics +- Grafana dashboards +- Health check API +- Debug tooling + +--- + +## 7. IMMEDIATE WINS (Quick Improvements) + +These can be implemented independently before full refactoring: + +### 7.1 Extract DDO Decryption Service + +**Current:** 400+ line method in BaseProcessor +**Proposed:** Separate `DdoDecryptionService` class + +**Benefits:** + +- Testable in isolation +- Reusable +- Clear interface + +**Effort:** 1-2 days + +--- + +### 7.2 Add Batch Database Operations + +**Current:** One database call per event +**Proposed:** Batch save operations + +```typescript +// Instead of: +for (const event of events) { + await database.save(event) +} + +// Do: +await database.saveBatch(events) +``` + +**Benefits:** + +- 10-50x performance improvement +- Reduced database load + +**Effort:** 2-3 days + +--- + +### 7.3 Extract Validation Logic + +**Current:** Nested if statements in processChunkLogs +**Proposed:** Separate validation functions + +```typescript +class EventValidation { + validateFactory(event): boolean + validateMetadataProof(event): boolean + validatePublisher(event): boolean + validateAccessList(event): boolean +} +``` + +**Benefits:** + +- Readable code +- Testable validations +- Reusable + +**Effort:** 2-3 days + +--- + +### 7.4 Add Circuit Breaker for RPC + +**Current:** Simple retry logic +**Proposed:** Circuit breaker pattern + +**Benefits:** + +- Prevent cascade failures +- Faster failure detection +- Better error messages + +**Effort:** 1-2 days + +--- + +### 7.5 Add Metrics Collection + +**Current:** Only logs +**Proposed:** Prometheus metrics + +```typescript +metrics.indexer_blocks_processed_total.inc() +metrics.indexer_events_processed_total.inc({ type: 'metadata' }) +metrics.indexer_processing_duration_seconds.observe(duration) +metrics.indexer_rpc_errors_total.inc({ provider: 'infura' }) +``` + +**Benefits:** + +- Production visibility +- Performance tracking +- Alerting capability + +**Effort:** 2-3 days + +--- + +## 8. TESTING STRATEGY + +### 8.1 Unit Tests + +**Target:** 80%+ coverage + +**Focus areas:** + +- Validators (each should be 100% covered) +- Event handlers (pure functions, easy to test) +- Extractors and decoders +- Utility functions + +**Tools:** + +- Mocha/Chai (already in use) +- Sinon for mocking +- Test fixtures for events + +--- + +### 8.2 Integration Tests + +**Target:** Critical paths covered + +**Focus areas:** + +- End-to-end event processing +- Database operations +- Reindex operations +- Multi-chain scenarios + +**Tools:** + +- Docker containers for databases +- Hardhat for blockchain mocking +- Test fixtures + +--- + +### 8.3 Performance Tests + +**Target:** Benchmarks established + +**Metrics:** + +- Events processed per second +- Memory usage over time +- RPC call latency +- Database query performance + +**Tools:** + +- k6 or Artillery +- Memory profiling +- Custom benchmarking scripts + +--- + +## 9. ALTERNATIVES CONSIDERED + +### 9.1 Keep Worker Threads + +**Pros:** + +- No need to refactor thread management +- True parallelism + +**Cons:** + +- Complex state management +- Hard to debug +- Testing challenges + +**Decision:** Remove threads (async/await sufficient) + +--- + +### 9.2 Event Sourcing + +**Pros:** + +- Complete audit trail +- Replay capability +- Temporal queries + +**Cons:** + +- Significant complexity increase +- Storage overhead +- Query performance concerns + +**Decision:** Not recommended (too much complexity for benefits) + +--- + +### 9.3 Message Queue (Kafka/RabbitMQ) + +**Pros:** + +- Decoupled components +- Built-in retry/DLQ +- Scalability + +**Cons:** + +- Additional infrastructure +- Operational complexity +- Overkill for current scale + +**Decision:** Revisit when scaling beyond 10+ chains + +--- + +### 9.4 GraphQL Subscriptions + +**Pros:** + +- Real-time updates to clients +- Flexible queries + +**Cons:** + +- Not needed for current use case +- Additional complexity + +**Decision:** Out of scope for indexer refactor + +--- + +## 10. SUCCESS METRICS + +### 10.1 Code Quality + +- **Cyclomatic Complexity:** Reduce from avg 15 to < 5 +- **Lines per Function:** < 50 lines +- **Test Coverage:** > 80% +- **Type Safety:** 100% typed (no `any`) + +### 10.2 Performance + +- **Throughput:** 2x improvement in events/sec +- **Latency:** < 100ms per event +- **Memory:** Stable (no leaks) +- **RPC Calls:** Reduce by 30% (batch operations) + +### 10.3 Reliability + +- **Uptime:** > 99.9% +- **Failed Events:** < 0.1% +- **Recovery Time:** < 5 minutes after RPC failure +- **Reindex Success Rate:** > 99% + +### 10.4 Maintainability + +- **Onboarding Time:** < 2 days for new dev +- **Bug Fix Time:** Avg < 4 hours +- **Feature Addition Time:** Avg < 1 week +- **Production Incidents:** < 1 per month + +--- + +## 11. RISKS & MITIGATION + +### 11.1 Risk: Breaking Changes + +**Mitigation:** + +- Incremental refactoring (Strangler Fig pattern) +- Comprehensive test suite +- Feature flags for new code paths +- Parallel running (old + new) during transition + +### 11.2 Risk: Performance Regression + +**Mitigation:** + +- Benchmark before refactoring +- Performance tests in CI +- Load testing before deployment +- Gradual rollout + +### 11.3 Risk: Data Loss During Migration + +**Mitigation:** + +- Database backups before changes +- Reindex capability +- Validation checks +- Dry-run mode + +### 11.4 Risk: Schedule Overrun + +**Mitigation:** + +- Phased approach with clear milestones +- Regular progress reviews +- Scope adjustment flexibility +- Priority on immediate wins + +--- + +## 12. OPEN QUESTIONS FOR DISCUSSION + +1. **Worker Threads:** Do we need true parallelism or is async/await sufficient? + +2. **Database Choice:** Should we standardize on one (Elasticsearch vs Typesense) or keep both? + +3. **Event Prioritization:** Should critical events (metadata) be prioritized over pricing events? + +4. **Reindex Strategy:** Should reindexing be a separate service/process? + +5. **Monitoring:** What metrics are most important for production monitoring? + +6. **Backward Compatibility:** How long should we support old API/database schemas? + +7. **Multi-Region:** Do we need to support indexer deployment in multiple regions? + +8. **Event Replay:** Do we need ability to replay historical events? + +--- + +## 13. CONCLUSION & NEXT STEPS + +### Current State Summary + +The Ocean Node Indexer is functional but suffers from: + +- High complexity (worker threads, mixed concerns) +- Limited observability +- Difficult to test and maintain +- Performance bottlenecks (serial processing, many RPC calls) + +### Proposed State + +After refactoring: + +- Clear component boundaries +- No worker threads (async/await) +- Comprehensive testing +- Built-in monitoring +- 2x performance improvement +- Easy to extend and maintain + +### Recommended Next Steps + +1. **This Meeting (Today):** + + - Review and discuss this document + - Agree on high-level direction + - Prioritize immediate wins vs full refactor + - Assign owners for investigation tasks + +2. **Next Week:** + + - Detailed design for Phase 1 components + - Create ADRs (Architecture Decision Records) + - Set up performance benchmarks + - Begin implementation of immediate wins + +3. **Ongoing:** + - Weekly architecture sync + - Code reviews focused on quality + - Regular performance testing + - Documentation updates + +--- + +## APPENDIX A: Key Files Reference + +``` +src/components/Indexer/ +├── index.ts - OceanIndexer main class (490 lines) +├── crawlerThread.ts - Worker thread implementation (380 lines) +├── processor.ts - Event processing orchestrator (207 lines) +├── utils.ts - Utility functions (454 lines) +├── purgatory.ts - Purgatory checking +├── version.ts - Version management +└── processors/ + ├── BaseProcessor.ts - Abstract base (442 lines) + ├── MetadataEventProcessor.ts - Metadata handling (403 lines) + ├── MetadataStateEventProcessor.ts + ├── OrderStartedEventProcessor.ts + ├── OrderReusedEventProcessor.ts + ├── DispenserActivatedEventProcessor.ts + ├── DispenserCreatedEventProcessor.ts + ├── DispenserDeactivatedEventProcessor.ts + ├── ExchangeActivatedEventProcessor.ts + ├── ExchangeCreatedEventProcessor.ts + ├── ExchangeDeactivatedEventProcessor.ts + └── ExchangeRateChangedEventProcessor.ts +``` + +--- + +## APPENDIX B: Glossary + +- **DDO:** Decentralized Data Object - Ocean Protocol asset metadata +- **NFT:** Non-Fungible Token - ERC721 contract representing data asset +- **Datatoken:** ERC20 token for accessing data +- **Dispenser:** Contract for free datatoken distribution +- **FRE:** Fixed Rate Exchange - Contract for datatoken pricing +- **Purgatory:** Blocklist for banned assets/accounts +- **MetadataProof:** Validation signature from authorized validators + +--- + +**Document Version:** 1.0 +**Last Updated:** January 14, 2026 +**Authors:** Architecture Team +**Status:** Draft for Discussion diff --git a/INDEXER_DOCS_README.md b/INDEXER_DOCS_README.md new file mode 100644 index 000000000..673a3ce40 --- /dev/null +++ b/INDEXER_DOCS_README.md @@ -0,0 +1,346 @@ +# Ocean Node Indexer - Architecture Review Documents + +**Created:** January 14, 2026 +**Purpose:** Architecture review meeting preparation materials + +--- + +## 📚 Document Guide + +### For Meeting Participants + +**Start here:** Read documents in this order + +1. **[INDEXER_MEETING_SUMMARY.md](./INDEXER_MEETING_SUMMARY.md)** ⭐ + + - **Time to read:** 15-20 minutes + - **Best for:** Quick overview, meeting agenda, action items + - **Contains:** TL;DR, top pain points, immediate wins, timeline + +2. **[INDEXER_FLOW_DIAGRAMS.md](./INDEXER_FLOW_DIAGRAMS.md)** 📊 + + - **Time to read:** 10-15 minutes + - **Best for:** Visual learners, understanding data flow + - **Contains:** Current vs proposed architecture diagrams + +3. **[INDEXER_ARCHITECTURE_ANALYSIS.md](./INDEXER_ARCHITECTURE_ANALYSIS.md)** 📖 + - **Time to read:** 45-60 minutes + - **Best for:** Deep dive, implementation details + - **Contains:** Complete analysis, 13 sections, migration strategy + +--- + +## 🎯 Quick Navigation + +### By Role + +**If you are a Developer:** + +- Read: Summary → Diagrams → Sections 4-5 of Analysis +- Focus on: Code complexity, testing strategy, immediate wins + +**If you are a Tech Lead:** + +- Read: All three documents +- Focus on: Architecture decisions, migration phases, risks + +**If you are a Product Manager:** + +- Read: Summary → Section 10 (Success Metrics) of Analysis +- Focus on: Timeline, priorities, business impact + +**If you are DevOps:** + +- Read: Summary → Section 9 (Diagrams) → Section 6 (Analysis) +- Focus on: Observability, deployment strategy, monitoring + +--- + +## 📋 Meeting Prep Checklist + +### Before the Meeting + +- [ ] Read INDEXER_MEETING_SUMMARY.md +- [ ] Review INDEXER_FLOW_DIAGRAMS.md +- [ ] Optionally: Deep dive into INDEXER_ARCHITECTURE_ANALYSIS.md +- [ ] Prepare your questions and concerns +- [ ] Review the codebase (key files listed in documents) + +### During the Meeting + +- [ ] Use INDEXER_MEETING_SUMMARY.md as guide +- [ ] Reference diagrams for discussions +- [ ] Note action items in the Action Items Template +- [ ] Capture decisions and concerns + +### After the Meeting + +- [ ] Review and finalize action items +- [ ] Assign owners and deadlines +- [ ] Create detailed design docs for Phase 1 +- [ ] Set up next sync meeting + +--- + +## 🔍 Document Contents Overview + +### INDEXER_MEETING_SUMMARY.md + +``` +1. Agenda (5 items) +2. Key Takeaways (TL;DR) +3. Current Architecture (Simplified) +4. Proposed Architecture +5. Top 10 Pain Points +6. Immediate Wins (5 quick improvements) +7. Phased Timeline (12 weeks) +8. Alternatives Considered +9. Open Questions (8 questions) +10. Success Metrics +11. Next Steps +12. Action Items Template +``` + +### INDEXER_FLOW_DIAGRAMS.md + +``` +1. Current Architecture - Component View +2. Current Architecture - Event Processing Flow +3. Proposed Architecture - Component View +4. Proposed Architecture - Event Processing Flow +5. Block Crawling Flow (Current vs Proposed) +6. Database Operations (Current vs Proposed) +7. Error Handling (Current vs Proposed) +8. Testing Strategy (Current vs Proposed) +9. Metrics & Observability Dashboard +10. Comparison Summary Table +``` + +### INDEXER_ARCHITECTURE_ANALYSIS.md + +``` +1. Current Architecture Overview +2. How Block Parsing Works +3. How Event Storage Works +4. Pain Points & Issues (10 detailed issues) +5. Refactoring Proposal - High-Level Architecture +6. Migration Strategy (6 phases) +7. Immediate Wins (5 quick improvements) +8. Testing Strategy +9. Alternatives Considered +10. Success Metrics +11. Risks & Mitigation +12. Open Questions +13. Conclusion & Next Steps +Appendix A: Key Files Reference +Appendix B: Glossary +``` + +--- + +## 🎨 Key Concepts at a Glance + +### Current Problems + +``` +🔴 Worker Threads → Complex inter-thread communication +🔴 Mixed Concerns → Fetching + validation + storage in one place +🔴 No Observability → Only logs, no metrics +🔴 Serial Processing → One event at a time +🔴 Many DB Calls → No batching +🔴 Hard to Test → Worker threads + tight coupling +``` + +### Proposed Solutions + +``` +🟢 Async/Await → No worker threads, simpler code +🟢 Separation of Concerns → Clear component boundaries +🟢 Built-in Metrics → Prometheus integration +🟢 Batch Operations → 10-50x performance improvement +🟢 Repository Pattern → Clean database abstraction +🟢 Dependency Injection → Easy to test and mock +``` + +--- + +## 📊 Expected Outcomes + +### Code Quality + +- Complexity: **15 → 5** (cyclomatic) +- Test Coverage: **60% → 80%+** +- Lines per Function: **100+ → <50** + +### Performance + +- Throughput: **2x improvement** +- Latency: **< 100ms per event** +- DB Calls: **30% reduction** + +### Reliability + +- Uptime: **> 99.9%** +- Recovery Time: **< 5 minutes** +- Failed Events: **< 0.1%** + +### Timeline + +- **Phase 1-2 (Weeks 1-4):** Foundation + Validation +- **Phase 3-4 (Weeks 5-8):** Processing + State Management +- **Phase 5-6 (Weeks 9-12):** Remove threads + Observability + +--- + +## 💬 Discussion Points + +### Critical Decisions Needed + +1. **Worker Threads:** Remove or keep? + + - Recommendation: **Remove** (use async/await) + +2. **Database:** Elasticsearch, Typesense, or both? + + - Recommendation: **Standardize** on one + +3. **Timeline:** Full refactor or immediate wins first? + + - Recommendation: **Both** (parallel tracks) + +4. **Backward Compatibility:** How long to support? + - Recommendation: **2 releases** + +### Optional Discussions + +5. Event prioritization strategy +6. Multi-region deployment +7. Event replay capability +8. Monitoring requirements + +--- + +## 🔗 Related Resources + +### Codebase + +``` +Key Files: +- src/components/Indexer/index.ts (490 lines) +- src/components/Indexer/crawlerThread.ts (380 lines) +- src/components/Indexer/processor.ts (207 lines) +- src/components/Indexer/processors/*.ts (13 files) +``` + +### External Documentation + +- [Ocean Protocol Docs](https://docs.oceanprotocol.com) +- [Ethers.js Provider API](https://docs.ethers.org/v6/api/providers/) +- [Node.js Worker Threads](https://nodejs.org/api/worker_threads.html) +- [Circuit Breaker Pattern](https://martinfowler.com/bliki/CircuitBreaker.html) + +### Design Patterns Referenced + +- Repository Pattern +- Strategy Pattern +- Chain of Responsibility +- Circuit Breaker +- Dependency Injection +- Event Bus + +--- + +## ✅ Pre-Meeting Validation + +**Ensure you can answer these questions before the meeting:** + +1. What is the main responsibility of the `OceanIndexer` class? +2. How does the current system handle block crawling? +3. What are the top 3 pain points you're most concerned about? +4. Which immediate win would you prioritize? +5. What are your concerns about the proposed architecture? +6. What timeline seems realistic for your team? +7. What metrics would you want to track in production? + +--- + +## 📝 Meeting Artifacts + +**After the meeting, you'll have:** + +1. ✅ **Decisions Log** + + - Worker threads: Remove/Keep + - Database choice + - Priority: Immediate wins vs full refactor + - Timeline agreement + +2. ✅ **Action Items** + + - Owner assignments + - Deadlines + - Dependencies + - Success criteria + +3. ✅ **Risk Register** + + - Identified risks + - Mitigation strategies + - Contingency plans + +4. ✅ **Next Steps** + - Phase 1 detailed design + - Performance benchmarks setup + - Team assignments + - Follow-up meeting schedule + +--- + +## 🚀 Getting Started (Post-Meeting) + +### Week 1 Tasks + +1. **Create detailed design docs** for Phase 1 components + + - ResilientRpcClient spec + - BlockScanner interface + - Metrics infrastructure + +2. **Set up performance benchmarks** + + - Current baseline measurements + - Test environment + - Monitoring tools + +3. **Begin immediate wins** + + - Extract DDO Decryption Service + - Add batch database operations + - Implement circuit breaker POC + +4. **Establish team structure** + - Assign component owners + - Set up code review process + - Create communication channels + +--- + +## 📞 Questions or Feedback? + +For questions about these documents or the proposed architecture: + +1. Open a discussion in the team channel +2. Add comments to the documents +3. Bring to the architecture sync meeting + +--- + +**Last Updated:** January 14, 2026 +**Version:** 1.0 +**Status:** Ready for Meeting + +--- + +## 🎉 Let's Build a Better Indexer! + +Good luck with your architecture review meeting! These documents should provide a solid foundation for productive discussions and clear decision-making. diff --git a/INDEXER_FLOW_DIAGRAMS.md b/INDEXER_FLOW_DIAGRAMS.md new file mode 100644 index 000000000..220410d73 --- /dev/null +++ b/INDEXER_FLOW_DIAGRAMS.md @@ -0,0 +1,712 @@ +# Ocean Node Indexer - Flow Diagrams + +Visual representations of current and proposed architectures. + +--- + +## 1. CURRENT ARCHITECTURE - COMPONENT VIEW + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ OceanIndexer │ +│ - Main coordinator in main thread │ +│ - Manages worker lifecycle │ +│ - Handles job queue (JOBS_QUEUE) │ +│ - Manages reindex tasks (INDEXING_QUEUE) │ +│ - Event emitters (INDEXER_DDO_EVENT_EMITTER) │ +└─────────────────────────────────────────────────────────────────────┘ + │ │ │ + │ Worker Thread │ Worker Thread │ Worker Thread + ↓ ↓ ↓ +┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ CrawlerThread │ │ CrawlerThread │ │ CrawlerThread │ +│ Chain: 1 │ │ Chain: 137 │ │ Chain: 8996 │ +│ │ │ │ │ │ +│ while(true) { │ │ while(true) { │ │ while(true) { │ +│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ +│ │ Get last │ │ │ │ Get last │ │ │ │ Get last │ │ +│ │ indexed │ │ │ │ indexed │ │ │ │ indexed │ │ +│ │ block │ │ │ │ block │ │ │ │ block │ │ +│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ +│ ↓ │ │ ↓ │ │ ↓ │ +│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ +│ │Get network│ │ │ │Get network│ │ │ │Get network│ │ +│ │ height │ │ │ │ height │ │ │ │ height │ │ +│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ +│ ↓ │ │ ↓ │ │ ↓ │ +│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ +│ │Retrieve │ │ │ │Retrieve │ │ │ │Retrieve │ │ +│ │chunk │ │ │ │chunk │ │ │ │chunk │ │ +│ │events │ │ │ │events │ │ │ │events │ │ +│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ +│ ↓ │ │ ↓ │ │ ↓ │ +│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ +│ │Process │ │ │ │Process │ │ │ │Process │ │ +│ │blocks │ │ │ │blocks │ │ │ │blocks │ │ +│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ +│ ↓ │ │ ↓ │ │ ↓ │ +│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ +│ │Update DB │ │ │ │Update DB │ │ │ │Update DB │ │ +│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ +│ ↓ │ │ ↓ │ │ ↓ │ +│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ +│ │Sleep 30s │ │ │ │Sleep 30s │ │ │ │Sleep 30s │ │ +│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ +│ ↓ │ │ ↓ │ │ ↓ │ +│ } │ │ } │ │ } │ +└──────────────────┘ └──────────────────┘ └──────────────────┘ + │ │ │ + └────────────────────┴──────────────────────┘ + ↓ + ┌───────────────────────┐ + │ Database Layer │ + │ - DDO Storage │ + │ - Order Storage │ + │ - Indexer State │ + │ - DDO State │ + └───────────────────────┘ +``` + +--- + +## 2. CURRENT ARCHITECTURE - EVENT PROCESSING FLOW + +``` +Event Log (from RPC) + │ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ processChunkLogs(logs, signer, provider, chainId) │ +│ │ +│ for each log: │ +│ 1. findEventByKey(log.topics[0]) │ +│ 2. if (METADATA_CREATED/UPDATED/STATE): │ +│ ├─→ Check allowedValidators │ +│ ├─→ Get transaction receipt │ +│ ├─→ Fetch MetadataValidated events │ +│ ├─→ Validate validators │ +│ │ ├─→ Check ALLOWED_VALIDATORS list │ +│ │ └─→ For each access list: │ +│ │ └─→ For each validator: │ +│ │ └─→ Check balanceOf() │ +│ └─→ If not valid: continue (skip event) │ +│ 3. Route to processor │ +│ 4. Store in storeEvents{} │ +│ │ +│ return storeEvents │ +└─────────────────────────────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ Event Processor (processors/*.ts) │ +│ │ +│ MetadataEventProcessor: │ +│ ├─→ wasNFTDeployedByOurFactory() │ +│ ├─→ getEventData() - decode from receipt │ +│ ├─→ decryptDDO() - 400+ lines │ +│ │ ├─→ HTTP decryption │ +│ │ ├─→ P2P decryption │ +│ │ └─→ Local decryption │ +│ ├─→ Check authorizedPublishers │ +│ ├─→ Check authorizedPublishersList │ +│ ├─→ getTokenInfo() │ +│ ├─→ getNFTInfo() │ +│ ├─→ getPricingStatsForDddo() │ +│ ├─→ PolicyServer check │ +│ ├─→ Purgatory check │ +│ └─→ createOrUpdateDDO() │ +│ │ +│ OrderStartedEventProcessor: │ +│ ├─→ Decode event │ +│ ├─→ Get DDO from database │ +│ ├─→ Update stats.orders │ +│ ├─→ Create order record │ +│ └─→ Update DDO │ +└─────────────────────────────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ Database Operations │ +│ ├─→ ddoDatabase.update() │ +│ ├─→ ddoState.update() │ +│ └─→ orderDatabase.create() │ +└─────────────────────────────────────────────────────────────┘ +``` + +--- + +## 3. PROPOSED ARCHITECTURE - COMPONENT VIEW + +``` +┌──────────────────────────────────────────────────────────────┐ +│ IndexerOrchestrator │ +│ - Single coordinator │ +│ - Manages ChainIndexer lifecycle │ +│ - Health checks │ +│ - Metrics aggregation │ +│ - Event bus for notifications │ +└──────────────────────────────────────────────────────────────┘ + │ │ │ + │ async │ async │ async + ↓ ↓ ↓ +┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ ChainIndexer │ │ ChainIndexer │ │ ChainIndexer │ +│ Chain: 1 │ │ Chain: 137 │ │ Chain: 8996 │ +└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ + │ │ │ + └─────────────────────┴──────────────────────┘ + │ + ┌──────────────────────┼──────────────────────┐ + ↓ ↓ ↓ +┌───────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ BlockScanner │ │ EventExtractor │ │ValidationPipeline│ +│ │ │ │ │ │ +│ - RPC client │ │ - Decode events │ │ - Chain of │ +│ - Fallback │ │ - Categorize │ │ validators │ +│ - Retry │ │ - Filter │ │ - Parallel exec │ +│ - Circuit │ │ │ │ │ +│ breaker │ │ │ │ │ +└───────────────┘ └──────────────────┘ └──────────────────┘ + │ │ │ + └──────────────────────┼──────────────────────┘ + ↓ + ┌──────────────────┐ + │ EventProcessor │ + │ │ + │ - Route to │ + │ handlers │ + │ - Transform to │ + │ domain models │ + └────────┬─────────┘ + │ + ↓ + ┌──────────────────┐ + │ StateManager │ + │ │ + │ - Repositories │ + │ - Transactions │ + │ - Batch ops │ + └────────┬─────────┘ + │ + ↓ + ┌──────────────────┐ + │ Database Layer │ + └──────────────────┘ +``` + +--- + +## 4. PROPOSED ARCHITECTURE - EVENT PROCESSING FLOW + +``` +Raw Event Logs + │ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 1. EventExtractor.extractEvents(logs) │ +│ │ +│ Parse and categorize events: │ +│ { │ +│ metadata: [...], │ +│ orders: [...], │ +│ pricing: [...], │ +│ unknown: [...] │ +│ } │ +└──────────────────┬──────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 2. ValidationPipeline.validate(event) │ +│ │ +│ Chain of validators (can run in parallel): │ +│ │ +│ ┌──────────────────────┐ │ +│ │ FactoryValidator │→ Check NFT factory │ +│ └──────────┬───────────┘ │ +│ ↓ │ +│ ┌──────────────────────┐ │ +│ │MetadataProofValidator│→ Check signatures │ +│ └──────────┬───────────┘ │ +│ ↓ │ +│ ┌──────────────────────┐ │ +│ │ PublisherValidator │→ Check authorized │ +│ └──────────┬───────────┘ │ +│ ↓ │ +│ ┌──────────────────────┐ │ +│ │ AccessListValidator │→ Check access list │ +│ └──────────┬───────────┘ │ +│ ↓ │ +│ ┌──────────────────────┐ │ +│ │ PolicyServerValidator│→ Check policy │ +│ └──────────┬───────────┘ │ +│ ↓ │ +│ Result: { valid: boolean, errors: [...] } │ +└──────────────────┬──────────────────────────────────┘ + │ + ↓ (only valid events) +┌─────────────────────────────────────────────────────┐ +│ 3. EventProcessor.process(event) │ +│ │ +│ Route to appropriate handler: │ +│ │ +│ if (MetadataEvent): │ +│ ┌────────────────────────┐ │ +│ │MetadataCreatedHandler │ │ +│ │ - Decrypt DDO │ │ +│ │ - Fetch pricing │ │ +│ │ - Build entity │ │ +│ └────────┬───────────────┘ │ +│ ↓ │ +│ Domain Entity: DDO │ +│ │ +│ if (OrderEvent): │ +│ ┌────────────────────────┐ │ +│ │ OrderStartedHandler │ │ +│ │ - Update order count │ │ +│ │ - Build entity │ │ +│ └────────┬───────────────┘ │ +│ ↓ │ +│ Domain Entity: Order │ +└──────────────────┬──────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 4. StateManager.saveBatch(entities) │ +│ │ +│ transaction { │ +│ for each entity: │ +│ ├─→ ddoRepository.save() │ +│ ├─→ orderRepository.save() │ +│ └─→ stateRepository.update() │ +│ } │ +│ │ +│ Single database transaction, batched writes │ +└──────────────────┬──────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────────────────┐ +│ 5. EventBus.emit('event.processed', entity) │ +│ │ +│ Notify listeners for: │ +│ - Replication │ +│ - Notifications │ +│ - Webhooks │ +│ - Metrics │ +└─────────────────────────────────────────────────────┘ +``` + +--- + +## 5. BLOCK CRAWLING FLOW - CURRENT vs PROPOSED + +### CURRENT (with Worker Threads) + +``` +Main Thread Worker Thread (Chain 1) + │ │ + │ startThread(chainId) │ + ├──────────────────────────────→ │ + │ │ Start infinite loop + │ │ + │ ├→ getLastIndexedBlock() + │ │ (from DB) + │ │ + │ ├→ getNetworkHeight() + │ │ (RPC call) + │ │ + │ ├→ retrieveChunkEvents() + │ │ (RPC call) + │ │ + │ ├→ processBlocks() + │ │ ├→ processChunkLogs() + │ │ │ ├→ validate + │ │ │ ├→ process + │ │ │ └→ store + │ │ │ + │ │ └→ updateLastIndexedBlock() + │ │ (DB write) + │ │ + │ message: 'METADATA_CREATED' │ + │ ←──────────────────────────────┤ + │ emit event │ + │ │ + │ ├→ sleep(30s) + │ │ + │ └→ loop continues... + │ + │ stopThread(chainId) │ + ├──────────────────────────────→ │ + │ │ set stoppedCrawling = true + │ │ exit loop + +Issues: +❌ Complex message passing +❌ Global state management +❌ Hard to debug +❌ Testing requires Worker Thread mocking +``` + +### PROPOSED (with async/await) + +``` +IndexerOrchestrator ChainIndexer (Chain 1) + │ │ + │ start() │ + ├──────────────────────────────→ │ + │ │ async run() { + │ │ + │ ├→ progress = await stateManager + │ │ .getProgress() + │ │ + │ ├→ height = await blockScanner + │ │ .getLatestBlock() + │ │ + │ ├→ logs = await blockScanner + │ │ .getLogs(from, to) + │ │ + │ ├→ events = eventExtractor + │ │ .extract(logs) + │ │ + │ ├→ for each event: + │ │ result = await pipeline + │ │ .validate(event) + │ │ if (result.valid): + │ │ entity = await processor + │ │ .process(event) + │ │ batch.add(entity) + │ │ + │ ├→ await stateManager + │ │ .saveBatch(batch) + │ │ + │ onProgress(progress) │ + │ ←──────────────────────────────┤ eventBus.emit('progress') + │ │ + │ ├→ await sleep(interval) + │ │ + │ └→ } loop continues... + │ + │ stop() │ + ├──────────────────────────────→ │ + │ │ await this.stopSignal + │ │ return + +Benefits: +✅ Direct method calls +✅ Clear data flow +✅ Easy to test (just async functions) +✅ Better error handling +✅ No Worker Thread complexity +``` + +--- + +## 6. DATABASE OPERATIONS - CURRENT vs PROPOSED + +### CURRENT (Multiple Calls Per Event) + +``` +Event Processing + │ + ├─→ const { ddo, ddoState } = await getDatabase() + │ + ├─→ await ddo.retrieve(id) ← DB call 1 + │ + ├─→ await ddo.update(updatedDdo) ← DB call 2 + │ + └─→ await ddoState.update(...) ← DB call 3 + +For 100 events: ~300 database calls +``` + +### PROPOSED (Batch Operations) + +``` +Event Processing + │ + ├─→ Validate and process all events + │ (in memory, no DB calls) + │ + └─→ await stateManager.saveBatch([ + ...ddos, + ...orders, + ...stateUpdates + ]) + │ + └─→ Single transaction: + BEGIN + UPDATE indexer SET lastBlock = ... + INSERT INTO ddos VALUES ... + INSERT INTO orders VALUES ... + UPDATE ddo_state SET ... + COMMIT + +For 100 events: ~1 database transaction +Performance improvement: 100-300x +``` + +--- + +## 7. ERROR HANDLING - CURRENT vs PROPOSED + +### CURRENT (Multiple Retry Layers) + +``` +┌─────────────────────────────────────────┐ +│ retryCrawlerWithDelay() │ +│ MAX_CRAWL_RETRIES = 10 │ +│ ├─→ startCrawler() │ +│ │ └─→ tryFallbackRPCs() │ +│ │ │ +│ └─→ On failure: recursive call │ +└─────────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────┐ +│ retrieveChunkEvents() │ +│ ├─→ provider.getLogs() │ +│ └─→ On error: throw │ +│ └─→ Caught by processNetworkData │ +│ └─→ chunkSize = chunkSize / 2 │ +└─────────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────┐ +│ processBlocks() │ +│ └─→ On error: catch │ +│ └─→ sleep & retry same chunk │ +└─────────────────────────────────────────┘ + │ + ↓ +┌─────────────────────────────────────────┐ +│ withRetrial() │ +│ maxRetries = 5 │ +│ └─→ Used in decryptDDO │ +└─────────────────────────────────────────┘ + +Issues: +❌ 4 different retry mechanisms +❌ Unclear recovery state +❌ Potential infinite loops +❌ No circuit breaker +``` + +### PROPOSED (Unified Strategy) + +``` +┌─────────────────────────────────────────┐ +│ ResilientRpcClient │ +│ │ +│ ┌──────────────────────────────┐ │ +│ │ Circuit Breaker │ │ +│ │ States: CLOSED → OPEN → HALF│ │ +│ │ Failure threshold: 5 │ │ +│ │ Timeout: 60s │ │ +│ │ Success threshold: 2 │ │ +│ └──────────────────────────────┘ │ +│ │ │ +│ ↓ │ +│ ┌──────────────────────────────┐ │ +│ │ Retry Strategy │ │ +│ │ Max attempts: 3 │ │ +│ │ Backoff: exponential │ │ +│ │ Jitter: ±20% │ │ +│ └──────────────────────────────┘ │ +│ │ │ +│ ↓ │ +│ ┌──────────────────────────────┐ │ +│ │ Fallback Providers │ │ +│ │ Try each provider in order │ │ +│ │ Mark as unhealthy on error │ │ +│ └──────────────────────────────┘ │ +│ │ │ +│ ↓ │ +│ ┌──────────────────────────────┐ │ +│ │ Metrics Collection │ │ +│ │ - Success/failure rates │ │ +│ │ - Latency per provider │ │ +│ │ - Circuit breaker state │ │ +│ └──────────────────────────────┘ │ +└─────────────────────────────────────────┘ + +Benefits: +✅ Single retry mechanism +✅ Prevents cascade failures +✅ Clear recovery path +✅ Observable behavior +✅ Production-ready +``` + +--- + +## 8. TESTING STRATEGY - CURRENT vs PROPOSED + +### CURRENT + +``` +Integration Tests (Heavy) + │ + ├─→ Start local blockchain (Ganache) + ├─→ Deploy contracts + ├─→ Start Elasticsearch/Typesense + ├─→ Create OceanIndexer + ├─→ Wait for worker threads to start + ├─→ Publish test assets + ├─→ Wait for indexing (polling) + ├─→ Query database + └─→ Assert results + +Issues: +❌ Slow (30+ seconds per test) +❌ Flaky (timing issues) +❌ Hard to debug +❌ Worker threads hard to mock +❌ Few unit tests +``` + +### PROPOSED + +``` +Unit Tests (Fast) + │ + ├─→ EventExtractor + │ └─→ Mock logs → assert events + │ Time: ~10ms + │ + ├─→ FactoryValidator + │ └─→ Mock contract → assert validation + │ Time: ~5ms + │ + ├─→ MetadataCreatedHandler + │ └─→ Mock dependencies → assert DDO + │ Time: ~20ms + │ + └─→ StateManager + └─→ Mock repositories → assert calls + Time: ~5ms + +Integration Tests (Moderate) + │ + ├─→ ChainIndexer (end-to-end) + │ └─→ Mock RPC + DB → assert flow + │ Time: ~100ms + │ + └─→ ValidationPipeline + └─→ Mock validators → assert chain + Time: ~50ms + +Contract Tests + │ + └─→ ResilientRpcClient + └─→ Real RPC providers (staging) + Time: ~500ms + +Benefits: +✅ Fast feedback (< 1s for unit tests) +✅ Easy to debug +✅ High coverage +✅ Reliable +✅ Parallelizable +``` + +--- + +## 9. METRICS & OBSERVABILITY + +### Proposed Metrics Dashboard + +``` +┌─────────────────────────────────────────────────────────┐ +│ INDEXER DASHBOARD │ +├─────────────────────────────────────────────────────────┤ +│ │ +│ Chain Status │ +│ ┌────────┬─────────────┬──────────┬─────────┐ │ +│ │ Chain │ Last Block │ Lag │ Status │ │ +│ ├────────┼─────────────┼──────────┼─────────┤ │ +│ │ 1 │ 12,345,678 │ 10 mins │ 🟢 │ │ +│ │ 137 │ 45,678,901 │ 2 mins │ 🟢 │ │ +│ │ 8996 │ 1,234,567 │ 1 hour │ 🔴 │ │ +│ └────────┴─────────────┴──────────┴─────────┘ │ +│ │ +│ Event Processing │ +│ ┌────────────────────────────────────────────┐ │ +│ │ Events/sec: ████████░░░░ 127 avg │ │ +│ │ Blocks/sec: ██████░░░░░░ 45 avg │ │ +│ └────────────────────────────────────────────┘ │ +│ │ +│ Event Types (last hour) │ +│ ┌────────────────────────────────────────────┐ │ +│ │ MetadataCreated: ████████ 234 │ │ +│ │ MetadataUpdated: ███ 45 │ │ +│ │ OrderStarted: ██████ 123 │ │ +│ │ OrderReused: ██ 34 │ │ +│ └────────────────────────────────────────────┘ │ +│ │ +│ RPC Health │ +│ ┌────────┬──────────┬─────────┬──────────┐ │ +│ │Provider│ Latency │ Success │ Circuit │ │ +│ ├────────┼──────────┼─────────┼──────────┤ │ +│ │Infura │ 120ms │ 99.8% │ CLOSED │ │ +│ │Alchemy │ 95ms │ 99.9% │ CLOSED │ │ +│ │Public │ 450ms │ 87.2% │ OPEN │ │ +│ └────────┴──────────┴─────────┴──────────┘ │ +│ │ +│ Database Performance │ +│ ┌────────────────────────────────────────────┐ │ +│ │ Write Latency: 45ms avg │ │ +│ │ Read Latency: 12ms avg │ │ +│ │ Batch Size: 50 avg │ │ +│ └────────────────────────────────────────────┘ │ +│ │ +│ Errors (last hour) │ +│ ┌────────────────────────────────────────────┐ │ +│ │ RPC Errors: █ 5 │ │ +│ │ Validation Errors: ██ 12 │ │ +│ │ DB Errors: 0 │ │ +│ └────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### Key Metrics + +```typescript +// Counters +indexer_blocks_processed_total{chain="1"} +indexer_events_processed_total{chain="1", type="metadata"} +indexer_rpc_calls_total{chain="1", provider="infura", result="success"} +indexer_validation_failures_total{chain="1", validator="factory"} + +// Gauges +indexer_last_indexed_block{chain="1"} +indexer_block_lag_seconds{chain="1"} +indexer_chain_status{chain="1"} // 1=healthy, 0=unhealthy + +// Histograms +indexer_block_processing_duration_seconds{chain="1"} +indexer_event_processing_duration_seconds{type="metadata"} +indexer_rpc_latency_seconds{provider="infura"} +indexer_db_write_duration_seconds + +// Summaries +indexer_batch_size{operation="save_ddos"} +``` + +--- + +## 10. COMPARISON SUMMARY + +| Aspect | Current | Proposed | Improvement | +| ------------------- | -------------------------------- | ------------------------------------- | -------------------- | +| **Architecture** | Worker threads per chain | Async ChainIndexer classes | Simpler | +| **Code Complexity** | High (nested, mixed concerns) | Low (SRP, clear layers) | 60% reduction | +| **Testing** | Integration-heavy | Unit test friendly | 10x faster tests | +| **Performance** | Serial processing, many DB calls | Batch operations, parallel validation | 2-10x faster | +| **Error Handling** | Multiple retry mechanisms | Unified circuit breaker | More reliable | +| **Observability** | Logs only | Metrics + logs + tracing | Production-ready | +| **Maintainability** | Hard to modify | Easy to extend | New feature < 1 week | +| **Memory Usage** | Thread overhead | Efficient async | 20-30% reduction | +| **Debugging** | Thread dumps, message tracing | Stack traces, clear flow | Much easier | + +--- + +**These diagrams should be used alongside the detailed architecture document for the meeting discussion.** diff --git a/INDEXER_MEETING_SUMMARY.md b/INDEXER_MEETING_SUMMARY.md new file mode 100644 index 000000000..ab6336d8a --- /dev/null +++ b/INDEXER_MEETING_SUMMARY.md @@ -0,0 +1,491 @@ +# Ocean Node Indexer - Meeting Summary + +## Architecture Review & Refactoring Direction + +**Date:** January 14, 2026 +**Duration:** 90 minutes +**Goal:** Align on architecture & produce draft refactoring proposal + +--- + +## 📋 AGENDA + +1. **Current Architecture Overview** (15 min) +2. **Pain Points Discussion** (20 min) +3. **Proposed Solutions** (30 min) +4. **Priorities & Timeline** (15 min) +5. **Open Questions & Next Steps** (10 min) + +--- + +## 🎯 KEY TAKEAWAYS (TL;DR) + +### What Works + +✅ Successfully indexes multiple chains +✅ Handles reindexing operations +✅ Validates events through multiple layers +✅ Stores comprehensive metadata + +### What Needs Improvement + +❌ **High Complexity** - Worker threads, mixed concerns +❌ **Limited Observability** - Hard to debug production issues +❌ **Testing Challenges** - Worker threads difficult to test +❌ **Performance Bottlenecks** - Serial processing, many RPC calls +❌ **Maintainability** - Large functions, tight coupling + +--- + +## 📊 CURRENT ARCHITECTURE (SIMPLIFIED) + +``` +OceanIndexer (Main Process) + │ + ├──► Worker Thread (Chain 1) + │ └──► while(true) { + │ - Get new blocks + │ - Retrieve events + │ - Process events + │ - Update database + │ - Sleep 30s + │ } + │ + ├──► Worker Thread (Chain 2) + ├──► Worker Thread (Chain 3) + └──► ... + +Issues: +- Complex inter-thread messaging +- Global mutable state +- Mixed concerns (fetching + validation + storage) +- Hard to test +``` + +--- + +## 🏗️ PROPOSED ARCHITECTURE + +``` +IndexerOrchestrator + │ + ├──► ChainIndexer(1) ──► BlockScanner ──► ResilientRpcClient + │ │ + │ ├──► EventExtractor + │ │ + │ ├──► ValidationPipeline + │ │ ├─ FactoryValidator + │ │ ├─ MetadataValidator + │ │ ├─ PublisherValidator + │ │ └─ PolicyValidator + │ │ + │ ├──► EventProcessor + │ │ ├─ MetadataHandler + │ │ ├─ OrderHandler + │ │ └─ PricingHandler + │ │ + │ └──► StateManager (Database Layer) + │ + ├──► ChainIndexer(2) + └──► ChainIndexer(N) + +Benefits: +✓ No worker threads (async/await) +✓ Clear separation of concerns +✓ Easy to test each component +✓ Better error handling +✓ Built-in observability +``` + +--- + +## 🔴 TOP 10 PAIN POINTS + +### 1. Worker Thread Complexity + +**Problem:** Inter-thread messaging, shared state, race conditions +**Impact:** Hard to debug, test, and extend +**Solution:** Replace with async/await ChainIndexer classes + +### 2. Monolithic Event Processing + +**Problem:** `processChunkLogs()` - 180+ lines, deeply nested +**Impact:** Hard to read, maintain, add features +**Solution:** Extract to ValidationPipeline + EventProcessor + +### 3. No Error Recovery Strategy + +**Problem:** Multiple retry mechanisms, no circuit breaker +**Impact:** Unclear state after failures, potential infinite loops +**Solution:** Implement ResilientRpcClient with circuit breaker + +### 4. DDO Decryption Complexity + +**Problem:** 400+ line method handling HTTP/P2P/local +**Impact:** Hard to test, unclear error messages +**Solution:** Extract to DdoDecryptionService + +### 5. Global Mutable State + +**Problem:** Global queues, flags scattered across files +**Impact:** Race conditions, hard to test +**Solution:** Encapsulate state in classes + +### 6. Serial Event Processing + +**Problem:** One event at a time, many RPC calls +**Impact:** Slow throughput +**Solution:** Batch operations, parallel validation + +### 7. Direct Database Coupling + +**Problem:** `await getDatabase()` everywhere +**Impact:** Hard to test, no transactions +**Solution:** Repository pattern, StateManager + +### 8. Limited Observability + +**Problem:** Only logs, no metrics +**Impact:** Can't track performance, debug issues +**Solution:** Add Prometheus metrics, structured logging + +### 9. Testing Difficulties + +**Problem:** Worker threads, database dependencies +**Impact:** Few unit tests, long integration tests +**Solution:** Dependency injection, interfaces + +### 10. Unclear Configuration + +**Problem:** Env vars, hardcoded values, no validation +**Impact:** Deployment issues, unclear behavior +**Solution:** Config class with validation + +--- + +## 💡 IMMEDIATE WINS (Can Start Tomorrow) + +These provide value without full refactor: + +### 1. Extract DDO Decryption Service + +**Effort:** 1-2 days +**Impact:** High (cleaner code, testable) + +```typescript +class DdoDecryptionService { + async decrypt(params: DecryptParams): Promise { + if (isHttp(params.decryptorURL)) { + return this.decryptHttp(params) + } else if (isP2P(params.decryptorURL)) { + return this.decryptP2P(params) + } else { + return this.decryptLocal(params) + } + } +} +``` + +### 2. Add Batch Database Operations + +**Effort:** 2-3 days +**Impact:** Very High (10-50x performance) + +```typescript +// Before: O(n) database calls +for (const event of events) { + await database.save(event) +} + +// After: O(1) database calls +await database.saveBatch(events) +``` + +### 3. Extract Validation Functions + +**Effort:** 2-3 days +**Impact:** High (readability, testability) + +```typescript +class EventValidation { + async validateFactory(event: DecodedEvent): Promise + async validateMetadataProof(event: MetadataEvent): Promise + async validatePublisher(event: MetadataEvent): Promise + async validateAccessList(event: MetadataEvent): Promise +} +``` + +### 4. Add Circuit Breaker for RPC + +**Effort:** 1-2 days +**Impact:** High (reliability) + +```typescript +class ResilientRpcClient { + private circuitBreaker: CircuitBreaker + + async execute(fn: RpcCall): Promise { + return this.circuitBreaker.execute(() => this.tryWithFallback(fn)) + } +} +``` + +### 5. Add Prometheus Metrics + +**Effort:** 2-3 days +**Impact:** Very High (observability) + +```typescript +metrics.indexer_blocks_processed_total.inc() +metrics.indexer_events_processed{type="metadata"}.inc() +metrics.indexer_processing_duration_seconds.observe(duration) +metrics.indexer_rpc_errors_total{provider="infura"}.inc() +``` + +**Total Effort:** ~2 weeks +**Total Impact:** Significant quality & performance improvements + +--- + +## 📅 PHASED REFACTORING TIMELINE + +### Phase 1: Foundation (Week 1-2) + +- ResilientRpcClient with circuit breaker +- BlockScanner interface +- Metrics infrastructure +- Tests for new components + +### Phase 2: Validation (Week 3-4) + +- Validator interface + implementations +- ValidationPipeline +- Refactor processChunkLogs() + +### Phase 3: Event Processing (Week 5-6) + +- EventHandler interface + implementations +- Domain models (separate from DB) +- Refactor processors + +### Phase 4: State Management (Week 7-8) + +- Repository pattern +- Transactional StateManager +- Batch operations + +### Phase 5: Remove Worker Threads (Week 9-10) + +- ChainIndexer class +- Replace threads with async loops +- Direct method calls (no messages) + +### Phase 6: Observability (Week 11-12) + +- Comprehensive metrics +- Health checks +- Monitoring dashboards + +**Total Timeline:** ~12 weeks (3 months) + +--- + +## 🎲 ALTERNATIVES CONSIDERED + +| Alternative | Pros | Cons | Decision | +| ------------------------- | ------------------- | ----------------------- | ------------------- | +| **Keep Worker Threads** | True parallelism | Complex, hard to debug | ❌ Remove | +| **Event Sourcing** | Audit trail, replay | Too complex | ❌ Not now | +| **Message Queue (Kafka)** | Decoupled, scalable | Infrastructure overhead | ⏸️ Revisit at scale | +| **GraphQL Subscriptions** | Real-time updates | Not needed | ❌ Out of scope | + +--- + +## ❓ OPEN QUESTIONS FOR DISCUSSION + +### Technical Questions + +1. **Worker Threads:** Do we truly need parallelism or is async/await sufficient? + + - Current: 1 thread per chain + - Proposed: Async ChainIndexer classes + - Decision needed: ? + +2. **Database Choice:** Standardize on Elasticsearch or Typesense, or keep both? + + - Current: Both supported + - Maintenance cost: High + - Decision needed: ? + +3. **Event Prioritization:** Should metadata events be prioritized over pricing events? + + - Current: FIFO processing + - Risk: Important events delayed by minor ones + - Decision needed: ? + +4. **Reindex Strategy:** Should reindexing be a separate service? + - Current: Mixed with normal indexing + - Potential: Dedicated reindex service + - Decision needed: ? + +### Product Questions + +5. **Monitoring Requirements:** What metrics are critical for production? + + - Blocks/sec? + - Events/sec? + - RPC latency? + - Error rates? + - Decision needed: ? + +6. **SLA Requirements:** What are our uptime/reliability targets? + - 99.9% uptime? + - Max 5 min recovery time? + - < 0.1% failed events? + - Decision needed: ? + +### Process Questions + +7. **Backward Compatibility:** How long support old schemas? + + - Database migrations + - API compatibility + - Decision needed: ? + +8. **Rollout Strategy:** Big bang or gradual rollout? + - Feature flags? + - Parallel running? + - Decision needed: ? + +--- + +## 📈 SUCCESS METRICS + +### Code Quality Targets + +- ✅ Cyclomatic Complexity: < 5 (currently ~15) +- ✅ Test Coverage: > 80% (currently ~60%) +- ✅ Lines per Function: < 50 (currently 100+) +- ✅ Type Safety: 100% (no `any`) + +### Performance Targets + +- ✅ Throughput: 2x improvement in events/sec +- ✅ Latency: < 100ms per event +- ✅ Memory: Stable (no leaks) +- ✅ RPC Calls: Reduce by 30% + +### Reliability Targets + +- ✅ Uptime: > 99.9% +- ✅ Failed Events: < 0.1% +- ✅ Recovery Time: < 5 minutes +- ✅ Reindex Success: > 99% + +### Maintainability Targets + +- ✅ Onboarding: < 2 days +- ✅ Bug Fix Time: < 4 hours +- ✅ Feature Time: < 1 week +- ✅ Incidents: < 1/month + +--- + +## 🚀 NEXT STEPS + +### Today (This Meeting) + +1. Review and discuss document +2. Agree on high-level direction +3. Prioritize: Immediate wins vs full refactor? +4. Assign investigation tasks + +### Next Week + +1. Detailed design for Phase 1 +2. Create ADRs (Architecture Decision Records) +3. Set up performance benchmarks +4. Begin immediate wins implementation + +### Ongoing + +1. Weekly architecture sync +2. Code review focus on quality +3. Regular performance testing +4. Documentation updates + +--- + +## 📚 REFERENCE MATERIALS + +### Main Document + +See: `INDEXER_ARCHITECTURE_ANALYSIS.md` (detailed 13-section analysis) + +### Key Code Files + +``` +src/components/Indexer/ +├── index.ts - Main coordinator (490 lines) +├── crawlerThread.ts - Worker thread (380 lines) +├── processor.ts - Event processing (207 lines) +└── processors/ + ├── BaseProcessor.ts - Base class (442 lines) + └── MetadataEventProcessor.ts - Metadata (403 lines) +``` + +### Related Documentation + +- Ocean Protocol Docs: https://docs.oceanprotocol.com +- Ethers.js Provider: https://docs.ethers.org/v6/api/providers/ +- Worker Threads: https://nodejs.org/api/worker_threads.html + +--- + +## 🤝 MEETING ROLES + +- **Facilitator:** _[Name]_ +- **Note Taker:** _[Name]_ +- **Timekeeper:** _[Name]_ +- **Decision Maker:** _[Name]_ + +--- + +## ✅ ACTION ITEMS TEMPLATE + +_To be filled during meeting_ + +| Action | Owner | Deadline | Status | +| -------------------------------- | --------- | -------------- | ------ | +| Review detailed architecture doc | Team | Before meeting | ✅ | +| Decision on worker threads | Tech Lead | End of meeting | ⏳ | +| Design Phase 1 components | Architect | Next week | ⏳ | +| Set up performance benchmarks | DevOps | Next week | ⏳ | +| Implement circuit breaker POC | Dev 1 | Week 2 | ⏳ | +| Extract validation functions | Dev 2 | Week 2 | ⏳ | + +--- + +## 💬 DISCUSSION NOTES + +_Space for notes during meeting_ + +### Architecture Direction + +- + +### Priorities + +- + +### Concerns Raised + +- + +### Decisions Made + +- + +--- + +**Remember:** The goal is alignment and direction, not final implementation details! diff --git a/INDEXER_USE_CASES_AND_FLOWS.md b/INDEXER_USE_CASES_AND_FLOWS.md new file mode 100644 index 000000000..c91744590 --- /dev/null +++ b/INDEXER_USE_CASES_AND_FLOWS.md @@ -0,0 +1,2379 @@ +# Ocean Node Indexer - Use Cases & Current Flows Documentation + +**Created:** January 2026 +**Purpose:** Deep review of all indexer use cases and execution flows for refactoring discussion +**Status:** Pre-Meeting Preparation Document + +--- + +## Table of Contents + +1. [Overview](#overview) +2. [Use Cases](#use-cases) +3. [Event Monitoring Deep Dive](#event-monitoring-deep-dive) + - How Event Monitoring Works + - Event Identification & Routing + - Detailed Event Handling Per Type + - Event Processing Pipeline Summary + - Performance Characteristics +4. [Current Flows - Detailed Analysis](#current-flows-detailed-analysis) +5. [Event Processing Flows](#event-processing-flows) +6. [Error Handling & Retry Mechanisms](#error-handling--retry-mechanisms) +7. [Async/Await Architecture & Concurrency](#asyncawait-architecture--concurrency) +8. [Failure Scenarios & Recovery](#failure-scenarios--recovery) +9. [State Management](#state-management) +10. [Observations & Pain Points](#observations--pain-points) +11. [Summary](#summary) + +--- + +## Overview + +The Ocean Node Indexer is responsible for: + +- Continuously monitoring multiple blockchain networks for Ocean Protocol events +- Processing and validating events (metadata, orders, pricing) +- Storing processed data in databases (Elasticsearch/Typesense) +- Managing indexing state per chain +- Supporting reindexing operations (full chain or specific transactions) +- Emitting events for downstream consumers + +**Architecture Summary:** + +- One `OceanIndexer` instance (orchestrator) +- One `ChainIndexer` instance per supported blockchain network +- All operations use async/await (no worker threads) +- Event-driven communication via `EventEmitter` +- Event processors for each event type (12 different event types) +- Database layer for persistence (Elasticsearch/Typesense) +- Job queue for admin commands +- RPC client with fallback support + +--- + +## Use Cases + +### UC1: Normal Block Crawling (Continuous Indexing) + +**Description:** Continuously monitor blockchain networks and process new blocks containing Ocean Protocol events. + +**Trigger:** Automatic on node startup, runs indefinitely + +**Actors:** System (automatic) + +**Preconditions:** + +- Node is running +- Database is accessible +- RPC providers are configured +- Supported chains are configured + +**Main Flow:** + +1. Node starts → `OceanIndexer` constructor called +2. `startThreads()` invoked +3. For each supported chain: + - Validate RPC connection (with fallback support) + - Create `Blockchain` instance + - Create `ChainIndexer` instance + - Call `indexer.start()` (non-blocking, runs in background) +4. Each `ChainIndexer` runs asynchronously: + - Enters infinite `indexLoop()` using async/await + - Gets last indexed block from DB + - Gets current network height via RPC + - Calculates blocks to process (respects chunk size) + - **Event Retrieval:** Calls `provider.getLogs()` with Ocean Protocol event topic filters + - **Event Processing:** Routes events to appropriate processors + - **Database Updates:** Stores processed data + - Updates last indexed block + - Sleeps for interval (default 30s) + - Repeats until stop signal + +**Postconditions:** + +- All supported chains are being indexed concurrently +- Events are being processed and stored in real-time +- Last indexed block is updated per chain +- Event emitters notify downstream consumers + +--- + +### UC2: Process Metadata Created Event + +**Description:** Process a `MetadataCreated` event, validate it, decrypt DDO, and store it. + +**Trigger:** Event detected during block crawling + +**Actors:** System (automatic) + +**Preconditions:** + +- Block crawling is active +- Event log found in block range +- Event matches `METADATA_CREATED` signature + +**Main Flow:** + +1. Event log detected in `retrieveChunkEvents()` +2. Event routed to `processChunkLogs()` +3. Event identified as `METADATA_CREATED` +4. **Validation Phase:** + - Check if metadata validation is enabled + - Get transaction receipt + - Extract `MetadataValidated` events from receipt + - Validate validators against `allowedValidators` list + - If `allowedValidatorsList` configured: + - For each access list contract: + - Check `balanceOf()` for each validator + - Require at least one validator has balance > 0 + - If validation fails → skip event (continue to next) +5. **Processing Phase:** + - Get `MetadataEventProcessor` instance + - Call `processor.processEvent()` + - Check if NFT was deployed by Ocean Factory + - Decode event data from transaction receipt + - **Decrypt DDO:** + - Try HTTP decryption (from metadata URL) + - Try P2P decryption (from libp2p network) + - Try local decryption (if available) + - Handle nonce management + - Verify signatures + - Validate DDO hash matches generated DID + - Check authorized publishers + - Get NFT info (name, symbol, owner, etc.) + - Get token info (datatoken addresses, names, symbols) + - Get pricing stats (dispensers, exchanges, rates) + - Check purgatory status + - Check policy server + - Build DDO with `indexedMetadata` +6. **Storage Phase:** + - Update or create DDO in database + - Update DDO state (validation tracking) + - Emit `METADATA_CREATED` event to parent thread + - Parent thread emits to `INDEXER_DDO_EVENT_EMITTER` + +**Postconditions:** + +- DDO stored in database +- DDO state updated +- Event emitted for listeners + +**Error Handling:** + +- Validation failures → event skipped, logged +- Decryption failures → event skipped, DDO state marked invalid +- Database failures → error logged, event not stored + +--- + +### UC3: Process Metadata Updated Event + +**Description:** Process a `MetadataUpdated` event, update existing DDO. + +**Trigger:** Event detected during block crawling + +**Actors:** System (automatic) + +**Preconditions:** + +- Block crawling is active +- Event log found +- Event matches `METADATA_UPDATED` signature + +**Main Flow:** + +1. Similar to UC2 (Metadata Created) +2. Uses same `MetadataEventProcessor` +3. Validation phase identical +4. Processing phase: + - Retrieves existing DDO from database + - Updates DDO with new metadata + - Merges pricing and order stats +5. Storage phase: + - Updates DDO in database (not creates) + - Updates DDO state + - Emits `METADATA_UPDATED` event + +**Postconditions:** + +- DDO updated in database +- Event emitted + +--- + +### UC4: Process Order Started Event + +**Description:** Process an `OrderStarted` event, update order count and create order record. + +**Trigger:** Event detected during block crawling + +**Actors:** System (automatic) + +**Preconditions:** + +- Block crawling is active +- Event log found +- Event matches `ORDER_STARTED` signature + +**Main Flow:** + +1. Event log detected +2. Routed to `OrderStartedEventProcessor` +3. Decode event data: + - Consumer address + - Payer address + - Datatoken address + - NFT address + - Service ID + - Start order ID +4. Retrieve DDO from database +5. Update order count in DDO stats +6. Create order record in order database +7. Update DDO in database +8. Emit `ORDER_STARTED` event + +**Postconditions:** + +- Order record created +- DDO updated with order count +- Event emitted + +--- + +### UC5: Process Pricing Events (Dispenser/Exchange) + +**Description:** Process dispenser or exchange events (created, activated, deactivated, rate changed). + +**Trigger:** Event detected during block crawling + +**Actors:** System (automatic) + +**Preconditions:** + +- Block crawling is active +- Event log found +- Event matches pricing event signature + +**Main Flow:** + +1. Event identified (DispenserCreated, DispenserActivated, ExchangeActivated, etc.) +2. Routed to appropriate processor +3. Decode event data +4. Retrieve DDO from database +5. Update pricing arrays in DDO stats +6. Update DDO in database +7. Emit event (if applicable) + +**Postconditions:** + +- DDO pricing info updated +- Event emitted + +--- + +### UC6: Reindex Specific Transaction + +**Description:** Re-process a specific transaction that was already indexed (e.g., after bug fix). + +**Trigger:** Admin command via API + +**Actors:** Admin/Operator + +**Preconditions:** + +- Indexer is running +- ChainIndexer exists for chain +- Transaction hash is valid + +**Main Flow:** + +1. Admin calls `reindexTx` API endpoint +2. `ReindexTxHandler` validates command: + - Validates chainId is supported + - Validates txId format +3. `indexer.addReindexTask()` called on OceanIndexer +4. Job created and added to `JOBS_QUEUE` +5. Task added to `INDEXING_QUEUE` (OceanIndexer instance) +6. OceanIndexer calls `chainIndexer.addReindexTask(task)` +7. ChainIndexer adds task to its `reindexQueue` (instance property) +8. During next indexing loop iteration: + - `processReindexQueue()` called + - Task shifted from queue (FIFO) + - Get transaction receipt from RPC: `provider.getTransactionReceipt(txId)` + - Extract logs from receipt (all logs or specific index) + - Process logs using `processChunkLogs(logs, signer, provider, chainId)` + - ChainIndexer emits `REINDEX_QUEUE_POP` event +9. OceanIndexer event listener: + - Removes task from `INDEXING_QUEUE` + - Updates job status to SUCCESS via `updateJobStatus()` + - Emits to `INDEXER_CRAWLING_EVENT_EMITTER` + +**Postconditions:** + +- Transaction re-processed +- DDO updated in database +- Job status updated (DELIVERED → PENDING → SUCCESS) +- Event emitted to downstream consumers + +**Error Handling:** + +- If receipt not available yet → task remains in queue, retried next iteration +- If processing fails → error logged, task removed from queue (lost) +- If ChainIndexer not found → error returned, job not created +- No retry limit → task processed until successful or error + +--- + +### UC7: Reindex Entire Chain + +**Description:** Reset indexing for a chain and re-index from a specific block (or deployment block). + +**Trigger:** Admin command via API + +**Actors:** Admin/Operator + +**Preconditions:** + +- Indexer is running +- Chain is supported +- Optional: block number provided + +**Main Flow:** + +1. Admin calls `reindexChain` API endpoint +2. `ReindexChainHandler` validates command: + - Validates chainId is supported + - Validates block number (if provided) +3. `indexer.resetCrawling(chainId, blockNumber)` called on OceanIndexer +4. Check if ChainIndexer is running: + - Get indexer from `indexers.get(chainId)` + - If not running → call `startThread(chainId)` to create and start + - If start fails → return error +5. Job created and added to `JOBS_QUEUE` +6. OceanIndexer calls `chainIndexer.triggerReindexChain(blockNumber)` +7. ChainIndexer calculates target block: + - Get deployment block for chain + - If blockNumber provided and > deployment block → use it + - Else if `startBlock` configured and > deployment block → use it + - Else → use deployment block +8. Set `this.reindexBlock` to target block +9. During next indexing loop iteration: + - Check `this.reindexBlock !== null` + - Get network height + - Call `this.reindexChain(currentBlock, networkHeight)` + - Validate reindexBlock < networkHeight + - Update last indexed block: `updateLastIndexedBlockNumber(reindexBlock)` + - Delete all assets: `deleteAllAssetsFromChain()` + - If deletion fails → revert last block: `updateLastIndexedBlockNumber(currentBlock)` + - Clear `this.reindexBlock = null` + - ChainIndexer emits `REINDEX_CHAIN` event +10. OceanIndexer event listener: + - Updates job status (SUCCESS or FAILURE) via `updateJobStatus()` + - Emits to `INDEXER_CRAWLING_EVENT_EMITTER` + +**Postconditions:** + +- All assets deleted from chain (database cleared) +- Last indexed block reset to target block +- Normal crawling resumes from reset block +- Job status updated (DELIVERED → PENDING → SUCCESS/FAILURE) +- Downstream consumers notified + +**Error Handling:** + +- Invalid block (> network height) → error logged, reindex aborted, reindexBlock cleared +- Deletion failure → last block reverted to currentBlock, reindex fails, returns false +- Update block failure → error logged, reindex aborted, returns false +- ChainIndexer not found/can't start → error returned, job not created +- Database errors → error logged, manual retry needed + +--- + +### UC8: Version-Based Auto-Reindexing + +**Description:** Automatically trigger reindexing when node version requires it. + +**Trigger:** Node startup, before starting threads + +**Actors:** System (automatic) + +**Preconditions:** + +- Node is starting +- Database is accessible +- Version check enabled + +**Main Flow:** + +1. `startThreads()` called +2. `checkAndTriggerReindexing()` called first +3. Get current node version from `process.env.npm_package_version` +4. Get database version from `sqliteConfig` +5. Compare with `MIN_REQUIRED_VERSION` ('0.2.2') +6. If reindexing needed: + - For each supported chain: + - Delete all assets from chain + - Reset last indexed block to deployment block + - Log results + - Update database version to current +7. Continue with normal thread startup + +**Postconditions:** + +- Chains reindexed if needed +- Database version updated +- Normal indexing resumes + +**Error Handling:** + +- Database not reachable → reindexing skipped, error logged +- Deletion failures → error logged per chain, continues with other chains + +--- + +### UC9: Stop Indexing for Chain + +**Description:** Gracefully stop indexing for a specific chain. + +**Trigger:** Admin command or node shutdown + +**Actors:** Admin/System + +**Preconditions:** + +- Indexer is running +- Chain is being indexed + +**Main Flow:** + +1. `indexer.stopThread(chainId)` called on OceanIndexer +2. Get ChainIndexer: `indexer = indexers.get(chainId)` +3. If indexer exists: + - Call `await indexer.stop()` (async, waits for completion) + - ChainIndexer internally: + - Sets `this.stopSignal = true` + - Logs: "Stopping indexer for chain X, waiting for graceful shutdown..." + - Waits for loop to exit: `while (this.isRunning) await sleep(100)` + - Indexing loop checks `stopSignal` on each iteration + - When `stopSignal` is true → breaks loop + - Sets `this.isRunning = false` + - Logs: "Chain X indexer stopped" + - OceanIndexer: + - Removes from map: `indexers.delete(chainId)` + - Logs: "Stopped indexer for chain X" +4. Else: + - Error logged: "Unable to find running indexer for chain X" + +**Postconditions:** + +- ChainIndexer stopped gracefully +- Indexing loop exited cleanly +- Instance removed from indexers map +- No more indexing for chain +- Any in-progress iteration completes before stop + +**Benefits of Current Implementation:** + +- Graceful shutdown (waits for current iteration to complete) +- No abrupt termination mid-processing +- Clean state (last indexed block updated) +- Async/await makes shutdown explicit and reliable + +--- + +### UC10: Handle RPC Connection Failures + +**Description:** Handle RPC provider failures and fallback to alternative providers. + +**Trigger:** RPC call failure during block retrieval + +**Actors:** System (automatic) + +**Preconditions:** + +- Block crawling active +- RPC call fails + +**Main Flow:** + +1. `retrieveChunkEvents()` called +2. `provider.getLogs()` fails +3. Exception caught in `processNetworkData()` +4. Error logged +5. **Adaptive chunk sizing:** + - `chunkSize = Math.floor(chunkSize / 2)` + - Minimum chunk size = 1 + - `successfulRetrievalCount` reset to 0 +6. Next iteration uses smaller chunk +7. After 3 successful retrievals: + - Revert to original `chunkSize` +8. **RPC Fallback (during startup):** + - `startCrawler()` checks network readiness + - If not ready → `tryFallbackRPCs()` called + - Tries each fallback RPC in order + - If any succeeds → use that provider +9. **Retry Logic:** + - `retryCrawlerWithDelay()` called during startup + - Max 10 retries + - Retry interval = `max(fallbackRPCs.length * 3000, 5000)` + - Recursive retry on failure + +**Postconditions:** + +- Smaller chunks processed +- Alternative RPC used if available +- Crawling continues + +**Error Handling:** + +- All RPCs fail → retry up to 10 times +- After max retries → worker thread not started +- Database check → if DB unreachable, give up + +--- + +## Event Monitoring Deep Dive + +### How Event Monitoring Works + +The Ocean Node Indexer monitors blockchain events using a sophisticated multi-step process that ensures no events are missed while maintaining performance and reliability. + +#### 1. Event Discovery Process + +**Location:** `ChainIndexer.ts` - `indexLoop()` → `retrieveChunkEvents()` + +**Step-by-Step:** + +``` +1. ChainIndexer maintains current position (lastIndexedBlock) + ├─> Retrieved from database on each iteration + └─> Persisted after successful processing + +2. Get network height from RPC + ├─> Current blockchain tip + └─> Determines how many blocks to process + +3. Calculate chunk to process + ├─> remainingBlocks = networkHeight - lastIndexedBlock + ├─> blocksToProcess = min(chunkSize, remainingBlocks) + └─> Default chunkSize from config (typically 100-1000 blocks) + +4. Call provider.getLogs() with filters + ├─> fromBlock: lastIndexedBlock + 1 + ├─> toBlock: lastIndexedBlock + blocksToProcess + ├─> topics: [ALL_OCEAN_EVENT_HASHES] + └─> Returns array of Log objects + +5. Process logs through pipeline + ├─> Identify event type by topic hash + ├─> Route to appropriate processor + ├─> Validate and transform + └─> Store in database + +6. Update lastIndexedBlock + └─> Only updated on successful processing +``` + +**Event Topic Filtering:** + +The indexer listens for these event signatures (identified by topic[0]): + +```typescript +EVENT_HASHES = { + '0x5463569dcc320958360074a9ab27e809e8a6942c394fb151d139b5f7b4ecb1bd': MetadataCreated + '0x127c3f87d5f806ee52e3045f6c9c39e0ef0a3c96c0c75f3e18b84917b88dc2b3': MetadataUpdated + '0x1f432bc9a19ebfc7c5e1cb25e4faeea2f7e162a3af75ae6fd7f4d7ba24d93052': MetadataState + '0xa0e0424cb5b1293c12c34b4a4867cc2a426e665be57d01dfa48aaaa0c90ec7c0': OrderStarted + '0x6e0dd7434b30641fa1c2e87c22ac88fc95c44b50f8b0c24b8c01c3ac88a41f65': OrderReused + '0xdcda18b5bc4d3564ccef3d80910ad33cc3e2bb60f09e0d1be21501f97a71ea51': DispenserCreated + '0x6e0cf36da82bc089a41b8ba5a4aaa4e6f4f3c36a2ba0e47f8d4b5bd4c82b17ab': DispenserActivated + '0x53ae36d41e99f27c63c6c8d7d1c8fd58e1f1dbc7d5d9c0d8e2f6d8a5c4b3a2b1': DispenserDeactivated + '0xdcda18b5bc4d3564ccef3d80910ad33cc3e2bb60f09e0d1be21501f97a71ea52': ExchangeCreated + '0x6e0cf36da82bc089a41b8ba5a4aaa4e6f4f3c36a2ba0e47f8d4b5bd4c82b17ac': ExchangeActivated + '0x53ae36d41e99f27c63c6c8d7d1c8fd58e1f1dbc7d5d9c0d8e2f6d8a5c4b3a2b2': ExchangeDeactivated + '0x7b3b3f0f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f': ExchangeRateChanged +} +``` + +#### 2. Event Identification & Routing + +**Location:** `processor.ts` - `processChunkLogs()` + +When logs are retrieved, each log goes through: + +``` +For each log in retrieved logs: + │ + ├─> 1. Extract topic[0] (event signature hash) + │ + ├─> 2. Look up in EVENT_HASHES mapping + │ └─> Identifies event type (e.g., METADATA_CREATED) + │ + ├─> 3. Check if Ocean Protocol event + │ └─> If not recognized → skip + │ + ├─> 4. Apply event-specific validation + │ └─> For metadata events: check validators + │ └─> For other events: basic validation + │ + ├─> 5. Get or create processor instance + │ └─> Cached per (eventType + chainId) + │ + ├─> 6. Call processor.processEvent() + │ └─> Event-specific handling logic + │ + └─> 7. Store result for batch emission +``` + +### Detailed Event Handling Per Type + +#### A. METADATA_CREATED Event + +**Trigger:** When a new data asset is published on-chain + +**On-Chain Event Data:** + +- `owner` - Publisher address +- `flags` - Encryption flags +- `metadata` - Encrypted/compressed DDO +- `metadataHash` - SHA256 hash of DDO +- `validateTime` - Timestamp + +**Processing Steps:** + +``` +MetadataEventProcessor.processEvent(): + │ + ├─> 1. FACTORY VALIDATION + │ └─> wasNFTDeployedByOurFactory() + │ ├─> Instantiate ERC721Factory contract + │ ├─> Call getCurrentNFTCount() + │ ├─> Loop through all NFTs + │ ├─> Check if NFT address matches + │ └─> If not deployed by Ocean → REJECT, return null + │ + ├─> 2. DECODE EVENT DATA + │ └─> getEventData(provider, txHash, ERC721Template.abi, eventName) + │ ├─> Fetch transaction receipt + │ ├─> Find log matching event hash + │ ├─> Parse with contract ABI + │ └─> Extract: owner, flags, metadata, metadataHash, etc. + │ + ├─> 3. DDO DECRYPTION (Complex, 400+ lines) + │ └─> decryptDDO(decryptorURL, flag, owner, nftAddress, chainId, txId, metadataHash, metadata) + │ │ + │ ├─> Check flag bit 2 (encrypted vs compressed) + │ │ + │ ├─> IF ENCRYPTED (flag & 2 != 0): + │ │ ├─> Determine decryptor type: + │ │ │ ├─> HTTP URL → Call external provider + │ │ │ ├─> PeerID → Call via P2P network + │ │ │ └─> Local node → Internal handler + │ │ │ + │ │ ├─> Build signature: + │ │ │ ├─> Get nonce from provider + │ │ │ ├─> Create message: txId + ethAddress + chainId + nonce + │ │ │ ├─> Hash with solidityPackedKeccak256 + │ │ │ ├─> Sign with wallet + │ │ │ └─> Verify signature + │ │ │ + │ │ ├─> Make decrypt request: + │ │ │ ├─> POST /api/services/decrypt + │ │ │ ├─> Payload: { transactionId, chainId, decrypterAddress, dataNftAddress, signature, nonce } + │ │ │ ├─> Timeout: 30 seconds + │ │ │ ├─> Retry up to 5 times (withRetrial) + │ │ │ └─> Handle 400/403 errors (no retry) + │ │ │ + │ │ └─> Validate response hash: + │ │ ├─> create256Hash(response.data) + │ │ ├─> Compare with metadataHash + │ │ └─> If mismatch → REJECT + │ │ + │ └─> IF COMPRESSED (flag & 2 == 0): + │ ├─> getBytes(metadata) + │ ├─> toUtf8String(byteArray) + │ └─> JSON.parse(utf8String) + │ + ├─> 4. VALIDATE DDO ID + │ └─> Check ddo.id matches makeDid(nftAddress, chainId) + │ └─> If mismatch → REJECT, update ddoState with error + │ + ├─> 5. CHECK AUTHORIZED PUBLISHERS + │ └─> If authorizedPublishers configured: + │ └─> Check if owner in authorizedPublishers list + │ └─> If not → REJECT, update ddoState + │ + ├─> 6. FETCH NFT INFORMATION + │ └─> getNFTInfo(nftAddress, signer, owner, timestamp) + │ ├─> Instantiate NFT contract + │ ├─> Call getMetaData() → get state + │ ├─> Call getId() → get token ID + │ ├─> Call tokenURI(id) → get URI + │ ├─> Call name() → get name + │ ├─> Call symbol() → get symbol + │ └─> Return: { state, address, name, symbol, owner, created, tokenURI } + │ + ├─> 7. FETCH TOKEN INFORMATION + │ └─> getTokenInfo(ddo.services, signer) + │ └─> For each service in DDO: + │ ├─> Instantiate datatoken contract (ERC20) + │ ├─> Call name() → get name + │ ├─> Call symbol() → get symbol + │ └─> Collect: { address, name, symbol, serviceId } + │ + ├─> 8. FETCH PRICING INFORMATION + │ └─> getPricingStatsForDddo(nftAddress, signer, provider, chainId) + │ ├─> Get all datatokens from NFT + │ ├─> For each datatoken: + │ │ ├─> Check dispenser: + │ │ │ ├─> Get Dispenser contract address + │ │ │ ├─> Call status(datatoken, owner) + │ │ │ └─> If active → add to prices array + │ │ └─> Check exchange: + │ │ ├─> Get FixedRateExchange address + │ │ ├─> Call getAllExchanges() + │ │ ├─> Filter by datatoken + │ │ └─> If active → add rate to prices array + │ └─> Return pricing arrays per service + │ + ├─> 9. CHECK PURGATORY STATUS + │ └─> Purgatory.check(nftAddress, chainId, account) + │ ├─> Check if NFT is in purgatory list + │ ├─> Check if account is in purgatory list + │ └─> Return: { state: boolean } + │ + ├─> 10. CHECK POLICY SERVER + │ └─> If policyServer configured: + │ ├─> POST to policy server endpoint + │ ├─> Payload: { did, chain, nft } + │ └─> Check response (approve/deny) + │ + ├─> 11. BUILD INDEXED METADATA + │ └─> Construct indexedMetadata object: + │ ├─> nft: { state, address, name, symbol, owner, created, tokenURI } + │ ├─> event: { txid, from, contract, block, datetime } + │ ├─> stats: [{ + │ │ datatokenAddress, + │ │ name, + │ │ symbol, + │ │ serviceId, + │ │ orders: 0, // Initial count + │ │ prices: [{ type: 'dispenser|exchange', price, contract, token, exchangeId }] + │ │ }] + │ └─> purgatory: { state } + │ + ├─> 12. STORE IN DATABASE + │ └─> createOrUpdateDDO(ddo, method) + │ ├─> ddoDatabase.create(ddo) // New asset + │ ├─> ddoState.create(chainId, did, nftAddress, txId, valid=true) + │ └─> Return saved DDO + │ + └─> 13. EMIT EVENT + └─> Event emitted to INDEXER_DDO_EVENT_EMITTER + └─> Downstream consumers notified (API, cache, webhooks) +``` + +**Database Operations:** + +- INSERT into `ddo` table (Elasticsearch/Typesense) +- INSERT into `ddoState` table (validation tracking) + +**Error Handling:** + +- Factory validation fail → skip, log error +- Decryption fail → skip, update ddoState with error +- DDO ID mismatch → skip, update ddoState +- Publisher not authorized → skip, update ddoState +- Database fail → error logged, event not stored + +--- + +#### B. METADATA_UPDATED Event + +**Trigger:** When asset metadata is updated on-chain + +**Processing Steps:** + +``` +MetadataEventProcessor.processEvent(): + │ + ├─> 1-10. Same as METADATA_CREATED + │ (validation, decryption, fetching info) + │ + ├─> 11. RETRIEVE EXISTING DDO + │ └─> ddoDatabase.retrieve(ddo.id) + │ + ├─> 12. MERGE DDO DATA + │ └─> Merge new metadata with existing: + │ ├─> Update: metadata, services, credentials + │ ├─> Preserve: existing order counts + │ ├─> Merge: pricing arrays (add new, keep existing) + │ └─> Update: indexedMetadata.event (new tx, block, datetime) + │ + ├─> 13. UPDATE DATABASE + │ └─> ddoDatabase.update(mergedDdo) + │ + └─> 14. EMIT EVENT + └─> METADATA_UPDATED event emitted +``` + +**Key Difference from CREATED:** + +- Uses `update()` instead of `create()` +- Merges with existing data instead of creating new +- Preserves order statistics + +--- + +#### C. ORDER_STARTED Event + +**Trigger:** When someone purchases/orders access to a data asset + +**On-Chain Event Data:** + +- `consumer` - Buyer address +- `payer` - Payment source address +- `datatoken` - Datatoken address +- `serviceId` - Service identifier +- `amount` - Amount paid +- `timestamp` - Order time + +**Processing Steps:** + +``` +OrderStartedEventProcessor.processEvent(): + │ + ├─> 1. DECODE EVENT DATA + │ └─> Parse event args: + │ ├─> consumer + │ ├─> payer + │ ├─> datatoken + │ ├─> amount + │ └─> timestamp + │ + ├─> 2. FIND NFT ADDRESS + │ └─> Query datatoken contract: + │ ├─> Instantiate ERC20 contract + │ ├─> Call getERC721Address() + │ └─> Get NFT address + │ + ├─> 3. BUILD DID + │ └─> did = makeDid(nftAddress, chainId) + │ + ├─> 4. RETRIEVE DDO + │ └─> ddoDatabase.retrieve(did) + │ └─> If not found → error, cannot update + │ + ├─> 5. UPDATE ORDER COUNT + │ └─> Find matching service in ddo.stats: + │ ├─> Match by datatokenAddress + │ └─> Increment orders count + │ + ├─> 6. CREATE ORDER RECORD + │ └─> orderDatabase.create({ + │ type: 'startOrder', + │ timestamp, + │ consumer, + │ payer, + │ datatokenAddress, + │ nftAddress, + │ did, + │ startOrderId: txHash + │ }) + │ + ├─> 7. UPDATE DDO + │ └─> ddoDatabase.update(ddo) + │ + └─> 8. EMIT EVENT + └─> ORDER_STARTED event emitted +``` + +**Database Operations:** + +- UPDATE `ddo` table (increment order count) +- INSERT into `order` table (new order record) + +--- + +#### D. DISPENSER_ACTIVATED Event + +**Trigger:** When a free dispenser is activated for a datatoken + +**On-Chain Event Data:** + +- `datatoken` - Datatoken address +- `owner` - Dispenser owner +- `dispenserId` - Unique identifier + +**Processing Steps:** + +``` +DispenserActivatedEventProcessor.processEvent(): + │ + ├─> 1. DECODE EVENT DATA + │ └─> Extract: datatoken, owner, dispenserId + │ + ├─> 2. FIND NFT ADDRESS + │ └─> Query datatoken contract → getNFTAddress() + │ + ├─> 3. RETRIEVE DDO + │ └─> ddoDatabase.retrieve(did) + │ + ├─> 4. UPDATE PRICING ARRAY + │ └─> Find service by datatokenAddress: + │ └─> Add to prices array: + │ { + │ type: 'dispenser', + │ price: '0', // Free + │ contract: dispenserAddress, + │ token: ZeroAddress, + │ dispenserId + │ } + │ + ├─> 5. UPDATE DDO + │ └─> ddoDatabase.update(ddo) + │ + └─> 6. EMIT EVENT + └─> DISPENSER_ACTIVATED event emitted +``` + +--- + +#### E. EXCHANGE_RATE_CHANGED Event + +**Trigger:** When exchange rate is updated for a fixed-rate exchange + +**On-Chain Event Data:** + +- `exchangeId` - Exchange identifier +- `baseToken` - Base token address +- `datatoken` - Datatoken address +- `newRate` - New exchange rate + +**Processing Steps:** + +``` +ExchangeRateChangedEventProcessor.processEvent(): + │ + ├─> 1. DECODE EVENT DATA + │ └─> Extract: exchangeId, baseToken, datatoken, newRate + │ + ├─> 2. FIND NFT ADDRESS + │ └─> Query datatoken contract → getNFTAddress() + │ + ├─> 3. RETRIEVE DDO + │ └─> ddoDatabase.retrieve(did) + │ + ├─> 4. UPDATE PRICING ARRAY + │ └─> Find service by datatokenAddress: + │ └─> Find exchange entry by exchangeId: + │ └─> Update price: newRate + │ + ├─> 5. UPDATE DDO + │ └─> ddoDatabase.update(ddo) + │ + └─> 6. EMIT EVENT + └─> EXCHANGE_RATE_CHANGED event emitted +``` + +--- + +### Event Processing Pipeline Summary + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ CONTINUOUS MONITORING LOOP │ +│ │ +│ ChainIndexer (per chain) running async/await: │ +│ while (!stopSignal): │ +│ ├─> Get last indexed block from DB │ +│ ├─> Get network height from RPC │ +│ ├─> Calculate chunk size (adaptive) │ +│ ├─> provider.getLogs(fromBlock, toBlock, topics) │ +│ │ └─> Returns: Log[] (raw blockchain logs) │ +│ │ │ +│ └─> processChunkLogs(logs, signer, provider, chainId) │ +│ │ │ +│ └─> For each log: │ +│ ├─> Identify event by topic[0] │ +│ ├─> Check if Ocean Protocol event │ +│ ├─> Apply validation (if metadata event) │ +│ ├─> Route to processor │ +│ │ └─> processEvent() called │ +│ │ ├─> Decode on-chain data │ +│ │ ├─> Fetch additional data (RPC calls) │ +│ │ ├─> Transform to domain model │ +│ │ └─> Store in database │ +│ └─> Collect result │ +│ │ +│ ├─> Update last indexed block │ +│ ├─> Emit events to INDEXER_DDO_EVENT_EMITTER │ +│ └─> Sleep for interval (30s default) │ +└─────────────────────────────────────────────────────────────────────┘ + + ↓ + +┌─────────────────────────────────────────────────────────────────────┐ +│ EVENT EMITTER LISTENERS │ +│ │ +│ Downstream consumers subscribe to events: │ +│ ├─> API endpoints (query fresh data) │ +│ ├─> Cache invalidation (update cache) │ +│ ├─> Webhooks (notify external services) │ +│ ├─> Analytics (track metrics) │ +│ └─> P2P network (advertise new assets) │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Performance Characteristics + +**Event Monitoring Frequency:** + +- Check for new blocks every 30 seconds (configurable) +- Process up to `chunkSize` blocks per iteration (default 100-1000) +- Adaptive chunk sizing on RPC errors (halves on failure, recovers after 3 successes) + +**Concurrency:** + +- All chains monitored concurrently (async/await) +- No worker threads (optimal for I/O-bound operations) +- Events within a chunk processed serially (to maintain order) + +**RPC Call Patterns:** + +- 1 call to get network height per iteration +- 1 call to getLogs per chunk +- Per metadata event: + - 1-2 calls for transaction receipt + - 1+ calls for factory validation + - 1+ calls for NFT info (name, symbol, state) + - 1+ calls for token info (per datatoken) + - Multiple calls for pricing info (dispensers, exchanges) + - Optional: access list checks (1+ per validator) + +**Database Operations:** + +- 1 read to get last indexed block +- 1 write to update last indexed block +- Per event: 1-2 writes (ddo + ddoState/order) +- No batching currently implemented + +**Failure Recovery:** + +- RPC failure → reduce chunk size, retry +- Processing failure → don't update last block, retry same chunk +- Validation failure → skip event, continue with next +- Database failure → error logged, event not stored + +--- + +## Current Flows - Detailed Analysis + +### Flow 1: Initialization & Startup + +**Location:** `index.ts` - `OceanIndexer` constructor and `startThreads()` + +**Sequence:** + +``` +1. OceanIndexer constructor called + ├─> Initialize database reference + ├─> Store supported networks (RPCS config) + ├─> Initialize INDEXING_QUEUE = [] + ├─> Create Map for indexers + └─> Call startThreads() + +2. startThreads() + ├─> checkAndTriggerReindexing() [UC8] + │ ├─> Get current version from process.env + │ ├─> Get DB version from sqliteConfig + │ ├─> Compare with MIN_REQUIRED_VERSION + │ ├─> If reindex needed: + │ │ ├─> For each chain: + │ │ │ ├─> Delete all assets: ddo.deleteAllAssetsFromChain() + │ │ │ └─> Reset last indexed block to deployment block + │ │ └─> Update DB version + │ └─> Continue to indexer startup + │ + ├─> setupEventListeners() [Global event handlers] + │ ├─> Listen for METADATA_CREATED on INDEXER_CRAWLING_EVENT_EMITTER + │ ├─> Listen for METADATA_UPDATED + │ ├─> Listen for ORDER_STARTED + │ ├─> Listen for REINDEX_QUEUE_POP + │ ├─> Listen for REINDEX_CHAIN + │ └─> Re-emit to INDEXER_DDO_EVENT_EMITTER (for external consumers) + │ + └─> For each supported chain (sequential): + ├─> startThread(chainId) + │ ├─> Check if indexer already running + │ │ └─> If yes: stop it, wait, then proceed + │ │ + │ ├─> Get network config (rpc, fallbackRPCs, chunkSize, etc.) + │ │ + │ ├─> Create Blockchain instance + │ │ └─> new Blockchain(rpc, chainId, config, fallbackRPCs) + │ │ + │ ├─> Validate connectivity: retryCrawlerWithDelay() + │ │ ├─> Check: blockchain.isNetworkReady() + │ │ ├─> If not ready: tryFallbackRPCs() + │ │ ├─> Check: DB reachable + │ │ ├─> Retry up to 10 times with exponential backoff + │ │ └─> Return: canStart (boolean) + │ │ + │ ├─> If connectivity failed → return null, skip chain + │ │ + │ ├─> Create ChainIndexer instance + │ │ └─> new ChainIndexer(blockchain, rpcDetails, INDEXER_CRAWLING_EVENT_EMITTER) + │ │ + │ ├─> Start indexer (non-blocking!) + │ │ └─> await indexer.start() + │ │ └─> Internally calls indexLoop() without await + │ │ └─> Returns immediately, loop runs in background + │ │ + │ └─> Store: indexers.set(chainId, indexer) + │ + └─> Return: all indexers started successfully + +3. Each ChainIndexer now running independently + └─> Async indexLoop() executing concurrently for all chains +``` + +**Key Behaviors:** + +- Version check happens before indexers start +- Each chain gets its own ChainIndexer instance +- RPC connection validated before starting indexer +- All indexers run concurrently via async/await (no worker threads) +- Event listeners set up globally (shared EventEmitter) +- ChainIndexers emit events, OceanIndexer re-emits to external consumers + +**Current Architecture Benefits:** + +- No worker threads → simpler code, easier debugging +- Async/await → better error handling, stack traces preserved +- EventEmitter → decoupled communication +- All indexers share same Node.js event loop +- Optimal for I/O-bound workloads (RPC calls, DB queries) + +--- + +### Flow 2: Block Crawling Loop (ChainIndexer) + +**Location:** `ChainIndexer.ts` - `indexLoop()` + +**Sequence:** + +``` +async indexLoop() { + // Initialization + contractDeploymentBlock = getDeployedContractBlock(chainId) + crawlingStartBlock = rpcDetails.startBlock || contractDeploymentBlock + provider = blockchain.getProvider() + signer = blockchain.getSigner() + interval = getCrawlingInterval() // Default 30s + chunkSize = rpcDetails.chunkSize || 1 + successfulRetrievalCount = 0 + lockProcessing = false + startedCrawling = false + + // Main loop + while (!this.stopSignal) { + if (!lockProcessing) { + lockProcessing = true + + try { + // 1. GET CURRENT STATE + lastIndexedBlock = await this.getLastIndexedBlock() + networkHeight = await getNetworkHeight(provider) + startBlock = lastIndexedBlock > crawlingStartBlock + ? lastIndexedBlock + : crawlingStartBlock + + INDEXER_LOGGER.info( + `Chain ${chainId}: Last=${lastIndexedBlock}, Start=${startBlock}, Height=${networkHeight}` + ) + + // 2. CHECK IF WORK TO DO + if (networkHeight > startBlock) { + // Emit one-shot event when crawling starts + if (!startedCrawling) { + startedCrawling = true + this.eventEmitter.emit(INDEXER_CRAWLING_EVENTS.CRAWLING_STARTED, { + chainId, + startBlock, + networkHeight, + contractDeploymentBlock + }) + } + + // 3. CALCULATE CHUNK SIZE + remainingBlocks = networkHeight - startBlock + blocksToProcess = min(chunkSize, remainingBlocks) + + INDEXER_LOGGER.info(`Processing ${blocksToProcess} blocks...`) + + // 4. RETRIEVE EVENTS FROM RPC + let chunkEvents = [] + try { + chunkEvents = await retrieveChunkEvents( + signer, + provider, + chainId, + startBlock, + blocksToProcess + ) + // Inside retrieveChunkEvents(): + // provider.getLogs({ + // fromBlock: startBlock + 1, + // toBlock: startBlock + blocksToProcess, + // topics: [ALL_OCEAN_EVENT_HASHES] + // }) + + successfulRetrievalCount++ + } catch (error) { + // ADAPTIVE CHUNK SIZING on RPC error + INDEXER_LOGGER.warn(`RPC error: ${error.message}`) + chunkSize = floor(chunkSize / 2) < 1 ? 1 : floor(chunkSize / 2) + successfulRetrievalCount = 0 + INDEXER_LOGGER.info(`Reduced chunk size to ${chunkSize}`) + // Continue to next iteration + } + + // 5. PROCESS EVENTS + try { + processedBlocks = await processBlocks( + chunkEvents, + signer, + provider, + chainId, + startBlock, + blocksToProcess + ) + // processBlocks() calls processChunkLogs() + // which routes events to processors + + INDEXER_LOGGER.debug( + `Processed ${processedBlocks.foundEvents.length} events from ${chunkEvents.length} logs` + ) + + // 6. UPDATE LAST INDEXED BLOCK (critical!) + currentBlock = await this.updateLastIndexedBlockNumber( + processedBlocks.lastBlock, + lastIndexedBlock + ) + // Inside updateLastIndexedBlockNumber(): + // indexerDb.update(chainId, block) + // Returns new lastIndexedBlock or -1 on failure + + // Safety check + if (currentBlock < 0 && lastIndexedBlock !== null) { + currentBlock = lastIndexedBlock + INDEXER_LOGGER.error('Failed to update last block, keeping old value') + } + + // 7. EMIT EVENTS FOR NEWLY INDEXED ASSETS + this.emitNewlyIndexedAssets(processedBlocks.foundEvents) + // Emits to INDEXER_CRAWLING_EVENT_EMITTER: + // - METADATA_CREATED + // - METADATA_UPDATED + // - ORDER_STARTED + // - ORDER_REUSED + // - DISPENSER_ACTIVATED/DEACTIVATED + // - EXCHANGE_ACTIVATED/DEACTIVATED/RATE_CHANGED + + // 8. ADAPTIVE CHUNK SIZE RECOVERY + if (successfulRetrievalCount >= 3 && chunkSize < rpcDetails.chunkSize) { + chunkSize = rpcDetails.chunkSize + successfulRetrievalCount = 0 + INDEXER_LOGGER.info(`Reverted chunk size to ${chunkSize}`) + } + + } catch (error) { + // PROCESSING ERROR + INDEXER_LOGGER.error(`Processing failed: ${error.message}`) + successfulRetrievalCount = 0 + // Critical: Don't update last block → retry same chunk + await sleep(interval) + } + + } else { + // No new blocks available + await sleep(interval) + } + + // 9. PROCESS REINDEX QUEUE + await this.processReindexQueue(provider, signer) + // Processes this.reindexQueue (FIFO) + // For each task: + // - Get transaction receipt + // - Process logs from receipt + // - Emit REINDEX_QUEUE_POP event + + // 10. HANDLE CHAIN REINDEX COMMAND + if (this.reindexBlock !== null) { + networkHeight = await getNetworkHeight(provider) + result = await this.reindexChain(currentBlock, networkHeight) + + this.eventEmitter.emit(INDEXER_CRAWLING_EVENTS.REINDEX_CHAIN, { + result, + chainId + }) + } + + } catch (error) { + INDEXER_LOGGER.error(`Error in indexing loop: ${error.message}`) + await sleep(interval) + } finally { + lockProcessing = false + } + + } else { + // Already processing, wait a bit + INDEXER_LOGGER.debug('Processing in progress, waiting...') + await sleep(1000) + } + } + + // 11. CLEANUP ON STOP + this.isRunning = false + INDEXER_LOGGER.info(`Exiting indexer loop for chain ${chainId}`) +} +``` + +**Key Behaviors:** + +- Infinite async loop with `lockProcessing` flag +- Adaptive chunk sizing on RPC errors (halves on error, recovers after 3 successes) +- Last block only updated on successful processing (critical for consistency) +- Reindex queue processed after each chunk +- One-shot `CRAWLING_STARTED` event +- Graceful shutdown via `stopSignal` +- All operations use async/await (no callbacks, no worker threads) + +**Current Implementation Improvements:** + +- `lockProcessing` now has actual waiting: `await sleep(1000)` when locked +- Instance state (`this.reindexBlock`, `this.reindexQueue`) instead of global +- Better error handling with try/catch/finally +- Cleaner shutdown: sets `isRunning = false` +- EventEmitter instead of postMessage (simpler, type-safe) + +**Performance Characteristics:** + +- One iteration per 30 seconds (if caught up) +- Processes up to `chunkSize` blocks per iteration (typically 100-1000) +- On RPC error: chunk size halves (min 1) → slower but more reliable +- Recovery: after 3 successful calls → chunk size restored +- No parallel event processing within chunk (maintains order) + +--- + +### Flow 3: Event Processing Pipeline + +**Location:** `processor.ts` - `processChunkLogs()` + +**Sequence:** + +``` +processChunkLogs(logs, signer, provider, chainId): + storeEvents = {} + + if (logs.length > 0) { + config = await getConfiguration() + checkMetadataValidated = (allowedValidators.length > 0 || + allowedValidatorsList exists) + + for each log in logs: + // 1. Identify event + event = findEventByKey(log.topics[0]) + + if (event && event.type in EVENTS): + // 2. Metadata validation (if metadata event) + if (event.type in [METADATA_CREATED, METADATA_UPDATED, METADATA_STATE]): + if (checkMetadataValidated): + // Get transaction receipt + txReceipt = await provider.getTransactionReceipt(log.txHash) + + // Extract MetadataValidated events + metadataProofs = fetchEventFromTransaction( + txReceipt, 'MetadataValidated', ERC20Template.abi + ) + + if (!metadataProofs): + continue // Skip event + + // Extract validator addresses + validators = metadataProofs.map(proof => proof.args[0]) + + // Check allowed validators + allowed = allowedValidators.filter(v => + validators.indexOf(v) !== -1 + ) + + if (!allowed.length): + continue // Skip event + + // Check access lists (if configured) + if (allowedValidatorsList && validators.length > 0): + isAllowed = false + for each accessListAddress in allowedValidatorsList[chainId]: + accessListContract = new Contract(accessListAddress, ...) + for each validator in validators: + balance = await accessListContract.balanceOf(validator) + if (balance > 0): + isAllowed = true + break + if (isAllowed) break + + if (!isAllowed): + continue // Skip event + + // 3. Route to processor + if (event.type === TOKEN_URI_UPDATE): + storeEvents[event.type] = 'TOKEN_URI_UPDATE' + else: + processor = getEventProcessor(event.type, chainId) + result = await processor.processEvent( + log, chainId, signer, provider, event.type + ) + storeEvents[event.type] = result + + return storeEvents + } + + return {} +``` + +**Key Behaviors:** + +- Sequential processing (one event at a time) +- Validation happens before processing +- Multiple RPC calls per metadata event (receipt + access list checks) +- Processor instances cached per event type + chain +- Events skipped silently on validation failure + +**Issues Observed:** + +- Nested validation logic (hard to read) +- Multiple RPC calls per event (performance issue) +- No parallelization +- No batch validation +- Silent failures (just `continue`) + +--- + +### Flow 4: Metadata Event Processing + +**Location:** `processors/MetadataEventProcessor.ts` - `processEvent()` + +**Sequence:** + +``` +processEvent(log, chainId, signer, provider, eventName): + // 1. Factory check + wasDeployedByUs = await wasNFTDeployedByOurFactory( + chainId, signer, event.address + ) + if (!wasDeployedByUs): + return // Skip + + // 2. Decode event + decodedEventData = await getEventData( + provider, log.txHash, ERC721Template.abi, eventName + ) + metadata = decodedEventData.args[4] + metadataHash = decodedEventData.args[5] + flag = decodedEventData.args[3] + owner = decodedEventData.args[0] + + // 3. Decrypt DDO (400+ lines) + ddo = await decryptDDO( + decodedEventData.args[2], flag, owner, + event.address, chainId, log.txHash, + metadataHash, metadata + ) + + // 4. Validate DDO ID + ddoInstance = DDOManager.getDDOClass(ddo) + expectedDid = ddoInstance.makeDid(event.address, chainId) + if (ddo.id !== expectedDid): + await ddoState.update(..., false, 'DID mismatch') + return + + // 5. Check authorized publishers + if (authorizedPublishers configured): + if (owner not in authorizedPublishers): + await ddoState.update(..., false, 'Unauthorized publisher') + return + + // 6. Get NFT info + nftInfo = await getNFTInfo(event.address, signer) + + // 7. Get token info + tokenInfo = await getTokenInfo(event.address, signer, provider) + + // 8. Get pricing stats + pricingStats = await getPricingStatsForDddo( + event.address, signer, provider, chainId + ) + + // 9. Check purgatory + purgatoryStatus = await Purgatory.check(...) + + // 10. Check policy server + policyServerCheck = await checkPolicyServer(...) + + // 11. Build indexed metadata + indexedMetadata = { + nft: nftInfo, + event: { txid, from, contract, block, datetime }, + stats: [{ + datatokenAddress, name, symbol, serviceId, + orders: number, + prices: [...] + }], + purgatory: purgatoryStatus + } + + // 12. Create or update DDO + if (eventName === METADATA_CREATED): + await ddoDatabase.create(ddo) + else: + existingDdo = await ddoDatabase.retrieve(ddo.id) + // Merge stats + updatedDdo = mergeDDO(existingDdo, ddo) + await ddoDatabase.update(updatedDdo) + + // 13. Update DDO state + await ddoState.update(chainId, ddo.id, event.address, + log.txHash, true, null) + + return ddo +``` + +**Key Behaviors:** + +- Many sequential async operations +- Multiple RPC calls (NFT info, token info, pricing) +- DDO decryption with multiple strategies +- State tracking separate from DDO storage +- Stats merging for updates + +**Issues Observed:** + +- Very long method (400+ lines) +- Many external calls (slow) +- No batching of RPC calls +- Decryption logic complex and hard to test +- No error recovery for individual steps + +--- + +### Flow 5: Reindex Transaction + +**Location:** `crawlerThread.ts` - `processReindex()` + +**Sequence:** + +``` +processReindex(provider, signer, chainId): + while (REINDEX_QUEUE.length > 0): + reindexTask = REINDEX_QUEUE.pop() + + try: + // Get transaction receipt + receipt = await provider.getTransactionReceipt( + reindexTask.txId + ) + + if (receipt): + // Extract logs + if (reindexTask.eventIndex defined): + log = receipt.logs[reindexTask.eventIndex] + logs = [log] + else: + logs = receipt.logs + + // Process logs (same as normal flow) + await processChunkLogs(logs, signer, provider, chainId) + + // Notify parent + parentPort.postMessage({ + method: REINDEX_QUEUE_POP, + data: { reindexTask } + }) + else: + // Receipt not found, re-queue + REINDEX_QUEUE.push(reindexTask) + + catch (error): + // Error logged, task lost + INDEXER_LOGGER.error(...) +``` + +**Key Behaviors:** + +- Processes queue during normal crawling loop +- Uses same processing pipeline as normal events +- Re-queues on receipt not found +- No retry limit + +**Issues Observed:** + +- Tasks can be lost on error +- No timeout for receipt retrieval +- Processes during normal crawling (could slow down) +- No priority mechanism + +--- + +### Flow 6: Reindex Chain + +**Location:** `crawlerThread.ts` - `reindexChain()` + +**Sequence:** + +``` +reindexChain(currentBlock, networkHeight): + // 1. Validate block + if (REINDEX_BLOCK > networkHeight): + REINDEX_BLOCK = null + return false + + // 2. Update last indexed block + block = await updateLastIndexedBlockNumber(REINDEX_BLOCK) + + if (block !== -1): + REINDEX_BLOCK = null + + // 3. Delete all assets + res = await deleteAllAssetsFromChain() + + if (res === -1): + // Deletion failed, revert block + await updateLastIndexedBlockNumber(currentBlock) + return false + + return true + else: + REINDEX_BLOCK = null + return false +``` + +**Key Behaviors:** + +- Validates block before proceeding +- Updates block first, then deletes assets +- Reverts block if deletion fails +- Clears `REINDEX_BLOCK` flag + +**Issues Observed:** + +- No transaction wrapping (block update + deletion) +- Race condition possible (normal crawling could interfere) +- No progress tracking +- Can take very long for large chains + +--- + +## Event Processing Flows + +### Event Types Processed + +1. **Metadata Events:** + + - `METADATA_CREATED` - New asset published + - `METADATA_UPDATED` - Asset metadata updated + - `METADATA_STATE` - Asset state changed + +2. **Order Events:** + + - `ORDER_STARTED` - New order initiated + - `ORDER_REUSED` - Order reused + +3. **Dispenser Events:** + + - `DISPENSER_CREATED` - Dispenser created + - `DISPENSER_ACTIVATED` - Dispenser activated + - `DISPENSER_DEACTIVATED` - Dispenser deactivated + +4. **Exchange Events:** + + - `EXCHANGE_CREATED` - Exchange created + - `EXCHANGE_ACTIVATED` - Exchange activated + - `EXCHANGE_DEACTIVATED` - Exchange deactivated + - `EXCHANGE_RATE_CHANGED` - Exchange rate changed + +5. **Other:** + - `TOKEN_URI_UPDATE` - Token URI updated (no processing) + +### Event Flow Summary + +``` +Block Logs + ↓ +Event Identification (by topic hash) + ↓ +Validation (for metadata events) + ├─> Factory check + ├─> Metadata proof validation + ├─> Access list check + └─> Publisher authorization + ↓ +Route to Processor + ├─> MetadataEventProcessor + ├─> OrderStartedEventProcessor + ├─> DispenserEventProcessor + └─> ExchangeEventProcessor + ↓ +Process Event + ├─> Decode event data + ├─> Fetch additional data (RPC calls) + ├─> Transform to domain model + └─> Store in database + ↓ +Emit Event (to parent thread) + ↓ +Parent Thread Emits (to listeners) +``` + +--- + +## Error Handling & Retry Mechanisms + +### Current Retry Mechanisms (4 Layers) + +**Layer 1: Crawler Startup Retry** + +- Location: `index.ts` - `retryCrawlerWithDelay()` +- Max retries: 10 +- Interval: `max(fallbackRPCs.length * 3000, 5000)` ms +- Recursive retry +- Checks DB reachability + +**Layer 2: Adaptive Chunk Sizing** + +- Location: `crawlerThread.ts` - `processNetworkData()` +- On RPC error: `chunkSize = floor(chunkSize / 2)` (min 1) +- Reverts after 3 successful calls +- No max retries (infinite) + +**Layer 3: Block Processing Retry** + +- Location: `crawlerThread.ts` - `processNetworkData()` +- On processing error: sleep and retry same chunk +- No max retries +- Last block not updated on error + +**Layer 4: Individual RPC Retry** + +- Location: `processors/BaseProcessor.ts` - `withRetrial()` +- Max retries: 5 +- Used in `decryptDDO()` +- Exponential backoff + +### Error Handling Issues + +1. **No Centralized Strategy:** + + - 4 different retry mechanisms + - Unclear which applies when + - No consistent backoff + +2. **Silent Failures:** + + - Events skipped with `continue` + - No error tracking + - No metrics on failures + +3. **No Circuit Breaker:** + + - Continues retrying failed RPCs + - Can cause cascade failures + - No health tracking + +4. **State Recovery:** + - Last block not updated on error + - Same chunk retried indefinitely + - No timeout mechanism + +--- + +## Async/Await Architecture & Concurrency + +### Current Architecture (No Worker Threads) + +- **One ChainIndexer instance per chain** +- **Main thread:** `OceanIndexer` orchestrator +- **Communication:** Direct `EventEmitter` (event-driven) +- **State:** Instance-based (no shared state between chains) +- **Concurrency:** Async/await leveraging Node.js event loop + +### Indexer Lifecycle + +``` +OceanIndexer ChainIndexer (Chain 1) + │ │ + │ new ChainIndexer(...) │ + ├──────────────────────────────────────→│ Constructor + │ │ + │ await indexer.start() │ + ├──────────────────────────────────────→│ start() called + │ (returns immediately) │ ├─> Set stopSignal = false + │ │ ├─> Set isRunning = true + │ │ └─> Call indexLoop() without await + │ │ (runs in background) + │ │ + │ │ async indexLoop() + │ │ while (!stopSignal) { + │ │ ├─> Get last block + │ │ ├─> Get network height + │ │ ├─> Retrieve events + │ │ ├─> Process events + │ │ ├─> Update last block + │ │ └─> Sleep 30s + │ │ } + │ │ + │ indexer.addReindexTask(task) │ + ├──────────────────────────────────────→│ Add to reindexQueue + │ │ (processed in next iteration) + │ │ + │ │ eventEmitter.emit(METADATA_CREATED) + │ ←──────────────────────────────────────┤ + │ (event listener catches) │ + │ re-emit to INDEXER_DDO_EVENT_EMITTER │ + │ │ + │ await indexer.stop() │ + ├──────────────────────────────────────→│ stop() called + │ (waits for graceful shutdown) │ ├─> Set stopSignal = true + │ │ └─> Wait for loop to exit + │ │ while (isRunning) sleep(100) + │ │ + │ ←──────────────────────────────────────┤ isRunning = false + │ (stop() returns) │ Loop exited +``` + +### Concurrency Model + +**How Multiple Chains Run Concurrently:** + +``` +Node.js Event Loop + │ + ├─> ChainIndexer(chain=1).indexLoop() + │ └─> await getLastBlock() ────→ I/O operation (yields control) + │ + ├─> ChainIndexer(chain=137).indexLoop() + │ └─> await provider.getLogs() ───→ I/O operation (yields control) + │ + ├─> ChainIndexer(chain=8996).indexLoop() + │ └─> await processBlocks() ──────→ I/O operation (yields control) + │ + └─> (all run concurrently via async/await) +``` + +**Key Point:** When one indexer awaits an I/O operation (RPC call, DB query), control yields to the event loop, allowing other indexers to progress. No worker threads needed! + +### Benefits Over Worker Threads + +1. **Simpler Code:** + + - No `postMessage()` / `parentPort` complexity + - Direct method calls + - Clear data flow + - Standard async/await patterns + +2. **Better Error Handling:** + + - Stack traces preserved across async boundaries + - try/catch works normally + - Errors don't crash entire thread + - No serialization errors + +3. **State Management:** + + - Instance-based state (each ChainIndexer has its own) + - No global state between chains + - No race conditions on shared state + - TypeScript types preserved + +4. **Debugging:** + + - Can use standard debuggers + - Breakpoints work normally + - Console.log from anywhere + - No need to debug worker threads + +5. **Testing:** + - Easy to mock ChainIndexer + - No Worker API to mock + - Can unit test methods directly + - Faster test execution + +### Current Concurrency Characteristics + +1. **Lock Mechanism:** + + - `lockProcessing` flag prevents re-entry + - Actual waiting: `await sleep(1000)` when locked + - No race conditions (single-threaded per instance) + +2. **Event Ordering:** + + - Events emitted in order per chain + - EventEmitter guarantees listener order + - No message queue (immediate delivery) + +3. **Error Propagation:** + + - Errors caught in indexLoop() + - Logged with chain context + - Loop continues after error + - `isRunning` flag tracks health + +4. **Graceful Shutdown:** + - `stop()` sets `stopSignal = true` + - Loop exits on next iteration + - `await` ensures complete shutdown + - No orphaned processes + +### Why This Works for I/O-Bound Workloads + +**Ocean Node Indexer is I/O-bound:** + +- 90%+ time spent waiting for: + - RPC calls (network I/O) + - Database queries (disk/network I/O) + - Sleep intervals +- Minimal CPU-bound work (event decoding, JSON parsing) + +**Async/await is optimal because:** + +- During I/O wait, other indexers can progress +- No context switching overhead (vs threads) +- No memory duplication (vs processes) +- Single event loop handles all concurrency + +--- + +## Failure Scenarios & Recovery + +### Scenario 1: RPC Provider Fails + +**Current Behavior:** + +1. `retrieveChunkEvents()` throws error +2. Caught in `processNetworkData()` +3. Chunk size reduced +4. Sleep and retry +5. If all RPCs fail → retry up to 10 times during startup +6. After max retries → worker not started + +**Recovery:** + +- Manual restart required +- No automatic RPC health tracking +- No circuit breaker + +**Issues:** + +- Slow recovery (chunk size reduction) +- No provider health tracking +- Can get stuck retrying + +--- + +### Scenario 2: Database Unavailable + +**Current Behavior:** + +1. DB call fails +2. Error logged +3. Last block not updated +4. Same chunk retried +5. Can loop indefinitely + +**Recovery:** + +- No automatic recovery +- Manual intervention needed +- State may be inconsistent + +**Issues:** + +- No DB health check +- No timeout +- Can process events but not store them + +--- + +### Scenario 3: Worker Thread Crashes + +**Current Behavior:** + +1. Worker throws uncaught error +2. `worker.on('error')` handler logs error +3. `worker.on('exit')` handler sets `runningThreads[chainId] = false` +4. No automatic restart + +**Recovery:** + +- Manual restart via API +- Or node restart + +**Issues:** + +- No automatic restart +- State lost (in-memory queues) +- No health monitoring + +--- + +### Scenario 4: Processing Error in Event Handler + +**Current Behavior:** + +1. `processor.processEvent()` throws error +2. Caught in `processBlocks()` +3. Error re-thrown +4. Caught in `processNetworkData()` +5. Last block not updated +6. Sleep and retry same chunk + +**Recovery:** + +- Retry same chunk +- No max retries +- Can loop forever on bad event + +**Issues:** + +- No error classification +- No skip mechanism for bad events +- Can block progress + +--- + +### Scenario 5: Reindex Task Fails + +**Current Behavior:** + +1. `processReindex()` called +2. Receipt not found → re-queued +3. Processing error → logged, task lost +4. No retry limit + +**Recovery:** + +- Re-queue on receipt not found +- Lost on processing error +- No timeout + +**Issues:** + +- Tasks can be lost +- No retry limit +- No timeout + +--- + +## State Management + +### Global State Variables + +**Parent Thread (`index.ts`):** + +- `INDEXING_QUEUE: ReindexTask[]` - Reindex tasks +- `JOBS_QUEUE: JobStatus[]` - Admin job queue +- `runningThreads: Map` - Thread status +- `globalWorkers: Map` - Worker references +- `numCrawlAttempts: number` - Retry counter + +**Worker Thread (`crawlerThread.ts`):** + +- `REINDEX_BLOCK: number` - Chain reindex target +- `REINDEX_QUEUE: ReindexTask[]` - Transaction reindex queue +- `stoppedCrawling: boolean` - Stop flag +- `startedCrawling: boolean` - Start flag + +**Database:** + +- `indexer` table - Last indexed block per chain +- `ddo` table - DDO documents +- `ddoState` table - Validation state +- `order` table - Order records +- `sqliteConfig` table - Node version + +### State Synchronization Issues + +1. **Dual Queues:** + + - `INDEXING_QUEUE` (parent) and `REINDEX_QUEUE` (worker) + - Can get out of sync + - No transaction + +2. **Last Block Updates:** + + - Updated after processing + - Not updated on error + - Can lead to gaps or duplicates + +3. **Job Status:** + + - Updated via `updateJobStatus()` + - Searches entire queue (O(n)) + - Can have duplicates + +4. **Thread Status:** + - `runningThreads` and `globalWorkers` can diverge + - No cleanup on crash + +--- + +## Observations & Pain Points + +### Complexity Issues + +1. **Mixed Concerns:** + + - Crawler thread handles: networking, validation, processing, state + - Hard to test individual components + - Changes affect multiple areas + +2. **Nested Logic:** + + - Validation logic deeply nested (80+ lines) + - Hard to read and maintain + - Error paths unclear + +3. **Long Methods:** + - `processNetworkData()` - 160+ lines + - `processChunkLogs()` - 120+ lines + - `decryptDDO()` - 400+ lines + - Hard to understand flow + +### Performance Issues + +1. **Serial Processing:** + + - Events processed one at a time + - No parallelization + - Slow for large chunks + +2. **Many RPC Calls:** + + - Receipt per metadata event + - Access list checks per validator + - NFT info, token info, pricing per event + - No batching + +3. **Database Calls:** + - One call per event + - No batching + - No transaction wrapping + +### Reliability Issues + +1. **Error Recovery:** + + - Multiple retry mechanisms + - Unclear recovery paths + - Can get stuck in loops + +2. **State Consistency:** + + - No transactions + - State can be inconsistent + - No rollback mechanism + +3. **Observability:** + - Only logs + - No metrics + - Hard to debug production issues + +### Testing Issues + +1. **Worker Threads:** + + - Hard to unit test + - Requires mocking Worker API + - Integration tests slow + +2. **Tight Coupling:** + + - Database calls throughout + - RPC calls in processors + - Hard to mock + +3. **Global State:** + - Tests can interfere + - Hard to isolate + - Flaky tests + +--- + +## Summary + +This document provides a comprehensive view of all indexer use cases, event monitoring mechanisms, and current flows. Key takeaways: + +### Architecture Overview + +1. **Current Implementation:** + + - Uses ChainIndexer classes (one per blockchain) + - Async/await architecture (no worker threads) + - Event-driven communication via EventEmitter + - Optimal for I/O-bound operations + +2. **Event Monitoring:** + + - Continuous block scanning (30-second intervals) + - Filter-based event retrieval (topic hashes) + - 12 different event types supported + - Real-time processing and database updates + +3. **Event Processing Pipeline:** + - Event identification by topic hash + - Multi-layer validation (factory, metadata, publishers) + - Complex DDO decryption (HTTP, P2P, local) + - Rich metadata enrichment (NFT info, pricing, orders) + - Database persistence with state tracking + +### Documentation Scope + +1. **10 Main Use Cases** covering: + + - Normal block crawling and indexing + - Event processing (metadata, orders, pricing) + - Admin operations (reindex tx, reindex chain) + - Error handling and recovery + +2. **Event Monitoring Deep Dive** showing: + + - How events are discovered on-chain + - Topic filtering and identification + - Detailed processing for each event type + - Database operations and state updates + +3. **6 Detailed Flows** with: + + - Initialization and startup sequence + - Block crawling loop (ChainIndexer) + - Event processing pipeline + - Metadata event handling + - Reindex operations + +4. **4 Retry Mechanisms** across: + + - Crawler startup (10 retries) + - Adaptive chunk sizing (infinite, recovers after 3 successes) + - Block processing retry (infinite, same chunk) + - Individual RPC retry (5 retries in decryptDDO) + +5. **5 Failure Scenarios** with: + - RPC provider failures + - Database unavailability + - Worker/indexer crashes (now ChainIndexer) + - Processing errors in handlers + - Reindex task failures + +### Key Technical Insights + +**Event Monitoring:** + +- Uses `provider.getLogs()` with Ocean Protocol event topic filters +- Processes up to 1000 blocks per iteration (configurable) +- Adaptive chunk sizing on RPC failures +- Sequential processing within chunk (maintains order) + +**Event Processing:** + +- 12 event types with dedicated processors +- Complex validation: factory → metadata proof → access list → publisher +- DDO decryption: 3 strategies (HTTP, P2P, local) with retries +- Metadata enrichment: NFT info + token info + pricing + purgatory +- Database operations: create/update DDO + state tracking + order records + +**Concurrency Model:** + +- All chains indexed concurrently via async/await +- No worker threads (simpler, more maintainable) +- Leverages Node.js event loop for I/O operations +- Instance-based state (no global state between chains) + +### Next Steps for Meeting + +**Analysis Topics:** + +- Review event monitoring and processing flows +- Identify inconsistencies or implicit behavior +- Discuss validation complexity and optimization opportunities +- Evaluate error handling and retry strategies +- Consider batching and performance improvements + +**Improvement Areas:** + +- Serial event processing (no parallelization) +- Many RPC calls per metadata event (no batching) +- No database transaction wrapping +- Multiple retry mechanisms (uncoordinated) +- Complex nested validation logic + +**Refactoring Considerations:** + +- Separate validation from processing +- Extract DDO decryption service +- Implement batch RPC calls +- Add circuit breaker pattern +- Introduce metrics and observability + +--- + +## Document Change Log + +**Version 2.0 - January 27, 2026:** + +**Major Updates:** + +- ✅ Updated architecture from Worker Threads to ChainIndexer classes +- ✅ Replaced worker thread references with async/await architecture +- ✅ Added comprehensive "Event Monitoring Deep Dive" section (600+ lines) +- ✅ Detailed event handling for all 12 event types +- ✅ Updated all use cases to reflect current implementation +- ✅ Updated all flows with ChainIndexer lifecycle +- ✅ Renamed "Worker Threads & Concurrency" to "Async/Await Architecture & Concurrency" +- ✅ Enhanced summary with technical insights and improvement areas + +**New Content:** + +- Event discovery process with step-by-step breakdown +- Event identification and routing mechanism +- Detailed processing for METADATA_CREATED event (13 steps, 400+ lines) +- Detailed processing for METADATA_UPDATED event +- Detailed processing for ORDER_STARTED event +- Detailed processing for DISPENSER_ACTIVATED event +- Detailed processing for EXCHANGE_RATE_CHANGED event +- Event processing pipeline summary with visual diagram +- Performance characteristics (RPC patterns, concurrency model, failure recovery) +- Concurrency model explanation (why async/await works for I/O-bound workloads) + +**Documentation Focus:** +This document now provides deep technical insight into: + +1. How events are monitored on-chain (continuous polling, topic filtering) +2. What happens for each event type detected (validation → decryption → enrichment → storage) +3. Current implementation details (ChainIndexer, async/await, EventEmitter) +4. Pain points and improvement opportunities + +**Target Audience:** + +- Development team preparing for refactoring +- New developers understanding the indexer +- Architecture review meeting participants +- Technical stakeholders evaluating improvements + +--- + +**Document Version:** 2.0 +**Last Updated:** January 27, 2026 +**Status:** Ready for Meeting - Reflects Current Implementation From 4a48441745db094bfef43a5409b83530906a85f7 Mon Sep 17 00:00:00 2001 From: Bogdan Fazakas Date: Tue, 27 Jan 2026 14:34:38 +0200 Subject: [PATCH 2/6] simplify doc --- INDEXER_USE_CASES_AND_FLOWS.md | 2751 ++++++++------------------------ 1 file changed, 633 insertions(+), 2118 deletions(-) diff --git a/INDEXER_USE_CASES_AND_FLOWS.md b/INDEXER_USE_CASES_AND_FLOWS.md index c91744590..2dad4027e 100644 --- a/INDEXER_USE_CASES_AND_FLOWS.md +++ b/INDEXER_USE_CASES_AND_FLOWS.md @@ -1,7 +1,7 @@ -# Ocean Node Indexer - Use Cases & Current Flows Documentation +# Ocean Node Indexer - Event Monitoring & Error Handling **Created:** January 2026 -**Purpose:** Deep review of all indexer use cases and execution flows for refactoring discussion +**Purpose:** Event monitoring mechanisms and error handling for refactoring discussion **Status:** Pre-Meeting Preparation Document --- @@ -9,1950 +9,670 @@ ## Table of Contents 1. [Overview](#overview) -2. [Use Cases](#use-cases) -3. [Event Monitoring Deep Dive](#event-monitoring-deep-dive) - - How Event Monitoring Works - - Event Identification & Routing - - Detailed Event Handling Per Type - - Event Processing Pipeline Summary - - Performance Characteristics -4. [Current Flows - Detailed Analysis](#current-flows-detailed-analysis) -5. [Event Processing Flows](#event-processing-flows) -6. [Error Handling & Retry Mechanisms](#error-handling--retry-mechanisms) -7. [Async/Await Architecture & Concurrency](#asyncawait-architecture--concurrency) -8. [Failure Scenarios & Recovery](#failure-scenarios--recovery) -9. [State Management](#state-management) -10. [Observations & Pain Points](#observations--pain-points) -11. [Summary](#summary) +2. [Event Monitoring Architecture](#event-monitoring-architecture) +3. [Event Processing Pipeline](#event-processing-pipeline) +4. [Detailed Event Handling](#detailed-event-handling) +5. [Error Handling & Retry Mechanisms](#error-handling--retry-mechanisms) +6. [Failure Scenarios & Recovery](#failure-scenarios--recovery) --- ## Overview -The Ocean Node Indexer is responsible for: +The Ocean Node Indexer continuously monitors blockchain networks for Ocean Protocol events and processes them in real-time. -- Continuously monitoring multiple blockchain networks for Ocean Protocol events -- Processing and validating events (metadata, orders, pricing) -- Storing processed data in databases (Elasticsearch/Typesense) -- Managing indexing state per chain -- Supporting reindexing operations (full chain or specific transactions) -- Emitting events for downstream consumers +**Current Architecture:** -**Architecture Summary:** - -- One `OceanIndexer` instance (orchestrator) -- One `ChainIndexer` instance per supported blockchain network -- All operations use async/await (no worker threads) +- One `ChainIndexer` instance per blockchain network +- Async/await architecture (no worker threads) - Event-driven communication via `EventEmitter` -- Event processors for each event type (12 different event types) -- Database layer for persistence (Elasticsearch/Typesense) -- Job queue for admin commands -- RPC client with fallback support - ---- - -## Use Cases - -### UC1: Normal Block Crawling (Continuous Indexing) - -**Description:** Continuously monitor blockchain networks and process new blocks containing Ocean Protocol events. - -**Trigger:** Automatic on node startup, runs indefinitely - -**Actors:** System (automatic) - -**Preconditions:** - -- Node is running -- Database is accessible -- RPC providers are configured -- Supported chains are configured - -**Main Flow:** - -1. Node starts → `OceanIndexer` constructor called -2. `startThreads()` invoked -3. For each supported chain: - - Validate RPC connection (with fallback support) - - Create `Blockchain` instance - - Create `ChainIndexer` instance - - Call `indexer.start()` (non-blocking, runs in background) -4. Each `ChainIndexer` runs asynchronously: - - Enters infinite `indexLoop()` using async/await - - Gets last indexed block from DB - - Gets current network height via RPC - - Calculates blocks to process (respects chunk size) - - **Event Retrieval:** Calls `provider.getLogs()` with Ocean Protocol event topic filters - - **Event Processing:** Routes events to appropriate processors - - **Database Updates:** Stores processed data - - Updates last indexed block - - Sleeps for interval (default 30s) - - Repeats until stop signal - -**Postconditions:** - -- All supported chains are being indexed concurrently -- Events are being processed and stored in real-time -- Last indexed block is updated per chain -- Event emitters notify downstream consumers - ---- - -### UC2: Process Metadata Created Event - -**Description:** Process a `MetadataCreated` event, validate it, decrypt DDO, and store it. - -**Trigger:** Event detected during block crawling - -**Actors:** System (automatic) - -**Preconditions:** - -- Block crawling is active -- Event log found in block range -- Event matches `METADATA_CREATED` signature - -**Main Flow:** - -1. Event log detected in `retrieveChunkEvents()` -2. Event routed to `processChunkLogs()` -3. Event identified as `METADATA_CREATED` -4. **Validation Phase:** - - Check if metadata validation is enabled - - Get transaction receipt - - Extract `MetadataValidated` events from receipt - - Validate validators against `allowedValidators` list - - If `allowedValidatorsList` configured: - - For each access list contract: - - Check `balanceOf()` for each validator - - Require at least one validator has balance > 0 - - If validation fails → skip event (continue to next) -5. **Processing Phase:** - - Get `MetadataEventProcessor` instance - - Call `processor.processEvent()` - - Check if NFT was deployed by Ocean Factory - - Decode event data from transaction receipt - - **Decrypt DDO:** - - Try HTTP decryption (from metadata URL) - - Try P2P decryption (from libp2p network) - - Try local decryption (if available) - - Handle nonce management - - Verify signatures - - Validate DDO hash matches generated DID - - Check authorized publishers - - Get NFT info (name, symbol, owner, etc.) - - Get token info (datatoken addresses, names, symbols) - - Get pricing stats (dispensers, exchanges, rates) - - Check purgatory status - - Check policy server - - Build DDO with `indexedMetadata` -6. **Storage Phase:** - - Update or create DDO in database - - Update DDO state (validation tracking) - - Emit `METADATA_CREATED` event to parent thread - - Parent thread emits to `INDEXER_DDO_EVENT_EMITTER` - -**Postconditions:** - -- DDO stored in database -- DDO state updated -- Event emitted for listeners - -**Error Handling:** - -- Validation failures → event skipped, logged -- Decryption failures → event skipped, DDO state marked invalid -- Database failures → error logged, event not stored - ---- - -### UC3: Process Metadata Updated Event - -**Description:** Process a `MetadataUpdated` event, update existing DDO. - -**Trigger:** Event detected during block crawling - -**Actors:** System (automatic) - -**Preconditions:** - -- Block crawling is active -- Event log found -- Event matches `METADATA_UPDATED` signature - -**Main Flow:** - -1. Similar to UC2 (Metadata Created) -2. Uses same `MetadataEventProcessor` -3. Validation phase identical -4. Processing phase: - - Retrieves existing DDO from database - - Updates DDO with new metadata - - Merges pricing and order stats -5. Storage phase: - - Updates DDO in database (not creates) - - Updates DDO state - - Emits `METADATA_UPDATED` event - -**Postconditions:** - -- DDO updated in database -- Event emitted - ---- - -### UC4: Process Order Started Event - -**Description:** Process an `OrderStarted` event, update order count and create order record. - -**Trigger:** Event detected during block crawling - -**Actors:** System (automatic) - -**Preconditions:** - -- Block crawling is active -- Event log found -- Event matches `ORDER_STARTED` signature - -**Main Flow:** - -1. Event log detected -2. Routed to `OrderStartedEventProcessor` -3. Decode event data: - - Consumer address - - Payer address - - Datatoken address - - NFT address - - Service ID - - Start order ID -4. Retrieve DDO from database -5. Update order count in DDO stats -6. Create order record in order database -7. Update DDO in database -8. Emit `ORDER_STARTED` event - -**Postconditions:** - -- Order record created -- DDO updated with order count -- Event emitted - ---- - -### UC5: Process Pricing Events (Dispenser/Exchange) - -**Description:** Process dispenser or exchange events (created, activated, deactivated, rate changed). - -**Trigger:** Event detected during block crawling - -**Actors:** System (automatic) - -**Preconditions:** - -- Block crawling is active -- Event log found -- Event matches pricing event signature - -**Main Flow:** - -1. Event identified (DispenserCreated, DispenserActivated, ExchangeActivated, etc.) -2. Routed to appropriate processor -3. Decode event data -4. Retrieve DDO from database -5. Update pricing arrays in DDO stats -6. Update DDO in database -7. Emit event (if applicable) - -**Postconditions:** - -- DDO pricing info updated -- Event emitted - ---- - -### UC6: Reindex Specific Transaction - -**Description:** Re-process a specific transaction that was already indexed (e.g., after bug fix). - -**Trigger:** Admin command via API - -**Actors:** Admin/Operator - -**Preconditions:** - -- Indexer is running -- ChainIndexer exists for chain -- Transaction hash is valid - -**Main Flow:** - -1. Admin calls `reindexTx` API endpoint -2. `ReindexTxHandler` validates command: - - Validates chainId is supported - - Validates txId format -3. `indexer.addReindexTask()` called on OceanIndexer -4. Job created and added to `JOBS_QUEUE` -5. Task added to `INDEXING_QUEUE` (OceanIndexer instance) -6. OceanIndexer calls `chainIndexer.addReindexTask(task)` -7. ChainIndexer adds task to its `reindexQueue` (instance property) -8. During next indexing loop iteration: - - `processReindexQueue()` called - - Task shifted from queue (FIFO) - - Get transaction receipt from RPC: `provider.getTransactionReceipt(txId)` - - Extract logs from receipt (all logs or specific index) - - Process logs using `processChunkLogs(logs, signer, provider, chainId)` - - ChainIndexer emits `REINDEX_QUEUE_POP` event -9. OceanIndexer event listener: - - Removes task from `INDEXING_QUEUE` - - Updates job status to SUCCESS via `updateJobStatus()` - - Emits to `INDEXER_CRAWLING_EVENT_EMITTER` - -**Postconditions:** - -- Transaction re-processed -- DDO updated in database -- Job status updated (DELIVERED → PENDING → SUCCESS) -- Event emitted to downstream consumers - -**Error Handling:** - -- If receipt not available yet → task remains in queue, retried next iteration -- If processing fails → error logged, task removed from queue (lost) -- If ChainIndexer not found → error returned, job not created -- No retry limit → task processed until successful or error - ---- - -### UC7: Reindex Entire Chain - -**Description:** Reset indexing for a chain and re-index from a specific block (or deployment block). - -**Trigger:** Admin command via API - -**Actors:** Admin/Operator - -**Preconditions:** - -- Indexer is running -- Chain is supported -- Optional: block number provided - -**Main Flow:** - -1. Admin calls `reindexChain` API endpoint -2. `ReindexChainHandler` validates command: - - Validates chainId is supported - - Validates block number (if provided) -3. `indexer.resetCrawling(chainId, blockNumber)` called on OceanIndexer -4. Check if ChainIndexer is running: - - Get indexer from `indexers.get(chainId)` - - If not running → call `startThread(chainId)` to create and start - - If start fails → return error -5. Job created and added to `JOBS_QUEUE` -6. OceanIndexer calls `chainIndexer.triggerReindexChain(blockNumber)` -7. ChainIndexer calculates target block: - - Get deployment block for chain - - If blockNumber provided and > deployment block → use it - - Else if `startBlock` configured and > deployment block → use it - - Else → use deployment block -8. Set `this.reindexBlock` to target block -9. During next indexing loop iteration: - - Check `this.reindexBlock !== null` - - Get network height - - Call `this.reindexChain(currentBlock, networkHeight)` - - Validate reindexBlock < networkHeight - - Update last indexed block: `updateLastIndexedBlockNumber(reindexBlock)` - - Delete all assets: `deleteAllAssetsFromChain()` - - If deletion fails → revert last block: `updateLastIndexedBlockNumber(currentBlock)` - - Clear `this.reindexBlock = null` - - ChainIndexer emits `REINDEX_CHAIN` event -10. OceanIndexer event listener: - - Updates job status (SUCCESS or FAILURE) via `updateJobStatus()` - - Emits to `INDEXER_CRAWLING_EVENT_EMITTER` - -**Postconditions:** - -- All assets deleted from chain (database cleared) -- Last indexed block reset to target block -- Normal crawling resumes from reset block -- Job status updated (DELIVERED → PENDING → SUCCESS/FAILURE) -- Downstream consumers notified - -**Error Handling:** - -- Invalid block (> network height) → error logged, reindex aborted, reindexBlock cleared -- Deletion failure → last block reverted to currentBlock, reindex fails, returns false -- Update block failure → error logged, reindex aborted, returns false -- ChainIndexer not found/can't start → error returned, job not created -- Database errors → error logged, manual retry needed - ---- - -### UC8: Version-Based Auto-Reindexing - -**Description:** Automatically trigger reindexing when node version requires it. - -**Trigger:** Node startup, before starting threads - -**Actors:** System (automatic) - -**Preconditions:** - -- Node is starting -- Database is accessible -- Version check enabled - -**Main Flow:** - -1. `startThreads()` called -2. `checkAndTriggerReindexing()` called first -3. Get current node version from `process.env.npm_package_version` -4. Get database version from `sqliteConfig` -5. Compare with `MIN_REQUIRED_VERSION` ('0.2.2') -6. If reindexing needed: - - For each supported chain: - - Delete all assets from chain - - Reset last indexed block to deployment block - - Log results - - Update database version to current -7. Continue with normal thread startup - -**Postconditions:** - -- Chains reindexed if needed -- Database version updated -- Normal indexing resumes - -**Error Handling:** - -- Database not reachable → reindexing skipped, error logged -- Deletion failures → error logged per chain, continues with other chains - ---- - -### UC9: Stop Indexing for Chain - -**Description:** Gracefully stop indexing for a specific chain. - -**Trigger:** Admin command or node shutdown - -**Actors:** Admin/System - -**Preconditions:** - -- Indexer is running -- Chain is being indexed - -**Main Flow:** +- Processes 12 different event types +- Adaptive error handling with multiple retry layers -1. `indexer.stopThread(chainId)` called on OceanIndexer -2. Get ChainIndexer: `indexer = indexers.get(chainId)` -3. If indexer exists: - - Call `await indexer.stop()` (async, waits for completion) - - ChainIndexer internally: - - Sets `this.stopSignal = true` - - Logs: "Stopping indexer for chain X, waiting for graceful shutdown..." - - Waits for loop to exit: `while (this.isRunning) await sleep(100)` - - Indexing loop checks `stopSignal` on each iteration - - When `stopSignal` is true → breaks loop - - Sets `this.isRunning = false` - - Logs: "Chain X indexer stopped" - - OceanIndexer: - - Removes from map: `indexers.delete(chainId)` - - Logs: "Stopped indexer for chain X" -4. Else: - - Error logged: "Unable to find running indexer for chain X" +**Key Components:** -**Postconditions:** - -- ChainIndexer stopped gracefully -- Indexing loop exited cleanly -- Instance removed from indexers map -- No more indexing for chain -- Any in-progress iteration completes before stop - -**Benefits of Current Implementation:** - -- Graceful shutdown (waits for current iteration to complete) -- No abrupt termination mid-processing -- Clean state (last indexed block updated) -- Async/await makes shutdown explicit and reliable - ---- - -### UC10: Handle RPC Connection Failures - -**Description:** Handle RPC provider failures and fallback to alternative providers. - -**Trigger:** RPC call failure during block retrieval - -**Actors:** System (automatic) - -**Preconditions:** - -- Block crawling active -- RPC call fails - -**Main Flow:** - -1. `retrieveChunkEvents()` called -2. `provider.getLogs()` fails -3. Exception caught in `processNetworkData()` -4. Error logged -5. **Adaptive chunk sizing:** - - `chunkSize = Math.floor(chunkSize / 2)` - - Minimum chunk size = 1 - - `successfulRetrievalCount` reset to 0 -6. Next iteration uses smaller chunk -7. After 3 successful retrievals: - - Revert to original `chunkSize` -8. **RPC Fallback (during startup):** - - `startCrawler()` checks network readiness - - If not ready → `tryFallbackRPCs()` called - - Tries each fallback RPC in order - - If any succeeds → use that provider -9. **Retry Logic:** - - `retryCrawlerWithDelay()` called during startup - - Max 10 retries - - Retry interval = `max(fallbackRPCs.length * 3000, 5000)` - - Recursive retry on failure - -**Postconditions:** - -- Smaller chunks processed -- Alternative RPC used if available -- Crawling continues - -**Error Handling:** - -- All RPCs fail → retry up to 10 times -- After max retries → worker thread not started -- Database check → if DB unreachable, give up +- **ChainIndexer** - Per-chain indexer running async indexing loop +- **Event Processors** - Handle specific blockchain event types (12 processors) +- **Validation Pipeline** - Multi-layer validation (factory, metadata, publishers) +- **Database Layer** - Persistence (Elasticsearch/Typesense) --- -## Event Monitoring Deep Dive +## Event Monitoring Architecture -### How Event Monitoring Works +### Continuous Monitoring Process -The Ocean Node Indexer monitors blockchain events using a sophisticated multi-step process that ensures no events are missed while maintaining performance and reliability. +**Location:** `ChainIndexer.ts` - `indexLoop()` -#### 1. Event Discovery Process +``` +┌─────────────────────────────────────────────────────────────────┐ +│ CONTINUOUS MONITORING LOOP │ +│ │ +│ async indexLoop() { │ +│ while (!stopSignal) { │ +│ 1. Get last indexed block from DB │ +│ 2. Get current network height from RPC │ +│ 3. Calculate chunk size (adaptive: 1-1000 blocks) │ +│ 4. Retrieve events: provider.getLogs(fromBlock, toBlock) │ +│ 5. Process events through pipeline │ +│ 6. Update last indexed block in DB │ +│ 7. Emit events to downstream consumers │ +│ 8. Sleep for interval (default: 30 seconds) │ +│ 9. Process reindex queue (if any) │ +│ } │ +│ } │ +└─────────────────────────────────────────────────────────────────┘ +``` -**Location:** `ChainIndexer.ts` - `indexLoop()` → `retrieveChunkEvents()` +### Event Discovery Mechanism -**Step-by-Step:** +**Step-by-Step Process:** ``` -1. ChainIndexer maintains current position (lastIndexedBlock) - ├─> Retrieved from database on each iteration - └─> Persisted after successful processing - -2. Get network height from RPC - ├─> Current blockchain tip - └─> Determines how many blocks to process +1. Get Network State + ├─> lastIndexedBlock = await db.indexer.retrieve(chainId) + ├─> networkHeight = await provider.getBlockNumber() + └─> startBlock = max(lastIndexedBlock, deploymentBlock) -3. Calculate chunk to process - ├─> remainingBlocks = networkHeight - lastIndexedBlock +2. Calculate Chunk to Process + ├─> remainingBlocks = networkHeight - startBlock ├─> blocksToProcess = min(chunkSize, remainingBlocks) - └─> Default chunkSize from config (typically 100-1000 blocks) - -4. Call provider.getLogs() with filters - ├─> fromBlock: lastIndexedBlock + 1 - ├─> toBlock: lastIndexedBlock + blocksToProcess - ├─> topics: [ALL_OCEAN_EVENT_HASHES] - └─> Returns array of Log objects - -5. Process logs through pipeline - ├─> Identify event type by topic hash - ├─> Route to appropriate processor - ├─> Validate and transform - └─> Store in database - -6. Update lastIndexedBlock - └─> Only updated on successful processing + └─> Adaptive chunkSize (halves on error, recovers after 3 successes) + +3. Retrieve Events from Blockchain + └─> provider.getLogs({ + fromBlock: lastIndexedBlock + 1, + toBlock: lastIndexedBlock + blocksToProcess, + topics: [OCEAN_EVENT_TOPIC_HASHES] // Filter by event signatures + }) + Returns: Log[] (raw blockchain event logs) + +4. Route Events to Processors + └─> processChunkLogs(logs, signer, provider, chainId) ``` -**Event Topic Filtering:** +### Event Topic Filtering -The indexer listens for these event signatures (identified by topic[0]): +The indexer listens for these Ocean Protocol event signatures: ```typescript EVENT_HASHES = { - '0x5463569dcc320958360074a9ab27e809e8a6942c394fb151d139b5f7b4ecb1bd': MetadataCreated - '0x127c3f87d5f806ee52e3045f6c9c39e0ef0a3c96c0c75f3e18b84917b88dc2b3': MetadataUpdated - '0x1f432bc9a19ebfc7c5e1cb25e4faeea2f7e162a3af75ae6fd7f4d7ba24d93052': MetadataState - '0xa0e0424cb5b1293c12c34b4a4867cc2a426e665be57d01dfa48aaaa0c90ec7c0': OrderStarted - '0x6e0dd7434b30641fa1c2e87c22ac88fc95c44b50f8b0c24b8c01c3ac88a41f65': OrderReused - '0xdcda18b5bc4d3564ccef3d80910ad33cc3e2bb60f09e0d1be21501f97a71ea51': DispenserCreated - '0x6e0cf36da82bc089a41b8ba5a4aaa4e6f4f3c36a2ba0e47f8d4b5bd4c82b17ab': DispenserActivated - '0x53ae36d41e99f27c63c6c8d7d1c8fd58e1f1dbc7d5d9c0d8e2f6d8a5c4b3a2b1': DispenserDeactivated - '0xdcda18b5bc4d3564ccef3d80910ad33cc3e2bb60f09e0d1be21501f97a71ea52': ExchangeCreated - '0x6e0cf36da82bc089a41b8ba5a4aaa4e6f4f3c36a2ba0e47f8d4b5bd4c82b17ac': ExchangeActivated - '0x53ae36d41e99f27c63c6c8d7d1c8fd58e1f1dbc7d5d9c0d8e2f6d8a5c4b3a2b2': ExchangeDeactivated - '0x7b3b3f0f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f3f': ExchangeRateChanged + // Metadata Events + '0x5463569d...': METADATA_CREATED + '0x127c3f87...': METADATA_UPDATED + '0x1f432bc9...': METADATA_STATE + + // Order Events + '0xa0e0424c...': ORDER_STARTED + '0x6e0dd743...': ORDER_REUSED + + // Dispenser Events + '0xdcda18b5...': DISPENSER_CREATED + '0x6e0cf36d...': DISPENSER_ACTIVATED + '0x53ae36d4...': DISPENSER_DEACTIVATED + + // Exchange Events + '0xdcda18b5...': EXCHANGE_CREATED + '0x6e0cf36d...': EXCHANGE_ACTIVATED + '0x53ae36d4...': EXCHANGE_DEACTIVATED + '0x7b3b3f0f...': EXCHANGE_RATE_CHANGED } ``` -#### 2. Event Identification & Routing - -**Location:** `processor.ts` - `processChunkLogs()` - -When logs are retrieved, each log goes through: - -``` -For each log in retrieved logs: - │ - ├─> 1. Extract topic[0] (event signature hash) - │ - ├─> 2. Look up in EVENT_HASHES mapping - │ └─> Identifies event type (e.g., METADATA_CREATED) - │ - ├─> 3. Check if Ocean Protocol event - │ └─> If not recognized → skip - │ - ├─> 4. Apply event-specific validation - │ └─> For metadata events: check validators - │ └─> For other events: basic validation - │ - ├─> 5. Get or create processor instance - │ └─> Cached per (eventType + chainId) - │ - ├─> 6. Call processor.processEvent() - │ └─> Event-specific handling logic - │ - └─> 7. Store result for batch emission -``` - -### Detailed Event Handling Per Type - -#### A. METADATA_CREATED Event - -**Trigger:** When a new data asset is published on-chain - -**On-Chain Event Data:** +**Monitoring Frequency:** + +- Checks for new blocks every 30 seconds (configurable via `INDEXER_INTERVAL`) +- Processes up to `chunkSize` blocks per iteration (default: 100-1000) +- Adaptive: reduces chunk size on RPC errors, recovers after successes + +--- + +## Event Processing Pipeline + +### Overall Flow + +``` +Raw Blockchain Logs + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 1. EVENT IDENTIFICATION │ +│ - Extract topic[0] (event signature hash) │ +│ - Look up in EVENT_HASHES mapping │ +│ - Check if Ocean Protocol event │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 2. VALIDATION (for metadata events) │ +│ - Get transaction receipt │ +│ - Extract MetadataValidated events │ +│ - Check allowedValidators list │ +│ - Check access list memberships (balanceOf calls) │ +│ - If validation fails → skip event, continue to next │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 3. ROUTE TO PROCESSOR │ +│ - Get cached processor instance (per eventType + chain) │ +│ - Call processor.processEvent() │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 4. EVENT-SPECIFIC PROCESSING │ +│ - Factory validation (NFT deployed by Ocean) │ +│ - Decode event data from receipt │ +│ - Decrypt/decompress DDO (if metadata event) │ +│ - Fetch additional on-chain data (NFT info, pricing) │ +│ - Build domain model with enriched metadata │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 5. DATABASE PERSISTENCE │ +│ - Create or update DDO │ +│ - Update DDO state (validation tracking) │ +│ - Create order records (if order event) │ +└─────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────┐ +│ 6. EVENT EMISSION │ +│ - ChainIndexer emits to INDEXER_CRAWLING_EVENT_EMITTER │ +│ - OceanIndexer re-emits to INDEXER_DDO_EVENT_EMITTER │ +│ - Downstream consumers notified (API, cache, webhooks) │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Location References + +**Event Monitoring:** `ChainIndexer.ts` - `indexLoop()` +**Event Identification:** `processor.ts` - `processChunkLogs()` +**Event Routing:** `processor.ts` - `getEventProcessor()` +**Event Processing:** `processors/*.ts` - `processEvent()` + +--- + +## Detailed Event Handling + +### A. METADATA_CREATED Event + +**Trigger:** New data asset published on-chain + +**On-Chain Data:** - `owner` - Publisher address -- `flags` - Encryption flags +- `flags` - Encryption/compression flags - `metadata` - Encrypted/compressed DDO - `metadataHash` - SHA256 hash of DDO -- `validateTime` - Timestamp **Processing Steps:** ``` -MetadataEventProcessor.processEvent(): - │ - ├─> 1. FACTORY VALIDATION - │ └─> wasNFTDeployedByOurFactory() - │ ├─> Instantiate ERC721Factory contract - │ ├─> Call getCurrentNFTCount() - │ ├─> Loop through all NFTs - │ ├─> Check if NFT address matches - │ └─> If not deployed by Ocean → REJECT, return null - │ - ├─> 2. DECODE EVENT DATA - │ └─> getEventData(provider, txHash, ERC721Template.abi, eventName) - │ ├─> Fetch transaction receipt - │ ├─> Find log matching event hash - │ ├─> Parse with contract ABI - │ └─> Extract: owner, flags, metadata, metadataHash, etc. - │ - ├─> 3. DDO DECRYPTION (Complex, 400+ lines) - │ └─> decryptDDO(decryptorURL, flag, owner, nftAddress, chainId, txId, metadataHash, metadata) - │ │ - │ ├─> Check flag bit 2 (encrypted vs compressed) - │ │ - │ ├─> IF ENCRYPTED (flag & 2 != 0): - │ │ ├─> Determine decryptor type: - │ │ │ ├─> HTTP URL → Call external provider - │ │ │ ├─> PeerID → Call via P2P network - │ │ │ └─> Local node → Internal handler - │ │ │ - │ │ ├─> Build signature: - │ │ │ ├─> Get nonce from provider - │ │ │ ├─> Create message: txId + ethAddress + chainId + nonce - │ │ │ ├─> Hash with solidityPackedKeccak256 - │ │ │ ├─> Sign with wallet - │ │ │ └─> Verify signature - │ │ │ - │ │ ├─> Make decrypt request: - │ │ │ ├─> POST /api/services/decrypt - │ │ │ ├─> Payload: { transactionId, chainId, decrypterAddress, dataNftAddress, signature, nonce } - │ │ │ ├─> Timeout: 30 seconds - │ │ │ ├─> Retry up to 5 times (withRetrial) - │ │ │ └─> Handle 400/403 errors (no retry) - │ │ │ - │ │ └─> Validate response hash: - │ │ ├─> create256Hash(response.data) - │ │ ├─> Compare with metadataHash - │ │ └─> If mismatch → REJECT - │ │ - │ └─> IF COMPRESSED (flag & 2 == 0): - │ ├─> getBytes(metadata) - │ ├─> toUtf8String(byteArray) - │ └─> JSON.parse(utf8String) - │ - ├─> 4. VALIDATE DDO ID - │ └─> Check ddo.id matches makeDid(nftAddress, chainId) - │ └─> If mismatch → REJECT, update ddoState with error - │ - ├─> 5. CHECK AUTHORIZED PUBLISHERS - │ └─> If authorizedPublishers configured: - │ └─> Check if owner in authorizedPublishers list - │ └─> If not → REJECT, update ddoState - │ - ├─> 6. FETCH NFT INFORMATION - │ └─> getNFTInfo(nftAddress, signer, owner, timestamp) - │ ├─> Instantiate NFT contract - │ ├─> Call getMetaData() → get state - │ ├─> Call getId() → get token ID - │ ├─> Call tokenURI(id) → get URI - │ ├─> Call name() → get name - │ ├─> Call symbol() → get symbol - │ └─> Return: { state, address, name, symbol, owner, created, tokenURI } - │ - ├─> 7. FETCH TOKEN INFORMATION - │ └─> getTokenInfo(ddo.services, signer) - │ └─> For each service in DDO: - │ ├─> Instantiate datatoken contract (ERC20) - │ ├─> Call name() → get name - │ ├─> Call symbol() → get symbol - │ └─> Collect: { address, name, symbol, serviceId } - │ - ├─> 8. FETCH PRICING INFORMATION - │ └─> getPricingStatsForDddo(nftAddress, signer, provider, chainId) - │ ├─> Get all datatokens from NFT - │ ├─> For each datatoken: - │ │ ├─> Check dispenser: - │ │ │ ├─> Get Dispenser contract address - │ │ │ ├─> Call status(datatoken, owner) - │ │ │ └─> If active → add to prices array - │ │ └─> Check exchange: - │ │ ├─> Get FixedRateExchange address - │ │ ├─> Call getAllExchanges() - │ │ ├─> Filter by datatoken - │ │ └─> If active → add rate to prices array - │ └─> Return pricing arrays per service - │ - ├─> 9. CHECK PURGATORY STATUS - │ └─> Purgatory.check(nftAddress, chainId, account) - │ ├─> Check if NFT is in purgatory list - │ ├─> Check if account is in purgatory list - │ └─> Return: { state: boolean } - │ - ├─> 10. CHECK POLICY SERVER - │ └─> If policyServer configured: - │ ├─> POST to policy server endpoint - │ ├─> Payload: { did, chain, nft } - │ └─> Check response (approve/deny) - │ - ├─> 11. BUILD INDEXED METADATA - │ └─> Construct indexedMetadata object: - │ ├─> nft: { state, address, name, symbol, owner, created, tokenURI } - │ ├─> event: { txid, from, contract, block, datetime } - │ ├─> stats: [{ - │ │ datatokenAddress, - │ │ name, - │ │ symbol, - │ │ serviceId, - │ │ orders: 0, // Initial count - │ │ prices: [{ type: 'dispenser|exchange', price, contract, token, exchangeId }] - │ │ }] - │ └─> purgatory: { state } - │ - ├─> 12. STORE IN DATABASE - │ └─> createOrUpdateDDO(ddo, method) - │ ├─> ddoDatabase.create(ddo) // New asset - │ ├─> ddoState.create(chainId, did, nftAddress, txId, valid=true) - │ └─> Return saved DDO - │ - └─> 13. EMIT EVENT - └─> Event emitted to INDEXER_DDO_EVENT_EMITTER - └─> Downstream consumers notified (API, cache, webhooks) -``` - -**Database Operations:** - -- INSERT into `ddo` table (Elasticsearch/Typesense) -- INSERT into `ddoState` table (validation tracking) +1. FACTORY VALIDATION + └─> wasNFTDeployedByOurFactory(chainId, signer, nftAddress) + ├─> Instantiate ERC721Factory contract + ├─> Loop through all NFTs from factory + └─> If not deployed by Ocean → REJECT, skip event -**Error Handling:** - -- Factory validation fail → skip, log error -- Decryption fail → skip, update ddoState with error -- DDO ID mismatch → skip, update ddoState -- Publisher not authorized → skip, update ddoState -- Database fail → error logged, event not stored +2. DECODE EVENT DATA + └─> getEventData(provider, txHash, ERC721Template.abi) + ├─> Fetch transaction receipt + ├─> Find log matching event hash + ├─> Parse with contract ABI + └─> Extract: owner, flags, metadata, metadataHash ---- - -#### B. METADATA_UPDATED Event - -**Trigger:** When asset metadata is updated on-chain - -**Processing Steps:** - -``` -MetadataEventProcessor.processEvent(): - │ - ├─> 1-10. Same as METADATA_CREATED - │ (validation, decryption, fetching info) - │ - ├─> 11. RETRIEVE EXISTING DDO - │ └─> ddoDatabase.retrieve(ddo.id) - │ - ├─> 12. MERGE DDO DATA - │ └─> Merge new metadata with existing: - │ ├─> Update: metadata, services, credentials - │ ├─> Preserve: existing order counts - │ ├─> Merge: pricing arrays (add new, keep existing) - │ └─> Update: indexedMetadata.event (new tx, block, datetime) - │ - ├─> 13. UPDATE DATABASE - │ └─> ddoDatabase.update(mergedDdo) - │ - └─> 14. EMIT EVENT - └─> METADATA_UPDATED event emitted -``` +3. DDO DECRYPTION (Complex: 400+ lines, 3 strategies) + └─> decryptDDO(decryptorURL, flag, owner, nftAddress, chainId, txId, metadataHash, metadata) + │ + ├─> IF ENCRYPTED (flag & 2 != 0): + │ ├─> Get nonce from provider/timestamp + │ ├─> Build signature: + │ │ - message = txId + ethAddress + chainId + nonce + │ │ - hash = solidityPackedKeccak256(message) + │ │ - signature = wallet.signMessage(hash) + │ ├─> HTTP: POST /api/services/decrypt + │ │ - Payload: { transactionId, chainId, signature, nonce } + │ │ - Timeout: 30 seconds + │ │ - Retry: up to 5 times (withRetrial) + │ ├─> P2P: p2pNode.sendTo(decryptorURL, message) + │ ├─> Local: node.getCoreHandlers().handle(decryptDDOTask) + │ └─> Validate response hash matches metadataHash + │ + └─> IF COMPRESSED (flag & 2 == 0): + └─> Parse directly: JSON.parse(toUtf8String(getBytes(metadata))) -**Key Difference from CREATED:** +4. VALIDATE DDO ID + └─> Check ddo.id === makeDid(nftAddress, chainId) + └─> If mismatch → REJECT, update ddoState with error -- Uses `update()` instead of `create()` -- Merges with existing data instead of creating new -- Preserves order statistics +5. CHECK AUTHORIZED PUBLISHERS (if configured) + └─> Check if owner in authorizedPublishers list + └─> If not → REJECT, update ddoState with error ---- +6. FETCH NFT INFORMATION (multiple RPC calls) + └─> getNFTInfo(nftAddress, signer, owner, timestamp) + ├─> nftContract.getMetaData() → state + ├─> nftContract.getId() → token ID + ├─> nftContract.tokenURI(id) → URI + ├─> nftContract.name() → name + ├─> nftContract.symbol() → symbol + └─> Return: { state, address, name, symbol, owner, created, tokenURI } -#### C. ORDER_STARTED Event +7. FETCH TOKEN INFORMATION (per datatoken) + └─> For each service in DDO: + ├─> datatokenContract.name() + ├─> datatokenContract.symbol() + └─> Collect: { address, name, symbol, serviceId } -**Trigger:** When someone purchases/orders access to a data asset +8. FETCH PRICING INFORMATION (multiple RPC calls) + └─> For each datatoken: + ├─> Check dispenser: dispenserContract.status(datatoken) + ├─> Check exchange: exchangeContract.getAllExchanges() + └─> Build prices array: [{ type, price, contract, token }] -**On-Chain Event Data:** +9. CHECK PURGATORY STATUS + └─> Purgatory.check(nftAddress, chainId, account) + └─> Return: { state: boolean } -- `consumer` - Buyer address -- `payer` - Payment source address -- `datatoken` - Datatoken address -- `serviceId` - Service identifier -- `amount` - Amount paid -- `timestamp` - Order time +10. BUILD INDEXED METADATA + └─> Construct enriched metadata: + ├─> nft: { state, address, name, symbol, owner, created, tokenURI } + ├─> event: { txid, from, contract, block, datetime } + ├─> stats: [{ datatokenAddress, name, symbol, orders: 0, prices: [...] }] + └─> purgatory: { state } -**Processing Steps:** +11. STORE IN DATABASE + └─> ddoDatabase.create(ddo) + ddoState.create(chainId, did, nftAddress, txId, valid=true) -``` -OrderStartedEventProcessor.processEvent(): - │ - ├─> 1. DECODE EVENT DATA - │ └─> Parse event args: - │ ├─> consumer - │ ├─> payer - │ ├─> datatoken - │ ├─> amount - │ └─> timestamp - │ - ├─> 2. FIND NFT ADDRESS - │ └─> Query datatoken contract: - │ ├─> Instantiate ERC20 contract - │ ├─> Call getERC721Address() - │ └─> Get NFT address - │ - ├─> 3. BUILD DID - │ └─> did = makeDid(nftAddress, chainId) - │ - ├─> 4. RETRIEVE DDO - │ └─> ddoDatabase.retrieve(did) - │ └─> If not found → error, cannot update - │ - ├─> 5. UPDATE ORDER COUNT - │ └─> Find matching service in ddo.stats: - │ ├─> Match by datatokenAddress - │ └─> Increment orders count - │ - ├─> 6. CREATE ORDER RECORD - │ └─> orderDatabase.create({ - │ type: 'startOrder', - │ timestamp, - │ consumer, - │ payer, - │ datatokenAddress, - │ nftAddress, - │ did, - │ startOrderId: txHash - │ }) - │ - ├─> 7. UPDATE DDO - │ └─> ddoDatabase.update(ddo) - │ - └─> 8. EMIT EVENT - └─> ORDER_STARTED event emitted +12. EMIT EVENT + └─> eventEmitter.emit(METADATA_CREATED, { chainId, data: ddo }) ``` -**Database Operations:** +**RPC Calls Per Event:** -- UPDATE `ddo` table (increment order count) -- INSERT into `order` table (new order record) +- 1 call: transaction receipt +- 1+ calls: factory validation +- 1+ calls: NFT info (name, symbol, state, etc.) +- 1+ calls per datatoken: token info +- Multiple calls: pricing info (dispensers, exchanges) +- Optional: access list checks (1+ per validator) ---- - -#### D. DISPENSER_ACTIVATED Event - -**Trigger:** When a free dispenser is activated for a datatoken - -**On-Chain Event Data:** - -- `datatoken` - Datatoken address -- `owner` - Dispenser owner -- `dispenserId` - Unique identifier - -**Processing Steps:** - -``` -DispenserActivatedEventProcessor.processEvent(): - │ - ├─> 1. DECODE EVENT DATA - │ └─> Extract: datatoken, owner, dispenserId - │ - ├─> 2. FIND NFT ADDRESS - │ └─> Query datatoken contract → getNFTAddress() - │ - ├─> 3. RETRIEVE DDO - │ └─> ddoDatabase.retrieve(did) - │ - ├─> 4. UPDATE PRICING ARRAY - │ └─> Find service by datatokenAddress: - │ └─> Add to prices array: - │ { - │ type: 'dispenser', - │ price: '0', // Free - │ contract: dispenserAddress, - │ token: ZeroAddress, - │ dispenserId - │ } - │ - ├─> 5. UPDATE DDO - │ └─> ddoDatabase.update(ddo) - │ - └─> 6. EMIT EVENT - └─> DISPENSER_ACTIVATED event emitted -``` +**Total:** ~10-20 RPC calls per metadata event --- -#### E. EXCHANGE_RATE_CHANGED Event +### B. ORDER_STARTED Event -**Trigger:** When exchange rate is updated for a fixed-rate exchange - -**On-Chain Event Data:** - -- `exchangeId` - Exchange identifier -- `baseToken` - Base token address -- `datatoken` - Datatoken address -- `newRate` - New exchange rate +**Trigger:** Someone purchases access to a data asset **Processing Steps:** ``` -ExchangeRateChangedEventProcessor.processEvent(): - │ - ├─> 1. DECODE EVENT DATA - │ └─> Extract: exchangeId, baseToken, datatoken, newRate - │ - ├─> 2. FIND NFT ADDRESS - │ └─> Query datatoken contract → getNFTAddress() - │ - ├─> 3. RETRIEVE DDO - │ └─> ddoDatabase.retrieve(did) - │ - ├─> 4. UPDATE PRICING ARRAY - │ └─> Find service by datatokenAddress: - │ └─> Find exchange entry by exchangeId: - │ └─> Update price: newRate - │ - ├─> 5. UPDATE DDO - │ └─> ddoDatabase.update(ddo) - │ - └─> 6. EMIT EVENT - └─> EXCHANGE_RATE_CHANGED event emitted -``` +1. DECODE EVENT DATA + └─> Extract: consumer, payer, datatoken, amount, timestamp ---- - -### Event Processing Pipeline Summary - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ CONTINUOUS MONITORING LOOP │ -│ │ -│ ChainIndexer (per chain) running async/await: │ -│ while (!stopSignal): │ -│ ├─> Get last indexed block from DB │ -│ ├─> Get network height from RPC │ -│ ├─> Calculate chunk size (adaptive) │ -│ ├─> provider.getLogs(fromBlock, toBlock, topics) │ -│ │ └─> Returns: Log[] (raw blockchain logs) │ -│ │ │ -│ └─> processChunkLogs(logs, signer, provider, chainId) │ -│ │ │ -│ └─> For each log: │ -│ ├─> Identify event by topic[0] │ -│ ├─> Check if Ocean Protocol event │ -│ ├─> Apply validation (if metadata event) │ -│ ├─> Route to processor │ -│ │ └─> processEvent() called │ -│ │ ├─> Decode on-chain data │ -│ │ ├─> Fetch additional data (RPC calls) │ -│ │ ├─> Transform to domain model │ -│ │ └─> Store in database │ -│ └─> Collect result │ -│ │ -│ ├─> Update last indexed block │ -│ ├─> Emit events to INDEXER_DDO_EVENT_EMITTER │ -│ └─> Sleep for interval (30s default) │ -└─────────────────────────────────────────────────────────────────────┘ - - ↓ - -┌─────────────────────────────────────────────────────────────────────┐ -│ EVENT EMITTER LISTENERS │ -│ │ -│ Downstream consumers subscribe to events: │ -│ ├─> API endpoints (query fresh data) │ -│ ├─> Cache invalidation (update cache) │ -│ ├─> Webhooks (notify external services) │ -│ ├─> Analytics (track metrics) │ -│ └─> P2P network (advertise new assets) │ -└─────────────────────────────────────────────────────────────────────┘ -``` - -### Performance Characteristics +2. FIND NFT ADDRESS + └─> datatokenContract.getERC721Address() -**Event Monitoring Frequency:** +3. BUILD DID + └─> did = makeDid(nftAddress, chainId) -- Check for new blocks every 30 seconds (configurable) -- Process up to `chunkSize` blocks per iteration (default 100-1000) -- Adaptive chunk sizing on RPC errors (halves on failure, recovers after 3 successes) +4. RETRIEVE EXISTING DDO + └─> ddoDatabase.retrieve(did) -**Concurrency:** +5. UPDATE ORDER COUNT + └─> Find service by datatokenAddress + └─> Increment orders count -- All chains monitored concurrently (async/await) -- No worker threads (optimal for I/O-bound operations) -- Events within a chunk processed serially (to maintain order) +6. CREATE ORDER RECORD + └─> orderDatabase.create({ + type: 'startOrder', + timestamp, + consumer, + payer, + datatokenAddress, + nftAddress, + did, + startOrderId: txHash + }) -**RPC Call Patterns:** +7. UPDATE DDO + └─> ddoDatabase.update(ddo) -- 1 call to get network height per iteration -- 1 call to getLogs per chunk -- Per metadata event: - - 1-2 calls for transaction receipt - - 1+ calls for factory validation - - 1+ calls for NFT info (name, symbol, state) - - 1+ calls for token info (per datatoken) - - Multiple calls for pricing info (dispensers, exchanges) - - Optional: access list checks (1+ per validator) - -**Database Operations:** - -- 1 read to get last indexed block -- 1 write to update last indexed block -- Per event: 1-2 writes (ddo + ddoState/order) -- No batching currently implemented - -**Failure Recovery:** +8. EMIT EVENT + └─> eventEmitter.emit(ORDER_STARTED, { chainId, data }) +``` -- RPC failure → reduce chunk size, retry -- Processing failure → don't update last block, retry same chunk -- Validation failure → skip event, continue with next -- Database failure → error logged, event not stored +**RPC Calls:** 1-2 (get NFT address, possibly receipt) --- -## Current Flows - Detailed Analysis - -### Flow 1: Initialization & Startup +### C. PRICING EVENTS (Dispenser/Exchange) -**Location:** `index.ts` - `OceanIndexer` constructor and `startThreads()` +**Events:** DISPENSER_ACTIVATED, EXCHANGE_RATE_CHANGED, etc. -**Sequence:** +**Processing Steps:** ``` -1. OceanIndexer constructor called - ├─> Initialize database reference - ├─> Store supported networks (RPCS config) - ├─> Initialize INDEXING_QUEUE = [] - ├─> Create Map for indexers - └─> Call startThreads() - -2. startThreads() - ├─> checkAndTriggerReindexing() [UC8] - │ ├─> Get current version from process.env - │ ├─> Get DB version from sqliteConfig - │ ├─> Compare with MIN_REQUIRED_VERSION - │ ├─> If reindex needed: - │ │ ├─> For each chain: - │ │ │ ├─> Delete all assets: ddo.deleteAllAssetsFromChain() - │ │ │ └─> Reset last indexed block to deployment block - │ │ └─> Update DB version - │ └─> Continue to indexer startup - │ - ├─> setupEventListeners() [Global event handlers] - │ ├─> Listen for METADATA_CREATED on INDEXER_CRAWLING_EVENT_EMITTER - │ ├─> Listen for METADATA_UPDATED - │ ├─> Listen for ORDER_STARTED - │ ├─> Listen for REINDEX_QUEUE_POP - │ ├─> Listen for REINDEX_CHAIN - │ └─> Re-emit to INDEXER_DDO_EVENT_EMITTER (for external consumers) - │ - └─> For each supported chain (sequential): - ├─> startThread(chainId) - │ ├─> Check if indexer already running - │ │ └─> If yes: stop it, wait, then proceed - │ │ - │ ├─> Get network config (rpc, fallbackRPCs, chunkSize, etc.) - │ │ - │ ├─> Create Blockchain instance - │ │ └─> new Blockchain(rpc, chainId, config, fallbackRPCs) - │ │ - │ ├─> Validate connectivity: retryCrawlerWithDelay() - │ │ ├─> Check: blockchain.isNetworkReady() - │ │ ├─> If not ready: tryFallbackRPCs() - │ │ ├─> Check: DB reachable - │ │ ├─> Retry up to 10 times with exponential backoff - │ │ └─> Return: canStart (boolean) - │ │ - │ ├─> If connectivity failed → return null, skip chain - │ │ - │ ├─> Create ChainIndexer instance - │ │ └─> new ChainIndexer(blockchain, rpcDetails, INDEXER_CRAWLING_EVENT_EMITTER) - │ │ - │ ├─> Start indexer (non-blocking!) - │ │ └─> await indexer.start() - │ │ └─> Internally calls indexLoop() without await - │ │ └─> Returns immediately, loop runs in background - │ │ - │ └─> Store: indexers.set(chainId, indexer) - │ - └─> Return: all indexers started successfully +1. DECODE EVENT DATA + └─> Extract event-specific data -3. Each ChainIndexer now running independently - └─> Async indexLoop() executing concurrently for all chains -``` +2. FIND NFT ADDRESS + └─> Query datatoken contract -**Key Behaviors:** +3. RETRIEVE DDO + └─> ddoDatabase.retrieve(did) -- Version check happens before indexers start -- Each chain gets its own ChainIndexer instance -- RPC connection validated before starting indexer -- All indexers run concurrently via async/await (no worker threads) -- Event listeners set up globally (shared EventEmitter) -- ChainIndexers emit events, OceanIndexer re-emits to external consumers +4. UPDATE PRICING ARRAY + └─> Find service by datatokenAddress + └─> Add/update price entry -**Current Architecture Benefits:** +5. UPDATE DDO + └─> ddoDatabase.update(ddo) + +6. EMIT EVENT + └─> eventEmitter.emit(eventType, { chainId, data }) +``` -- No worker threads → simpler code, easier debugging -- Async/await → better error handling, stack traces preserved -- EventEmitter → decoupled communication -- All indexers share same Node.js event loop -- Optimal for I/O-bound workloads (RPC calls, DB queries) +**RPC Calls:** 1-2 --- -### Flow 2: Block Crawling Loop (ChainIndexer) +## Error Handling & Retry Mechanisms -**Location:** `ChainIndexer.ts` - `indexLoop()` +### Overview: 4 Retry Layers + +The indexer has 4 different retry mechanisms at different levels: + +``` +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 1: Crawler Startup Retry │ +│ Location: OceanIndexer - retryCrawlerWithDelay() │ +│ Scope: Initial RPC/DB connection │ +│ Max Retries: 10 │ +│ Interval: max(fallbackRPCs.length * 3000, 5000) ms │ +│ Strategy: Recursive retry with fallback RPCs │ +│ Checks: Network ready + DB reachable │ +└──────────────────────────────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 2: Adaptive Chunk Sizing │ +│ Location: ChainIndexer - indexLoop() │ +│ Scope: RPC getLogs() failures │ +│ Max Retries: Infinite (until success or stop) │ +│ Strategy: Halve chunk size on error (min: 1 block) │ +│ Recovery: Revert to original after 3 successes │ +└──────────────────────────────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 3: Block Processing Retry │ +│ Location: ChainIndexer - indexLoop() catch block │ +│ Scope: Event processing errors │ +│ Max Retries: Infinite │ +│ Strategy: Don't update lastBlock, retry same chunk │ +│ Backoff: Sleep for interval (30s) before retry │ +└──────────────────────────────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 4: Individual RPC Retry │ +│ Location: BaseProcessor - withRetrial() │ +│ Scope: DDO decryption HTTP calls │ +│ Max Retries: 5 │ +│ Strategy: Exponential backoff │ +│ Conditions: Only retry on ECONNREFUSED │ +└──────────────────────────────────────────────────────────────┘ +``` + +### Layer 1: Startup Retry + +**Purpose:** Ensure RPC and DB are reachable before starting indexer + +**Code Flow:** -**Sequence:** +```typescript +async retryCrawlerWithDelay(blockchain: Blockchain, interval = 5000) { + const retryInterval = Math.max(blockchain.getKnownRPCs().length * 3000, interval) -``` -async indexLoop() { - // Initialization - contractDeploymentBlock = getDeployedContractBlock(chainId) - crawlingStartBlock = rpcDetails.startBlock || contractDeploymentBlock - provider = blockchain.getProvider() - signer = blockchain.getSigner() - interval = getCrawlingInterval() // Default 30s - chunkSize = rpcDetails.chunkSize || 1 - successfulRetrievalCount = 0 - lockProcessing = false - startedCrawling = false - - // Main loop - while (!this.stopSignal) { - if (!lockProcessing) { - lockProcessing = true - - try { - // 1. GET CURRENT STATE - lastIndexedBlock = await this.getLastIndexedBlock() - networkHeight = await getNetworkHeight(provider) - startBlock = lastIndexedBlock > crawlingStartBlock - ? lastIndexedBlock - : crawlingStartBlock - - INDEXER_LOGGER.info( - `Chain ${chainId}: Last=${lastIndexedBlock}, Start=${startBlock}, Height=${networkHeight}` - ) - - // 2. CHECK IF WORK TO DO - if (networkHeight > startBlock) { - // Emit one-shot event when crawling starts - if (!startedCrawling) { - startedCrawling = true - this.eventEmitter.emit(INDEXER_CRAWLING_EVENTS.CRAWLING_STARTED, { - chainId, - startBlock, - networkHeight, - contractDeploymentBlock - }) - } - - // 3. CALCULATE CHUNK SIZE - remainingBlocks = networkHeight - startBlock - blocksToProcess = min(chunkSize, remainingBlocks) - - INDEXER_LOGGER.info(`Processing ${blocksToProcess} blocks...`) - - // 4. RETRIEVE EVENTS FROM RPC - let chunkEvents = [] - try { - chunkEvents = await retrieveChunkEvents( - signer, - provider, - chainId, - startBlock, - blocksToProcess - ) - // Inside retrieveChunkEvents(): - // provider.getLogs({ - // fromBlock: startBlock + 1, - // toBlock: startBlock + blocksToProcess, - // topics: [ALL_OCEAN_EVENT_HASHES] - // }) - - successfulRetrievalCount++ - } catch (error) { - // ADAPTIVE CHUNK SIZING on RPC error - INDEXER_LOGGER.warn(`RPC error: ${error.message}`) - chunkSize = floor(chunkSize / 2) < 1 ? 1 : floor(chunkSize / 2) - successfulRetrievalCount = 0 - INDEXER_LOGGER.info(`Reduced chunk size to ${chunkSize}`) - // Continue to next iteration - } - - // 5. PROCESS EVENTS - try { - processedBlocks = await processBlocks( - chunkEvents, - signer, - provider, - chainId, - startBlock, - blocksToProcess - ) - // processBlocks() calls processChunkLogs() - // which routes events to processors - - INDEXER_LOGGER.debug( - `Processed ${processedBlocks.foundEvents.length} events from ${chunkEvents.length} logs` - ) - - // 6. UPDATE LAST INDEXED BLOCK (critical!) - currentBlock = await this.updateLastIndexedBlockNumber( - processedBlocks.lastBlock, - lastIndexedBlock - ) - // Inside updateLastIndexedBlockNumber(): - // indexerDb.update(chainId, block) - // Returns new lastIndexedBlock or -1 on failure - - // Safety check - if (currentBlock < 0 && lastIndexedBlock !== null) { - currentBlock = lastIndexedBlock - INDEXER_LOGGER.error('Failed to update last block, keeping old value') - } - - // 7. EMIT EVENTS FOR NEWLY INDEXED ASSETS - this.emitNewlyIndexedAssets(processedBlocks.foundEvents) - // Emits to INDEXER_CRAWLING_EVENT_EMITTER: - // - METADATA_CREATED - // - METADATA_UPDATED - // - ORDER_STARTED - // - ORDER_REUSED - // - DISPENSER_ACTIVATED/DEACTIVATED - // - EXCHANGE_ACTIVATED/DEACTIVATED/RATE_CHANGED - - // 8. ADAPTIVE CHUNK SIZE RECOVERY - if (successfulRetrievalCount >= 3 && chunkSize < rpcDetails.chunkSize) { - chunkSize = rpcDetails.chunkSize - successfulRetrievalCount = 0 - INDEXER_LOGGER.info(`Reverted chunk size to ${chunkSize}`) - } - - } catch (error) { - // PROCESSING ERROR - INDEXER_LOGGER.error(`Processing failed: ${error.message}`) - successfulRetrievalCount = 0 - // Critical: Don't update last block → retry same chunk - await sleep(interval) - } - - } else { - // No new blocks available - await sleep(interval) - } - - // 9. PROCESS REINDEX QUEUE - await this.processReindexQueue(provider, signer) - // Processes this.reindexQueue (FIFO) - // For each task: - // - Get transaction receipt - // - Process logs from receipt - // - Emit REINDEX_QUEUE_POP event - - // 10. HANDLE CHAIN REINDEX COMMAND - if (this.reindexBlock !== null) { - networkHeight = await getNetworkHeight(provider) - result = await this.reindexChain(currentBlock, networkHeight) - - this.eventEmitter.emit(INDEXER_CRAWLING_EVENTS.REINDEX_CHAIN, { - result, - chainId - }) - } - - } catch (error) { - INDEXER_LOGGER.error(`Error in indexing loop: ${error.message}`) - await sleep(interval) - } finally { - lockProcessing = false - } + // Try to connect + const result = await startCrawler(blockchain) + const dbActive = this.getDatabase() + + // Check DB reachable + if (!dbActive || !(await isReachableConnection(dbActive.getConfig().url))) { + INDEXER_LOGGER.error(`Giving up start crawling. DB is not online!`) + return false + } + if (result) { + INDEXER_LOGGER.info('Blockchain connection successfully established!') + return true + } else { + numCrawlAttempts++ + if (numCrawlAttempts <= MAX_CRAWL_RETRIES) { + await sleep(retryInterval) + return this.retryCrawlerWithDelay(blockchain, retryInterval) // Recursive } else { - // Already processing, wait a bit - INDEXER_LOGGER.debug('Processing in progress, waiting...') - await sleep(1000) + INDEXER_LOGGER.error(`Giving up after ${MAX_CRAWL_RETRIES} retries.`) + return false } } - - // 11. CLEANUP ON STOP - this.isRunning = false - INDEXER_LOGGER.info(`Exiting indexer loop for chain ${chainId}`) } ``` -**Key Behaviors:** - -- Infinite async loop with `lockProcessing` flag -- Adaptive chunk sizing on RPC errors (halves on error, recovers after 3 successes) -- Last block only updated on successful processing (critical for consistency) -- Reindex queue processed after each chunk -- One-shot `CRAWLING_STARTED` event -- Graceful shutdown via `stopSignal` -- All operations use async/await (no callbacks, no worker threads) - -**Current Implementation Improvements:** - -- `lockProcessing` now has actual waiting: `await sleep(1000)` when locked -- Instance state (`this.reindexBlock`, `this.reindexQueue`) instead of global -- Better error handling with try/catch/finally -- Cleaner shutdown: sets `isRunning = false` -- EventEmitter instead of postMessage (simpler, type-safe) +**Behavior:** -**Performance Characteristics:** - -- One iteration per 30 seconds (if caught up) -- Processes up to `chunkSize` blocks per iteration (typically 100-1000) -- On RPC error: chunk size halves (min 1) → slower but more reliable -- Recovery: after 3 successful calls → chunk size restored -- No parallel event processing within chunk (maintains order) +- Recursive retry up to 10 times +- Increasing interval based on number of fallback RPCs +- Checks both RPC and DB connectivity +- Tries fallback RPCs if available --- -### Flow 3: Event Processing Pipeline +### Layer 2: Adaptive Chunk Sizing -**Location:** `processor.ts` - `processChunkLogs()` +**Purpose:** Handle RPC rate limits and transient failures -**Sequence:** +**Code Flow:** -``` -processChunkLogs(logs, signer, provider, chainId): - storeEvents = {} - - if (logs.length > 0) { - config = await getConfiguration() - checkMetadataValidated = (allowedValidators.length > 0 || - allowedValidatorsList exists) - - for each log in logs: - // 1. Identify event - event = findEventByKey(log.topics[0]) - - if (event && event.type in EVENTS): - // 2. Metadata validation (if metadata event) - if (event.type in [METADATA_CREATED, METADATA_UPDATED, METADATA_STATE]): - if (checkMetadataValidated): - // Get transaction receipt - txReceipt = await provider.getTransactionReceipt(log.txHash) - - // Extract MetadataValidated events - metadataProofs = fetchEventFromTransaction( - txReceipt, 'MetadataValidated', ERC20Template.abi - ) - - if (!metadataProofs): - continue // Skip event - - // Extract validator addresses - validators = metadataProofs.map(proof => proof.args[0]) - - // Check allowed validators - allowed = allowedValidators.filter(v => - validators.indexOf(v) !== -1 - ) - - if (!allowed.length): - continue // Skip event - - // Check access lists (if configured) - if (allowedValidatorsList && validators.length > 0): - isAllowed = false - for each accessListAddress in allowedValidatorsList[chainId]: - accessListContract = new Contract(accessListAddress, ...) - for each validator in validators: - balance = await accessListContract.balanceOf(validator) - if (balance > 0): - isAllowed = true - break - if (isAllowed) break - - if (!isAllowed): - continue // Skip event - - // 3. Route to processor - if (event.type === TOKEN_URI_UPDATE): - storeEvents[event.type] = 'TOKEN_URI_UPDATE' - else: - processor = getEventProcessor(event.type, chainId) - result = await processor.processEvent( - log, chainId, signer, provider, event.type - ) - storeEvents[event.type] = result - - return storeEvents +```typescript +// In indexLoop() +let chunkSize = rpcDetails.chunkSize || 1 +let successfulRetrievalCount = 0 + +while (!stopSignal) { + try { + chunkEvents = await retrieveChunkEvents( + signer, + provider, + chainId, + startBlock, + blocksToProcess + ) + successfulRetrievalCount++ + } catch (error) { + // ERROR: Reduce chunk size + INDEXER_LOGGER.warn(`RPC error: ${error.message}`) + chunkSize = Math.floor(chunkSize / 2) < 1 ? 1 : Math.floor(chunkSize / 2) + successfulRetrievalCount = 0 + INDEXER_LOGGER.info(`Reduced chunk size to ${chunkSize}`) } - return {} + // SUCCESS: Recover after 3 successes + if (successfulRetrievalCount >= 3 && chunkSize < rpcDetails.chunkSize) { + chunkSize = rpcDetails.chunkSize + successfulRetrievalCount = 0 + INDEXER_LOGGER.info(`Reverted chunk size to ${chunkSize}`) + } +} ``` -**Key Behaviors:** - -- Sequential processing (one event at a time) -- Validation happens before processing -- Multiple RPC calls per metadata event (receipt + access list checks) -- Processor instances cached per event type + chain -- Events skipped silently on validation failure +**Behavior:** -**Issues Observed:** - -- Nested validation logic (hard to read) -- Multiple RPC calls per event (performance issue) -- No parallelization -- No batch validation -- Silent failures (just `continue`) +- On RPC error: halve chunk size (minimum 1 block) +- After 3 consecutive successes: restore original chunk size +- No max retries (continues until successful or stopped) +- Self-healing mechanism --- -### Flow 4: Metadata Event Processing +### Layer 3: Block Processing Retry -**Location:** `processors/MetadataEventProcessor.ts` - `processEvent()` +**Purpose:** Handle event processing errors without losing progress -**Sequence:** +**Code Flow:** -``` -processEvent(log, chainId, signer, provider, eventName): - // 1. Factory check - wasDeployedByUs = await wasNFTDeployedByOurFactory( - chainId, signer, event.address +```typescript +// In indexLoop() +try { + processedBlocks = await processBlocks( + chunkEvents, + signer, + provider, + chainId, + startBlock, + blocksToProcess ) - if (!wasDeployedByUs): - return // Skip - // 2. Decode event - decodedEventData = await getEventData( - provider, log.txHash, ERC721Template.abi, eventName - ) - metadata = decodedEventData.args[4] - metadataHash = decodedEventData.args[5] - flag = decodedEventData.args[3] - owner = decodedEventData.args[0] - - // 3. Decrypt DDO (400+ lines) - ddo = await decryptDDO( - decodedEventData.args[2], flag, owner, - event.address, chainId, log.txHash, - metadataHash, metadata + // UPDATE last indexed block on success + currentBlock = await updateLastIndexedBlockNumber( + processedBlocks.lastBlock, + lastIndexedBlock ) - // 4. Validate DDO ID - ddoInstance = DDOManager.getDDOClass(ddo) - expectedDid = ddoInstance.makeDid(event.address, chainId) - if (ddo.id !== expectedDid): - await ddoState.update(..., false, 'DID mismatch') - return - - // 5. Check authorized publishers - if (authorizedPublishers configured): - if (owner not in authorizedPublishers): - await ddoState.update(..., false, 'Unauthorized publisher') - return - - // 6. Get NFT info - nftInfo = await getNFTInfo(event.address, signer) - - // 7. Get token info - tokenInfo = await getTokenInfo(event.address, signer, provider) - - // 8. Get pricing stats - pricingStats = await getPricingStatsForDddo( - event.address, signer, provider, chainId - ) - - // 9. Check purgatory - purgatoryStatus = await Purgatory.check(...) - - // 10. Check policy server - policyServerCheck = await checkPolicyServer(...) - - // 11. Build indexed metadata - indexedMetadata = { - nft: nftInfo, - event: { txid, from, contract, block, datetime }, - stats: [{ - datatokenAddress, name, symbol, serviceId, - orders: number, - prices: [...] - }], - purgatory: purgatoryStatus - } - - // 12. Create or update DDO - if (eventName === METADATA_CREATED): - await ddoDatabase.create(ddo) - else: - existingDdo = await ddoDatabase.retrieve(ddo.id) - // Merge stats - updatedDdo = mergeDDO(existingDdo, ddo) - await ddoDatabase.update(updatedDdo) + emitNewlyIndexedAssets(processedBlocks.foundEvents) +} catch (error) { + // ERROR: Don't update last block + INDEXER_LOGGER.error(`Processing failed: ${error.message}`) + successfulRetrievalCount = 0 - // 13. Update DDO state - await ddoState.update(chainId, ddo.id, event.address, - log.txHash, true, null) + // Wait before retrying same chunk + await sleep(interval) // 30 seconds - return ddo + // Next iteration will retry same chunk (lastBlock not updated) +} ``` -**Key Behaviors:** - -- Many sequential async operations -- Multiple RPC calls (NFT info, token info, pricing) -- DDO decryption with multiple strategies -- State tracking separate from DDO storage -- Stats merging for updates +**Behavior:** -**Issues Observed:** +- On processing error: last indexed block NOT updated +- Next iteration retries the same block range +- Sleep interval before retry (30s default) +- No max retries (infinite until successful) +- Preserves data integrity (no gaps in indexed blocks) -- Very long method (400+ lines) -- Many external calls (slow) -- No batching of RPC calls -- Decryption logic complex and hard to test -- No error recovery for individual steps +**Critical:** This ensures no events are lost even if processing fails --- -### Flow 5: Reindex Transaction - -**Location:** `crawlerThread.ts` - `processReindex()` - -**Sequence:** - -``` -processReindex(provider, signer, chainId): - while (REINDEX_QUEUE.length > 0): - reindexTask = REINDEX_QUEUE.pop() - - try: - // Get transaction receipt - receipt = await provider.getTransactionReceipt( - reindexTask.txId - ) - - if (receipt): - // Extract logs - if (reindexTask.eventIndex defined): - log = receipt.logs[reindexTask.eventIndex] - logs = [log] - else: - logs = receipt.logs - - // Process logs (same as normal flow) - await processChunkLogs(logs, signer, provider, chainId) - - // Notify parent - parentPort.postMessage({ - method: REINDEX_QUEUE_POP, - data: { reindexTask } - }) - else: - // Receipt not found, re-queue - REINDEX_QUEUE.push(reindexTask) - - catch (error): - // Error logged, task lost - INDEXER_LOGGER.error(...) -``` +### Layer 4: DDO Decryption Retry -**Key Behaviors:** +**Purpose:** Handle transient HTTP/network errors during DDO decryption -- Processes queue during normal crawling loop -- Uses same processing pipeline as normal events -- Re-queues on receipt not found -- No retry limit +**Code Flow:** -**Issues Observed:** - -- Tasks can be lost on error -- No timeout for receipt retrieval -- Processes during normal crawling (could slow down) -- No priority mechanism - ---- +```typescript +// In BaseProcessor - decryptDDO() +const response = await withRetrial(async () => { + const { nonce, signature } = await createSignature() + + const payload = { + transactionId: txId, + chainId, + decrypterAddress: keys.ethAddress, + dataNftAddress: contractAddress, + signature, + nonce + } -### Flow 6: Reindex Chain + try { + const res = await axios({ + method: 'post', + url: `${decryptorURL}/api/services/decrypt`, + data: payload, + timeout: 30000, + validateStatus: (status) => { + return (status >= 200 && status < 300) || status === 400 || status === 403 + } + }) -**Location:** `crawlerThread.ts` - `reindexChain()` + if (res.status === 400 || res.status === 403) { + // Don't retry client errors + return res + } -**Sequence:** + if (res.status !== 200 && res.status !== 201) { + // Retry 5XX errors + throw new Error(`bProvider exception: ${res.status}`) + } + return res + } catch (err) { + // Only retry on connection refused + if (err.code === 'ECONNREFUSED' || err.message.includes('ECONNREFUSED')) { + INDEXER_LOGGER.error(`Decrypt failed with ECONNREFUSED, retrying...`) + throw err // Will be retried by withRetrial + } + throw err // Other errors not retried + } +}) ``` -reindexChain(currentBlock, networkHeight): - // 1. Validate block - if (REINDEX_BLOCK > networkHeight): - REINDEX_BLOCK = null - return false - - // 2. Update last indexed block - block = await updateLastIndexedBlockNumber(REINDEX_BLOCK) - - if (block !== -1): - REINDEX_BLOCK = null - // 3. Delete all assets - res = await deleteAllAssetsFromChain() +**withRetrial Implementation:** - if (res === -1): - // Deletion failed, revert block - await updateLastIndexedBlockNumber(currentBlock) - return false - - return true - else: - REINDEX_BLOCK = null - return false +```typescript +// Max 5 retries with exponential backoff +async function withRetrial(fn: () => Promise, maxRetries = 5): Promise { + for (let i = 0; i < maxRetries; i++) { + try { + return await fn() + } catch (error) { + if (i === maxRetries - 1) throw error + await sleep(Math.pow(2, i) * 1000) // Exponential backoff + } + } +} ``` -**Key Behaviors:** - -- Validates block before proceeding -- Updates block first, then deletes assets -- Reverts block if deletion fails -- Clears `REINDEX_BLOCK` flag +**Behavior:** -**Issues Observed:** - -- No transaction wrapping (block update + deletion) -- Race condition possible (normal crawling could interfere) -- No progress tracking -- Can take very long for large chains +- Max 5 retries with exponential backoff (1s, 2s, 4s, 8s, 16s) +- Only retries ECONNREFUSED errors (connection issues) +- Does NOT retry 400/403 (client errors) +- Retries 5XX errors (server errors) +- 30-second timeout per attempt --- -## Event Processing Flows - -### Event Types Processed - -1. **Metadata Events:** - - - `METADATA_CREATED` - New asset published - - `METADATA_UPDATED` - Asset metadata updated - - `METADATA_STATE` - Asset state changed - -2. **Order Events:** - - - `ORDER_STARTED` - New order initiated - - `ORDER_REUSED` - Order reused - -3. **Dispenser Events:** - - - `DISPENSER_CREATED` - Dispenser created - - `DISPENSER_ACTIVATED` - Dispenser activated - - `DISPENSER_DEACTIVATED` - Dispenser deactivated - -4. **Exchange Events:** - - - `EXCHANGE_CREATED` - Exchange created - - `EXCHANGE_ACTIVATED` - Exchange activated - - `EXCHANGE_DEACTIVATED` - Exchange deactivated - - `EXCHANGE_RATE_CHANGED` - Exchange rate changed - -5. **Other:** - - `TOKEN_URI_UPDATE` - Token URI updated (no processing) - -### Event Flow Summary - -``` -Block Logs - ↓ -Event Identification (by topic hash) - ↓ -Validation (for metadata events) - ├─> Factory check - ├─> Metadata proof validation - ├─> Access list check - └─> Publisher authorization - ↓ -Route to Processor - ├─> MetadataEventProcessor - ├─> OrderStartedEventProcessor - ├─> DispenserEventProcessor - └─> ExchangeEventProcessor - ↓ -Process Event - ├─> Decode event data - ├─> Fetch additional data (RPC calls) - ├─> Transform to domain model - └─> Store in database - ↓ -Emit Event (to parent thread) - ↓ -Parent Thread Emits (to listeners) -``` - ---- - -## Error Handling & Retry Mechanisms - -### Current Retry Mechanisms (4 Layers) - -**Layer 1: Crawler Startup Retry** - -- Location: `index.ts` - `retryCrawlerWithDelay()` -- Max retries: 10 -- Interval: `max(fallbackRPCs.length * 3000, 5000)` ms -- Recursive retry -- Checks DB reachability - -**Layer 2: Adaptive Chunk Sizing** - -- Location: `crawlerThread.ts` - `processNetworkData()` -- On RPC error: `chunkSize = floor(chunkSize / 2)` (min 1) -- Reverts after 3 successful calls -- No max retries (infinite) - -**Layer 3: Block Processing Retry** - -- Location: `crawlerThread.ts` - `processNetworkData()` -- On processing error: sleep and retry same chunk -- No max retries -- Last block not updated on error - -**Layer 4: Individual RPC Retry** - -- Location: `processors/BaseProcessor.ts` - `withRetrial()` -- Max retries: 5 -- Used in `decryptDDO()` -- Exponential backoff - ### Error Handling Issues +**Current Problems:** + 1. **No Centralized Strategy:** - 4 different retry mechanisms - - Unclear which applies when - - No consistent backoff + - No coordination between layers + - Unclear which mechanism applies when 2. **Silent Failures:** - - Events skipped with `continue` - - No error tracking - - No metrics on failures + - Events skipped with `continue` statement + - No error tracking or metrics + - Difficult to diagnose missing events 3. **No Circuit Breaker:** - - Continues retrying failed RPCs + - Continues retrying failed RPCs indefinitely - Can cause cascade failures - - No health tracking - -4. **State Recovery:** - - Last block not updated on error - - Same chunk retried indefinitely - - No timeout mechanism - ---- - -## Async/Await Architecture & Concurrency - -### Current Architecture (No Worker Threads) - -- **One ChainIndexer instance per chain** -- **Main thread:** `OceanIndexer` orchestrator -- **Communication:** Direct `EventEmitter` (event-driven) -- **State:** Instance-based (no shared state between chains) -- **Concurrency:** Async/await leveraging Node.js event loop - -### Indexer Lifecycle - -``` -OceanIndexer ChainIndexer (Chain 1) - │ │ - │ new ChainIndexer(...) │ - ├──────────────────────────────────────→│ Constructor - │ │ - │ await indexer.start() │ - ├──────────────────────────────────────→│ start() called - │ (returns immediately) │ ├─> Set stopSignal = false - │ │ ├─> Set isRunning = true - │ │ └─> Call indexLoop() without await - │ │ (runs in background) - │ │ - │ │ async indexLoop() - │ │ while (!stopSignal) { - │ │ ├─> Get last block - │ │ ├─> Get network height - │ │ ├─> Retrieve events - │ │ ├─> Process events - │ │ ├─> Update last block - │ │ └─> Sleep 30s - │ │ } - │ │ - │ indexer.addReindexTask(task) │ - ├──────────────────────────────────────→│ Add to reindexQueue - │ │ (processed in next iteration) - │ │ - │ │ eventEmitter.emit(METADATA_CREATED) - │ ←──────────────────────────────────────┤ - │ (event listener catches) │ - │ re-emit to INDEXER_DDO_EVENT_EMITTER │ - │ │ - │ await indexer.stop() │ - ├──────────────────────────────────────→│ stop() called - │ (waits for graceful shutdown) │ ├─> Set stopSignal = true - │ │ └─> Wait for loop to exit - │ │ while (isRunning) sleep(100) - │ │ - │ ←──────────────────────────────────────┤ isRunning = false - │ (stop() returns) │ Loop exited -``` - -### Concurrency Model - -**How Multiple Chains Run Concurrently:** - -``` -Node.js Event Loop - │ - ├─> ChainIndexer(chain=1).indexLoop() - │ └─> await getLastBlock() ────→ I/O operation (yields control) - │ - ├─> ChainIndexer(chain=137).indexLoop() - │ └─> await provider.getLogs() ───→ I/O operation (yields control) - │ - ├─> ChainIndexer(chain=8996).indexLoop() - │ └─> await processBlocks() ──────→ I/O operation (yields control) - │ - └─> (all run concurrently via async/await) -``` - -**Key Point:** When one indexer awaits an I/O operation (RPC call, DB query), control yields to the event loop, allowing other indexers to progress. No worker threads needed! - -### Benefits Over Worker Threads - -1. **Simpler Code:** - - - No `postMessage()` / `parentPort` complexity - - Direct method calls - - Clear data flow - - Standard async/await patterns - -2. **Better Error Handling:** - - - Stack traces preserved across async boundaries - - try/catch works normally - - Errors don't crash entire thread - - No serialization errors - -3. **State Management:** - - - Instance-based state (each ChainIndexer has its own) - - No global state between chains - - No race conditions on shared state - - TypeScript types preserved - -4. **Debugging:** - - - Can use standard debuggers - - Breakpoints work normally - - Console.log from anywhere - - No need to debug worker threads - -5. **Testing:** - - Easy to mock ChainIndexer - - No Worker API to mock - - Can unit test methods directly - - Faster test execution - -### Current Concurrency Characteristics + - No health status tracking -1. **Lock Mechanism:** +4. **Infinite Retries:** - - `lockProcessing` flag prevents re-entry - - Actual waiting: `await sleep(1000)` when locked - - No race conditions (single-threaded per instance) - -2. **Event Ordering:** - - - Events emitted in order per chain - - EventEmitter guarantees listener order - - No message queue (immediate delivery) - -3. **Error Propagation:** - - - Errors caught in indexLoop() - - Logged with chain context - - Loop continues after error - - `isRunning` flag tracks health - -4. **Graceful Shutdown:** - - `stop()` sets `stopSignal = true` - - Loop exits on next iteration - - `await` ensures complete shutdown - - No orphaned processes - -### Why This Works for I/O-Bound Workloads - -**Ocean Node Indexer is I/O-bound:** - -- 90%+ time spent waiting for: - - RPC calls (network I/O) - - Database queries (disk/network I/O) - - Sleep intervals -- Minimal CPU-bound work (event decoding, JSON parsing) - -**Async/await is optimal because:** + - Layer 2 and 3 have no max retries + - Can get stuck on persistent errors + - No timeout mechanism -- During I/O wait, other indexers can progress -- No context switching overhead (vs threads) -- No memory duplication (vs processes) -- Single event loop handles all concurrency +5. **No Error Classification:** + - All processing errors treated equally + - No distinction between retryable and permanent errors + - Bad events can block entire chunk --- @@ -1962,24 +682,29 @@ Node.js Event Loop **Current Behavior:** -1. `retrieveChunkEvents()` throws error -2. Caught in `processNetworkData()` -3. Chunk size reduced -4. Sleep and retry -5. If all RPCs fail → retry up to 10 times during startup -6. After max retries → worker not started +``` +1. retrieveChunkEvents() throws error +2. Caught in indexLoop() +3. Adaptive chunk sizing triggered: + - chunkSize = floor(chunkSize / 2) + - Minimum: 1 block +4. Next iteration retries with smaller chunk +5. If all fallback RPCs fail during startup: + - retryCrawlerWithDelay() retries up to 10 times + - After max retries → ChainIndexer not started +``` **Recovery:** -- Manual restart required -- No automatic RPC health tracking -- No circuit breaker +- Self-healing via chunk size reduction +- Fallback RPC support (tries alternatives) +- Manual restart required if startup fails after 10 retries **Issues:** -- Slow recovery (chunk size reduction) -- No provider health tracking -- Can get stuck retrying +- No RPC health tracking +- No circuit breaker (keeps retrying forever after startup) +- Can get very slow (chunk size = 1) --- @@ -1987,393 +712,183 @@ Node.js Event Loop **Current Behavior:** -1. DB call fails -2. Error logged -3. Last block not updated -4. Same chunk retried -5. Can loop indefinitely +``` +1. DB operation fails (read or write) +2. Error thrown and caught in indexLoop() +3. Last indexed block NOT updated +4. Sleep for interval (30s) +5. Next iteration retries same chunk +6. Repeats indefinitely until DB available +``` **Recovery:** -- No automatic recovery -- Manual intervention needed -- State may be inconsistent +- Automatic retry (infinite) +- Data integrity preserved (no gaps) +- No manual intervention needed (if DB comes back) **Issues:** - No DB health check -- No timeout -- Can process events but not store them +- No timeout (infinite retry) +- Can process events but not store them (wasted work) +- No notification that DB is down --- -### Scenario 3: Worker Thread Crashes +### Scenario 3: Processing Error in Event Handler **Current Behavior:** -1. Worker throws uncaught error -2. `worker.on('error')` handler logs error -3. `worker.on('exit')` handler sets `runningThreads[chainId] = false` -4. No automatic restart +``` +1. processor.processEvent() throws error +2. Caught in processBlocks() +3. Error re-thrown +4. Caught in indexLoop() +5. Last indexed block NOT updated +6. Sleep for interval +7. Next iteration retries same chunk +``` **Recovery:** -- Manual restart via API -- Or node restart +- Retry same chunk indefinitely +- No max retries +- Eventually succeeds if error is transient **Issues:** -- No automatic restart -- State lost (in-memory queues) -- No health monitoring +- Bad event data can block entire chunk +- No skip mechanism for permanently bad events +- No event-level error handling +- All events in chunk must succeed + +**Example:** If chunk has 100 events and event #50 is corrupted, the entire chunk retries forever. --- -### Scenario 4: Processing Error in Event Handler +### Scenario 4: DDO Decryption Fails **Current Behavior:** -1. `processor.processEvent()` throws error -2. Caught in `processBlocks()` -3. Error re-thrown -4. Caught in `processNetworkData()` -5. Last block not updated -6. Sleep and retry same chunk +``` +1. decryptDDO() throws error after 5 retries +2. Error caught in processEvent() +3. Event skipped +4. ddoState updated with error message +5. Processing continues with next event +``` **Recovery:** -- Retry same chunk -- No max retries -- Can loop forever on bad event +- Event marked as invalid in ddoState +- Other events in chunk processed normally +- No retry (event permanently skipped) **Issues:** -- No error classification -- No skip mechanism for bad events -- Can block progress +- Event lost (not retried later) +- No notification mechanism +- Needs manual intervention (reindex tx) --- -### Scenario 5: Reindex Task Fails +### Scenario 5: Validation Failure **Current Behavior:** -1. `processReindex()` called -2. Receipt not found → re-queued -3. Processing error → logged, task lost -4. No retry limit +``` +1. Validation fails (e.g., not from Ocean Factory) +2. `continue` statement executed +3. Event silently skipped +4. No database update +5. Processing continues with next event +``` **Recovery:** -- Re-queue on receipt not found -- Lost on processing error -- No timeout +- No recovery (by design) +- Event intentionally ignored **Issues:** -- Tasks can be lost -- No retry limit -- No timeout - ---- - -## State Management - -### Global State Variables - -**Parent Thread (`index.ts`):** - -- `INDEXING_QUEUE: ReindexTask[]` - Reindex tasks -- `JOBS_QUEUE: JobStatus[]` - Admin job queue -- `runningThreads: Map` - Thread status -- `globalWorkers: Map` - Worker references -- `numCrawlAttempts: number` - Retry counter - -**Worker Thread (`crawlerThread.ts`):** - -- `REINDEX_BLOCK: number` - Chain reindex target -- `REINDEX_QUEUE: ReindexTask[]` - Transaction reindex queue -- `stoppedCrawling: boolean` - Stop flag -- `startedCrawling: boolean` - Start flag - -**Database:** - -- `indexer` table - Last indexed block per chain -- `ddo` table - DDO documents -- `ddoState` table - Validation state -- `order` table - Order records -- `sqliteConfig` table - Node version - -### State Synchronization Issues - -1. **Dual Queues:** - - - `INDEXING_QUEUE` (parent) and `REINDEX_QUEUE` (worker) - - Can get out of sync - - No transaction - -2. **Last Block Updates:** - - - Updated after processing - - Not updated on error - - Can lead to gaps or duplicates - -3. **Job Status:** - - - Updated via `updateJobStatus()` - - Searches entire queue (O(n)) - - Can have duplicates - -4. **Thread Status:** - - `runningThreads` and `globalWorkers` can diverge - - No cleanup on crash - ---- - -## Observations & Pain Points - -### Complexity Issues - -1. **Mixed Concerns:** - - - Crawler thread handles: networking, validation, processing, state - - Hard to test individual components - - Changes affect multiple areas - -2. **Nested Logic:** - - - Validation logic deeply nested (80+ lines) - - Hard to read and maintain - - Error paths unclear - -3. **Long Methods:** - - `processNetworkData()` - 160+ lines - - `processChunkLogs()` - 120+ lines - - `decryptDDO()` - 400+ lines - - Hard to understand flow - -### Performance Issues - -1. **Serial Processing:** - - - Events processed one at a time - - No parallelization - - Slow for large chunks - -2. **Many RPC Calls:** - - - Receipt per metadata event - - Access list checks per validator - - NFT info, token info, pricing per event - - No batching - -3. **Database Calls:** - - One call per event - - No batching - - No transaction wrapping - -### Reliability Issues - -1. **Error Recovery:** - - - Multiple retry mechanisms - - Unclear recovery paths - - Can get stuck in loops - -2. **State Consistency:** - - - No transactions - - State can be inconsistent - - No rollback mechanism - -3. **Observability:** - - Only logs - - No metrics - - Hard to debug production issues - -### Testing Issues - -1. **Worker Threads:** - - - Hard to unit test - - Requires mocking Worker API - - Integration tests slow - -2. **Tight Coupling:** - - - Database calls throughout - - RPC calls in processors - - Hard to mock - -3. **Global State:** - - Tests can interfere - - Hard to isolate - - Flaky tests +- Silent failures (no logging at error level) +- No metrics on skipped events +- Difficult to diagnose why events are missing --- ## Summary -This document provides a comprehensive view of all indexer use cases, event monitoring mechanisms, and current flows. Key takeaways: - -### Architecture Overview - -1. **Current Implementation:** - - - Uses ChainIndexer classes (one per blockchain) - - Async/await architecture (no worker threads) - - Event-driven communication via EventEmitter - - Optimal for I/O-bound operations - -2. **Event Monitoring:** +### Event Monitoring Characteristics - - Continuous block scanning (30-second intervals) - - Filter-based event retrieval (topic hashes) - - 12 different event types supported - - Real-time processing and database updates +**Monitoring:** -3. **Event Processing Pipeline:** - - Event identification by topic hash - - Multi-layer validation (factory, metadata, publishers) - - Complex DDO decryption (HTTP, P2P, local) - - Rich metadata enrichment (NFT info, pricing, orders) - - Database persistence with state tracking +- Continuous polling every 30 seconds +- Processes 1-1000 blocks per iteration (adaptive) +- Filter-based event retrieval (12 event types) +- Per-chain monitoring (concurrent via async/await) -### Documentation Scope +**Processing:** -1. **10 Main Use Cases** covering: +- Sequential within chunk (maintains order) +- Multi-layer validation (factory → metadata → publisher) +- Complex DDO decryption (3 strategies: HTTP, P2P, local) +- Rich metadata enrichment (10-20 RPC calls per metadata event) - - Normal block crawling and indexing - - Event processing (metadata, orders, pricing) - - Admin operations (reindex tx, reindex chain) - - Error handling and recovery +**Performance:** -2. **Event Monitoring Deep Dive** showing: +- ~10-20 RPC calls per metadata event +- ~1-2 RPC calls per order/pricing event +- No batching (events processed one at a time) +- No parallelization within chunk - - How events are discovered on-chain - - Topic filtering and identification - - Detailed processing for each event type - - Database operations and state updates +### Error Handling Characteristics -3. **6 Detailed Flows** with: +**Retry Mechanisms:** - - Initialization and startup sequence - - Block crawling loop (ChainIndexer) - - Event processing pipeline - - Metadata event handling - - Reindex operations +- Layer 1: Startup (10 retries, recursive, checks DB) +- Layer 2: Adaptive chunk sizing (infinite, self-healing) +- Layer 3: Block processing (infinite, preserves integrity) +- Layer 4: DDO decryption (5 retries, exponential backoff) -4. **4 Retry Mechanisms** across: - - - Crawler startup (10 retries) - - Adaptive chunk sizing (infinite, recovers after 3 successes) - - Block processing retry (infinite, same chunk) - - Individual RPC retry (5 retries in decryptDDO) +**Issues:** -5. **5 Failure Scenarios** with: - - RPC provider failures - - Database unavailability - - Worker/indexer crashes (now ChainIndexer) - - Processing errors in handlers - - Reindex task failures +- No centralized retry strategy +- No circuit breaker pattern +- Silent failures on validation +- Infinite retries can cause hangs +- No error classification +- No metrics/observability -### Key Technical Insights +### Key Improvement Opportunities **Event Monitoring:** -- Uses `provider.getLogs()` with Ocean Protocol event topic filters -- Processes up to 1000 blocks per iteration (configurable) -- Adaptive chunk sizing on RPC failures -- Sequential processing within chunk (maintains order) - -**Event Processing:** - -- 12 event types with dedicated processors -- Complex validation: factory → metadata proof → access list → publisher -- DDO decryption: 3 strategies (HTTP, P2P, local) with retries -- Metadata enrichment: NFT info + token info + pricing + purgatory -- Database operations: create/update DDO + state tracking + order records - -**Concurrency Model:** - -- All chains indexed concurrently via async/await -- No worker threads (simpler, more maintainable) -- Leverages Node.js event loop for I/O operations -- Instance-based state (no global state between chains) - -### Next Steps for Meeting - -**Analysis Topics:** - -- Review event monitoring and processing flows -- Identify inconsistencies or implicit behavior -- Discuss validation complexity and optimization opportunities -- Evaluate error handling and retry strategies -- Consider batching and performance improvements - -**Improvement Areas:** - -- Serial event processing (no parallelization) -- Many RPC calls per metadata event (no batching) -- No database transaction wrapping -- Multiple retry mechanisms (uncoordinated) -- Complex nested validation logic - -**Refactoring Considerations:** - -- Separate validation from processing -- Extract DDO decryption service - Implement batch RPC calls -- Add circuit breaker pattern -- Introduce metrics and observability - ---- - -## Document Change Log - -**Version 2.0 - January 27, 2026:** +- Parallelize event processing (where safe) +- Add event prioritization +- Implement event queue -**Major Updates:** - -- ✅ Updated architecture from Worker Threads to ChainIndexer classes -- ✅ Replaced worker thread references with async/await architecture -- ✅ Added comprehensive "Event Monitoring Deep Dive" section (600+ lines) -- ✅ Detailed event handling for all 12 event types -- ✅ Updated all use cases to reflect current implementation -- ✅ Updated all flows with ChainIndexer lifecycle -- ✅ Renamed "Worker Threads & Concurrency" to "Async/Await Architecture & Concurrency" -- ✅ Enhanced summary with technical insights and improvement areas - -**New Content:** - -- Event discovery process with step-by-step breakdown -- Event identification and routing mechanism -- Detailed processing for METADATA_CREATED event (13 steps, 400+ lines) -- Detailed processing for METADATA_UPDATED event -- Detailed processing for ORDER_STARTED event -- Detailed processing for DISPENSER_ACTIVATED event -- Detailed processing for EXCHANGE_RATE_CHANGED event -- Event processing pipeline summary with visual diagram -- Performance characteristics (RPC patterns, concurrency model, failure recovery) -- Concurrency model explanation (why async/await works for I/O-bound workloads) - -**Documentation Focus:** -This document now provides deep technical insight into: +**Error Handling:** -1. How events are monitored on-chain (continuous polling, topic filtering) -2. What happens for each event type detected (validation → decryption → enrichment → storage) -3. Current implementation details (ChainIndexer, async/await, EventEmitter) -4. Pain points and improvement opportunities +- Centralize retry logic +- Add circuit breaker pattern +- Implement timeout mechanisms +- Add error classification (retryable vs permanent) +- Skip mechanism for bad events +- Metrics and alerting -**Target Audience:** +**Observability:** -- Development team preparing for refactoring -- New developers understanding the indexer -- Architecture review meeting participants -- Technical stakeholders evaluating improvements +- Track events processed/skipped/failed +- Monitor RPC health per provider +- Track processing latency +- Alert on persistent failures --- **Document Version:** 2.0 **Last Updated:** January 27, 2026 -**Status:** Ready for Meeting - Reflects Current Implementation +**Status:** Focused on Event Monitoring & Error Handling +**Word Count:** ~4,500 words (reduced from 12,000+) From 93399a321f8c2aa4f9fd0dfdaf39fcdbb53524bf Mon Sep 17 00:00:00 2001 From: Bogdan Fazakas Date: Tue, 27 Jan 2026 14:51:17 +0200 Subject: [PATCH 3/6] add more details --- INDEXER_USE_CASES_AND_FLOWS.md | 462 ++++++++++++++++++++++++++++++--- 1 file changed, 430 insertions(+), 32 deletions(-) diff --git a/INDEXER_USE_CASES_AND_FLOWS.md b/INDEXER_USE_CASES_AND_FLOWS.md index 2dad4027e..06fb9fc9f 100644 --- a/INDEXER_USE_CASES_AND_FLOWS.md +++ b/INDEXER_USE_CASES_AND_FLOWS.md @@ -191,16 +191,19 @@ Raw Blockchain Logs ## Detailed Event Handling -### A. METADATA_CREATED Event +### 1. METADATA_CREATED Event **Trigger:** New data asset published on-chain +**Processor:** `MetadataEventProcessor.ts` + **On-Chain Data:** - `owner` - Publisher address -- `flags` - Encryption/compression flags +- `flags` - Encryption/compression flags (bit 2 = encrypted) - `metadata` - Encrypted/compressed DDO - `metadataHash` - SHA256 hash of DDO +- `validateTime` - Timestamp **Processing Steps:** @@ -286,41 +289,128 @@ Raw Blockchain Logs └─> eventEmitter.emit(METADATA_CREATED, { chainId, data: ddo }) ``` -**RPC Calls Per Event:** +**RPC Calls:** ~10-20 (receipt, factory, NFT info, token info, pricing) + +--- + +### 2. METADATA_UPDATED Event + +**Trigger:** Asset metadata is updated on-chain + +**Processor:** `MetadataEventProcessor.ts` (same as METADATA_CREATED) + +**Processing:** **Similar to METADATA_CREATED** with these differences: + +``` +1-10. Same validation and processing as METADATA_CREATED + +11. RETRIEVE EXISTING DDO + └─> existingDdo = ddoDatabase.retrieve(did) + +12. MERGE DDO DATA + └─> Merge new metadata with existing: + ├─> Update: metadata, services, credentials + ├─> Preserve: existing order counts, pricing + ├─> Merge: pricing arrays (add new, keep existing) + └─> Update: indexedMetadata.event (new tx, block, datetime) + +13. UPDATE DATABASE + └─> ddoDatabase.update(mergedDdo) + ddoState.update(chainId, did, nftAddress, txId, valid=true) + +14. EMIT EVENT + └─> eventEmitter.emit(METADATA_UPDATED, { chainId, data: ddo }) +``` -- 1 call: transaction receipt -- 1+ calls: factory validation -- 1+ calls: NFT info (name, symbol, state, etc.) -- 1+ calls per datatoken: token info -- Multiple calls: pricing info (dispensers, exchanges) -- Optional: access list checks (1+ per validator) +**Key Difference:** Uses `update()` instead of `create()`, merges with existing data -**Total:** ~10-20 RPC calls per metadata event +**RPC Calls:** ~10-20 --- -### B. ORDER_STARTED Event +### 3. METADATA_STATE Event + +**Trigger:** Asset state changes (Active → Revoked/Deprecated or vice versa) -**Trigger:** Someone purchases access to a data asset +**Processor:** `MetadataStateEventProcessor.ts` + +**On-Chain Data:** + +- `metadataState` - New state value (0=Active, 1=End of Life, 2=Deprecated, 3=Revoked, etc.) **Processing Steps:** ``` 1. DECODE EVENT DATA - └─> Extract: consumer, payer, datatoken, amount, timestamp + └─> Extract: metadataState (integer) + +2. BUILD DID + └─> did = makeDid(nftAddress, chainId) + +3. RETRIEVE EXISTING DDO + └─> ddo = ddoDatabase.retrieve(did) + └─> If not found → log and skip + +4. CHECK STATE CHANGE + └─> Compare old state vs new state + + IF old=Active AND new=Revoked/Deprecated: + ├─> DDO becomes non-visible + ├─> Create short DDO (minimal version): + │ └─> { id, version: 'deprecated', chainId, nftAddress, + │ indexedMetadata: { nft: { state } } } + └─> Store short DDO + + ELSE: + └─> Update nft.state in existing DDO + +5. UPDATE DATABASE + └─> ddoDatabase.update(ddo) + +6. EMIT EVENT + └─> eventEmitter.emit(METADATA_STATE, { chainId, data: ddo }) +``` + +**Special Behavior:** When asset is revoked/deprecated, stores minimal DDO for potential future restoration + +**RPC Calls:** 1-2 (receipt, decode) + +--- + +### 4. ORDER_STARTED Event + +**Trigger:** Someone purchases/starts access to a data asset + +**Processor:** `OrderStartedEventProcessor.ts` + +**On-Chain Data:** + +- `consumer` - Buyer address +- `payer` - Payment source address +- `amount` - Amount paid +- `serviceId` - Service index +- `timestamp` - Order time + +**Processing Steps:** + +``` +1. DECODE EVENT DATA + └─> Extract: consumer, payer, amount, serviceIndex, timestamp 2. FIND NFT ADDRESS - └─> datatokenContract.getERC721Address() + └─> datatokenContract = getDtContract(signer, event.address) + nftAddress = datatokenContract.getERC721Address() 3. BUILD DID └─> did = makeDid(nftAddress, chainId) -4. RETRIEVE EXISTING DDO - └─> ddoDatabase.retrieve(did) +4. RETRIEVE DDO + └─> ddo = ddoDatabase.retrieve(did) + └─> If not found → log error, skip 5. UPDATE ORDER COUNT - └─> Find service by datatokenAddress - └─> Increment orders count + └─> Find service in ddo.indexedMetadata.stats by datatokenAddress + └─> Increment stat.orders += 1 6. CREATE ORDER RECORD └─> orderDatabase.create({ @@ -328,7 +418,7 @@ Raw Blockchain Logs timestamp, consumer, payer, - datatokenAddress, + datatokenAddress: event.address, nftAddress, did, startOrderId: txHash @@ -338,41 +428,349 @@ Raw Blockchain Logs └─> ddoDatabase.update(ddo) 8. EMIT EVENT - └─> eventEmitter.emit(ORDER_STARTED, { chainId, data }) + └─> eventEmitter.emit(ORDER_STARTED, { chainId, data: ddo }) +``` + +**RPC Calls:** 1-2 (get NFT address, receipt) + +--- + +### 5. ORDER_REUSED Event + +**Trigger:** Someone reuses an existing order for repeated access + +**Processor:** `OrderReusedEventProcessor.ts` + +**On-Chain Data:** + +- `startOrderId` - Reference to original order +- `payer` - Payment source (may differ from original) +- `timestamp` - Reuse time + +**Processing:** **Similar to ORDER_STARTED** with these differences: + +``` +1. DECODE EVENT DATA + └─> Extract: startOrderId, payer, timestamp + +2-5. Same as ORDER_STARTED (find NFT, get DDO, update count) + +6. RETRIEVE START ORDER + └─> startOrder = orderDatabase.retrieve(startOrderId) + └─> Need original order for consumer address + +7. CREATE REUSE ORDER RECORD + └─> orderDatabase.create({ + type: 'reuseOrder', + timestamp, + consumer: startOrder.consumer, // From original order + payer, // May be different + datatokenAddress: event.address, + nftAddress, + did, + startOrderId // Reference to original order + }) + +8-9. Same as ORDER_STARTED (update DDO, emit event) ``` -**RPC Calls:** 1-2 (get NFT address, possibly receipt) +**Key Difference:** Links to original order, may have different payer + +**RPC Calls:** 1-2 --- -### C. PRICING EVENTS (Dispenser/Exchange) +### 6. DISPENSER_CREATED Event -**Events:** DISPENSER_ACTIVATED, EXCHANGE_RATE_CHANGED, etc. +**Trigger:** New dispenser (free token distribution) is created + +**Processor:** `DispenserCreatedEventProcessor.ts` + +**On-Chain Data:** + +- `datatokenAddress` - Datatoken being dispensed +- `owner` - Dispenser owner +- `maxBalance` - Max tokens per user +- `maxTokens` - Max total tokens **Processing Steps:** ``` 1. DECODE EVENT DATA - └─> Extract event-specific data + └─> Extract: datatokenAddress, owner, maxBalance, maxTokens -2. FIND NFT ADDRESS - └─> Query datatoken contract +2. VALIDATE DISPENSER CONTRACT + └─> isValidDispenserContract(event.address, chainId) + └─> Check if dispenser is approved by Router + └─> If not → log warning, skip -3. RETRIEVE DDO - └─> ddoDatabase.retrieve(did) +3. FIND NFT ADDRESS + └─> datatokenContract.getERC721Address() -4. UPDATE PRICING ARRAY +4. RETRIEVE DDO + └─> ddo = ddoDatabase.retrieve(did) + +5. ADD DISPENSER TO PRICING └─> Find service by datatokenAddress - └─> Add/update price entry + └─> If dispenser doesn't exist in prices: + └─> prices.push({ + type: 'dispenser', + price: '0', // Free + contract: event.address, + token: datatokenAddress + }) + +6. UPDATE DDO + └─> ddoDatabase.update(ddo) + +7. EMIT EVENT + └─> eventEmitter.emit(DISPENSER_CREATED, { chainId, data: ddo }) +``` + +**RPC Calls:** 2-3 (receipt, validation, NFT address) + +--- + +### 7. DISPENSER_ACTIVATED Event + +**Trigger:** Dispenser is activated (enables token distribution) + +**Processor:** `DispenserActivatedEventProcessor.ts` + +**Processing:** **Similar to DISPENSER_CREATED** + +``` +1-5. Same validation and processing as DISPENSER_CREATED + +Key Addition: +- Checks if dispenser already exists before adding +- If already exists → skip (no duplicate entries) +``` + +**RPC Calls:** 2-3 + +--- + +### 8. DISPENSER_DEACTIVATED Event + +**Trigger:** Dispenser is deactivated (disables token distribution) + +**Processor:** `DispenserDeactivatedEventProcessor.ts` + +**On-Chain Data:** + +- `datatokenAddress` - Datatoken address + +**Processing:** + +``` +1. DECODE EVENT DATA + └─> Extract: datatokenAddress + +2. VALIDATE & RETRIEVE DDO + └─> Same as DISPENSER_CREATED + +3. REMOVE DISPENSER FROM PRICING + └─> Find service by datatokenAddress + └─> Find dispenser entry by contract address + └─> prices = prices.filter(p => p.contract !== event.address) + +4. UPDATE DDO + └─> ddoDatabase.update(ddo) + +5. EMIT EVENT + └─> eventEmitter.emit(DISPENSER_DEACTIVATED, { chainId, data: ddo }) +``` + +**Key Difference:** Removes dispenser entry instead of adding + +**RPC Calls:** 2-3 + +--- + +### 9. EXCHANGE_CREATED Event + +**Trigger:** New fixed-rate exchange is created for a datatoken + +**Processor:** `ExchangeCreatedEventProcessor.ts` + +**On-Chain Data:** + +- `exchangeId` - Unique exchange identifier +- `datatokenAddress` - Datatoken being sold +- `baseToken` - Payment token (e.g., USDC, DAI) +- `rate` - Exchange rate + +**Processing Steps:** + +``` +1. DECODE EVENT DATA + └─> Extract: exchangeId, datatokenAddress, baseToken, rate + +2. VALIDATE EXCHANGE CONTRACT + └─> isValidFreContract(event.address, chainId) + └─> Check if exchange is approved by Router + └─> If not → log error, skip + +3. FIND NFT ADDRESS + └─> datatokenContract.getERC721Address() + +4. RETRIEVE DDO + └─> ddo = ddoDatabase.retrieve(did) + +5. ADD EXCHANGE TO PRICING + └─> Find service by datatokenAddress + └─> If exchange doesn't exist in prices: + └─> prices.push({ + type: 'exchange', + price: rate, + contract: event.address, + token: baseToken, + exchangeId + }) + +6. UPDATE DDO + └─> ddoDatabase.update(ddo) + +7. EMIT EVENT + └─> eventEmitter.emit(EXCHANGE_CREATED, { chainId, data: ddo }) +``` + +**RPC Calls:** 2-3 + +--- + +### 10. EXCHANGE_ACTIVATED Event + +**Trigger:** Fixed-rate exchange is activated + +**Processor:** `ExchangeActivatedEventProcessor.ts` + +**Processing:** **Similar to EXCHANGE_CREATED** + +``` +1-5. Same validation and processing as EXCHANGE_CREATED + +Key Addition: +- Checks if exchange already exists before adding +- If already exists → skip (no duplicate entries) +``` + +**RPC Calls:** 2-3 + +--- + +### 11. EXCHANGE_DEACTIVATED Event + +**Trigger:** Fixed-rate exchange is deactivated + +**Processor:** `ExchangeDeactivatedEventProcessor.ts` + +**On-Chain Data:** + +- `exchangeId` - Exchange identifier + +**Processing:** + +``` +1. DECODE EVENT DATA + └─> Extract: exchangeId + +2. GET EXCHANGE DETAILS + └─> freContract.getExchange(exchangeId) + └─> Extract: datatokenAddress + +3. VALIDATE & RETRIEVE DDO + └─> Same as EXCHANGE_CREATED + +4. REMOVE EXCHANGE FROM PRICING + └─> Find service by datatokenAddress + └─> Find exchange entry by exchangeId + └─> prices = prices.filter(p => p.exchangeId !== exchangeId) 5. UPDATE DDO └─> ddoDatabase.update(ddo) 6. EMIT EVENT - └─> eventEmitter.emit(eventType, { chainId, data }) + └─> eventEmitter.emit(EXCHANGE_DEACTIVATED, { chainId, data: ddo }) ``` -**RPC Calls:** 1-2 +**Key Difference:** Removes exchange entry instead of adding + +**RPC Calls:** 2-3 + +--- + +### 12. EXCHANGE_RATE_CHANGED Event + +**Trigger:** Exchange rate is updated for a fixed-rate exchange + +**Processor:** `ExchangeRateChangedEventProcessor.ts` + +**On-Chain Data:** + +- `exchangeId` - Exchange identifier +- `newRate` - Updated exchange rate + +**Processing Steps:** + +``` +1. VALIDATE EXCHANGE CONTRACT + └─> isValidFreContract(event.address, chainId) + +2. DECODE EVENT DATA + └─> Extract: exchangeId, newRate + +3. GET EXCHANGE DETAILS + └─> freContract.getExchange(exchangeId) + └─> Extract: datatokenAddress + +4. RETRIEVE DDO + └─> ddo = ddoDatabase.retrieve(did) + +5. UPDATE EXCHANGE RATE + └─> Find service by datatokenAddress + └─> Find exchange entry by exchangeId + └─> price.price = newRate // Update in-place + +6. UPDATE DDO + └─> ddoDatabase.update(ddo) + +7. EMIT EVENT + └─> eventEmitter.emit(EXCHANGE_RATE_CHANGED, { chainId, data: ddo }) +``` + +**Key Difference:** Updates existing price instead of add/remove + +**RPC Calls:** 2-3 + +--- + +### Event Processing Summary + +**Metadata Events (3):** + +- METADATA_CREATED: Full validation + decryption + enrichment (~10-20 RPC calls) +- METADATA_UPDATED: Same as CREATED but merges with existing (~10-20 RPC calls) +- METADATA_STATE: Lightweight state update (~1-2 RPC calls) + +**Order Events (2):** + +- ORDER_STARTED: Update order count + create record (~1-2 RPC calls) +- ORDER_REUSED: Similar to STARTED, links to original order (~1-2 RPC calls) + +**Dispenser Events (3):** + +- DISPENSER_CREATED: Add pricing entry (~2-3 RPC calls) +- DISPENSER_ACTIVATED: Similar to CREATED (~2-3 RPC calls) +- DISPENSER_DEACTIVATED: Remove pricing entry (~2-3 RPC calls) + +**Exchange Events (4):** + +- EXCHANGE_CREATED: Add pricing entry (~2-3 RPC calls) +- EXCHANGE_ACTIVATED: Similar to CREATED (~2-3 RPC calls) +- EXCHANGE_DEACTIVATED: Remove pricing entry (~2-3 RPC calls) +- EXCHANGE_RATE_CHANGED: Update existing price (~2-3 RPC calls) --- From 735dc386992fd7e576bf9b94409e9b55016ce925 Mon Sep 17 00:00:00 2001 From: Bogdan Fazakas Date: Tue, 27 Jan 2026 14:59:25 +0200 Subject: [PATCH 4/6] cleanup --- INDEXER_USE_CASES_AND_FLOWS.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/INDEXER_USE_CASES_AND_FLOWS.md b/INDEXER_USE_CASES_AND_FLOWS.md index 06fb9fc9f..0609591f8 100644 --- a/INDEXER_USE_CASES_AND_FLOWS.md +++ b/INDEXER_USE_CASES_AND_FLOWS.md @@ -1,11 +1,5 @@ # Ocean Node Indexer - Event Monitoring & Error Handling -**Created:** January 2026 -**Purpose:** Event monitoring mechanisms and error handling for refactoring discussion -**Status:** Pre-Meeting Preparation Document - ---- - ## Table of Contents 1. [Overview](#overview) From 8b4a0141c3692bcb6405916b45cb9fa2d187b7e5 Mon Sep 17 00:00:00 2001 From: Bogdan Fazakas Date: Tue, 27 Jan 2026 19:11:52 +0200 Subject: [PATCH 5/6] updates --- INDEXER_USE_CASES_AND_FLOWS.md | 1269 +++++++++++++++++++++++++++++++- 1 file changed, 1262 insertions(+), 7 deletions(-) diff --git a/INDEXER_USE_CASES_AND_FLOWS.md b/INDEXER_USE_CASES_AND_FLOWS.md index 0609591f8..849290fd2 100644 --- a/INDEXER_USE_CASES_AND_FLOWS.md +++ b/INDEXER_USE_CASES_AND_FLOWS.md @@ -3,11 +3,562 @@ ## Table of Contents 1. [Overview](#overview) -2. [Event Monitoring Architecture](#event-monitoring-architecture) -3. [Event Processing Pipeline](#event-processing-pipeline) -4. [Detailed Event Handling](#detailed-event-handling) -5. [Error Handling & Retry Mechanisms](#error-handling--retry-mechanisms) -6. [Failure Scenarios & Recovery](#failure-scenarios--recovery) +2. [🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)](#-proposed-improvements-post-meeting-changes) +3. [Event Monitoring Architecture](#event-monitoring-architecture) +4. [Event Processing Pipeline](#event-processing-pipeline) +5. [Detailed Event Handling](#detailed-event-handling) +6. [Error Handling & Retry Mechanisms](#error-handling--retry-mechanisms) +7. [Failure Scenarios & Recovery](#failure-scenarios--recovery) + +--- + +## 🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes) + +> **Status:** Draft proposals from Jan 27, 2026 meeting +> **Goal:** Improve reliability, decoupling, and error handling + +### 1. 🎯 EVENT-LEVEL RETRY MECHANISM WITH QUEUES + +**Current Issue:** Retry logic is deeply embedded in event processing steps (e.g., inside DDO decryption) + +**Proposed Change:** + +- **Move retry logic to event level** (not deep inside processing steps) +- **Implement queue-based retry system** for all 12 event types +- **Decouple retry from specific operations** (e.g., decrypt, p2p, HTTP) + +**Implementation:** + +``` +┌─────────────────────────────────────────────────────────────┐ +│ EVENT PROCESSING QUEUE │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ Event Detected → Add to Queue │ +│ ↓ │ +│ Queue Processor (async workers) │ +│ ↓ │ +│ Process Event │ +│ ├─ Success → Mark complete, update DB │ +│ └─ Failure → Add to Retry Queue with backoff │ +│ │ +│ Retry Queue (exponential backoff): │ +│ - Retry 1: ~10 seconds │ +│ - Retry 2: ~1 minute │ +│ - Retry 3: ~10 minutes │ +│ - Retry 4: ~1 hour │ +│ - Retry 5: ~1 week (final attempt) │ +│ │ +│ Benefits: │ +│ ✓ Non-blocking (doesn't halt indexer) │ +│ ✓ Works for ALL error types (HTTP, P2P, RPC, DB) │ +│ ✓ Configurable per event type │ +│ ✓ Visible retry state in monitoring │ +└─────────────────────────────────────────────────────────────┘ +``` + +**Applies to:** All event processors, especially METADATA_CREATED/UPDATED (DDO decryption) + +--- + +### 2. 🗄️ NEW DATABASE INDEX: `ddo_logs` + +**Current Issue:** + +- `ddoState` only tracks metadata events +- Order and pricing events have no error tracking +- No unified view of all DDO-related events + +**Proposed Change:** + +- Create new DB index: **`ddo_logs`** +- Store **all events** related to a DID (metadata, orders, pricing) +- Similar structure to `ddoState` but broader scope + +**Schema:** + +```typescript +interface DdoLog { + did: string // Indexed + chainId: number // Indexed + eventType: string // METADATA_CREATED, ORDER_STARTED, etc. + eventHash: string // Event signature hash + txHash: string // Transaction hash + blockNumber: number // Block number + timestamp: number // Event timestamp + status: 'success' | 'failed' | 'retrying' + error?: string // Error message if failed + retryCount: number // Number of retry attempts + lastRetry?: number // Timestamp of last retry + metadata?: Record // Event-specific data +} +``` + +**Benefits:** + +- Single source of truth for all DDO events +- Easier debugging (see all events for a DID) +- Track pricing/order event errors (not just metadata) +- Audit trail for compliance + +--- + +### 3. 🔄 REPLACE EventEmitter WITH QUEUES + +**Current Issue:** + +- Using `EventEmitter` for communication +- Synchronous, blocking behavior +- No retry/replay capability +- Difficult to test + +**Proposed Change:** + +- Replace `EventEmitter` with **persistent queue system** +- Use queue for: + - ✓ Newly indexed assets (instead of `eventEmitter.emit()`) + - ✓ Reindex requests (block & transaction level) + - ✓ Admin commands + +**Queue Types:** + +``` +1. EVENT_PROCESSING_QUEUE (primary) + - New events from blockchain + - Priority: FIFO with retry backoff + +2. REINDEX_QUEUE (existing, enhance) + - Block-level reindex + - Transaction-level reindex + - Priority: Admin requests > Auto-retry + +3. ORDER_QUEUE (new) + - Store orders even if DDO not found + - Process when DDO becomes available +``` + +**Benefits:** + +- Testable (can inject mock queue) +- Observable (queue depth, retry counts) +- Resilient (survives crashes) +- Decoupled (no tight coupling between components) + +--- + +### 4. 📦 HANDLE MISSING DDO IN ORDER/PRICING EVENTS + +**Current Issue:** + +- If DDO not found → skip order/pricing event +- Lost data if DDO indexed later + +**Proposed Change:** + +**For ORDER_STARTED/ORDER_REUSED:** + +``` +IF DDO not found: + 1. Create order record anyway (don't skip step 6) + 2. Store in database with status: 'orphaned' + 3. Add DDO processing to watch queue + 4. Skip only: step 5 (update count), step 7 (update DDO) + 5. When DDO indexed → process orphaned orders +``` + +**For PRICING EVENTS (Dispenser/Exchange):** + +``` +IF DDO not found: + 1. Check if DDO is in processing queue + 2. If yes → add pricing event to queue (process after DDO) + 3. If no → log to ddo_logs with error state + 4. Store pricing event data for future reconciliation +``` + +**Benefits:** + +- No data loss +- Can reconcile later +- Better observability + +--- + +### 5. 🚫 MOVE RETRY LOGIC TO ChainIndexer (Block Only That Chain) + +**Current Issue:** + +- Crawler startup retry in `OceanIndexer` +- Failure blocks **entire node** (all chains) + +**Proposed Change:** + +- Move `retryCrawlerWithDelay()` → **ChainIndexer** +- Each chain fails independently +- Other chains continue indexing + +**Implementation:** + +```typescript +// ChainIndexer.ts +async start() { + let retries = 0 + const maxRetries = 10 + + while (retries < maxRetries) { + try { + await this.initializeConnection() // RPC + DB + await this.indexLoop() + break + } catch (error) { + retries++ + const delay = Math.min(retries * 3000, 30000) + INDEXER_LOGGER.error( + `Chain ${this.blockchain.chainId} failed, retry ${retries}/${maxRetries} in ${delay}ms` + ) + await sleep(delay) + } + } + + if (retries === maxRetries) { + this.eventEmitter.emit('chain_failed', { + chainId: this.blockchain.chainId, + error: 'Max retries exceeded' + }) + } +} +``` + +**Benefits:** + +- Resilient multi-chain indexing +- One bad RPC doesn't kill everything +- Easier debugging (per-chain logs) + +--- + +### 6. 📍 BLOCK RETRY QUEUE IMPROVEMENTS + +**Current Issue:** + +- Failed block retried, but `lastIndexedBlock` not updated +- Same block retried indefinitely +- No expiry/max retry limit + +**Proposed Change:** + +``` +When block added to retry queue: + 1. Update lastIndexedBlock (move forward) + 2. Add block to retry queue with metadata: + - blockNumber + - retryCount (starts at 0) + - maxRetries (default: 5) + - lastError + - expiryDate (when to give up) + 3. Process retry queue separately (exponential backoff) + 4. If maxRetries exceeded → log to failed_blocks table +``` + +**Retry Queue Schema:** + +```typescript +interface BlockRetryTask { + chainId: number + blockNumber: number + retryCount: number + maxRetries: number + lastError: string + lastRetryAt: number + expiryDate: number // Timestamp when to stop retrying + events: string[] // Event hashes to reprocess +} +``` + +**Benefits:** + +- Indexer moves forward (doesn't get stuck) +- Failed blocks retried in background +- Clear failure tracking + +--- + +### 7. 🌐 REMOVE ECONNREFUSED-ONLY CONDITION + +**Current Issue:** + +- Retry only on `ECONNREFUSED` error +- Other errors (timeout, 500, p2p failures) not retried + +**Proposed Change:** + +- With event-level retry, **retry ALL error types**: + - ✓ RPC errors (timeout, 500, 429 rate limit) + - ✓ HTTP errors (decrypt service down) + - ✓ P2P errors (peer unreachable) + - ✓ Database errors (temp unavailable) + - ✓ Validation errors (maybe retryable) + +**Implementation:** + +```typescript +// Classify errors +enum ErrorType { + RETRYABLE_RPC = 'retryable_rpc', + RETRYABLE_HTTP = 'retryable_http', + RETRYABLE_P2P = 'retryable_p2p', + RETRYABLE_DB = 'retryable_db', + NON_RETRYABLE = 'non_retryable' +} + +function classifyError(error: Error): ErrorType { + if (error.code === 'ECONNREFUSED') return ErrorType.RETRYABLE_RPC + if (error.code === 'ETIMEDOUT') return ErrorType.RETRYABLE_RPC + if (error.message.includes('429')) return ErrorType.RETRYABLE_RPC + if (error.message.includes('P2P')) return ErrorType.RETRYABLE_P2P + if (error.message.includes('decrypt')) return ErrorType.RETRYABLE_HTTP + if (error.message.includes('factory')) return ErrorType.NON_RETRYABLE + return ErrorType.RETRYABLE_RPC // Default to retryable +} +``` + +--- + +### 8. ✅ UPDATE TESTS + +**Required Test Updates:** + +- Remove tests checking `EventEmitter` behavior +- Add tests for queue-based processing +- Add tests for retry with exponential backoff +- Add tests for orphaned orders +- Add tests for per-chain failure isolation +- Add tests for `ddo_logs` index +- Add tests for block retry with expiry + +--- + +### Summary Table + +| # | Change | Current Pain | Benefit | Effort | Priority | +| --- | --------------------------------------------- | --------------------------------- | ------------------------------------ | ------ | ----------- | +| 1 | Event-level retry + queues | Retry logic scattered, blocking | Unified, non-blocking, testable | High | 🔴 Critical | +| 2 | `ddo_logs` DB index | No order/pricing error tracking | Full audit trail, debugging | Medium | 🟡 High | +| 3 | Replace EventEmitter with queues | Blocking, not testable, no replay | Observable, resilient, testable | High | 🔴 Critical | +| 4 | Handle missing DDO (orphaned orders) | Lost orders/pricing data | No data loss, reconciliation | Medium | 🟡 High | +| 5 | Per-chain startup retry (ChainIndexer) | One failure kills entire node | Isolated failures, resilient | Low | 🔴 Critical | +| 6 | Block retry queue with expiry | Indexer stuck on bad blocks | Progress continues, background retry | Medium | 🟡 High | +| 7 | Retry ALL error types (not just ECONNREFUSED) | P2P/timeout/429 not retried | Comprehensive error handling | Low | 🟡 High | +| 8 | Update tests | Tests assume old architecture | Tests match new architecture | Medium | 🟢 Medium | + +--- + +### Migration Roadmap + +#### Phase 1: Foundation (Weeks 1-2) 🔴 Critical + +**Goal:** Establish queue infrastructure and database schema + +**Tasks:** + +1. Create database tables: + + - `event_queue` (new events) + - `event_retry_queue` (failed events) + - `ddo_logs` (all DDO-related events) + - `block_retry_queue` (failed blocks) + - `failed_blocks` (permanent failures) + - `dead_letter_queue` (max retries exceeded) + +2. Implement queue system: + + - `EventQueue` class (persistent queue) + - `EventQueueProcessor` class (worker pool) + - `EventRetryProcessor` class (background retries) + +3. Add error classification: + - `ErrorType` enum + - `classifyError()` function + - `isRetryable()` logic + +**Deliverables:** + +- Database migrations +- Queue infrastructure code +- Unit tests for queue operations + +--- + +#### Phase 2: Per-Chain Isolation (Week 3) 🔴 Critical + +**Goal:** Prevent one bad chain from killing entire node + +**Tasks:** + +1. Move `retryCrawlerWithDelay()` from `OceanIndexer` → `ChainIndexer.start()` +2. Add per-chain retry counters +3. Emit `chain_startup_failed` event (don't crash node) +4. Update `OceanIndexer.startThread()` to handle chain failures gracefully + +**Deliverables:** + +- Updated `ChainIndexer.start()` with retry logic +- Tests for chain isolation +- Monitoring for failed chains + +--- + +#### Phase 3: Event-Level Retry (Weeks 4-5) 🔴 Critical + +**Goal:** Replace embedded retry with queue-based system + +**Tasks:** + +1. Update all 12 event processors: + + - Remove `withRetrial()` calls + - Remove ECONNREFUSED checks + - Just process, let queue handle retries + +2. Update `ChainIndexer.indexLoop()`: + + - Replace `eventEmitter.emit()` → `eventQueue.enqueue()` + - Process events via `EventQueueProcessor` + +3. Implement exponential backoff: + + - 10s → 1min → 10min → 1hr → 1 week + +4. Log all events to `ddo_logs`: + - Success, failure, retrying states + - Track retryCount, error messages + +**Deliverables:** + +- Refactored event processors (12 files) +- Queue-based event processing +- Tests for retry logic + +--- + +#### Phase 4: Block Retry Queue (Week 6) 🟡 High + +**Goal:** Indexer continues even with failed blocks + +**Tasks:** + +1. Implement `addBlockToRetryQueue()` +2. Update `indexLoop()` error handling: + - Add failed block to queue + - Still update `lastIndexedBlock` (move forward!) +3. Implement `processBlockRetryQueue()` (background loop) +4. Add expiry logic (maxRetries, expiryDate) +5. Move permanent failures to `failed_blocks` table + +**Deliverables:** + +- Block retry queue processor +- Background retry loop +- Tests for block retry + +--- + +#### Phase 5: Handle Missing DDO (Week 7) 🟡 High + +**Goal:** No data loss for orphaned orders/pricing + +**Tasks:** + +1. Update ORDER_STARTED/ORDER_REUSED: + + - Create order record even if DDO not found + - Store as 'orphaned' status + - Add to watch queue + +2. Update pricing events (Dispenser/Exchange): + + - Check if DDO in processing queue + - If yes → add pricing event to queue + - If no → log to `ddo_logs` with error + +3. Implement reconciliation job: + - Periodically check for orphaned orders + - Process when DDO becomes available + +**Deliverables:** + +- Orphaned order handling +- Pricing event queue logic +- Reconciliation job + +--- + +#### Phase 6: Testing & Monitoring (Week 8) 🟢 Medium + +**Goal:** Comprehensive tests and observability + +**Tasks:** + +1. Update existing tests: + + - Remove EventEmitter assertions + - Add queue-based assertions + +2. Add integration tests: + + - Full retry flow (10s → 1 week) + - Chain isolation (one chain fails) + - Block retry queue + - Orphaned orders + +3. Add monitoring dashboard: + + - Queue depth (event, retry, block) + - Retry counts by error type + - Dead letter queue size + - Per-chain health + +4. Add alerting: + - Dead letter queue growing + - Chain startup failures + - High retry queue depth + +**Deliverables:** + +- Full test suite +- Monitoring dashboard +- Alerting rules + +--- + +### Expected Outcomes + +**Reliability:** + +- ✅ No single point of failure (per-chain isolation) +- ✅ Graceful degradation (some chains fail, others continue) +- ✅ No data loss (orphaned orders, retry queue) +- ✅ Progress continues (failed blocks don't block indexer) + +**Observability:** + +- ✅ Full audit trail (`ddo_logs` for all events) +- ✅ Visible retry state (queue depths, retry counts) +- ✅ Clear failure tracking (dead letter queue, failed_blocks) +- ✅ Per-chain health monitoring + +**Maintainability:** + +- ✅ Unified retry logic (no scattered code) +- ✅ Testable (queues can be mocked) +- ✅ Configurable (retry counts, backoffs) +- ✅ Decoupled (event processors just process) + +**Performance:** + +- ✅ Non-blocking (retries don't halt indexer) +- ✅ Concurrent processing (worker pool) +- ✅ Exponential backoff (reduces RPC load) --- @@ -228,10 +779,24 @@ Raw Blockchain Logs │ │ - Payload: { transactionId, chainId, signature, nonce } │ │ - Timeout: 30 seconds │ │ - Retry: up to 5 times (withRetrial) + │ │ + │ │ ⚠️ PROPOSED CHANGE: + │ │ └─> Use exponential backoff (10s → 1min → 10min → 1hr → 1 week) + │ │ └─> Non-blocking retry using queue mechanism + │ │ │ ├─> P2P: p2pNode.sendTo(decryptorURL, message) + │ │ + │ │ ⚠️ PROPOSED CHANGE: + │ │ └─> Add retry mechanism for P2P connections + │ │ │ ├─> Local: node.getCoreHandlers().handle(decryptDDOTask) │ └─> Validate response hash matches metadataHash │ + ⚠️ PROPOSED ARCHITECTURAL CHANGE: + │ └─> Move retry to EVENT LEVEL (decouple from decrypt) + │ └─> Always update ddo_logs (success or error) + │ └─> For retried DDOs: Get order count from DB (not from old DDO) + │ └─> IF COMPRESSED (flag & 2 == 0): └─> Parse directly: JSON.parse(toUtf8String(getBytes(metadata))) @@ -285,6 +850,33 @@ Raw Blockchain Logs **RPC Calls:** ~10-20 (receipt, factory, NFT info, token info, pricing) +**⚠️ PROPOSED IMPROVEMENTS:** + +``` +┌─────────────────────────────────────────────────────────────┐ +│ METADATA_CREATED/UPDATED IMPROVEMENTS │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ 1. Replace EventEmitter with Queue System │ +│ - Use persistent queue instead of eventEmitter.emit() │ +│ - Better for testing and observability │ +│ │ +│ 2. Event-Level Retry (not deep in decryption) │ +│ - Queue-based retry with exponential backoff │ +│ - Non-blocking (doesn't halt indexer) │ +│ - Works for ALL error types (HTTP, P2P, RPC, DB) │ +│ │ +│ 3. Always Update ddo_logs Index │ +│ - Log success and failures │ +│ - Track: eventHash, txHash, blockNumber, retryCount │ +│ │ +│ 4. For Retried DDOs │ +│ - Recalculate order count from DB (not from old DDO) │ +│ - Query: SELECT COUNT(*) FROM orders WHERE did = ? │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + --- ### 2. METADATA_UPDATED Event @@ -320,6 +912,8 @@ Raw Blockchain Logs **RPC Calls:** ~10-20 +**⚠️ PROPOSED IMPROVEMENTS:** Same as METADATA_CREATED (see above) + --- ### 3. METADATA_STATE Event @@ -401,6 +995,9 @@ Raw Blockchain Logs 4. RETRIEVE DDO └─> ddo = ddoDatabase.retrieve(did) └─> If not found → log error, skip + ⚠️ PROPOSED: Don't skip! Go to step 6 (create order), skip only 5 & 7 + - Store order as 'orphaned' in DB + - Process when DDO becomes available __> go to 6 create order store and skip only step 5, 7 5. UPDATE ORDER COUNT └─> Find service in ddo.indexedMetadata.stats by datatokenAddress @@ -423,10 +1020,17 @@ Raw Blockchain Logs 8. EMIT EVENT └─> eventEmitter.emit(ORDER_STARTED, { chainId, data: ddo }) + ⚠️ PROPOSED: Replace EventEmitter with queue-based system ``` **RPC Calls:** 1-2 (get NFT address, receipt) +**⚠️ PROPOSED IMPROVEMENTS:** + +- Store orders even if DDO not found (orphaned orders) +- Log to `ddo_logs` index (not just ddoState) +- Add to ORDER_QUEUE for later processing + --- ### 5. ORDER_REUSED Event @@ -466,6 +1070,7 @@ Raw Blockchain Logs }) 8-9. Same as ORDER_STARTED (update DDO, emit event) + ⚠️ PROPOSED: Same improvements as ORDER_STARTED ``` **Key Difference:** Links to original order, may have different payer @@ -497,12 +1102,20 @@ Raw Blockchain Logs └─> isValidDispenserContract(event.address, chainId) └─> Check if dispenser is approved by Router └─> If not → log warning, skip + ⚠️ PROPOSED: Don't just skip! + - Log to `ddo_logs` index with error state + - Store: eventHash, txHash, blockNumber + - Create unified error handler for pricing events + - Keep all errors related to a DID in one place, --> add somethning similar to ddo state but for pricing errors and a handler + └─> maybe some logs and add all errors related to a did in a place keep one handler + └─> store in the logs the event hash and tx hash and block number 3. FIND NFT ADDRESS └─> datatokenContract.getERC721Address() 4. RETRIEVE DDO └─> ddo = ddoDatabase.retrieve(did) + └─> if not found -> check queue for ddo if found add to queue as well else skip applicable to all events 5. ADD DISPENSER TO PRICING └─> Find service by datatokenAddress @@ -519,10 +1132,17 @@ Raw Blockchain Logs 7. EMIT EVENT └─> eventEmitter.emit(DISPENSER_CREATED, { chainId, data: ddo }) + ⚠️ PROPOSED: Replace EventEmitter with queue-based system ``` **RPC Calls:** 2-3 (receipt, validation, NFT address) +**⚠️ PROPOSED IMPROVEMENTS:** (applies to all pricing events) + +- Log all events to `ddo_logs` index +- Handle missing DDO with queue mechanism +- Unified error handler for pricing events + --- ### 7. DISPENSER_ACTIVATED Event @@ -770,7 +1390,7 @@ Key Addition: ## Error Handling & Retry Mechanisms -### Overview: 4 Retry Layers +### Overview: 4 Retry Layers (Current) The indexer has 4 different retry mechanisms at different levels: @@ -783,6 +1403,8 @@ The indexer has 4 different retry mechanisms at different levels: │ Interval: max(fallbackRPCs.length * 3000, 5000) ms │ │ Strategy: Recursive retry with fallback RPCs │ │ Checks: Network ready + DB reachable │ +│ │ +│ ⚠️ ISSUE: Failure blocks ENTIRE NODE (all chains) │ └──────────────────────────────────────────────────────────────┘ ↓ ┌──────────────────────────────────────────────────────────────┐ @@ -801,6 +1423,8 @@ The indexer has 4 different retry mechanisms at different levels: │ Max Retries: Infinite │ │ Strategy: Don't update lastBlock, retry same chunk │ │ Backoff: Sleep for interval (30s) before retry │ +│ │ +│ ⚠️ ISSUE: Indexer stuck on failed block, no progress │ └──────────────────────────────────────────────────────────────┘ ↓ ┌──────────────────────────────────────────────────────────────┐ @@ -810,13 +1434,87 @@ The indexer has 4 different retry mechanisms at different levels: │ Max Retries: 5 │ │ Strategy: Exponential backoff │ │ Conditions: Only retry on ECONNREFUSED │ +│ │ +│ ⚠️ ISSUE: Only HTTP, not P2P/other errors, blocking │ +└──────────────────────────────────────────────────────────────┘ +``` + +--- + +### 🔴 PROPOSED: New Retry Architecture + +``` +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 1: Per-Chain Startup Retry (MOVED TO ChainIndexer) │ +│ Location: ChainIndexer - start() │ +│ Scope: Initial RPC/DB connection PER CHAIN │ +│ Max Retries: 10 │ +│ Interval: Progressive (3s, 6s, 9s, ... 30s max) │ +│ Strategy: Each chain retries independently │ +│ │ +│ ✅ BENEFIT: One bad RPC doesn't kill entire node │ +│ ✅ BENEFIT: Other chains continue indexing │ +└──────────────────────────────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 2: Adaptive Chunk Sizing (UNCHANGED) │ +│ Location: ChainIndexer - indexLoop() │ +│ Scope: RPC getLogs() failures │ +│ Max Retries: Infinite (until success or stop) │ +│ Strategy: Halve chunk size on error (min: 1 block) │ +│ Recovery: Revert to original after 3 successes │ +└──────────────────────────────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 3: Block Retry Queue (ENHANCED) │ +│ Location: ChainIndexer - processBlockRetryQueue() │ +│ Scope: Failed blocks │ +│ Max Retries: 5 per block │ +│ Strategy: │ +│ 1. Failed block → add to retry queue │ +│ 2. UPDATE lastIndexedBlock (move forward!) │ +│ 3. Add expiry: maxRetries & expiryDate │ +│ 4. Process retry queue separately (background) │ +│ 5. Exponential backoff per block │ +│ │ +│ ✅ BENEFIT: Indexer doesn't get stuck │ +│ ✅ BENEFIT: Failed blocks retried in background │ +│ ✅ BENEFIT: Clear failure tracking (failed_blocks table) │ +└──────────────────────────────────────────────────────────────┘ + ↓ +┌──────────────────────────────────────────────────────────────┐ +│ LAYER 4: Event-Level Retry Queue (NEW!) │ +│ Location: EventQueueProcessor (new component) │ +│ Scope: ALL event processing errors │ +│ Max Retries: 5 per event │ +│ Strategy: Queue-based with exponential backoff │ +│ - Retry 1: ~10 seconds │ +│ - Retry 2: ~1 minute │ +│ - Retry 3: ~10 minutes │ +│ - Retry 4: ~1 hour │ +│ - Retry 5: ~1 week (final) │ +│ │ +│ Retry ALL error types: │ +│ ✅ HTTP errors (decrypt service) │ +│ ✅ P2P errors (peer unreachable) │ +│ ✅ RPC errors (timeout, 500, 429) │ +│ ✅ DB errors (temp unavailable) │ +│ ✅ Validation errors (if retryable) │ +│ │ +│ ✅ BENEFIT: Non-blocking, unified retry logic │ +│ ✅ BENEFIT: Removes ECONNREFUSED-only condition │ +│ ✅ BENEFIT: Decoupled from processing logic │ └──────────────────────────────────────────────────────────────┘ ``` ### Layer 1: Startup Retry +#### Current Implementation + **Purpose:** Ensure RPC and DB are reachable before starting indexer +**Location:** `OceanIndexer.retryCrawlerWithDelay()` + **Code Flow:** ```typescript @@ -856,6 +1554,83 @@ async retryCrawlerWithDelay(blockchain: Blockchain, interval = 5000) { - Checks both RPC and DB connectivity - Tries fallback RPCs if available +**⚠️ ISSUE:** If one chain fails, **entire node stops** (all chains blocked) + +--- + +#### 🔴 PROPOSED: Move to ChainIndexer + +**New Location:** `ChainIndexer.start()` + +**Benefits:** + +- ✅ Per-chain isolation (one bad chain doesn't kill others) +- ✅ Independent retry counters per chain +- ✅ Better error visibility (which chain failed) +- ✅ Graceful degradation (continue with working chains) + +**Proposed Code:** + +```typescript +// ChainIndexer.ts +export class ChainIndexer { + private maxStartupRetries = 10 + private startupRetryCount = 0 + + async start(): Promise { + while (this.startupRetryCount < this.maxStartupRetries) { + try { + // Initialize RPC connection + await this.initializeRpcConnection() + + // Check DB connectivity + const dbActive = await this.checkDatabaseConnection() + if (!dbActive) { + throw new Error('Database not reachable') + } + + // Start indexing loop + INDEXER_LOGGER.info(`Chain ${this.blockchain.chainId} started successfully`) + await this.indexLoop() + return true + } catch (error) { + this.startupRetryCount++ + const delay = Math.min(this.startupRetryCount * 3000, 30000) + + INDEXER_LOGGER.error( + `Chain ${this.blockchain.chainId} startup failed ` + + `(attempt ${this.startupRetryCount}/${this.maxStartupRetries}), ` + + `retry in ${delay}ms: ${error.message}` + ) + + if (this.startupRetryCount < this.maxStartupRetries) { + await sleep(delay) + // Try next fallback RPC if available + this.rotateToNextRpc() + } + } + } + + // Max retries exceeded + INDEXER_LOGGER.error( + `Chain ${this.blockchain.chainId} failed after ${this.maxStartupRetries} retries` + ) + this.eventEmitter.emit('chain_startup_failed', { + chainId: this.blockchain.chainId, + error: 'Max startup retries exceeded' + }) + return false + } +} +``` + +**Migration Steps:** + +1. Move retry logic from `OceanIndexer` → `ChainIndexer` +2. Update `OceanIndexer.startThread()` to handle per-chain failures +3. Add monitoring for failed chains +4. Update tests to verify chain isolation + --- ### Layer 2: Adaptive Chunk Sizing @@ -952,6 +1727,135 @@ try { **Critical:** This ensures no events are lost even if processing fails +**⚠️ ISSUE:** Indexer gets **stuck** on a failed block, no progress + +--- + +#### 🔴 PROPOSED: Block Retry Queue with Expiry + +**Key Changes:** + +1. **Update `lastIndexedBlock` even on failure** (move forward!) +2. Add failed block to retry queue (process separately) +3. Add expiry: maxRetries & expiryDate per block +4. Background processor for retry queue + +**Proposed Code:** + +```typescript +interface BlockRetryTask { + chainId: number + blockNumber: number + retryCount: number + maxRetries: number // Default: 5 + lastError: string + lastRetryAt: number + expiryDate: number // e.g., 1 week from first failure + events: ethers.Log[] // Events in this block +} + +// In indexLoop() +try { + processedBlocks = await processBlocks(...) + + // UPDATE last indexed block on success + currentBlock = await updateLastIndexedBlockNumber( + processedBlocks.lastBlock, + lastIndexedBlock + ) + + emitNewlyIndexedAssets(processedBlocks.foundEvents) + +} catch (error) { + INDEXER_LOGGER.error(`Processing block ${startBlock} failed: ${error.message}`) + + // NEW: Add to retry queue + await this.addBlockToRetryQueue({ + chainId: this.blockchain.chainId, + blockNumber: startBlock, + retryCount: 0, + maxRetries: 5, + lastError: error.message, + lastRetryAt: Date.now(), + expiryDate: Date.now() + (7 * 24 * 60 * 60 * 1000), // 1 week + events: chunkEvents + }) + + // NEW: Still update lastIndexedBlock (move forward!) + currentBlock = await updateLastIndexedBlockNumber( + processedBlocks?.lastBlock || startBlock, + lastIndexedBlock + ) + + // Indexer continues to next block +} + +// Background processor (separate async loop) +async processBlockRetryQueue() { + while (!this.stopSignal) { + const retryTasks = await this.getRetryTasksDue() + + for (const task of retryTasks) { + if (task.retryCount >= task.maxRetries || Date.now() > task.expiryDate) { + // Max retries or expired → move to failed_blocks table + await this.moveToFailedBlocks(task) + continue + } + + try { + // Retry processing + const processed = await processBlocks( + task.events, + this.signer, + this.provider, + task.chainId, + task.blockNumber, + 1 + ) + + // Success → remove from retry queue + await this.removeFromRetryQueue(task) + INDEXER_LOGGER.info(`Block ${task.blockNumber} retry succeeded`) + + } catch (error) { + // Failed again → update retry count with exponential backoff + task.retryCount++ + task.lastError = error.message + task.lastRetryAt = Date.now() + + // Exponential backoff: 1min, 10min, 1hr, 12hr, 1day + const backoffs = [60000, 600000, 3600000, 43200000, 86400000] + const nextRetryDelay = backoffs[task.retryCount - 1] || 86400000 + + await this.updateRetryTask(task, nextRetryDelay) + INDEXER_LOGGER.warn( + `Block ${task.blockNumber} retry ${task.retryCount}/${task.maxRetries} failed, ` + + `next retry in ${nextRetryDelay / 1000}s` + ) + } + } + + await sleep(10000) // Check every 10 seconds + } +} +``` + +**Benefits:** + +- ✅ Indexer no longer stuck on bad blocks +- ✅ Failed blocks retried in background with exponential backoff +- ✅ Clear failure tracking (`failed_blocks` table) +- ✅ Configurable retry limits +- ✅ Progress continues even with some failures + +**Migration Steps:** + +1. Add `blockRetryQueue` table to database +2. Add `failed_blocks` table for permanent failures +3. Implement `processBlockRetryQueue()` background loop +4. Update `indexLoop()` to add failures to queue +5. Add monitoring dashboard for retry queue + --- ### Layer 4: DDO Decryption Retry @@ -1031,9 +1935,360 @@ async function withRetrial(fn: () => Promise, maxRetries = 5): Promise - Retries 5XX errors (server errors) - 30-second timeout per attempt +**⚠️ ISSUES:** + +1. **Only retries ECONNREFUSED** (not P2P, timeouts, 429, etc.) +2. **Blocking** (stops processing during retries) +3. **Embedded in decryption logic** (not reusable) +4. **Short retry window** (16s total, not enough for service outages) + +--- + +### 🔴 PROPOSED: Layer 4 - Event-Level Retry Queue (NEW!) + +**Purpose:** Unified, non-blocking retry for ALL event processing errors + +**Key Concept:** Move retry logic OUT of event processors and INTO a queue-based system + +#### Architecture + +``` +┌──────────────────────────────────────────────────────────────┐ +│ EVENT PROCESSING FLOW │ +├──────────────────────────────────────────────────────────────┤ +│ │ +│ Blockchain Event Detected │ +│ ↓ │ +│ Add to EVENT_QUEUE │ +│ ↓ │ +│ EventQueueProcessor (async worker pool) │ +│ ├─ SUCCESS → Log to ddo_logs (status: success) │ +│ │ Update DB │ +│ │ Remove from queue │ +│ │ │ +│ └─ FAILURE → Log to ddo_logs (status: failed) │ +│ Classify error (retryable?) │ +│ Add to EVENT_RETRY_QUEUE │ +│ │ +│ EventRetryProcessor (background loop) │ +│ ├─ Get tasks due for retry │ +│ ├─ Check: retryCount < maxRetries │ +│ ├─ Check: Date.now() < expiryDate │ +│ ├─ Retry event processing │ +│ ├─ SUCCESS → Remove from retry queue │ +│ └─ FAILURE → Increment retryCount │ +│ Update nextRetryAt (exponential backoff) │ +│ If maxRetries → Move to dead_letter │ +└──────────────────────────────────────────────────────────────┘ +``` + +#### Data Structures + +```typescript +interface EventQueueTask { + id: string // UUID + chainId: number + eventType: string // METADATA_CREATED, ORDER_STARTED, etc. + eventHash: string + txHash: string + blockNumber: number + eventData: any // Raw event data + createdAt: number + status: 'pending' | 'processing' | 'success' | 'failed' +} + +interface EventRetryTask { + id: string + chainId: number + did?: string // If known + eventType: string + eventHash: string + txHash: string + blockNumber: number + eventData: any + retryCount: number + maxRetries: number // Default: 5 + lastError: string + errorType: ErrorType + createdAt: number + lastRetryAt: number + nextRetryAt: number // Exponential backoff + expiryDate: number // e.g., 1 week from creation +} + +enum ErrorType { + HTTP_ERROR = 'http_error', // Decrypt service down + P2P_ERROR = 'p2p_error', // Peer unreachable + RPC_ERROR = 'rpc_error', // RPC timeout, 429 + DB_ERROR = 'db_error', // Database temp unavailable + VALIDATION_ERROR = 'validation_error', // Factory check, etc. + NON_RETRYABLE = 'non_retryable' // Don't retry +} +``` + +#### Implementation + +```typescript +export class EventQueueProcessor { + private eventQueue: Queue + private retryQueue: Queue + private workerPool: number = 5 // Concurrent workers + + async start() { + // Start worker pool for new events + for (let i = 0; i < this.workerPool; i++) { + this.startWorker(i) + } + + // Start retry processor (background) + this.startRetryProcessor() + } + + private async startWorker(workerId: number) { + while (!this.stopSignal) { + const task = await this.eventQueue.dequeue() + if (!task) { + await sleep(100) + continue + } + + try { + // Update status + task.status = 'processing' + + // Get event processor + const processor = getEventProcessor(task.eventType, task.chainId) + + // Process event (no retry logic inside!) + const result = await processor.processEvent( + task.eventData, + task.chainId, + this.signer, + this.provider, + task.eventType + ) + + // Success + task.status = 'success' + await this.logToDdoLogs(task, 'success', null, result?.did) + + INDEXER_LOGGER.info( + `Worker ${workerId}: Processed ${task.eventType} tx ${task.txHash}` + ) + } catch (error) { + // Failure + task.status = 'failed' + const errorType = this.classifyError(error) + + await this.logToDdoLogs(task, 'failed', error.message, task.eventData.did) + + if (this.isRetryable(errorType)) { + // Add to retry queue + await this.addToRetryQueue(task, error, errorType) + + INDEXER_LOGGER.warn( + `Worker ${workerId}: ${task.eventType} failed (retryable), ` + + `added to retry queue: ${error.message}` + ) + } else { + INDEXER_LOGGER.error( + `Worker ${workerId}: ${task.eventType} failed (non-retryable): ` + + error.message + ) + } + } + } + } + + private async startRetryProcessor() { + while (!this.stopSignal) { + try { + const dueRetries = await this.getRetryTasksDue() + + for (const retryTask of dueRetries) { + // Check expiry + if (Date.now() > retryTask.expiryDate) { + await this.moveToDeadLetter(retryTask, 'Expired') + continue + } + + // Check max retries + if (retryTask.retryCount >= retryTask.maxRetries) { + await this.moveToDeadLetter(retryTask, 'Max retries exceeded') + continue + } + + try { + // Retry processing + const processor = getEventProcessor(retryTask.eventType, retryTask.chainId) + const result = await processor.processEvent( + retryTask.eventData, + retryTask.chainId, + this.signer, + this.provider, + retryTask.eventType + ) + + // Success! + await this.removeFromRetryQueue(retryTask) + await this.logToDdoLogs(retryTask, 'success', null, result?.did) + + INDEXER_LOGGER.info( + `Retry succeeded: ${retryTask.eventType} tx ${retryTask.txHash} ` + + `(attempt ${retryTask.retryCount + 1})` + ) + } catch (error) { + // Failed again + retryTask.retryCount++ + retryTask.lastError = error.message + retryTask.lastRetryAt = Date.now() + + // Exponential backoff: 10s, 1min, 10min, 1hr, 1 week + const backoffs = [10000, 60000, 600000, 3600000, 604800000] + const nextDelay = backoffs[retryTask.retryCount - 1] || 604800000 + retryTask.nextRetryAt = Date.now() + nextDelay + + await this.updateRetryTask(retryTask) + await this.logToDdoLogs(retryTask, 'retrying', error.message, retryTask.did) + + INDEXER_LOGGER.warn( + `Retry failed: ${retryTask.eventType} tx ${retryTask.txHash} ` + + `(attempt ${retryTask.retryCount}/${retryTask.maxRetries}), ` + + `next retry in ${nextDelay / 1000}s` + ) + } + } + } catch (error) { + INDEXER_LOGGER.error(`RetryProcessor error: ${error.message}`) + } + + await sleep(10000) // Check every 10 seconds + } + } + + private classifyError(error: Error): ErrorType { + const msg = error.message.toLowerCase() + const code = (error as any).code + + // HTTP errors (decrypt service) + if (code === 'ECONNREFUSED' || msg.includes('econnrefused')) { + return ErrorType.HTTP_ERROR + } + if (code === 'ETIMEDOUT' || msg.includes('timeout')) { + return ErrorType.HTTP_ERROR + } + if (msg.includes('429') || msg.includes('rate limit')) { + return ErrorType.RPC_ERROR + } + + // P2P errors + if (msg.includes('p2p') || msg.includes('peer')) { + return ErrorType.P2P_ERROR + } + + // RPC errors + if (msg.includes('rpc') || msg.includes('provider')) { + return ErrorType.RPC_ERROR + } + + // DB errors + if (msg.includes('database') || msg.includes('elasticsearch')) { + return ErrorType.DB_ERROR + } + + // Validation errors (usually non-retryable) + if (msg.includes('factory') || msg.includes('validation')) { + return ErrorType.NON_RETRYABLE + } + + // Default: retryable + return ErrorType.HTTP_ERROR + } + + private isRetryable(errorType: ErrorType): boolean { + return errorType !== ErrorType.NON_RETRYABLE + } + + private async logToDdoLogs( + task: EventQueueTask | EventRetryTask, + status: string, + error: string | null, + did?: string + ) { + const { ddoLogs } = await getDatabase() + await ddoLogs.create({ + did: did || 'unknown', + chainId: task.chainId, + eventType: task.eventType, + eventHash: task.eventHash, + txHash: task.txHash, + blockNumber: task.blockNumber, + status, + error, + retryCount: 'retryCount' in task ? task.retryCount : 0, + timestamp: Date.now() + }) + } +} +``` + +#### Benefits + +**✅ Unified Retry Logic** + +- All 12 event types use same retry mechanism +- No more scattered retry code in processors +- Easier to maintain and test + +**✅ Non-Blocking** + +- Indexer continues processing new events +- Retries happen in background +- No performance impact on main indexing loop + +**✅ Retry ALL Error Types** + +- HTTP errors (decrypt service down) +- P2P errors (peer unreachable) +- RPC errors (timeout, 429 rate limit) +- DB errors (temp unavailable) +- Removes ECONNREFUSED-only limitation + +**✅ Exponential Backoff with Long Window** + +- 10s → 1min → 10min → 1hr → 1 week +- Handles long service outages +- Configurable per error type + +**✅ Full Observability** + +- All events logged to `ddo_logs` +- Track retry count, error messages +- Dead letter queue for permanent failures +- Monitoring dashboard for queue depth + +**✅ Decoupled from Event Logic** + +- Event processors just process, no retry code +- Queue handles all retry complexity +- Testable in isolation + +#### Migration Steps + +1. Create `event_queue` table +2. Create `event_retry_queue` table +3. Create `ddo_logs` index (all events, not just metadata) +4. Create `dead_letter_queue` table +5. Implement `EventQueueProcessor` class +6. Update all event processors to remove retry logic +7. Update `ChainIndexer` to enqueue events (not emit) +8. Replace `EventEmitter` with queue system +9. Add monitoring dashboard +10. Update tests + --- -### Error Handling Issues +### Error Handling Issues (Current) **Current Problems:** From 940bc944c1cfc06e3e563c28e8a964e22ce9e515 Mon Sep 17 00:00:00 2001 From: Bogdan Fazakas Date: Thu, 29 Jan 2026 12:13:23 +0200 Subject: [PATCH 6/6] final revision --- FIX_DATABASE_RETRIEVE_ERROR.md | 117 -- GITHUB_ISSUE.md | 46 - INDEXER_ARCHITECTURE_ANALYSIS.md | 1315 ---------------- INDEXER_DOCS_README.md | 346 ---- INDEXER_FLOW_DIAGRAMS.md | 712 --------- INDEXER_MEETING_SUMMARY.md | 491 ------ INDEXER_USE_CASES_AND_FLOWS.md | 2541 ------------------------------ docs/IndexerRefactorStrategy.md | 328 ++++ 8 files changed, 328 insertions(+), 5568 deletions(-) delete mode 100644 FIX_DATABASE_RETRIEVE_ERROR.md delete mode 100644 GITHUB_ISSUE.md delete mode 100644 INDEXER_ARCHITECTURE_ANALYSIS.md delete mode 100644 INDEXER_DOCS_README.md delete mode 100644 INDEXER_FLOW_DIAGRAMS.md delete mode 100644 INDEXER_MEETING_SUMMARY.md delete mode 100644 INDEXER_USE_CASES_AND_FLOWS.md create mode 100644 docs/IndexerRefactorStrategy.md diff --git a/FIX_DATABASE_RETRIEVE_ERROR.md b/FIX_DATABASE_RETRIEVE_ERROR.md deleted file mode 100644 index 2f86f7bc0..000000000 --- a/FIX_DATABASE_RETRIEVE_ERROR.md +++ /dev/null @@ -1,117 +0,0 @@ -# Fix for "Cannot read properties of undefined (reading 'retrieve')" Error - -**Date:** 2026-01-15 -**Status:** ✅ Fixed - -## Problem Description - -The error occurred when the `findDDO` command tried to retrieve a DDO from the database: - -``` -2026-01-15T09:07:40.161Z error: CORE: ❌ Error: 'Cannot read properties of undefined (reading 'retrieve')' -was caught while getting DDO info for id: did:op:bb83b4b7f86b9523523be931a763aaa3a20dc9d3d46c96feb1940e86fde278ac -``` - -### Root Cause - -The issue occurred when the database configuration was invalid or incomplete. In such cases: -1. The `Database` class would be partially initialized -2. The `ddo` property (and other database properties like `indexer`, `logs`, `ddoState`) would be `undefined` -3. Code attempting to call methods like `database.ddo.retrieve()` would fail with "Cannot read properties of undefined" - -This happened because the database initialization in `/src/components/database/index.ts` only creates the `ddo`, `indexer`, `logs`, `order`, and `ddoState` properties when `hasValidDBConfiguration(config)` returns `true` (lines 65-108). - -## Solution - -Added defensive checks before accessing database properties in all affected files. The fix ensures that: -1. The database object exists -2. The specific database property (e.g., `ddo`, `indexer`, `logs`) exists -3. Returns appropriate error responses (HTTP 503 - Service Unavailable) when database is not available - -## Files Modified - -### 1. `/src/components/core/utils/findDdoHandler.ts` -**Function:** `findDDOLocally()` -- Added check for `database` and `database.ddo` before calling `retrieve()` -- Returns `undefined` with a warning log if database is not available - -### 2. `/src/components/core/handler/queryHandler.ts` -**Functions:** -- `QueryHandler.handle()` - Added check for `database.ddo` -- `QueryDdoStateHandler.handle()` - Added check for `database.ddoState` -- Returns HTTP 503 error if database is not available - -### 3. `/src/components/core/handler/ddoHandler.ts` -**Functions:** -- `GetDdoHandler.handle()` - Added check for `database.ddo` -- `FindDdoHandler.handle()` (sink function) - Added check before checking if DDO exists locally -- `findAndFormatDdo()` - Added check for `database.ddo` -- Returns HTTP 503 error if database is not available - -### 4. `/src/components/core/handler/policyServer.ts` -**Function:** `PolicyServerInitializeHandler.handle()` -- Added check for `database.ddo` before retrieving DDO -- Returns HTTP 503 error if database is not available - -### 5. `/src/components/httpRoutes/logs.ts` -**Route:** `POST /log/:id` -- Added check for `database.logs` before retrieving log -- Returns HTTP 503 error if database is not available - -### 6. `/src/components/core/utils/statusHandler.ts` -**Function:** `getIndexerBlockInfo()` -- Added check for `database.indexer` before retrieving block info -- Returns '0' with a warning log if indexer database is not available - -## Pattern Applied - -Before (unsafe): -```typescript -const ddo = await node.getDatabase().ddo.retrieve(id) -``` - -After (safe): -```typescript -const database = node.getDatabase() -if (!database || !database.ddo) { - // Handle error appropriately - return { - stream: null, - status: { httpStatus: 503, error: 'DDO database is not available' } - } -} -const ddo = await database.ddo.retrieve(id) -``` - -## Testing - -To verify the fix: -1. Run the node with an invalid database configuration -2. Try to execute a `findDDO` command -3. The system should now return a proper error message instead of crashing - -Expected behavior: -- HTTP 503 response with message: "DDO database is not available" -- Logs should show warning messages about unavailable database -- No more "Cannot read properties of undefined" errors - -## Impact - -- **Backwards Compatible:** Yes, no breaking changes -- **Error Handling:** Improved - now provides meaningful error messages -- **Stability:** Significantly improved - prevents crashes when database is not fully initialized -- **Performance:** No impact - only adds lightweight null checks - -## Related Files - -- `/src/OceanNode.ts` - Defines `getDatabase()` method -- `/src/components/database/index.ts` - Database initialization logic -- `/src/components/database/DatabaseFactory.ts` - Database factory pattern - -## Configuration Note - -To ensure full database functionality, make sure the following environment variable is properly configured: -- `DB_URL` - Required for DDO, Indexer, Logs, Order, and DDO State databases - -Without a valid `DB_URL`, only the Nonce, C2D, Auth Token, and Config databases will be initialized. - diff --git a/GITHUB_ISSUE.md b/GITHUB_ISSUE.md deleted file mode 100644 index 35343923c..000000000 --- a/GITHUB_ISSUE.md +++ /dev/null @@ -1,46 +0,0 @@ -## Bug: Cannot read properties of undefined (reading 'retrieve') in findDDO command - -### Description -The ocean-node crashes with `TypeError: Cannot read properties of undefined (reading 'retrieve')` when attempting to execute the `findDDO` command if the database is not fully initialized. - -### Steps to Reproduce -1. Run ocean-node without a valid `DB_URL` environment variable -2. Execute a `findDDO` command with any DID -3. Observe the crash - -### Error Log -``` -2026-01-15T09:07:40.160Z debug: CORE: Unable to find DDO locally. Proceeding to call findDDO -2026-01-15T09:07:40.161Z info: CORE: Checking received command data for Command "findDDO": { - "id": "did:op:bb83b4b7f86b9523523be931a763aaa3a20dc9d3d46c96feb1940e86fde278ac", - "command": "findDDO", - "force": false -} -2026-01-15T09:07:40.161Z error: CORE: ❌ Error: 'Cannot read properties of undefined (reading 'retrieve')' was caught while getting DDO info for id: did:op:bb83b4b7f86b9523523be931a763aaa3a20dc9d3d46c96feb1940e86fde278ac -``` - -### Root Cause -When `DB_URL` is invalid or missing, the `Database` class only initializes essential databases (Nonce, C2D, Auth Token, Config). Properties like `ddo`, `indexer`, `logs`, `ddoState`, and `order` remain `undefined`. Code accessing these properties without null checks throws `TypeError`. - -### Impact -- Node crashes when handling DDO-related commands without proper database configuration -- Poor error messages that don't indicate the actual problem (missing DB configuration) -- Affects multiple handlers: `findDDO`, `getDDO`, `query`, `policyServer`, etc. - -### Solution -Add defensive null checks before accessing database properties in: -- `findDdoHandler.ts` - `findDDOLocally()` -- `queryHandler.ts` - `QueryHandler` and `QueryDdoStateHandler` -- `ddoHandler.ts` - `GetDdoHandler`, `FindDdoHandler`, `findAndFormatDdo()` -- `policyServer.ts` - `PolicyServerInitializeHandler` -- `logs.ts` - `/log/:id` route -- `statusHandler.ts` - `getIndexerBlockInfo()` - -Return HTTP 503 with clear error message: "DDO database is not available" instead of crashing. - -### Expected Behavior After Fix -- Node returns HTTP 503 with descriptive error message -- Logs warning about unavailable database -- No crashes when database is not fully initialized -- Backwards compatible with existing functionality - diff --git a/INDEXER_ARCHITECTURE_ANALYSIS.md b/INDEXER_ARCHITECTURE_ANALYSIS.md deleted file mode 100644 index cb7dcca03..000000000 --- a/INDEXER_ARCHITECTURE_ANALYSIS.md +++ /dev/null @@ -1,1315 +0,0 @@ -# Ocean Node Indexer - Architecture Analysis & Refactoring Proposal - -**Date:** January 14, 2026 -**Purpose:** Architecture review and refactoring direction for the Ocean Node Indexer component - ---- - -## 1. CURRENT ARCHITECTURE OVERVIEW - -### 1.1 High-Level Components - -The Indexer system consists of the following main components: - -``` -OceanIndexer (Main Coordinator) - ├── Worker Threads (crawlerThread.ts) - One per supported chain - │ ├── Block Crawler - │ ├── Event Retrieval - │ └── Reindex Queue Manager - ├── Processor (processor.ts) - Event processing orchestrator - │ └── Event Processors (processors/*.ts) - Specific event handlers - └── Database Layer - ├── Indexer State (last indexed block per chain) - ├── DDO Storage (asset metadata) - ├── Order Storage - └── State Tracking (ddoState) -``` - -### 1.2 Component Responsibilities - -#### **OceanIndexer** (`index.ts`) - -- Main coordinator class -- Manages worker threads (one per blockchain network) -- Handles job queue for admin commands (reindex operations) -- Event emitter for DDO and crawling events -- Version management and reindexing triggers - -#### **CrawlerThread** (`crawlerThread.ts`) - -- Runs in separate Worker Thread per chain -- Infinite loop polling blockchain for new blocks -- Retrieves logs/events from block ranges -- Manages reindex queue (per transaction) -- Updates last indexed block in database - -#### **Processor** (`processor.ts`) - -- Orchestrates event processing -- Routes events to specific processors -- Handles validator checks (metadata validators, access lists) -- Manages event filtering - -#### **Event Processors** (`processors/*.ts`) - -- Specific handlers for each event type: - - MetadataEventProcessor (METADATA_CREATED, METADATA_UPDATED) - - MetadataStateEventProcessor (METADATA_STATE) - - OrderStartedEventProcessor - - OrderReusedEventProcessor - - Dispenser processors (Created, Activated, Deactivated) - - Exchange processors (Created, Activated, Deactivated, RateChanged) - ---- - -## 2. HOW BLOCK PARSING WORKS - -### 2.1 Block Crawling Flow - -``` -┌─────────────────────────────────────────────────────────────┐ -│ 1. INITIALIZATION (per chain) │ -│ - Get deployment block from contract addresses │ -│ - Get last indexed block from database │ -│ - Start block = max(deploymentBlock, lastIndexedBlock) │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 2. MAIN LOOP (infinite while true) │ -│ - Get current network height │ -│ - Calculate blocks to process (min of chunkSize and │ -│ remaining blocks) │ -│ - If networkHeight > startBlock: process chunk │ -│ - Else: sleep for interval (default 30s) │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 3. EVENT RETRIEVAL (retrieveChunkEvents) │ -│ - provider.getLogs({ │ -│ fromBlock: lastIndexedBlock + 1, │ -│ toBlock: lastIndexedBlock + chunkSize, │ -│ topics: [EVENT_HASHES] // All supported events │ -│ }) │ -│ - Returns array of Log objects │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 4. PROCESS BLOCKS (processBlocks) │ -│ - Call processChunkLogs(logs, signer, provider, chainId) │ -│ - Update last indexed block in database │ -│ - Emit events for newly indexed assets │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 5. ADAPTIVE CHUNK SIZING │ -│ - On error: chunkSize = floor(chunkSize / 2) │ -│ - After 3 successful calls: revert to original chunkSize │ -│ - Minimum chunkSize = 1 │ -└─────────────────────────────────────────────────────────────┘ -``` - -### 2.2 Key Implementation Details - -**Location:** `crawlerThread.ts` - `processNetworkData()` - -```typescript -// Main crawling loop characteristics: -- Infinite loop with lockProccessing flag -- Dynamic chunk sizing (adaptive to RPC failures) -- Retry mechanism with configurable interval -- Reindex queue processing after each chunk -- One-shot CRAWLING_STARTED event emission -``` - -**Event Retrieval:** `utils.ts` - `retrieveChunkEvents()` - -- Uses ethers `provider.getLogs()` with topic filters -- Filters by all known Ocean Protocol event hashes -- Single RPC call per chunk -- Throws error on failure (caught by crawler for retry) - ---- - -## 3. HOW EVENT STORAGE WORKS - -### 3.1 Event Processing Pipeline - -``` -Raw Log (ethers.Log) - ↓ -┌──────────────────────────────────────┐ -│ 1. EVENT IDENTIFICATION │ -│ - Match log.topics[0] with │ -│ EVENT_HASHES lookup table │ -│ - Route to appropriate processor │ -└──────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────┐ -│ 2. VALIDATION LAYER │ -│ - Check if NFT deployed by │ -│ Ocean Factory │ -│ - Validate metadata proofs │ -│ - Check allowedValidators list │ -│ - Check access list memberships │ -│ - Check authorized publishers │ -└──────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────┐ -│ 3. EVENT-SPECIFIC PROCESSING │ -│ - Decode event data from receipt │ -│ - For Metadata events: │ -│ • Decrypt/decompress DDO │ -│ • Validate DDO hash │ -│ • Check purgatory status │ -│ • Fetch pricing info │ -│ • Check policy server │ -│ - For Order events: │ -│ • Update order count stats │ -│ • Create order record │ -│ - For Pricing events: │ -│ • Update pricing arrays │ -└──────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────┐ -│ 4. DATABASE PERSISTENCE │ -│ - DDO Database (Elasticsearch/ │ -│ Typesense) │ -│ - DDO State (validation tracking) │ -│ - Order Database │ -│ - Indexer State (last block) │ -└──────────────────────────────────────┘ -``` - -### 3.2 Storage Schemas - -**Indexer State:** - -```typescript -{ - id: chainId (string), - lastIndexedBlock: number -} -``` - -**DDO Storage:** - -- Full DDO document stored (as per Ocean Protocol DDO spec) -- Enhanced with `indexedMetadata`: - ```typescript - { - nft: { state, address, name, symbol, owner, created, tokenURI }, - event: { txid, from, contract, block, datetime }, - stats: [{ - datatokenAddress, name, symbol, serviceId, - orders: number, - prices: [{ type, price, contract, token, exchangeId? }] - }], - purgatory: { state: boolean } - } - ``` - -**DDO State Tracking:** - -```typescript -{ - chainId: number, - did: string, - nft: string, - txId: string, - valid: boolean, - error: string // if validation failed -} -``` - -**Order Storage:** - -```typescript -{ - type: 'startOrder' | 'reuseOrder', - timestamp: Date, - consumer: address, - payer: address, - datatokenAddress: address, - nftAddress: address, - did: string, - startOrderId: string -} -``` - ---- - -## 4. PAIN POINTS & ISSUES - -### 4.1 Architecture Complexity - -**Issue:** Mixed concerns and tight coupling - -- `crawlerThread.ts` handles: - - Block crawling logic - - Network communication - - Database updates - - Message passing - - Reindex queue management - - Error handling and retry logic - -**Impact:** Hard to test, debug, and modify individual components - ---- - -### 4.2 Worker Thread Architecture - -**Issue:** Complex inter-thread communication - -- Parent-child message passing using `parentPort.postMessage()` -- Shared state management through message queues -- Two separate queues: `INDEXING_QUEUE` (parent) and `REINDEX_QUEUE` (worker) -- Race conditions possible with `lockProccessing` flag - -**Code smell:** - -```typescript -// In crawlerThread.ts -parentPort.on('message', (message) => { - if (message.method === INDEXER_MESSAGES.START_CRAWLING) { ... } - else if (message.method === INDEXER_MESSAGES.REINDEX_TX) { ... } - else if (message.method === INDEXER_MESSAGES.REINDEX_CHAIN) { ... } - else if (message.method === INDEXER_MESSAGES.STOP_CRAWLING) { ... } -}) -``` - -**Impact:** - -- Hard to reason about state -- Difficult to add new features -- Testing requires mocking Worker Threads - ---- - -### 4.3 Error Handling & Recovery - -**Issue:** Multiple retry mechanisms at different levels - -1. Crawler level: `retryCrawlerWithDelay()` with max 10 retries -2. Chunk retrieval: adaptive chunk sizing on error -3. Block processing: sleep and retry on error -4. Individual RPC calls: `withRetrial()` helper with 5 retries - -**Problems:** - -- No centralized error tracking -- Unclear recovery state after multiple failures -- Potential for infinite loops or deadlocks -- No circuit breaker pattern - ---- - -### 4.4 Event Processing Complexity - -**Issue:** Monolithic `processChunkLogs()` function - -- 180+ lines in single function -- Nested validation logic for metadata events -- Multiple external contract calls during validation -- Synchronous processing (one event at a time) - -**Code complexity example:** - -```typescript -// From processor.ts lines 79-162 -if (event.type === EVENTS.METADATA_CREATED || ...) { - if (checkMetadataValidated) { - const txReceipt = await provider.getTransactionReceipt(...) - const metadataProofs = fetchEventFromTransaction(...) - if (!metadataProofs) { continue } - - const validators = metadataProofs.map(...) - const allowed = allowedValidators.filter(...) - if (!allowed.length) { continue } - - if (allowedValidatorsList && validators.length > 0) { - isAllowed = false - for (const accessListAddress of allowedValidatorsList[chain]) { - const accessListContract = new ethers.Contract(...) - for (const metaproofValidator of validators) { - const balance = await accessListContract.balanceOf(...) - // ... more nested logic - } - } - if (!isAllowed) { continue } - } - } -} -``` - -**Impact:** - -- Hard to read and maintain -- Performance bottleneck (serial processing) -- Difficult to add new validation rules -- Error in one validation affects all events - ---- - -### 4.5 Metadata Decryption Complexity - -**Issue:** `decryptDDO()` method in BaseProcessor (400+ lines) - -- Handles HTTP, P2P, and local decryption -- Complex nonce management -- Signature verification inline -- Multiple error paths -- Retry logic embedded - -**Impact:** - -- Single Responsibility Principle violated -- Hard to test different decryption strategies -- Error messages unclear about failure point - ---- - -### 4.6 Database Abstraction Issues - -**Issue:** Direct database calls throughout processors - -```typescript -const { ddo: ddoDatabase, ddoState, order: orderDatabase } = await getDatabase() -``` - -**Problems:** - -- Tight coupling to database implementation -- Transaction management unclear -- No batch operations -- No caching strategy -- Multiple database calls per event - ---- - -### 4.7 State Management - -**Issue:** Global mutable state - -```typescript -// In index.ts -let INDEXING_QUEUE: ReindexTask[] = [] -const JOBS_QUEUE: JobStatus[] = [] -const runningThreads: Map = new Map() - -// In crawlerThread.ts -let REINDEX_BLOCK: number = null -const REINDEX_QUEUE: ReindexTask[] = [] -let stoppedCrawling: boolean = false -let startedCrawling: boolean = false -``` - -**Impact:** - -- Hard to test -- Race conditions -- Unclear ownership -- Memory leaks potential - ---- - -### 4.8 Lack of Observability - -**Issue:** Limited monitoring and metrics - -- No performance metrics (events/sec, blocks/sec) -- No latency tracking -- No failure rate monitoring -- Logger used but no structured metrics -- Hard to debug production issues - ---- - -### 4.9 Testing Challenges - -**Issue:** Integration test heavy, unit tests sparse - -- Worker threads hard to unit test -- Database dependencies in all tests -- Long-running integration tests -- No mocking strategy for blockchain - ---- - -### 4.10 Configuration & Deployment - -**Issue:** Environment-dependent behavior - -- RPC URLs in environment variables -- Chunk sizes configurable but defaults unclear -- Interval timing hardcoded in multiple places -- No configuration validation - ---- - -## 5. REFACTORING PROPOSAL - HIGH-LEVEL ARCHITECTURE - -### 5.1 Design Principles - -1. **Separation of Concerns**: Each component has one clear responsibility -2. **Dependency Inversion**: Depend on abstractions, not implementations -3. **Testability**: Every component unit testable in isolation -4. **Observability**: Built-in metrics and monitoring -5. **Resilience**: Explicit error handling with circuit breakers -6. **Maintainability**: Clear code structure, documented patterns - ---- - -### 5.2 Proposed Architecture - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ IndexerOrchestrator │ -│ - Coordinates all indexing operations │ -│ - Manages lifecycle of chain indexers │ -│ - Handles configuration and health checks │ -└─────────────────────────────────────────────────────────────────┘ - │ - ┌─────────────────────┼─────────────────────┐ - ↓ ↓ ↓ -┌──────────────┐ ┌──────────────┐ ┌──────────────┐ -│ChainIndexer 1│ │ChainIndexer 2│ │ChainIndexer N│ -│ (per chain) │ │ (per chain) │ ... │ (per chain) │ -└──────────────┘ └──────────────┘ └──────────────┘ - │ - ├──> BlockScanner (fetches block ranges) - │ │ - │ └──> RPC Client (with retry & fallback) - │ - ├──> EventExtractor (filters & decodes events) - │ - ├──> ValidationPipeline - │ ├──> FactoryValidator - │ ├──> MetadataValidator - │ ├──> PublisherValidator - │ └──> PolicyValidator - │ - ├──> EventProcessor - │ ├──> MetadataProcessor - │ ├──> OrderProcessor - │ └──> PricingProcessor - │ - └──> StateManager - ├──> ProgressTracker (last indexed block) - ├──> EventStore (processed events) - └──> ReindexQueue -``` - ---- - -### 5.3 Component Details - -#### **5.3.1 IndexerOrchestrator** - -**Responsibility:** Top-level coordinator - -```typescript -class IndexerOrchestrator { - private chainIndexers: Map - private config: IndexerConfig - private eventBus: EventBus - private metrics: MetricsCollector - - async start(): Promise - async stop(): Promise - async reindexChain(chainId: number, fromBlock?: number): Promise - async reindexTransaction(chainId: number, txHash: string): Promise - getStatus(): IndexerStatus -} -``` - -**Benefits:** - -- Single entry point -- Clear lifecycle management -- Easy to add new chains -- Health check support - ---- - -#### **5.3.2 ChainIndexer** - -**Responsibility:** Manages indexing for a single blockchain - -```typescript -class ChainIndexer { - private chainId: number - private scanner: BlockScanner - private extractor: EventExtractor - private pipeline: ValidationPipeline - private processor: EventProcessor - private stateManager: StateManager - private running: boolean - - async start(): Promise - async stop(): Promise - async processBlockRange(from: number, to: number): Promise -} -``` - -**Benefits:** - -- Self-contained per chain -- No worker threads needed (use async/await) -- Easy to test -- Clear dependencies - ---- - -#### **5.3.3 BlockScanner** - -**Responsibility:** Fetch blocks and logs from RPC - -```typescript -interface BlockScanner { - getLatestBlock(): Promise - getLogs(fromBlock: number, toBlock: number, topics: string[]): Promise -} - -class EthersBlockScanner implements BlockScanner { - private rpcClient: ResilientRpcClient - - // Implementation with retry and fallback -} - -class ResilientRpcClient { - private providers: JsonRpcProvider[] - private circuitBreaker: CircuitBreaker - private metrics: MetricsCollector - - async execute(fn: (provider: JsonRpcProvider) => Promise): Promise -} -``` - -**Benefits:** - -- Encapsulates RPC communication -- Retry/fallback logic in one place -- Easy to mock for testing -- Circuit breaker prevents cascade failures - ---- - -#### **5.3.4 EventExtractor** - -**Responsibility:** Decode and categorize events - -```typescript -class EventExtractor { - private eventRegistry: EventRegistry - - extractEvents(logs: Log[]): CategorizedEvents - decodeEvent(log: Log): DecodedEvent -} - -interface CategorizedEvents { - metadata: MetadataEvent[] - orders: OrderEvent[] - pricing: PricingEvent[] - unknown: Log[] -} -``` - -**Benefits:** - -- Single responsibility -- Stateless and pure -- Easy to test -- Clear input/output - ---- - -#### **5.3.5 ValidationPipeline** - -**Responsibility:** Chain of validators for events - -```typescript -interface Validator { - validate(event: DecodedEvent, context: ValidationContext): Promise -} - -class ValidationPipeline { - private validators: Validator[] - - async validate(event: DecodedEvent): Promise - addValidator(validator: Validator): void -} - -// Specific validators -class FactoryValidator implements Validator -class MetadataProofValidator implements Validator -class PublisherValidator implements Validator -class AccessListValidator implements Validator -class PolicyServerValidator implements Validator -``` - -**Benefits:** - -- Chain of Responsibility pattern -- Each validator is independent -- Easy to add/remove validators -- Parallel validation possible -- Clear failure points - ---- - -#### **5.3.6 EventProcessor** - -**Responsibility:** Transform validated events into domain models - -```typescript -interface EventHandler { - handle(event: T): Promise -} - -class EventProcessor { - private handlers: Map - - async process(event: DecodedEvent): Promise -} - -// Specific handlers -class MetadataCreatedHandler implements EventHandler -class OrderStartedHandler implements EventHandler -class DispenserActivatedHandler implements EventHandler -``` - -**Benefits:** - -- Strategy pattern for different event types -- Stateless handlers -- Easy to test -- Clear transformations - ---- - -#### **5.3.7 StateManager** - -**Responsibility:** Manage persistence and state - -```typescript -interface StateManager { - getLastIndexedBlock(chainId: number): Promise - setLastIndexedBlock(chainId: number, block: number): Promise - - saveDDO(ddo: DDO): Promise - saveOrder(order: Order): Promise - updatePricing(pricing: PricingUpdate): Promise - - // Batch operations - saveBatch(entities: DomainEntity[]): Promise -} - -class TransactionalStateManager implements StateManager { - private ddoRepository: DDORepository - private orderRepository: OrderRepository - private progressRepository: ProgressRepository - - async transaction(fn: (repos: Repositories) => Promise): Promise -} -``` - -**Benefits:** - -- Repository pattern -- Transaction support -- Batch operations for performance -- Easy to swap implementations -- Mockable for tests - ---- - -### 5.4 Data Flow Example - -**Processing a Metadata Created Event:** - -``` -1. ChainIndexer.processBlockRange(1000, 1010) - ↓ -2. BlockScanner.getLogs(1000, 1010, [...topics]) - → Returns: [Log, Log, Log, ...] - ↓ -3. EventExtractor.extractEvents(logs) - → Returns: CategorizedEvents { metadata: [event1], orders: [], ... } - ↓ -4. For each metadata event: - ValidationPipeline.validate(event) - ├─> FactoryValidator.validate() - ├─> MetadataProofValidator.validate() - ├─> PublisherValidator.validate() - └─> PolicyServerValidator.validate() - → Returns: ValidationResult { valid: true, ... } - ↓ -5. EventProcessor.process(event) - → MetadataCreatedHandler.handle(event) - ├─> Decrypt DDO - ├─> Fetch pricing info - └─> Build DDO entity - → Returns: DDO - ↓ -6. StateManager.saveDDO(ddo) - → Persisted to database - ↓ -7. EventBus.emit('ddo.created', ddo) - → Notify listeners -``` - ---- - -## 6. MIGRATION STRATEGY - -### 6.1 Phase 1: Foundation (Week 1-2) - -**Goals:** - -- Introduce new abstractions without breaking existing code -- Add comprehensive tests - -**Tasks:** - -1. Create `ResilientRpcClient` wrapper -2. Implement `BlockScanner` interface -3. Add metrics collection infrastructure -4. Write unit tests for new components - -**Deliverables:** - -- `ResilientRpcClient` with circuit breaker -- `BlockScanner` implementation -- Test coverage > 80% - ---- - -### 6.2 Phase 2: Validation Extraction (Week 3-4) - -**Goals:** - -- Extract validation logic into pipeline -- Reduce complexity of processor.ts - -**Tasks:** - -1. Create `Validator` interface -2. Implement individual validators -3. Build `ValidationPipeline` -4. Refactor `processChunkLogs()` to use pipeline - -**Deliverables:** - -- 5+ validator implementations -- Validation pipeline with tests -- Reduced complexity in processor.ts - ---- - -### 6.3 Phase 3: Event Processing (Week 5-6) - -**Goals:** - -- Separate event handling from validation -- Introduce domain models - -**Tasks:** - -1. Create `EventHandler` interface -2. Implement handlers for each event type -3. Introduce domain entities (separate from database models) -4. Refactor processors to use handlers - -**Deliverables:** - -- Event handler implementations -- Domain model layer -- Clearer separation of concerns - ---- - -### 6.4 Phase 4: State Management (Week 7-8) - -**Goals:** - -- Decouple from database implementation -- Add transaction support - -**Tasks:** - -1. Create repository interfaces -2. Implement transactional state manager -3. Add batch operation support -4. Migrate database calls to repositories - -**Deliverables:** - -- Repository layer -- Transaction support -- Batch operations -- Performance improvements - ---- - -### 6.5 Phase 5: Remove Worker Threads (Week 9-10) - -**Goals:** - -- Simplify architecture -- Remove inter-thread communication - -**Tasks:** - -1. Implement `ChainIndexer` class -2. Replace worker threads with async loops -3. Migrate message passing to direct method calls -4. Update job queue management - -**Deliverables:** - -- No worker threads -- Simplified code -- Better error handling -- Improved testability - ---- - -### 6.6 Phase 6: Observability & Monitoring (Week 11-12) - -**Goals:** - -- Add comprehensive monitoring -- Improve debugging capabilities - -**Tasks:** - -1. Add structured logging -2. Implement metrics collection -3. Add health check endpoints -4. Create monitoring dashboards - -**Deliverables:** - -- Prometheus metrics -- Grafana dashboards -- Health check API -- Debug tooling - ---- - -## 7. IMMEDIATE WINS (Quick Improvements) - -These can be implemented independently before full refactoring: - -### 7.1 Extract DDO Decryption Service - -**Current:** 400+ line method in BaseProcessor -**Proposed:** Separate `DdoDecryptionService` class - -**Benefits:** - -- Testable in isolation -- Reusable -- Clear interface - -**Effort:** 1-2 days - ---- - -### 7.2 Add Batch Database Operations - -**Current:** One database call per event -**Proposed:** Batch save operations - -```typescript -// Instead of: -for (const event of events) { - await database.save(event) -} - -// Do: -await database.saveBatch(events) -``` - -**Benefits:** - -- 10-50x performance improvement -- Reduced database load - -**Effort:** 2-3 days - ---- - -### 7.3 Extract Validation Logic - -**Current:** Nested if statements in processChunkLogs -**Proposed:** Separate validation functions - -```typescript -class EventValidation { - validateFactory(event): boolean - validateMetadataProof(event): boolean - validatePublisher(event): boolean - validateAccessList(event): boolean -} -``` - -**Benefits:** - -- Readable code -- Testable validations -- Reusable - -**Effort:** 2-3 days - ---- - -### 7.4 Add Circuit Breaker for RPC - -**Current:** Simple retry logic -**Proposed:** Circuit breaker pattern - -**Benefits:** - -- Prevent cascade failures -- Faster failure detection -- Better error messages - -**Effort:** 1-2 days - ---- - -### 7.5 Add Metrics Collection - -**Current:** Only logs -**Proposed:** Prometheus metrics - -```typescript -metrics.indexer_blocks_processed_total.inc() -metrics.indexer_events_processed_total.inc({ type: 'metadata' }) -metrics.indexer_processing_duration_seconds.observe(duration) -metrics.indexer_rpc_errors_total.inc({ provider: 'infura' }) -``` - -**Benefits:** - -- Production visibility -- Performance tracking -- Alerting capability - -**Effort:** 2-3 days - ---- - -## 8. TESTING STRATEGY - -### 8.1 Unit Tests - -**Target:** 80%+ coverage - -**Focus areas:** - -- Validators (each should be 100% covered) -- Event handlers (pure functions, easy to test) -- Extractors and decoders -- Utility functions - -**Tools:** - -- Mocha/Chai (already in use) -- Sinon for mocking -- Test fixtures for events - ---- - -### 8.2 Integration Tests - -**Target:** Critical paths covered - -**Focus areas:** - -- End-to-end event processing -- Database operations -- Reindex operations -- Multi-chain scenarios - -**Tools:** - -- Docker containers for databases -- Hardhat for blockchain mocking -- Test fixtures - ---- - -### 8.3 Performance Tests - -**Target:** Benchmarks established - -**Metrics:** - -- Events processed per second -- Memory usage over time -- RPC call latency -- Database query performance - -**Tools:** - -- k6 or Artillery -- Memory profiling -- Custom benchmarking scripts - ---- - -## 9. ALTERNATIVES CONSIDERED - -### 9.1 Keep Worker Threads - -**Pros:** - -- No need to refactor thread management -- True parallelism - -**Cons:** - -- Complex state management -- Hard to debug -- Testing challenges - -**Decision:** Remove threads (async/await sufficient) - ---- - -### 9.2 Event Sourcing - -**Pros:** - -- Complete audit trail -- Replay capability -- Temporal queries - -**Cons:** - -- Significant complexity increase -- Storage overhead -- Query performance concerns - -**Decision:** Not recommended (too much complexity for benefits) - ---- - -### 9.3 Message Queue (Kafka/RabbitMQ) - -**Pros:** - -- Decoupled components -- Built-in retry/DLQ -- Scalability - -**Cons:** - -- Additional infrastructure -- Operational complexity -- Overkill for current scale - -**Decision:** Revisit when scaling beyond 10+ chains - ---- - -### 9.4 GraphQL Subscriptions - -**Pros:** - -- Real-time updates to clients -- Flexible queries - -**Cons:** - -- Not needed for current use case -- Additional complexity - -**Decision:** Out of scope for indexer refactor - ---- - -## 10. SUCCESS METRICS - -### 10.1 Code Quality - -- **Cyclomatic Complexity:** Reduce from avg 15 to < 5 -- **Lines per Function:** < 50 lines -- **Test Coverage:** > 80% -- **Type Safety:** 100% typed (no `any`) - -### 10.2 Performance - -- **Throughput:** 2x improvement in events/sec -- **Latency:** < 100ms per event -- **Memory:** Stable (no leaks) -- **RPC Calls:** Reduce by 30% (batch operations) - -### 10.3 Reliability - -- **Uptime:** > 99.9% -- **Failed Events:** < 0.1% -- **Recovery Time:** < 5 minutes after RPC failure -- **Reindex Success Rate:** > 99% - -### 10.4 Maintainability - -- **Onboarding Time:** < 2 days for new dev -- **Bug Fix Time:** Avg < 4 hours -- **Feature Addition Time:** Avg < 1 week -- **Production Incidents:** < 1 per month - ---- - -## 11. RISKS & MITIGATION - -### 11.1 Risk: Breaking Changes - -**Mitigation:** - -- Incremental refactoring (Strangler Fig pattern) -- Comprehensive test suite -- Feature flags for new code paths -- Parallel running (old + new) during transition - -### 11.2 Risk: Performance Regression - -**Mitigation:** - -- Benchmark before refactoring -- Performance tests in CI -- Load testing before deployment -- Gradual rollout - -### 11.3 Risk: Data Loss During Migration - -**Mitigation:** - -- Database backups before changes -- Reindex capability -- Validation checks -- Dry-run mode - -### 11.4 Risk: Schedule Overrun - -**Mitigation:** - -- Phased approach with clear milestones -- Regular progress reviews -- Scope adjustment flexibility -- Priority on immediate wins - ---- - -## 12. OPEN QUESTIONS FOR DISCUSSION - -1. **Worker Threads:** Do we need true parallelism or is async/await sufficient? - -2. **Database Choice:** Should we standardize on one (Elasticsearch vs Typesense) or keep both? - -3. **Event Prioritization:** Should critical events (metadata) be prioritized over pricing events? - -4. **Reindex Strategy:** Should reindexing be a separate service/process? - -5. **Monitoring:** What metrics are most important for production monitoring? - -6. **Backward Compatibility:** How long should we support old API/database schemas? - -7. **Multi-Region:** Do we need to support indexer deployment in multiple regions? - -8. **Event Replay:** Do we need ability to replay historical events? - ---- - -## 13. CONCLUSION & NEXT STEPS - -### Current State Summary - -The Ocean Node Indexer is functional but suffers from: - -- High complexity (worker threads, mixed concerns) -- Limited observability -- Difficult to test and maintain -- Performance bottlenecks (serial processing, many RPC calls) - -### Proposed State - -After refactoring: - -- Clear component boundaries -- No worker threads (async/await) -- Comprehensive testing -- Built-in monitoring -- 2x performance improvement -- Easy to extend and maintain - -### Recommended Next Steps - -1. **This Meeting (Today):** - - - Review and discuss this document - - Agree on high-level direction - - Prioritize immediate wins vs full refactor - - Assign owners for investigation tasks - -2. **Next Week:** - - - Detailed design for Phase 1 components - - Create ADRs (Architecture Decision Records) - - Set up performance benchmarks - - Begin implementation of immediate wins - -3. **Ongoing:** - - Weekly architecture sync - - Code reviews focused on quality - - Regular performance testing - - Documentation updates - ---- - -## APPENDIX A: Key Files Reference - -``` -src/components/Indexer/ -├── index.ts - OceanIndexer main class (490 lines) -├── crawlerThread.ts - Worker thread implementation (380 lines) -├── processor.ts - Event processing orchestrator (207 lines) -├── utils.ts - Utility functions (454 lines) -├── purgatory.ts - Purgatory checking -├── version.ts - Version management -└── processors/ - ├── BaseProcessor.ts - Abstract base (442 lines) - ├── MetadataEventProcessor.ts - Metadata handling (403 lines) - ├── MetadataStateEventProcessor.ts - ├── OrderStartedEventProcessor.ts - ├── OrderReusedEventProcessor.ts - ├── DispenserActivatedEventProcessor.ts - ├── DispenserCreatedEventProcessor.ts - ├── DispenserDeactivatedEventProcessor.ts - ├── ExchangeActivatedEventProcessor.ts - ├── ExchangeCreatedEventProcessor.ts - ├── ExchangeDeactivatedEventProcessor.ts - └── ExchangeRateChangedEventProcessor.ts -``` - ---- - -## APPENDIX B: Glossary - -- **DDO:** Decentralized Data Object - Ocean Protocol asset metadata -- **NFT:** Non-Fungible Token - ERC721 contract representing data asset -- **Datatoken:** ERC20 token for accessing data -- **Dispenser:** Contract for free datatoken distribution -- **FRE:** Fixed Rate Exchange - Contract for datatoken pricing -- **Purgatory:** Blocklist for banned assets/accounts -- **MetadataProof:** Validation signature from authorized validators - ---- - -**Document Version:** 1.0 -**Last Updated:** January 14, 2026 -**Authors:** Architecture Team -**Status:** Draft for Discussion diff --git a/INDEXER_DOCS_README.md b/INDEXER_DOCS_README.md deleted file mode 100644 index 673a3ce40..000000000 --- a/INDEXER_DOCS_README.md +++ /dev/null @@ -1,346 +0,0 @@ -# Ocean Node Indexer - Architecture Review Documents - -**Created:** January 14, 2026 -**Purpose:** Architecture review meeting preparation materials - ---- - -## 📚 Document Guide - -### For Meeting Participants - -**Start here:** Read documents in this order - -1. **[INDEXER_MEETING_SUMMARY.md](./INDEXER_MEETING_SUMMARY.md)** ⭐ - - - **Time to read:** 15-20 minutes - - **Best for:** Quick overview, meeting agenda, action items - - **Contains:** TL;DR, top pain points, immediate wins, timeline - -2. **[INDEXER_FLOW_DIAGRAMS.md](./INDEXER_FLOW_DIAGRAMS.md)** 📊 - - - **Time to read:** 10-15 minutes - - **Best for:** Visual learners, understanding data flow - - **Contains:** Current vs proposed architecture diagrams - -3. **[INDEXER_ARCHITECTURE_ANALYSIS.md](./INDEXER_ARCHITECTURE_ANALYSIS.md)** 📖 - - **Time to read:** 45-60 minutes - - **Best for:** Deep dive, implementation details - - **Contains:** Complete analysis, 13 sections, migration strategy - ---- - -## 🎯 Quick Navigation - -### By Role - -**If you are a Developer:** - -- Read: Summary → Diagrams → Sections 4-5 of Analysis -- Focus on: Code complexity, testing strategy, immediate wins - -**If you are a Tech Lead:** - -- Read: All three documents -- Focus on: Architecture decisions, migration phases, risks - -**If you are a Product Manager:** - -- Read: Summary → Section 10 (Success Metrics) of Analysis -- Focus on: Timeline, priorities, business impact - -**If you are DevOps:** - -- Read: Summary → Section 9 (Diagrams) → Section 6 (Analysis) -- Focus on: Observability, deployment strategy, monitoring - ---- - -## 📋 Meeting Prep Checklist - -### Before the Meeting - -- [ ] Read INDEXER_MEETING_SUMMARY.md -- [ ] Review INDEXER_FLOW_DIAGRAMS.md -- [ ] Optionally: Deep dive into INDEXER_ARCHITECTURE_ANALYSIS.md -- [ ] Prepare your questions and concerns -- [ ] Review the codebase (key files listed in documents) - -### During the Meeting - -- [ ] Use INDEXER_MEETING_SUMMARY.md as guide -- [ ] Reference diagrams for discussions -- [ ] Note action items in the Action Items Template -- [ ] Capture decisions and concerns - -### After the Meeting - -- [ ] Review and finalize action items -- [ ] Assign owners and deadlines -- [ ] Create detailed design docs for Phase 1 -- [ ] Set up next sync meeting - ---- - -## 🔍 Document Contents Overview - -### INDEXER_MEETING_SUMMARY.md - -``` -1. Agenda (5 items) -2. Key Takeaways (TL;DR) -3. Current Architecture (Simplified) -4. Proposed Architecture -5. Top 10 Pain Points -6. Immediate Wins (5 quick improvements) -7. Phased Timeline (12 weeks) -8. Alternatives Considered -9. Open Questions (8 questions) -10. Success Metrics -11. Next Steps -12. Action Items Template -``` - -### INDEXER_FLOW_DIAGRAMS.md - -``` -1. Current Architecture - Component View -2. Current Architecture - Event Processing Flow -3. Proposed Architecture - Component View -4. Proposed Architecture - Event Processing Flow -5. Block Crawling Flow (Current vs Proposed) -6. Database Operations (Current vs Proposed) -7. Error Handling (Current vs Proposed) -8. Testing Strategy (Current vs Proposed) -9. Metrics & Observability Dashboard -10. Comparison Summary Table -``` - -### INDEXER_ARCHITECTURE_ANALYSIS.md - -``` -1. Current Architecture Overview -2. How Block Parsing Works -3. How Event Storage Works -4. Pain Points & Issues (10 detailed issues) -5. Refactoring Proposal - High-Level Architecture -6. Migration Strategy (6 phases) -7. Immediate Wins (5 quick improvements) -8. Testing Strategy -9. Alternatives Considered -10. Success Metrics -11. Risks & Mitigation -12. Open Questions -13. Conclusion & Next Steps -Appendix A: Key Files Reference -Appendix B: Glossary -``` - ---- - -## 🎨 Key Concepts at a Glance - -### Current Problems - -``` -🔴 Worker Threads → Complex inter-thread communication -🔴 Mixed Concerns → Fetching + validation + storage in one place -🔴 No Observability → Only logs, no metrics -🔴 Serial Processing → One event at a time -🔴 Many DB Calls → No batching -🔴 Hard to Test → Worker threads + tight coupling -``` - -### Proposed Solutions - -``` -🟢 Async/Await → No worker threads, simpler code -🟢 Separation of Concerns → Clear component boundaries -🟢 Built-in Metrics → Prometheus integration -🟢 Batch Operations → 10-50x performance improvement -🟢 Repository Pattern → Clean database abstraction -🟢 Dependency Injection → Easy to test and mock -``` - ---- - -## 📊 Expected Outcomes - -### Code Quality - -- Complexity: **15 → 5** (cyclomatic) -- Test Coverage: **60% → 80%+** -- Lines per Function: **100+ → <50** - -### Performance - -- Throughput: **2x improvement** -- Latency: **< 100ms per event** -- DB Calls: **30% reduction** - -### Reliability - -- Uptime: **> 99.9%** -- Recovery Time: **< 5 minutes** -- Failed Events: **< 0.1%** - -### Timeline - -- **Phase 1-2 (Weeks 1-4):** Foundation + Validation -- **Phase 3-4 (Weeks 5-8):** Processing + State Management -- **Phase 5-6 (Weeks 9-12):** Remove threads + Observability - ---- - -## 💬 Discussion Points - -### Critical Decisions Needed - -1. **Worker Threads:** Remove or keep? - - - Recommendation: **Remove** (use async/await) - -2. **Database:** Elasticsearch, Typesense, or both? - - - Recommendation: **Standardize** on one - -3. **Timeline:** Full refactor or immediate wins first? - - - Recommendation: **Both** (parallel tracks) - -4. **Backward Compatibility:** How long to support? - - Recommendation: **2 releases** - -### Optional Discussions - -5. Event prioritization strategy -6. Multi-region deployment -7. Event replay capability -8. Monitoring requirements - ---- - -## 🔗 Related Resources - -### Codebase - -``` -Key Files: -- src/components/Indexer/index.ts (490 lines) -- src/components/Indexer/crawlerThread.ts (380 lines) -- src/components/Indexer/processor.ts (207 lines) -- src/components/Indexer/processors/*.ts (13 files) -``` - -### External Documentation - -- [Ocean Protocol Docs](https://docs.oceanprotocol.com) -- [Ethers.js Provider API](https://docs.ethers.org/v6/api/providers/) -- [Node.js Worker Threads](https://nodejs.org/api/worker_threads.html) -- [Circuit Breaker Pattern](https://martinfowler.com/bliki/CircuitBreaker.html) - -### Design Patterns Referenced - -- Repository Pattern -- Strategy Pattern -- Chain of Responsibility -- Circuit Breaker -- Dependency Injection -- Event Bus - ---- - -## ✅ Pre-Meeting Validation - -**Ensure you can answer these questions before the meeting:** - -1. What is the main responsibility of the `OceanIndexer` class? -2. How does the current system handle block crawling? -3. What are the top 3 pain points you're most concerned about? -4. Which immediate win would you prioritize? -5. What are your concerns about the proposed architecture? -6. What timeline seems realistic for your team? -7. What metrics would you want to track in production? - ---- - -## 📝 Meeting Artifacts - -**After the meeting, you'll have:** - -1. ✅ **Decisions Log** - - - Worker threads: Remove/Keep - - Database choice - - Priority: Immediate wins vs full refactor - - Timeline agreement - -2. ✅ **Action Items** - - - Owner assignments - - Deadlines - - Dependencies - - Success criteria - -3. ✅ **Risk Register** - - - Identified risks - - Mitigation strategies - - Contingency plans - -4. ✅ **Next Steps** - - Phase 1 detailed design - - Performance benchmarks setup - - Team assignments - - Follow-up meeting schedule - ---- - -## 🚀 Getting Started (Post-Meeting) - -### Week 1 Tasks - -1. **Create detailed design docs** for Phase 1 components - - - ResilientRpcClient spec - - BlockScanner interface - - Metrics infrastructure - -2. **Set up performance benchmarks** - - - Current baseline measurements - - Test environment - - Monitoring tools - -3. **Begin immediate wins** - - - Extract DDO Decryption Service - - Add batch database operations - - Implement circuit breaker POC - -4. **Establish team structure** - - Assign component owners - - Set up code review process - - Create communication channels - ---- - -## 📞 Questions or Feedback? - -For questions about these documents or the proposed architecture: - -1. Open a discussion in the team channel -2. Add comments to the documents -3. Bring to the architecture sync meeting - ---- - -**Last Updated:** January 14, 2026 -**Version:** 1.0 -**Status:** Ready for Meeting - ---- - -## 🎉 Let's Build a Better Indexer! - -Good luck with your architecture review meeting! These documents should provide a solid foundation for productive discussions and clear decision-making. diff --git a/INDEXER_FLOW_DIAGRAMS.md b/INDEXER_FLOW_DIAGRAMS.md deleted file mode 100644 index 220410d73..000000000 --- a/INDEXER_FLOW_DIAGRAMS.md +++ /dev/null @@ -1,712 +0,0 @@ -# Ocean Node Indexer - Flow Diagrams - -Visual representations of current and proposed architectures. - ---- - -## 1. CURRENT ARCHITECTURE - COMPONENT VIEW - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ OceanIndexer │ -│ - Main coordinator in main thread │ -│ - Manages worker lifecycle │ -│ - Handles job queue (JOBS_QUEUE) │ -│ - Manages reindex tasks (INDEXING_QUEUE) │ -│ - Event emitters (INDEXER_DDO_EVENT_EMITTER) │ -└─────────────────────────────────────────────────────────────────────┘ - │ │ │ - │ Worker Thread │ Worker Thread │ Worker Thread - ↓ ↓ ↓ -┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ -│ CrawlerThread │ │ CrawlerThread │ │ CrawlerThread │ -│ Chain: 1 │ │ Chain: 137 │ │ Chain: 8996 │ -│ │ │ │ │ │ -│ while(true) { │ │ while(true) { │ │ while(true) { │ -│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ -│ │ Get last │ │ │ │ Get last │ │ │ │ Get last │ │ -│ │ indexed │ │ │ │ indexed │ │ │ │ indexed │ │ -│ │ block │ │ │ │ block │ │ │ │ block │ │ -│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ -│ ↓ │ │ ↓ │ │ ↓ │ -│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ -│ │Get network│ │ │ │Get network│ │ │ │Get network│ │ -│ │ height │ │ │ │ height │ │ │ │ height │ │ -│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ -│ ↓ │ │ ↓ │ │ ↓ │ -│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ -│ │Retrieve │ │ │ │Retrieve │ │ │ │Retrieve │ │ -│ │chunk │ │ │ │chunk │ │ │ │chunk │ │ -│ │events │ │ │ │events │ │ │ │events │ │ -│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ -│ ↓ │ │ ↓ │ │ ↓ │ -│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ -│ │Process │ │ │ │Process │ │ │ │Process │ │ -│ │blocks │ │ │ │blocks │ │ │ │blocks │ │ -│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ -│ ↓ │ │ ↓ │ │ ↓ │ -│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ -│ │Update DB │ │ │ │Update DB │ │ │ │Update DB │ │ -│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ -│ ↓ │ │ ↓ │ │ ↓ │ -│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ -│ │Sleep 30s │ │ │ │Sleep 30s │ │ │ │Sleep 30s │ │ -│ └─────┬─────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │ -│ ↓ │ │ ↓ │ │ ↓ │ -│ } │ │ } │ │ } │ -└──────────────────┘ └──────────────────┘ └──────────────────┘ - │ │ │ - └────────────────────┴──────────────────────┘ - ↓ - ┌───────────────────────┐ - │ Database Layer │ - │ - DDO Storage │ - │ - Order Storage │ - │ - Indexer State │ - │ - DDO State │ - └───────────────────────┘ -``` - ---- - -## 2. CURRENT ARCHITECTURE - EVENT PROCESSING FLOW - -``` -Event Log (from RPC) - │ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ processChunkLogs(logs, signer, provider, chainId) │ -│ │ -│ for each log: │ -│ 1. findEventByKey(log.topics[0]) │ -│ 2. if (METADATA_CREATED/UPDATED/STATE): │ -│ ├─→ Check allowedValidators │ -│ ├─→ Get transaction receipt │ -│ ├─→ Fetch MetadataValidated events │ -│ ├─→ Validate validators │ -│ │ ├─→ Check ALLOWED_VALIDATORS list │ -│ │ └─→ For each access list: │ -│ │ └─→ For each validator: │ -│ │ └─→ Check balanceOf() │ -│ └─→ If not valid: continue (skip event) │ -│ 3. Route to processor │ -│ 4. Store in storeEvents{} │ -│ │ -│ return storeEvents │ -└─────────────────────────────────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ Event Processor (processors/*.ts) │ -│ │ -│ MetadataEventProcessor: │ -│ ├─→ wasNFTDeployedByOurFactory() │ -│ ├─→ getEventData() - decode from receipt │ -│ ├─→ decryptDDO() - 400+ lines │ -│ │ ├─→ HTTP decryption │ -│ │ ├─→ P2P decryption │ -│ │ └─→ Local decryption │ -│ ├─→ Check authorizedPublishers │ -│ ├─→ Check authorizedPublishersList │ -│ ├─→ getTokenInfo() │ -│ ├─→ getNFTInfo() │ -│ ├─→ getPricingStatsForDddo() │ -│ ├─→ PolicyServer check │ -│ ├─→ Purgatory check │ -│ └─→ createOrUpdateDDO() │ -│ │ -│ OrderStartedEventProcessor: │ -│ ├─→ Decode event │ -│ ├─→ Get DDO from database │ -│ ├─→ Update stats.orders │ -│ ├─→ Create order record │ -│ └─→ Update DDO │ -└─────────────────────────────────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ Database Operations │ -│ ├─→ ddoDatabase.update() │ -│ ├─→ ddoState.update() │ -│ └─→ orderDatabase.create() │ -└─────────────────────────────────────────────────────────────┘ -``` - ---- - -## 3. PROPOSED ARCHITECTURE - COMPONENT VIEW - -``` -┌──────────────────────────────────────────────────────────────┐ -│ IndexerOrchestrator │ -│ - Single coordinator │ -│ - Manages ChainIndexer lifecycle │ -│ - Health checks │ -│ - Metrics aggregation │ -│ - Event bus for notifications │ -└──────────────────────────────────────────────────────────────┘ - │ │ │ - │ async │ async │ async - ↓ ↓ ↓ -┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ -│ ChainIndexer │ │ ChainIndexer │ │ ChainIndexer │ -│ Chain: 1 │ │ Chain: 137 │ │ Chain: 8996 │ -└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ - │ │ │ - └─────────────────────┴──────────────────────┘ - │ - ┌──────────────────────┼──────────────────────┐ - ↓ ↓ ↓ -┌───────────────┐ ┌──────────────────┐ ┌──────────────────┐ -│ BlockScanner │ │ EventExtractor │ │ValidationPipeline│ -│ │ │ │ │ │ -│ - RPC client │ │ - Decode events │ │ - Chain of │ -│ - Fallback │ │ - Categorize │ │ validators │ -│ - Retry │ │ - Filter │ │ - Parallel exec │ -│ - Circuit │ │ │ │ │ -│ breaker │ │ │ │ │ -└───────────────┘ └──────────────────┘ └──────────────────┘ - │ │ │ - └──────────────────────┼──────────────────────┘ - ↓ - ┌──────────────────┐ - │ EventProcessor │ - │ │ - │ - Route to │ - │ handlers │ - │ - Transform to │ - │ domain models │ - └────────┬─────────┘ - │ - ↓ - ┌──────────────────┐ - │ StateManager │ - │ │ - │ - Repositories │ - │ - Transactions │ - │ - Batch ops │ - └────────┬─────────┘ - │ - ↓ - ┌──────────────────┐ - │ Database Layer │ - └──────────────────┘ -``` - ---- - -## 4. PROPOSED ARCHITECTURE - EVENT PROCESSING FLOW - -``` -Raw Event Logs - │ - ↓ -┌─────────────────────────────────────────────────────┐ -│ 1. EventExtractor.extractEvents(logs) │ -│ │ -│ Parse and categorize events: │ -│ { │ -│ metadata: [...], │ -│ orders: [...], │ -│ pricing: [...], │ -│ unknown: [...] │ -│ } │ -└──────────────────┬──────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────────────────┐ -│ 2. ValidationPipeline.validate(event) │ -│ │ -│ Chain of validators (can run in parallel): │ -│ │ -│ ┌──────────────────────┐ │ -│ │ FactoryValidator │→ Check NFT factory │ -│ └──────────┬───────────┘ │ -│ ↓ │ -│ ┌──────────────────────┐ │ -│ │MetadataProofValidator│→ Check signatures │ -│ └──────────┬───────────┘ │ -│ ↓ │ -│ ┌──────────────────────┐ │ -│ │ PublisherValidator │→ Check authorized │ -│ └──────────┬───────────┘ │ -│ ↓ │ -│ ┌──────────────────────┐ │ -│ │ AccessListValidator │→ Check access list │ -│ └──────────┬───────────┘ │ -│ ↓ │ -│ ┌──────────────────────┐ │ -│ │ PolicyServerValidator│→ Check policy │ -│ └──────────┬───────────┘ │ -│ ↓ │ -│ Result: { valid: boolean, errors: [...] } │ -└──────────────────┬──────────────────────────────────┘ - │ - ↓ (only valid events) -┌─────────────────────────────────────────────────────┐ -│ 3. EventProcessor.process(event) │ -│ │ -│ Route to appropriate handler: │ -│ │ -│ if (MetadataEvent): │ -│ ┌────────────────────────┐ │ -│ │MetadataCreatedHandler │ │ -│ │ - Decrypt DDO │ │ -│ │ - Fetch pricing │ │ -│ │ - Build entity │ │ -│ └────────┬───────────────┘ │ -│ ↓ │ -│ Domain Entity: DDO │ -│ │ -│ if (OrderEvent): │ -│ ┌────────────────────────┐ │ -│ │ OrderStartedHandler │ │ -│ │ - Update order count │ │ -│ │ - Build entity │ │ -│ └────────┬───────────────┘ │ -│ ↓ │ -│ Domain Entity: Order │ -└──────────────────┬──────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────────────────┐ -│ 4. StateManager.saveBatch(entities) │ -│ │ -│ transaction { │ -│ for each entity: │ -│ ├─→ ddoRepository.save() │ -│ ├─→ orderRepository.save() │ -│ └─→ stateRepository.update() │ -│ } │ -│ │ -│ Single database transaction, batched writes │ -└──────────────────┬──────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────────────────┐ -│ 5. EventBus.emit('event.processed', entity) │ -│ │ -│ Notify listeners for: │ -│ - Replication │ -│ - Notifications │ -│ - Webhooks │ -│ - Metrics │ -└─────────────────────────────────────────────────────┘ -``` - ---- - -## 5. BLOCK CRAWLING FLOW - CURRENT vs PROPOSED - -### CURRENT (with Worker Threads) - -``` -Main Thread Worker Thread (Chain 1) - │ │ - │ startThread(chainId) │ - ├──────────────────────────────→ │ - │ │ Start infinite loop - │ │ - │ ├→ getLastIndexedBlock() - │ │ (from DB) - │ │ - │ ├→ getNetworkHeight() - │ │ (RPC call) - │ │ - │ ├→ retrieveChunkEvents() - │ │ (RPC call) - │ │ - │ ├→ processBlocks() - │ │ ├→ processChunkLogs() - │ │ │ ├→ validate - │ │ │ ├→ process - │ │ │ └→ store - │ │ │ - │ │ └→ updateLastIndexedBlock() - │ │ (DB write) - │ │ - │ message: 'METADATA_CREATED' │ - │ ←──────────────────────────────┤ - │ emit event │ - │ │ - │ ├→ sleep(30s) - │ │ - │ └→ loop continues... - │ - │ stopThread(chainId) │ - ├──────────────────────────────→ │ - │ │ set stoppedCrawling = true - │ │ exit loop - -Issues: -❌ Complex message passing -❌ Global state management -❌ Hard to debug -❌ Testing requires Worker Thread mocking -``` - -### PROPOSED (with async/await) - -``` -IndexerOrchestrator ChainIndexer (Chain 1) - │ │ - │ start() │ - ├──────────────────────────────→ │ - │ │ async run() { - │ │ - │ ├→ progress = await stateManager - │ │ .getProgress() - │ │ - │ ├→ height = await blockScanner - │ │ .getLatestBlock() - │ │ - │ ├→ logs = await blockScanner - │ │ .getLogs(from, to) - │ │ - │ ├→ events = eventExtractor - │ │ .extract(logs) - │ │ - │ ├→ for each event: - │ │ result = await pipeline - │ │ .validate(event) - │ │ if (result.valid): - │ │ entity = await processor - │ │ .process(event) - │ │ batch.add(entity) - │ │ - │ ├→ await stateManager - │ │ .saveBatch(batch) - │ │ - │ onProgress(progress) │ - │ ←──────────────────────────────┤ eventBus.emit('progress') - │ │ - │ ├→ await sleep(interval) - │ │ - │ └→ } loop continues... - │ - │ stop() │ - ├──────────────────────────────→ │ - │ │ await this.stopSignal - │ │ return - -Benefits: -✅ Direct method calls -✅ Clear data flow -✅ Easy to test (just async functions) -✅ Better error handling -✅ No Worker Thread complexity -``` - ---- - -## 6. DATABASE OPERATIONS - CURRENT vs PROPOSED - -### CURRENT (Multiple Calls Per Event) - -``` -Event Processing - │ - ├─→ const { ddo, ddoState } = await getDatabase() - │ - ├─→ await ddo.retrieve(id) ← DB call 1 - │ - ├─→ await ddo.update(updatedDdo) ← DB call 2 - │ - └─→ await ddoState.update(...) ← DB call 3 - -For 100 events: ~300 database calls -``` - -### PROPOSED (Batch Operations) - -``` -Event Processing - │ - ├─→ Validate and process all events - │ (in memory, no DB calls) - │ - └─→ await stateManager.saveBatch([ - ...ddos, - ...orders, - ...stateUpdates - ]) - │ - └─→ Single transaction: - BEGIN - UPDATE indexer SET lastBlock = ... - INSERT INTO ddos VALUES ... - INSERT INTO orders VALUES ... - UPDATE ddo_state SET ... - COMMIT - -For 100 events: ~1 database transaction -Performance improvement: 100-300x -``` - ---- - -## 7. ERROR HANDLING - CURRENT vs PROPOSED - -### CURRENT (Multiple Retry Layers) - -``` -┌─────────────────────────────────────────┐ -│ retryCrawlerWithDelay() │ -│ MAX_CRAWL_RETRIES = 10 │ -│ ├─→ startCrawler() │ -│ │ └─→ tryFallbackRPCs() │ -│ │ │ -│ └─→ On failure: recursive call │ -└─────────────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────┐ -│ retrieveChunkEvents() │ -│ ├─→ provider.getLogs() │ -│ └─→ On error: throw │ -│ └─→ Caught by processNetworkData │ -│ └─→ chunkSize = chunkSize / 2 │ -└─────────────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────┐ -│ processBlocks() │ -│ └─→ On error: catch │ -│ └─→ sleep & retry same chunk │ -└─────────────────────────────────────────┘ - │ - ↓ -┌─────────────────────────────────────────┐ -│ withRetrial() │ -│ maxRetries = 5 │ -│ └─→ Used in decryptDDO │ -└─────────────────────────────────────────┘ - -Issues: -❌ 4 different retry mechanisms -❌ Unclear recovery state -❌ Potential infinite loops -❌ No circuit breaker -``` - -### PROPOSED (Unified Strategy) - -``` -┌─────────────────────────────────────────┐ -│ ResilientRpcClient │ -│ │ -│ ┌──────────────────────────────┐ │ -│ │ Circuit Breaker │ │ -│ │ States: CLOSED → OPEN → HALF│ │ -│ │ Failure threshold: 5 │ │ -│ │ Timeout: 60s │ │ -│ │ Success threshold: 2 │ │ -│ └──────────────────────────────┘ │ -│ │ │ -│ ↓ │ -│ ┌──────────────────────────────┐ │ -│ │ Retry Strategy │ │ -│ │ Max attempts: 3 │ │ -│ │ Backoff: exponential │ │ -│ │ Jitter: ±20% │ │ -│ └──────────────────────────────┘ │ -│ │ │ -│ ↓ │ -│ ┌──────────────────────────────┐ │ -│ │ Fallback Providers │ │ -│ │ Try each provider in order │ │ -│ │ Mark as unhealthy on error │ │ -│ └──────────────────────────────┘ │ -│ │ │ -│ ↓ │ -│ ┌──────────────────────────────┐ │ -│ │ Metrics Collection │ │ -│ │ - Success/failure rates │ │ -│ │ - Latency per provider │ │ -│ │ - Circuit breaker state │ │ -│ └──────────────────────────────┘ │ -└─────────────────────────────────────────┘ - -Benefits: -✅ Single retry mechanism -✅ Prevents cascade failures -✅ Clear recovery path -✅ Observable behavior -✅ Production-ready -``` - ---- - -## 8. TESTING STRATEGY - CURRENT vs PROPOSED - -### CURRENT - -``` -Integration Tests (Heavy) - │ - ├─→ Start local blockchain (Ganache) - ├─→ Deploy contracts - ├─→ Start Elasticsearch/Typesense - ├─→ Create OceanIndexer - ├─→ Wait for worker threads to start - ├─→ Publish test assets - ├─→ Wait for indexing (polling) - ├─→ Query database - └─→ Assert results - -Issues: -❌ Slow (30+ seconds per test) -❌ Flaky (timing issues) -❌ Hard to debug -❌ Worker threads hard to mock -❌ Few unit tests -``` - -### PROPOSED - -``` -Unit Tests (Fast) - │ - ├─→ EventExtractor - │ └─→ Mock logs → assert events - │ Time: ~10ms - │ - ├─→ FactoryValidator - │ └─→ Mock contract → assert validation - │ Time: ~5ms - │ - ├─→ MetadataCreatedHandler - │ └─→ Mock dependencies → assert DDO - │ Time: ~20ms - │ - └─→ StateManager - └─→ Mock repositories → assert calls - Time: ~5ms - -Integration Tests (Moderate) - │ - ├─→ ChainIndexer (end-to-end) - │ └─→ Mock RPC + DB → assert flow - │ Time: ~100ms - │ - └─→ ValidationPipeline - └─→ Mock validators → assert chain - Time: ~50ms - -Contract Tests - │ - └─→ ResilientRpcClient - └─→ Real RPC providers (staging) - Time: ~500ms - -Benefits: -✅ Fast feedback (< 1s for unit tests) -✅ Easy to debug -✅ High coverage -✅ Reliable -✅ Parallelizable -``` - ---- - -## 9. METRICS & OBSERVABILITY - -### Proposed Metrics Dashboard - -``` -┌─────────────────────────────────────────────────────────┐ -│ INDEXER DASHBOARD │ -├─────────────────────────────────────────────────────────┤ -│ │ -│ Chain Status │ -│ ┌────────┬─────────────┬──────────┬─────────┐ │ -│ │ Chain │ Last Block │ Lag │ Status │ │ -│ ├────────┼─────────────┼──────────┼─────────┤ │ -│ │ 1 │ 12,345,678 │ 10 mins │ 🟢 │ │ -│ │ 137 │ 45,678,901 │ 2 mins │ 🟢 │ │ -│ │ 8996 │ 1,234,567 │ 1 hour │ 🔴 │ │ -│ └────────┴─────────────┴──────────┴─────────┘ │ -│ │ -│ Event Processing │ -│ ┌────────────────────────────────────────────┐ │ -│ │ Events/sec: ████████░░░░ 127 avg │ │ -│ │ Blocks/sec: ██████░░░░░░ 45 avg │ │ -│ └────────────────────────────────────────────┘ │ -│ │ -│ Event Types (last hour) │ -│ ┌────────────────────────────────────────────┐ │ -│ │ MetadataCreated: ████████ 234 │ │ -│ │ MetadataUpdated: ███ 45 │ │ -│ │ OrderStarted: ██████ 123 │ │ -│ │ OrderReused: ██ 34 │ │ -│ └────────────────────────────────────────────┘ │ -│ │ -│ RPC Health │ -│ ┌────────┬──────────┬─────────┬──────────┐ │ -│ │Provider│ Latency │ Success │ Circuit │ │ -│ ├────────┼──────────┼─────────┼──────────┤ │ -│ │Infura │ 120ms │ 99.8% │ CLOSED │ │ -│ │Alchemy │ 95ms │ 99.9% │ CLOSED │ │ -│ │Public │ 450ms │ 87.2% │ OPEN │ │ -│ └────────┴──────────┴─────────┴──────────┘ │ -│ │ -│ Database Performance │ -│ ┌────────────────────────────────────────────┐ │ -│ │ Write Latency: 45ms avg │ │ -│ │ Read Latency: 12ms avg │ │ -│ │ Batch Size: 50 avg │ │ -│ └────────────────────────────────────────────┘ │ -│ │ -│ Errors (last hour) │ -│ ┌────────────────────────────────────────────┐ │ -│ │ RPC Errors: █ 5 │ │ -│ │ Validation Errors: ██ 12 │ │ -│ │ DB Errors: 0 │ │ -│ └────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────┘ -``` - -### Key Metrics - -```typescript -// Counters -indexer_blocks_processed_total{chain="1"} -indexer_events_processed_total{chain="1", type="metadata"} -indexer_rpc_calls_total{chain="1", provider="infura", result="success"} -indexer_validation_failures_total{chain="1", validator="factory"} - -// Gauges -indexer_last_indexed_block{chain="1"} -indexer_block_lag_seconds{chain="1"} -indexer_chain_status{chain="1"} // 1=healthy, 0=unhealthy - -// Histograms -indexer_block_processing_duration_seconds{chain="1"} -indexer_event_processing_duration_seconds{type="metadata"} -indexer_rpc_latency_seconds{provider="infura"} -indexer_db_write_duration_seconds - -// Summaries -indexer_batch_size{operation="save_ddos"} -``` - ---- - -## 10. COMPARISON SUMMARY - -| Aspect | Current | Proposed | Improvement | -| ------------------- | -------------------------------- | ------------------------------------- | -------------------- | -| **Architecture** | Worker threads per chain | Async ChainIndexer classes | Simpler | -| **Code Complexity** | High (nested, mixed concerns) | Low (SRP, clear layers) | 60% reduction | -| **Testing** | Integration-heavy | Unit test friendly | 10x faster tests | -| **Performance** | Serial processing, many DB calls | Batch operations, parallel validation | 2-10x faster | -| **Error Handling** | Multiple retry mechanisms | Unified circuit breaker | More reliable | -| **Observability** | Logs only | Metrics + logs + tracing | Production-ready | -| **Maintainability** | Hard to modify | Easy to extend | New feature < 1 week | -| **Memory Usage** | Thread overhead | Efficient async | 20-30% reduction | -| **Debugging** | Thread dumps, message tracing | Stack traces, clear flow | Much easier | - ---- - -**These diagrams should be used alongside the detailed architecture document for the meeting discussion.** diff --git a/INDEXER_MEETING_SUMMARY.md b/INDEXER_MEETING_SUMMARY.md deleted file mode 100644 index ab6336d8a..000000000 --- a/INDEXER_MEETING_SUMMARY.md +++ /dev/null @@ -1,491 +0,0 @@ -# Ocean Node Indexer - Meeting Summary - -## Architecture Review & Refactoring Direction - -**Date:** January 14, 2026 -**Duration:** 90 minutes -**Goal:** Align on architecture & produce draft refactoring proposal - ---- - -## 📋 AGENDA - -1. **Current Architecture Overview** (15 min) -2. **Pain Points Discussion** (20 min) -3. **Proposed Solutions** (30 min) -4. **Priorities & Timeline** (15 min) -5. **Open Questions & Next Steps** (10 min) - ---- - -## 🎯 KEY TAKEAWAYS (TL;DR) - -### What Works - -✅ Successfully indexes multiple chains -✅ Handles reindexing operations -✅ Validates events through multiple layers -✅ Stores comprehensive metadata - -### What Needs Improvement - -❌ **High Complexity** - Worker threads, mixed concerns -❌ **Limited Observability** - Hard to debug production issues -❌ **Testing Challenges** - Worker threads difficult to test -❌ **Performance Bottlenecks** - Serial processing, many RPC calls -❌ **Maintainability** - Large functions, tight coupling - ---- - -## 📊 CURRENT ARCHITECTURE (SIMPLIFIED) - -``` -OceanIndexer (Main Process) - │ - ├──► Worker Thread (Chain 1) - │ └──► while(true) { - │ - Get new blocks - │ - Retrieve events - │ - Process events - │ - Update database - │ - Sleep 30s - │ } - │ - ├──► Worker Thread (Chain 2) - ├──► Worker Thread (Chain 3) - └──► ... - -Issues: -- Complex inter-thread messaging -- Global mutable state -- Mixed concerns (fetching + validation + storage) -- Hard to test -``` - ---- - -## 🏗️ PROPOSED ARCHITECTURE - -``` -IndexerOrchestrator - │ - ├──► ChainIndexer(1) ──► BlockScanner ──► ResilientRpcClient - │ │ - │ ├──► EventExtractor - │ │ - │ ├──► ValidationPipeline - │ │ ├─ FactoryValidator - │ │ ├─ MetadataValidator - │ │ ├─ PublisherValidator - │ │ └─ PolicyValidator - │ │ - │ ├──► EventProcessor - │ │ ├─ MetadataHandler - │ │ ├─ OrderHandler - │ │ └─ PricingHandler - │ │ - │ └──► StateManager (Database Layer) - │ - ├──► ChainIndexer(2) - └──► ChainIndexer(N) - -Benefits: -✓ No worker threads (async/await) -✓ Clear separation of concerns -✓ Easy to test each component -✓ Better error handling -✓ Built-in observability -``` - ---- - -## 🔴 TOP 10 PAIN POINTS - -### 1. Worker Thread Complexity - -**Problem:** Inter-thread messaging, shared state, race conditions -**Impact:** Hard to debug, test, and extend -**Solution:** Replace with async/await ChainIndexer classes - -### 2. Monolithic Event Processing - -**Problem:** `processChunkLogs()` - 180+ lines, deeply nested -**Impact:** Hard to read, maintain, add features -**Solution:** Extract to ValidationPipeline + EventProcessor - -### 3. No Error Recovery Strategy - -**Problem:** Multiple retry mechanisms, no circuit breaker -**Impact:** Unclear state after failures, potential infinite loops -**Solution:** Implement ResilientRpcClient with circuit breaker - -### 4. DDO Decryption Complexity - -**Problem:** 400+ line method handling HTTP/P2P/local -**Impact:** Hard to test, unclear error messages -**Solution:** Extract to DdoDecryptionService - -### 5. Global Mutable State - -**Problem:** Global queues, flags scattered across files -**Impact:** Race conditions, hard to test -**Solution:** Encapsulate state in classes - -### 6. Serial Event Processing - -**Problem:** One event at a time, many RPC calls -**Impact:** Slow throughput -**Solution:** Batch operations, parallel validation - -### 7. Direct Database Coupling - -**Problem:** `await getDatabase()` everywhere -**Impact:** Hard to test, no transactions -**Solution:** Repository pattern, StateManager - -### 8. Limited Observability - -**Problem:** Only logs, no metrics -**Impact:** Can't track performance, debug issues -**Solution:** Add Prometheus metrics, structured logging - -### 9. Testing Difficulties - -**Problem:** Worker threads, database dependencies -**Impact:** Few unit tests, long integration tests -**Solution:** Dependency injection, interfaces - -### 10. Unclear Configuration - -**Problem:** Env vars, hardcoded values, no validation -**Impact:** Deployment issues, unclear behavior -**Solution:** Config class with validation - ---- - -## 💡 IMMEDIATE WINS (Can Start Tomorrow) - -These provide value without full refactor: - -### 1. Extract DDO Decryption Service - -**Effort:** 1-2 days -**Impact:** High (cleaner code, testable) - -```typescript -class DdoDecryptionService { - async decrypt(params: DecryptParams): Promise { - if (isHttp(params.decryptorURL)) { - return this.decryptHttp(params) - } else if (isP2P(params.decryptorURL)) { - return this.decryptP2P(params) - } else { - return this.decryptLocal(params) - } - } -} -``` - -### 2. Add Batch Database Operations - -**Effort:** 2-3 days -**Impact:** Very High (10-50x performance) - -```typescript -// Before: O(n) database calls -for (const event of events) { - await database.save(event) -} - -// After: O(1) database calls -await database.saveBatch(events) -``` - -### 3. Extract Validation Functions - -**Effort:** 2-3 days -**Impact:** High (readability, testability) - -```typescript -class EventValidation { - async validateFactory(event: DecodedEvent): Promise - async validateMetadataProof(event: MetadataEvent): Promise - async validatePublisher(event: MetadataEvent): Promise - async validateAccessList(event: MetadataEvent): Promise -} -``` - -### 4. Add Circuit Breaker for RPC - -**Effort:** 1-2 days -**Impact:** High (reliability) - -```typescript -class ResilientRpcClient { - private circuitBreaker: CircuitBreaker - - async execute(fn: RpcCall): Promise { - return this.circuitBreaker.execute(() => this.tryWithFallback(fn)) - } -} -``` - -### 5. Add Prometheus Metrics - -**Effort:** 2-3 days -**Impact:** Very High (observability) - -```typescript -metrics.indexer_blocks_processed_total.inc() -metrics.indexer_events_processed{type="metadata"}.inc() -metrics.indexer_processing_duration_seconds.observe(duration) -metrics.indexer_rpc_errors_total{provider="infura"}.inc() -``` - -**Total Effort:** ~2 weeks -**Total Impact:** Significant quality & performance improvements - ---- - -## 📅 PHASED REFACTORING TIMELINE - -### Phase 1: Foundation (Week 1-2) - -- ResilientRpcClient with circuit breaker -- BlockScanner interface -- Metrics infrastructure -- Tests for new components - -### Phase 2: Validation (Week 3-4) - -- Validator interface + implementations -- ValidationPipeline -- Refactor processChunkLogs() - -### Phase 3: Event Processing (Week 5-6) - -- EventHandler interface + implementations -- Domain models (separate from DB) -- Refactor processors - -### Phase 4: State Management (Week 7-8) - -- Repository pattern -- Transactional StateManager -- Batch operations - -### Phase 5: Remove Worker Threads (Week 9-10) - -- ChainIndexer class -- Replace threads with async loops -- Direct method calls (no messages) - -### Phase 6: Observability (Week 11-12) - -- Comprehensive metrics -- Health checks -- Monitoring dashboards - -**Total Timeline:** ~12 weeks (3 months) - ---- - -## 🎲 ALTERNATIVES CONSIDERED - -| Alternative | Pros | Cons | Decision | -| ------------------------- | ------------------- | ----------------------- | ------------------- | -| **Keep Worker Threads** | True parallelism | Complex, hard to debug | ❌ Remove | -| **Event Sourcing** | Audit trail, replay | Too complex | ❌ Not now | -| **Message Queue (Kafka)** | Decoupled, scalable | Infrastructure overhead | ⏸️ Revisit at scale | -| **GraphQL Subscriptions** | Real-time updates | Not needed | ❌ Out of scope | - ---- - -## ❓ OPEN QUESTIONS FOR DISCUSSION - -### Technical Questions - -1. **Worker Threads:** Do we truly need parallelism or is async/await sufficient? - - - Current: 1 thread per chain - - Proposed: Async ChainIndexer classes - - Decision needed: ? - -2. **Database Choice:** Standardize on Elasticsearch or Typesense, or keep both? - - - Current: Both supported - - Maintenance cost: High - - Decision needed: ? - -3. **Event Prioritization:** Should metadata events be prioritized over pricing events? - - - Current: FIFO processing - - Risk: Important events delayed by minor ones - - Decision needed: ? - -4. **Reindex Strategy:** Should reindexing be a separate service? - - Current: Mixed with normal indexing - - Potential: Dedicated reindex service - - Decision needed: ? - -### Product Questions - -5. **Monitoring Requirements:** What metrics are critical for production? - - - Blocks/sec? - - Events/sec? - - RPC latency? - - Error rates? - - Decision needed: ? - -6. **SLA Requirements:** What are our uptime/reliability targets? - - 99.9% uptime? - - Max 5 min recovery time? - - < 0.1% failed events? - - Decision needed: ? - -### Process Questions - -7. **Backward Compatibility:** How long support old schemas? - - - Database migrations - - API compatibility - - Decision needed: ? - -8. **Rollout Strategy:** Big bang or gradual rollout? - - Feature flags? - - Parallel running? - - Decision needed: ? - ---- - -## 📈 SUCCESS METRICS - -### Code Quality Targets - -- ✅ Cyclomatic Complexity: < 5 (currently ~15) -- ✅ Test Coverage: > 80% (currently ~60%) -- ✅ Lines per Function: < 50 (currently 100+) -- ✅ Type Safety: 100% (no `any`) - -### Performance Targets - -- ✅ Throughput: 2x improvement in events/sec -- ✅ Latency: < 100ms per event -- ✅ Memory: Stable (no leaks) -- ✅ RPC Calls: Reduce by 30% - -### Reliability Targets - -- ✅ Uptime: > 99.9% -- ✅ Failed Events: < 0.1% -- ✅ Recovery Time: < 5 minutes -- ✅ Reindex Success: > 99% - -### Maintainability Targets - -- ✅ Onboarding: < 2 days -- ✅ Bug Fix Time: < 4 hours -- ✅ Feature Time: < 1 week -- ✅ Incidents: < 1/month - ---- - -## 🚀 NEXT STEPS - -### Today (This Meeting) - -1. Review and discuss document -2. Agree on high-level direction -3. Prioritize: Immediate wins vs full refactor? -4. Assign investigation tasks - -### Next Week - -1. Detailed design for Phase 1 -2. Create ADRs (Architecture Decision Records) -3. Set up performance benchmarks -4. Begin immediate wins implementation - -### Ongoing - -1. Weekly architecture sync -2. Code review focus on quality -3. Regular performance testing -4. Documentation updates - ---- - -## 📚 REFERENCE MATERIALS - -### Main Document - -See: `INDEXER_ARCHITECTURE_ANALYSIS.md` (detailed 13-section analysis) - -### Key Code Files - -``` -src/components/Indexer/ -├── index.ts - Main coordinator (490 lines) -├── crawlerThread.ts - Worker thread (380 lines) -├── processor.ts - Event processing (207 lines) -└── processors/ - ├── BaseProcessor.ts - Base class (442 lines) - └── MetadataEventProcessor.ts - Metadata (403 lines) -``` - -### Related Documentation - -- Ocean Protocol Docs: https://docs.oceanprotocol.com -- Ethers.js Provider: https://docs.ethers.org/v6/api/providers/ -- Worker Threads: https://nodejs.org/api/worker_threads.html - ---- - -## 🤝 MEETING ROLES - -- **Facilitator:** _[Name]_ -- **Note Taker:** _[Name]_ -- **Timekeeper:** _[Name]_ -- **Decision Maker:** _[Name]_ - ---- - -## ✅ ACTION ITEMS TEMPLATE - -_To be filled during meeting_ - -| Action | Owner | Deadline | Status | -| -------------------------------- | --------- | -------------- | ------ | -| Review detailed architecture doc | Team | Before meeting | ✅ | -| Decision on worker threads | Tech Lead | End of meeting | ⏳ | -| Design Phase 1 components | Architect | Next week | ⏳ | -| Set up performance benchmarks | DevOps | Next week | ⏳ | -| Implement circuit breaker POC | Dev 1 | Week 2 | ⏳ | -| Extract validation functions | Dev 2 | Week 2 | ⏳ | - ---- - -## 💬 DISCUSSION NOTES - -_Space for notes during meeting_ - -### Architecture Direction - -- - -### Priorities - -- - -### Concerns Raised - -- - -### Decisions Made - -- - ---- - -**Remember:** The goal is alignment and direction, not final implementation details! diff --git a/INDEXER_USE_CASES_AND_FLOWS.md b/INDEXER_USE_CASES_AND_FLOWS.md deleted file mode 100644 index 849290fd2..000000000 --- a/INDEXER_USE_CASES_AND_FLOWS.md +++ /dev/null @@ -1,2541 +0,0 @@ -# Ocean Node Indexer - Event Monitoring & Error Handling - -## Table of Contents - -1. [Overview](#overview) -2. [🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)](#-proposed-improvements-post-meeting-changes) -3. [Event Monitoring Architecture](#event-monitoring-architecture) -4. [Event Processing Pipeline](#event-processing-pipeline) -5. [Detailed Event Handling](#detailed-event-handling) -6. [Error Handling & Retry Mechanisms](#error-handling--retry-mechanisms) -7. [Failure Scenarios & Recovery](#failure-scenarios--recovery) - ---- - -## 🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes) - -> **Status:** Draft proposals from Jan 27, 2026 meeting -> **Goal:** Improve reliability, decoupling, and error handling - -### 1. 🎯 EVENT-LEVEL RETRY MECHANISM WITH QUEUES - -**Current Issue:** Retry logic is deeply embedded in event processing steps (e.g., inside DDO decryption) - -**Proposed Change:** - -- **Move retry logic to event level** (not deep inside processing steps) -- **Implement queue-based retry system** for all 12 event types -- **Decouple retry from specific operations** (e.g., decrypt, p2p, HTTP) - -**Implementation:** - -``` -┌─────────────────────────────────────────────────────────────┐ -│ EVENT PROCESSING QUEUE │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ Event Detected → Add to Queue │ -│ ↓ │ -│ Queue Processor (async workers) │ -│ ↓ │ -│ Process Event │ -│ ├─ Success → Mark complete, update DB │ -│ └─ Failure → Add to Retry Queue with backoff │ -│ │ -│ Retry Queue (exponential backoff): │ -│ - Retry 1: ~10 seconds │ -│ - Retry 2: ~1 minute │ -│ - Retry 3: ~10 minutes │ -│ - Retry 4: ~1 hour │ -│ - Retry 5: ~1 week (final attempt) │ -│ │ -│ Benefits: │ -│ ✓ Non-blocking (doesn't halt indexer) │ -│ ✓ Works for ALL error types (HTTP, P2P, RPC, DB) │ -│ ✓ Configurable per event type │ -│ ✓ Visible retry state in monitoring │ -└─────────────────────────────────────────────────────────────┘ -``` - -**Applies to:** All event processors, especially METADATA_CREATED/UPDATED (DDO decryption) - ---- - -### 2. 🗄️ NEW DATABASE INDEX: `ddo_logs` - -**Current Issue:** - -- `ddoState` only tracks metadata events -- Order and pricing events have no error tracking -- No unified view of all DDO-related events - -**Proposed Change:** - -- Create new DB index: **`ddo_logs`** -- Store **all events** related to a DID (metadata, orders, pricing) -- Similar structure to `ddoState` but broader scope - -**Schema:** - -```typescript -interface DdoLog { - did: string // Indexed - chainId: number // Indexed - eventType: string // METADATA_CREATED, ORDER_STARTED, etc. - eventHash: string // Event signature hash - txHash: string // Transaction hash - blockNumber: number // Block number - timestamp: number // Event timestamp - status: 'success' | 'failed' | 'retrying' - error?: string // Error message if failed - retryCount: number // Number of retry attempts - lastRetry?: number // Timestamp of last retry - metadata?: Record // Event-specific data -} -``` - -**Benefits:** - -- Single source of truth for all DDO events -- Easier debugging (see all events for a DID) -- Track pricing/order event errors (not just metadata) -- Audit trail for compliance - ---- - -### 3. 🔄 REPLACE EventEmitter WITH QUEUES - -**Current Issue:** - -- Using `EventEmitter` for communication -- Synchronous, blocking behavior -- No retry/replay capability -- Difficult to test - -**Proposed Change:** - -- Replace `EventEmitter` with **persistent queue system** -- Use queue for: - - ✓ Newly indexed assets (instead of `eventEmitter.emit()`) - - ✓ Reindex requests (block & transaction level) - - ✓ Admin commands - -**Queue Types:** - -``` -1. EVENT_PROCESSING_QUEUE (primary) - - New events from blockchain - - Priority: FIFO with retry backoff - -2. REINDEX_QUEUE (existing, enhance) - - Block-level reindex - - Transaction-level reindex - - Priority: Admin requests > Auto-retry - -3. ORDER_QUEUE (new) - - Store orders even if DDO not found - - Process when DDO becomes available -``` - -**Benefits:** - -- Testable (can inject mock queue) -- Observable (queue depth, retry counts) -- Resilient (survives crashes) -- Decoupled (no tight coupling between components) - ---- - -### 4. 📦 HANDLE MISSING DDO IN ORDER/PRICING EVENTS - -**Current Issue:** - -- If DDO not found → skip order/pricing event -- Lost data if DDO indexed later - -**Proposed Change:** - -**For ORDER_STARTED/ORDER_REUSED:** - -``` -IF DDO not found: - 1. Create order record anyway (don't skip step 6) - 2. Store in database with status: 'orphaned' - 3. Add DDO processing to watch queue - 4. Skip only: step 5 (update count), step 7 (update DDO) - 5. When DDO indexed → process orphaned orders -``` - -**For PRICING EVENTS (Dispenser/Exchange):** - -``` -IF DDO not found: - 1. Check if DDO is in processing queue - 2. If yes → add pricing event to queue (process after DDO) - 3. If no → log to ddo_logs with error state - 4. Store pricing event data for future reconciliation -``` - -**Benefits:** - -- No data loss -- Can reconcile later -- Better observability - ---- - -### 5. 🚫 MOVE RETRY LOGIC TO ChainIndexer (Block Only That Chain) - -**Current Issue:** - -- Crawler startup retry in `OceanIndexer` -- Failure blocks **entire node** (all chains) - -**Proposed Change:** - -- Move `retryCrawlerWithDelay()` → **ChainIndexer** -- Each chain fails independently -- Other chains continue indexing - -**Implementation:** - -```typescript -// ChainIndexer.ts -async start() { - let retries = 0 - const maxRetries = 10 - - while (retries < maxRetries) { - try { - await this.initializeConnection() // RPC + DB - await this.indexLoop() - break - } catch (error) { - retries++ - const delay = Math.min(retries * 3000, 30000) - INDEXER_LOGGER.error( - `Chain ${this.blockchain.chainId} failed, retry ${retries}/${maxRetries} in ${delay}ms` - ) - await sleep(delay) - } - } - - if (retries === maxRetries) { - this.eventEmitter.emit('chain_failed', { - chainId: this.blockchain.chainId, - error: 'Max retries exceeded' - }) - } -} -``` - -**Benefits:** - -- Resilient multi-chain indexing -- One bad RPC doesn't kill everything -- Easier debugging (per-chain logs) - ---- - -### 6. 📍 BLOCK RETRY QUEUE IMPROVEMENTS - -**Current Issue:** - -- Failed block retried, but `lastIndexedBlock` not updated -- Same block retried indefinitely -- No expiry/max retry limit - -**Proposed Change:** - -``` -When block added to retry queue: - 1. Update lastIndexedBlock (move forward) - 2. Add block to retry queue with metadata: - - blockNumber - - retryCount (starts at 0) - - maxRetries (default: 5) - - lastError - - expiryDate (when to give up) - 3. Process retry queue separately (exponential backoff) - 4. If maxRetries exceeded → log to failed_blocks table -``` - -**Retry Queue Schema:** - -```typescript -interface BlockRetryTask { - chainId: number - blockNumber: number - retryCount: number - maxRetries: number - lastError: string - lastRetryAt: number - expiryDate: number // Timestamp when to stop retrying - events: string[] // Event hashes to reprocess -} -``` - -**Benefits:** - -- Indexer moves forward (doesn't get stuck) -- Failed blocks retried in background -- Clear failure tracking - ---- - -### 7. 🌐 REMOVE ECONNREFUSED-ONLY CONDITION - -**Current Issue:** - -- Retry only on `ECONNREFUSED` error -- Other errors (timeout, 500, p2p failures) not retried - -**Proposed Change:** - -- With event-level retry, **retry ALL error types**: - - ✓ RPC errors (timeout, 500, 429 rate limit) - - ✓ HTTP errors (decrypt service down) - - ✓ P2P errors (peer unreachable) - - ✓ Database errors (temp unavailable) - - ✓ Validation errors (maybe retryable) - -**Implementation:** - -```typescript -// Classify errors -enum ErrorType { - RETRYABLE_RPC = 'retryable_rpc', - RETRYABLE_HTTP = 'retryable_http', - RETRYABLE_P2P = 'retryable_p2p', - RETRYABLE_DB = 'retryable_db', - NON_RETRYABLE = 'non_retryable' -} - -function classifyError(error: Error): ErrorType { - if (error.code === 'ECONNREFUSED') return ErrorType.RETRYABLE_RPC - if (error.code === 'ETIMEDOUT') return ErrorType.RETRYABLE_RPC - if (error.message.includes('429')) return ErrorType.RETRYABLE_RPC - if (error.message.includes('P2P')) return ErrorType.RETRYABLE_P2P - if (error.message.includes('decrypt')) return ErrorType.RETRYABLE_HTTP - if (error.message.includes('factory')) return ErrorType.NON_RETRYABLE - return ErrorType.RETRYABLE_RPC // Default to retryable -} -``` - ---- - -### 8. ✅ UPDATE TESTS - -**Required Test Updates:** - -- Remove tests checking `EventEmitter` behavior -- Add tests for queue-based processing -- Add tests for retry with exponential backoff -- Add tests for orphaned orders -- Add tests for per-chain failure isolation -- Add tests for `ddo_logs` index -- Add tests for block retry with expiry - ---- - -### Summary Table - -| # | Change | Current Pain | Benefit | Effort | Priority | -| --- | --------------------------------------------- | --------------------------------- | ------------------------------------ | ------ | ----------- | -| 1 | Event-level retry + queues | Retry logic scattered, blocking | Unified, non-blocking, testable | High | 🔴 Critical | -| 2 | `ddo_logs` DB index | No order/pricing error tracking | Full audit trail, debugging | Medium | 🟡 High | -| 3 | Replace EventEmitter with queues | Blocking, not testable, no replay | Observable, resilient, testable | High | 🔴 Critical | -| 4 | Handle missing DDO (orphaned orders) | Lost orders/pricing data | No data loss, reconciliation | Medium | 🟡 High | -| 5 | Per-chain startup retry (ChainIndexer) | One failure kills entire node | Isolated failures, resilient | Low | 🔴 Critical | -| 6 | Block retry queue with expiry | Indexer stuck on bad blocks | Progress continues, background retry | Medium | 🟡 High | -| 7 | Retry ALL error types (not just ECONNREFUSED) | P2P/timeout/429 not retried | Comprehensive error handling | Low | 🟡 High | -| 8 | Update tests | Tests assume old architecture | Tests match new architecture | Medium | 🟢 Medium | - ---- - -### Migration Roadmap - -#### Phase 1: Foundation (Weeks 1-2) 🔴 Critical - -**Goal:** Establish queue infrastructure and database schema - -**Tasks:** - -1. Create database tables: - - - `event_queue` (new events) - - `event_retry_queue` (failed events) - - `ddo_logs` (all DDO-related events) - - `block_retry_queue` (failed blocks) - - `failed_blocks` (permanent failures) - - `dead_letter_queue` (max retries exceeded) - -2. Implement queue system: - - - `EventQueue` class (persistent queue) - - `EventQueueProcessor` class (worker pool) - - `EventRetryProcessor` class (background retries) - -3. Add error classification: - - `ErrorType` enum - - `classifyError()` function - - `isRetryable()` logic - -**Deliverables:** - -- Database migrations -- Queue infrastructure code -- Unit tests for queue operations - ---- - -#### Phase 2: Per-Chain Isolation (Week 3) 🔴 Critical - -**Goal:** Prevent one bad chain from killing entire node - -**Tasks:** - -1. Move `retryCrawlerWithDelay()` from `OceanIndexer` → `ChainIndexer.start()` -2. Add per-chain retry counters -3. Emit `chain_startup_failed` event (don't crash node) -4. Update `OceanIndexer.startThread()` to handle chain failures gracefully - -**Deliverables:** - -- Updated `ChainIndexer.start()` with retry logic -- Tests for chain isolation -- Monitoring for failed chains - ---- - -#### Phase 3: Event-Level Retry (Weeks 4-5) 🔴 Critical - -**Goal:** Replace embedded retry with queue-based system - -**Tasks:** - -1. Update all 12 event processors: - - - Remove `withRetrial()` calls - - Remove ECONNREFUSED checks - - Just process, let queue handle retries - -2. Update `ChainIndexer.indexLoop()`: - - - Replace `eventEmitter.emit()` → `eventQueue.enqueue()` - - Process events via `EventQueueProcessor` - -3. Implement exponential backoff: - - - 10s → 1min → 10min → 1hr → 1 week - -4. Log all events to `ddo_logs`: - - Success, failure, retrying states - - Track retryCount, error messages - -**Deliverables:** - -- Refactored event processors (12 files) -- Queue-based event processing -- Tests for retry logic - ---- - -#### Phase 4: Block Retry Queue (Week 6) 🟡 High - -**Goal:** Indexer continues even with failed blocks - -**Tasks:** - -1. Implement `addBlockToRetryQueue()` -2. Update `indexLoop()` error handling: - - Add failed block to queue - - Still update `lastIndexedBlock` (move forward!) -3. Implement `processBlockRetryQueue()` (background loop) -4. Add expiry logic (maxRetries, expiryDate) -5. Move permanent failures to `failed_blocks` table - -**Deliverables:** - -- Block retry queue processor -- Background retry loop -- Tests for block retry - ---- - -#### Phase 5: Handle Missing DDO (Week 7) 🟡 High - -**Goal:** No data loss for orphaned orders/pricing - -**Tasks:** - -1. Update ORDER_STARTED/ORDER_REUSED: - - - Create order record even if DDO not found - - Store as 'orphaned' status - - Add to watch queue - -2. Update pricing events (Dispenser/Exchange): - - - Check if DDO in processing queue - - If yes → add pricing event to queue - - If no → log to `ddo_logs` with error - -3. Implement reconciliation job: - - Periodically check for orphaned orders - - Process when DDO becomes available - -**Deliverables:** - -- Orphaned order handling -- Pricing event queue logic -- Reconciliation job - ---- - -#### Phase 6: Testing & Monitoring (Week 8) 🟢 Medium - -**Goal:** Comprehensive tests and observability - -**Tasks:** - -1. Update existing tests: - - - Remove EventEmitter assertions - - Add queue-based assertions - -2. Add integration tests: - - - Full retry flow (10s → 1 week) - - Chain isolation (one chain fails) - - Block retry queue - - Orphaned orders - -3. Add monitoring dashboard: - - - Queue depth (event, retry, block) - - Retry counts by error type - - Dead letter queue size - - Per-chain health - -4. Add alerting: - - Dead letter queue growing - - Chain startup failures - - High retry queue depth - -**Deliverables:** - -- Full test suite -- Monitoring dashboard -- Alerting rules - ---- - -### Expected Outcomes - -**Reliability:** - -- ✅ No single point of failure (per-chain isolation) -- ✅ Graceful degradation (some chains fail, others continue) -- ✅ No data loss (orphaned orders, retry queue) -- ✅ Progress continues (failed blocks don't block indexer) - -**Observability:** - -- ✅ Full audit trail (`ddo_logs` for all events) -- ✅ Visible retry state (queue depths, retry counts) -- ✅ Clear failure tracking (dead letter queue, failed_blocks) -- ✅ Per-chain health monitoring - -**Maintainability:** - -- ✅ Unified retry logic (no scattered code) -- ✅ Testable (queues can be mocked) -- ✅ Configurable (retry counts, backoffs) -- ✅ Decoupled (event processors just process) - -**Performance:** - -- ✅ Non-blocking (retries don't halt indexer) -- ✅ Concurrent processing (worker pool) -- ✅ Exponential backoff (reduces RPC load) - ---- - -## Overview - -The Ocean Node Indexer continuously monitors blockchain networks for Ocean Protocol events and processes them in real-time. - -**Current Architecture:** - -- One `ChainIndexer` instance per blockchain network -- Async/await architecture (no worker threads) -- Event-driven communication via `EventEmitter` -- Processes 12 different event types -- Adaptive error handling with multiple retry layers - -**Key Components:** - -- **ChainIndexer** - Per-chain indexer running async indexing loop -- **Event Processors** - Handle specific blockchain event types (12 processors) -- **Validation Pipeline** - Multi-layer validation (factory, metadata, publishers) -- **Database Layer** - Persistence (Elasticsearch/Typesense) - ---- - -## Event Monitoring Architecture - -### Continuous Monitoring Process - -**Location:** `ChainIndexer.ts` - `indexLoop()` - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ CONTINUOUS MONITORING LOOP │ -│ │ -│ async indexLoop() { │ -│ while (!stopSignal) { │ -│ 1. Get last indexed block from DB │ -│ 2. Get current network height from RPC │ -│ 3. Calculate chunk size (adaptive: 1-1000 blocks) │ -│ 4. Retrieve events: provider.getLogs(fromBlock, toBlock) │ -│ 5. Process events through pipeline │ -│ 6. Update last indexed block in DB │ -│ 7. Emit events to downstream consumers │ -│ 8. Sleep for interval (default: 30 seconds) │ -│ 9. Process reindex queue (if any) │ -│ } │ -│ } │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### Event Discovery Mechanism - -**Step-by-Step Process:** - -``` -1. Get Network State - ├─> lastIndexedBlock = await db.indexer.retrieve(chainId) - ├─> networkHeight = await provider.getBlockNumber() - └─> startBlock = max(lastIndexedBlock, deploymentBlock) - -2. Calculate Chunk to Process - ├─> remainingBlocks = networkHeight - startBlock - ├─> blocksToProcess = min(chunkSize, remainingBlocks) - └─> Adaptive chunkSize (halves on error, recovers after 3 successes) - -3. Retrieve Events from Blockchain - └─> provider.getLogs({ - fromBlock: lastIndexedBlock + 1, - toBlock: lastIndexedBlock + blocksToProcess, - topics: [OCEAN_EVENT_TOPIC_HASHES] // Filter by event signatures - }) - Returns: Log[] (raw blockchain event logs) - -4. Route Events to Processors - └─> processChunkLogs(logs, signer, provider, chainId) -``` - -### Event Topic Filtering - -The indexer listens for these Ocean Protocol event signatures: - -```typescript -EVENT_HASHES = { - // Metadata Events - '0x5463569d...': METADATA_CREATED - '0x127c3f87...': METADATA_UPDATED - '0x1f432bc9...': METADATA_STATE - - // Order Events - '0xa0e0424c...': ORDER_STARTED - '0x6e0dd743...': ORDER_REUSED - - // Dispenser Events - '0xdcda18b5...': DISPENSER_CREATED - '0x6e0cf36d...': DISPENSER_ACTIVATED - '0x53ae36d4...': DISPENSER_DEACTIVATED - - // Exchange Events - '0xdcda18b5...': EXCHANGE_CREATED - '0x6e0cf36d...': EXCHANGE_ACTIVATED - '0x53ae36d4...': EXCHANGE_DEACTIVATED - '0x7b3b3f0f...': EXCHANGE_RATE_CHANGED -} -``` - -**Monitoring Frequency:** - -- Checks for new blocks every 30 seconds (configurable via `INDEXER_INTERVAL`) -- Processes up to `chunkSize` blocks per iteration (default: 100-1000) -- Adaptive: reduces chunk size on RPC errors, recovers after successes - ---- - -## Event Processing Pipeline - -### Overall Flow - -``` -Raw Blockchain Logs - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 1. EVENT IDENTIFICATION │ -│ - Extract topic[0] (event signature hash) │ -│ - Look up in EVENT_HASHES mapping │ -│ - Check if Ocean Protocol event │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 2. VALIDATION (for metadata events) │ -│ - Get transaction receipt │ -│ - Extract MetadataValidated events │ -│ - Check allowedValidators list │ -│ - Check access list memberships (balanceOf calls) │ -│ - If validation fails → skip event, continue to next │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 3. ROUTE TO PROCESSOR │ -│ - Get cached processor instance (per eventType + chain) │ -│ - Call processor.processEvent() │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 4. EVENT-SPECIFIC PROCESSING │ -│ - Factory validation (NFT deployed by Ocean) │ -│ - Decode event data from receipt │ -│ - Decrypt/decompress DDO (if metadata event) │ -│ - Fetch additional on-chain data (NFT info, pricing) │ -│ - Build domain model with enriched metadata │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 5. DATABASE PERSISTENCE │ -│ - Create or update DDO │ -│ - Update DDO state (validation tracking) │ -│ - Create order records (if order event) │ -└─────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────┐ -│ 6. EVENT EMISSION │ -│ - ChainIndexer emits to INDEXER_CRAWLING_EVENT_EMITTER │ -│ - OceanIndexer re-emits to INDEXER_DDO_EVENT_EMITTER │ -│ - Downstream consumers notified (API, cache, webhooks) │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Location References - -**Event Monitoring:** `ChainIndexer.ts` - `indexLoop()` -**Event Identification:** `processor.ts` - `processChunkLogs()` -**Event Routing:** `processor.ts` - `getEventProcessor()` -**Event Processing:** `processors/*.ts` - `processEvent()` - ---- - -## Detailed Event Handling - -### 1. METADATA_CREATED Event - -**Trigger:** New data asset published on-chain - -**Processor:** `MetadataEventProcessor.ts` - -**On-Chain Data:** - -- `owner` - Publisher address -- `flags` - Encryption/compression flags (bit 2 = encrypted) -- `metadata` - Encrypted/compressed DDO -- `metadataHash` - SHA256 hash of DDO -- `validateTime` - Timestamp - -**Processing Steps:** - -``` -1. FACTORY VALIDATION - └─> wasNFTDeployedByOurFactory(chainId, signer, nftAddress) - ├─> Instantiate ERC721Factory contract - ├─> Loop through all NFTs from factory - └─> If not deployed by Ocean → REJECT, skip event - -2. DECODE EVENT DATA - └─> getEventData(provider, txHash, ERC721Template.abi) - ├─> Fetch transaction receipt - ├─> Find log matching event hash - ├─> Parse with contract ABI - └─> Extract: owner, flags, metadata, metadataHash - -3. DDO DECRYPTION (Complex: 400+ lines, 3 strategies) - └─> decryptDDO(decryptorURL, flag, owner, nftAddress, chainId, txId, metadataHash, metadata) - │ - ├─> IF ENCRYPTED (flag & 2 != 0): - │ ├─> Get nonce from provider/timestamp - │ ├─> Build signature: - │ │ - message = txId + ethAddress + chainId + nonce - │ │ - hash = solidityPackedKeccak256(message) - │ │ - signature = wallet.signMessage(hash) - │ ├─> HTTP: POST /api/services/decrypt - │ │ - Payload: { transactionId, chainId, signature, nonce } - │ │ - Timeout: 30 seconds - │ │ - Retry: up to 5 times (withRetrial) - │ │ - │ │ ⚠️ PROPOSED CHANGE: - │ │ └─> Use exponential backoff (10s → 1min → 10min → 1hr → 1 week) - │ │ └─> Non-blocking retry using queue mechanism - │ │ - │ ├─> P2P: p2pNode.sendTo(decryptorURL, message) - │ │ - │ │ ⚠️ PROPOSED CHANGE: - │ │ └─> Add retry mechanism for P2P connections - │ │ - │ ├─> Local: node.getCoreHandlers().handle(decryptDDOTask) - │ └─> Validate response hash matches metadataHash - │ - ⚠️ PROPOSED ARCHITECTURAL CHANGE: - │ └─> Move retry to EVENT LEVEL (decouple from decrypt) - │ └─> Always update ddo_logs (success or error) - │ └─> For retried DDOs: Get order count from DB (not from old DDO) - │ - └─> IF COMPRESSED (flag & 2 == 0): - └─> Parse directly: JSON.parse(toUtf8String(getBytes(metadata))) - -4. VALIDATE DDO ID - └─> Check ddo.id === makeDid(nftAddress, chainId) - └─> If mismatch → REJECT, update ddoState with error - -5. CHECK AUTHORIZED PUBLISHERS (if configured) - └─> Check if owner in authorizedPublishers list - └─> If not → REJECT, update ddoState with error - -6. FETCH NFT INFORMATION (multiple RPC calls) - └─> getNFTInfo(nftAddress, signer, owner, timestamp) - ├─> nftContract.getMetaData() → state - ├─> nftContract.getId() → token ID - ├─> nftContract.tokenURI(id) → URI - ├─> nftContract.name() → name - ├─> nftContract.symbol() → symbol - └─> Return: { state, address, name, symbol, owner, created, tokenURI } - -7. FETCH TOKEN INFORMATION (per datatoken) - └─> For each service in DDO: - ├─> datatokenContract.name() - ├─> datatokenContract.symbol() - └─> Collect: { address, name, symbol, serviceId } - -8. FETCH PRICING INFORMATION (multiple RPC calls) - └─> For each datatoken: - ├─> Check dispenser: dispenserContract.status(datatoken) - ├─> Check exchange: exchangeContract.getAllExchanges() - └─> Build prices array: [{ type, price, contract, token }] - -9. CHECK PURGATORY STATUS - └─> Purgatory.check(nftAddress, chainId, account) - └─> Return: { state: boolean } - -10. BUILD INDEXED METADATA - └─> Construct enriched metadata: - ├─> nft: { state, address, name, symbol, owner, created, tokenURI } - ├─> event: { txid, from, contract, block, datetime } - ├─> stats: [{ datatokenAddress, name, symbol, orders: 0, prices: [...] }] - └─> purgatory: { state } - -11. STORE IN DATABASE - └─> ddoDatabase.create(ddo) - ddoState.create(chainId, did, nftAddress, txId, valid=true) - -12. EMIT EVENT - └─> eventEmitter.emit(METADATA_CREATED, { chainId, data: ddo }) -``` - -**RPC Calls:** ~10-20 (receipt, factory, NFT info, token info, pricing) - -**⚠️ PROPOSED IMPROVEMENTS:** - -``` -┌─────────────────────────────────────────────────────────────┐ -│ METADATA_CREATED/UPDATED IMPROVEMENTS │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ 1. Replace EventEmitter with Queue System │ -│ - Use persistent queue instead of eventEmitter.emit() │ -│ - Better for testing and observability │ -│ │ -│ 2. Event-Level Retry (not deep in decryption) │ -│ - Queue-based retry with exponential backoff │ -│ - Non-blocking (doesn't halt indexer) │ -│ - Works for ALL error types (HTTP, P2P, RPC, DB) │ -│ │ -│ 3. Always Update ddo_logs Index │ -│ - Log success and failures │ -│ - Track: eventHash, txHash, blockNumber, retryCount │ -│ │ -│ 4. For Retried DDOs │ -│ - Recalculate order count from DB (not from old DDO) │ -│ - Query: SELECT COUNT(*) FROM orders WHERE did = ? │ -│ │ -└─────────────────────────────────────────────────────────────┘ -``` - ---- - -### 2. METADATA_UPDATED Event - -**Trigger:** Asset metadata is updated on-chain - -**Processor:** `MetadataEventProcessor.ts` (same as METADATA_CREATED) - -**Processing:** **Similar to METADATA_CREATED** with these differences: - -``` -1-10. Same validation and processing as METADATA_CREATED - -11. RETRIEVE EXISTING DDO - └─> existingDdo = ddoDatabase.retrieve(did) - -12. MERGE DDO DATA - └─> Merge new metadata with existing: - ├─> Update: metadata, services, credentials - ├─> Preserve: existing order counts, pricing - ├─> Merge: pricing arrays (add new, keep existing) - └─> Update: indexedMetadata.event (new tx, block, datetime) - -13. UPDATE DATABASE - └─> ddoDatabase.update(mergedDdo) - ddoState.update(chainId, did, nftAddress, txId, valid=true) - -14. EMIT EVENT - └─> eventEmitter.emit(METADATA_UPDATED, { chainId, data: ddo }) -``` - -**Key Difference:** Uses `update()` instead of `create()`, merges with existing data - -**RPC Calls:** ~10-20 - -**⚠️ PROPOSED IMPROVEMENTS:** Same as METADATA_CREATED (see above) - ---- - -### 3. METADATA_STATE Event - -**Trigger:** Asset state changes (Active → Revoked/Deprecated or vice versa) - -**Processor:** `MetadataStateEventProcessor.ts` - -**On-Chain Data:** - -- `metadataState` - New state value (0=Active, 1=End of Life, 2=Deprecated, 3=Revoked, etc.) - -**Processing Steps:** - -``` -1. DECODE EVENT DATA - └─> Extract: metadataState (integer) - -2. BUILD DID - └─> did = makeDid(nftAddress, chainId) - -3. RETRIEVE EXISTING DDO - └─> ddo = ddoDatabase.retrieve(did) - └─> If not found → log and skip - -4. CHECK STATE CHANGE - └─> Compare old state vs new state - - IF old=Active AND new=Revoked/Deprecated: - ├─> DDO becomes non-visible - ├─> Create short DDO (minimal version): - │ └─> { id, version: 'deprecated', chainId, nftAddress, - │ indexedMetadata: { nft: { state } } } - └─> Store short DDO - - ELSE: - └─> Update nft.state in existing DDO - -5. UPDATE DATABASE - └─> ddoDatabase.update(ddo) - -6. EMIT EVENT - └─> eventEmitter.emit(METADATA_STATE, { chainId, data: ddo }) -``` - -**Special Behavior:** When asset is revoked/deprecated, stores minimal DDO for potential future restoration - -**RPC Calls:** 1-2 (receipt, decode) - ---- - -### 4. ORDER_STARTED Event - -**Trigger:** Someone purchases/starts access to a data asset - -**Processor:** `OrderStartedEventProcessor.ts` - -**On-Chain Data:** - -- `consumer` - Buyer address -- `payer` - Payment source address -- `amount` - Amount paid -- `serviceId` - Service index -- `timestamp` - Order time - -**Processing Steps:** - -``` -1. DECODE EVENT DATA - └─> Extract: consumer, payer, amount, serviceIndex, timestamp - -2. FIND NFT ADDRESS - └─> datatokenContract = getDtContract(signer, event.address) - nftAddress = datatokenContract.getERC721Address() - -3. BUILD DID - └─> did = makeDid(nftAddress, chainId) - -4. RETRIEVE DDO - └─> ddo = ddoDatabase.retrieve(did) - └─> If not found → log error, skip - ⚠️ PROPOSED: Don't skip! Go to step 6 (create order), skip only 5 & 7 - - Store order as 'orphaned' in DB - - Process when DDO becomes available __> go to 6 create order store and skip only step 5, 7 - -5. UPDATE ORDER COUNT - └─> Find service in ddo.indexedMetadata.stats by datatokenAddress - └─> Increment stat.orders += 1 - -6. CREATE ORDER RECORD - └─> orderDatabase.create({ - type: 'startOrder', - timestamp, - consumer, - payer, - datatokenAddress: event.address, - nftAddress, - did, - startOrderId: txHash - }) - -7. UPDATE DDO - └─> ddoDatabase.update(ddo) - -8. EMIT EVENT - └─> eventEmitter.emit(ORDER_STARTED, { chainId, data: ddo }) - ⚠️ PROPOSED: Replace EventEmitter with queue-based system -``` - -**RPC Calls:** 1-2 (get NFT address, receipt) - -**⚠️ PROPOSED IMPROVEMENTS:** - -- Store orders even if DDO not found (orphaned orders) -- Log to `ddo_logs` index (not just ddoState) -- Add to ORDER_QUEUE for later processing - ---- - -### 5. ORDER_REUSED Event - -**Trigger:** Someone reuses an existing order for repeated access - -**Processor:** `OrderReusedEventProcessor.ts` - -**On-Chain Data:** - -- `startOrderId` - Reference to original order -- `payer` - Payment source (may differ from original) -- `timestamp` - Reuse time - -**Processing:** **Similar to ORDER_STARTED** with these differences: - -``` -1. DECODE EVENT DATA - └─> Extract: startOrderId, payer, timestamp - -2-5. Same as ORDER_STARTED (find NFT, get DDO, update count) - -6. RETRIEVE START ORDER - └─> startOrder = orderDatabase.retrieve(startOrderId) - └─> Need original order for consumer address - -7. CREATE REUSE ORDER RECORD - └─> orderDatabase.create({ - type: 'reuseOrder', - timestamp, - consumer: startOrder.consumer, // From original order - payer, // May be different - datatokenAddress: event.address, - nftAddress, - did, - startOrderId // Reference to original order - }) - -8-9. Same as ORDER_STARTED (update DDO, emit event) - ⚠️ PROPOSED: Same improvements as ORDER_STARTED -``` - -**Key Difference:** Links to original order, may have different payer - -**RPC Calls:** 1-2 - ---- - -### 6. DISPENSER_CREATED Event - -**Trigger:** New dispenser (free token distribution) is created - -**Processor:** `DispenserCreatedEventProcessor.ts` - -**On-Chain Data:** - -- `datatokenAddress` - Datatoken being dispensed -- `owner` - Dispenser owner -- `maxBalance` - Max tokens per user -- `maxTokens` - Max total tokens - -**Processing Steps:** - -``` -1. DECODE EVENT DATA - └─> Extract: datatokenAddress, owner, maxBalance, maxTokens - -2. VALIDATE DISPENSER CONTRACT - └─> isValidDispenserContract(event.address, chainId) - └─> Check if dispenser is approved by Router - └─> If not → log warning, skip - ⚠️ PROPOSED: Don't just skip! - - Log to `ddo_logs` index with error state - - Store: eventHash, txHash, blockNumber - - Create unified error handler for pricing events - - Keep all errors related to a DID in one place, --> add somethning similar to ddo state but for pricing errors and a handler - └─> maybe some logs and add all errors related to a did in a place keep one handler - └─> store in the logs the event hash and tx hash and block number - -3. FIND NFT ADDRESS - └─> datatokenContract.getERC721Address() - -4. RETRIEVE DDO - └─> ddo = ddoDatabase.retrieve(did) - └─> if not found -> check queue for ddo if found add to queue as well else skip applicable to all events - -5. ADD DISPENSER TO PRICING - └─> Find service by datatokenAddress - └─> If dispenser doesn't exist in prices: - └─> prices.push({ - type: 'dispenser', - price: '0', // Free - contract: event.address, - token: datatokenAddress - }) - -6. UPDATE DDO - └─> ddoDatabase.update(ddo) - -7. EMIT EVENT - └─> eventEmitter.emit(DISPENSER_CREATED, { chainId, data: ddo }) - ⚠️ PROPOSED: Replace EventEmitter with queue-based system -``` - -**RPC Calls:** 2-3 (receipt, validation, NFT address) - -**⚠️ PROPOSED IMPROVEMENTS:** (applies to all pricing events) - -- Log all events to `ddo_logs` index -- Handle missing DDO with queue mechanism -- Unified error handler for pricing events - ---- - -### 7. DISPENSER_ACTIVATED Event - -**Trigger:** Dispenser is activated (enables token distribution) - -**Processor:** `DispenserActivatedEventProcessor.ts` - -**Processing:** **Similar to DISPENSER_CREATED** - -``` -1-5. Same validation and processing as DISPENSER_CREATED - -Key Addition: -- Checks if dispenser already exists before adding -- If already exists → skip (no duplicate entries) -``` - -**RPC Calls:** 2-3 - ---- - -### 8. DISPENSER_DEACTIVATED Event - -**Trigger:** Dispenser is deactivated (disables token distribution) - -**Processor:** `DispenserDeactivatedEventProcessor.ts` - -**On-Chain Data:** - -- `datatokenAddress` - Datatoken address - -**Processing:** - -``` -1. DECODE EVENT DATA - └─> Extract: datatokenAddress - -2. VALIDATE & RETRIEVE DDO - └─> Same as DISPENSER_CREATED - -3. REMOVE DISPENSER FROM PRICING - └─> Find service by datatokenAddress - └─> Find dispenser entry by contract address - └─> prices = prices.filter(p => p.contract !== event.address) - -4. UPDATE DDO - └─> ddoDatabase.update(ddo) - -5. EMIT EVENT - └─> eventEmitter.emit(DISPENSER_DEACTIVATED, { chainId, data: ddo }) -``` - -**Key Difference:** Removes dispenser entry instead of adding - -**RPC Calls:** 2-3 - ---- - -### 9. EXCHANGE_CREATED Event - -**Trigger:** New fixed-rate exchange is created for a datatoken - -**Processor:** `ExchangeCreatedEventProcessor.ts` - -**On-Chain Data:** - -- `exchangeId` - Unique exchange identifier -- `datatokenAddress` - Datatoken being sold -- `baseToken` - Payment token (e.g., USDC, DAI) -- `rate` - Exchange rate - -**Processing Steps:** - -``` -1. DECODE EVENT DATA - └─> Extract: exchangeId, datatokenAddress, baseToken, rate - -2. VALIDATE EXCHANGE CONTRACT - └─> isValidFreContract(event.address, chainId) - └─> Check if exchange is approved by Router - └─> If not → log error, skip - -3. FIND NFT ADDRESS - └─> datatokenContract.getERC721Address() - -4. RETRIEVE DDO - └─> ddo = ddoDatabase.retrieve(did) - -5. ADD EXCHANGE TO PRICING - └─> Find service by datatokenAddress - └─> If exchange doesn't exist in prices: - └─> prices.push({ - type: 'exchange', - price: rate, - contract: event.address, - token: baseToken, - exchangeId - }) - -6. UPDATE DDO - └─> ddoDatabase.update(ddo) - -7. EMIT EVENT - └─> eventEmitter.emit(EXCHANGE_CREATED, { chainId, data: ddo }) -``` - -**RPC Calls:** 2-3 - ---- - -### 10. EXCHANGE_ACTIVATED Event - -**Trigger:** Fixed-rate exchange is activated - -**Processor:** `ExchangeActivatedEventProcessor.ts` - -**Processing:** **Similar to EXCHANGE_CREATED** - -``` -1-5. Same validation and processing as EXCHANGE_CREATED - -Key Addition: -- Checks if exchange already exists before adding -- If already exists → skip (no duplicate entries) -``` - -**RPC Calls:** 2-3 - ---- - -### 11. EXCHANGE_DEACTIVATED Event - -**Trigger:** Fixed-rate exchange is deactivated - -**Processor:** `ExchangeDeactivatedEventProcessor.ts` - -**On-Chain Data:** - -- `exchangeId` - Exchange identifier - -**Processing:** - -``` -1. DECODE EVENT DATA - └─> Extract: exchangeId - -2. GET EXCHANGE DETAILS - └─> freContract.getExchange(exchangeId) - └─> Extract: datatokenAddress - -3. VALIDATE & RETRIEVE DDO - └─> Same as EXCHANGE_CREATED - -4. REMOVE EXCHANGE FROM PRICING - └─> Find service by datatokenAddress - └─> Find exchange entry by exchangeId - └─> prices = prices.filter(p => p.exchangeId !== exchangeId) - -5. UPDATE DDO - └─> ddoDatabase.update(ddo) - -6. EMIT EVENT - └─> eventEmitter.emit(EXCHANGE_DEACTIVATED, { chainId, data: ddo }) -``` - -**Key Difference:** Removes exchange entry instead of adding - -**RPC Calls:** 2-3 - ---- - -### 12. EXCHANGE_RATE_CHANGED Event - -**Trigger:** Exchange rate is updated for a fixed-rate exchange - -**Processor:** `ExchangeRateChangedEventProcessor.ts` - -**On-Chain Data:** - -- `exchangeId` - Exchange identifier -- `newRate` - Updated exchange rate - -**Processing Steps:** - -``` -1. VALIDATE EXCHANGE CONTRACT - └─> isValidFreContract(event.address, chainId) - -2. DECODE EVENT DATA - └─> Extract: exchangeId, newRate - -3. GET EXCHANGE DETAILS - └─> freContract.getExchange(exchangeId) - └─> Extract: datatokenAddress - -4. RETRIEVE DDO - └─> ddo = ddoDatabase.retrieve(did) - -5. UPDATE EXCHANGE RATE - └─> Find service by datatokenAddress - └─> Find exchange entry by exchangeId - └─> price.price = newRate // Update in-place - -6. UPDATE DDO - └─> ddoDatabase.update(ddo) - -7. EMIT EVENT - └─> eventEmitter.emit(EXCHANGE_RATE_CHANGED, { chainId, data: ddo }) -``` - -**Key Difference:** Updates existing price instead of add/remove - -**RPC Calls:** 2-3 - ---- - -### Event Processing Summary - -**Metadata Events (3):** - -- METADATA_CREATED: Full validation + decryption + enrichment (~10-20 RPC calls) -- METADATA_UPDATED: Same as CREATED but merges with existing (~10-20 RPC calls) -- METADATA_STATE: Lightweight state update (~1-2 RPC calls) - -**Order Events (2):** - -- ORDER_STARTED: Update order count + create record (~1-2 RPC calls) -- ORDER_REUSED: Similar to STARTED, links to original order (~1-2 RPC calls) - -**Dispenser Events (3):** - -- DISPENSER_CREATED: Add pricing entry (~2-3 RPC calls) -- DISPENSER_ACTIVATED: Similar to CREATED (~2-3 RPC calls) -- DISPENSER_DEACTIVATED: Remove pricing entry (~2-3 RPC calls) - -**Exchange Events (4):** - -- EXCHANGE_CREATED: Add pricing entry (~2-3 RPC calls) -- EXCHANGE_ACTIVATED: Similar to CREATED (~2-3 RPC calls) -- EXCHANGE_DEACTIVATED: Remove pricing entry (~2-3 RPC calls) -- EXCHANGE_RATE_CHANGED: Update existing price (~2-3 RPC calls) - ---- - -## Error Handling & Retry Mechanisms - -### Overview: 4 Retry Layers (Current) - -The indexer has 4 different retry mechanisms at different levels: - -``` -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 1: Crawler Startup Retry │ -│ Location: OceanIndexer - retryCrawlerWithDelay() │ -│ Scope: Initial RPC/DB connection │ -│ Max Retries: 10 │ -│ Interval: max(fallbackRPCs.length * 3000, 5000) ms │ -│ Strategy: Recursive retry with fallback RPCs │ -│ Checks: Network ready + DB reachable │ -│ │ -│ ⚠️ ISSUE: Failure blocks ENTIRE NODE (all chains) │ -└──────────────────────────────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 2: Adaptive Chunk Sizing │ -│ Location: ChainIndexer - indexLoop() │ -│ Scope: RPC getLogs() failures │ -│ Max Retries: Infinite (until success or stop) │ -│ Strategy: Halve chunk size on error (min: 1 block) │ -│ Recovery: Revert to original after 3 successes │ -└──────────────────────────────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 3: Block Processing Retry │ -│ Location: ChainIndexer - indexLoop() catch block │ -│ Scope: Event processing errors │ -│ Max Retries: Infinite │ -│ Strategy: Don't update lastBlock, retry same chunk │ -│ Backoff: Sleep for interval (30s) before retry │ -│ │ -│ ⚠️ ISSUE: Indexer stuck on failed block, no progress │ -└──────────────────────────────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 4: Individual RPC Retry │ -│ Location: BaseProcessor - withRetrial() │ -│ Scope: DDO decryption HTTP calls │ -│ Max Retries: 5 │ -│ Strategy: Exponential backoff │ -│ Conditions: Only retry on ECONNREFUSED │ -│ │ -│ ⚠️ ISSUE: Only HTTP, not P2P/other errors, blocking │ -└──────────────────────────────────────────────────────────────┘ -``` - ---- - -### 🔴 PROPOSED: New Retry Architecture - -``` -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 1: Per-Chain Startup Retry (MOVED TO ChainIndexer) │ -│ Location: ChainIndexer - start() │ -│ Scope: Initial RPC/DB connection PER CHAIN │ -│ Max Retries: 10 │ -│ Interval: Progressive (3s, 6s, 9s, ... 30s max) │ -│ Strategy: Each chain retries independently │ -│ │ -│ ✅ BENEFIT: One bad RPC doesn't kill entire node │ -│ ✅ BENEFIT: Other chains continue indexing │ -└──────────────────────────────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 2: Adaptive Chunk Sizing (UNCHANGED) │ -│ Location: ChainIndexer - indexLoop() │ -│ Scope: RPC getLogs() failures │ -│ Max Retries: Infinite (until success or stop) │ -│ Strategy: Halve chunk size on error (min: 1 block) │ -│ Recovery: Revert to original after 3 successes │ -└──────────────────────────────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 3: Block Retry Queue (ENHANCED) │ -│ Location: ChainIndexer - processBlockRetryQueue() │ -│ Scope: Failed blocks │ -│ Max Retries: 5 per block │ -│ Strategy: │ -│ 1. Failed block → add to retry queue │ -│ 2. UPDATE lastIndexedBlock (move forward!) │ -│ 3. Add expiry: maxRetries & expiryDate │ -│ 4. Process retry queue separately (background) │ -│ 5. Exponential backoff per block │ -│ │ -│ ✅ BENEFIT: Indexer doesn't get stuck │ -│ ✅ BENEFIT: Failed blocks retried in background │ -│ ✅ BENEFIT: Clear failure tracking (failed_blocks table) │ -└──────────────────────────────────────────────────────────────┘ - ↓ -┌──────────────────────────────────────────────────────────────┐ -│ LAYER 4: Event-Level Retry Queue (NEW!) │ -│ Location: EventQueueProcessor (new component) │ -│ Scope: ALL event processing errors │ -│ Max Retries: 5 per event │ -│ Strategy: Queue-based with exponential backoff │ -│ - Retry 1: ~10 seconds │ -│ - Retry 2: ~1 minute │ -│ - Retry 3: ~10 minutes │ -│ - Retry 4: ~1 hour │ -│ - Retry 5: ~1 week (final) │ -│ │ -│ Retry ALL error types: │ -│ ✅ HTTP errors (decrypt service) │ -│ ✅ P2P errors (peer unreachable) │ -│ ✅ RPC errors (timeout, 500, 429) │ -│ ✅ DB errors (temp unavailable) │ -│ ✅ Validation errors (if retryable) │ -│ │ -│ ✅ BENEFIT: Non-blocking, unified retry logic │ -│ ✅ BENEFIT: Removes ECONNREFUSED-only condition │ -│ ✅ BENEFIT: Decoupled from processing logic │ -└──────────────────────────────────────────────────────────────┘ -``` - -### Layer 1: Startup Retry - -#### Current Implementation - -**Purpose:** Ensure RPC and DB are reachable before starting indexer - -**Location:** `OceanIndexer.retryCrawlerWithDelay()` - -**Code Flow:** - -```typescript -async retryCrawlerWithDelay(blockchain: Blockchain, interval = 5000) { - const retryInterval = Math.max(blockchain.getKnownRPCs().length * 3000, interval) - - // Try to connect - const result = await startCrawler(blockchain) - const dbActive = this.getDatabase() - - // Check DB reachable - if (!dbActive || !(await isReachableConnection(dbActive.getConfig().url))) { - INDEXER_LOGGER.error(`Giving up start crawling. DB is not online!`) - return false - } - - if (result) { - INDEXER_LOGGER.info('Blockchain connection successfully established!') - return true - } else { - numCrawlAttempts++ - if (numCrawlAttempts <= MAX_CRAWL_RETRIES) { - await sleep(retryInterval) - return this.retryCrawlerWithDelay(blockchain, retryInterval) // Recursive - } else { - INDEXER_LOGGER.error(`Giving up after ${MAX_CRAWL_RETRIES} retries.`) - return false - } - } -} -``` - -**Behavior:** - -- Recursive retry up to 10 times -- Increasing interval based on number of fallback RPCs -- Checks both RPC and DB connectivity -- Tries fallback RPCs if available - -**⚠️ ISSUE:** If one chain fails, **entire node stops** (all chains blocked) - ---- - -#### 🔴 PROPOSED: Move to ChainIndexer - -**New Location:** `ChainIndexer.start()` - -**Benefits:** - -- ✅ Per-chain isolation (one bad chain doesn't kill others) -- ✅ Independent retry counters per chain -- ✅ Better error visibility (which chain failed) -- ✅ Graceful degradation (continue with working chains) - -**Proposed Code:** - -```typescript -// ChainIndexer.ts -export class ChainIndexer { - private maxStartupRetries = 10 - private startupRetryCount = 0 - - async start(): Promise { - while (this.startupRetryCount < this.maxStartupRetries) { - try { - // Initialize RPC connection - await this.initializeRpcConnection() - - // Check DB connectivity - const dbActive = await this.checkDatabaseConnection() - if (!dbActive) { - throw new Error('Database not reachable') - } - - // Start indexing loop - INDEXER_LOGGER.info(`Chain ${this.blockchain.chainId} started successfully`) - await this.indexLoop() - return true - } catch (error) { - this.startupRetryCount++ - const delay = Math.min(this.startupRetryCount * 3000, 30000) - - INDEXER_LOGGER.error( - `Chain ${this.blockchain.chainId} startup failed ` + - `(attempt ${this.startupRetryCount}/${this.maxStartupRetries}), ` + - `retry in ${delay}ms: ${error.message}` - ) - - if (this.startupRetryCount < this.maxStartupRetries) { - await sleep(delay) - // Try next fallback RPC if available - this.rotateToNextRpc() - } - } - } - - // Max retries exceeded - INDEXER_LOGGER.error( - `Chain ${this.blockchain.chainId} failed after ${this.maxStartupRetries} retries` - ) - this.eventEmitter.emit('chain_startup_failed', { - chainId: this.blockchain.chainId, - error: 'Max startup retries exceeded' - }) - return false - } -} -``` - -**Migration Steps:** - -1. Move retry logic from `OceanIndexer` → `ChainIndexer` -2. Update `OceanIndexer.startThread()` to handle per-chain failures -3. Add monitoring for failed chains -4. Update tests to verify chain isolation - ---- - -### Layer 2: Adaptive Chunk Sizing - -**Purpose:** Handle RPC rate limits and transient failures - -**Code Flow:** - -```typescript -// In indexLoop() -let chunkSize = rpcDetails.chunkSize || 1 -let successfulRetrievalCount = 0 - -while (!stopSignal) { - try { - chunkEvents = await retrieveChunkEvents( - signer, - provider, - chainId, - startBlock, - blocksToProcess - ) - successfulRetrievalCount++ - } catch (error) { - // ERROR: Reduce chunk size - INDEXER_LOGGER.warn(`RPC error: ${error.message}`) - chunkSize = Math.floor(chunkSize / 2) < 1 ? 1 : Math.floor(chunkSize / 2) - successfulRetrievalCount = 0 - INDEXER_LOGGER.info(`Reduced chunk size to ${chunkSize}`) - } - - // SUCCESS: Recover after 3 successes - if (successfulRetrievalCount >= 3 && chunkSize < rpcDetails.chunkSize) { - chunkSize = rpcDetails.chunkSize - successfulRetrievalCount = 0 - INDEXER_LOGGER.info(`Reverted chunk size to ${chunkSize}`) - } -} -``` - -**Behavior:** - -- On RPC error: halve chunk size (minimum 1 block) -- After 3 consecutive successes: restore original chunk size -- No max retries (continues until successful or stopped) -- Self-healing mechanism - ---- - -### Layer 3: Block Processing Retry - -**Purpose:** Handle event processing errors without losing progress - -**Code Flow:** - -```typescript -// In indexLoop() -try { - processedBlocks = await processBlocks( - chunkEvents, - signer, - provider, - chainId, - startBlock, - blocksToProcess - ) - - // UPDATE last indexed block on success - currentBlock = await updateLastIndexedBlockNumber( - processedBlocks.lastBlock, - lastIndexedBlock - ) - - emitNewlyIndexedAssets(processedBlocks.foundEvents) -} catch (error) { - // ERROR: Don't update last block - INDEXER_LOGGER.error(`Processing failed: ${error.message}`) - successfulRetrievalCount = 0 - - // Wait before retrying same chunk - await sleep(interval) // 30 seconds - - // Next iteration will retry same chunk (lastBlock not updated) -} -``` - -**Behavior:** - -- On processing error: last indexed block NOT updated -- Next iteration retries the same block range -- Sleep interval before retry (30s default) -- No max retries (infinite until successful) -- Preserves data integrity (no gaps in indexed blocks) - -**Critical:** This ensures no events are lost even if processing fails - -**⚠️ ISSUE:** Indexer gets **stuck** on a failed block, no progress - ---- - -#### 🔴 PROPOSED: Block Retry Queue with Expiry - -**Key Changes:** - -1. **Update `lastIndexedBlock` even on failure** (move forward!) -2. Add failed block to retry queue (process separately) -3. Add expiry: maxRetries & expiryDate per block -4. Background processor for retry queue - -**Proposed Code:** - -```typescript -interface BlockRetryTask { - chainId: number - blockNumber: number - retryCount: number - maxRetries: number // Default: 5 - lastError: string - lastRetryAt: number - expiryDate: number // e.g., 1 week from first failure - events: ethers.Log[] // Events in this block -} - -// In indexLoop() -try { - processedBlocks = await processBlocks(...) - - // UPDATE last indexed block on success - currentBlock = await updateLastIndexedBlockNumber( - processedBlocks.lastBlock, - lastIndexedBlock - ) - - emitNewlyIndexedAssets(processedBlocks.foundEvents) - -} catch (error) { - INDEXER_LOGGER.error(`Processing block ${startBlock} failed: ${error.message}`) - - // NEW: Add to retry queue - await this.addBlockToRetryQueue({ - chainId: this.blockchain.chainId, - blockNumber: startBlock, - retryCount: 0, - maxRetries: 5, - lastError: error.message, - lastRetryAt: Date.now(), - expiryDate: Date.now() + (7 * 24 * 60 * 60 * 1000), // 1 week - events: chunkEvents - }) - - // NEW: Still update lastIndexedBlock (move forward!) - currentBlock = await updateLastIndexedBlockNumber( - processedBlocks?.lastBlock || startBlock, - lastIndexedBlock - ) - - // Indexer continues to next block -} - -// Background processor (separate async loop) -async processBlockRetryQueue() { - while (!this.stopSignal) { - const retryTasks = await this.getRetryTasksDue() - - for (const task of retryTasks) { - if (task.retryCount >= task.maxRetries || Date.now() > task.expiryDate) { - // Max retries or expired → move to failed_blocks table - await this.moveToFailedBlocks(task) - continue - } - - try { - // Retry processing - const processed = await processBlocks( - task.events, - this.signer, - this.provider, - task.chainId, - task.blockNumber, - 1 - ) - - // Success → remove from retry queue - await this.removeFromRetryQueue(task) - INDEXER_LOGGER.info(`Block ${task.blockNumber} retry succeeded`) - - } catch (error) { - // Failed again → update retry count with exponential backoff - task.retryCount++ - task.lastError = error.message - task.lastRetryAt = Date.now() - - // Exponential backoff: 1min, 10min, 1hr, 12hr, 1day - const backoffs = [60000, 600000, 3600000, 43200000, 86400000] - const nextRetryDelay = backoffs[task.retryCount - 1] || 86400000 - - await this.updateRetryTask(task, nextRetryDelay) - INDEXER_LOGGER.warn( - `Block ${task.blockNumber} retry ${task.retryCount}/${task.maxRetries} failed, ` + - `next retry in ${nextRetryDelay / 1000}s` - ) - } - } - - await sleep(10000) // Check every 10 seconds - } -} -``` - -**Benefits:** - -- ✅ Indexer no longer stuck on bad blocks -- ✅ Failed blocks retried in background with exponential backoff -- ✅ Clear failure tracking (`failed_blocks` table) -- ✅ Configurable retry limits -- ✅ Progress continues even with some failures - -**Migration Steps:** - -1. Add `blockRetryQueue` table to database -2. Add `failed_blocks` table for permanent failures -3. Implement `processBlockRetryQueue()` background loop -4. Update `indexLoop()` to add failures to queue -5. Add monitoring dashboard for retry queue - ---- - -### Layer 4: DDO Decryption Retry - -**Purpose:** Handle transient HTTP/network errors during DDO decryption - -**Code Flow:** - -```typescript -// In BaseProcessor - decryptDDO() -const response = await withRetrial(async () => { - const { nonce, signature } = await createSignature() - - const payload = { - transactionId: txId, - chainId, - decrypterAddress: keys.ethAddress, - dataNftAddress: contractAddress, - signature, - nonce - } - - try { - const res = await axios({ - method: 'post', - url: `${decryptorURL}/api/services/decrypt`, - data: payload, - timeout: 30000, - validateStatus: (status) => { - return (status >= 200 && status < 300) || status === 400 || status === 403 - } - }) - - if (res.status === 400 || res.status === 403) { - // Don't retry client errors - return res - } - - if (res.status !== 200 && res.status !== 201) { - // Retry 5XX errors - throw new Error(`bProvider exception: ${res.status}`) - } - - return res - } catch (err) { - // Only retry on connection refused - if (err.code === 'ECONNREFUSED' || err.message.includes('ECONNREFUSED')) { - INDEXER_LOGGER.error(`Decrypt failed with ECONNREFUSED, retrying...`) - throw err // Will be retried by withRetrial - } - throw err // Other errors not retried - } -}) -``` - -**withRetrial Implementation:** - -```typescript -// Max 5 retries with exponential backoff -async function withRetrial(fn: () => Promise, maxRetries = 5): Promise { - for (let i = 0; i < maxRetries; i++) { - try { - return await fn() - } catch (error) { - if (i === maxRetries - 1) throw error - await sleep(Math.pow(2, i) * 1000) // Exponential backoff - } - } -} -``` - -**Behavior:** - -- Max 5 retries with exponential backoff (1s, 2s, 4s, 8s, 16s) -- Only retries ECONNREFUSED errors (connection issues) -- Does NOT retry 400/403 (client errors) -- Retries 5XX errors (server errors) -- 30-second timeout per attempt - -**⚠️ ISSUES:** - -1. **Only retries ECONNREFUSED** (not P2P, timeouts, 429, etc.) -2. **Blocking** (stops processing during retries) -3. **Embedded in decryption logic** (not reusable) -4. **Short retry window** (16s total, not enough for service outages) - ---- - -### 🔴 PROPOSED: Layer 4 - Event-Level Retry Queue (NEW!) - -**Purpose:** Unified, non-blocking retry for ALL event processing errors - -**Key Concept:** Move retry logic OUT of event processors and INTO a queue-based system - -#### Architecture - -``` -┌──────────────────────────────────────────────────────────────┐ -│ EVENT PROCESSING FLOW │ -├──────────────────────────────────────────────────────────────┤ -│ │ -│ Blockchain Event Detected │ -│ ↓ │ -│ Add to EVENT_QUEUE │ -│ ↓ │ -│ EventQueueProcessor (async worker pool) │ -│ ├─ SUCCESS → Log to ddo_logs (status: success) │ -│ │ Update DB │ -│ │ Remove from queue │ -│ │ │ -│ └─ FAILURE → Log to ddo_logs (status: failed) │ -│ Classify error (retryable?) │ -│ Add to EVENT_RETRY_QUEUE │ -│ │ -│ EventRetryProcessor (background loop) │ -│ ├─ Get tasks due for retry │ -│ ├─ Check: retryCount < maxRetries │ -│ ├─ Check: Date.now() < expiryDate │ -│ ├─ Retry event processing │ -│ ├─ SUCCESS → Remove from retry queue │ -│ └─ FAILURE → Increment retryCount │ -│ Update nextRetryAt (exponential backoff) │ -│ If maxRetries → Move to dead_letter │ -└──────────────────────────────────────────────────────────────┘ -``` - -#### Data Structures - -```typescript -interface EventQueueTask { - id: string // UUID - chainId: number - eventType: string // METADATA_CREATED, ORDER_STARTED, etc. - eventHash: string - txHash: string - blockNumber: number - eventData: any // Raw event data - createdAt: number - status: 'pending' | 'processing' | 'success' | 'failed' -} - -interface EventRetryTask { - id: string - chainId: number - did?: string // If known - eventType: string - eventHash: string - txHash: string - blockNumber: number - eventData: any - retryCount: number - maxRetries: number // Default: 5 - lastError: string - errorType: ErrorType - createdAt: number - lastRetryAt: number - nextRetryAt: number // Exponential backoff - expiryDate: number // e.g., 1 week from creation -} - -enum ErrorType { - HTTP_ERROR = 'http_error', // Decrypt service down - P2P_ERROR = 'p2p_error', // Peer unreachable - RPC_ERROR = 'rpc_error', // RPC timeout, 429 - DB_ERROR = 'db_error', // Database temp unavailable - VALIDATION_ERROR = 'validation_error', // Factory check, etc. - NON_RETRYABLE = 'non_retryable' // Don't retry -} -``` - -#### Implementation - -```typescript -export class EventQueueProcessor { - private eventQueue: Queue - private retryQueue: Queue - private workerPool: number = 5 // Concurrent workers - - async start() { - // Start worker pool for new events - for (let i = 0; i < this.workerPool; i++) { - this.startWorker(i) - } - - // Start retry processor (background) - this.startRetryProcessor() - } - - private async startWorker(workerId: number) { - while (!this.stopSignal) { - const task = await this.eventQueue.dequeue() - if (!task) { - await sleep(100) - continue - } - - try { - // Update status - task.status = 'processing' - - // Get event processor - const processor = getEventProcessor(task.eventType, task.chainId) - - // Process event (no retry logic inside!) - const result = await processor.processEvent( - task.eventData, - task.chainId, - this.signer, - this.provider, - task.eventType - ) - - // Success - task.status = 'success' - await this.logToDdoLogs(task, 'success', null, result?.did) - - INDEXER_LOGGER.info( - `Worker ${workerId}: Processed ${task.eventType} tx ${task.txHash}` - ) - } catch (error) { - // Failure - task.status = 'failed' - const errorType = this.classifyError(error) - - await this.logToDdoLogs(task, 'failed', error.message, task.eventData.did) - - if (this.isRetryable(errorType)) { - // Add to retry queue - await this.addToRetryQueue(task, error, errorType) - - INDEXER_LOGGER.warn( - `Worker ${workerId}: ${task.eventType} failed (retryable), ` + - `added to retry queue: ${error.message}` - ) - } else { - INDEXER_LOGGER.error( - `Worker ${workerId}: ${task.eventType} failed (non-retryable): ` + - error.message - ) - } - } - } - } - - private async startRetryProcessor() { - while (!this.stopSignal) { - try { - const dueRetries = await this.getRetryTasksDue() - - for (const retryTask of dueRetries) { - // Check expiry - if (Date.now() > retryTask.expiryDate) { - await this.moveToDeadLetter(retryTask, 'Expired') - continue - } - - // Check max retries - if (retryTask.retryCount >= retryTask.maxRetries) { - await this.moveToDeadLetter(retryTask, 'Max retries exceeded') - continue - } - - try { - // Retry processing - const processor = getEventProcessor(retryTask.eventType, retryTask.chainId) - const result = await processor.processEvent( - retryTask.eventData, - retryTask.chainId, - this.signer, - this.provider, - retryTask.eventType - ) - - // Success! - await this.removeFromRetryQueue(retryTask) - await this.logToDdoLogs(retryTask, 'success', null, result?.did) - - INDEXER_LOGGER.info( - `Retry succeeded: ${retryTask.eventType} tx ${retryTask.txHash} ` + - `(attempt ${retryTask.retryCount + 1})` - ) - } catch (error) { - // Failed again - retryTask.retryCount++ - retryTask.lastError = error.message - retryTask.lastRetryAt = Date.now() - - // Exponential backoff: 10s, 1min, 10min, 1hr, 1 week - const backoffs = [10000, 60000, 600000, 3600000, 604800000] - const nextDelay = backoffs[retryTask.retryCount - 1] || 604800000 - retryTask.nextRetryAt = Date.now() + nextDelay - - await this.updateRetryTask(retryTask) - await this.logToDdoLogs(retryTask, 'retrying', error.message, retryTask.did) - - INDEXER_LOGGER.warn( - `Retry failed: ${retryTask.eventType} tx ${retryTask.txHash} ` + - `(attempt ${retryTask.retryCount}/${retryTask.maxRetries}), ` + - `next retry in ${nextDelay / 1000}s` - ) - } - } - } catch (error) { - INDEXER_LOGGER.error(`RetryProcessor error: ${error.message}`) - } - - await sleep(10000) // Check every 10 seconds - } - } - - private classifyError(error: Error): ErrorType { - const msg = error.message.toLowerCase() - const code = (error as any).code - - // HTTP errors (decrypt service) - if (code === 'ECONNREFUSED' || msg.includes('econnrefused')) { - return ErrorType.HTTP_ERROR - } - if (code === 'ETIMEDOUT' || msg.includes('timeout')) { - return ErrorType.HTTP_ERROR - } - if (msg.includes('429') || msg.includes('rate limit')) { - return ErrorType.RPC_ERROR - } - - // P2P errors - if (msg.includes('p2p') || msg.includes('peer')) { - return ErrorType.P2P_ERROR - } - - // RPC errors - if (msg.includes('rpc') || msg.includes('provider')) { - return ErrorType.RPC_ERROR - } - - // DB errors - if (msg.includes('database') || msg.includes('elasticsearch')) { - return ErrorType.DB_ERROR - } - - // Validation errors (usually non-retryable) - if (msg.includes('factory') || msg.includes('validation')) { - return ErrorType.NON_RETRYABLE - } - - // Default: retryable - return ErrorType.HTTP_ERROR - } - - private isRetryable(errorType: ErrorType): boolean { - return errorType !== ErrorType.NON_RETRYABLE - } - - private async logToDdoLogs( - task: EventQueueTask | EventRetryTask, - status: string, - error: string | null, - did?: string - ) { - const { ddoLogs } = await getDatabase() - await ddoLogs.create({ - did: did || 'unknown', - chainId: task.chainId, - eventType: task.eventType, - eventHash: task.eventHash, - txHash: task.txHash, - blockNumber: task.blockNumber, - status, - error, - retryCount: 'retryCount' in task ? task.retryCount : 0, - timestamp: Date.now() - }) - } -} -``` - -#### Benefits - -**✅ Unified Retry Logic** - -- All 12 event types use same retry mechanism -- No more scattered retry code in processors -- Easier to maintain and test - -**✅ Non-Blocking** - -- Indexer continues processing new events -- Retries happen in background -- No performance impact on main indexing loop - -**✅ Retry ALL Error Types** - -- HTTP errors (decrypt service down) -- P2P errors (peer unreachable) -- RPC errors (timeout, 429 rate limit) -- DB errors (temp unavailable) -- Removes ECONNREFUSED-only limitation - -**✅ Exponential Backoff with Long Window** - -- 10s → 1min → 10min → 1hr → 1 week -- Handles long service outages -- Configurable per error type - -**✅ Full Observability** - -- All events logged to `ddo_logs` -- Track retry count, error messages -- Dead letter queue for permanent failures -- Monitoring dashboard for queue depth - -**✅ Decoupled from Event Logic** - -- Event processors just process, no retry code -- Queue handles all retry complexity -- Testable in isolation - -#### Migration Steps - -1. Create `event_queue` table -2. Create `event_retry_queue` table -3. Create `ddo_logs` index (all events, not just metadata) -4. Create `dead_letter_queue` table -5. Implement `EventQueueProcessor` class -6. Update all event processors to remove retry logic -7. Update `ChainIndexer` to enqueue events (not emit) -8. Replace `EventEmitter` with queue system -9. Add monitoring dashboard -10. Update tests - ---- - -### Error Handling Issues (Current) - -**Current Problems:** - -1. **No Centralized Strategy:** - - - 4 different retry mechanisms - - No coordination between layers - - Unclear which mechanism applies when - -2. **Silent Failures:** - - - Events skipped with `continue` statement - - No error tracking or metrics - - Difficult to diagnose missing events - -3. **No Circuit Breaker:** - - - Continues retrying failed RPCs indefinitely - - Can cause cascade failures - - No health status tracking - -4. **Infinite Retries:** - - - Layer 2 and 3 have no max retries - - Can get stuck on persistent errors - - No timeout mechanism - -5. **No Error Classification:** - - All processing errors treated equally - - No distinction between retryable and permanent errors - - Bad events can block entire chunk - ---- - -## Failure Scenarios & Recovery - -### Scenario 1: RPC Provider Fails - -**Current Behavior:** - -``` -1. retrieveChunkEvents() throws error -2. Caught in indexLoop() -3. Adaptive chunk sizing triggered: - - chunkSize = floor(chunkSize / 2) - - Minimum: 1 block -4. Next iteration retries with smaller chunk -5. If all fallback RPCs fail during startup: - - retryCrawlerWithDelay() retries up to 10 times - - After max retries → ChainIndexer not started -``` - -**Recovery:** - -- Self-healing via chunk size reduction -- Fallback RPC support (tries alternatives) -- Manual restart required if startup fails after 10 retries - -**Issues:** - -- No RPC health tracking -- No circuit breaker (keeps retrying forever after startup) -- Can get very slow (chunk size = 1) - ---- - -### Scenario 2: Database Unavailable - -**Current Behavior:** - -``` -1. DB operation fails (read or write) -2. Error thrown and caught in indexLoop() -3. Last indexed block NOT updated -4. Sleep for interval (30s) -5. Next iteration retries same chunk -6. Repeats indefinitely until DB available -``` - -**Recovery:** - -- Automatic retry (infinite) -- Data integrity preserved (no gaps) -- No manual intervention needed (if DB comes back) - -**Issues:** - -- No DB health check -- No timeout (infinite retry) -- Can process events but not store them (wasted work) -- No notification that DB is down - ---- - -### Scenario 3: Processing Error in Event Handler - -**Current Behavior:** - -``` -1. processor.processEvent() throws error -2. Caught in processBlocks() -3. Error re-thrown -4. Caught in indexLoop() -5. Last indexed block NOT updated -6. Sleep for interval -7. Next iteration retries same chunk -``` - -**Recovery:** - -- Retry same chunk indefinitely -- No max retries -- Eventually succeeds if error is transient - -**Issues:** - -- Bad event data can block entire chunk -- No skip mechanism for permanently bad events -- No event-level error handling -- All events in chunk must succeed - -**Example:** If chunk has 100 events and event #50 is corrupted, the entire chunk retries forever. - ---- - -### Scenario 4: DDO Decryption Fails - -**Current Behavior:** - -``` -1. decryptDDO() throws error after 5 retries -2. Error caught in processEvent() -3. Event skipped -4. ddoState updated with error message -5. Processing continues with next event -``` - -**Recovery:** - -- Event marked as invalid in ddoState -- Other events in chunk processed normally -- No retry (event permanently skipped) - -**Issues:** - -- Event lost (not retried later) -- No notification mechanism -- Needs manual intervention (reindex tx) - ---- - -### Scenario 5: Validation Failure - -**Current Behavior:** - -``` -1. Validation fails (e.g., not from Ocean Factory) -2. `continue` statement executed -3. Event silently skipped -4. No database update -5. Processing continues with next event -``` - -**Recovery:** - -- No recovery (by design) -- Event intentionally ignored - -**Issues:** - -- Silent failures (no logging at error level) -- No metrics on skipped events -- Difficult to diagnose why events are missing - ---- - -## Summary - -### Event Monitoring Characteristics - -**Monitoring:** - -- Continuous polling every 30 seconds -- Processes 1-1000 blocks per iteration (adaptive) -- Filter-based event retrieval (12 event types) -- Per-chain monitoring (concurrent via async/await) - -**Processing:** - -- Sequential within chunk (maintains order) -- Multi-layer validation (factory → metadata → publisher) -- Complex DDO decryption (3 strategies: HTTP, P2P, local) -- Rich metadata enrichment (10-20 RPC calls per metadata event) - -**Performance:** - -- ~10-20 RPC calls per metadata event -- ~1-2 RPC calls per order/pricing event -- No batching (events processed one at a time) -- No parallelization within chunk - -### Error Handling Characteristics - -**Retry Mechanisms:** - -- Layer 1: Startup (10 retries, recursive, checks DB) -- Layer 2: Adaptive chunk sizing (infinite, self-healing) -- Layer 3: Block processing (infinite, preserves integrity) -- Layer 4: DDO decryption (5 retries, exponential backoff) - -**Issues:** - -- No centralized retry strategy -- No circuit breaker pattern -- Silent failures on validation -- Infinite retries can cause hangs -- No error classification -- No metrics/observability - -### Key Improvement Opportunities - -**Event Monitoring:** - -- Implement batch RPC calls -- Parallelize event processing (where safe) -- Add event prioritization -- Implement event queue - -**Error Handling:** - -- Centralize retry logic -- Add circuit breaker pattern -- Implement timeout mechanisms -- Add error classification (retryable vs permanent) -- Skip mechanism for bad events -- Metrics and alerting - -**Observability:** - -- Track events processed/skipped/failed -- Monitor RPC health per provider -- Track processing latency -- Alert on persistent failures - ---- - -**Document Version:** 2.0 -**Last Updated:** January 27, 2026 -**Status:** Focused on Event Monitoring & Error Handling -**Word Count:** ~4,500 words (reduced from 12,000+) diff --git a/docs/IndexerRefactorStrategy.md b/docs/IndexerRefactorStrategy.md new file mode 100644 index 000000000..641500183 --- /dev/null +++ b/docs/IndexerRefactorStrategy.md @@ -0,0 +1,328 @@ +# Ocean Node Indexer - Event Monitoring & Error Handling + +## Table of Contents + +1. [Overview](#overview) +2. [🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)](#-proposed-improvements-post-meeting-changes) + +--- + +## Overview + +### Current Indexer Architecture + +The Ocean Node Indexer is built with the following design principles (see [Architecture.md](./Arhitecture.md) for details): + +- **Single-threaded, non-blocking design**: Uses Node.js async/await for concurrent execution across multiple chains +- **ChainIndexer instances**: Each blockchain network is monitored by a dedicated ChainIndexer instance running concurrently via the event loop +- **Event-driven communication**: Components communicate through EventEmitter for clean separation of concerns +- **Efficient I/O handling**: All RPC calls, database operations, and network requests are non-blocking, allowing high concurrency without worker threads + +### Proposed Architecture Evolution + +The refactoring strategy below maintains the core single-threaded, non-blocking architecture while introducing key improvements: + +1. **EventEmitter → Persistent Queues**: Replace synchronous EventEmitter with persistent queue system for better reliability and observability +2. **Event-level retry**: Move retry logic from embedded operations to event-level processing +3. **Enhanced error tracking**: Introduce comprehensive logging via `ddo_logs` index +4. **Per-chain resilience**: Isolate chain failures to prevent cascading issues + +These changes preserve the efficient I/O model and concurrent ChainIndexer execution while adding production-grade error handling and monitoring. + +--- + +## 🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes) + +> **Status:** Draft proposals from Jan 27, 2026 meeting +> **Goal:** Improve reliability, decoupling, and error handling + +### 1. 🎯 EVENT-LEVEL RETRY MECHANISM WITH QUEUES + +**Current Issue:** Retry logic is deeply embedded in event processing steps (e.g., inside DDO decryption) + +**Proposed Change:** + +- **Move retry logic to event level** (not deep inside processing steps) +- **Implement queue-based retry system** for all 12 event types +- **Decouple retry from specific operations** (e.g., decrypt, p2p, HTTP) + +**Implementation:** + +``` +┌─────────────────────────────────────────────────────────────┐ +│ EVENT ERROR PROCESSING QUEUE │ +├─────────────────────────────────────────────────────────────┤ +│ │ +│ Event Detected │ +│ ↓ │ +│ Send to Processor │ +│ ↓ │ +│ Process Event │ +│ ├─ Success → Mark complete, update DB │ +│ └─ Failure → Add to Retry Queue with backoff │ +│ │ +│ Retry Queue (exponential backoff): │ +│ - Retry 1: ~10 seconds │ +│ - Retry 2: ~1 minute │ +│ - Retry 3: ~10 minutes │ +│ - Retry 4: ~1 hour │ +│ - Retry 5: ~1 week (final attempt) │ +│ │ +│ Benefits: │ +│ ✓ Non-blocking (doesn't halt chain indexing) │ +│ ✓ Works for ALL error types (HTTP, P2P, RPC, DB) │ +│ ✓ Visible retry state in monitoring │ +└─────────────────────────────────────────────────────────────┘ +``` + +**Applies to:** All event processors, especially METADATA_CREATED/UPDATED + +--- + +### 2. 🗄️ NEW DATABASE INDEX: `ddo_logs` + +**Current Issue:** + +- `ddoState` only tracks metadata events +- Order and pricing events have no error tracking +- No unified view of all DDO-related events + +**Proposed Change:** + +- Create new DB index: **`ddo_logs`** +- Store **all events** related to a DID (metadata, orders, pricing) +- Similar structure to `ddoState` but broader scope +- **Add handler and routes (HTTP + P2P)** to query all information about a DID, transaction, or event + - Similar to existing `ddo-state` handler but for comprehensive logs + - Enable querying by: `did`, `txHash`, `blockNumber`, `eventType` + - Support both HTTP API endpoints and P2P protocol for distributed querying + +**Schema:** + +```typescript +interface DdoLog { + did: string // Indexed + chainId: number // Indexed + eventType: string // METADATA_CREATED, ORDER_STARTED, etc. + eventHash: string // Event signature hash + txHash: string // Transaction hash + blockNumber: number // Block number + timestamp: number // Event timestamp + status: 'success' | 'failed' | 'retrying' + error?: string // Error message if failed + retryCount: number // Number of retry attempts default 0 + lastRetry?: number // Timestamp of last retry + metadata?: Record // Event-specific data +} +``` + +**Benefits:** + +- Single source of truth for all DDO events +- Easier debugging (see all events for a DID) +- Track pricing/order event errors (not just metadata) +- Audit trail for compliance + +--- + +### 3. 🔄 REPLACE EventEmitter WITH QUEUES + +**Current Issue:** + +- Using `EventEmitter` for communication +- Synchronous, blocking behavior +- No retry/replay capability +- Difficult to test + +**Proposed Change:** + +- Replace `EventEmitter` with **persistent queue system** +- Use queue for: + - ✓ Newly indexed assets (instead of `eventEmitter.emit()`) + - ✓ Reindex requests (block & transaction level) + - ✓ Admin commands + +**Queue Types:** + +``` +1. EVENT_PROCESSING_QUEUE (primary) + - New events from blockchain + - Priority: FIFO with retry backoff + +2. REINDEX_QUEUE (existing, enhance) + - Block-level reindex + - Transaction-level reindex + - Priority: Admin requests > Auto-retry + +3. ORDER_QUEUE (new) + - Store orders even if DDO not found + - Process when DDO becomes available +``` + +**Benefits:** + +- Testable (can inject mock queue) +- Observable (queue depth, retry counts) +- Resilient (survives crashes) +- Decoupled (no tight coupling between components) + +--- + +### 4. 📦 HANDLE MISSING DDO IN ORDER/PRICING EVENTS + +**Current Issue:** + +- If DDO not found → skip order/pricing event +- Lost data if DDO indexed later + +**Proposed Change:** + +**Unified Queue-Based Approach for Both Orders and Pricing Events:** + +``` +IF DDO not found: + 1. Check if DDO exists in database + 2. If not found → add event to pending queue + 3. Store event in ddo_logs with status: 'pending_ddo' + 4. Link event to DID for future reconciliation + 5. When DDO is successfully indexed: + → Process all pending events for that DID (orders + pricing) + → Update event status from 'pending_ddo' to 'success' or 'failed' + → Maintain event order based on blockNumber and logIndex +``` + +**Queue Structure:** + +```typescript +interface PendingEvent { + did: string + eventType: string // ORDER_STARTED, ORDER_REUSED, DISPENSER_*, EXCHANGE_* + chainId: number + txHash: string + blockNumber: number + timestamp: number + retryCount: number + queuedAt: number +} +``` + +**Benefits:** + +- **Consistent approach** for all event types (orders + pricing) +- **No data loss** - all events queued and processed eventually +- **Maintains event order** using blockNumber and logIndex +- **Automatic reconciliation** when DDO becomes available +- **Better observability** - track pending events per DID +- **Prevents orphaned records** - only create records when DDO exists + +--- + +### 5. 🚫 MOVE RETRY LOGIC TO ChainIndexer (Block Only That Chain) + +**Current Issue:** + +- Crawler startup retry in `OceanIndexer` +- Failure blocks **entire node** (all chains) + +**Proposed Change:** + +- Move `retryCrawlerWithDelay()` → **ChainIndexer** +- Each chain fails independently +- Other chains continue indexing + +**Benefits:** + +- Resilient multi-chain indexing +- One bad RPC doesn't kill everything +- Easier debugging (per-chain logs) + +--- + +### 6. 📍 BLOCK RETRY QUEUE IMPROVEMENTS + +**Current Issue:** + +- Failed block retried, but `lastIndexedBlock` not updated +- Same block retried indefinitely +- No expiry/max retry limit + +**Proposed Change:** + +``` +When block added to retry queue: + 1. Update lastIndexedBlock (move forward) + 2. Add block to retry queue with metadata: + - blockNumber + - retryCount (starts at 0) + - maxRetries (default: 5) + - lastError + - expiryDate (when to give up) + 3. Process retry queue separately (exponential backoff) + 4. If maxRetries exceeded → log to failed_blocks table +``` + +**Retry Queue Schema:** + +```typescript +interface BlockRetryTask { + chainId: number + blockNumber: number + retryCount: number + maxRetries: number + lastError: string + lastRetryAt: number +} +``` + +**Benefits:** + +- Indexer moves forward (doesn't get stuck) +- Failed blocks retried in background +- Clear failure tracking + +--- + +### 7. 🌐 REMOVE ECONNREFUSED-ONLY CONDITION + +**Current Issue:** + +- Retry only on `ECONNREFUSED` error +- Other errors (timeout, 500, p2p failures) not retried + +**Proposed Change:** + +- With event-level retry, **retry ALL error types**: + - ✓ RPC errors (timeout, 500, 429 rate limit) + - ✓ HTTP errors (decrypt service down) + - ✓ P2P errors (peer unreachable) + - ✓ Database errors (temp unavailable) + - ✓ Validation errors (maybe retryable) + +--- + +### 8. ✅ UPDATE TESTS + +**Required Test Updates:** + +- Remove tests checking `EventEmitter` behavior +- Add tests for queue-based processing +- Add tests for retry with exponential backoff +- Add tests for orphaned orders +- Add tests for per-chain failure isolation +- Add tests for `ddo_logs` index +- Add tests for block retry with expiry + +--- + +### Summary Table + +| # | Change | Current Pain | Benefit | Effort | Priority | +| --- | --------------------------------------------- | --------------------------------- | ------------------------------------ | ------ | ----------- | +| 1 | Event-level retry + queues | Retry logic scattered, blocking | Unified, non-blocking, testable | High | 🔴 Critical | +| 2 | `ddo_logs` DB index | No order/pricing error tracking | Full audit trail, debugging | Medium | 🟡 High | +| 3 | Replace EventEmitter with queues | Blocking, not testable, no replay | Observable, resilient, testable | High | 🔴 Critical | +| 4 | Handle missing DDO (orphaned orders) | Lost orders/pricing data | No data loss, reconciliation | Medium | 🟡 High | +| 5 | Per-chain startup retry (ChainIndexer) | One failure kills entire node | Isolated failures, resilient | Low | 🔴 Critical | +| 6 | Block retry queue with expiry | Indexer stuck on bad blocks | Progress continues, background retry | Medium | 🟡 High | +| 7 | Retry ALL error types (not just ECONNREFUSED) | P2P/timeout/429 not retried | Comprehensive error handling | Low | 🟡 High | +| 8 | Update tests | Tests assume old architecture | Tests match new architecture | Medium | 🟢 Medium |