
Conversation

@nischitpra nischitpra commented Dec 11, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced reorganization detection system with multi-source validation across block headers, transactions, and logs for improved data consistency.
    • Improved block tracking during network reorganizations with better error handling.
  • Improvements

    • Simplified reorg handling workflow and messaging.
    • Enhanced logging for better visibility during reorg operations.



coderabbitai bot commented Dec 11, 2025

Walkthrough

Enhanced reorg detection with multi-source validation combining block headers, transaction counts, and logs data. Added ClickHouse query functions to detect transaction and log mismatches. Simplified Kafka publisher by removing revert pathway. Updated reorg handler to return last valid block.

Changes

Reorg detection multi-source validation — internal/committer/reorg.go, internal/libs/clickhouse.go
Modified the detectAndHandleReorgs signature to return (int64, error) instead of error. Introduced multi-stage reorg detection querying headers, transactions, and logs, consolidating conflicting ranges into a final [start, end] window. Added GetTransactionMismatchRangeFromClickHouseV2 and GetLogsMismatchRangeFromClickHouseV2 functions, plus blockTxAggregate and blockLogAggregate types. Removed in-function validations for transaction counts and logs bloom consistency. Enhanced RunReorgValidator logging and persistence of the last valid block.

Kafka publisher simplification — internal/storage/kafka_publisher.go
Removed the public PublishReorg method and the internal publishBlockRevert helper, eliminating the revert-and-republish orchestration. Updated PublishBlockDataReorg to publish old/new data without a revert step. Enhanced diagnostic logging with block metadata (chain_id, block_number, counts, is_deleted, is_reorg). Removed the createBlockRevertMessage function; message creation is now consolidated through createBlockDataMessage.

Sequence Diagram(s)

sequenceDiagram
    participant RV as RunReorgValidator
    participant DAH as detectAndHandleReorgs
    participant CH as ClickHouse
    participant HR as handleReorgForRange
    participant KP as KafkaPublisher
    
    RV->>DAH: call with startBlock, endBlock
    
    rect rgb(200, 220, 255)
    Note over DAH,CH: Multi-source mismatch detection
    DAH->>CH: query block headers
    CH-->>DAH: reorgStartBlock, reorgEndBlock
    
    DAH->>CH: GetTransactionMismatchRange
    CH-->>DAH: txStart, txEnd (or -1, -1)
    
    DAH->>CH: GetLogsMismatchRange
    CH-->>DAH: logsStart, logsEnd (or -1, -1)
    end
    
    rect rgb(220, 255, 220)
    Note over DAH: Consolidate ranges
    DAH->>DAH: merge header, tx, logs ranges<br/>finalStart = min, finalEnd = max
    end
    
    alt Final range exists
        DAH->>HR: handleReorgForRange(finalStart, finalEnd)
        HR->>KP: PublishBlockDataReorg
        KP-->>HR: success/error
        HR-->>DAH: error or nil
        DAH->>CH: persist lastValidBlock = finalEnd
    else No mismatches
        DAH->>CH: persist lastValidBlock = lastHeaderBlock
    end
    
    DAH-->>RV: return lastValidBlock, error
    RV->>RV: advance lastBlockCheck

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Signature change and control flow: detectAndHandleReorgs now returns (int64, error) and implements multi-stage detection with consolidation logic that requires careful verification of merge semantics and error propagation paths.
  • New query functions with duplication: GetTransactionMismatchRangeFromClickHouseV2 and GetLogsMismatchRangeFromClickHouseV2 are noted as duplicated; review should identify whether duplication is intentional and assess both implementations for correctness.
  • Range consolidation logic: The merging of three independent mismatch ranges (headers, transactions, logs) into a final [start, end] window needs careful review to ensure min/max semantics match intended behavior (see the sketch after this list).
  • Persistence changes: Updates to how lastValidBlock is persisted and the conditions under which it changes require verification against the previous always-update-to-reorgEndBlock behavior.
  • Method removal in Kafka publisher: Removal of PublishReorg and publishBlockRevert eliminates a pathway; verify that the two-step reorg flow in PublishBlockDataReorg adequately replaces the removed orchestration.
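
A minimal Go sketch of the min/max merge described above may help reviewers; it treats -1 as the "no mismatch" sentinel per the walkthrough, with all names invented for illustration rather than taken from the PR:

package main

import "fmt"

// mergeRanges folds several [start, end] mismatch ranges into one window.
// A range whose start is -1 is treated as "no mismatch found".
func mergeRanges(ranges ...[2]int64) (int64, int64) {
	finalStart, finalEnd := int64(-1), int64(-1)
	for _, r := range ranges {
		if r[0] == -1 {
			continue // empty range, nothing to merge
		}
		if finalStart == -1 || r[0] < finalStart {
			finalStart = r[0]
		}
		if finalEnd == -1 || r[1] > finalEnd {
			finalEnd = r[1]
		}
	}
	return finalStart, finalEnd
}

func main() {
	// headers [105, 110], no tx mismatch, logs [108, 120] → final [105, 120]
	start, end := mergeRanges([2]int64{105, 110}, [2]int64{-1, -1}, [2]int64{108, 120})
	fmt.Println(start, end) // prints: 105 120
}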

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 30.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately captures the main enhancement, adding transaction and logs count validation to the reorg detection logic.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch np/reorg_fix

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/committer/reorg.go (1)

111-138: Potential nil pointer dereference when iterating block headers.

GetBlockHeadersForReorgCheck returns a pre-allocated slice where missing blocks are nil. The iteration at lines 111-131 and the access at line 134 will panic if any block header is missing.

Add nil checks before accessing block header fields:

+	// Find last non-nil block for fallback
+	var lastHeaderBlock int64 = -1
+	for i := len(blockHeaders) - 1; i >= 0; i-- {
+		if blockHeaders[i] != nil {
+			lastHeaderBlock = blockHeaders[i].Number.Int64()
+			break
+		}
+	}
+	if lastHeaderBlock == -1 {
+		log.Debug().Msg("detectAndHandleReorgs: No valid block headers found")
+		return nil
+	}
+
 	// 1) Block verification: find reorg range from header continuity (existing behavior)
 	reorgStartBlock := int64(-1)
 	reorgEndBlock := int64(-1)
 	for i := 1; i < len(blockHeaders); i++ {
+		if blockHeaders[i] == nil || blockHeaders[i-1] == nil {
+			// Missing block indicates data inconsistency - mark as reorg
+			if reorgStartBlock == -1 && blockHeaders[i-1] != nil {
+				reorgStartBlock = blockHeaders[i-1].Number.Int64()
+			} else if reorgStartBlock == -1 {
+				reorgStartBlock = int64(startBlock) + int64(i) - 1
+			}
+			continue
+		}
 		if blockHeaders[i].Number.Int64() != blockHeaders[i-1].Number.Int64()+1 {
🧹 Nitpick comments (1)
internal/committer/reorg.go (1)

140-151: Consider caching block data to avoid redundant ClickHouse queries.

The block data is loaded three times for the same range:

  1. GetBlockHeadersForReorgCheck (line 98)
  2. GetTransactionMismatchRangeFromClickHouseV2 (via getBlocksFromV2)
  3. GetLogsMismatchRangeFromClickHouseV2 (via getBlocksFromV2)

Consider refactoring to load blocks once and pass them to the mismatch detection functions, reducing database load.
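
A hedged sketch of that refactor: load the range once and make each check a pure function over the preloaded data. Block, txCounts, and txMismatchRange are illustrative stand-ins, not the repository's actual types or signatures:

package libs

// Block is a stand-in holding only the header fields the checks need.
type Block struct {
	Number           int64
	TransactionCount uint64
}

// txMismatchRange compares stored per-block transaction counts (txCounts)
// against the counts promised by the headers, returning the [start, end]
// window of mismatching blocks, or (-1, -1) when everything agrees.
func txMismatchRange(blocks []Block, txCounts map[int64]uint64) (int64, int64) {
	start, end := int64(-1), int64(-1)
	for _, b := range blocks {
		if txCounts[b.Number] == b.TransactionCount {
			continue
		}
		if start == -1 {
			start = b.Number
		}
		end = b.Number
	}
	return start, end
}

An analogous logsMismatchRange over the same slice would then replace the second internal getBlocksFromV2 load.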

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 19296e1 and 2a808a8.

📒 Files selected for processing (2)
  • internal/committer/reorg.go (2 hunks)
  • internal/libs/clickhouse.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/committer/reorg.go (2)
internal/libs/clickhouse.go (2)
  • GetTransactionMismatchRangeFromClickHouseV2 (271-339)
  • GetLogsMismatchRangeFromClickHouseV2 (348-421)
internal/libs/redis.go (1)
  • SetReorgLastValidBlock (50-52)
internal/libs/clickhouse.go (1)
internal/libs/constants.go (1)
  • EMPTY_LOGS_BLOOM (3-3)
🔇 Additional comments (4)
internal/libs/clickhouse.go (3)

47-56: LGTM!

The aggregate structs are well-defined with appropriate ClickHouse column tags for mapping query results.


267-339: LGTM!

The transaction mismatch detection logic correctly handles both scenarios: expected transactions missing, and unexpected transactions present. The nil checks on lines 297-299 and 307-309 properly guard against incomplete data.


341-421: LGTM!

The logs mismatch detection correctly verifies three conditions:

  1. Non-empty logsBloom with missing logs
  2. Empty logsBloom with unexpected logs
  3. Log index contiguity (detecting gaps via MaxLogIndex+1 != LogCount)

The logic properly uses EMPTY_LOGS_BLOOM from constants for bloom filter comparison.
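
For readers skimming the review, the contiguity trick works because log indexes within a block are 0-based: a block holding n logs must have a maximum index of n-1, which is exactly what MaxLogIndex+1 != LogCount tests. A minimal sketch with illustrative parameter names:

// logsMismatch reports whether a block's stored logs disagree with its header.
// emptyBloom mirrors libs.EMPTY_LOGS_BLOOM; maxLogIndex is -1 when no logs exist.
func logsMismatch(logsBloom, emptyBloom string, logCount, maxLogIndex int64) bool {
	if logsBloom != emptyBloom {
		// Header promises logs: they must exist and be contiguous from index 0.
		return logCount == 0 || maxLogIndex+1 != logCount
	}
	// Header promises no logs: any stored log is a mismatch.
	return logCount > 0
}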

internal/committer/reorg.go (1)

159-183: LGTM!

The range combination logic correctly computes the union of all detected mismatch ranges by taking min(start) and max(end) across non-empty ranges. This ensures all detected inconsistencies are addressed in a single reorg operation.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/committer/reorg.go (1)

153-181: Consider enforcing basic invariants on the merged reorg ranges

The consolidated finalStart/finalEnd logic is clear and correctly takes min(start) / max(end) across headers/tx/logs, but it implicitly assumes that each source range is either completely unset (-1 for both start and end) or a valid, ordered interval.

To harden this against unexpected outputs from any of the range providers, you could defensively enforce invariants before merging, e.g.:

  • Only treat a range as non‑empty when both start >= 0 and end >= start.
  • Optionally return an error (or at least log) if a provider returns start >= 0 but end < start, instead of merging it.

For example (sketch):

- // transactions range
- if txStart > -1 {
+ // transactions range
+ if txStart >= 0 && txEnd >= txStart {
     if finalStart == -1 || txStart < finalStart {
       finalStart = txStart
     }
     if finalEnd == -1 || txEnd > finalEnd {
       finalEnd = txEnd
     }
+ } else if txStart >= 0 || txEnd >= 0 {
+   return fmt.Errorf("detectAndHandleReorgs: invalid tx mismatch range (%d, %d)", txStart, txEnd)
 }

(and similarly for logs). This keeps behavior unchanged for valid ranges while guarding against accidental partial‑sentinel or inverted ranges in the future.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2a808a8 and 5a75c64.

📒 Files selected for processing (1)
  • internal/committer/reorg.go (2 hunks)
🔇 Additional comments (3)
internal/committer/reorg.go (3)

108-138: Header continuity detection + last-header fallback look consistent

The step‑1 header continuity check still mirrors the prior behavior, and the new lastHeaderBlock / reorgEndBlock fallback cleanly handles the case where a header-based reorg start is found but the end never closes within the window, without affecting the “no-header-reorg” case (since reorgStartBlock remains -1 there). This keeps header-based detection behavior intact while enabling sane last‑valid‑block tracking.


140-152: Transaction and logs mismatch queries are integrated cleanly

Pulling in GetTransactionMismatchRangeFromClickHouseV2 and GetLogsMismatchRangeFromClickHouseV2 with explicit error wrapping gives good context to callers and keeps the validator from silently proceeding when ClickHouse verification fails. The sequential placement after header checks makes the flow easy to follow and keeps each verification concern isolated.


185-196: Last-valid-block handling and error propagation look solid

Using lastHeaderBlock as the default lastValidBlock, then switching it to finalEnd only when a reorg actually runs, matches the intended semantics: advance to the latest verified header when there is no inconsistency, or to the end of the repaired range when there is. Persisting this via SetReorgLastValidBlock with proper error propagation fixes the earlier issue where Redis write failures were ignored, so the validator’s progress tracking is now reliable.
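
In schematic form, the persistence pattern being approved here looks roughly like the following; SetReorgLastValidBlock is the function referenced in the code graph below, while the surrounding control flow is assumed:

// Surface Redis write failures instead of silently dropping them.
if err := libs.SetReorgLastValidBlock(lastValidBlock); err != nil {
	return 0, fmt.Errorf("detectAndHandleReorgs: failed to persist last valid block: %w", err)
}
return lastValidBlock, nil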


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/committer/reorg.go (1)

185-196: Error handling for SetReorgLastValidBlock is now properly addressed.

The error is correctly captured and returned with context. The logic of setting lastValidBlock = finalEnd after a successful reorg is correct since handleReorgForRange replaces the data in that range with valid blocks.

🧹 Nitpick comments (2)
internal/committer/reorg.go (1)

134-138: Clarify the intent: reorgEndBlock is set even when no header reorg exists.

When reorgStartBlock == -1 (no header discontinuity detected), reorgEndBlock is still set to lastHeaderBlock. This is fine because the consolidation logic at line 160 checks reorgStartBlock > -1 before using the range. However, consider renaming or adding a brief comment to clarify that reorgEndBlock is only meaningful when reorgStartBlock > -1.

internal/libs/clickhouse.go (1)

269-337: Consider extracting common mismatch detection logic.

Both GetTransactionMismatchRangeFromClickHouseV2 and GetLogsMismatchRangeFromClickHouseV2 share similar patterns: loading blocks, executing an aggregate query, building a map, and iterating to find mismatch bounds. If more verification functions are added in the future, consider extracting the common iteration and bound-tracking logic into a helper.

Also applies to: 346-419
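
One possible shape for that helper, sketched with invented names: a single walk that tracks mismatch bounds, parameterized by a per-block predicate supplied by each caller.

// mismatchRange returns the [start, end] window of block numbers for which
// isMismatch reports true, or (-1, -1) if no block mismatches.
func mismatchRange(blockNumbers []int64, isMismatch func(blockNumber int64) bool) (int64, int64) {
	start, end := int64(-1), int64(-1)
	for _, n := range blockNumbers {
		if !isMismatch(n) {
			continue
		}
		if start == -1 {
			start = n
		}
		end = n
	}
	return start, end
}

The transaction and logs checks would then differ only in the predicate (count comparison vs. bloom and contiguity checks) and in the aggregate query that feeds it.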

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5a75c64 and a8a23b7.

📒 Files selected for processing (2)
  • internal/committer/reorg.go (3 hunks)
  • internal/libs/clickhouse.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/committer/reorg.go (2)
internal/libs/clickhouse.go (2)
  • GetTransactionMismatchRangeFromClickHouseV2 (269-337)
  • GetLogsMismatchRangeFromClickHouseV2 (346-419)
internal/libs/redis.go (1)
  • SetReorgLastValidBlock (50-52)
🔇 Additional comments (6)
internal/committer/reorg.go (3)

36-40: LGTM on the simplified block range check.

The condition now consistently checks for at least 100 blocks regardless of previous state, and includes better logging context with last_block_check, start_block, and end_block.


140-151: LGTM on transaction and log verification integration.

Error handling is properly wrapped, and the int64-to-uint64 conversions are safe since startBlock and endBlock are validated to be positive in getReorgRange().


153-183: LGTM on range consolidation logic.

The min/max consolidation correctly combines header, transaction, and log mismatch ranges into a single final reorg window. All edge cases (single range, multiple ranges, no ranges) are handled properly.

internal/libs/clickhouse.go (3)

47-56: LGTM on new aggregate types.

The types are well-structured with appropriate ClickHouse column tags and align with existing patterns in the codebase for handling block numbers.


304-334: Verify handling of nil blocks in the range.

The function iterates over blocksRaw which is pre-allocated to the full range size. Blocks that don't exist in the blocks table will have nil/zero values. The check at line 305 (block.ChainId == nil || block.ChainId.Uint64() == 0 || block.Number == nil) correctly skips these entries.

However, if a block header is missing but transactions exist for that block number, this inconsistency won't be detected since we're only iterating over known blocks. This may be intentional (only validate consistency for blocks we have headers for), but worth confirming this is the expected behavior.


393-406: LGTM on logs mismatch detection logic.

The contiguity check (MaxLogIndex+1 != LogCount) correctly validates that log indexes are 0-based and contiguous. The logic properly handles all cases:

  • Non-empty bloom with no logs → mismatch
  • Non-empty bloom with non-contiguous indexes → mismatch
  • Empty bloom with logs → mismatch


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a8a23b7 and e1104ac.

📒 Files selected for processing (3)
  • cmd/committer.go (1 hunks)
  • internal/libs/redis.go (1 hunks)
  • internal/storage/kafka_publisher.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/storage/kafka_publisher.go (2)
internal/libs/rpcclient.go (1)
  • ChainId (11-11)
internal/common/block.go (1)
  • Block (8-33)
cmd/committer.go (1)
internal/committer/reorg.go (1)
  • RunReorgValidator (18-51)
🔇 Additional comments (2)
cmd/committer.go (1)

35-36: Verify intentional blocking behavior and disabled streaming.

RunReorgValidator() contains an infinite loop (per the relevant snippet), so calling it synchronously will block indefinitely. CommitStreaming() is commented out and unreachable regardless.

If this is intentional (e.g., this service instance should only perform reorg validation), consider adding a comment explaining the design intent. If streaming should run concurrently, restore the go keyword.
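
Schematically, the two alternatives the comment contrasts (assuming RunReorgValidator never returns):

// Option A: blocking — this instance performs reorg validation only,
// and nothing after this call executes.
committer.RunReorgValidator()

// Option B: concurrent — validation runs in the background while the
// rest of the committer (e.g. streaming) continues on the main goroutine.
go committer.RunReorgValidator()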

internal/storage/kafka_publisher.go (1)

271-279: Useful diagnostic logging added.

The debug log for block metadata provides good observability into what's being processed. Using Debug level is appropriate to avoid noise in production.

internal/libs/redis.go

 var RedisClient *redis.Client

-const RedisReorgLastValidBlock = "reorg_last_valid"
+const RedisReorgLastValidBlock = "reorg_last_valid_debug"


⚠️ Potential issue | 🟠 Major

Debug suffix in Redis key appears unintentional for production.

The _debug suffix suggests this is temporary test code. If merged to main, this will:

  1. Isolate this deployment from previously stored reorg_last_valid data
  2. Leave a debug artifact in production

If this is intentional for testing the new reorg consolidation, consider removing the suffix before merge or documenting why isolation is needed.

🤖 Prompt for AI Agents
In internal/libs/redis.go around line 15 the constant RedisReorgLastValidBlock
is set to "reorg_last_valid_debug", which leaves a debug artifact and isolates
this deployment from prior data; change the constant value to the production key
"reorg_last_valid" (or "reorg_last_valid_block" if that matches existing
scheme), update all usages across the codebase to use the new constant
(search-and-replace symbol or key string), and either remove the `_debug` suffix
or add a clear comment and documented migration/backfill plan if isolation is
intentional before merging.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
internal/storage/kafka_publisher.go (2)

118-128: Reorg publish count debug log looks good

The new debug log in PublishBlockDataReorg adds helpful visibility into how many old vs new blocks are being processed per reorg. If you find yourself debugging specific block ranges frequently, you could optionally extend this to include the min/max block numbers from each slice, but it's not required for correctness.


271-281: Per-block metadata debug log is accurate; just be mindful of log volume

The new debug log in createBlockDataMessage correctly reports chain ID, block number, and tx/log/trace counts, plus is_deleted / is_reorg flags. Since this fires once per block, it’s great for targeted debugging; just ensure Debug level isn’t left on in high‑throughput environments unless you’re okay with the log volume.

internal/committer/reorg.go (2)

108-196: Consolidated header/tx/log reorg range logic looks sound; consider logging the computed ranges

The new flow in detectAndHandleReorgs that:

  • derives a header-based [reorgStartBlock, reorgEndBlock] (with lastHeaderBlock fallback),
  • pulls tx and logs mismatch ranges from ClickHouse, and
  • folds them into a single [finalStart, finalEnd] by min(start)/max(end),

is internally consistent and aligns with the idea of treating any mismatch (headers, tx, or logs) as part of one reorg window. Using lastHeaderBlock as the default lastValidBlock and advancing it to finalEnd only when a non-empty inconsistent range exists also looks correct, and the Redis write is now properly error-checked (resolving the earlier concern about ignored errors).

For future debugging of unexpected reorgs or mismatches, it would be useful to add a single debug log summarizing the three sub-ranges and the final combined range, e.g., logging reorgStartBlock/reorgEndBlock, txStart/txEnd, logsStart/logsEnd, and finalStart/finalEnd before calling handleReorgForRange.
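
For instance, a single zerolog line in the repository's existing logging style could capture all four ranges at once (field names are only suggestions):

log.Debug().
	Int64("header_start", reorgStartBlock).Int64("header_end", reorgEndBlock).
	Int64("tx_start", txStart).Int64("tx_end", txEnd).
	Int64("logs_start", logsStart).Int64("logs_end", logsEnd).
	Int64("final_start", finalStart).Int64("final_end", finalEnd).
	Msg("detectAndHandleReorgs: consolidated mismatch ranges")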


228-240: Additional diagnostics on old block data from ClickHouse are helpful

Counting nonNilOldBlocks and logging it alongside requested_old_blocks, start_block, and end_block is a good way to spot gaps in stored data before publishing the reorg. If you find it useful in practice, you could optionally add chain_id to that log for easier cross-chain triage, but the current change is already valuable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9309415 and 8ab3050.

📒 Files selected for processing (2)
  • internal/committer/reorg.go (4 hunks)
  • internal/storage/kafka_publisher.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/storage/kafka_publisher.go (2)
internal/libs/rpcclient.go (1)
  • ChainId (11-11)
internal/common/block.go (1)
  • Block (8-33)
internal/committer/reorg.go (3)
internal/libs/clickhouse.go (2)
  • GetTransactionMismatchRangeFromClickHouseV2 (257-325)
  • GetLogsMismatchRangeFromClickHouseV2 (334-407)
internal/libs/rpcclient.go (2)
  • ChainId (11-11)
  • ChainIdStr (12-12)
internal/libs/redis.go (1)
  • SetReorgLastValidBlock (50-52)
🔇 Additional comments (1)
internal/committer/reorg.go (1)

36-38: Structured “not enough new blocks” log is a nice observability improvement

The added structured debug log (last_block_check, start_block, end_block) when skipping reorg validation gives good visibility into why the loop is sleeping without changing behavior.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/committer/reorg.go (1)

36-37: Clarify threshold math and last_block_check logging semantics

  • if endBlock-startBlock < 100 means you only run reorg validation when there are at least 101 blocks in the inclusive range (end-start+1). If you intended "at least 100 new blocks", the condition should be endBlock-startBlock+1 < 100.
  • last_block_check in the debug log is only updated after a successful detectAndHandleReorgs call, so in the "not enough new blocks" path it will often be stale (or 0 before the first successful run). If the goal is to log the most recently considered range, you may want to either:
    • Log only start_block / end_block, or
    • Update lastBlockCheck in this branch as well (e.g., to endBlock).

Also applies to: 43-49
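
A worked example of the off-by-one: with startBlock = 100 and endBlock = 199 the inclusive range holds exactly 100 blocks, yet endBlock-startBlock evaluates to 99 and the current check skips validation. A sketch of the "at least 100 blocks" guard:

// The inclusive range [startBlock, endBlock] contains endBlock-startBlock+1 blocks.
if endBlock-startBlock+1 < 100 {
	continue // fewer than 100 new blocks: sleep and retry later
}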

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cc81319 and ebf1254.

📒 Files selected for processing (1)
  • internal/committer/reorg.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/committer/reorg.go (2)
internal/libs/clickhouse.go (3)
  • GetBlockHeadersForReorgCheck (176-202)
  • GetTransactionMismatchRangeFromClickHouseV2 (257-325)
  • GetLogsMismatchRangeFromClickHouseV2 (334-407)
internal/libs/redis.go (1)
  • SetReorgLastValidBlock (50-52)

Comment on lines +94 to 106

 func detectAndHandleReorgs(startBlock int64, endBlock int64) (int64, error) {
 	log.Debug().Msgf("Checking for reorgs from block %d to %d", startBlock, endBlock)

 	// Fetch block headers for the range
 	blockHeaders, err := libs.GetBlockHeadersForReorgCheck(libs.ChainId.Uint64(), uint64(startBlock), uint64(endBlock))
 	if err != nil {
-		return fmt.Errorf("detectAndHandleReorgs: failed to get block headers: %w", err)
+		return 0, fmt.Errorf("detectAndHandleReorgs: failed to get block headers: %w", err)
 	}

 	if len(blockHeaders) == 0 {
 		log.Debug().Msg("detectAndHandleReorgs: No block headers found in range")
-		return nil
+		return 0, nil
 	}


⚠️ Potential issue | 🔴 Critical

Harden header handling against nils and decouple header-based reorgs from ClickHouse verification failures

Overall structure of the new multi-source reorg detection looks solid, and handling the error from SetReorgLastValidBlock is a good improvement. There are, however, a couple of important robustness gaps in this function:

  1. Potential panic on nil headers from GetBlockHeadersForReorgCheck

Per the provided context, GetBlockHeadersForReorgCheck can return a []*common.Block "with nils for missing entries if any". The continuity loop and lastHeaderBlock assignment assume that every element is non-nil:

  • blockHeaders[i].Number.Int64() and blockHeaders[i-1].Number.Int64() (Lines 112–115, 121, 127)
  • lastHeaderBlock := blockHeaders[len(blockHeaders)-1].Number.Int64() (Line 134)

If any entry is nil, this will panic and take down the validator. At minimum, you should defensively guard against nil entries before dereferencing:

 blockHeaders, err := libs.GetBlockHeadersForReorgCheck(libs.ChainId.Uint64(), uint64(startBlock), uint64(endBlock))
 if err != nil {
   return 0, fmt.Errorf("detectAndHandleReorgs: failed to get block headers: %w", err)
 }
 
 if len(blockHeaders) == 0 {
   log.Debug().Msg("detectAndHandleReorgs: No block headers found in range")
   return 0, nil
 }
+
+ // Fail fast if any header in the range is unexpectedly nil to avoid panics.
+ for i, h := range blockHeaders {
+   if h == nil {
+     return 0, fmt.Errorf("detectAndHandleReorgs: nil header at index %d (start=%d end=%d)", i, startBlock, endBlock)
+   }
+ }
@@
- // set end to the last block if not set
- lastHeaderBlock := blockHeaders[len(blockHeaders)-1].Number.Int64()
+ // set end to the last block if not set
+ lastHeaderBlock := blockHeaders[len(blockHeaders)-1].Number.Int64()

If nils are actually expected in normal operation, you may instead want to skip those entries or treat them as mismatches rather than returning an error, but you should still avoid dereferencing them.

  2. ClickHouse mismatch queries currently block header-based reorg handling

Right now any failure from GetTransactionMismatchRangeFromClickHouseV2 or GetLogsMismatchRangeFromClickHouseV2 aborts the whole function:

txStart, txEnd, err := libs.GetTransactionMismatchRangeFromClickHouseV2(...)
if err != nil {
    return 0, fmt.Errorf("detectAndHandleReorgs: transaction verification failed: %w", err)
}
...
logsStart, logsEnd, err := libs.GetLogsMismatchRangeFromClickHouseV2(...)
if err != nil {
    return 0, fmt.Errorf("detectAndHandleReorgs: logs verification failed: %w", err)
}

This couples correctness to ClickHouse health even when header continuity has already detected a clear reorg range (reorgStartBlock > -1). A transient failure in either mismatch query will prevent you from handling a header-based reorg at all.

Consider treating these mismatch checks as best-effort signals and falling back to header-only behavior when they fail, for example:

 txStart, txEnd, err := libs.GetTransactionMismatchRangeFromClickHouseV2(libs.ChainId.Uint64(), uint64(startBlock), uint64(endBlock))
 if err != nil {
-    return 0, fmt.Errorf("detectAndHandleReorgs: transaction verification failed: %w", err)
+    log.Error().Err(err).
+        Int64("start_block", startBlock).
+        Int64("end_block", endBlock).
+        Msg("detectAndHandleReorgs: transaction verification failed; continuing with other signals")
+    txStart, txEnd = -1, -1
 }
@@
 logsStart, logsEnd, err := libs.GetLogsMismatchRangeFromClickHouseV2(libs.ChainId.Uint64(), uint64(startBlock), uint64(endBlock))
 if err != nil {
-    return 0, fmt.Errorf("detectAndHandleReorgs: logs verification failed: %w", err)
+    log.Error().Err(err).
+        Int64("start_block", startBlock).
+        Int64("end_block", endBlock).
+        Msg("detectAndHandleReorgs: logs verification failed; continuing with other signals")
+    logsStart, logsEnd = -1, -1
 }

This way:

  • If headers clearly show a reorg, you still reorg even when the ClickHouse aggregate queries are unhealthy.
  • When ClickHouse is healthy, you still get the stronger "combined" detection window via the tx/log ranges.
  3. Semantics when len(blockHeaders) == 0

Currently the "no headers found" path returns (0, nil) and never updates Redis:

if len(blockHeaders) == 0 {
    log.Debug().Msg("detectAndHandleReorgs: No block headers found in range")
    return 0, nil
}

Given the caller only uses the returned lastValidBlock for logging, this is not a correctness bug, but it does mean:

  • last_block_check will be logged as 0 for that iteration.
  • getLastValidBlock will continue to drive the window based on the previous Redis value or the "1 day ago" fallback.

If this "no headers in range" case is expected to happen in practice (e.g., sparse data or lag), you might want to either:

  • Return an error so it is clearly visible and handled as a failure, or
  • Return a documented sentinel (e.g., startBlock-1) and handle that in the caller instead of overloading 0.

Also applies to: 108-138, 140-152, 153-183, 185-198

@nischitpra nischitpra merged commit e8fa8b5 into main Dec 11, 2025
6 checks passed
@nischitpra nischitpra deleted the np/reorg_fix branch December 11, 2025 19:51