Fix ~5x fetch regression by caching TelemetryCollector in DatabricksResultSet #1223

Merged

gopalldb merged 3 commits into main from gopal.lal/fix-telemetry-per-row-overhead on Feb 21, 2026
Conversation

@gopalldb (Collaborator) commented Feb 21, 2026

Summary

Problem

PR #1163 refactored telemetry collection from a singleton pattern to per-connection, but inadvertently introduced per-row overhead in DatabricksResultSet.next(). On every row iteration (~6.5M times for a 1025MB query), the code now traverses:

parentStatement.getStatement().getConnection()
  → cast to DatabricksConnection
  → getConnectionContext()
  → TelemetryCollectorManager.getInstance().getOrCreateCollector(context)
  → ConcurrentHashMap.computeIfAbsent()

This adds ~2.3 microseconds per row, compounding to ~15 seconds of overhead over 6.5M rows.
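The per-call traversal is easy to demonstrate in isolation. A minimal, self-contained sketch (stand-in names, not the driver's real classes) counting how often a manager-style `computeIfAbsent` lookup runs when resolved per row versus once up front:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LookupCountDemo {
  // Stand-in for TelemetryCollectorManager's per-connection collector map.
  static final ConcurrentHashMap<String, Object> collectors = new ConcurrentHashMap<>();
  static final AtomicInteger lookups = new AtomicInteger();

  static Object getOrCreateCollector(String connectionId) {
    lookups.incrementAndGet(); // the traversal + hash + map probe happen on every call
    return collectors.computeIfAbsent(connectionId, k -> new Object());
  }

  public static void main(String[] args) {
    int rows = 1_000_000;

    // Unfixed path: resolve the collector on every row of next().
    for (int i = 0; i < rows; i++) {
      getOrCreateCollector("conn-1");
    }
    System.out.println("per-row lookups = " + lookups.get()); // 1000000

    // Fixed path: resolve once at construction, then reuse the reference.
    lookups.set(0);
    Object cached = getOrCreateCollector("conn-1");
    for (int i = 0; i < rows; i++) {
      // plain field read on the hot path; no map traversal
      if (cached == null) throw new AssertionError();
    }
    System.out.println("cached lookups = " + lookups.get()); // 1
  }
}
```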

Fix

Resolve the TelemetryCollector once in each constructor via a resolveTelemetryCollector() helper and store it in a final field. The next() method calls the cached collector directly, bypassing the entire per-row resolution chain. The connection-to-collector mapping is stable for the lifetime of a result set, so caching at construction time is safe.

Before:

public boolean next() throws SQLException {
  checkIfClosed();
  boolean hasNext = this.executionResult.next();
  TelemetryHelper.recordResultSetIteration(
      parentStatement, statementId, resultSetMetaData.getChunkCount(), hasNext);
  return hasNext;
}

After:

public boolean next() throws SQLException {
  checkIfClosed();
  boolean hasNext = this.executionResult.next();
  if (cachedTelemetryCollector != null) {
    cachedTelemetryCollector.recordResultSetIteration(
        statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), hasNext);
  }
  return hasNext;
}
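A minimal sketch of the construction-time pattern described above, with the driver's types replaced by stubs (the field and helper names follow the PR; everything else is assumed for illustration):

```java
// Sketch only: stand-ins for the driver's real classes. In the actual driver,
// resolveTelemetryCollector walks statement -> connection -> context -> manager.
public class CachedCollectorSketch {

  static final class TelemetryCollector {
    int events = 0;
    void recordResultSetIteration(String stmtId, long chunkCount, boolean hasNext) {
      events++; // the real collector buffers a telemetry event here
    }
  }

  static final class ResultSetStub {
    private final TelemetryCollector cachedTelemetryCollector; // final: resolved once

    ResultSetStub(Object parentStatement) {
      this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement);
    }

    private static TelemetryCollector resolveTelemetryCollector(Object parentStatement) {
      if (parentStatement == null) {
        return null; // e.g. a result set with no backing statement: skip telemetry
      }
      try {
        return new TelemetryCollector(); // stand-in for the manager lookup
      } catch (RuntimeException e) {
        return null; // telemetry setup must never break result-set construction
      }
    }

    boolean next(boolean hasNext) {
      if (cachedTelemetryCollector != null) { // plain field read on the hot path
        cachedTelemetryCollector.recordResultSetIteration("stmt-1", 0L, hasNext);
      }
      return hasNext;
    }
  }

  public static void main(String[] args) {
    // Null parent statement: next() still works, no NPE, no telemetry.
    ResultSetStub noTelemetry = new ResultSetStub(null);
    System.out.println("null-parent next ok = " + noTelemetry.next(true));

    ResultSetStub withTelemetry = new ResultSetStub(new Object());
    for (int i = 0; i < 3; i++) withTelemetry.next(true);
    System.out.println("events = " + withTelemetry.cachedTelemetryCollector.events);
  }
}
```

Keeping the field nullable and swallowing resolution failures preserves the old behavior for result sets that have no usable connection context.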

Benchmark Results

Regression Bisection (1025MB query, ~6.5M rows, Arrow cloud-fetch, Thrift mode, warm iteration)

| Build | Commit | Fetch (ms) | Rows/s | Per-row cost | Status |
|---|---|---|---|---|---|
| v3.1.1 (baseline) | 0e8b1ad | 5,037 | 1.29M | 0.78 us | ✅ Baseline |
| Pre-#1163 | 9c9ffe2 | 4,720 | 1.38M | 0.73 us | ✅ Matches baseline |
| Post-#1163 | 8a2f333 | 19,727 | 329K | 3.04 us | ❌ 3.9x regression |
| v3.1.2 HEAD (unfixed) | 47235a1 | 20,382 | 318K | 3.14 us | ❌ 4.0x regression |
| v3.1.2 (this fix) | eaa3f64 | 4,447 | 1.46M | 0.68 us | 12% faster than baseline |

Post-Fix Segment Latency (500K rows per segment, warm iteration)

The fix restores the expected bursty pattern (fast in-memory iteration alternating with chunk-download waits):

 500K rows:   73ms    (in-memory, fast)
   1M rows:  393ms    (chunk wait)
 1.5M rows:  147ms    (in-memory)
   2M rows:  317ms    (chunk wait)
 2.5M rows:  409ms    (chunk wait)
   3M rows:  282ms    (mixed)
 3.5M rows:  250ms    (mixed)
   4M rows:  314ms    (chunk wait)
 4.5M rows:  197ms    (in-memory)
   5M rows:  370ms    (chunk wait)
 5.5M rows:  361ms    (chunk wait)
   6M rows:  146ms    (in-memory)

vs. the unfixed v3.1.2, which showed a flat ~1,450ms per segment (dominated by the uniform per-row overhead).

Comprehensive Comparison (1025MB, warm iteration)

| Driver | Mode | Arrow | Fetch (ms) | Rows/s |
|---|---|---|---|---|
| Simba 2.7.6 | Thrift | N/A | 109,994 | 59.0K |
| OSS v3.1.1 | Thrift | Yes | 5,037 | 1.29M |
| OSS v3.1.1 | SEA | Yes | 4,202 | 1.54M |
| OSS v3.1.1 | Thrift | No | 163,051 | 39.8K |
| OSS v3.1.2 (unfixed) | Thrift | Yes | 20,382 | 318K |
| OSS v3.1.2 (unfixed) | SEA | Yes | 19,882 | 326K |
| OSS v3.1.2 (unfixed) | Thrift | No | 145,131 | 44.7K |
| OSS v3.1.2 (this fix) | Thrift | Yes | 4,447 | 1.46M |

NoArrow mode is unaffected by the regression (~145-163s for both versions) because the per-row telemetry overhead (~2.3us) is negligible compared to the ~25us/row cost of non-Arrow iteration.
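The reported numbers are internally consistent; a quick arithmetic check (all inputs taken from the tables and text above):

```java
import java.util.Locale;

public class OverheadMath {
  public static void main(String[] args) {
    double rows = 6_500_000;
    double perRowOverheadUs = 2.3;  // added telemetry cost per row (from the bisection)
    double arrowRowUs = 0.73;       // pre-#1163 Arrow per-row cost
    double noArrowRowUs = 25.0;     // approximate non-Arrow per-row cost

    // Total added wall time over the full result set: roughly 15 seconds.
    System.out.printf(Locale.ROOT, "total overhead = %.2f s%n",
        rows * perRowOverheadUs / 1e6);

    // In Arrow mode the overhead is ~3x the baseline per-row cost, so the
    // predicted slowdown lands near the measured 3.9-4.0x.
    System.out.printf(Locale.ROOT, "predicted arrow slowdown = %.1fx%n",
        (arrowRowUs + perRowOverheadUs) / arrowRowUs);

    // In NoArrow mode the same overhead is under 10% of each row's cost,
    // which is why that mode shows no regression.
    System.out.printf(Locale.ROOT, "noarrow overhead fraction = %.0f%%%n",
        100 * perRowOverheadUs / noArrowRowUs);
  }
}
```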

Test plan

  • mvn test -Dtest=DatabricksResultSetTest — all 48 tests pass (43 existing + 5 new)
  • New tests cover: next() with cached telemetry, null parentStatement (no NPE), exception handling in resolution, null connection context, collector resolved once at construction
  • Benchmark verified: fetch time drops from 20,382ms → 4,447ms (4.6x improvement)
  • CI pipeline

🤖 Generated with Claude Code

NO_CHANGELOG=true
OVERRIDE_FREEZE=true

gopalldb and others added 2 commits February 21, 2026 03:37
…esultSet

PR #1163 refactored telemetry from singleton to per-connection, but
introduced per-row overhead in DatabricksResultSet.next() by traversing
the parentStatement → connection → context → ConcurrentHashMap chain
on every row (~6.5M times for a 1025MB query). This added ~2.3us/row,
compounding to ~15s of overhead.

Fix: resolve the TelemetryCollector once at construction time and store
it in a final field. The next() method calls the cached collector
directly, eliminating per-row object traversal, type casts, UUID
hashing, and ConcurrentHashMap lookups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- **Enhanced `enableMultipleCatalogSupport` behavior**: When this parameter is disabled (`enableMultipleCatalogSupport=0`), metadata operations (such as `getSchemas()`, `getTables()`, `getColumns()`, etc.) now return results only when the catalog parameter is either `null` or matches the current catalog. For any other catalog name, an empty result set is returned. This ensures metadata queries are restricted to the current catalog context. When enabled (`enableMultipleCatalogSupport=1`), metadata operations continue to work across all accessible catalogs.

### Fixed
- Optimized telemetry collection during result set iteration by caching the TelemetryCollector at construction time, eliminating per-row lookup overhead for large result sets.
Collaborator

as this was not released, do we need a changelog? wondering if that would confuse readers

Collaborator Author

fair point, will remove

@samikshya-db (Collaborator) left a comment

Thanks for fixing this!

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gopalldb merged commit 1af2dec into main on Feb 21, 2026
12 of 15 checks passed
gopalldb deleted the gopal.lal/fix-telemetry-per-row-overhead branch on February 21, 2026 05:23
    boolean hasNext = this.executionResult.next();
-   TelemetryHelper.recordResultSetIteration(
-       parentStatement, statementId, resultSetMetaData.getChunkCount(), hasNext);
+   if (cachedTelemetryCollector != null) {
Collaborator

why do we have to do this with every next? this is super wasteful and anti-pattern with no benefit. Please fix this to have it on fetch request boundary.

@vikrantpuppala (Collaborator) commented Feb 22, 2026

+1, also why we needed #1210
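One way the fetch-boundary recording suggested above could look — a hypothetical sketch, not code from this PR: accumulate a row count and emit a single telemetry event when the chunk changes or the result set is exhausted:

```java
import java.util.ArrayList;
import java.util.List;

public class BoundaryDemo {
  interface Collector { void recordChunkIterated(long chunkIndex, long rowsInChunk); }

  static final class BoundaryRecordingResultSet {
    private final Collector collector;
    private long currentChunk = -1;
    private long rowsInCurrentChunk = 0;

    BoundaryRecordingResultSet(Collector collector) { this.collector = collector; }

    // hasNext/chunkOfRow are fed in by the harness below; the real driver
    // would read them from its execution result.
    boolean next(boolean hasNext, long chunkOfRow) {
      if (!hasNext || chunkOfRow != currentChunk) {
        flush(); // emit at the chunk boundary, not once per row
        currentChunk = chunkOfRow;
      }
      if (hasNext) rowsInCurrentChunk++;
      return hasNext;
    }

    private void flush() {
      if (rowsInCurrentChunk > 0 && collector != null) {
        collector.recordChunkIterated(currentChunk, rowsInCurrentChunk);
        rowsInCurrentChunk = 0;
      }
    }
  }

  public static void main(String[] args) {
    List<String> events = new ArrayList<>();
    BoundaryRecordingResultSet rs = new BoundaryRecordingResultSet(
        (chunk, rows) -> events.add("chunk " + chunk + ": " + rows + " rows"));

    long[] chunkOfRow = {0, 0, 0, 1, 1, 2, 2, 2, 2}; // 9 rows across 3 chunks
    for (long chunk : chunkOfRow) rs.next(true, chunk);
    rs.next(false, -1); // exhausted: flush the trailing chunk

    System.out.println("events = " + events.size()); // 3 chunks -> 3 events
    events.forEach(System.out::println);
  }
}
```

This keeps the per-row path telemetry-free entirely, at the cost of coarser (per-chunk) iteration data.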
