Skip to content

Conversation

@tejassp-db
Copy link
Collaborator

Description

Databricks server shares query results in Arrow format for easy cross language functionality. The JDBC driver experiences compatibility issues with JDK 16 and later versions when processing Arrow results.

This problem arises from stricter encapsulation of internal APIs in newer Java versions, which affects the driver's use of the Apache Arrow result format consumption with the Apache Arrow library. The JDBC driver is used in partner solutions, where they do not have control of the runtime environment, and the workaround of setting JVM arguments is not feasible.

Testing

Tests are added in other stacked PRs.

Additional Notes to the Reviewer

Its a stacked PR.

Patch Arrow to create a Databricks ArrowBuf which allocates memory on
the heap and provides access to it through Java methods. This removes
the need to specify "--add-opens=java.base/java.nio=ALL-UNNAMED" as
JVM args for JDK 16+.
Use native Arrow if available. Otherwise fallback to the patch version.
Remove irrelevant reference counting in patch code. Patch code uses heap
memory for arrow operations and reference counting is not required.
@tejassp-db tejassp-db self-assigned this Dec 22, 2025
Remove redundant todos for accounting.
Patch DecimalUtility to not use unsafe methods to set decimal values on DatabricksArrowBuf.
Add notice to all patched Arrow Java code. In NOTICE file mention Arrow
has been patched by Databricks.
On static init failure of MemoryUtil class, it prints a stack trace to
stderr. Remove this print, since now we fallback to DatabricksBufferAllocator
when this happens. And the error is logged as well.

// ---- Databricks patch start ----
private final HistoricalLog historicalLog =
DEBUG ? new HistoricalLog(DEBUG_LOG_LENGTH, "ArrowBuf[%d]", id) : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we even need to worry about the historical log? can we not just set it to null? seems like its usage is null checked everywhere anyway?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, add a comment on why this is needed

Comment on lines +49 to +54
long currentReservation = reservedSize.get();
long newReservation = currentReservation + nBytes;
if (newReservation > allocator.getHeadroom() + currentReservation) {
return false;
}
reservedSize.addAndGet(nBytes);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not thread safe, should we use compareAndSet?

Comment on lines +114 to +122
@Override
public ReferenceManager getReferenceManager() {
return referenceManager;
}

@Override
public long capacity() {
return capacity;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think a bunch of these methods do not have a changed override behaviour, can we remove them so that it is easy to review and maintain?

if (capacity > Integer.MAX_VALUE) {
throw new IllegalArgumentException(
"DatabricksArrowBuf does not support capacity > Integer.MAX_VALUE");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • why this limit?
  • this is missing from the other constructor
  • can we reuse constructors

static {
RootAllocator rootAllocator = null;
try {
rootAllocator = new RootAllocator();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were using Integer.MAX_VALUE, not needed now?

// ---- to avoid unsafe allocation initialization errors.
public static final String DEBUG_ALLOCATOR = "arrow.memory.debug.allocator";
public static final int DEBUG_LOG_LENGTH = 6;
public static final boolean DEBUG;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to better name than debug?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants