Skip to content

Conversation

@logachev
Copy link
Contributor

@logachev logachev commented Jan 6, 2026

This PR moves Google-owned JDBC driver that is now in Public Preview to this repo according to previously discussed plans.

Move is as-is from old repo, so it is not yet integrated with the rest of the repo.
It includes only Java code, all supporting scripts, readme etc will follow.

Maven, bom, CI integrations etc will be performed separately.

@logachev logachev requested review from a team as code owners January 6, 2026 23:27
@logachev logachev requested a review from mrfaizal January 6, 2026 23:27
@product-auto-label product-auto-label bot added the size: xl Pull request size is extra large. label Jan 6, 2026
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/java-bigquery API. label Jan 6, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @logachev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents a foundational step in bringing the Google-owned JDBC driver for BigQuery into its new home within this repository. The primary focus of these changes is the migration of the driver's core Java codebase, which is currently in Public Preview. This initial move is performed without immediate integration into the broader repository ecosystem, setting the stage for future development efforts that will encompass build system integration, comprehensive documentation, and supporting tooling.

Highlights

  • Initial Code Migration: The core Java code for the Google-owned JDBC driver for BigQuery has been moved into this repository, marking a significant step in its development and public preview.
  • As-Is Transfer: The code was moved 'as-is' from its previous location, meaning it is not yet fully integrated with the existing repository structure or build systems.
  • Future Integrations Planned: Subsequent work will involve integrating the driver with Maven, Bill of Materials (BOM), Continuous Integration (CI), and adding supporting scripts and documentation.
  • Extensive New Files: A large number of new Java source files have been added, covering core JDBC functionalities, exception handling, type coercion, and specific implementations for Arrow and JSON data formats, along with Maven configuration and SQL resource files.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new BigQuery JDBC driver module, google-cloud-bigquery-jdbc, which includes a comprehensive set of classes for JDBC functionality. Key additions include new exception types (BigQueryConversionException, BigQueryJdbcException, etc.), implementations for java.sql.Array, ResultSet, Struct, Connection, Statement, PreparedStatement, and CallableStatement tailored for BigQuery's JSON and Arrow data formats. It also adds classes for managing connection properties, type coercions, logging, and connection pooling. The pom.xml for the new module defines dependencies and build profiles. Review comments highlight several areas for improvement, including ensuring proper resource closure for ArrowDeserializer in BigQueryArrowResultSet, correcting type mapping logic in BigQueryJdbcTypeMappings to prevent unreachable conditions and NullPointerExceptions, fixing an incorrect row count increment in BigQueryArrowResultSet when a sentinel batch is received, ensuring SQLException is thrown for findColumn as per JDBC spec, handling null parameters gracefully and correcting isAssignableFrom usage in BigQueryCallableStatement, centralizing test dependency versions in the parent pom.xml, standardizing return values for getByte in BigQueryCallableStatement, and improving the efficiency and interrupt handling of convertMapToConnectionPropertiesList and populateArrowBufferedQueue methods, as well as suggesting a more robust switch statement for getSqlTypeFromStatementType.

Comment on lines +443 to +452
public void close() {
LOG.fine(String.format("Closing BigqueryArrowResultSet %s.", this));
this.isClosed = true;
if (ownedThread != null && !ownedThread.isInterrupted()) {
// interrupt the producer thread when result set is closed
ownedThread.interrupt();
}
super.close();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The arrowDeserializer holds resources like a RootAllocator that need to be explicitly closed to prevent memory leaks. The current close() method implementation does not close the deserializer. This can lead to a significant resource leak, especially with large result sets.

  public void close() {
    LOG.fine(String.format("Closing BigqueryArrowResultSet %s.", this));
    this.isClosed = true;
    if (arrowDeserializer != null) {
      arrowDeserializer.close();
    }
    if (ownedThread != null && !ownedThread.isInterrupted()) {
      // interrupt the producer thread when result set is closed
      ownedThread.interrupt();
    }
    super.close();
  }

Comment on lines +101 to +106
} else if (String.class.isAssignableFrom(type)) {
return StandardSQLTypeName.STRING;
} else if (String.class.isAssignableFrom(type)) {
return StandardSQLTypeName.GEOGRAPHY;
} else if (String.class.isAssignableFrom(type)) {
return StandardSQLTypeName.DATETIME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There are multiple else if (String.class.isAssignableFrom(type)) conditions. This makes the logic for GEOGRAPHY and DATETIME unreachable, as the first condition for STRING will always match first. This logic needs to be revised to correctly map types from Class to StandardSQLTypeName. A similar issue exists for BigDecimal with NUMERIC and BIGNUMERIC.

this.vectorSchemaRoot.clear();
}
this.hasReachedEnd = true;
this.rowCount++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The rowCount is incremented even when the last batch wrapper (sentinel) is received. This batch does not contain any data, so the row count should not be incremented. This bug can lead to an incorrect row count and may cause isLast() to return true prematurely.

            this.hasReachedEnd = true;
            return false;

Comment on lines +217 to +225
protected int getColumnIndex(String columnLabel) throws SQLException {
LOG.finest("++enter++");
checkClosed();
if (columnLabel == null) {
throw new SQLException("Column label cannot be null");
}
// use schema to get the column index, add 1 for SQL index
return this.schemaFieldList.getIndex(columnLabel) + 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The schemaFieldList.getIndex(columnLabel) method throws an IllegalArgumentException if the column label is not found. According to the JDBC specification, methods like findColumn (which uses this) should throw a SQLException. The current implementation will propagate an uncaught runtime exception.

  protected int getColumnIndex(String columnLabel) throws SQLException {
    LOG.finest("++enter++");
    checkClosed();
    if (columnLabel == null) {
      throw new SQLException("Column label cannot be null");
    }
    try {
      // use schema to get the column index, add 1 for SQL index
      return this.schemaFieldList.getIndex(columnLabel) + 1;
    } catch (IllegalArgumentException e) {
      throw new SQLException("Column not found: " + columnLabel, e);
    }
  }

Comment on lines +231 to +280
public Reader getCharacterStream(int arg0) throws SQLException {
LOG.finest("++enter++");
Object param = this.parameterHandler.getParameter(arg0);
if (param instanceof String || param.getClass().isAssignableFrom(String.class)) {
return new StringReader(param.toString());
}

if (param instanceof BufferedReader) {
return (BufferedReader) param;
}
if (param.getClass().isAssignableFrom(BufferedReader.class)) {
return getObject(arg0, BufferedReader.class);
}

if (param instanceof CharArrayReader) {
return (CharArrayReader) param;
}
if (param.getClass().isAssignableFrom(CharArrayReader.class)) {
return getObject(arg0, CharArrayReader.class);
}

if (param instanceof FilterReader) {
return (FilterReader) param;
}
if (param.getClass().isAssignableFrom(FilterReader.class)) {
return getObject(arg0, FilterReader.class);
}

if (param instanceof InputStreamReader) {
return (InputStreamReader) param;
}
if (param.getClass().isAssignableFrom(InputStreamReader.class)) {
return getObject(arg0, InputStreamReader.class);
}

if (param instanceof PipedReader) {
return (PipedReader) param;
}
if (param.getClass().isAssignableFrom(PipedReader.class)) {
return getObject(arg0, PipedReader.class);
}

if (param instanceof StringReader) {
return (StringReader) param;
}
if (param.getClass().isAssignableFrom(StringReader.class)) {
return getObject(arg0, StringReader.class);
}
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The condition param.getClass().isAssignableFrom(String.class) is used incorrectly and will not work as intended. It should be String.class.isAssignableFrom(param.getClass()). However, this is already covered by the instanceof check. More critically, param.getClass() will throw a NullPointerException if param is null. The method should handle null parameters gracefully. This incorrect and unsafe pattern is repeated for multiple types within this method.

  public Reader getCharacterStream(int arg0) throws SQLException {
    LOG.finest("++enter++");
    Object param = this.parameterHandler.getParameter(arg0);
    if (param == null) {
      return null;
    }
    if (param instanceof Reader) {
      return (Reader) param;
    }
    if (param instanceof String) {
      return new StringReader((String) param);
    }
    return null;
  }

if (param.getClass().isAssignableFrom(Byte.class)) {
return getObject(arg0, Byte.class);
}
return -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method returns -1 on a type mismatch, which is inconsistent with other get... methods in this class that return 0. The JDBC specification for getByte states that it should return 0 for SQL NULL values. For consistency, it would be better to return 0 here in case of a type mismatch as well.

Suggested change
return -1;
return 0;

Comment on lines +970 to +984
private List<ConnectionProperty> convertMapToConnectionPropertiesList(
Map<String, String> queryPropertiesMap) {
LOG.finest("++enter++");
List<ConnectionProperty> connectionProperties = new ArrayList<ConnectionProperty>();
if (queryPropertiesMap != null) {
for (Map.Entry<String, String> entry : queryPropertiesMap.entrySet()) {
connectionProperties.add(
ConnectionProperty.newBuilder()
.setKey(entry.getKey())
.setValue(entry.getValue())
.build());
}
}
return connectionProperties;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method can be written more concisely and efficiently using Java Streams.

Suggested change
private List<ConnectionProperty> convertMapToConnectionPropertiesList(
Map<String, String> queryPropertiesMap) {
LOG.finest("++enter++");
List<ConnectionProperty> connectionProperties = new ArrayList<ConnectionProperty>();
if (queryPropertiesMap != null) {
for (Map.Entry<String, String> entry : queryPropertiesMap.entrySet()) {
connectionProperties.add(
ConnectionProperty.newBuilder()
.setKey(entry.getKey())
.setValue(entry.getValue())
.build());
}
}
return connectionProperties;
}
private List<ConnectionProperty> convertMapToConnectionPropertiesList(
Map<String, String> queryPropertiesMap) {
LOG.finest("++enter++");
if (queryPropertiesMap == null) {
return new ArrayList<>();
}
return queryPropertiesMap.entrySet().stream()
.map(
entry ->
ConnectionProperty.newBuilder()
.setKey(entry.getKey())
.setValue(entry.getValue())
.build())
.collect(java.util.stream.Collectors.toList());
}

Comment on lines +831 to +836
} catch (RuntimeException | InterruptedException e) {
LOG.log(
Level.WARNING,
"\n" + Thread.currentThread().getName() + " Interrupted @ arrowStreamProcessor",
e);
} finally { // logic needed for graceful shutdown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching InterruptedException without restoring the interrupt status can lead to issues in cancellable tasks. It's a best practice to call Thread.currentThread().interrupt() in the catch block to ensure that the interrupted status of the thread is preserved for upstream code to handle.

Suggested change
} catch (RuntimeException | InterruptedException e) {
LOG.log(
Level.WARNING,
"\n" + Thread.currentThread().getName() + " Interrupted @ arrowStreamProcessor",
e);
} finally { // logic needed for graceful shutdown
} catch (RuntimeException | InterruptedException e) {
LOG.log(
Level.WARNING,
"\n" + Thread.currentThread().getName() + " Interrupted @ arrowStreamProcessor",
e);
Thread.currentThread().interrupt();

Comment on lines +964 to +970
// calls
try {
// this is the first page which we have received.
rpcResponseQueue.put(Tuple.of(result, true));
} catch (InterruptedException e) {
LOG.log(
Level.WARNING,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The InterruptedException is caught and logged, but the thread's interrupted status is not restored. Even if this is expected during cancellation, it's crucial to call Thread.currentThread().interrupt() to propagate the interrupted state, allowing other parts of the system to react accordingly.

            } catch (InterruptedException e) {
              LOG.log(Level.WARNING, "\n" + Thread.currentThread().getName() + " Interrupted", e);
              // Thread might get interrupted while calling the Cancel method, which is
              // expected, so logging this instead of throwing the exception back
              Thread.currentThread().interrupt();
              break;

class BigQuerySqlTypeConverter {

static SqlType getSqlTypeFromStatementType(StatementType statementType) {
switch (statementType.toString()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using statementType.toString() in a switch statement is less efficient and safe than switching on the enum type directly. It's better to switch on the StatementType enum itself to leverage compile-time checks and better performance.

Suggested change
switch (statementType.toString()) {
switch (statementType) {

@lqiu96
Copy link
Member

lqiu96 commented Jan 7, 2026

Error:

[ERROR] Tests run: 186, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 437.8 s <<< FAILURE! -- in com.google.cloud.bigquery.it.ITBigQueryTest
[ERROR] com.google.cloud.bigquery.it.ITBigQueryTest.testCopyJobWithLabelsAndExpTime -- Time elapsed: 1.982 s <<< FAILURE!
java.lang.AssertionError

This has been updated in a recent PR. If you pull the latest changes, I think the CI errors should be resolved.

</dependency>

<!-- Dependencies for Arrow RS -->
<dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are arrow deps from BQstorage or is arrow used natively inside jdbc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BQStorage dependency

Copy link
Member

@lqiu96 lqiu96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. These was developed in a separate repo. Let @jinseopkim0 take a look at this approve as well. Thanks!

@logachev logachev merged commit ffb0fdf into main Jan 9, 2026
25 checks passed
@logachev logachev deleted the kirl/jdbc branch January 9, 2026 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/java-bigquery API. size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants