-
Notifications
You must be signed in to change notification settings - Fork 130
chore: move Google JDBC driver code #4050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @logachev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request represents a foundational step in bringing the Google-owned JDBC driver for BigQuery into its new home within this repository. The primary focus of these changes is the migration of the driver's core Java codebase, which is currently in Public Preview. This initial move is performed without immediate integration into the broader repository ecosystem, setting the stage for future development efforts that will encompass build system integration, comprehensive documentation, and supporting tooling. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new BigQuery JDBC driver module, google-cloud-bigquery-jdbc, which includes a comprehensive set of classes for JDBC functionality. Key additions include new exception types (BigQueryConversionException, BigQueryJdbcException, etc.), implementations for java.sql.Array, ResultSet, Struct, Connection, Statement, PreparedStatement, and CallableStatement tailored for BigQuery's JSON and Arrow data formats. It also adds classes for managing connection properties, type coercions, logging, and connection pooling. The pom.xml for the new module defines dependencies and build profiles. Review comments highlight several areas for improvement, including ensuring proper resource closure for ArrowDeserializer in BigQueryArrowResultSet, correcting type mapping logic in BigQueryJdbcTypeMappings to prevent unreachable conditions and NullPointerExceptions, fixing an incorrect row count increment in BigQueryArrowResultSet when a sentinel batch is received, ensuring SQLException is thrown for findColumn as per JDBC spec, handling null parameters gracefully and correcting isAssignableFrom usage in BigQueryCallableStatement, centralizing test dependency versions in the parent pom.xml, standardizing return values for getByte in BigQueryCallableStatement, and improving the efficiency and interrupt handling of convertMapToConnectionPropertiesList and populateArrowBufferedQueue methods, as well as suggesting a more robust switch statement for getSqlTypeFromStatementType.
| public void close() { | ||
| LOG.fine(String.format("Closing BigqueryArrowResultSet %s.", this)); | ||
| this.isClosed = true; | ||
| if (ownedThread != null && !ownedThread.isInterrupted()) { | ||
| // interrupt the producer thread when result set is closed | ||
| ownedThread.interrupt(); | ||
| } | ||
| super.close(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arrowDeserializer holds resources like a RootAllocator that need to be explicitly closed to prevent memory leaks. The current close() method implementation does not close the deserializer. This can lead to a significant resource leak, especially with large result sets.
public void close() {
LOG.fine(String.format("Closing BigqueryArrowResultSet %s.", this));
this.isClosed = true;
if (arrowDeserializer != null) {
arrowDeserializer.close();
}
if (ownedThread != null && !ownedThread.isInterrupted()) {
// interrupt the producer thread when result set is closed
ownedThread.interrupt();
}
super.close();
}| } else if (String.class.isAssignableFrom(type)) { | ||
| return StandardSQLTypeName.STRING; | ||
| } else if (String.class.isAssignableFrom(type)) { | ||
| return StandardSQLTypeName.GEOGRAPHY; | ||
| } else if (String.class.isAssignableFrom(type)) { | ||
| return StandardSQLTypeName.DATETIME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple else if (String.class.isAssignableFrom(type)) conditions. This makes the logic for GEOGRAPHY and DATETIME unreachable, as the first condition for STRING will always match first. This logic needs to be revised to correctly map types from Class to StandardSQLTypeName. A similar issue exists for BigDecimal with NUMERIC and BIGNUMERIC.
| this.vectorSchemaRoot.clear(); | ||
| } | ||
| this.hasReachedEnd = true; | ||
| this.rowCount++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rowCount is incremented even when the last batch wrapper (sentinel) is received. This batch does not contain any data, so the row count should not be incremented. This bug can lead to an incorrect row count and may cause isLast() to return true prematurely.
this.hasReachedEnd = true;
return false;| protected int getColumnIndex(String columnLabel) throws SQLException { | ||
| LOG.finest("++enter++"); | ||
| checkClosed(); | ||
| if (columnLabel == null) { | ||
| throw new SQLException("Column label cannot be null"); | ||
| } | ||
| // use schema to get the column index, add 1 for SQL index | ||
| return this.schemaFieldList.getIndex(columnLabel) + 1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schemaFieldList.getIndex(columnLabel) method throws an IllegalArgumentException if the column label is not found. According to the JDBC specification, methods like findColumn (which uses this) should throw a SQLException. The current implementation will propagate an uncaught runtime exception.
protected int getColumnIndex(String columnLabel) throws SQLException {
LOG.finest("++enter++");
checkClosed();
if (columnLabel == null) {
throw new SQLException("Column label cannot be null");
}
try {
// use schema to get the column index, add 1 for SQL index
return this.schemaFieldList.getIndex(columnLabel) + 1;
} catch (IllegalArgumentException e) {
throw new SQLException("Column not found: " + columnLabel, e);
}
}| public Reader getCharacterStream(int arg0) throws SQLException { | ||
| LOG.finest("++enter++"); | ||
| Object param = this.parameterHandler.getParameter(arg0); | ||
| if (param instanceof String || param.getClass().isAssignableFrom(String.class)) { | ||
| return new StringReader(param.toString()); | ||
| } | ||
|
|
||
| if (param instanceof BufferedReader) { | ||
| return (BufferedReader) param; | ||
| } | ||
| if (param.getClass().isAssignableFrom(BufferedReader.class)) { | ||
| return getObject(arg0, BufferedReader.class); | ||
| } | ||
|
|
||
| if (param instanceof CharArrayReader) { | ||
| return (CharArrayReader) param; | ||
| } | ||
| if (param.getClass().isAssignableFrom(CharArrayReader.class)) { | ||
| return getObject(arg0, CharArrayReader.class); | ||
| } | ||
|
|
||
| if (param instanceof FilterReader) { | ||
| return (FilterReader) param; | ||
| } | ||
| if (param.getClass().isAssignableFrom(FilterReader.class)) { | ||
| return getObject(arg0, FilterReader.class); | ||
| } | ||
|
|
||
| if (param instanceof InputStreamReader) { | ||
| return (InputStreamReader) param; | ||
| } | ||
| if (param.getClass().isAssignableFrom(InputStreamReader.class)) { | ||
| return getObject(arg0, InputStreamReader.class); | ||
| } | ||
|
|
||
| if (param instanceof PipedReader) { | ||
| return (PipedReader) param; | ||
| } | ||
| if (param.getClass().isAssignableFrom(PipedReader.class)) { | ||
| return getObject(arg0, PipedReader.class); | ||
| } | ||
|
|
||
| if (param instanceof StringReader) { | ||
| return (StringReader) param; | ||
| } | ||
| if (param.getClass().isAssignableFrom(StringReader.class)) { | ||
| return getObject(arg0, StringReader.class); | ||
| } | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition param.getClass().isAssignableFrom(String.class) is used incorrectly and will not work as intended. It should be String.class.isAssignableFrom(param.getClass()). However, this is already covered by the instanceof check. More critically, param.getClass() will throw a NullPointerException if param is null. The method should handle null parameters gracefully. This incorrect and unsafe pattern is repeated for multiple types within this method.
public Reader getCharacterStream(int arg0) throws SQLException {
LOG.finest("++enter++");
Object param = this.parameterHandler.getParameter(arg0);
if (param == null) {
return null;
}
if (param instanceof Reader) {
return (Reader) param;
}
if (param instanceof String) {
return new StringReader((String) param);
}
return null;
}| if (param.getClass().isAssignableFrom(Byte.class)) { | ||
| return getObject(arg0, Byte.class); | ||
| } | ||
| return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method returns -1 on a type mismatch, which is inconsistent with other get... methods in this class that return 0. The JDBC specification for getByte states that it should return 0 for SQL NULL values. For consistency, it would be better to return 0 here in case of a type mismatch as well.
| return -1; | |
| return 0; |
| private List<ConnectionProperty> convertMapToConnectionPropertiesList( | ||
| Map<String, String> queryPropertiesMap) { | ||
| LOG.finest("++enter++"); | ||
| List<ConnectionProperty> connectionProperties = new ArrayList<ConnectionProperty>(); | ||
| if (queryPropertiesMap != null) { | ||
| for (Map.Entry<String, String> entry : queryPropertiesMap.entrySet()) { | ||
| connectionProperties.add( | ||
| ConnectionProperty.newBuilder() | ||
| .setKey(entry.getKey()) | ||
| .setValue(entry.getValue()) | ||
| .build()); | ||
| } | ||
| } | ||
| return connectionProperties; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can be written more concisely and efficiently using Java Streams.
| private List<ConnectionProperty> convertMapToConnectionPropertiesList( | |
| Map<String, String> queryPropertiesMap) { | |
| LOG.finest("++enter++"); | |
| List<ConnectionProperty> connectionProperties = new ArrayList<ConnectionProperty>(); | |
| if (queryPropertiesMap != null) { | |
| for (Map.Entry<String, String> entry : queryPropertiesMap.entrySet()) { | |
| connectionProperties.add( | |
| ConnectionProperty.newBuilder() | |
| .setKey(entry.getKey()) | |
| .setValue(entry.getValue()) | |
| .build()); | |
| } | |
| } | |
| return connectionProperties; | |
| } | |
| private List<ConnectionProperty> convertMapToConnectionPropertiesList( | |
| Map<String, String> queryPropertiesMap) { | |
| LOG.finest("++enter++"); | |
| if (queryPropertiesMap == null) { | |
| return new ArrayList<>(); | |
| } | |
| return queryPropertiesMap.entrySet().stream() | |
| .map( | |
| entry -> | |
| ConnectionProperty.newBuilder() | |
| .setKey(entry.getKey()) | |
| .setValue(entry.getValue()) | |
| .build()) | |
| .collect(java.util.stream.Collectors.toList()); | |
| } |
| } catch (RuntimeException | InterruptedException e) { | ||
| LOG.log( | ||
| Level.WARNING, | ||
| "\n" + Thread.currentThread().getName() + " Interrupted @ arrowStreamProcessor", | ||
| e); | ||
| } finally { // logic needed for graceful shutdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching InterruptedException without restoring the interrupt status can lead to issues in cancellable tasks. It's a best practice to call Thread.currentThread().interrupt() in the catch block to ensure that the interrupted status of the thread is preserved for upstream code to handle.
| } catch (RuntimeException | InterruptedException e) { | |
| LOG.log( | |
| Level.WARNING, | |
| "\n" + Thread.currentThread().getName() + " Interrupted @ arrowStreamProcessor", | |
| e); | |
| } finally { // logic needed for graceful shutdown | |
| } catch (RuntimeException | InterruptedException e) { | |
| LOG.log( | |
| Level.WARNING, | |
| "\n" + Thread.currentThread().getName() + " Interrupted @ arrowStreamProcessor", | |
| e); | |
| Thread.currentThread().interrupt(); |
| // calls | ||
| try { | ||
| // this is the first page which we have received. | ||
| rpcResponseQueue.put(Tuple.of(result, true)); | ||
| } catch (InterruptedException e) { | ||
| LOG.log( | ||
| Level.WARNING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The InterruptedException is caught and logged, but the thread's interrupted status is not restored. Even if this is expected during cancellation, it's crucial to call Thread.currentThread().interrupt() to propagate the interrupted state, allowing other parts of the system to react accordingly.
} catch (InterruptedException e) {
LOG.log(Level.WARNING, "\n" + Thread.currentThread().getName() + " Interrupted", e);
// Thread might get interrupted while calling the Cancel method, which is
// expected, so logging this instead of throwing the exception back
Thread.currentThread().interrupt();
break;| class BigQuerySqlTypeConverter { | ||
|
|
||
| static SqlType getSqlTypeFromStatementType(StatementType statementType) { | ||
| switch (statementType.toString()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using statementType.toString() in a switch statement is less efficient and safe than switching on the enum type directly. It's better to switch on the StatementType enum itself to leverage compile-time checks and better performance.
| switch (statementType.toString()) { | |
| switch (statementType) { |
|
Error: This has been updated in a recent PR. If you pull the latest changes, I think the CI errors should be resolved. |
| </dependency> | ||
|
|
||
| <!-- Dependencies for Arrow RS --> | ||
| <dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are arrow deps from BQstorage or is arrow used natively inside jdbc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BQStorage dependency
lqiu96
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. These was developed in a separate repo. Let @jinseopkim0 take a look at this approve as well. Thanks!
This PR moves Google-owned JDBC driver that is now in Public Preview to this repo according to previously discussed plans.
Move is as-is from old repo, so it is not yet integrated with the rest of the repo.
It includes only Java code, all supporting scripts, readme etc will follow.
Maven, bom, CI integrations etc will be performed separately.