-
Notifications
You must be signed in to change notification settings - Fork 27
[PECOBLR-1377] SRID in geospatial column type name #1157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
| return parser.parseJsonStringToDbStruct(object.toString(), arrowMetadata); | ||
| } | ||
|
|
||
| private static AbstractDatabricksGeospatial convertToGeospatial( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing duplicate code, using the same logic present in GeospatialConverter
| long rowSize = executionResult.getRowCount(); | ||
| List<String> arrowMetadata = null; | ||
| if (executionResult instanceof ArrowStreamResult) { | ||
| arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
((ArrowStreamResult) executionResult).getArrowMetadata() will always be null for cloud fetch mode at this point in the code flow. So getting arrow metadata from TGetResultSetMetadataResp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will getResultSetMetadata always be present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should have TGetResultSetMetadataResp object in TFetchResultsResp. Anyways we are handling this case in getArrowMetadata method to check for null values
| * @return true if the type name starts with ARRAY, MAP, STRUCT, GEOMETRY, or GEOGRAPHY, false | ||
| * otherwise | ||
| */ | ||
| private static boolean isComplexType(String typeName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving this method to DatabricksTypeUtil for using in other classes
| return complexDatatypeSupport ? obj : obj.toString(); | ||
| } | ||
|
|
||
| private Object handleComplexDataTypesForSEAInline(Object obj, String columnName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating this method to always return complex datatype, if complex mode is disabled, caller will convert to string
| typeText = "STRING"; | ||
| } | ||
|
|
||
| // store base type eg. DECIMAL instead of DECIMAL(7,2) except for geospatial datatypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For geospatial, we want SRID info (in paranthesis) to be present. eg: GEOMETRY(4326)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be a case that SRID info might be missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, even for SRID 0, we get GEOMETRY(0)
| columnIndex++) { | ||
| TColumnDesc columnDesc = resultManifest.getSchema().getColumns().get(columnIndex); | ||
|
|
||
| ColumnInfo columnInfo = getColumnInfoFromTColumnDesc(columnDesc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TColumnDesc doesn't have complete information about the column metadata, hence enhancing it using the arrowMetadata extracted from the TGetResultSetMetadataResp. This is required for methods which use ColumnInfo later in the flow (thrift + cloudfetch + complex)
| .columnTypeClassName("java.lang.String") | ||
| .columnType(Types.OTHER) | ||
| .columnTypeText(VARIANT); | ||
| } else if (isGeometryColumn(arrowMetadata, columnIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handled with updates to the column info, no separate handling is required
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
|
|
||
| if (arrowMetadata != null && isComplexType(arrowMetadata)) { | ||
| typeText = arrowMetadata; | ||
| if (arrowMetadata.startsWith(GEOMETRY)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to check if Geospatial flag is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an inner layer, so flag check is not required here. we are checking for geospatial flags in DatabricksResultSetMetaData which is the interface through which users get metadata info
| .map(e -> e.get(ARROW_METADATA_KEY)) | ||
| .collect(Collectors.toList()); | ||
| } catch (IOException e) { | ||
| throw new DatabricksSQLException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error log
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
Description
This PR enhances geospatial datatype handling to include SRID (Spatial Reference System Identifier) information in column type names and fixes multiple issues related to complex datatype handling across different result formats.
Key Changes
Geospatial Type Name Enhancement
GEOMETRY(4326)instead ofGEOMETRYSEA Inline Mode Complex Type Fix
EnableComplexDatatypeSupport=trueThrift CloudFetch Metadata Enhancement
INTfromARRAY<INT>) in Thrift CloudFetch modegetColumnInfoFromTColumnDesc()to use Arrow schema metadata alongsideTColumnDescARRAY<INT>) whileTColumnDesconly contains base type (e.g.,ARRAY)Arrow Metadata Extraction
DatabricksThriftUtil.getArrowMetadata()to deserialize Arrow schema fromTGetResultSetMetadataRespDatabricksResultSetconstructor for Thrift CloudFetch modeTesting
Unit Tests
Integration Tests
GeospatialTests.java- Comprehensive E2E integration testAdditional Notes to the Reviewer
Other required details are mentioned in comments in the diff