
Conversation

@sreekanth-db (Collaborator) commented Dec 22, 2025

Description

This PR enhances geospatial datatype handling to include SRID (Spatial Reference System Identifier) information in column type names and fixes multiple issues related to complex datatype handling across different result formats.

Key Changes

  1. Geospatial Type Name Enhancement

    • Column type names now include SRID: GEOMETRY(4326) instead of GEOMETRY
    • Applies to both GEOMETRY and GEOGRAPHY types
    • Preserves full type information in metadata for better type identification (see the JDBC metadata sketch after this list)
  2. SEA Inline Mode Complex Type Fix

    • Fixed issue where complex types (ARRAY, MAP, STRUCT) were not returned as complex objects in SEA Inline mode (JSON array result format)
    • Now properly converts to complex datatype objects when EnableComplexDatatypeSupport=true
  3. Thrift CloudFetch Metadata Enhancement

    • Fixed error when extracting type details (e.g., INT from ARRAY<INT>) in Thrift CloudFetch mode
    • Enhanced getColumnInfoFromTColumnDesc() to use Arrow schema metadata alongside TColumnDesc
    • Arrow schema provides complete type information (e.g., ARRAY<INT>) while TColumnDesc only contains base type (e.g., ARRAY)
  4. Arrow Metadata Extraction

    • Added DatabricksThriftUtil.getArrowMetadata() to deserialize Arrow schema from TGetResultSetMetadataResp
    • Fixed null arrow metadata issue in DatabricksResultSet constructor for Thrift CloudFetch mode
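
As a concrete illustration of change 1, here is a minimal sketch of how the SRID-qualified type name would surface through standard JDBC metadata. The connection URL, table, and column names are placeholders, not values from this PR.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class GeospatialTypeNameSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder URL; use your real Databricks JDBC connection string.
    String url = "jdbc:databricks://<host>:443/default";
    try (Connection conn = DriverManager.getConnection(url);
        Statement stmt = conn.createStatement();
        // Hypothetical table with a GEOMETRY column defined with SRID 4326.
        ResultSet rs = stmt.executeQuery("SELECT geom FROM my_geo_table LIMIT 1")) {
      ResultSetMetaData md = rs.getMetaData();
      // With this PR, the type name carries the SRID, e.g. "GEOMETRY(4326)"
      // instead of just "GEOMETRY"; the same applies to GEOGRAPHY columns.
      System.out.println(md.getColumnTypeName(1));
    }
  }
}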

Testing

Unit Tests

  • All existing unit tests pass, and additional tests were added for the new methods

Integration Tests

  • GeospatialTests.java - Comprehensive E2E integration test
    • Tests geospatial types (GEOMETRY and GEOGRAPHY)
    • Validates 24 configuration combinations (enumerated in the sketch after this list):
      • Protocol: Thrift / SEA
      • Serialization: Arrow / Inline
      • CloudFetch: Enabled / Disabled (only with Arrow, as CloudFetch requires Arrow)
      • Geospatial Support: Enabled / Disabled
      • Complex Type Support: Enabled / Disabled
    • Validates metadata: column types, type names, class names
    • Validates values: WKT representation, SRID
    • Validates behavior when geospatial objects are enabled vs. disabled (STRING fallback)
    • All 24 tests pass
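
For reference, a rough sketch of how the 24 combinations enumerate; the labels below are descriptive, not the driver's actual connection-property names.

import java.util.ArrayList;
import java.util.List;

// Rough enumeration of the tested matrix; labels are descriptive only.
public class GeospatialTestMatrixSketch {
  public static void main(String[] args) {
    List<String> combos = new ArrayList<>();
    for (String protocol : new String[] {"THRIFT", "SEA"}) {
      for (String serialization : new String[] {"ARROW", "INLINE"}) {
        // CloudFetch requires Arrow, so Inline only runs with CloudFetch disabled.
        boolean[] cloudFetch =
            "ARROW".equals(serialization) ? new boolean[] {true, false} : new boolean[] {false};
        for (boolean cf : cloudFetch) {
          for (boolean geo : new boolean[] {true, false}) {
            for (boolean complex : new boolean[] {true, false}) {
              combos.add(protocol + "/" + serialization + "/cloudFetch=" + cf
                  + "/geo=" + geo + "/complex=" + complex);
            }
          }
        }
      }
    }
    System.out.println(combos.size()); // prints 24
  }
}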

Additional Notes to the Reviewer

Other required details are mentioned in comments in the diff

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
return parser.parseJsonStringToDbStruct(object.toString(), arrowMetadata);
}

private static AbstractDatabricksGeospatial convertToGeospatial(
sreekanth-db (Collaborator, Author):
Removing duplicate code; using the same logic already present in GeospatialConverter.

long rowSize = executionResult.getRowCount();
List<String> arrowMetadata = null;
if (executionResult instanceof ArrowStreamResult) {
arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata();
sreekanth-db (Collaborator, Author) commented Dec 22, 2025:
((ArrowStreamResult) executionResult).getArrowMetadata() will always be null for CloudFetch mode at this point in the code flow, so we get the arrow metadata from TGetResultSetMetadataResp instead.

Collaborator:
Will getResultSetMetadata always be present?

sreekanth-db (Collaborator, Author):
Ideally the TGetResultSetMetadataResp object should be present in TFetchResultsResp. In any case, this is handled in the getArrowMetadata method, which checks for null values.
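
A minimal sketch of the fallback described in this thread; the variable holding the Thrift response and the null handling inside getArrowMetadata are assumptions about shape, not the exact implementation.

// Sketch only: in Thrift CloudFetch mode the in-memory result has no arrow
// metadata yet, so fall back to the TGetResultSetMetadataResp if one exists.
List<String> arrowMetadata = null;
if (executionResult instanceof ArrowStreamResult) {
  arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata();
}
if (arrowMetadata == null && resultSetMetadataResp != null) { // hypothetical field name
  // getArrowMetadata is expected to return null (rather than throw) when the
  // response carries no arrow schema.
  arrowMetadata = DatabricksThriftUtil.getArrowMetadata(resultSetMetadataResp);
}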

* @return true if the type name starts with ARRAY, MAP, STRUCT, GEOMETRY, or GEOGRAPHY, false
* otherwise
*/
private static boolean isComplexType(String typeName) {
sreekanth-db (Collaborator, Author):
Moving this method to DatabricksTypeUtil so it can be used in other classes.
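
Based on the javadoc quoted above, the relocated helper presumably looks roughly like this sketch; the prefix constants and exact casing are assumptions.

// Sketch of the relocated DatabricksTypeUtil helper; prefix handling is assumed.
public static boolean isComplexType(String typeName) {
  if (typeName == null) {
    return false;
  }
  String upper = typeName.toUpperCase();
  return upper.startsWith("ARRAY")
      || upper.startsWith("MAP")
      || upper.startsWith("STRUCT")
      || upper.startsWith("GEOMETRY")
      || upper.startsWith("GEOGRAPHY");
}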

return complexDatatypeSupport ? obj : obj.toString();
}

private Object handleComplexDataTypesForSEAInline(Object obj, String columnName)
sreekanth-db (Collaborator, Author):
Updating this method to always return a complex datatype object; if complex mode is disabled, the caller converts it to a string.
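
A sketch of the split of responsibilities described here, pieced together from the fragments quoted in this PR; the type lookup and parser signatures are assumptions.

// Sketch only: the inline handler always builds the complex object; the caller
// downgrades it to a string when complex type support is disabled.
private Object handleComplexDataTypesForSEAInline(Object obj, String columnName)
    throws DatabricksSQLException {
  String typeName = getColumnTypeName(columnName); // hypothetical lookup
  if (typeName.startsWith("STRUCT")) {
    return parser.parseJsonStringToDbStruct(obj.toString(), arrowMetadata);
  }
  // ARRAY and MAP are handled analogously.
  return obj;
}

// At the call site (as quoted above):
// return complexDatatypeSupport ? obj : obj.toString();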

typeText = "STRING";
}

// store base type eg. DECIMAL instead of DECIMAL(7,2) except for geospatial datatypes
sreekanth-db (Collaborator, Author):
For geospatial types, we want the SRID info (in parentheses) to be present, e.g. GEOMETRY(4326).

Collaborator:
Will there be a case where the SRID info might be missing?

sreekanth-db (Collaborator, Author):
I don't think so, even for SRID 0, we get GEOMETRY(0)
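
A sketch of the rule discussed in this thread; the method name is hypothetical, but the behavior follows the comment in the diff: strip the parenthesized part for most types, keep it for geospatial types because it carries the SRID.

// Hypothetical helper name; illustrates the base-type rule described above.
static String normalizeTypeText(String typeText) {
  if (typeText.startsWith("GEOMETRY") || typeText.startsWith("GEOGRAPHY")) {
    return typeText; // keep the SRID, e.g. GEOMETRY(4326) or GEOMETRY(0)
  }
  int paren = typeText.indexOf('(');
  return paren >= 0 ? typeText.substring(0, paren) : typeText; // DECIMAL(7,2) -> DECIMAL
}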

columnIndex++) {
TColumnDesc columnDesc = resultManifest.getSchema().getColumns().get(columnIndex);

ColumnInfo columnInfo = getColumnInfoFromTColumnDesc(columnDesc);
sreekanth-db (Collaborator, Author) commented Dec 22, 2025:
TColumnDesc doesn't have complete information about the column metadata, so we enhance it using the arrowMetadata extracted from TGetResultSetMetadataResp. This is required for methods that use ColumnInfo later in the flow (Thrift + CloudFetch + complex types).
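
A sketch of the enhancement described in this comment; the setter on ColumnInfo and the indexing into arrowMetadata are assumptions. The idea is that TColumnDesc only carries the base type (e.g. ARRAY), while the Arrow schema metadata carries the full type text (e.g. ARRAY<INT>).

// Sketch only: enrich the TColumnDesc-derived ColumnInfo with the full type
// text from the Arrow schema metadata when it is available.
ColumnInfo columnInfo = getColumnInfoFromTColumnDesc(columnDesc);
if (arrowMetadata != null && columnIndex < arrowMetadata.size()) {
  String fullTypeText = arrowMetadata.get(columnIndex); // e.g. "ARRAY<INT>"
  if (fullTypeText != null && DatabricksTypeUtil.isComplexType(fullTypeText)) {
    columnInfo.setTypeText(fullTypeText); // hypothetical setter name
  }
}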

.columnTypeClassName("java.lang.String")
.columnType(Types.OTHER)
.columnTypeText(VARIANT);
} else if (isGeometryColumn(arrowMetadata, columnIndex)
sreekanth-db (Collaborator, Author):
This is handled by the updates to the column info; no separate handling is required.

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db sreekanth-db changed the title geospatial column type name with srid SRID in geospatial column type name Dec 22, 2025
@sreekanth-db sreekanth-db changed the title SRID in geospatial column type name [PECOBLR-1377] SRID in geospatial column type name Dec 30, 2025

if (arrowMetadata != null && isComplexType(arrowMetadata)) {
typeText = arrowMetadata;
if (arrowMetadata.startsWith(GEOMETRY)) {
Collaborator:
Do we want to check whether the geospatial flag is enabled?

sreekanth-db (Collaborator, Author):
This is an inner layer, so a flag check is not required here. We check the geospatial flags in DatabricksResultSetMetaData, which is the interface through which users get metadata info.

.map(e -> e.get(ARROW_METADATA_KEY))
.collect(Collectors.toList());
} catch (IOException e) {
throw new DatabricksSQLException(
Collaborator:
add logging

sreekanth-db (Collaborator, Author):
added error log
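
For illustration, a sketch of the error logging mentioned here; the logger call is SLF4J-style and the exception constructor arguments are illustrative, not necessarily the driver's exact API.

} catch (IOException e) {
  // Log before wrapping so the original stack trace is captured even if the
  // SQLException is handled further up the stack.
  LOGGER.error("Failed to deserialize Arrow schema from TGetResultSetMetadataResp", e);
  throw new DatabricksSQLException("Failed to extract Arrow metadata", e);
}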

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>