[feature](geo) support 3 spatial functions: ST_Distance, ST_GeometryType, ST_Length#60170
[feature](geo) support 3 spatial functions: ST_Distance, ST_GeometryType, ST_Length#60170zxc20041 wants to merge 15 commits intoapache:masterfrom
Conversation
…; run clang-format.sh to format code
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR implements three spatial SQL functions (ST_Distance, ST_GeometryType, ST_Length) to improve compatibility with other database systems like Trino, Presto, and PostgreSQL, as part of the broader effort outlined in issue #48203.
Changes:
- Adds ST_Length function to calculate the length/perimeter of geometry objects (in meters for LineString/Polygon/Circle, 0 for Point)
- Adds ST_GeometryType function to return the geometry type name (e.g., "ST_POINT", "ST_LINESTRING")
- Adds ST_Distance function to calculate the geodesic distance between two geometry objects (in meters)
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| be/src/geo/geo_types.h | Adds virtual method declarations for GeometryType(), Length(), and Distance() to the GeoShape base class and implements them in all derived classes |
| be/src/geo/st_length.cpp | Implements Length() calculations for all geometry types using S2Earth library for geodesic measurements |
| be/src/geo/st_distance.cpp | Implements Distance() calculations between all geometry type combinations with helper functions for point-to-segment, point-to-polyline, and point-to-polygon distances |
| be/src/geo/CMakeLists.txt | Updates build configuration to include the new st_length.cpp and st_distance.cpp source files |
| be/src/vec/functions/functions_geo.cpp | Registers the three new functions (StLength, StGeometryType, StDistance) with the function factory |
| be/test/geo/geo_types_test.cpp | Adds comprehensive C++ unit tests covering all geometry types and type combinations for the new functions |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/StLength.java | Frontend scalar function class for ST_Length with proper signature (VARCHAR -> DOUBLE) |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/StGeometryType.java | Frontend scalar function class for ST_GeometryType with proper signature (VARCHAR -> VARCHAR) |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/StDistance.java | Frontend scalar function class for ST_Distance with proper signature (VARCHAR, VARCHAR -> DOUBLE) |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/visitor/ScalarFunctionVisitor.java | Adds visitor methods for the three new function types |
| fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinScalarFunctions.java | Registers the three new scalar functions in the builtin function catalog |
| regression-test/suites/nereids_p0/sql_functions/spatial_functions/test_gis_function.groovy | Adds extensive regression test cases covering all geometry type combinations for the new functions |
| regression-test/data/nereids_p0/sql_functions/spatial_functions/test_gis_function.out | Expected output for the regression tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
TPC-H: Total hot run time: 31037 ms |
|
oops, I forgot to modify be-ut testcase after changing geometry_type |
TPC-DS: Total hot run time: 172396 ms |
ClickBench: Total hot run time: 26.78 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 31328 ms |
TPC-DS: Total hot run time: 172939 ms |
ClickBench: Total hot run time: 26.8 s |
FE UT Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 32889 ms |
ClickBench: Total hot run time: 28.73 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 32826 ms |
ClickBench: Total hot run time: 27.94 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
be/src/geo/geo_types.cpp
Outdated
| if (min_distance <= circle_radius + TOLERANCE) { | ||
| return 0.0; | ||
| } | ||
| return std::max(0.0, min_distance - circle_radius); |
There was a problem hiding this comment.
Why std::max?
And I think we should also use TOLERANCE in other DISTANCE implementation.
| return 2.0 * M_PI * radius_meters; | ||
| } | ||
|
|
||
| double GeoPoint::Distance(const GeoShape* rhs) const { |
There was a problem hiding this comment.
Can the following types directly reuse the implementation of Point's Distance, rather than implementing the same logic again?
ditto for other types.
There was a problem hiding this comment.
reuse and remove duplicated implementations
| return; | ||
| } | ||
| } | ||
| res->insert_value(shapes[0]->Distance(shapes[1].get())); |
There was a problem hiding this comment.
I see that the Distance function may return -1 (not now though), here we should add a check for the error return value and it's best to marked as [[unlikely]].
be/src/geo/geo_types.cpp
Outdated
| case GEO_SHAPE_POLYGON: { | ||
| const GeoPolygon* other = static_cast<const GeoPolygon*>(rhs); | ||
| if (_polygon->Intersects(*other->polygon()) || | ||
| polygon_touch_polygon(_polygon.get(), other->polygon())) { |
There was a problem hiding this comment.
I think is unnecessary to do the polygon_touch_polygon check, it checks by calculating the distance, which is the quit same as the following logic.
There was a problem hiding this comment.
remove this polygon_touch_polygon check
|
|
||
| static void const_vector(const ColumnPtr& left_column, const ColumnPtr& right_column, | ||
| ColumnFloat64::MutablePtr& res, NullMap& null_map, const size_t size) { | ||
| auto lhs_value = left_column->get_data_at(0); |
There was a problem hiding this comment.
cast to ColumnString like you did in vector_vector
| static bool decode_shape(const StringRef& value, std::unique_ptr<GeoShape>& shape) { | ||
| shape = GeoShape::from_encoded(value.data, value.size); | ||
| return static_cast<bool>(shape); | ||
| } | ||
|
|
||
| static void loop_do(StringRef& lhs_value, StringRef& rhs_value, | ||
| std::vector<std::unique_ptr<GeoShape>>& shapes, | ||
| ColumnFloat64::MutablePtr& res, NullMap& null_map, int row) { |
|
run buildall |
TPC-H: Total hot run time: 31658 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
ClickBench: Total hot run time: 28.5 s |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run nonConcurrent |
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
Issue Number: close #48203
Related PR: #48695
Problem Summary:
Support for ST_Distance, ST_GeometryType, ST_Length sql functions.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)