-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feature](geo) support 3 spatial functions: ST_Distance, ST_GeometryType, ST_Length #60170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…; run clang-format.sh to format code
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements three spatial SQL functions (ST_Distance, ST_GeometryType, ST_Length) to improve compatibility with other database systems like Trino, Presto, and PostgreSQL, as part of the broader effort outlined in issue #48203.
Changes:
- Adds ST_Length function to calculate the length/perimeter of geometry objects (in meters for LineString/Polygon/Circle, 0 for Point)
- Adds ST_GeometryType function to return the geometry type name (e.g., "ST_POINT", "ST_LINESTRING")
- Adds ST_Distance function to calculate the geodesic distance between two geometry objects (in meters)
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| be/src/geo/geo_types.h | Adds virtual method declarations for GeometryType(), Length(), and Distance() to the GeoShape base class and implements them in all derived classes |
| be/src/geo/st_length.cpp | Implements Length() calculations for all geometry types using S2Earth library for geodesic measurements |
| be/src/geo/st_distance.cpp | Implements Distance() calculations between all geometry type combinations with helper functions for point-to-segment, point-to-polyline, and point-to-polygon distances |
| be/src/geo/CMakeLists.txt | Updates build configuration to include the new st_length.cpp and st_distance.cpp source files |
| be/src/vec/functions/functions_geo.cpp | Registers the three new functions (StLength, StGeometryType, StDistance) with the function factory |
| be/test/geo/geo_types_test.cpp | Adds comprehensive C++ unit tests covering all geometry types and type combinations for the new functions |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/StLength.java | Frontend scalar function class for ST_Length with proper signature (VARCHAR -> DOUBLE) |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/StGeometryType.java | Frontend scalar function class for ST_GeometryType with proper signature (VARCHAR -> VARCHAR) |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/StDistance.java | Frontend scalar function class for ST_Distance with proper signature (VARCHAR, VARCHAR -> DOUBLE) |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/visitor/ScalarFunctionVisitor.java | Adds visitor methods for the three new function types |
| fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinScalarFunctions.java | Registers the three new scalar functions in the builtin function catalog |
| regression-test/suites/nereids_p0/sql_functions/spatial_functions/test_gis_function.groovy | Adds extensive regression test cases covering all geometry type combinations for the new functions |
| regression-test/data/nereids_p0/sql_functions/spatial_functions/test_gis_function.out | Expected output for the regression tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
TPC-H: Total hot run time: 31037 ms |
|
oops, I forgot to modify be-ut testcase after changing geometry_type |
TPC-DS: Total hot run time: 172396 ms |
ClickBench: Total hot run time: 26.78 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 31328 ms |
TPC-DS: Total hot run time: 172939 ms |
ClickBench: Total hot run time: 26.8 s |
FE UT Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
| qt_sql "SELECT ST_ANGLE_SPHERE(116.35620117, 39.939093, 116.4274406433, 39.9020987219);" | ||
| qt_sql "SELECT ST_ANGLE_SPHERE(0, 0, 45, 0);" | ||
|
|
||
| // ST_Length tests for all geometry types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need more testcases of data in tables.
| const size_t size) { | ||
| std::vector<std::unique_ptr<GeoShape>> shapes(2); | ||
| for (int row = 0; row < size; ++row) { | ||
| auto lhs_value = left_column->get_data_at(row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all get_data_at here may be virtual function. you can assert_cast column ptrs and pass with concrete types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, shall I use assert_cast() or check_and_get_column()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you know what the type this column should be, use assert_cast.
be/src/geo/st_length.cpp
Outdated
| double GeoPolygon::Length() const { | ||
| // GeoPolygon is always valid with at least one loop (guaranteed by constructor) | ||
| double perimeter = 0.0; | ||
| const S2Loop* outer_loop = _polygon->loop(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only consider loop0? what if compute with others within holes?
be/src/geo/st_distance.cpp
Outdated
| return S2Earth::GetDistanceMeters(point_ll, start_ll); | ||
| } | ||
|
|
||
| double t = ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's this a planar projection? I think lat and lng have no absolute relation with distance. so maybe directly use S2Earth's function?
be/src/geo/st_distance.cpp
Outdated
| const GeoPoint* point = static_cast<const GeoPoint*>(rhs); | ||
| return distance_point_to_polyline(*point->point(), _polyline.get()); | ||
| } | ||
| case GEO_SHAPE_LINE_STRING: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if line crosses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plygon-polygon and circle-line seem to have this issue as well
| ColumnFloat64::MutablePtr& res, NullMap& null_map, int row) { | ||
| StringRef* strs[2] = {&lhs_value, &rhs_value}; | ||
| for (int i = 0; i < 2; ++i) { | ||
| std::unique_ptr<GeoShape> shape(GeoShape::from_encoded(strs[i]->data, strs[i]->size)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we decode const column outside the loop to avoid unnecessary repeated calculations?
…ction short-circuits; include all polygon loops in length; refactor vectorized StDistance and move implementations into geo_types.cpp;
|
btw, @zxc20041 if you'd like to easier communicate, you can add my wechat torch-wood_ |
| return -1.0; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now lines and polygons will consider Intersect first
|
|
||
| return total_length; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
polygon will sum all loops
| qt_sql "SELECT ST_AsText(ST_GeomFromWKB(ST_AsBinary(ST_GeometryFromText(\"LINESTRING (1 1, 2 2)\"))));" | ||
| qt_sql "SELECT ST_AsText(ST_GeomFromWKB(ST_AsBinary(ST_Polygon(\"POLYGON ((114.104486 22.547119,114.093758 22.547753,114.096504 22.532057,114.104229 22.539826,114.106203 22.542680,114.104486 22.547119))\"))));" | ||
|
|
||
| // table-driven tests for ST_Length/ST_GeometryType/ST_Distance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a table to test 3 functions
|
run buildall |
TPC-H: Total hot run time: 32889 ms |
ClickBench: Total hot run time: 28.73 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 32826 ms |
ClickBench: Total hot run time: 27.94 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: close #48203
Related PR: #48695
Problem Summary:
Support for ST_Distance, ST_GeometryType, ST_Length sql functions.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)