diff --git a/GEMINI.md b/GEMINI.md new file mode 100644 index 0000000000..8c37580c7f --- /dev/null +++ b/GEMINI.md @@ -0,0 +1,145 @@ +# Contribution guidelines, tailored for LLM agents + +## Testing + +We use `nox` to instrument our tests. + +- To test your changes, run unit tests with `nox`: + + ```bash + nox -r -s unit + ``` + +- To run a single unit test: + + ```bash + nox -r -s unit-3.13 -- -k + ``` + +- To run system tests, you can execute:: + + # Run all system tests + $ nox -r -s system + + # Run a single system test + $ nox -r -s system-3.13 -- -k + +- The codebase must have better coverage than it had previously after each + change. You can test coverage via `nox -s unit system cover` (takes a long + time). + +## Code Style + +- We use the automatic code formatter `black`. You can run it using + the nox session `format`. This will eliminate many lint errors. Run via: + + ```bash + nox -r -s format + ``` + +- PEP8 compliance is required, with exceptions defined in the linter configuration. + If you have ``nox`` installed, you can test that you have not introduced + any non-compliant code via: + + ``` + nox -r -s lint + ``` + +## Documentation + +If a method or property is implementing the same interface as a third-party +package such as pandas or scikit-learn, place the relevant docstring in the +corresponding `third_party/bigframes_vendored/package_name` directory, not in +the `bigframes` directory. Implementations may be placed in the `bigframes` +directory, though. + +### Testing code samples + +Code samples are very important for accurate documentation. We use the "doctest" +framework to ensure the samples are functioning as expected. After adding a code +sample, please ensure it is correct by running doctest. 
To run the samples +doctests for just a single method, refer to the following example: + +```bash +pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut +``` + +## Tips for implementing common BigFrames features + +### Adding a scalar operator + +For an example, see commit +[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425). + +To add a new scalar operator, follow these steps: + +1. **Define the operation dataclass:** + - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one. + - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary + operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp` + for ternary operators, or `base_ops.NaryOp` for operators with many + arguments. Note that these base classes count the number of column-like + arguments. A function that takes only a single column but several literal + values would still be a `UnaryOp`. + - Define the `name` of the operation and any parameters it requires. + - Implement the `output_type` method to specify the data type of the result. + +2. **Export the new operation:** + - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list. + +3. **Implement the user-facing function (pandas-like):** + + - Identify the canonical function from pandas / geopandas / awkward array / + other popular Python package that this operator implements. + - Find the corresponding class in BigFrames. For example, the implementation + for most geopandas.GeoSeries methods is in + `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented + in `bigframes/series.py` or one of the accessors, such as `StringMethods` + in `bigframes/operations/strings.py`. + - Create the user-facing function that will be called by users (e.g., `length`). 
+ - If the SQL method differs from pandas or geopandas in a way that can't be + made the same, raise a `NotImplementedError` with an appropriate message and + link to the feedback form. + - Add the docstring to the corresponding file in + `third_party/bigframes_vendored`, modeled after pandas / geopandas. + +4. **Implement the user-facing function (SQL-like):** + + - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one. + - Create the user-facing function that will be called by users (e.g., `st_length`). + - This function should take a `Series` for any column-like inputs, plus any other parameters. + - Inside the function, call `series._apply_unary_op`, + `series._apply_binary_op`, or similar passing the operation dataclass you + created. + - Add a comprehensive docstring with examples. + - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list. + +5. **Implement the compilation logic:** + - In `bigframes/core/compile/scalar_op_compiler.py`: + - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method. + - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature. + - Create a new compiler implementation function (e.g., `geo_length_op_impl`). + - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`. + - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression. + +6. **Add Tests:** + - Add system tests in the `tests/system/` directory to verify the end-to-end + functionality of the new operator. Test various inputs, including edge cases + and `NULL` values. + + Where possible, run the same test code against pandas or GeoPandas and + compare that the outputs are the same (except for dtypes if BigFrames + differs from pandas). 
+ - If you are overriding a pandas or GeoPandas property, add a unit test to + ensure the correct behavior (e.g., raising `NotImplementedError` if the + functionality is not supported). + + +## Constraints + +- Only add git commits. Do not change git history. +- Follow the spec file for development. + - Check off items in the "Acceptance + Criteria" and "Detailed Steps" sections with `[x]`. + - Please do this as they are completed. + - Refer back to the spec after each step. diff --git a/bigframes/bigquery/__init__.py b/bigframes/bigquery/__init__.py index 7ca7fb693b..dbaea57005 100644 --- a/bigframes/bigquery/__init__.py +++ b/bigframes/bigquery/__init__.py @@ -29,6 +29,9 @@ ) from bigframes.bigquery._operations.geo import ( st_area, + st_buffer, + st_centroid, + st_convexhull, st_difference, st_distance, st_intersection, @@ -54,11 +57,18 @@ # approximate aggregate ops "approx_top_count", # array ops - "array_length", "array_agg", + "array_length", "array_to_string", + # datetime ops + "unix_micros", + "unix_millis", + "unix_seconds", # geo ops "st_area", + "st_buffer", + "st_centroid", + "st_convexhull", "st_difference", "st_distance", "st_intersection", @@ -81,8 +91,4 @@ "sql_scalar", # struct ops "struct", - # datetime ops - "unix_micros", - "unix_millis", - "unix_seconds", ] diff --git a/bigframes/bigquery/_operations/geo.py b/bigframes/bigquery/_operations/geo.py index bdc85eed9f..9a92a8960d 100644 --- a/bigframes/bigquery/_operations/geo.py +++ b/bigframes/bigquery/_operations/geo.py @@ -103,6 +103,187 @@ def st_area( return series +def st_buffer( + series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries], + buffer_radius: float, + num_seg_quarter_circle: float = 8.0, + use_spheroid: bool = False, +) -> bigframes.series.Series: + """ + Computes a `GEOGRAPHY` that represents all points whose distance from the + input `GEOGRAPHY` is less than or equal to `buffer_radius` meters. + + .. 
note:: + BigQuery's Geography functions, like `st_buffer`, interpret the geometry + data type as a point set on the Earth's surface. A point set is a set + of points, lines, and polygons on the WGS84 reference spheroid, with + geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data + + **Examples:** + + >>> import bigframes.geopandas + >>> import bigframes.pandas as bpd + >>> import bigframes.bigquery as bbq + >>> from shapely.geometry import Point + >>> bpd.options.display.progress_bar = None + + >>> series = bigframes.geopandas.GeoSeries( + ... [ + ... Point(0, 0), + ... Point(1, 1), + ... ] + ... ) + >>> series + 0 POINT (0 0) + 1 POINT (1 1) + dtype: geometry + + >>> buffer = bbq.st_buffer(series, 100) + >>> bbq.st_area(buffer) > 0 + 0 True + 1 True + dtype: boolean + + Args: + series (bigframes.pandas.Series | bigframes.geopandas.GeoSeries): + A series containing geography objects. + buffer_radius (float): + The distance in meters. + num_seg_quarter_circle (float, optional): + Specifies the number of segments that are used to approximate a + quarter circle. The default value is 8.0. + use_spheroid (bool, optional): + Determines how this function measures distance. If use_spheroid is + FALSE, the function measures distance on the surface of a perfect + sphere. The use_spheroid parameter currently only supports the + value FALSE. The default value of use_spheroid is FALSE. + + Returns: + bigframes.pandas.Series: + A series of geography objects representing the buffered geometries. + """ + op = ops.GeoStBufferOp( + buffer_radius=buffer_radius, + num_seg_quarter_circle=num_seg_quarter_circle, + use_spheroid=use_spheroid, + ) + series = series._apply_unary_op(op) + series.name = None + return series + + +def st_centroid( + series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries], +) -> bigframes.series.Series: + """ + Computes the geometric centroid of a `GEOGRAPHY` type. 
+ + For `POINT` and `MULTIPOINT` types, this is the arithmetic mean of the + input coordinates. For `LINESTRING` and `POLYGON` types, this is the + center of mass. For `GEOMETRYCOLLECTION` types, this is the center of + mass of the collection's elements. + + .. note:: + BigQuery's Geography functions, like `st_centroid`, interpret the geometry + data type as a point set on the Earth's surface. A point set is a set + of points, lines, and polygons on the WGS84 reference spheroid, with + geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data + + **Examples:** + + >>> import bigframes.geopandas + >>> import bigframes.pandas as bpd + >>> import bigframes.bigquery as bbq + >>> from shapely.geometry import Polygon, LineString, Point + >>> bpd.options.display.progress_bar = None + + >>> series = bigframes.geopandas.GeoSeries( + ... [ + ... Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]), + ... LineString([(0, 0), (1, 1), (0, 1)]), + ... Point(0, 1), + ... ] + ... ) + >>> series + 0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0)) + 1 LINESTRING (0 0, 1 1, 0 1) + 2 POINT (0 1) + dtype: geometry + + >>> bbq.st_centroid(series) + 0 POINT (0.03333 0.06667) + 1 POINT (0.49998 0.70712) + 2 POINT (0 1) + dtype: geometry + + Args: + series (bigframes.pandas.Series | bigframes.geopandas.GeoSeries): + A series containing geography objects. + + Returns: + bigframes.pandas.Series: + A series of geography objects representing the centroids. + """ + series = series._apply_unary_op(ops.geo_st_centroid_op) + series.name = None + return series + + +def st_convexhull( + series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries], +) -> bigframes.series.Series: + """ + Computes the convex hull of a `GEOGRAPHY` type. + + The convex hull is the smallest convex set that contains all of the + points in the input `GEOGRAPHY`. + + .. note:: + BigQuery's Geography functions, like `st_convexhull`, interpret the geometry + data type as a point set on the Earth's surface. 
A point set is a set + of points, lines, and polygons on the WGS84 reference spheroid, with + geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data + + **Examples:** + + >>> import bigframes.geopandas + >>> import bigframes.pandas as bpd + >>> import bigframes.bigquery as bbq + >>> from shapely.geometry import Polygon, LineString, Point + >>> bpd.options.display.progress_bar = None + + >>> series = bigframes.geopandas.GeoSeries( + ... [ + ... Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]), + ... LineString([(0, 0), (1, 1), (0, 1)]), + ... Point(0, 1), + ... ] + ... ) + >>> series + 0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0)) + 1 LINESTRING (0 0, 1 1, 0 1) + 2 POINT (0 1) + dtype: geometry + + >>> bbq.st_convexhull(series) + 0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0)) + 1 POLYGON ((0 0, 1 1, 0 1, 0 0)) + 2 POINT (0 1) + dtype: geometry + + Args: + series (bigframes.pandas.Series | bigframes.geopandas.GeoSeries): + A series containing geography objects. + + Returns: + bigframes.pandas.Series: + A series of geography objects representing the convex hulls. 
+ """ + series = series._apply_unary_op(ops.geo_st_convexhull_op) + series.name = None + return series + + def st_difference( series: Union[bigframes.series.Series, bigframes.geopandas.GeoSeries], other: Union[ diff --git a/bigframes/core/compile/scalar_op_compiler.py b/bigframes/core/compile/scalar_op_compiler.py index 95517ead35..c3430120cf 100644 --- a/bigframes/core/compile/scalar_op_compiler.py +++ b/bigframes/core/compile/scalar_op_compiler.py @@ -1038,6 +1038,26 @@ def geo_st_boundary_op_impl(x: ibis_types.Value): return st_boundary(x) +@scalar_op_compiler.register_unary_op(ops.GeoStBufferOp, pass_op=True) +def geo_st_buffer_op_impl(x: ibis_types.Value, op: ops.GeoStBufferOp): + return st_buffer( + x, + op.buffer_radius, + op.num_seg_quarter_circle, + op.use_spheroid, + ) + + +@scalar_op_compiler.register_unary_op(ops.geo_st_centroid_op, pass_op=False) +def geo_st_centroid_op_impl(x: ibis_types.Value): + return typing.cast(ibis_types.GeoSpatialValue, x).centroid() + + +@scalar_op_compiler.register_unary_op(ops.geo_st_convexhull_op, pass_op=False) +def geo_st_convexhull_op_impl(x: ibis_types.Value): + return st_convexhull(x) + + @scalar_op_compiler.register_binary_op(ops.geo_st_difference_op, pass_op=False) def geo_st_difference_op_impl(x: ibis_types.Value, y: ibis_types.Value): return typing.cast(ibis_types.GeoSpatialValue, x).difference( @@ -2116,6 +2136,12 @@ def _ibis_num(number: float): return typing.cast(ibis_types.NumericValue, ibis_types.literal(number)) +@ibis_udf.scalar.builtin +def st_convexhull(x: ibis_dtypes.geography) -> ibis_dtypes.geography: # type: ignore + """ST_CONVEXHULL""" + ... 
+ + @ibis_udf.scalar.builtin def st_geogfromtext(a: str) -> ibis_dtypes.geography: # type: ignore """Convert string to geography.""" @@ -2136,6 +2162,16 @@ def st_boundary(a: ibis_dtypes.geography) -> ibis_dtypes.geography: # type: ign """Find the boundary of a geography.""" +@ibis_udf.scalar.builtin +def st_buffer( + geography: ibis_dtypes.geography, # type: ignore + buffer_radius: ibis_dtypes.Float64, + num_seg_quarter_circle: ibis_dtypes.Float64, + use_spheroid: ibis_dtypes.Boolean, +) -> ibis_dtypes.geography: # type: ignore + ... + + @ibis_udf.scalar.builtin def st_distance(a: ibis_dtypes.geography, b: ibis_dtypes.geography, use_spheroid: bool) -> ibis_dtypes.float: # type: ignore """Convert string to geography.""" diff --git a/bigframes/geopandas/geoseries.py b/bigframes/geopandas/geoseries.py index 2999625cda..f3558e4b34 100644 --- a/bigframes/geopandas/geoseries.py +++ b/bigframes/geopandas/geoseries.py @@ -13,13 +13,15 @@ # limitations under the License. from __future__ import annotations +from typing import Optional + import bigframes_vendored.constants as constants import bigframes_vendored.geopandas.geoseries as vendored_geoseries import geopandas.array # type: ignore -import bigframes.geopandas import bigframes.operations as ops import bigframes.series +import bigframes.session class GeoSeries(vendored_geoseries.GeoSeries, bigframes.series.Series): @@ -73,8 +75,14 @@ def is_closed(self) -> bigframes.series.Series: ) @classmethod - def from_wkt(cls, data, index=None) -> GeoSeries: - series = bigframes.series.Series(data, index=index) + def from_wkt( + cls, + data, + index=None, + *, + session: Optional[bigframes.session.Session] = None, + ) -> GeoSeries: + series = bigframes.series.Series(data, index=index, session=session) return cls(series._apply_unary_op(ops.geo_st_geogfromtext_op)) @@ -92,6 +100,19 @@ def to_wkt(self: GeoSeries) -> bigframes.series.Series: series.name = None return series + def buffer(self: GeoSeries, distance: float) -> 
bigframes.series.Series: # type: ignore + raise NotImplementedError( + f"GeoSeries.buffer is not supported. Use bigframes.bigquery.st_buffer(series, distance), instead. {constants.FEEDBACK_LINK}" + ) + + @property + def centroid(self: GeoSeries) -> bigframes.series.Series: # type: ignore + return self._apply_unary_op(ops.geo_st_centroid_op) + + @property + def convex_hull(self: GeoSeries) -> bigframes.series.Series: # type: ignore + return self._apply_unary_op(ops.geo_st_convexhull_op) + def difference(self: GeoSeries, other: GeoSeries) -> bigframes.series.Series: # type: ignore return self._apply_binary_op(other, ops.geo_st_difference_op) diff --git a/bigframes/operations/__init__.py b/bigframes/operations/__init__.py index 86098d47cf..e10a972790 100644 --- a/bigframes/operations/__init__.py +++ b/bigframes/operations/__init__.py @@ -94,6 +94,8 @@ geo_area_op, geo_st_astext_op, geo_st_boundary_op, + geo_st_centroid_op, + geo_st_convexhull_op, geo_st_difference_op, geo_st_geogfromtext_op, geo_st_geogpoint_op, @@ -101,6 +103,7 @@ geo_st_isclosed_op, geo_x_op, geo_y_op, + GeoStBufferOp, GeoStDistanceOp, GeoStLengthOp, ) @@ -386,12 +389,15 @@ # Geo ops "geo_area_op", "geo_st_boundary_op", + "geo_st_centroid_op", + "geo_st_convexhull_op", "geo_st_difference_op", "geo_st_astext_op", "geo_st_geogfromtext_op", "geo_st_geogpoint_op", "geo_st_intersection_op", "geo_st_isclosed_op", + "GeoStBufferOp", "GeoStLengthOp", "geo_x_op", "geo_y_op", diff --git a/bigframes/operations/geo_ops.py b/bigframes/operations/geo_ops.py index 0268c63249..3b7754a47a 100644 --- a/bigframes/operations/geo_ops.py +++ b/bigframes/operations/geo_ops.py @@ -42,6 +42,22 @@ ) geo_st_boundary_op = GeoStBoundaryOp() +GeoStCentroidOp = base_ops.create_unary_op( + name="geo_st_centroid", + type_signature=op_typing.FixedOutputType( + dtypes.is_geo_like, dtypes.GEO_DTYPE, description="geo-like" + ), +) +geo_st_centroid_op = GeoStCentroidOp() + +GeoStConvexhullOp = base_ops.create_unary_op( + 
name="geo_st_convexhull", + type_signature=op_typing.FixedOutputType( + dtypes.is_geo_like, dtypes.GEO_DTYPE, description="geo-like" + ), +) +geo_st_convexhull_op = GeoStConvexhullOp() + GeoStDifferenceOp = base_ops.create_binary_op( name="geo_st_difference", type_signature=op_typing.BinaryGeo() ) @@ -90,6 +106,17 @@ geo_st_intersection_op = GeoStIntersectionOp() +@dataclasses.dataclass(frozen=True) +class GeoStBufferOp(base_ops.UnaryOp): + name = "st_buffer" + buffer_radius: float + num_seg_quarter_circle: float + use_spheroid: bool + + def output_type(self, *input_types: dtypes.ExpressionType) -> dtypes.ExpressionType: + return dtypes.GEO_DTYPE + + @dataclasses.dataclass(frozen=True) class GeoStDistanceOp(base_ops.BinaryOp): name = "st_distance" diff --git a/specs/2025-08-04-geoseries-scalars.md b/specs/2025-08-04-geoseries-scalars.md new file mode 100644 index 0000000000..38dc77c4cf --- /dev/null +++ b/specs/2025-08-04-geoseries-scalars.md @@ -0,0 +1,307 @@ +# Implementing GeoSeries scalar operators + +This project is to implement all GeoSeries scalar properties and methods in the +`bigframes.geopandas.GeoSeries` class. Likewise, all BigQuery GEOGRAPHY +functions should be exposed in the `bigframes.bigquery` module. 
+ +## Background + +*Explain the context and why this change is necessary.* +*Include links to relevant issues or documentation.* + +* https://geopandas.org/en/stable/docs/reference/geoseries.html +* https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions + +## Acceptance Criteria + +*Define the specific, measurable outcomes that indicate the task is complete.* +*Use a checklist format for clarity.* + +### GeoSeries methods and properties + +- [x] Constructor +- [x] GeoSeries.area +- [x] GeoSeries.boundary +- [ ] GeoSeries.bounds +- [ ] GeoSeries.total_bounds +- [x] GeoSeries.length +- [ ] GeoSeries.geom_type +- [ ] GeoSeries.offset_curve +- [x] GeoSeries.distance +- [ ] GeoSeries.hausdorff_distance +- [ ] GeoSeries.frechet_distance +- [ ] GeoSeries.representative_point +- [ ] GeoSeries.exterior +- [ ] GeoSeries.interiors +- [ ] GeoSeries.minimum_bounding_radius +- [ ] GeoSeries.minimum_clearance +- [x] GeoSeries.x +- [x] GeoSeries.y +- [ ] GeoSeries.z +- [ ] GeoSeries.m +- [ ] GeoSeries.get_coordinates +- [ ] GeoSeries.count_coordinates +- [ ] GeoSeries.count_geometries +- [ ] GeoSeries.count_interior_rings +- [ ] GeoSeries.set_precision +- [ ] GeoSeries.get_precision +- [ ] GeoSeries.get_geometry +- [x] GeoSeries.is_closed +- [ ] GeoSeries.is_empty +- [ ] GeoSeries.is_ring +- [ ] GeoSeries.is_simple +- [ ] GeoSeries.is_valid +- [ ] GeoSeries.is_valid_reason +- [ ] GeoSeries.is_valid_coverage +- [ ] GeoSeries.invalid_coverage_edges +- [ ] GeoSeries.has_m +- [ ] GeoSeries.has_z +- [ ] GeoSeries.is_ccw +- [ ] GeoSeries.contains +- [ ] GeoSeries.contains_properly +- [ ] GeoSeries.crosses +- [ ] GeoSeries.disjoint +- [ ] GeoSeries.dwithin +- [ ] GeoSeries.geom_equals +- [ ] GeoSeries.geom_equals_exact +- [ ] GeoSeries.geom_equals_identical +- [ ] GeoSeries.intersects +- [ ] GeoSeries.overlaps +- [ ] GeoSeries.touches +- [ ] GeoSeries.within +- [ ] GeoSeries.covers +- [ ] GeoSeries.covered_by +- [ ] GeoSeries.relate +- [ ] 
GeoSeries.relate_pattern +- [ ] GeoSeries.clip_by_rect +- [x] GeoSeries.difference +- [x] GeoSeries.intersection +- [ ] GeoSeries.symmetric_difference +- [ ] GeoSeries.union +- [x] GeoSeries.boundary +- [x] GeoSeries.buffer +- [x] GeoSeries.centroid +- [ ] GeoSeries.concave_hull +- [x] GeoSeries.convex_hull +- [ ] GeoSeries.envelope +- [ ] GeoSeries.extract_unique_points +- [ ] GeoSeries.force_2d +- [ ] GeoSeries.force_3d +- [ ] GeoSeries.make_valid +- [ ] GeoSeries.minimum_bounding_circle +- [ ] GeoSeries.maximum_inscribed_circle +- [ ] GeoSeries.minimum_clearance +- [ ] GeoSeries.minimum_clearance_line +- [ ] GeoSeries.minimum_rotated_rectangle +- [ ] GeoSeries.normalize +- [ ] GeoSeries.orient_polygons +- [ ] GeoSeries.remove_repeated_points +- [ ] GeoSeries.reverse +- [ ] GeoSeries.sample_points +- [ ] GeoSeries.segmentize +- [ ] GeoSeries.shortest_line +- [ ] GeoSeries.simplify +- [ ] GeoSeries.simplify_coverage +- [ ] GeoSeries.snap +- [ ] GeoSeries.transform +- [ ] GeoSeries.affine_transform +- [ ] GeoSeries.rotate +- [ ] GeoSeries.scale +- [ ] GeoSeries.skew +- [ ] GeoSeries.translate +- [ ] GeoSeries.interpolate +- [ ] GeoSeries.line_merge +- [ ] GeoSeries.project +- [ ] GeoSeries.shared_paths +- [ ] GeoSeries.build_area +- [ ] GeoSeries.constrained_delaunay_triangles +- [ ] GeoSeries.delaunay_triangles +- [ ] GeoSeries.explode +- [ ] GeoSeries.intersection_all +- [ ] GeoSeries.polygonize +- [ ] GeoSeries.union_all +- [ ] GeoSeries.voronoi_polygons +- [ ] GeoSeries.from_arrow +- [ ] GeoSeries.from_file +- [ ] GeoSeries.from_wkb +- [x] GeoSeries.from_wkt +- [x] GeoSeries.from_xy +- [ ] GeoSeries.to_arrow +- [ ] GeoSeries.to_file +- [ ] GeoSeries.to_json +- [ ] GeoSeries.to_wkb +- [x] GeoSeries.to_wkt +- [ ] GeoSeries.crs +- [ ] GeoSeries.set_crs +- [ ] GeoSeries.to_crs +- [ ] GeoSeries.estimate_utm_crs +- [ ] GeoSeries.fillna +- [ ] GeoSeries.isna +- [ ] GeoSeries.notna +- [ ] GeoSeries.clip +- [ ] GeoSeries.plot +- [ ] GeoSeries.explore +- [ ] 
GeoSeries.sindex +- [ ] GeoSeries.has_sindex +- [ ] GeoSeries.cx +- [ ] GeoSeries.__geo_interface__ + +### `bigframes.bigquery` functions + +Constructors: Functions that build new geography values from coordinates or +existing geographies. + +- [x] ST_GEOGPOINT +- [ ] ST_MAKELINE +- [ ] ST_MAKEPOLYGON +- [ ] ST_MAKEPOLYGONORIENTED + +Parsers ST_GEOGFROM: Functions that create geographies from an external format +such as WKT and GeoJSON. + +- [ ] ST_GEOGFROMGEOJSON +- [x] ST_GEOGFROMTEXT +- [ ] ST_GEOGFROMWKB +- [ ] ST_GEOGPOINTFROMGEOHASH + +Formatters: Functions that export geographies to an external format such as WKT. + +- [ ] ST_ASBINARY +- [ ] ST_ASGEOJSON +- [x] ST_ASTEXT +- [ ] ST_GEOHASH + +Transformations: Functions that generate a new geography based on input. + +- [x] ST_BOUNDARY +- [x] ST_BUFFER +- [ ] ST_BUFFERWITHTOLERANCE +- [x] ST_CENTROID +- [ ] ST_CENTROID_AGG (Aggregate) +- [ ] ST_CLOSESTPOINT +- [x] ST_CONVEXHULL +- [x] ST_DIFFERENCE +- [ ] ST_EXTERIORRING +- [ ] ST_INTERIORRINGS +- [x] ST_INTERSECTION +- [ ] ST_LINEINTERPOLATEPOINT +- [ ] ST_LINESUBSTRING +- [ ] ST_SIMPLIFY +- [ ] ST_SNAPTOGRID +- [ ] ST_UNION +- [ ] ST_UNION_AGG (Aggregate) + +Accessors: Functions that provide access to properties of a geography without +side-effects. + +- [ ] ST_DIMENSION +- [ ] ST_DUMP +- [ ] ST_ENDPOINT +- [ ] ST_GEOMETRYTYPE +- [x] ST_ISCLOSED +- [ ] ST_ISCOLLECTION +- [ ] ST_ISEMPTY +- [ ] ST_ISRING +- [ ] ST_NPOINTS +- [ ] ST_NUMGEOMETRIES +- [ ] ST_NUMPOINTS +- [ ] ST_POINTN +- [ ] ST_STARTPOINT +- [x] ST_X +- [x] ST_Y + +Predicates: Functions that return TRUE or FALSE for some spatial relationship +between two geographies or some property of a geography. These functions are +commonly used in filter clauses. 
+ +- [ ] ST_CONTAINS +- [ ] ST_COVEREDBY +- [ ] ST_COVERS +- [ ] ST_DISJOINT +- [ ] ST_DWITHIN +- [ ] ST_EQUALS +- [ ] ST_HAUSDORFFDWITHIN +- [ ] ST_INTERSECTS +- [ ] ST_INTERSECTSBOX +- [ ] ST_TOUCHES +- [ ] ST_WITHIN + +Measures: Functions that compute measurements of one or more geographies. + +- [ ] ST_ANGLE +- [x] ST_AREA +- [ ] ST_AZIMUTH +- [ ] ST_BOUNDINGBOX +- [x] ST_DISTANCE +- [ ] ST_EXTENT (Aggregate) +- [ ] ST_HAUSDORFFDISTANCE +- [ ] ST_LINELOCATEPOINT +- [x] ST_LENGTH +- [ ] ST_MAXDISTANCE +- [ ] ST_PERIMETER + +Clustering: Functions that perform clustering on geographies. + +- [ ] ST_CLUSTERDBSCAN + +S2 functions: Functions for working with S2 cell coverings of GEOGRAPHY. + +- [ ] S2_CELLIDFROMPOINT +- [ ] S2_COVERINGCELLIDS + +Raster functions: Functions for analyzing geospatial rasters using geographies. + +- [ ] ST_REGIONSTATS + +## Detailed Steps + +*Break down the implementation into small, actionable steps.* +*This section will guide the development process.* + +### Implementing a new scalar geography operation + +- [ ] **Define the operation dataclass:** + - [ ] In `bigframes/operations/geo_ops.py`, create a new dataclass inheriting from `base_ops.UnaryOp` or `base_ops.BinaryOp`. + - [ ] Define the `name` of the operation and any parameters it requires. + - [ ] Implement the `output_type` method to specify the data type of the result. +- [ ] **Export the new operation:** + - [ ] In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list. +- [ ] **Implement the compilation logic:** + - [ ] In `bigframes/core/compile/scalar_op_compiler.py`: + - [ ] If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method. + - [ ] If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature. + - [ ] Create a new compiler implementation function (e.g., `geo_length_op_impl`). 
+ - [ ] Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`. +- [ ] **Implement the user-facing function or property:** + - [ ] For a `bigframes.bigquery` function: + - [ ] In `bigframes/bigquery/_operations/geo.py`, create the user-facing function (e.g., `st_length`). + - [ ] The function should take a `Series` and any other parameters. + - [ ] Inside the function, call `series._apply_unary_op` or `series._apply_binary_op`, passing the operation dataclass you created. + - [ ] Add a comprehensive docstring with examples. + - [ ] In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list. + - [ ] For a `GeoSeries` property or method: + - [ ] In `bigframes/geopandas/geoseries.py`, create the property or method. + - [ ] If the operation is not possible to be supported, such as if the + geopandas method returns values in units corresponding to the + coordinate system rather than meters that BigQuery uses, raise a + `NotImplementedError` with a helpful message. + - [ ] Otherwise, call `series._apply_unary_op` or `series._apply_binary_op`, passing the operation dataclass. + - [ ] Add a comprehensive docstring with examples. +- [ ] **Add Tests:** + - [ ] Add system tests in `tests/system/small/bigquery/test_geo.py` or `tests/system/small/geopandas/test_geoseries.py` to verify the end-to-end functionality. Test various inputs, including edge cases and `NULL` values. + - [ ] If you are overriding a pandas or GeoPandas property and raising `NotImplementedError`, add a unit test to ensure the correct error is raised. + +## Verification + +*Specify the commands to run to verify the changes.* + +- [ ] The `nox -r -s format lint lint_setup_py` linter should pass. +- [ ] The `nox -r -s mypy` static type checker should pass. +- [ ] The `nox -r -s docs docfx` docs should successfully build and include relevant docs in the output. 
+- [ ] All new and existing unit tests `pytest tests/unit` should pass. +- [ ] Identify all related system tests in the `tests/system` directories. +- [ ] All related system tests `pytest tests/system/small/path_to_relevant_test.py::test_name` should pass. + +## Constraints + +Follow the guidelines listed in GEMINI.md at the root of the repository. diff --git a/specs/TEMPLATE.md b/specs/TEMPLATE.md new file mode 100644 index 0000000000..0d93035dcc --- /dev/null +++ b/specs/TEMPLATE.md @@ -0,0 +1,47 @@ +# Title of the Specification + +*Provide a brief overview of the feature or bug.* + +## Background + +*Explain the context and why this change is necessary.* +*Include links to relevant issues or documentation.* + +## Acceptance Criteria + +*Define the specific, measurable outcomes that indicate the task is complete.* +*Use a checklist format for clarity.* + +- [ ] Criterion 1 +- [ ] Criterion 2 +- [ ] Criterion 3 + +## Detailed Steps + +*Break down the implementation into small, actionable steps.* +*This section will guide the development process.* + +### 1. Step One + +- [ ] Action 1.1 +- [ ] Action 1.2 + +### 2. Step Two + +- [ ] Action 2.1 +- [ ] Action 2.2 + +## Verification + +*Specify the commands to run to verify the changes.* + +- [ ] The `nox -r -s format lint lint_setup_py` linter should pass. +- [ ] The `nox -r -s mypy` static type checker should pass. +- [ ] The `nox -r -s docs docfx` docs should successfully build and include relevant docs in the output. +- [ ] All new and existing unit tests `pytest tests/unit` should pass. +- [ ] Identify all related system tests in the `tests/system` directories. +- [ ] All related system tests `pytest tests/system/small/path_to_relevant_test.py::test_name` should pass. + +## Constraints + +Follow the guidelines listed in GEMINI.md at the root of the repository. 
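Illustrative note, not part of the patch: the operator workflow that GEMINI.md and the spec describe (define a frozen op dataclass, register a compiler implementation for it, then expose a thin user-facing function) can be sketched in isolation. Everything below is a hypothetical stand-in: `ToyUnaryOp`, `ToyCompiler`, and the `toy_*` functions are invented for this sketch and only mimic the shape of `base_ops.UnaryOp` and `scalar_op_compiler.register_unary_op(..., pass_op=...)`. The sketch emits SQL strings rather than Ibis expressions so that it runs without BigFrames installed.

```python
import dataclasses
from typing import Callable, ClassVar, Dict, Type


@dataclasses.dataclass(frozen=True)
class ToyUnaryOp:
    """Hypothetical base class: one column-like input, any number of literals."""

    name: ClassVar[str] = "toy_unary"


@dataclasses.dataclass(frozen=True)
class ToyBufferOp(ToyUnaryOp):
    """Still unary: the extra parameters are literals, not columns."""

    name: ClassVar[str] = "st_buffer"
    buffer_radius: float
    num_seg_quarter_circle: float = 8.0
    use_spheroid: bool = False


@dataclasses.dataclass(frozen=True)
class ToyCentroidOp(ToyUnaryOp):
    """Parameterless op, so its implementation does not need the op instance."""

    name: ClassVar[str] = "geo_st_centroid"


class ToyCompiler:
    """Hypothetical registry mapping op types to implementation functions."""

    def __init__(self) -> None:
        self._registry: Dict[Type[ToyUnaryOp], Callable[[str, ToyUnaryOp], str]] = {}

    def register_unary_op(self, op_type: Type[ToyUnaryOp], pass_op: bool = False):
        def decorator(impl):
            def wrapped(x: str, op: ToyUnaryOp) -> str:
                # Forward the op only when the implementation asked for it.
                return impl(x, op) if pass_op else impl(x)

            self._registry[op_type] = wrapped
            return impl

        return decorator

    def compile(self, op: ToyUnaryOp, column: str) -> str:
        return self._registry[type(op)](column, op)


toy_compiler = ToyCompiler()


@toy_compiler.register_unary_op(ToyBufferOp, pass_op=True)
def toy_buffer_impl(x: str, op: ToyBufferOp) -> str:
    use_spheroid = "TRUE" if op.use_spheroid else "FALSE"
    return (
        f"ST_BUFFER({x}, {op.buffer_radius}, "
        f"{op.num_seg_quarter_circle}, {use_spheroid})"
    )


@toy_compiler.register_unary_op(ToyCentroidOp)
def toy_centroid_impl(x: str) -> str:
    return f"ST_CENTROID({x})"


def toy_st_buffer(column: str, buffer_radius: float) -> str:
    """User-facing wrapper: build the op, then hand it to the compiler."""
    return toy_compiler.compile(ToyBufferOp(buffer_radius=buffer_radius), column)


print(toy_st_buffer("geo_col", 100.0))
print(toy_compiler.compile(ToyCentroidOp(), "geo_col"))
```

The `pass_op` flag mirrors the distinction visible in this patch: `GeoStBufferOp` carries literal parameters and is registered with `pass_op=True`, while the centroid and convex hull ops carry none and are registered with `pass_op=False`.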
diff --git a/tests/system/small/bigquery/test_geo.py b/tests/system/small/bigquery/test_geo.py index f888fd0364..c89ca59aca 100644 --- a/tests/system/small/bigquery/test_geo.py +++ b/tests/system/small/bigquery/test_geo.py @@ -12,6 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import geopandas # type: ignore import pandas as pd import pandas.testing @@ -29,9 +31,10 @@ from bigframes.bigquery import st_length import bigframes.bigquery as bbq import bigframes.geopandas +import bigframes.session -def test_geo_st_area(): +def test_geo_st_area(session: bigframes.session.Session): data = [ Polygon([(0.000, 0.0), (0.001, 0.001), (0.000, 0.001)]), Polygon([(0.0010, 0.004), (0.009, 0.005), (0.0010, 0.005)]), @@ -41,7 +44,7 @@ def test_geo_st_area(): ] geopd_s = geopandas.GeoSeries(data=data, crs="EPSG:4326") - geobf_s = bigframes.geopandas.GeoSeries(data=data) + geobf_s = bigframes.geopandas.GeoSeries(data=data, session=session) # For `geopd_s`, the data was further projected with `geopandas.GeoSeries.to_crs` # to `to_crs(26393)` to get the area in square meter. 
See: https://geopandas.org/en/stable/docs/user_guide/projections.html @@ -123,7 +126,7 @@ def test_st_length_various_geometries(session): ) # type: ignore -def test_geo_st_difference_with_geometry_objects(): +def test_geo_st_difference_with_geometry_objects(session: bigframes.session.Session): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]), @@ -136,8 +139,8 @@ def test_geo_st_difference_with_geometry_objects(): LineString([(2, 0), (0, 2)]), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) - geobf_s2 = bigframes.geopandas.GeoSeries(data=data2) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) + geobf_s2 = bigframes.geopandas.GeoSeries(data=data2, session=session) geobf_s_result = bbq.st_difference(geobf_s1, geobf_s2).to_pandas() expected = pd.Series( @@ -158,7 +161,9 @@ def test_geo_st_difference_with_geometry_objects(): ) -def test_geo_st_difference_with_single_geometry_object(): +def test_geo_st_difference_with_single_geometry_object( + session: bigframes.session.Session, +): pytest.importorskip( "shapely", minversion="2.0.0", @@ -171,7 +176,7 @@ def test_geo_st_difference_with_single_geometry_object(): Point(0, 1), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) geobf_s_result = bbq.st_difference( geobf_s1, Polygon([(0, 0), (10, 0), (10, 5), (0, 5), (0, 0)]), @@ -195,14 +200,16 @@ def test_geo_st_difference_with_single_geometry_object(): ) -def test_geo_st_difference_with_similar_geometry_objects(): +def test_geo_st_difference_with_similar_geometry_objects( + session: bigframes.session.Session, +): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1)]), Point(0, 1), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) geobf_s_result = bbq.st_difference(geobf_s1, geobf_s1).to_pandas() expected = 
pd.Series( @@ -219,7 +226,7 @@ def test_geo_st_difference_with_similar_geometry_objects(): ) -def test_geo_st_distance_with_geometry_objects(): +def test_geo_st_distance_with_geometry_objects(session: bigframes.session.Session): data1 = [ # 0.00001 is approximately 1 meter. Polygon([(0, 0), (0.00001, 0), (0.00001, 0.00001), (0, 0.00001), (0, 0)]), @@ -252,8 +259,8 @@ def test_geo_st_distance_with_geometry_objects(): ), # No matching row in data1, so this will be NULL after the call to distance. ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) - geobf_s2 = bigframes.geopandas.GeoSeries(data=data2) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) + geobf_s2 = bigframes.geopandas.GeoSeries(data=data2, session=session) geobf_s_result = bbq.st_distance(geobf_s1, geobf_s2).to_pandas() expected = pd.Series( @@ -275,7 +282,9 @@ def test_geo_st_distance_with_geometry_objects(): ) -def test_geo_st_distance_with_single_geometry_object(): +def test_geo_st_distance_with_single_geometry_object( + session: bigframes.session.Session, +): pytest.importorskip( "shapely", minversion="2.0.0", @@ -297,7 +306,7 @@ def test_geo_st_distance_with_single_geometry_object(): Point(0, 0.00002), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) geobf_s_result = bbq.st_distance( geobf_s1, Point(0, 0), @@ -320,7 +329,7 @@ def test_geo_st_distance_with_single_geometry_object(): ) -def test_geo_st_intersection_with_geometry_objects(): +def test_geo_st_intersection_with_geometry_objects(session: bigframes.session.Session): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]), @@ -333,8 +342,8 @@ def test_geo_st_intersection_with_geometry_objects(): LineString([(2, 0), (0, 2)]), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) - geobf_s2 = bigframes.geopandas.GeoSeries(data=data2) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, 
session=session) + geobf_s2 = bigframes.geopandas.GeoSeries(data=data2, session=session) geobf_s_result = bbq.st_intersection(geobf_s1, geobf_s2).to_pandas() expected = pd.Series( @@ -355,7 +364,9 @@ def test_geo_st_intersection_with_geometry_objects(): ) -def test_geo_st_intersection_with_single_geometry_object(): +def test_geo_st_intersection_with_single_geometry_object( + session: bigframes.session.Session, +): pytest.importorskip( "shapely", minversion="2.0.0", @@ -368,7 +379,7 @@ def test_geo_st_intersection_with_single_geometry_object(): Point(0, 1), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) geobf_s_result = bbq.st_intersection( geobf_s1, Polygon([(0, 0), (10, 0), (10, 5), (0, 5), (0, 0)]), @@ -392,14 +403,16 @@ def test_geo_st_intersection_with_single_geometry_object(): ) -def test_geo_st_intersection_with_similar_geometry_objects(): +def test_geo_st_intersection_with_similar_geometry_objects( + session: bigframes.session.Session, +): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1)]), Point(0, 1), ] - geobf_s1 = bigframes.geopandas.GeoSeries(data=data1) + geobf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) geobf_s_result = bbq.st_intersection(geobf_s1, geobf_s1).to_pandas() expected = pd.Series( @@ -420,7 +433,7 @@ def test_geo_st_intersection_with_similar_geometry_objects(): ) -def test_geo_st_isclosed(): +def test_geo_st_isclosed(session: bigframes.session.Session): bf_gs = bigframes.geopandas.GeoSeries( [ Point(0, 0), # Point @@ -428,12 +441,15 @@ def test_geo_st_isclosed(): LineString([(0, 0), (1, 1), (0, 1), (0, 0)]), # Closed LineString Polygon([(0, 0), (1, 1), (0, 1)]), # Open polygon GeometryCollection(), # Empty GeometryCollection - bigframes.geopandas.GeoSeries.from_wkt(["GEOMETRYCOLLECTION EMPTY"]).iloc[ + bigframes.geopandas.GeoSeries.from_wkt( + ["GEOMETRYCOLLECTION EMPTY"], session=session + ).iloc[ 0 
], # Also empty None, # Should be filtered out by dropna ], index=[0, 1, 2, 3, 4, 5, 6], + session=session, ) bf_result = bbq.st_isclosed(bf_gs).to_pandas() @@ -455,3 +471,12 @@ def test_geo_st_isclosed(): # We default to Int64 (nullable) dtype, but pandas defaults to int64 index. check_index_type=False, ) + + +def test_st_buffer(session): + geoseries = bigframes.geopandas.GeoSeries( + [Point(0, 0), LineString([(1, 1), (2, 2)])], session=session + ) + result = bbq.st_buffer(geoseries, 1000).to_pandas() + assert result.iloc[0].geom_type == "Polygon" + assert result.iloc[1].geom_type == "Polygon" diff --git a/tests/system/small/geopandas/test_geoseries.py b/tests/system/small/geopandas/test_geoseries.py index 51344edcbd..a2f0759161 100644 --- a/tests/system/small/geopandas/test_geoseries.py +++ b/tests/system/small/geopandas/test_geoseries.py @@ -12,6 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import annotations + import re import bigframes_vendored.constants as constants @@ -31,6 +33,7 @@ import bigframes.geopandas import bigframes.pandas import bigframes.series +import bigframes.session from bigframes.testing.utils import assert_series_equal @@ -75,7 +78,7 @@ def test_geo_y(urban_areas_dfs): ) -def test_geo_area_not_supported(): +def test_geo_area_not_supported(session: bigframes.session.Session): s = bigframes.pandas.Series( [ Polygon([(0, 0), (1, 1), (0, 1)]), @@ -85,6 +88,7 @@ def test_geo_area_not_supported(): Point(0, 1), ], dtype=GeometryDtype(), + session=session, ) bf_series: bigframes.geopandas.GeoSeries = s.geo with pytest.raises( @@ -107,7 +111,7 @@ def test_geoseries_length_property_not_implemented(session): _ = gs.length -def test_geo_distance_not_supported(): +def test_geo_distance_not_supported(session: bigframes.session.Session): s1 = bigframes.pandas.Series( [ Polygon([(0, 0), (1, 1), (0, 1)]), @@ -117,6 +121,7 @@ def test_geo_distance_not_supported(): Point(0, 
1), ], dtype=GeometryDtype(), + session=session, ) s2 = bigframes.geopandas.GeoSeries( [ @@ -125,7 +130,8 @@ def test_geo_distance_not_supported(): Polygon([(0, 0), (2, 2), (2, 0)]), LineString([(0, 0), (1, 1), (0, 1)]), Point(0, 1), - ] + ], + session=session, ) with pytest.raises( NotImplementedError, @@ -134,11 +140,11 @@ def test_geo_distance_not_supported(): s1.geo.distance(s2) -def test_geo_from_xy(): +def test_geo_from_xy(session: bigframes.session.Session): x = [2.5, 5, -3.0] y = [0.5, 1, 1.5] bf_result = ( - bigframes.geopandas.GeoSeries.from_xy(x, y) + bigframes.geopandas.GeoSeries.from_xy(x, y, session=session) .astype(geopandas.array.GeometryDtype()) .to_pandas() ) @@ -154,7 +160,7 @@ def test_geo_from_xy(): ) -def test_geo_from_wkt(): +def test_geo_from_wkt(session: bigframes.session.Session): wkts = [ "Point(0 1)", "Point(2 4)", @@ -162,7 +168,9 @@ def test_geo_from_wkt(): "Point(6 8)", ] - bf_result = bigframes.geopandas.GeoSeries.from_wkt(wkts).to_pandas() + bf_result = bigframes.geopandas.GeoSeries.from_wkt( + wkts, session=session + ).to_pandas() pd_result = geopandas.GeoSeries.from_wkt(wkts) @@ -174,14 +182,15 @@ def test_geo_from_wkt(): ) -def test_geo_to_wkt(): +def test_geo_to_wkt(session: bigframes.session.Session): bf_geo = bigframes.geopandas.GeoSeries( [ Point(0, 1), Point(2, 4), Point(5, 3), Point(6, 8), - ] + ], + session=session, ) pd_geo = geopandas.GeoSeries( @@ -209,8 +218,8 @@ def test_geo_to_wkt(): ) -def test_geo_boundary(): - bf_s = bigframes.pandas.Series( +def test_geo_boundary(session: bigframes.session.Session): + bf_s = bigframes.series.Series( [ Polygon([(0, 0), (1, 1), (0, 1)]), Polygon([(10, 0), (10, 5), (0, 0)]), @@ -218,6 +227,7 @@ def test_geo_boundary(): LineString([(0, 0), (1, 1), (0, 1)]), Point(0, 1), ], + session=session, ) pd_s = geopandas.GeoSeries( @@ -229,6 +239,7 @@ def test_geo_boundary(): Point(0, 1), ], index=pd.Index([0, 1, 2, 3, 4], dtype="Int64"), + crs="WGS84", ) bf_result = 
bf_s.geo.boundary.to_pandas() @@ -246,7 +257,7 @@ def test_geo_boundary(): # For example, when the difference between two polygons is empty, # GeoPandas returns 'POLYGON EMPTY' while GeoSeries returns 'GeometryCollection([])'. # This is why we are hard-coding the expected results. -def test_geo_difference_with_geometry_objects(): +def test_geo_difference_with_geometry_objects(session: bigframes.session.Session): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]), @@ -259,8 +270,8 @@ def test_geo_difference_with_geometry_objects(): LineString([(2, 0), (0, 2)]), ] - bf_s1 = bigframes.geopandas.GeoSeries(data=data1) - bf_s2 = bigframes.geopandas.GeoSeries(data=data2) + bf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) + bf_s2 = bigframes.geopandas.GeoSeries(data=data2, session=session) bf_result = bf_s1.difference(bf_s2).to_pandas() @@ -271,6 +282,7 @@ def test_geo_difference_with_geometry_objects(): Point(0, 1), ], index=[0, 1, 2], + session=session, ).to_pandas() assert bf_result.dtype == "geometry" @@ -279,20 +291,21 @@ def test_geo_difference_with_geometry_objects(): assert expected.iloc[2].equals(bf_result.iloc[2]) -def test_geo_difference_with_single_geometry_object(): +def test_geo_difference_with_single_geometry_object(session: bigframes.session.Session): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(4, 2), (6, 2), (8, 6), (4, 2)]), Point(0, 1), ] - bf_s1 = bigframes.geopandas.GeoSeries(data=data1) + bf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) bf_result = bf_s1.difference( bigframes.geopandas.GeoSeries( [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(1, 0), (0, 5), (0, 0), (1, 0)]), - ] + ], + session=session, ), ).to_pandas() @@ -303,6 +316,7 @@ def test_geo_difference_with_single_geometry_object(): None, ], index=[0, 1, 2], + session=session, ).to_pandas() assert bf_result.dtype == "geometry" @@ -311,19 +325,22 @@ def 
test_geo_difference_with_single_geometry_object(): assert expected.iloc[2] == bf_result.iloc[2] -def test_geo_difference_with_similar_geometry_objects(): +def test_geo_difference_with_similar_geometry_objects( + session: bigframes.session.Session, +): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1)]), Point(0, 1), ] - bf_s1 = bigframes.geopandas.GeoSeries(data=data1) + bf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) bf_result = bf_s1.difference(bf_s1).to_pandas() expected = bigframes.geopandas.GeoSeries( [GeometryCollection([]), GeometryCollection([]), GeometryCollection([])], index=[0, 1, 2], + session=session, ).to_pandas() assert bf_result.dtype == "geometry" @@ -332,9 +349,10 @@ def test_geo_difference_with_similar_geometry_objects(): assert expected.iloc[2].equals(bf_result.iloc[2]) -def test_geo_drop_duplicates(): +def test_geo_drop_duplicates(session: bigframes.session.Session): bf_series = bigframes.geopandas.GeoSeries( - [Point(1, 1), Point(2, 2), Point(3, 3), Point(2, 2)] + [Point(1, 1), Point(2, 2), Point(3, 3), Point(2, 2)], + session=session, ) pd_series = geopandas.GeoSeries( @@ -353,7 +371,7 @@ def test_geo_drop_duplicates(): # For example, when the intersection between two polygons is empty, # GeoPandas returns 'POLYGON EMPTY' while GeoSeries returns 'GeometryCollection([])'. # This is why we are hard-coding the expected results. 
-def test_geo_intersection_with_geometry_objects(): +def test_geo_intersection_with_geometry_objects(session: bigframes.session.Session): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]), @@ -366,8 +384,8 @@ def test_geo_intersection_with_geometry_objects(): LineString([(2, 0), (0, 2)]), ] - bf_s1 = bigframes.geopandas.GeoSeries(data=data1) - bf_s2 = bigframes.geopandas.GeoSeries(data=data2) + bf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) + bf_s2 = bigframes.geopandas.GeoSeries(data=data2, session=session) bf_result = bf_s1.intersection(bf_s2).to_pandas() @@ -377,6 +395,7 @@ def test_geo_intersection_with_geometry_objects(): Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]), GeometryCollection([]), ], + session=session, ).to_pandas() assert bf_result.dtype == "geometry" @@ -385,20 +404,23 @@ def test_geo_intersection_with_geometry_objects(): assert expected.iloc[2].equals(bf_result.iloc[2]) -def test_geo_intersection_with_single_geometry_object(): +def test_geo_intersection_with_single_geometry_object( + session: bigframes.session.Session, +): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(4, 2), (6, 2), (8, 6), (4, 2)]), Point(0, 1), ] - bf_s1 = bigframes.geopandas.GeoSeries(data=data1) + bf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) bf_result = bf_s1.intersection( bigframes.geopandas.GeoSeries( [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(1, 0), (0, 5), (0, 0), (1, 0)]), - ] + ], + session=session, ), ).to_pandas() @@ -409,6 +431,7 @@ def test_geo_intersection_with_single_geometry_object(): None, ], index=[0, 1, 2], + session=session, ).to_pandas() assert bf_result.dtype == "geometry" @@ -417,14 +440,16 @@ def test_geo_intersection_with_single_geometry_object(): assert expected.iloc[2] == bf_result.iloc[2] -def test_geo_intersection_with_similar_geometry_objects(): +def test_geo_intersection_with_similar_geometry_objects( + session: 
bigframes.session.Session, +): data1 = [ Polygon([(0, 0), (10, 0), (10, 10), (0, 0)]), Polygon([(0, 0), (1, 1), (0, 1)]), Point(0, 1), ] - bf_s1 = bigframes.geopandas.GeoSeries(data=data1) + bf_s1 = bigframes.geopandas.GeoSeries(data=data1, session=session) bf_result = bf_s1.intersection(bf_s1).to_pandas() expected = bigframes.geopandas.GeoSeries( @@ -434,9 +459,119 @@ def test_geo_intersection_with_similar_geometry_objects(): Point(0, 1), ], index=[0, 1, 2], + session=session, ).to_pandas() assert bf_result.dtype == "geometry" assert expected.iloc[0].equals(bf_result.iloc[0]) assert expected.iloc[1].equals(bf_result.iloc[1]) assert expected.iloc[2].equals(bf_result.iloc[2]) + + +def test_geo_is_closed_not_supported(session: bigframes.session.Session): + s = bigframes.series.Series( + [ + Polygon([(0, 0), (1, 1), (0, 1)]), + Polygon([(10, 0), (10, 5), (0, 0)]), + Polygon([(0, 0), (2, 2), (2, 0)]), + LineString([(0, 0), (1, 1), (0, 1)]), + Point(0, 1), + ], + dtype=GeometryDtype(), + session=session, + ) + bf_series: bigframes.geopandas.GeoSeries = s.geo + with pytest.raises( + NotImplementedError, + match=re.escape( + f"GeoSeries.is_closed is not supported. Use bigframes.bigquery.st_isclosed(series), instead. {constants.FEEDBACK_LINK}" + ), + ): + bf_series.is_closed + + +def test_geo_buffer_raises_notimplemented(session: bigframes.session.Session): + """GeoPandas takes distance in units of the coordinate system, but BigQuery + uses meters. 
+ """ + s = bigframes.geopandas.GeoSeries( + [ + Point(0, 0), + ], + session=session, + ) + with pytest.raises( + NotImplementedError, match=re.escape("bigframes.bigquery.st_buffer") + ): + s.buffer(1000) + + +def test_geo_centroid(session: bigframes.session.Session): + bf_s = bigframes.series.Series( + [ + Polygon([(0, 0), (0.1, 0.1), (0, 0.1)]), + LineString([(10, 10), (10.0001, 10.0001), (10, 10.0001)]), + Point(-10, -10), + ], + session=session, + ) + + pd_s = geopandas.GeoSeries( + [ + Polygon([(0, 0), (0.1, 0.1), (0, 0.1)]), + LineString([(10, 10), (10.0001, 10.0001), (10, 10.0001)]), + Point(-10, -10), + ], + index=pd.Index([0, 1, 2], dtype="Int64"), + crs="WGS84", + ) + + bf_result = bf_s.geo.centroid.to_pandas() + # Avoid warning that centroid is incorrect for geographic CRS. + # https://gis.stackexchange.com/a/401815/275289 + pd_result = pd_s.to_crs("+proj=cea").centroid.to_crs("WGS84") + + geopandas.testing.assert_geoseries_equal( + bf_result, + pd_result, + check_series_type=False, + check_index_type=False, + # BigQuery geography calculations are on a sphere, so results will be + # slightly different. + check_less_precise=True, + ) + + +def test_geo_convex_hull(session: bigframes.session.Session): + bf_s = bigframes.series.Series( + [ + Polygon([(0, 0), (1, 1), (0, 1)]), + Polygon([(10, 0), (10, 5), (0, 0)]), + Polygon([(0, 0), (2, 2), (2, 0)]), + LineString([(0, 0), (1, 1), (0, 1)]), + Point(0, 1), + ], + session=session, + ) + + pd_s = geopandas.GeoSeries( + [ + Polygon([(0, 0), (1, 1), (0, 1)]), + Polygon([(10, 0), (10, 5), (0, 0)]), + Polygon([(0, 0), (2, 2), (2, 0)]), + LineString([(0, 0), (1, 1), (0, 1)]), + Point(0, 1), + ], + index=pd.Index([0, 1, 2, 3, 4], dtype="Int64"), + crs="WGS84", + ) + + bf_result = bf_s.geo.convex_hull.to_pandas() + pd_result = pd_s.convex_hull + + geopandas.testing.assert_geoseries_equal( + bf_result, + pd_result, + check_series_type=False, + check_index_type=False, + )
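The docstring of `test_geo_buffer_raises_notimplemented` points at a unit mismatch: GeoPandas interprets a buffer distance in the units of the coordinate system, while BigQuery interprets it in meters. The arithmetic below is a rough sketch of how far apart those readings are for WGS84 data. It assumes a spherical Earth with a mean radius of 6,371,008.8 m, which is an assumption and not something stated in this diff, and `degrees_to_meters` is a hypothetical helper that is not part of bigframes.

```python
import math

# Assumption: spherical Earth with a mean radius of 6,371,008.8 m.
# The geodesic model BigQuery uses is not stated in this diff.
EARTH_RADIUS_M = 6_371_008.8


def degrees_to_meters(degrees: float) -> float:
    """Approximate great-circle arc length, in meters, for an angle in degrees.

    Hypothetical helper for illustration only; not part of bigframes.
    """
    return math.radians(degrees) * EARTH_RADIUS_M


# The st_distance test data treats 0.00001 degrees as roughly 1 meter.
one_meter_ish = degrees_to_meters(0.00001)

# A buffer distance of 1000 means 1000 meters to BigQuery, but 1000
# coordinate units (degrees, for WGS84 data) to GeoPandas. Expressed in
# degrees, 1000 meters is a very small angle.
thousand_meters_in_degrees = math.degrees(1000 / EARTH_RADIUS_M)

print(f"0.00001 degrees is about {one_meter_ish:.2f} m")
print(f"1000 m is about {thousand_meters_in_degrees:.4f} degrees")
```

Under these assumptions, passing the same number to both libraries would describe very different buffers, which is consistent with the test expecting `GeoSeries.buffer` to raise `NotImplementedError` and point users at `bigframes.bigquery.st_buffer` instead.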