feat(gis): add spatial/GIS query support with GeoParquet output #97

Open
zfarrell wants to merge 26 commits into main from feat/gis-query-support

zfarrell commented Feb 2, 2026

Summary

  • Add GIS/spatial column support for PostGIS, DuckDB, MySQL, and Snowflake sources
  • Automatically detect geometry columns and wrap with ST_AsBinary() for WKB extraction
  • Output GeoParquet 1.1.0 metadata for spatial columns
  • Add hex-decode pipeline for geometry columns in CSV/JSON uploads
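
The hex-decode step for CSV/JSON uploads could look roughly like the sketch below. It is not taken from the PR: the function name `decode_hex_wkb` and the tolerance for a leading `\x` or `0x` prefix are assumptions made for illustration.

```rust
/// Convert one ASCII hex digit to its 4-bit value.
fn nibble(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode a hex-encoded WKB cell (as it might appear in a CSV/JSON upload)
/// into raw bytes. Accepting an optional `\x` or `0x` prefix is an assumption
/// of this sketch, not something the PR states.
fn decode_hex_wkb(cell: &str) -> Result<Vec<u8>, String> {
    let trimmed = cell.trim();
    let hex = trimmed
        .strip_prefix("\\x")
        .or_else(|| trimmed.strip_prefix("0x"))
        .unwrap_or(trimmed);

    if hex.is_empty() || hex.len() % 2 != 0 {
        return Err(format!(
            "expected a non-empty, even-length hex string, got {} bytes",
            hex.len()
        ));
    }

    // Each pair of hex digits becomes one output byte.
    hex.as_bytes()
        .chunks(2)
        .enumerate()
        .map(|(i, pair)| match (nibble(pair[0]), nibble(pair[1])) {
            (Some(hi), Some(lo)) => Ok((hi << 4) | lo),
            _ => Err(format!("invalid hex digits at byte offset {}", i)),
        })
        .collect()
}
```

For example, `decode_hex_wkb("0101000000000000000000F03F0000000000000040")` yields the 21-byte little-endian WKB for POINT(1 2).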

Closes #86

zfarrell force-pushed the feat/gis-query-support branch from 43234f9 to a82cba4 on February 3, 2026 00:11

- Detect geometry columns in PostgreSQL/PostGIS with SRID metadata
- Fetch geometry data as WKB using ST_AsBinary()
- Write GeoParquet 1.1.0 metadata with CRS information
- Register geodatafusion spatial functions (st_area, st_distance, etc.)
- Add spatial type support for MySQL, Snowflake, and DuckDB backends
- Add GIS integration tests

Parse "geo" metadata from uploaded GeoParquet files and pass geometry
column info to the StreamingParquetWriter so output datasets maintain
GeoParquet 1.1.0 metadata.
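
For reference, here is a rough sketch of the file-level "geo" value that has to survive this round trip. The required keys (`version`, `primary_column`, `columns`, and per-column `encoding` and `geometry_types`) follow the GeoParquet 1.1.0 specification. The function name and the tuple-based input are hypothetical, and the sketch omits the optional `crs` field and assumes column names need no JSON escaping.

```rust
/// Build a minimal GeoParquet 1.1.0 "geo" metadata JSON string.
/// `columns` pairs each geometry column name with its geometry types.
/// Assumption: names contain no quotes or backslashes, so no escaping is done.
fn geo_metadata_json(primary_column: &str, columns: &[(&str, &[&str])]) -> String {
    let column_entries: Vec<String> = columns
        .iter()
        .map(|&(name, geometry_types)| {
            let types: Vec<String> = geometry_types
                .iter()
                .map(|t| format!("\"{}\"", t))
                .collect();
            format!(
                "\"{}\":{{\"encoding\":\"WKB\",\"geometry_types\":[{}]}}",
                name,
                types.join(",")
            )
        })
        .collect();

    format!(
        "{{\"version\":\"1.1.0\",\"primary_column\":\"{}\",\"columns\":{{{}}}}}",
        primary_column,
        column_entries.join(",")
    )
}
```

A real writer would more likely serialize this with a JSON library and store it under the `geo` key of the Parquet file metadata.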

Use kartoza/postgis:16-3.4 which supports arm64/amd64. Add retry logic
for container startup, fix tokio runtime requirements, and use only
geodatafusion-supported spatial functions.

Extend explicit column definitions to support geometry types with
SRID and geometry_type metadata. Geometry columns are stored as WKB
binary with GeoParquet metadata for spatial query support.

Load spatial extension in discover_tables_sync and
fetch_table_to_channel. Add graceful fallback in build_fetch_query
when ST_AsBinary is unavailable (bundled crate limitation).

Remove unused parse_geometry_type_params function. Use both udt_name
(for geometry detection) and data_type (for type mapping) in
discover_tables, matching the approach already used in fetch_table.
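
As an illustration of why both fields are needed: in PostgreSQL's information_schema.columns, a PostGIS column reports data_type as USER-DEFINED, and only udt_name carries the actual type name. The sketch below is a hypothetical classifier, not the PR's code. The `ColumnKind` enum and the decision to treat `geography` as spatial are assumptions.

```rust
/// Hypothetical result of inspecting one information_schema.columns row.
#[derive(Debug, PartialEq)]
enum ColumnKind {
    /// PostGIS column; should be fetched as WKB.
    Spatial,
    /// Ordinary column; carries the data_type used for type mapping.
    Scalar(String),
}

/// Use udt_name to detect spatial columns and data_type for everything else.
fn classify_column(udt_name: &str, data_type: &str) -> ColumnKind {
    match udt_name.to_ascii_lowercase().as_str() {
        // Assumption: geography is handled the same way as geometry.
        "geometry" | "geography" => ColumnKind::Spatial,
        _ => ColumnKind::Scalar(data_type.to_ascii_lowercase()),
    }
}
```

Relying on data_type alone would see only USER-DEFINED for a geometry column, which is not enough to tell it apart from other extension types.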

Previously, when a table had spatial columns but ST_AsBinary was
unavailable (bundled crate limitation), build_fetch_query silently
fell back to SELECT *. This produced DuckDB's internal geometry
format instead of WKB, but downstream code (GeoParquet writer)
would incorrectly treat it as WKB, causing data corruption.

Now returns an error with a clear message when spatial columns
are detected but the spatial extension isn't functional.
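
A minimal sketch of the fail-fast behaviour described above, assuming a simplified signature. The error type `SpatialUnavailable`, the `(name, is_spatial)` tuples, the double-quote identifier quoting, and the assumption that `columns` is non-empty are all illustrative choices, not the PR's actual code.

```rust
use std::fmt;

/// Hypothetical error type; the PR's real error type is not shown here.
#[derive(Debug, PartialEq)]
struct SpatialUnavailable {
    table: String,
    columns: Vec<String>,
}

impl fmt::Display for SpatialUnavailable {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(
            f,
            "table {} has spatial columns [{}] but ST_AsBinary is unavailable; \
             refusing to fall back to SELECT *",
            self.table,
            self.columns.join(", ")
        )
    }
}

/// `columns` pairs each column name with a flag saying whether it is spatial.
/// Assumes `columns` is non-empty.
fn build_fetch_query(
    table: &str,
    columns: &[(&str, bool)],
    st_asbinary_available: bool,
) -> Result<String, SpatialUnavailable> {
    let spatial: Vec<String> = columns
        .iter()
        .filter(|c| c.1)
        .map(|c| c.0.to_string())
        .collect();

    // Fail loudly instead of emitting a non-WKB internal geometry format.
    if !spatial.is_empty() && !st_asbinary_available {
        return Err(SpatialUnavailable {
            table: table.to_string(),
            columns: spatial,
        });
    }

    let exprs: Vec<String> = columns
        .iter()
        .map(|&(name, is_spatial)| {
            if is_spatial {
                format!("ST_AsBinary(\"{0}\") AS \"{0}\"", name)
            } else {
                format!("\"{}\"", name)
            }
        })
        .collect();

    Ok(format!("SELECT {} FROM \"{}\"", exprs.join(", "), table))
}
```

Tables without spatial columns still succeed when the extension is missing, so the error only appears where the old fallback would have produced bytes that are not WKB.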

When the information_schema query returns zero rows (for example, because
of a case mismatch or missing permissions), column_exprs is empty, producing
invalid SQL like SELECT  FROM .... Now falls back to SELECT * when the
column list is empty.
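
The empty-list guard might be sketched as follows. The helper names `select_list` and `fetch_sql` are hypothetical, and identifier quoting is left out to keep the example short.

```rust
/// Build the projection list, falling back to `*` when no column
/// expressions were discovered.
fn select_list(column_exprs: &[String]) -> String {
    if column_exprs.is_empty() {
        "*".to_string()
    } else {
        column_exprs.join(", ")
    }
}

/// Assemble the fetch statement; never emits `SELECT  FROM ...`.
fn fetch_sql(table: &str, column_exprs: &[String]) -> String {
    format!("SELECT {} FROM {}", select_list(column_exprs), table)
}
```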

zfarrell force-pushed the feat/gis-query-support branch from a82cba4 to ab4e778 on February 6, 2026 18:33

Development

Successfully merging this pull request may close these issues.

Add support for geography type