Description
Feature Request / Improvement
Now that Java has added server-side scan planning support (PR #14480), I believe Python is a great place to integrate this functionality. We have all the building blocks, and they are nearly complete. I'm creating this issue to track the tasks we need to finish to get it landed.
Context
We have some open PRs with the needed model changes, but we have pivoted to reusing our existing models and making sure they serialize properly with Pydantic.
For example, initially we can work on:
- Expression Serialization to ensure `BooleanExpression` and subclasses serialize correctly for REST API. Related to @Fokko's work on Remove `Generic` from expressions #2750 and @rambleraptor Server-side planning models #2435.
- DataFile Serialization to ensure we can properly deserialize the data/delete files from the server response. Open API uses kebab-case (`file-format`, `file-path`) but our models expect snake_case (`file_format`, `file_path`).
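To make the kebab-case mismatch concrete, here is a minimal stdlib-only sketch of normalizing a server payload before building a model. `DataFileSketch`, `kebab_to_snake`, and `parse_data_file` are hypothetical names used only for illustration, and the payload is a made-up example, not a real server response. With Pydantic, the same mapping would more likely be expressed through field aliases on the existing models.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


def kebab_to_snake(key: str) -> str:
    """Convert a kebab-case JSON key (file-path) to snake_case (file_path)."""
    return key.replace("-", "_")


@dataclass(frozen=True)
class DataFileSketch:
    # Hypothetical stand-in for a data file model; only three fields shown.
    file_path: str
    file_format: str
    record_count: int


def parse_data_file(payload: Mapping[str, Any]) -> DataFileSketch:
    """Build a DataFileSketch from a kebab-case payload, ignoring unknown keys."""
    normalized = {kebab_to_snake(key): value for key, value in payload.items()}
    known = {field.name for field in fields(DataFileSketch)}
    return DataFileSketch(**{k: v for k, v in normalized.items() if k in known})


# Illustrative payload only.
payload = {
    "file-path": "s3://bucket/data/00000.parquet",
    "file-format": "parquet",
    "record-count": 100,
    "file-size-in-bytes": 2048,  # not modeled in this sketch, so it is dropped
}
data_file = parse_data_file(payload)
print(data_file.file_path, data_file.file_format)
```

Dropping unknown keys keeps the sketch short; a real model would need to cover every field the response carries.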
REST API Endpoints to Implement
Based on the Iceberg REST spec:
1. `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan` - Submit scan for planning
2. `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id}` - Fetch planning result
3. `DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id}` - Cancel planning
4. `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/tasks` - Fetch scan tasks for a plan task
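As a rough illustration of how a client might build these four paths, the sketch below uses only the standard library. The helper names are hypothetical and are not part of PyIceberg. It assumes the prefix is already URL-safe and that multi-level namespaces are joined with the unit separator (`0x1F`) before encoding.

```python
from typing import Sequence
from urllib.parse import quote

# Assumption: multi-level namespace parts are joined with the unit separator.
NAMESPACE_SEPARATOR = "\x1f"


def table_base(prefix: str, namespace: Sequence[str], table: str) -> str:
    """Shared /v1/{prefix}/namespaces/{namespace}/tables/{table} path."""
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return f"/v1/{prefix}/namespaces/{encoded_ns}/tables/{quote(table, safe='')}"


def plan_path(prefix: str, namespace: Sequence[str], table: str) -> str:
    """POST target for submitting a scan for planning."""
    return table_base(prefix, namespace, table) + "/plan"


def plan_result_path(prefix: str, namespace: Sequence[str], table: str, plan_id: str) -> str:
    """GET (fetch result) or DELETE (cancel) target for an existing plan."""
    return plan_path(prefix, namespace, table) + "/" + quote(plan_id, safe="")


def tasks_path(prefix: str, namespace: Sequence[str], table: str) -> str:
    """POST target for fetching the scan tasks of a plan task."""
    return table_base(prefix, namespace, table) + "/tasks"


print(plan_path("main", ("analytics", "daily"), "events"))
```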
Tasks
Initially we can start with core sync planning. Once that's in place we can add async support, which already appears to exist on the Java side in https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java
Core Sync Planning
- Build on @Fokko's expression work to ensure `Expression` classes serialize properly with Pydantic.
- Construct only plan Request/Response models for synchronous planning
- Add support for REST scan to fetch scan tasks replicating `DataScan` behavior
- Parse server response to `FileScanTask` objects (handle Data/DeleteFile construction)
- Add `plan_table_scan()` methods
- Add documentation
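The response-parsing task could look roughly like the sketch below. The response shape is an assumption: a `status` field, a `file-scan-tasks` list whose entries carry a `data-file` plus `delete-file-references` (indexes into a top-level `delete-files` list). Check these key names against the REST spec before relying on them. `ScanTaskSketch` and `parse_completed_plan` are hypothetical names, and raw dicts stand in for the real Data/DeleteFile models.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping


@dataclass
class ScanTaskSketch:
    # Hypothetical stand-in for FileScanTask: a data file plus its delete files.
    data_file: Dict[str, Any]
    delete_files: List[Dict[str, Any]]


def parse_completed_plan(response: Mapping[str, Any]) -> List[ScanTaskSketch]:
    """Turn a completed planning response into scan tasks (assumed key names)."""
    status = response.get("status")
    if status != "completed":
        # Sync planning only handles plans the server finished inline.
        raise ValueError(f"expected a completed plan, got status {status!r}")

    delete_files = response.get("delete-files", [])
    tasks = []
    for raw_task in response.get("file-scan-tasks", []):
        # Resolve each reference into the shared delete-files list.
        refs = raw_task.get("delete-file-references", [])
        tasks.append(ScanTaskSketch(raw_task["data-file"], [delete_files[i] for i in refs]))
    return tasks


# Illustrative response only.
response = {
    "status": "completed",
    "delete-files": [{"file-path": "s3://bucket/delete/d0.parquet"}],
    "file-scan-tasks": [
        {"data-file": {"file-path": "s3://bucket/data/a.parquet"}, "delete-file-references": [0]},
        {"data-file": {"file-path": "s3://bucket/data/b.parquet"}},
    ],
}
tasks = parse_completed_plan(response)
print(len(tasks), [len(task.delete_files) for task in tasks])
```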
Full Scan planning support (Follow-up)
Complete the full scan planning API with async operations and pagination.
- Add the rest of the models for async planning
- Add support for endpoints 2 and 3 (fetch planning result, cancel planning) to RESTScan
- Add endpoint 4 (fetch scan tasks for a plan task) support to `RestCatalog`
- Complete documentation with all scenarios
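For the async follow-up, the client would need to poll the planning result until the server finishes. Below is a minimal sketch of such a loop. The status values (`submitted`, `completed`, `failed`, `cancelled`) are assumptions about the response, and `wait_for_plan` and `fetch_plan` are hypothetical names. The HTTP call is injected as a callable so the sketch runs without a server.

```python
import time
from typing import Any, Callable, Dict


def wait_for_plan(
    fetch_plan: Callable[[], Dict[str, Any]],
    poll_interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_plan() until the plan completes, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_plan()
        status = result.get("status")
        if status == "completed":
            return result
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"planning ended with status {status!r}")
        # Any other status (assumed "submitted") means planning is still running.
        sleep(poll_interval)
    raise TimeoutError(f"plan not completed after {max_attempts} attempts")


# Fake server: two in-progress responses, then a completed one.
responses = iter([
    {"status": "submitted"},
    {"status": "submitted"},
    {"status": "completed", "file-scan-tasks": []},
])
result = wait_for_plan(lambda: next(responses), poll_interval=0.0)
print(result["status"])
```

A real implementation would probably also cancel the plan (endpoint 3) when the caller gives up, rather than only raising.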